
ENGINEERING AT BETTERMENT



High quality code. Beautiful, practical design. Innovative problem solving.
Explore our engineering community and nerd out with us on all things tech.





RECENT ARTICLES



 * FINDING A MIDDLE GROUND BETWEEN SCREEN AND UI TESTING IN FLUTTER
   
   Finding a Middle Ground Between Screen and UI Testing in Flutter We outline
   the struggles we had testing our Flutter app, the approaches we took to those
   challenges, and the solutions we arrived at. Flutter provides good solutions
   for both screen testing and UI testing, but what about the middle ground?
   With integration testing being a key level of the
   testing pyramid, we needed to find a way to test how features in our app
   interacted without the overhead involved with setting up UI tests. I’m going
   to take you through our testing journey from a limited native automated
   testing suite and heavy dependence on manual testing, to trying Flutter’s
   integration testing solutions, to ultimately deciding to build out our own
   framework to increase confidence in the integration of our components. The
   beginning of our Flutter testing journey Up until early 2020, our mobile app
   was entirely native with separate Android and iOS codebases. At the onset of
   our migration to Flutter, the major testing pain point was that a large
   amount of manual regression testing was required in order to approve each
   release. This manual testing was tedious and time-consuming for engineers,
   whose time is expensive. Alongside this manual testing pain, the automated
   testing in the existing iOS and Android codebases was inconsistent. iOS had a
   larger unit testing suite than Android did, but neither had integration
   tests. iOS also had some tests that were flaky, causing CI builds to fail
   unexpectedly. As we transitioned to Flutter, we made unit/screen testing and
   code testability a high priority, pushing for thorough coverage. That said,
   we still relied heavily on the manual testing checklist to ensure the user
   experience was as expected. This led us to pursue an integration testing
   solution for Flutter. In planning out integration testing, we had a few key
   requirements for our integration testing suite:
   - Easily runnable in CI upon each commit
   - An API that would be familiar to developers who are used to writing
     Flutter screen tests
   - The ability to test the integration between features within the system
     without needing to set up the entire app.
   The Flutter integration testing landscape At the very beginning of our
   transition to Flutter, we started trying to write integration tests for our
   features using Flutter’s solution at the time: flutter_driver. The benefit we
   found in
   flutter_driver was that we could run it in our production-like environment
   against preset test users. This meant there was minimal test environment
   setup. We ran into quite a few issues with flutter_driver though. Firstly,
   there wasn’t a true entry point we could launch the app into because our app
   is add-to-app, meaning that the flutter code is embedded into our iOS and
   Android native applications rather than being a pure flutter app runnable
   from a main.dart entry point. Second, `flutter_driver` is more about UI/E2E
   testing rather than integration testing, meaning we’d need to run an instance
   of the app on a device, navigate to a flow we wanted to test, and then test
   the flow. Also, the flutter_driver API worked differently than the screen
   testing API and was generally more difficult to use. Finally, flutter_driver
   is not built to run a suite of tests or to run easily in CI. While possible
   to run in CI, it would be incredibly costly to run on each commit since the
   tests need to run on actual devices. These barriers led us to not pursue
   `flutter_driver` tests as our solution. We then pivoted to investigating
   Flutter’s newer replacement for flutter_driver: integration_test.
   Unfortunately `integration_test` was very similar to flutter_driver, in that
   it took the same UI/E2E approach, which meant that it had the same benefits
   and drawbacks that flutter_driver had. The one additional advantage of
   `integration_test` is that it uses the same API as screen tests do, so
   writing tests with it feels more familiar for developers experienced with
   writing screen tests. Regardless, given that it has the same problems that
   flutter_driver does, we decided not to pursue `integration_test` as our
   framework. Our custom solution to integration testing After trying Flutter’s
   solutions fruitlessly, we decided to build out a solution of our own. Before
   we dive into how we built it, let’s revisit our requirements from above:
   - Easily runnable in CI upon each commit
   - An API that would be familiar to developers who are used to writing
     Flutter screen tests
   - The ability to test the integration between features within the system
     without needing to set up the entire app.
   Given those requirements, we took a step back to make a few
   overarching design decisions. First, we needed to decide what pieces of code
   we were interested in testing and which parts we were fine with stubbing.
   Because we didn’t want to run the whole app with these tests in order to keep
   the tests lightweight enough to run on each commit, we decided to stub out a
   few problem areas. The first was our flutter/native boundary. With our app
   being add-to-app and utilizing plugins, we didn’t want to have to run
   anything native in our testing. We stubbed out the plugins by writing
   lightweight wrappers around them and then providing them to the app at a high
   level that we could easily override with fakes for the purpose of integration
   testing. The add-to-app boundary was similar. The second area we wanted to
   stub out was the network. In order to do this, we built out a fake http
   client that allows us to configure network responses for given requests. We
   chose to fake the http client since it is the very edge of our network layer.
   Faking it left as much of our code as possible under test. The next thing we
   needed to decide was what user experiences we actually wanted to test with
   our integration tests. Because integration tests are more expensive to write
   and maintain than screen tests, we wanted to make sure the flows we were
   testing were the most impactful. Knowing this, we decided to focus on “happy
   paths” of flows. Happy paths are non-exceptional flows (flows not based on
   bad user state or input). Sad paths, on the other hand, tend to be less
   impactful, and they usually give feedback on the same screen as the input,
   meaning those sad-path cases are usually better tested at the screen test
   level anyway. From here,
   we set out to break down responsibilities of the components of our
   integration tests. We wanted to have a test harness that we could use to set
   up the app under test and the world that the app would run in, however we
   knew this configuration code would be mildly complicated and something that
   would be in flux. We also wanted a consistent framework by which we could
   write these tests. In order to ensure changes to our test harness didn’t have
   far reaching effects on the underlying framework, we decided to split out the
   testing framework into an independent package that is completely agnostic to
   how our app operates. This keeps the tests feeling familiar to normal screen
   tests since the exposed interface is very similar to how widget tests are
   written. The remaining test harness code was put in our normal codebase where
   it can be iterated on freely. The other separation we wanted to make was
   between the screen interactions and the tests themselves. For this we used a
   modified version of Very Good Ventures' robot testing pattern that would
   allow us to reuse screen interactions across multiple tests while also making
   our tests very readable from even a non-engineering perspective. In order to
   fulfill two of our main requirements (running as part of our normal test
   suite in CI and having a familiar API), we knew we’d need to build our
   framework on top of Flutter’s existing screen test framework. Being able to
   integrate (ba dum tss) these new tests into our existing test suite was
   excellent because it meant that we would get quick feedback when code breaks
   while developing. The last of our requirements was to be able to launch into
   a specific feature rather than having to navigate through the whole app. We
   were able to do this by having our app widget that handles dependency setup
   take a child, then pumping the app widget wrapped around whatever feature
   widget we wanted to test. With all these decisions made, we arrived at a
   well-defined integration testing framework that isolated our concerns and
   fulfilled our testing requirements. The Nitty Gritty Details In order to
   describe how our integration tests work, let's start by describing an example
   app that we may want to test. Let's imagine a simple social network app,
   igrastam, that has an activity feed screen, a profile screen, a flow for
   updating your profile information, and a flow for posting images. For this
   example, we’ll say we’re most interested in testing the profile information
   edit flows to start. First, how would we want to make a test harness for this
   app? We know it has some sort of network interactions for fetching profile
   info and posts as well as for posting images and editing a profile. For that,
   our app has a thin wrapper around the http package called HttpClient. We may
   also have some interactions with native code through a plugin such as
   image_cropper. In order to have control over that plugin, this app has also
   made a thin wrapper service for that. This leaves our app looking something
   like this: Given that this is approximately what the app looks like, the test
   harness needs to grant control of the HttpClient and the ImageCropperService.
   We can do that by just passing our own fake versions into the app. Awesome,
   now that we have an app and a harness we can use to test it, how are the
   tests actually written?  Let’s start out by exploring that robot testing
   technique I mentioned earlier. Say that we want to start by testing the
   profile edit flow. One path through this flow contains a screen for changing
   your name and byline, then it bounces out to picking and cropping a profile
   image, then allows you to choose a preset border to put on your profile
   picture. For the screen for changing your name and byline, we can build a
   robot to interact with the screen that looks something like this: By using
   this pattern, we are able to reuse test code pertaining to this screen across
   many tests. It also keeps the test file clean of WidgetTester interaction,
   making the tests read more like a series of human actions rather than a
   series of code instructions. Okay, we’ve got an app, a test harness, and
   robots to interact with the screens. Let’s put it all together now into an
   actual test. The tests end up looking incredibly simple once all of these
   things are in place (which was the goal!). This test would go on to have a few
   more steps detailing the interactions on the subsequent screens. With that,
   we’ve been able to test the integration of all the components for a given
   flow, all written in widget-test-like style without needing to build out the
   entire app. This test could be added into our suite of other tests and run
   with each commit. Back to the bigger picture Integration testing in Flutter
   can be daunting due to how heavy the `flutter_driver`/`integration_test`
   solutions are with their UI testing strategies. We were able to overcome this
   and begin filling out the middle level of our testing pyramid by adding
   structure on top of the widget testing API that allows us to test full flows
   from start to finish. When pursuing this ourselves, we found it valuable to
   evaluate our testing strategy deficits, identify clear-cut boundaries around
   what code we wanted to test, and establish standards around what flows
   through the app should be tested. By going down the path of integration
   testing, we’ve been able to increase confidence in everyday changes as well
   as map out a plan for eliminating our manual test cases.
   11 min read


 * WHY (AND HOW) BETTERMENT IS USING JULIA
   
   Why (And How) Betterment Is Using Julia Betterment is using Julia to solve
   our own version of the “two-language problem.” At Betterment, we’re
   using Julia to power the projections and recommendations we provide to help
   our customers achieve their financial goals. We’ve found it to be a great
   solution to our own version of the “two-language problem”–the idea that the
   language in which it is most convenient to write a program is not necessarily
   the language in which it makes the most sense to run that program. We’re
   excited to share the approach we took to incorporating it into our stack and
   the challenges we encountered along the way. Working behind the scenes, the
   members of our Quantitative Investing team bring our customers the
   projections and recommendations they rely on for keeping their goals
   on-track. These hard-working and talented individuals spend a large portion
   of their time developing models, researching new investment ideas and
   maintaining our research libraries. While they’re not engineers, their jobs
   definitely involve a good amount of coding. Historically, the team has
   written code mostly in a research environment, implementing proof-of-concept
   models that are later translated into production code with help from the
   engineering team. Recently, however, we’ve invested significant resources in
   modernizing this research pipeline by converting our codebase from R to Julia
   and we’re now able to ship updates to our quantitative models quicker, and
   with less risk of errors being introduced in translation. Currently, Julia
   powers all the projections shown inside our app, as well as a lot of the
   advice we provide to our customers. The Julia library we built for this
   purpose serves around 18 million requests per day, and very efficiently at
   that. Examples of projections and recommendations at Betterment. Does not
   reflect any actual portfolio and is not a guarantee of performance. Why
   Julia? At QCon London 2019, Steve Klabnik gave a great talk on how the
   developers of the Rust programming language view tradeoffs in programming
   language design. The whole talk is worth a watch, but one idea that really
   resonated with us is that programming language design—and programming
   language choice—is a reflection of what the end-users of that language value
   and not a reflection of the objective superiority of one language over
   another. Julia is a newer language that looked like a perfect fit for the
   investing team for a number of reasons: Speed. If you’ve heard one thing
   about Julia, it’s probably about its blazingly fast performance. For us,
   speed is important as we need to be able to provide real-time advice to our
   customers by incorporating their most up-to-date financial scenario in our
   projections and recommendations. It is also important in our research code
   where the iterative nature of research means we often have to re-run
   financial simulations or models multiple times with slight tweaks.
   Dynamism. While speed of execution is important, we also require a dynamic
   language that allows us to test out new ideas and prototype rapidly. Julia
   ticks the box for this requirement as well by using a just-in-time
   compiler that accommodates both interactive and non-interactive workflows
   well. Julia also has a very rich type system where researchers can build
   prototypes without type declarations and then later refactor the code
   where needed, adding type declarations for dispatch or clarity. In either case,
   Julia is usually able to generate performant compiled code that we can run in
   production. Relevant ecosystem. While the nascency of Julia as a language
   means that the community and ecosystem is much smaller than those of other
   languages, we found that the code and community oversamples on the type of
   libraries that we care about. Julia has excellent support for technical
   computing and mathematical modelling. Given these reasons, Julia is the
   perfect language to serve as a solution to the “two-language problem”. This
   concept is oft-quoted in Julian circles and is perfectly exemplified by the
   previous workflow of our team: Investing Subject Matter Experts (SMEs) write
   domain-specific code that’s solely meant to serve as research code, and that
   code then has to be translated into some more performant language for use in
   production. Julia solves this issue by making it very simple to take a piece
   of research code and refactor it for production use. Our approach We decided
   to build our Julia codebase inside a monorepo, with separate packages for
   each conceptual project we might work on, such as interest rate models,
   projections, social security amount calculations and so on. This works well
   from a development perspective, but we soon faced the question of how best to
   integrate this code with our production code, which is mostly developed in
   Ruby. We identified two viable alternatives:
   1. Build a thin web service that will accept HTTP requests, call the
      underlying Julia functions, and then return an HTTP response.
   2. Compile the Julia code into a shared library, and call it directly from
      Ruby using FFI.
   Option 1 is a very common pattern, and
   actually quite similar to what had been the status quo at Betterment, as most
   of the projections and recommendation code existed in a JavaScript service.
   It may be surprising then to learn that we actually went with Option 2. We
   were deeply attracted to the idea of being able to fully integration-test our
   projections and recommendations working within our actual app (i.e. without
   the complication of a service boundary). Additionally, we wanted an
   integration that we could spin-up quickly and with low ongoing cost; there’s
   some fixed cost to getting an FFI embed working right—but once you do, it’s an
   exceedingly low-cost integration to maintain. Fully-fledged services require
   infrastructure to run and are (ideally) supported by a full team of
   engineers. That said, we recognize the attractive properties of the more
   well-trodden Option 1 path and believe it could be the right solution in a
   lot of scenarios (and may become the right solution for us as our usage of
   Julia continues to evolve). Implementation Given how new Julia is, there was
   minimal literature on true interoperability with other programming languages
   (particularly high-level languages–Ruby, Python, etc). But we saw that the
   right building blocks existed to do what we wanted and proceeded with the
   confidence that it was theoretically possible. As mentioned earlier, Julia is
   a just-in-time compiled language, but it’s possible to compile Julia code
   ahead-of-time using PackageCompiler.jl. We built an additional package into
   our monorepo whose sole purpose was to expose an API for our Ruby
   application, as well as compile that exposed code into a C shared library.
   The code in this package is the glue between our pure Julia functions and the
   lower level library interface—it’s responsible for defining the functions
   that will be exported by the shared library and doing any necessary
   conversions on input/output. As an example, consider the following simple
   Julia function which sorts an array of numbers using the insertion
   sort algorithm: In order to be able to expose this in a shared library, we
   would wrap it like this: Here we’ve simplified memory management by requiring
   the caller to allocate memory for the result, and implemented primitive
   exception handling (see Challenges & Pitfalls below). On the Ruby end, we
   built a gem which wraps our Julia library and attaches to it using Ruby-FFI.
   The gem includes a tiny Julia project with the API library as its only
   dependency. Upon gem installation, we fetch the Julia source and compile it
   as a native extension. Attaching to our example function with Ruby-FFI is
   straightforward:
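
   A minimal sketch of that attachment, assuming a compiled library named
   libsort.so that exports an insertion_sort function taking an input pointer,
   a caller-allocated output pointer, and a length (the path, name, and
   signature here are illustrative, not the actual Betterment API):

     require "ffi"

     module JuliaSort
       extend FFI::Library

       # The Julia runtime needs these loader flags (see the notes below).
       ffi_lib_flags :lazy, :global
       ffi_lib "./libsort.so" # hypothetical PackageCompiler-built library

       # Assumed C signature:
       #   void insertion_sort(const double* input, double* output, int64_t len)
       attach_function :insertion_sort, [:pointer, :pointer, :int64], :void
     end
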
   From here, we could begin using our function, but it wouldn’t be entirely
   pleasant to work with–converting an input array to a pointer and processing
   the result would require some tedious boilerplate. Luckily, we can use Ruby’s
   powerful metaprogramming abilities to abstract all that away–creating a
   declarative way to wrap an arbitrary Julia function, which results in a
   familiar and easy-to-use interface for Ruby developers. In practice, that
   might look something like this:
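
   Building on the sketch above, the declarative wrapper could look roughly
   like this (the module and method names are invented for illustration; the
   real gem's interface isn't shown here):

     module JuliaWrapper
       # Defines a Ruby method that hides the FFI boilerplate: allocate an
       # output buffer, copy the input array in, call the attached native
       # function, and read the sorted values back out.
       def wrap_sort_function(name, native:)
         define_singleton_method(name) do |values|
           input  = FFI::MemoryPointer.new(:double, values.length)
           output = FFI::MemoryPointer.new(:double, values.length)
           input.write_array_of_double(values)
           public_send(native, input, output, values.length)
           output.read_array_of_double(values.length)
         end
       end
     end

     JuliaSort.extend(JuliaWrapper)
     JuliaSort.wrap_sort_function(:sorted, native: :insertion_sort)

     JuliaSort.sorted([3.0, 1.0, 2.0]) # => [1.0, 2.0, 3.0]
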
   Resulting in a function for which the fact that the underlying
   implementation is in Julia has been completely abstracted away.
   Challenges & Pitfalls Debugging an FFI
   integration can be challenging; any misconfiguration is likely to result in
   the dreaded segmentation fault–the cause of which can be difficult to hunt
   down. Here are a few notes for practitioners about some nuanced issues we ran
   into, that will hopefully save you some headaches down the line: The Julia
   runtime has to be initialized before calling the shared library. When loading
   the dynamic library (whether through Ruby-FFI or some other invocation of
   `dlopen`), make sure to pass the flags `RTLD_LAZY` and `RTLD_GLOBAL`
   (`ffi_lib_flags :lazy, :global` in Ruby-FFI). If embedding your Julia library
   into a multi-threaded application, you’ll need additional tooling to only
   initialize and make calls into the Julia library from a single thread, as
   multiple calls to `jl_init` will error. We use a multi-threaded web server
   for our production application, and so when we make a call into the Julia
   shared library, we push that call onto a queue where it gets picked up and
   performed by a single executor thread, which then communicates the result
   back to the calling thread using a promise object.
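
   A stripped-down sketch of that pattern using only the Ruby standard library
   (a reply Queue stands in for the promise object, and the class and method
   names are illustrative):

     class JuliaExecutor
       Call = Struct.new(:fn, :args, :reply)

       def initialize
         @calls = Queue.new
         @executor = Thread.new do
           # Runtime initialization (the jl_init step) happens once, here,
           # before any calls are serviced.
           loop do
             call = @calls.pop
             call.reply.push(JuliaSort.public_send(call.fn, *call.args))
           end
         end
       end

       # Called from any web server thread; blocks until the single executor
       # thread has performed the Julia call and pushed back the result.
       def call(fn, *args)
         reply = Queue.new
         @calls.push(Call.new(fn, args, reply))
         reply.pop
       end
     end
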
   Memory management–if you’ll be passing anything other than primitive types
   back from Julia to Ruby (e.g. pointers to more complex objects), you’ll need
   to take care to ensure the memory containing the data you’re passing back
   isn’t cleared by the Julia garbage collector prior to being read on the Ruby
   side. Different approaches are possible. Perhaps the simplest is to have the
   Ruby side allocate the memory into which the Julia function should write its
   result (and pass the
   Julia function a pointer to that memory). Alternatively, if you want to
   actually pass complex objects out, you’ll have to ensure Julia holds a
   reference to the objects beyond the life of the function, in order to keep
   them from being garbage collected. And then you’ll probably want to expose a
   way for Ruby to instruct Julia to clean up that reference (i.e. free the
   memory) when it’s done with it (Ruby-FFI has good support for triggering a
   callback when an object goes out-of-scope on the Ruby side). Exception
   handling–conveying unhandled exceptions across the FFI boundary is generally
   not possible. This means any unhandled exception occurring in your Julia code
   will result in a segmentation fault. To avoid this, you’ll probably want to
   implement catch-all exception handling in your shared library exposed
   functions that will catch any exceptions that occur and return some context
   about the error to the caller (minimally, a boolean indicator of
   success/failure). Tooling To simplify development, we use a lot of tooling
   and infrastructure developed both in-house and by the Julia community. Since
   one of the draws of using Julia in the first place is the performance of the
   code, we make sure to benchmark our code during every pull request for
   potential performance regressions using the BenchmarkTools.jl package. To
   facilitate versioning and sharing of our Julia packages internally (e.g. to
   share a version of the Ruby-API package with the Ruby gem which wraps it) we
   also maintain a private package registry. The registry is a separate GitHub
   repository, and we use tooling from the Registrator.jl package to register
   new versions. To process registration events, we maintain a registry server
   on an EC2 instance provisioned through Terraform, so updates to the
   configuration are as easy as running a single `terraform apply` command. Once
   a new registration event is received, the registry server opens a pull
   request to the Julia registry. There, we have built in automated testing that
   resolves the version of the package that is being tested, looks up any
   reverse dependencies of that package, resolves the compatibility bounds of
   those packages to see if the newly registered version could lead to a
   breaking change, and if so, runs the full test suites of the reverse
   dependencies. By doing this, we can ensure at registration time that when we
   release a patch or minor version of one of our packages, it won’t break any
   packages that depend on it. If it would, the user is instead forced to either
   fix the changes that led to a downstream breakage,
   or to modify the registration to be a major version increase. Takeaways
   Though our venture into the Julia world is still relatively young compared to
   most of the other code at Betterment, we have found Julia to be a perfect fit
   in solving our two-language problem within the Investing team. Getting the
   infrastructure into a production-ready format took a bit of tweaking, but we
   are now starting to realize a lot of the benefits we hoped for when setting
   out on this journey, including faster development of production ready models,
   and a clear separation of responsibilities between the SMEs on the Investing
   team who are best suited for designing and specifying the models, and the
   engineering team who have the knowledge on how to scale that code into a
   production-grade library. The switch to Julia has allowed us not only to
   optimize and speed up our code by multiple orders of magnitude, but also has
   given us the environment and ecosystem to explore ideas that would simply not
   be possible in our previous implementations.
   11 min read


 * INTRODUCING “DELAYED”: RESILIENT BACKGROUND JOBS ON RAILS
   
   Introducing “Delayed”: Resilient Background Jobs on Rails In the past 24
   hours, a Ruby on Rails application at Betterment performed somewhere on the
   order of 10 million asynchronous tasks. While many of these tasks merely sent
   a transactional email, or fired off an iOS or Android push notification,
   plenty involved the actual movement of money—deposits, withdrawals,
   transfers, rollovers, you name it—while others kept Betterment’s information
   systems up-to-date—syncing customers’ linked account information, logging
   events to downstream data consumers, the list goes on. What all of these
   tasks had in common (aside from being, well, really important to our
   business) is that they were executed via a database-backed job-execution
   framework called Delayed, a newly-open-sourced library that we’re excited to
   announce… right now, as part of this blog post! And, yes, you heard that
   right. We run millions of these so-called “background jobs” daily using a
   SQL-backed queue—not Redis, or RabbitMQ, or Kafka, or, um, you get the
   point—and we’ve very intentionally made this choice, for reasons that will
   soon be explained! But first, let’s back up a little and answer a few basic
   questions. Why Background Jobs? In other words, what purpose do these
   background jobs serve? And how does running millions of them per day help us?
   Well, when building web applications, we (as web application developers)
   strive to build pages that respond quickly and reliably to web requests. One
   might say that this is the primary goal of any webapp—to provide a set of
   HTTP endpoints that reliably handle all the success and failure cases within
   a specified amount of time, and that don’t topple over under high-traffic
   conditions. This is made possible, at least in part, by the ability to
   perform units of work asynchronously. In our case, via background jobs. At
   Betterment, we rely on said jobs extensively, to limit the amount of work
   performed during the “critical path” of each web request, and also to perform
   scheduled tasks at regular intervals. Our reliance on background jobs even
   allows us to guarantee the eventual consistency of our distributed systems,
   but more on that later. First, let’s take a look at the underlying framework
   we use for enqueuing and executing said jobs. Frameworks Galore! And, boy
   howdy, are there plenty of available frameworks for doing this kind of thing!
   Ruby on Rails developers have the choice of resque, sidekiq, que, good_job,
   delayed_job, and now... delayed, Betterment’s own flavor of job queue!
   Thankfully, Rails provides an abstraction layer on top of these, in the form
   of the Active Job framework. This, in theory, means that all jobs can be
   written in more or less the same way, regardless of the job-execution
   backend. Write some jobs, pick a queue backend with a few desirable features
   (priorities, queues, etc.), run some job worker processes, and we’re off to
   the races!
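
   In ActiveJob terms that looks roughly like this (the job and mailer names
   are made up for illustration):

     class WelcomeEmailJob < ApplicationJob
       queue_as :default

       def perform(user_id)
         # Runs later, on a worker process, outside the web request.
         UserMailer.welcome(user_id).deliver_now
       end
     end

     # Somewhere in a controller action:
     WelcomeEmailJob.perform_later(current_user.id)
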
   Sounds simple enough! Unfortunately, if it were so simple we wouldn’t be
   here, several paragraphs into a blog post on the topic. In
   practice, deciding on a job queue is more complicated than that. Quite a bit
   more complicated, because each backend framework provides its own set of
   trade-offs and guarantees, many of which will have far-reaching implications
   in our codebase. So we’ll need to consider carefully! How To Choose A Job
   Framework The delayed rubygem is a fork of both delayed_job and
   delayed_job_active_record, with several targeted changes and additions, including numerous
   performance & scalability optimizations that we’ll cover towards the end of
   this post. But first, in order to explain how Betterment arrived where we
   did, we must explain what it is that we need our job queue to be capable of,
   starting with the jobs themselves. You see, a background job essentially
   represents a tiny contract. Each consists of some action being taken for / by
   / on behalf of / in the interest of one or more of our customers, and that
   must be completed within an appropriate amount of time. Betterment’s
   engineers decided, therefore, that it was critical to our mission that we be
   capable of handling each and every contract as reliably as possible. In other
   words, every job we attempt to enqueue must, eventually, reach some form of
   resolution. Of course, job “resolution” doesn’t necessarily mean success.
   Plenty of jobs may complete in failure, or simply fail to complete, and may
   require some form of automated or manual intervention. But the point is that
   jobs are never simply dropped, or silently deleted, or lost to the
   cyber-aether, at any point, from the moment we enqueue them to their eventual
   resolution. This general property—the ability to enqueue jobs safely and
   ensure their eventual resolution—is the core feature that we have optimized
   for. Let’s call it resilience. Optimizing For Resilience Now, you might be
   thinking, shouldn’t all of these ActiveJob backends be, at the very least,
   safe to use? Isn’t “resilience” a basic feature of every backend, except
   maybe the test/development ones? And, yeah, it’s a fair question. As the
   author of this post, my tactful attempt at an answer is that, well, not all
   queue backends optimize for the specific kind of end-to-end resilience that
   we look for. Namely, the guarantee of at-least-once execution. Granted,
   having “exactly-once” semantics would be preferable, but if we cannot be sure
   that our jobs run at least once, then we must ask ourselves: how would we
   know if something didn’t run at all? What kind of monitoring would be
   necessary to detect such a failure, across all the features of our app, and
   all the types of jobs it might try to run? These questions open up an
   entirely different can of worms, one that we would prefer remained firmly
   sealed. Remember, jobs are contracts. A web request was made, code was
   executed, and by enqueuing a job, we said we'd eventually do something. Not
   doing it would be... bad. Not even knowing we didn't do it... very bad. So,
   at the very least, we need the guarantee of at-least-once execution. Building
   on at-least-once guarantees If we know for sure that we’ll fully execute all
   jobs at least once, then we can write our jobs in such a way that makes the
   at-least-once approach reliable and resilient to failure. Specifically, we’ll
   want to make our jobs idempotent—basically, safely retryable, or
   resumable—and that is on us as application developers to ensure on a
   case-by-case basis.
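
   As a sketch of what that means in practice (the model and job names are
   hypothetical), an idempotent job checks durable state before acting, so a
   retried attempt is a no-op rather than a double execution:

     class SettleTransferJob < ApplicationJob
       def perform(transfer_id)
         transfer = Transfer.find(transfer_id)
         return if transfer.settled? # a retry after success does nothing

         # Row-level lock so two overlapping attempts can't both settle it.
         transfer.with_lock do
           transfer.settle! unless transfer.settled?
         end
       end
     end
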
   Once we solve this very solvable idempotency problem, then we’re on track
   for the same net result as an “exactly-once” approach,
   even if it takes a couple extra attempts to get there. Furthermore, this
   combination of at-least-once execution and idempotency can then be used in a
   distributed systems context, to ensure the eventual consistency of changes
   across multiple apps and databases. Whenever a change occurs in one system,
   we can enqueue idempotent jobs notifying the other systems, and retry them
   until they succeed, or until we are left with stuck jobs that must be
   addressed operationally. We still concern ourselves with other distributed
   systems pitfalls like event ordering, but we don’t have to worry about
   messages or events disappearing without a trace due to infrastructure blips.
   So, suffice it to say, at-least-once semantics are crucial in more ways than
   one, and not all ActiveJob backends provide them. Redis-based queues, for
   example, can only be as durable (the “D” in “ACID”) as the underlying
   datastore, and most Redis deployments intentionally trade off some durability
   for speed and availability. Plus, even when running in the most durable mode,
   Redis-based ActiveJob backends tend to dequeue jobs before they are executed,
   meaning that if a worker process crashes at the wrong moment, or is
   terminated during a code deployment, the job is lost. These frameworks have
   recently begun to move away from this LPOP-based approach, in favor of using
   RPOPLPUSH (to atomically move jobs to a queue that can then be monitored for
   orphaned jobs), but outside of Sidekiq Pro, this strategy doesn’t yet seem to
   be broadly available. And these job execution guarantees aren’t the only area
   where a background queue might fail to be resilient. Another big resilience
   failure happens far earlier, during the enqueue step. Enqueues and
   Transactions See, there’s a major “gotcha” that may not be obvious from the
   list of ActiveJob backends. Specifically, it’s that some queues rely on an
   app’s primary database connection—they are “database-backed,” against the
   app’s own database—whereas others rely on a separate datastore, like Redis.
   And therein lies the rub, because whether or not our job queue is colocated
   with our application data will greatly inform the way that we write any
   job-adjacent code. More precisely, when we make use of database transactions
   (which, when we use ActiveRecord, we assuredly do whether we realize it or
   not), a database-backed queue will ensure that enqueued jobs will either
   commit or roll back with the rest of our ActiveRecord-based changes. This is
   extremely convenient, to say the least, since most jobs are enqueued as part
   of operations that persist other changes to our database, and we can in turn
   rely on the all-or-nothing nature of transactions to ensure that neither the
   job nor the data mutation is persisted without the other. Meanwhile, if our
   queue existed in a separate datastore, our enqueues would be completely
   unaware of the transaction, and we’d run the risk of enqueuing a job that
   acts on data that was never committed, or (even worse) we’d fail to enqueue a
   job even when the rest of the transactional data was committed. This would
   fundamentally undermine our at-least-once execution guarantees!
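
   A small sketch of the difference (hypothetical model and job names, and
   assuming ActiveJob is configured with a database-backed adapter such as
   delayed's): the enqueue is just another row written inside the same
   transaction, so the job and the data change commit or roll back together.

     ApplicationRecord.transaction do
       transfer = Transfer.create!(amount_cents: 10_000)

       # With a database-backed queue this is an INSERT into the jobs table in
       # the same transaction: if create! raises, no orphaned job is left
       # behind, and once the transaction commits, the job cannot be lost.
       SettleTransferJob.perform_later(transfer.id)
     end
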
   We already use ACID-compliant datastores to solve these precise kinds of
   data
   persistence issues, so with the exception of really, really high volume
   operations (where a lot of noise and data loss can—or must—be tolerated),
   there’s really no reason not to enqueue jobs co-transactionally with other
   data changes. And this is precisely why, at Betterment, we start each
   application off with a database-backed queue, co-located with the rest of the
   app’s data, with the guarantee of at-least-once job execution. By the way,
   this is a topic I could talk about endlessly, so I’ll leave it there for now.
   If you’re interested in hearing me say even more about resilient data
   persistence and job execution, feel free to check out Can I break this?, a
   talk I gave at RailsConf 2021! But in addition to the resiliency guarantees
   outlined above, we’ve also given a lot of attention to the operability and
   the scalability of our queue. Let’s cover operability first. Maintaining a
   Queue in the Long Run Operating a queue means being able to respond to errors
   and recover from failures, and also being generally able to tell when things
   are falling behind. (Essentially, it means keeping our on-call engineers
   happy.) We do this in two ways: with dashboards, and with alerts. Our
   dashboards come in a few parts. Firstly, we host a private fork of
   delayed_job_web, a web UI that allows us to see the state of our queues in real
   time and drill down to specific jobs. We’ve extended the gem with information
   on “erroring” jobs (jobs that are in the process of retrying but have not yet
   permanently failed), as well as the ability to filter by additional fields
   such as job name, priority, and the owning team (which we store in an
   additional column). We also maintain two other dashboards in our cloud
   monitoring service, Datadog. These are powered by instrumentation and
   continuous monitoring features that we have added directly to the delayed gem
   itself. When jobs run, they emit ActiveSupport::Notifications events that we
   subscribe to and then forward along to a StatsD emitter, typically as
   “distribution” or “increment” metrics.
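
   As an illustrative sketch (the event name, metric names, and tags are
   examples rather than the gem's exact instrumentation), the forwarding can
   be a plain ActiveSupport::Notifications subscriber that emits to a
   DogStatsD client:

     require "datadog/statsd"

     statsd = Datadog::Statsd.new("localhost", 8125)

     # ActiveJob publishes "perform.active_job" around each job execution.
     ActiveSupport::Notifications.subscribe("perform.active_job") do |_name, started, finished, _id, payload|
       job = payload[:job]
       statsd.distribution("jobs.runtime_ms", (finished - started) * 1000.0,
                           tags: ["job:#{job.class.name}", "queue:#{job.queue_name}"])
       statsd.increment("jobs.performed", tags: ["job:#{job.class.name}"])
     end
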
   Additionally, we’ve included a continuous monitoring process that runs
   aggregate queries, tagged and grouped
   by queue and priority, and that emits similar notifications that become
   “gauge” metrics. Once all of these metrics make it to Datadog, we’re able to
   display a comprehensive timeboard that graphs things like average job
   runtime, throughput, time spent waiting in the queue, error rates, pickup
   query performance, and even some top 10 lists of slowest and most erroring
   jobs. On the alerting side, we have Datadog monitors in place for overall
   queue statistics, like max age SLA violations, so that we can alert and page
   ourselves when queues aren’t working off jobs quickly enough. Our SLAs are
   actually defined on a per-priority basis, and we’ve added a feature to the
   delayed gem called “named priorities” that allows us to define
   priority-specific configs. These represent integer ranges (entirely
   orthogonal to queues), and default to “interactive” (0-9), “user visible”
   (10-19), “eventual” (20-29), and “reporting” (30+), with default alerting
   thresholds focused on retry attempts and runtime. There are plenty of other
   features that we’ve built that haven’t made it into the delayed gem quite
   yet. These include the ability for apps to share a job queue but run separate
   workers (i.e. multi-tenancy), team-level job ownership annotations, resumable
   bulk orchestration and batch enqueuing of millions of jobs at once,
   forward-scheduled job throttling, and also the ability to encrypt the inputs
   to jobs so that they aren’t visible in plaintext in the database. Any of
   these might be the topic for a future post, and might someday make their way
   upstream into a public release! But Does It Scale? As we've grown, we've had
   to push at the limits of what a database-backed queue can accomplish. We’ve
   baked several improvements into the delayed gem, including a highly
   optimized, SKIP LOCKED-based pickup query, multithreaded workers, and a novel
   “max percent of max age” metric that we use to automatically scale our worker
   pool up to ~3x its baseline size when queues need additional concurrency.
   Eventually, we could explore ways of feeding jobs through to higher
   performance queues downstream, far away from the database-backed workers. We
   already do something like this for some jobs with our journaled gem, which
   uses AWS Kinesis to funnel event payloads out to our data warehouse (while at
   the same time benefiting from the same at-least-once delivery guarantees as
   our other jobs!). Perhaps we’d want to generalize the approach even further.
   But the reality of even a fully "scaled up" queue solution is that, if it is
   doing anything particularly interesting, it is likely to be database-bound. A
   Redis-based queue will still introduce DB pressure if its jobs execute
   anything involving ActiveRecord models, and solutions must exist to throttle
   or rate limit these jobs. So even if your queue lives in an entirely separate
   datastore, it can be effectively coupled to your DB's IOPS and CPU
   limitations. So does the delayed approach scale? To answer that question,
   I’ll leave you with one last takeaway. A nice property that we’ve observed at
   Betterment, and that might apply to you as well, is that the number of jobs
   tends to scale proportionally with the number of customers and accounts. This
   means that when we naturally hit vertical scaling limits, we could, for
   example, shard or partition our job table alongside our users table. Then,
   instead of operating one giant queue, we’ll have broken things down to a
   number of smaller queues, each with their own worker pools, emitting metrics
   that can be aggregated with almost the same observability story we have
   today. But we’re getting into pretty uncharted territory here, and, as
   always, your mileage may vary! Try it out! If you’ve read this far, we’d
   encourage you to take the leap and test out the delayed gem for yourself!
   Again, it combines both DelayedJob and its ActiveRecord backend, and should
   be more or less compatible with Rails apps that already use ActiveJob or
   DelayedJob. Of course, it may require a bit of tuning on your part, and we’d
   love to hear how it goes! We’ve also built an equivalent library in Java,
   which may also see a public release at some point. (To any Java devs reading
   this: let us know if that interests you!) Already tried it out? Any features
   you’d like to see added? Let us know what you think!
   14 min read


 * FOCUSING ON WHAT MATTERS: USING SLOS TO PURSUE USER HAPPINESS
   
   Focusing on What Matters: Using SLOs to Pursue User Happiness Proper
   reliability is the greatest operational requirement for any service. If the
   service doesn’t work as intended, no user (or engineer) will be happy. This
   is where SLOs come in. The umbrella term “observability” covers all manner of
   subjects, from basic telemetry to logging, to making claims about longer-term
   performance in the shape of service level objectives (SLOs) and occasionally
   service level agreements (SLAs). Here I’d like to discuss some philosophical
   approaches to defining SLOs, explain how they help with prioritization, and
   outline the tooling currently available to Betterment Engineers to make this
   process a little easier. What is an SLO? At a high level, a service level
   objective is a way of measuring the performance of, correctness of, validity
   of, or efficacy of some component of a service over time by comparing the
   functionality of specific service level indicators (metrics of some kind)
   against a target goal. For example, 99.9% of requests complete with a 2xx,
   3xx or 4xx HTTP code within 2000ms over a 30 day period The service level
   indicator (SLI) in this example is a request completing with a status code of
   2xx, 3xx or 4xx and with a response time of at most 2000ms. The SLO is the
   target percentage, 99.9%. We reach our SLO goal if, during a 30 day period,
   99.9% of all requests completed with one of those status codes and within
   that range of latency. If our service didn’t succeed at that goal, the
   violation overflow — called an “error budget” — shows us by how much we fell
   short. With a goal of 99.9%, we have 40 minutes and 19 seconds of downtime
   available to us every 28 days. Check out more error budget math here. If we
   fail to meet our goals, it’s worthwhile to step back and understand why. Was
   the error budget consumed by real failures? Did we notice a number of false
   positives? Maybe we need to reevaluate the metrics we’re collecting, or
   perhaps we’re okay with setting a lower target goal because there are other
   targets that will be more important to our customers. It’s all about the
   customer This is where the philosophy of defining and keeping track of SLOs
   comes into play. It starts with our users - Betterment users - and trying to
   provide them with a certain quality of service. Any error budget we set
   should account for our fiduciary responsibilities, and should guarantee that
   we do not cause an irresponsible impact to our customers. We also assume that
   there is a baseline degree of software quality baked-in, so error budgets
   should help us prioritize positive impact opportunities that go beyond these
   baselines. Sometimes there are a few layers of indirection between a service
   and a Betterment customer, and it takes a bit of creativity to understand
   what aspects of the service directly affects them. For example, an engineer
   on a backend or data-engineering team provides services that a user-facing
   component consumes indirectly. Or perhaps the users for a service are
   Betterment engineers, and it’s really unclear how that work affects the
   people who use our company’s products. It isn’t that much of a stretch to
   claim that an engineer’s level of happiness does have some effect on the
   level of service they’re capable of providing a Betterment customer! Let’s
   say we’ve defined some SLOs and notice they are falling behind over time. We
   might take a look at the metrics we’re using (the SLIs), the failures that
   chipped away at our target goal, and, if necessary, re-evaluate the relevancy
   of what we’re measuring. Do error rates for this particular endpoint directly
   reflect an experience of a user in some way - be it a customer, a
   customer-facing API, or a Betterment engineer? Have we violated our error
   budget every month for the past three months? Has there been an increase in
   Customer Service requests to resolve problems related to this specific aspect
   of our service? Perhaps it is time to dedicate a sprint or two to
   understanding what’s causing degradation of service. Or perhaps we notice
   that what we’re measuring is becoming increasingly irrelevant to a customer
   experience, and we can get rid of the SLO entirely! Benefits of measuring the
   right things, and staying on target The goal of an SLO based approach to
   engineering is to provide data points with which to have a reasonable
   conversation about priorities (a point that Alex Hidalgo drives home in his
   book Implementing Service Level Objectives). In the case of services not
   performing well over time, the conversation might be “focus on improving
   reliability for service XYZ.” But what happens if our users are super happy,
   our SLOs are exceptionally well-defined and well-achieved, and we’re ahead of
   our roadmap? Do we try to get that extra 9 in our target - or do we use the
   time to take some creative risks with the product (feature-flagged, of
   course)? Sometimes it’s not in our best interest to be too focused on
   performance, and we can instead “use up our error budget” by rolling out some
   new A/B test, or upgrading a library we’ve been putting off for a while, or
   testing out a new language in a user-facing component that we might not
   otherwise have had the chance to explore. The tools to get us there Let’s
   dive into some tooling that the SRE team at Betterment has built to help
   Betterment engineers easily start to measure things. Collecting the SLIs and
   Creating the SLOs The SRE team has a web-app and CLI called Coach that we use
   to manage continuous integration (CI) and continuous delivery (CD), among
   other things. We’ve talked about Coach in the past here and here. At a high
   level, the Coach CLI generates a lot of yaml files that are used in all sorts
   of places to help manage operational complexity and cloud resources for
   consumer-facing web-apps. In the case of service level indicators (basically
   metrics collection), the Coach CLI provides commands that generate yaml files
   to be stored in GitHub alongside application code. At deploy time, the Coach
web-app consumes these files and idempotently creates Datadog monitors, which
   can be used as SLIs (service level indicators) to inform SLOs, or as
   standalone alerts that need immediate triage every time they're triggered. In
   addition to Coach explicitly providing a config-driven interface for
monitors, we’ve also written a couple of handy runtime-specific methods that
   result in automatic instrumentation for Rails or Java endpoints. I’ll discuss
   these more below. We also manage a separate repository for SLO definitions.
   We left this outside of application code so that teams can modify SLO target
   goals and details without having to redeploy the application itself. It also
made visibility easier in terms of sharing and communicating different teams’
   SLO definitions across the org. Monitors in code Engineers can choose either
   StatsD or Micrometer to measure complicated experiences with custom metrics,
and there are various approaches to turning those metrics directly into
   monitors within Datadog. We use Coach CLI driven yaml files to support metric
   or APM monitor types directly in the code base. Those are stored in a file
named .coach/datadog_monitors.yml and look like this:

monitors:
  - type: metric
    metric: "coach.ci_notification_sent.completed.95percentile"
    name: "coach.ci_notification_sent.completed.95percentile SLO"
    aggregate: max
    owner: sre
    alert_time_aggr: on_average
    alert_period: last_5m
    alert_comparison: above
    alert_threshold: 5500
  - type: apm
    name: "Pull Requests API endpoint violating SLO"
    resource_name: api::v1::pullrequestscontroller_show
    max_response_time: 900ms
    service_name: coach
    page: false
    slack: false

It wasn’t simple to make
   this abstraction intuitive between a Datadog monitor configuration and a user
   interface. But this kind of explicit, attribute-heavy approach helped us get
   this tooling off the ground while we developed (and continue to develop)
   in-code annotation approaches. The APM monitor type was simple enough to turn
   into both a Java annotation and a tiny domain specific language (DSL) for
Rails controllers, giving us nice symmetry across our platforms. This owner
method for Rails apps results in all logs, error reports, and metrics being
tagged with the team’s name, and at deploy time it’s aggregated by a Coach
CLI command and turned into latency monitors with reasonable defaults for
optional parameters, essentially doing the same thing as our config-driven
approach but from within the code itself:

class DeploysController < ApplicationController
  owner "sre", max_response_time: "10000ms", only: [:index], slack: false
end

For Java apps we have a similar interface (with
reasonable defaults as well) in a tidy little annotation:

@Sla
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface CustodySla {
  @AliasFor(annotation = Sla.class)
  long amount() default 25_000;

  @AliasFor(annotation = Sla.class)
  ChronoUnit unit() default ChronoUnit.MILLIS;

  @AliasFor(annotation = Sla.class)
  String service() default "custody-web";

  @AliasFor(annotation = Sla.class)
  String slackChannelName() default "java-team-alerts";

  @AliasFor(annotation = Sla.class)
  boolean shouldPage() default false;

  @AliasFor(annotation = Sla.class)
  String owner() default "java-team";
}

Then usage is just as simple as adding the annotation to the controller:

@WebController("/api/stuff/v1/service_we_care_about")
public class ServiceWeCareAboutController {
  @PostMapping("/search")
  @CustodySla(amount = 500)
  public SearchResponse search(@RequestBody @Valid SearchRequest request) {...}
}

At deploy time, these annotations are scanned
   and converted into monitors along with the config-driven definitions, just
   like our Ruby implementation. SLOs in code Now that we have our metrics
   flowing, our engineers can define SLOs. If an engineer has a monitor tied to
   metrics or APM, then they just need to plug in the monitor ID directly into
our SLO yaml interface:

- last_updated_date: "2021-02-18"
  approval_date: "2021-03-02"
  next_revisit_date: "2021-03-15"
  category: latency
  type: monitor
  description: This SLO covers latency for our CI notifications system -
    whether it's the github context updates on your PRs or the slack
    notifications you receive.
  tags:
    - team:sre
  thresholds:
    - target: 99.5
      timeframe: 30d
      warning_target: 99.99
  monitor_ids:
    - 30842606

The interface
   supports metrics directly as well (mirroring Datadog’s SLO types) so an
   engineer can reference any metric directly in their SLO definition, as seen
here:

# availability
- last_updated_date: "2021-02-16"
  approval_date: "2021-03-02"
  next_revisit_date: "2021-03-15"
  category: availability
  tags:
    - team:sre
  thresholds:
    - target: 99.9
      timeframe: 30d
      warning_target: 99.99
  type: metric
  description: 99.9% of manual deploys will complete successfully over a 30-day period.
  query:
    # (total_events - bad_events) over total_events == good_events/total_events
    numerator: sum:trace.rack.request.hits{service:coach,env:production,resource_name:deployscontroller_create}.as_count()-sum:trace.rack.request.errors{service:coach,env:production,resource_name:deployscontroller_create}.as_count()
    denominator: sum:trace.rack.request.hits{service:coach,resource_name:deployscontroller_create}.as_count()
   We love having these SLOs defined in GitHub because we can track who's
   changing them, how they're changing, and get review from peers. It's not
   quite the interactive experience of the Datadog UI, but it's fairly
   straightforward to fiddle in the UI and then extract the resulting
   configuration and add it to our config file. Notifications When we merge our
   SLO templates into this repository, Coach will manage creating SLO resources
   in Datadog and accompanying SLO alerts (that ping slack channels of our
   choice) if and when our SLOs violate their target goals. This is the slightly
   nicer part of SLOs versus simple monitors - we aren’t going to be pinged for
   every latency failure or error rate spike. We’ll only be notified if, over 7
days or 30 days or even longer, we violate the target goal we’ve defined for
   our service. We can also set a “warning threshold” if we want to be notified
   earlier when we’re using up our error budget. Fewer alerts means the alerts
   should be something to take note of, and possibly take action on. This is a
   great way to get a good signal while reducing unnecessary noise. If, for
example, our user research says we should aim for 99.5% uptime, that’s 3h 21m 36s of downtime available per 28 days - a lot of time during which we don’t have to react to every individual failure. If we alert only once that limit is exceeded, instead of on each of those 3 hours of errors, then we can direct our attention toward new product features, platform improvements, or learning and development.
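As a quick sanity check of that arithmetic, here is a throwaway sketch (not part of our tooling) that recomputes the budget:

# Back-of-the-envelope error budget for a 99.5% uptime target over 28 days.
window_seconds = 28 * 24 * 60 * 60          # 2,419,200 seconds in the window
budget_seconds = window_seconds * 5 / 1000  # the 0.5% we are allowed to miss
puts budget_seconds                         # => 12096, i.e. 3h 21m 36s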
The last part of defining our SLOs is including a date when we
   plan to revisit that SLO specification. Coach will send us a message when
   that date rolls around to encourage us to take a deeper look at our
   measurements and possibly reevaluate our goals around measuring this part of
   our service. What if SLOs don’t make sense yet? It’s definitely the case that
   a team might not be at the level of operational maturity where defining
   product or user-specific service level objectives is in the cards. Maybe
   their on-call is really busy, maybe there are a lot of manual interventions
   needed to keep their services running, maybe they’re still putting out fires
   and building out their team’s systems. Whatever the case may be, this
   shouldn’t deter them from collecting data. They can define what is called an
   “aspirational” SLO - basically an SLO for an important component in their
   system - to start collecting data over time. They don’t need to define an
   error budget policy, and they don’t need to take action when they fail their
   aspirational SLO. Just keep an eye on it. Another option is to start tracking
   the level of operational complexity for their systems. Perhaps they can set
   goals around "Bug Tracker Inbox Zero" or "Failed Background Jobs Zero" within
   a certain time frame, a week or a month for example. Or they can define some
   SLOs around types of on-call tasks that their team tackles each week. These
   aren’t necessarily true-to-form SLOs but engineers can use this framework and
   tooling provided to collect data around how their systems are operating and
   have conversations on prioritization based on what they discover, beginning
to build a culture of observability and accountability. Conclusion Betterment
   is at a point in its growth where prioritization has become more difficult
   and more important. Our systems are generally stable, and feature development
is paramount to business success. But so are reliability and performance.
Proper reliability is the greatest operational requirement for any service [2].
   If the service doesn’t work as intended, no user (or engineer) will be happy.
   This is where SLOs come in. SLOs should align with business objectives and
   needs, which will help Product and Engineering Managers understand the direct
   business impact of engineering efforts. SLOs will ensure that we have a solid
   understanding of the state of our services in terms of reliability, and they
   empower us to focus on user happiness. If our SLOs don’t align directly with
   business objectives and needs, they should align indirectly via tracking
   operational complexity and maturity. So, how do we choose where to spend our
time? SLOs and their error budgets permit our product engineering teams to have the right conversations and make the right decisions about prioritization and resourcing, so that we can balance effort spent on reliability against new product features, helping to ensure the long-term happiness and confidence of our users (and engineers).

[2] Alex Hidalgo, Implementing Service Level Objectives
   13 min read


 * FINDING AND PREVENTING RAILS AUTHORIZATION BUGS
   
   Finding and Preventing Rails Authorization Bugs This article walks through
   finding and fixing common Rails authorization bugs. At Betterment, we build
   public facing applications without an authorization framework by following
   three principles, discussed in another blog post. Those three principles are:
   Authorization through Impossibility Authorization through Navigability
   Authorization through Application Boundaries This post will explore the first
   two principles and provide examples of common patterns that can lead to
   vulnerabilities as well as guidance for how to fix them. We will also cover
   the custom tools we’ve built to help avoid these patterns before they can
   lead to vulnerabilities. If you’d like, you can skip ahead to the tools
   before continuing on to the rest of this post. Authorization through
   Impossibility This principle might feel intuitive, but it’s worth reiterating
   that at Betterment we never build endpoints that allow users to access
   another user’s data. There is no /api/socialsecuritynumbers endpoint because
   it is a prime target for third-party abuse and developer error. Similarly,
   even our authorized endpoints never allow one user to peer into another
   user’s object graph. This principle keeps us from ever having the opportunity
   to make some of the mistakes addressed in our next section. We acknowledge
   that many applications out there can’t make the same design decisions about
   users’ data, but as a general principle we recommend reducing the ways in
   which that data can be accessed. If an application absolutely needs to be
   able to show certain data, consider structuring the endpoint in a way such
   that a client can’t even attempt to request another user’s data.
   Authorization through Navigability Rule #1: Authorization should happen in
   the controller and should emerge naturally from table relationships
   originating from the authenticated user, i.e. the “trust root chain”. This
   rule is applicable for all controller actions and is a critical component of
   our security story. If you remember nothing else, remember this. What is a
   “trust root chain”? It’s a term we’ve co-opted from ssl certificate lingo,
   and it’s meant to imply a chain of ownership from the authenticated user to a
   target resource. We can enforce access rules by using the affordances of our
   relational data without the need for any additional “permission” framework.
   Note that association does not imply authorization, and the onus is on the
   developer to ensure that associations are used properly. Consider the
following controller:
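A minimal sketch of the controller in question - the class and parameter names are assumed for illustration, mirroring the RuboCop output later in this post:

class DocumentsController < ApplicationController
  def show
    # Nothing ties the requested document to the authenticated user.
    @document = Document.find(params[:document_id])
  end
end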
So long as a user is authenticated, they can perform the show action on any document (including documents belonging to others!) provided they know or can guess its ID - not great! This becomes even more dangerous if the Documents table uses sequential IDs, as that would make it easy for an attacker to start combing through the entire table. This is why Betterment has a rule requiring UUIDs for all new tables. This type of bug is typically referred to as an Insecure Direct Object Reference vulnerability. In short, these bugs allow attackers to access data directly using its unique identifiers – even if that data belongs to someone else – because the application fails to take authorization into account. We can use our database relationships to ensure that users can only see their own documents. Assuming a User has many Documents, we would change our controller to the following:
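Sticking with the same assumed names, the scoped version would look something like this:

class DocumentsController < ApplicationController
  def show
    # Scoping through the authenticated user's associations means any
    # document outside their object graph raises RecordNotFound (a 404).
    @document = current_user.documents.find(params[:document_id])
  end
end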
Now any document_id that doesn’t exist in the user’s object graph will raise a 404, and we’ve provided authorization for this endpoint without a framework - easy peasy. Rule #2: Controllers should
   pass ActiveRecord models, rather than ids, into the model layer. As a
   corollary to Rule #1, we should ensure that all authorization happens in the
   controller by disallowing model initialization with *_id attributes. This
   rule speaks to the broader goal of authorization being obvious in our code.
   We want to minimize the hops and jumps required to figure out what we’re
   granting access to, so we make sure that it all happens in the controller.
   Consider a controller that links attachments to a given document. Let’s
   assume that a User has many Attachments that can be attached to a Document
   they own. Take a minute and review this controller - what jumps out to you?
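Here is a minimal sketch of that controller - the helper methods and strong params are assumptions for illustration, but the create line matches the RuboCop output later in this post:

class Documents::AttachmentsController < ApplicationController
  def create
    AttachmentLink.new(create_params.merge(document: document)).save!
    head :created
  end

  private

  def document
    # Rule #1 is respected here: the document comes from the user's graph.
    current_user.documents.find(params[:document_id])
  end

  def create_params
    # But attachment_id is accepted straight from the client...
    params.permit(:attachment_id)
  end
end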
   At first glance, it looks like the developer has taken the right steps to
adhere to Rule #1 via the document method and we’re using strong params - is
   that enough? Unfortunately, it’s not. There’s actually a critical security
   bug here that allows the client to specify any attachment_id, even if they
don’t own that attachment - eek! Here’s a simple way to resolve our bug:
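One way to sketch the fix, keeping the assumed names from above:

class Documents::AttachmentsController < ApplicationController
  def create
    AttachmentLink.new(document: document, attachment: attachment).save!
    head :created
  end

  private

  def document
    current_user.documents.find(params[:document_id])
  end

  def attachment
    # Look the attachment up through the user's own associations rather
    # than trusting a raw attachment_id parameter.
    current_user.attachments.find(params[:attachment_id])
  end
end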
Now before we create a new AttachmentLink, we verify that the attachment_id
   specified actually belongs to the user and our code will raise a 404
   otherwise - perfect! By keeping the authorization up front in the controller
   and out of the model, we’ve made it easier to reason about. If we buried the
   authorization within the model, it would be difficult to ensure that the
   trust-root chain is being enforced – especially if the model is used by
   multiple controllers that handle authorization inconsistently. Reading the
   AttachmentLink model code, it would be clear that it takes an attachment_id
   but whether authorization has been handled or not would remain a bit of a
   mystery. Automatically Detecting Vulnerabilities At Betterment, we strive to
   make it easy for engineers to do the right thing – especially when it comes
   to security practices. Given the formulaic patterns of these bugs, we decided
   static analysis would be a worthwhile endeavor. Static analysis can help not
   only with finding existing instances of these vulnerabilities, but also
   prevent new ones from being introduced. By automating detection of these “low
   hanging fruit” vulnerabilities, we can free up engineering effort during
   security reviews and focus on more interesting and complex issues. We decided
   to lean on RuboCop for this work. As a Rails shop, we already make heavy use
   of RuboCop. We like it because it’s easy to introduce to a codebase,
   violations break builds in clear and actionable ways, and disabling specific
   checks requires engineers to comment their code in a way that makes it easy
   to surface during code review. Keeping rules #1 and #2 in mind, we’ve created
   two cops: Betterment/UnscopedFind and Betterment/AuthorizationInController;
   these will flag any models being retrieved and created in potentially unsafe
   ways, respectively. At a high level, these cops track user input (via
   params.permit et al.) and raise offenses if any of these values get passed
   into methods that could lead to a vulnerability (e.g. model initialization,
   find calls, etc). You can find these cops here. We’ve been using these cops
   for over a year now and have had a lot of success with them. In addition to
   these two, the Betterlint repository contains other custom cops we’ve written
   to enforce certain patterns -- both security related as well as more general
   ones. We use these cops in conjunction with the default RuboCop
   configurations for all of our Ruby projects. Let’s run the first cop,
Betterment/UnscopedFind against DocumentsController from above:

$ rubocop app/controllers/documents_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents_controller.rb:3:17: C: Betterment/UnscopedFind: Records are being retrieved directly using user input. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining").
  INSTEAD OF THIS:
    Post.find(params[:post_id])
  DO THIS:
    current_user.posts.find(params[:post_id])
  See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentunscopedfind
  @document = Document.find(params[:document_id])
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected
   The cop successfully located the vulnerability. If we attempted to deploy
   this code, RuboCop would fail the build, preventing the code from going out
   while letting reviewers know exactly why. Now let’s try running
   Betterment/AuthorizationInController on the AttachmentLink example from
earlier:

$ rubocop app/controllers/documents/attachments_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents/attachments_controller.rb:3:24: C: Betterment/AuthorizationInController: Model created/updated using unsafe parameters. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining"), and then pass the resulting object into your model instead of the unsafe parameter.
  INSTEAD OF THIS:
    post_parameters = params.permit(:album_id, :caption)
    Post.new(post_parameters)
  DO THIS:
    album = current_user.albums.find(params[:album_id])
    post_parameters = params.permit(:caption).merge(album: album)
    Post.new(post_parameters)
  See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentauthorizationincontroller
  AttachmentLink.new(create_params.merge(document: document)).save!
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected

The model initialization was flagged because it was seen using
   create_params, which contains user input. Like with the other cop, this would
   fail the build and prevent the code from making it to production. You may
   have noticed that unlike the previous example, the vulnerable code doesn’t
   directly reference a params.permit call or any of the parameter names, but
   the code was still flagged. This is because both of the cops keep a little
   bit of state to ensure they have the appropriate context necessary when
   analyzing potentially unsafe function calls. We also made sure that when
   developing these cops that we tested them with real code samples and not just
   contrived scenarios that no developer would actually ever attempt. False
   Positives With any type of static analysis, there’s bound to be false
   positives. When working on these cops, we narrowed down false positives to
   two scenarios: The flagged code could be considered insecure only in other
   contexts: e.g. the application or models in question don’t have a concept of
   “private” data The flagged code isn’t actually insecure: e.g. the
   initialization happens to take a parameter whose name ends in _id but it
   doesn’t refer to a unique identifier for any objects In both these cases, the
   developer should feel empowered to either rewrite the line in question or
   locally disable the cop, both of which will prevent the code from being
   flagged. Normally we’d consider opting out of security analysis to be an
   unsafe thing to do, but we actually like the way RuboCop handles this because
   it can help reduce some code review effort; the first solution eliminates the
   vulnerable-looking pattern (even if it wasn’t a vulnerability to begin with)
   while the second one signals to reviewers that they should confirm this code
   is actually safe (making it easy to pinpoint areas of focus). Testing & Code
   Review Strategies Rubocop and Rails tooling can only get us so far in
   mitigating authorization bugs. The remainder falls on the shoulders of the
   developer and their peers to be cognizant of the choices they are making when
   shipping new application controllers. In light of that, we’ll cover some
   helpful strategies for keeping authorization front of mind. Testing When
   writing request specs for a controller action, write a negative test case to
   prove that attempts to circumvent your authorization measures return a 404.
   For example, consider a request spec for our
Documents::AttachmentsController:
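A sketch of such a negative test case - the factories, route helper, and sign_in helper are assumptions for illustration:

RSpec.describe "Documents::Attachments", type: :request do
  it "returns a 404 when the document belongs to another user" do
    user = create(:user)
    other_users_document = create(:document)       # owned by someone else
    attachment = create(:attachment, user: user)
    sign_in user

    post document_attachments_path(other_users_document),
         params: { attachment_id: attachment.id }

    expect(response).to have_http_status(:not_found)
  end
end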
These test cases are an inexpensive way to prove to yourself and your reviewers that you’ve considered the authorization
   context of your controller action and accounted for it properly. Like all of
   our tests, this functions both as regression prevention and as documentation
   of your intent. Code Review Our last line of defense is code review. Security
   is the responsibility of every engineer, and it’s critical that our reviewers
   keep authorization and security in mind when reviewing code. A few simple
   questions can facilitate effective security review of a PR that touches a
   controller action: Who is the authenticated user? What resource is the
   authenticated user operating on? Is the authenticated user authorized to
   operate on the resource in accordance with Rule #1? What parameters is the
   authenticated user submitting? Where are we authorizing the user’s access to
   those parameters? Do all associations navigated in the controller properly
   signify authorization? Getting in the habit of asking these questions during
   code review should lead to more frequent conversations about security and
   data access. Our hope is that linking out to this post and its associated
   Rules will reinforce a strong security posture in our application
   development. In Summary Unlike authentication, authorization is context
   specific and difficult to “abstract away” from the leaf nodes of application
   code. This means that application developers need to consider authorization
   with every controller we write or change. We’ve explored two new rules to
   encourage best practices when it comes to authorization in our application
   controllers: Authorization should happen in the controller and should emerge
   naturally from table relationships originating from the authenticated user,
   i.e. the “trust root chain”. Controllers should pass ActiveRecord models,
   rather than ids, into the model layer. We’ve also covered how our custom cops
   can help developers avoid antipatterns, resulting in safer and easier to read
   code. Keep these in mind when writing or reviewing application code that an
   authenticated user will utilize and remember that authorization should be
   clear and obvious.
   11 min read


 * USING TARGETED UNIVERSALISM TO BUILD INCLUSIVE FEATURES
   
   Using Targeted Universalism To Build Inclusive Features The best products are
   inclusive at every stage of the design and engineering process. Here's how we
   turned a request for more inclusion into a feature all Betterment customers
   can benefit from. Earlier this year, a coworker asked me how difficult it
   would be to add a preferred name option into our product. They showed me how
   we were getting quite a few requests from trans customers to quit deadnaming
   them. The simplest questions tend to be the hardest to answer. For me, simple
   questions bring to mind this interesting concept called The Illusion Of
   Explanatory Depth, which is when “people feel they understand complex
   phenomena with far greater precision, coherence, and depth than they really
   do.” Simple questions tend to shed light on subjects shrouded in this
   illusion and force you to confront your lack of knowledge. Asking for
   someone’s name is simple, but full of assumptions. Deadnaming is when,
   intentionally or not, you refer to a trans person by the name they used
   before transitioning. For many trans folks like myself, this is the name
   assigned at birth which means all legal and government issued IDs and
   documents use this non-affirming name. According to Healthline, because legal
   name changes are “expensive, inaccessible, and not completely effective at
   eliminating deadnaming”, institutions like Betterment can and should make
   changes to support our trans customers. This simple question from our trans
   customers “Can you quit deadnaming me?” was a sign that our original
   understanding of our customers' names was not quite right, and we were
   lacking knowledge around how names are commonly used. Now, our work involved
   dispelling our previous understanding of what a name is. How to turn simple
   questions into solutions. At Betterment, we’re required by the government to
   have a record of a customer’s legal first name, but that shouldn’t prevent us
   from letting customers share their preferred or chosen first name, and then
   using that name in the appropriate places. This was a wonderful opportunity
   to practice targeted universalism: a concept that explains how building
features specifically for a marginalized audience not only benefits the people
   in that marginalized group, but also people outside of it, which increases
   its broad impact. From a design standpoint, executing a preferred name
   feature was pretty straightforward—we needed to provide a user with a way to
   share their preferred name with us, and then start using it. The lead
   designer for this project, Crys, did a lovely job of incorporating
   compassionate design into how we show the user which legal name we have on
   file for them, without confronting that user with their deadname every time
   they go to change their settings. They accomplished that by hiding the user’s
   legal name in a dropdown accordion that is toggled closed by default. Crys
   also built out a delightful flow that shows the user why we require their
   legal name, that answers a few common questions, and allows them to edit
   their preferred first name in the future if needed. With a solid plan for
   gathering user input, we pivoted to the bigger question: Where should we use
   a customer’s preferred first name? From an engineering standpoint, this
   question revealed a few hurdles that we needed to clear up. First, I needed
   to provide a translation of my own understanding of legal first names and
   preferred first names to our codebase. The first step in this translation was
to deprecate our not-very-descriptively named #first_name method and push engineers to start using two new, descriptive methods called #legal_first_name and #common_first_name (#common_first_name is essentially a defaulting method that falls back to #legal_first_name if #preferred_first_name is not present for
that user).
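A minimal sketch of what that defaulting method might look like (assumed, not the actual implementation):

# On the User model: prefer the chosen name, fall back to the legal one.
def common_first_name
  preferred_first_name.presence || legal_first_name
end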
To do this, I used a tool built by our own Betterment engineer, Nathan, called Uncruft, which not only gave engineers a warning whenever they
   tried to use the old #first_name method but also created a list of all the
   places in our code where we were currently using that old method. This was
   essentially a map for us engineers to be able to reference and go update
   those old usages in our codebase whenever we wanted. This new map leads us to
   our second task: addressing those deprecated usages. At first glance the
   places where we used #firstname in-app seemed minimal—emails, in-app
   greetings, tax documents. But once we looked under the surface, #firstname
   was sprinkled nearly everywhere in our codebase. I identified the most
   visible spots where we address a user and changed them, but for less visible
   changes I took this new map and delegated cross-squad ownership of each
   usage. Then, a group of engineers from each squad began tackling each
   deprecation one by one. In order to help these engineers, we provided
   guidelines around where it was necessary to use a legal first name, but in
   general we pushed to use a customer’s preferred first name wherever possible.
   From a high level view I essentially split this large engineering lift into
   two different streams of work. There was the feature work stream which
   involved: Storing the user’s new name information. Building out the user
   interface. Updating the most visible spots in our application. Modifying our
   integration with SimonData in order to bulk update our outgoing emails, and
   Changing how we share a user’s name with our customer service (CX) team
   through a Zendesk integration, as well as in our internal CX application.
   Then there was the foundational work stream, which involved mapping out and
addressing every single deprecation. Thanks to Uncruft, once I generated
   that initial map of deprecations the large foundational work stream could
   then be further split into smaller brooks of work that could be tackled by
   different squads at different times. Enabling preferred first names moves us
   towards a more inclusive product. Once this feature went live, it was
   extremely rewarding to see our targeted universalism approach reveal its
   benefits. Our trans customers got the solution they needed, which makes this
   work crucial for that fact alone—but because of that, our cis customers also
   received a feature that delighted them. Ultimately, we now know that if
   people are given a tool to personalize their experience within our product,
   folks of many different backgrounds will use it.
   6 min read


 * GUIDELINES FOR TESTING RAILS APPLICATIONS
   
   Guidelines for Testing Rails Applications Discusses the different
   responsibilities of model, request, and system specs, and other high level
   guidelines for writing specs using RSpec & Capybara. Testing our Rails
   applications allows us to build features more quickly and confidently by
   proving that code does what we think it should, catching regression bugs, and
   serving as documentation for our code. We write our tests, called “specs”
   (short for specification) with RSpec and Capybara. Though there are many
   types of specs, in our workflow we focus on only three: model specs, request
   specs, and system specs. This blog post discusses the different
   responsibilities of these types of specs, and other related high level
   guidelines for specs. Model Specs Model specs test business logic. This
   includes validations, instance and class method inputs and outputs, Active
   Record callbacks, and other model behaviors. They are very specific, testing
   a small portion of the system (the model under test), and cover a wide range
   of corner cases in that area. They should generally give you confidence that
   a particular model will do exactly what you intended it to do across a range
   of possible circumstances. Make sure that the bulk of the logic you’re
   testing in a model spec is in the method you’re exercising (unless the
   underlying methods are private). This leads to less test setup and fewer
   tests per model to establish confidence that the code is behaving as
   expected. Model specs have a live database connection, but we like to think
   of our model specs as unit tests. We lean towards testing with a bit of
   mocking and minimal touches to the database. We need to be economical about
   what we insert into the database (and how often) to avoid slowing down the
   test suite too much over time. Don’t persist a model unless you have to. For
   a basic example, you generally won’t need to save a record to the database to
   test a validation. Also, model factories shouldn’t by default save associated
   models that aren’t required for that model’s persistence. At the same time,
   requiring a lot of mocks is generally a sign that the method under test
   either is doing too many different things, or the model is too highly coupled
   to other models in the codebase. Heavy mocking can make tests harder to read,
   harder to maintain, and provide less assurance that code is working as
   expected. We try to avoid testing declarations directly in model specs -
   we’ll talk more about that in a future blog post on testing model behavior,
   not testing declarations. Below is a model spec skeleton with some common
test cases:
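A sketch of the shape such a skeleton can take - the Deposit model, factory, and methods are assumptions for illustration:

RSpec.describe Deposit do
  describe "validations" do
    it "requires an amount greater than zero" do
      deposit = build(:deposit, amount: 0)

      expect(deposit).not_to be_valid
      expect(deposit.errors[:amount]).to be_present
    end
  end

  describe "#completed?" do
    it "returns true once the deposit has settled" do
      deposit = build(:deposit, settled_at: Time.current)

      expect(deposit).to be_completed
    end
  end
end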
System Specs System specs are like integration tests. They test the beginning-to-end workflow of a particular feature, verifying that the
   different components of an application interact with each other as intended.
   There is no need to test corner cases or very specific business logic in
   system specs (those assertions belong in model specs). We find that there is
   a lot of value in structuring a system spec as an intuitively sensible user
   story - with realistic user motivations and behavior, sometimes including the
   user making mistakes, correcting them, and ultimately being successful. There
   is a focus on asserting that the end user sees what we expect them to see.
   System specs are more performance intensive than the other spec types, so in
   most cases we lean towards fewer system specs that do more things, going
   against the convention that tests should be very granular with one assertion
   per test. One system spec that asserts the happy path will be sufficient for
   most features. Besides the performance benefits, reading a single system spec
   from beginning to end ends up being good high-level documentation of how the
   software is used. In the end, we want to verify the plumbing of user input
and business logic output through as few large specs per feature as we can
   get away with. If there is significant conditional behavior in the view layer
   and you are looking to make your system spec leaner, you may want to extract
   that conditional behavior to a presenter resource model and test that
   separately in a model spec so that you don’t need to worry about testing it
   in a system spec. We use SitePrism to abstract away bespoke page interactions
   and CSS selectors. It helps to make specs more readable and easier to fix if
   they break because of a UI or CSS change. We’ll dive more into system spec
   best practices in a future blog post. Below is an example system spec. Note
   that the error path and two common success paths are exercised in the same
spec.
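A sketch of that structure - the feature, paths, and copy are assumptions for illustration; note the user makes a mistake, corrects it, and then succeeds twice in one spec:

RSpec.describe "Making a deposit", type: :system do
  it "lets a customer correct a mistake and fund two goals" do
    customer = create(:customer)
    sign_in customer

    visit new_deposit_path
    fill_in "Amount", with: "-50"
    click_on "Deposit"
    expect(page).to have_content("Amount must be greater than 0")

    fill_in "Amount", with: "50"
    click_on "Deposit"
    expect(page).to have_content("You deposited $50.00")

    visit new_deposit_path
    fill_in "Amount", with: "25"
    click_on "Deposit"
    expect(page).to have_content("You deposited $25.00")
  end
end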
Request Specs Request specs test the traditional responsibilities of the controller. These include authentication, view rendering, selecting an
   http response code, redirecting, and setting cookies. It’s also ok to assert
   that the database was changed in some way in a request spec, but like system
   specs, there is no need for detailed assertions around object state or
   business logic. When controllers are thin and models are tested heavily,
   there should be no need to duplicate business logic test cases from a model
   spec in a request spec. Request specs are not mandatory if the controller
   code paths are exercised in a system spec and they are not doing something
   different from the average controller in your app. For example, a controller
   that has different authorization restrictions because the actions it is
   performing are more dangerous might require additional testing. The main
   exception to these guidelines is when your controller is an API controller
   serving data to another app. In that case, your request spec becomes like
   your system spec, and you should assert that the response body is correct for
   important use cases. API boundary tests are even allowed to be duplicative
   with underlying model specs if the behavior is explicitly important and
   apparent to the consuming application. Request specs for APIs are owned by
   the consuming app’s team to ensure that the invariants that they expect to
   hold are not broken. Below is an example request spec. We like to extract
   standard assertions such as ones relating to authentication into shared
examples. More on shared examples in the section below.
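A sketch of such a request spec - the routes, factories, and the shared example name are assumptions for illustration:

RSpec.describe "Deposits", type: :request do
  it_behaves_like "an authenticated endpoint", :post, "/deposits"

  it "creates a deposit and redirects" do
    user = create(:user)
    sign_in user

    post deposits_path, params: { deposit: { amount: 50 } }

    expect(response).to redirect_to(deposits_path)
    expect(user.deposits.count).to eq(1)
  end
end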
Why don’t we use Controller Specs? Controller specs are notably absent from our guide. We used
   to use controller specs instead of request specs. This was mainly because
   they were faster to run than request specs. However, in modern versions of
   Rails, that has changed. Under the covers, request specs are just a thin
   wrapper around Rails integration tests. In Rails 5+, integration tests have
   been made to run very fast. Rails is so confident in the improvements they’ve
   made to integration tests that they’ve removed controller tests from Rails
   core in Rails 5.1. Additionally, request specs are much more realistic than
   controller specs since they actually exercise the full request / response
   lifecycle – routing, middleware, etc – whereas controller specs circumvent
   much of that process. Given the changes in Rails and the limitations of
   controller specs, we’ve changed our stance. We no longer write controller
   specs. All of the things that we were testing in controller specs can instead
   be tested by some combination of system specs, model specs, and request
   specs. Why don’t we use Feature Specs? Feature specs are also absent from our
   guide. System specs were added to Rails 5.1 core and it is the core team’s
   preferred way to test client-side interactions. In addition, the RSpec team
   recommends using system specs instead of feature specs. In system specs, each
   test is wrapped in a database transaction because it’s run within a Rails
   process, which means we don’t need to use the  DatabaseCleaner gem anymore.
   This makes the tests run faster, and removes the need for having any special
   tables that don’t get cleaned out. Optimal Testing Because we use these three
   different categories of specs, it’s important to keep in mind what each type
   of spec is for to avoid over-testing. Don’t write the same test three times -
   for example, it is unnecessary to have a model spec, request spec, and a
   system spec that are all running assertions on the business logic
   responsibilities of the model. Over-testing takes more development time, can
   add additional work when refactoring or adding new features, slows down the
   overall test suite, and sets the wrong example for others when referencing
   existing tests. Think critically about what each type of spec is intended to
   be doing while writing specs. If you’re significantly exercising behavior not
   in the layer you’re writing a test for, you might be putting the test in the
   wrong place. Testing requires striking a fine balance - we don’t want to
   under-test either. Too little testing doesn’t give any confidence in system
   behavior and does not protect against regressions. Every situation is
   different and if you are unsure what the appropriate test coverage is for a
   particular feature, start a discussion with your team! Other Testing
   Recommendations Consider shared examples for last-mile regression coverage
   and repeated patterns. Examples include request authorization and common
validation/error handling:
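For instance, a shared example along these lines (matching the name assumed in the request spec sketch above) keeps authentication assertions in one place:

RSpec.shared_examples "an authenticated endpoint" do |method, path|
  it "redirects to sign-in when no one is signed in" do
    public_send(method, path)

    expect(response).to redirect_to(sign_in_path)
  end
end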
Each spec’s description begins with an action verb, not a helping verb like "should," "will," or something similar.
   8 min read


 * WEBVALVE – THE MAGIC YOU NEED FOR HTTP INTEGRATION
   
   WebValve – The Magic You Need for HTTP Integration Struggling with HTTP
   integrations locally? Use WebValve to define HTTP service fakes and toggle
   between real and fake services in non-production environments. When I started
   at Betterment (the company) five years ago, Betterment (the platform) was a
   monolithic Java application. As good companies tend to do, it began
   growing—not just in terms of users, but in terms of capabilities. And our
   platform needed to grow along with it. At the time, our application had no
   established patterns or tooling for the kinds of third-party integrations
   that customers were increasingly expecting from fintech products (e.g., like
   how Venmo connects to your bank to directly deposit and withdraw money). We
   were also feeling the classic pain points of a growing team contributing to a
   single application. To keep the momentum going, we needed to transition
   towards a service-oriented architecture that would allow the engineers of
   different business units to run in parallel against their specific business
   goals, creating even more demand for repeatable solutions to service
   integration. This brought up another problem (and the starting point for this
   blog post): in order to ensure tight feedback loops, we strongly believed
   that our devs should be able to do their work on a modern, modestly-specced
   laptop without internet connectivity. That meant no guaranteed connection to
   a cloud service mesh. And unfortunately, it’s not possible to run a local
   service mesh on a laptop without it melting. In short, our devs needed to be
   able to run individual services in isolation; by default they were set to
   communicate with one another, meaning an engineer would have to run all of
   the services locally in order to work on any one service. To solve this
   problem, we developed WebValve—a tool that allows us to define and register
   fake implementations of HTTP services and toggle between real and fake
   services in non-production environments. I’m going to walk you through how we
   got there. Start with the test Here’s a look at what a test would look like
   to see if a deposit from a bank was initiated: The five lines of code on the
   bottom is the meat of the test. Easy right? Not quite. Notice the two WebMock
   stub_requests calls at the top. The second one has the syntax you’d expect to
   execute the test itself. But take a look at the first one—notice the 100+
   lines of (omitted) code. Without getting into the gory details, this
   essentially requires us, for every test we write, to stub a request for user
   data—with differences across minor things like ID values, we can’t share
   these stubs between tests. In short it’s a sloppy feature spec. So how do we
   narrow this feature spec down to something like this? Through the magic of
   libraries. First things first—defining our view of the problem space. The
success of projects like these doesn’t come down to the code itself—it comes
   down to the ‘design’ of the solution based on its specific needs. In this
   case, it meant paring the conditions down to making it work using just rails.
   Those come to life in four major principles, which guide how we engage with
the problem space for our shift to a service-oriented architecture:
  * We use HTTP & REST to communicate with collaborator services
  * We define the boundaries and limit the testing of integrations with contract tests
  * We don't share code across service boundaries
  * Engineers must remain nimble and building features must remain enjoyable.
A little bit of color on each,
   starting with HTTP and REST. For APIs that we build for ourselves (e.g.
   internal services) we have full control over how we build them, so using HTTP
   and REST is no issue. We have a strong preference to use a single integration
   pattern for both internal and external service integrations; this reduces
   cognitive overhead for devs. When we’re communicating with external services,
   we have less control, but HTTP is the protocol of the web and REST has been
   around since 2000—the dawn of modern web applications— so the majority of
   integrations we build will use them. REST is semantic, evolvable, limber, and
   very familiar to us as Rails developers —a natural ‘other side of the coin’
   for HTTP to make up the lingua franca of the web. Secondly, we need to define
   the boundaries in terms of ‘contracts.’ Contracts are a point of exchange
   between the consumption side (the app) and producer side (the collaborator
   service). The contract defines the expectations of input and output for the
   exchange. They’re an alternative to the kind of high-level systems
   integration tests that would include a critical mass of components that would
   render the test slow and non-repeatable. Thirdly, we don't want to have
   shared code across service boundaries. Shared code between services creates
   shared ownership, and shared ownership leads to undesirable coupling. We want
   the API provider to own and version their APIs, and we want the API consumer
   to own their integration with each version of a collaborator service's API.
   If we were willing to accept tight coupling between our services,
   specifically in their API contracts, we'd be well-served by a tool like Pact.
   With Pact, you create a contract file based on the consumer's expectations of
   an API and you share it with the provider. The contract files themselves are
   about the syntax and structure of requests and responses rather than the
   interpretation. There's a human conversation and negotiation to be had about
   these contracts, and you can fool yourself into thinking you don't need to
   have that conversation if you've got a file that guarantees that you and your
   collaborator service are speaking the same language; you may be speaking the
   same words, but you might not infer the same meaning. Pact's docs encourage
   these human conversations, but as a tool it doesn't require them. By avoiding
   shared code between services, we force ourselves to have a conversation about
   every API we build with the consumers of those APIs. Finally, these tests’
   effectiveness is directly related to how we can apply them to reality, so we
   need to be simple—we want to be able to test and build features without
   connections to other features. We want them to be able to work without an
   internet connection, and if we do want to integrate with a real service in
   local development, we should be able to do that—meaning we should be able to
   test and integrate locally at will, without having to rely on cumbersome,
   extra-connected services (think Docker, Kubernetes; anything that pairs cloud
   features with the local environment.) Straightforward tests are easy to
   write, read, and maintain. That keeps us moving fast and not breaking things.
   So, to recap, there are four principles that will drive our solution: Service
   interactions happen over HTTP & REST Contract tests ensure that service
   interactions behave as expected Providing an API contract requires no shared
   code Building features remains fast and fun Okay, okay, but how? So we’ve
   established that we don’t want to hit external services in tests, which we
   can do through WebMock or similar libraries. The challenge becomes: how do we
   replicate the integration environment without the integration environment?
   Through fakes. We’ll fake the integration by using Sinatra to build a rack
   app that quacks like the real thing. In the rack app, we define the routes we
   care about for the things we normally would have stubbed in the tests. From
   here, we do the things we couldn’t do before—pull real parameters out of the
   requests and feed them back into the fake response to make it more realistic.
   Additionally, we can use things like ActiveRecord to make these fake
   responses even more realistic based on the data stored in our actual
   database. So what does the fake look like? It's a class with a route defined
   for each URL we care about faking. We can use WebMock to wire the fake to
   requests that match a certain pattern. If we receive a request for a URL we
   didn't define, it will 404. Simple. However, this doesn’t allow us to solve
However, this doesn’t allow us to solve all the things we were working for. What’s missing? First, an idiomatic setup
   stance. We want to be able to define fakes in a single place, so when we add
   a new one, we can easily find it and change it. In the same vein, we want to
   be able to answer similar questions about registering fakes in one spot.
   Finally, convention over configuration—if we can load, register, and wire-up
   a fake based on its name, for example, that would be handy. Secondly, it’s
   missing environment-specific behavior, which in this case, translates into
   the ability to toggle the library on and off and separately toggle the
   connection to specific collaborator services on and off. We need to be able
   to have the library active when running tests or doing local development, but
   do not want to have it running in a production environment—if it remains
   active in a real environment, it might affect real customer accounts, which
   we cannot afford. But, there will also be times when we're running in a local
   development environment and we want to communicate with a real collaborator
   service to do some true integration testing. Thirdly, we want to be able to
   autoload our fakes. If they’re in our codebase, we should be able to iterate
   on the fakes without having to restart our server; the behavior isn’t always
   right the first time, and restarting is tedious and it's not the Rails Way.
   Finally, to bolt this on to an IRL application, we need the ability to define
   fakes incrementally and migrate them into existing integrations that we have,
   one by one. Okay brass tacks. No existing library allows us to integrate this
   way and map HTTP requests to in-process fakes for integration and
   development. Hence, WebValve. TL;DR—WebValve is an open-source gem that uses
   Sinatra and WebMock to provide fake HTTP service behavior. The special sauce
   is that it works for more than just your tests. It allows you to run your
   fakes in your dev environment as well, providing functionality akin to real
   environments with the toggles we need to access the real thing when we need
   to. Let’s run it through the gauntlet to show how it works and how it solves
   for all our requirements. First we add the gem to our Gemfile and run bundle
   install. With the gem installed, we can use the  generator rails g
   webvalve:install to bootstrap a default config file where we can register our
   fakes. Then we can generate a fake for our "trading" collaborator service
   using rails generate webvalve:fake_service Trading. This gives us a class in
   a conventional location that inherits from WebValve::FakeService. This looks
   very similar to a Sinatra app, and that's because it is one—with some
   additional magic baked in. To make this fake work, all we have to do is
define the conventionally-named environment variable, TRADING_API_URL. That
   tells WebValve what requests to intercept and route to this fake. By
   inheriting from this WebValve class, we gain the ability to toggle the fake
   behavior on or off based on another conventionally-named environment
   variable, in this case TRADING_ENABLED. So let’s take our feature spec.
First, we configure our test suite to use WebValve with the RSpec config
   helper require 'webvalve/rspec'. Then, we look at the user API call—we define
   a new route for user, in FakeTrading. Then we flesh out that fake route by
   scooping out our json from the test file and probably making it a little more
   dynamic when we drop it into the fake. Then we do the same for the deposit
API call.
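Sketching what that fake might look like by this point (routes and payloads assumed):

class FakeTrading < WebValve::FakeService
  get "/api/users/:id" do
    content_type :json
    { id: params[:id], first_name: "Jane" }.to_json
  end

  post "/api/deposits" do
    content_type :json
    { id: SecureRandom.uuid, status: "initiated" }.to_json
  end
end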
And now our test, which doesn’t care about the specifics of either of those API calls, is much clearer. It looks just like our ideal spec from
   before: We leverage all the power of WebMock and Sinatra through our
   conventions and the teeniest configuration to provide all the same
   functionality as before, but we can write cleaner tests, we get the ability
   to use these fakes in local development instead of the real services—and we
   can enable a real service integration without missing a beat. We’ve achieved
   our goal—we’ve allowed for all the functionality of integration without the
   threats of actual integration. Check it out on GitHub. This article is part
   of Engineering at Betterment.
   11 min read


 * BUILDING FOR BETTER: GENDER INCLUSION AT BETTERMENT
   
   Building for Better: Gender Inclusion at Betterment Betterment sits at the
   intersection of two industries with large, historical gender gaps. We’re
   working to change that—for ourselves and our industries. Since our founding,
   we’ve maintained a commitment to consistently build a better company and
   product for our customers and our customers-to-be. Part of that commitment
   includes reflecting the diversity of those customers. Betterment sits at the
   intersection of finance and technology—two industries with large, historical
   diversity gaps, including women and underrepresented populations. We’re far
   from perfect, but this is what we’re doing to embrace the International
   Women’s Day charge and work toward better gender balance at Betterment and in
   our world. Building Diversity And Inclusion At Betterment Change starts at
   the heart of the matter. For Betterment, this means working to build a
   company of passionate individuals who reflect our customers and bring new and
   different perspectives to our work. Our internal Diversity and Inclusion
   Committee holds regular meetings to discuss current events and topics,
   highlights recognition months (like Black History and Women’s History
   Months), and celebrates the many backgrounds and experiences of our
   employees. We’ve also developed a partnership with Peoplism. According to
   Caitlin Tudor-Savin, HR Business Partner, “This is more than a check-the-box
   activity, more than a one-off meeting with an attendance sheet. By partnering
   with Peoplism and building a long-term, action-oriented plan, we’re working
   to create real change in a sustainable fashion.” One next step we’re excited
   about is an examination of our mentorship program to make sure that everyone
   at Betterment has access to mentors. The big idea: By building empathy and
   connection among ourselves, we can create an inclusive environment that
   cultivates innovative ideas and a better product for our customers. Engaging
The Tech Community At Large At Betterment, we’re working to create change
   in the tech industry and bringing women into our space. By hosting meetups
   for Women Who Code, a non-profit organization that empowers women through
   technology, we’re working to engage this community directly. Rather than
   getting together to hear presentations, meetups are designed to have a
   group-led dynamic. Members break out and solve problems together, sharing and
   honing skills, while building community and support. This also fosters
   conversation, natural networking, and the chance for women to get their foot
   in the door. Jess Harrelson, a Betterment Software Engineer, not only leads
our hosting events but also found a path to Betterment through Women Who Code.
   “Consistency is key,” said Jess. “Our Women Who Code meetups become a way to
   track your progression. It’s exciting to see how I’ve developed since I first
   started attending meetups, and how some of our long-time attendees have grown
   as engineers and as professionals.” Building A Community Of Our Own In 2018,
   our Women of Betterment group had an idea. They’d attended a number of
   networking and connection events, and the events never felt quite right. Too
   often, the events involved forced networking and stodgy PowerPoint
   presentations, with takeaways amounting to little more than a free glass of
   wine. Enter the SHARE (Support, Hire, Aspire, Relate, Empower) Series.
   Co-founder Emily Knutsen wanted “to build a network of diverse individuals
   and foster deeper connections among women in our community.” Through the
   SHARE Series, we hope to empower future leaders in our industry to reach
   their goals and develop important professional connections. While the series
   focuses on programming for women and those who identify as women, it is
   inclusive of everyone in our community who wishes to be an ally and support our
   mission. We developed the SHARE Series to create an authentic and
   conversational environment, one where attendees help guide the conversations
   and future event themes. Meetings thus far have included a panel discussion
   on breaking into tech from the corporate world and a small-group financial
   discussion led by financial experts from Betterment and beyond. “We’re
   excited that organizations are already reaching out to collaborate,” Emily
   said. “We’ve gotten such an enthusiastic response about designing future
   events around issues that women (and everyone!) face, such as salary
   negotiations.” Getting Involved Want to join us as we work to build a more
   inclusive and dynamic community? Our next SHARE Series event features CBS
   News Business Analyst and CFP® professional Jill Schlesinger, as we celebrate
   her new book, The Dumb Things Smart People Do with Their Money: Thirteen Ways
   to Right Your Financial Wrongs. You can also register to attend our Women Who
   Code meetups, and join engineers from all over New York as we grow, solve,
   and connect with one another.
   4 min read


 * CI/CD: STANDARDIZING THE INTERFACE
   
   CI/CD: Standardizing the Interface Meet our CI/CD platform, Coach, and learn
   how we increased consistent adoption of Continuous Integration (CI) across
   our engineering organization. And why that’s important. This is the second
   part of a series of posts about our new CI/CD platform, Coach. Part
   I explores several design choices we made in building out our notifications
   pipeline and describes how those choices are emblematic of our overarching
   engineering principles here at Betterment. Today I’d like to talk about how
   we increased consistent adoption of Continuous Integration (CI) across our
   engineering organization, and why. Our Principles in Action: Standardizing
   the Interface At Betterment, we want to empower our engineers to do their
   best work. CI plays an important role in all of our teams’ workflows. Over
   time, a handful of these teams formed deviating opinions on what kind of
   acceptance criteria they had for CI. While we love the concern that our
   engineers show toward solving these problems, these deviations became
   problematic for applications of the same runtime that should abide by the
   same set of rules; for example, all Ruby apps should run RSpec and Rubocop,
   not just some of them. In building a platform as a service (PaaS), we
   realized that in order to mitigate the problem of nurturing pets vs herding
   cattle we would need to identify a firm set of acceptance criteria for
   different runtimes. In the first post of this series we mention one of our
   principles, Standardize the Pipeline. In this post, we’ll explore that
   principle and dive into how we committed 5,000-line configuration files to our
   repositories with confidence by standardizing CI for different runtimes,
   automating configuration generation in code, and testing the process that
   generates that configuration. What’s so good about making everything
   the same? Our goals in standardizing the CI interface were to: Make it easier
   to distribute new CI features more quickly across the organization. Onboard
   new applications more quickly. Ensure the same set of acceptance criteria is
   in place for all codebases in the org. For example, by assuming that any Java
   library will run the PMD linter and unit tests in a certain way we can
   bootstrap a new repository with very little effort. Allow folks outside of
   the SRE team to contribute to CI. In general, our CI platform categorizes
   projects into applications and libraries and divides those up further by
   language runtime. Combined together we call this a project_type. When we make
   improvements to one project type’s base configuration, we can flip a switch
   and turn it on for everyone in the org at once. This lets us distribute
   changes across the org quickly. How we managed to actually execute on this
   will become clearer in the next section, but for the sake of
   hand-wavy-expediency, we have a way to run a few commands and distribute CI
   changes to every project in a matter of minutes. How did we do it? Because we
   use CircleCI for our CI pipelines, we knew we would have to define our
   workflows using their DSL inside a .circleci/config.yml file at the root of a
   project’s repository. With this blank slate in front of us we were able to
   iterate quickly by manually adding different jobs and steps to that file. We
   would receive immediate feedback in the CircleCI interface when those jobs
   ran, and this feedback loop helped us iterate even faster. Soon we were
   solving for our acceptance criteria requirements left and right — that Java
   app needs the PMD linter! This Ruby app needs to run integration tests! And
   then we reached the point where manual changes were hindering our
   productivity. The .circleci/config.yml file was getting longer than a
   thousand lines fast, partly because we didn’t want to use any YAML shortcuts
   to hide away what was being run, and partly because there were no
   higher-level mechanisms available at the time for re-use when writing YAML
   (e.g. CircleCI’s orbs). Defining the system Our solution to this problem was
   to build a system, a Coach CLI for our Coach app, designed according to CLI
   12-factor conventions. This system’s primary goal is to
   create .circleci/config.yml files for repositories to encapsulate the
   necessary configuration for a project’s CI pipeline. The CLI reads a small
   project-level configuration definition file (coach.yml) located in a
   project’s directory and extrapolates information to create the much larger
   repo-level CircleCI specific configuration file (.circleci/config.yml), which
   we were previously editing ourselves. To clarify the hierarchy of how we
   thought about CI, here are the high level terms and components of our Coach
   CLI system: There are projects. Each project needs a configuration definition
   file (coach.yml) that declares its project_type. We
   support wordpress_app, java_library, java_app, ruby_gem, ruby_app,
   and javascript_library for now. There are repos; each repo has one or more
   projects of any type. There needs to be a way to set up a new project. There
   needs to be a way to idempotently generate the CircleCI configuration
   (.circleci/config.yml) for all the projects in a repo at once. Each project
   needs to be built, tested, and linted. We realized that the dependency graph
   of repository → projects → project jobs was complicated enough that we would
   need to recreate the entire .circleci/config.yml file whenever we needed to
   update it, instead of just modifying the YAML file in place. This was one
   reason for automating the process, but the downsides of human-managed
   software were another. Manual updates to this file allow the configuration
   for infrequently-modified projects to drift. And leaving it up to engineers
   to own their own configuration lets folks modify the file in an unsupported
   way which could break their CI process. And then we’re back to square one. We
   decided to create that large file by ostensibly concatenating smaller
   components together. Each of those smaller components would be the output of
   specific functions, and each of those functions would be written in code and
   be tested. The end result was a lot of small files that look a little like
   this:
   https://gist.github.com/agirlnamedsophia/4b4a11acbe5a78022ecba62cb99aa85a
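   As a rough sketch of that approach (hypothetical class and job names, not
   Coach’s actual code), one of those small components can be a plain Ruby
   object whose output is dumped to YAML and asserted on in a unit test:
   require "yaml"

   # One small, testable piece of the generated CircleCI configuration.
   class RubyLintJob
     def initialize(project_name)
       @project_name = project_name
     end

     # Returns plain data; the CLI concatenates many of these and dumps the
     # result into .circleci/config.yml.
     def to_h
       {
         "#{@project_name}_lint" => {
           "docker" => [{ "image" => "circleci/ruby:2.6" }],
           "steps"  => ["checkout", { "run" => "bundle exec rubocop" }],
         },
       }
     end
   end

   puts YAML.dump(RubyLintJob.new("coach").to_h)
   A unit test can assert on that hash (or the dumped YAML) without ever
   touching CircleCI, which is what makes regenerating the whole file safe.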
   Every time we make a change to the Coach CLI codebase we are confident that
   the thousands of lines of YAML that are idempotently generated as a result of
   the coach update ci command will work as expected because they’re already
   tested in isolation, in unit tests. We also have a few heftier integration
   tests to confirm our expectations. And no one needs to manually edit
   the .circleci/config.yml file again. Defining the Interface In order to
   generate the .circleci/config.yml that details which jobs to run and what
   code to execute we first needed to determine what our acceptance criteria
   was. For each project type we knew we would need to support: Static code
   analysis Unit tests Integration tests Build steps Test reports We define the
   specific jobs a project will run during CI by looking at
   the project_type value inside a project’s coach.yml. If the value
   for project_type is ruby_app then the .circleci/config.yml generator will
   follow certain conventions for Ruby programs, like including a job to run
   tests with RSpec or including a job to run static analysis commands
   like Rubocop and Brakeman. For Java apps and libraries we run integration and
   unit tests by default as well as PMD as part of our static code analysis.
   Here’s an example configuration section for a single job, the linter job for
   our Coach repository:
   https://gist.github.com/agirlnamedsophia/4b4a11acbe5a78022ecba62cb99aa85a And
   here’s an example of the Ruby code that helps generate that result:
   https://gist.github.com/agirlnamedsophia/a96f3a79239988298207b7ec72e2ed04 For
   each job that is defined in the .circleci/config.yml file, according to the
   project type’s list of acceptance criteria, we include additional steps to
   handle notifications and test reporting. By knowing that the Coach app is
   a ruby_app we know how many jobs will need to be run and when. By writing that
   YAML inside of Ruby classes we can grow and expand our pipeline as needed,
   trusting that our tests confirm the YAML looks how we expect it to look. If
   our acceptance criteria change, because everything is written in code, adding
   a new job involves a simple code change and a few tests, and that’s it. We’ll
   go into contributing to our platform in more detail below. Onboarding a
   new project One of the main reasons for standardizing the interface and
   automating the configuration generation was to onboard new applications more
   quickly. To set up a new app all you need to do is be in the directory for
   your project and then run coach create project --type $project_type.
   -> % coach create project --type ruby_app
   'coach.yml' configuration file added -- update it based on your project’s needs
   When you run that, the CLI creates
   the small coach.yml configuration definition file discussed earlier. Here’s
   what an example Ruby app’s coach.yml looks like:
   https://gist.github.com/agirlnamedsophia/2f966ab69ba1c7895ce312aec511aa6b The
   CLI will refer back to a project’s coach.yml to decide what kind of CircleCI
   DSL needs to be written to the .circleci/config.yml file to wire up the right
   jobs to run at the right time. Though our contract with projects of different
   types is standardized, we permit some level of customization.
   The coach.yml file allows our users to define certain characteristics of
   their CI flow that vary and require more domain knowledge about a specific
   project: like the level of test parallelism their application test suite
   requires, or the list of databases required for tests to run, or an attribute
   composed of a matrix of Ruby versions and Gemfiles to run the whole test
   suite against. Using this declarative configuration is more extensible and
   more user friendly and doesn’t break the contract we’ve put in place for
   projects that use our CI platform. Contributing to CI Before, if you wanted
   to add an additional linter or CI tool to our pipeline, it would require
   adding a few lines of untested bash code to an existing Jenkins job, or
   adding a new job to a precarious graph of jobs, and crossing your fingers
   that it would “just work.” The addition couldn’t be tested and it was often
   only available to one project or one repository at a time. It couldn’t scale
   out to the rest of the org with ease. Now, updating CI requires opening a PR
   to make the change. We encourage all engineers who want to add to their own
   CI pipeline to make changes on a branch from our Coach repository, where all
   the configuration generation magic happens, verify its effectiveness for
   their use-case, and open a pull request. If it’s a reasonable addition to CI,
   our thought is that everyone should benefit. By having these changes in
   version control, each addition to the CI pipeline goes through code review
   and requires tests be written. We therefore have the added benefit of knowing
   that updates to CI have been tested and are deemed valid and working before
   they’re distributed, and we can prevent folks from removing a feature without
   considering the impact it may have. When a PR is merged, our team takes care
   of redistributing the new version of the library so engineers can update
   their configuration. CI is now a mechanism for instantly sharing the benefits
   of discovery made in isolated exploration, with everyone. Putting it
   all together Our configuration generator is doing a lot more than just taping
   together jobs in a workflow — we evaluate dependency graphs and only run
   certain jobs that have upstream changes or are triggered themselves. We built
   our Coach CLI into the Docker images we use in CircleCI and so those Coach
   CLI commands are available to us from inside the .circleci/config.yml file.
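   The dependency-graph piece can be sketched roughly like this (hypothetical
   data and method names; the real CLI is considerably more involved):
   # Map each project to the path prefixes it depends on.
   DEPENDENCIES = {
     "coach_cli" => ["coach_cli/"],
     "web_app"   => ["web_app/", "shared/"],
     "ruby_gem"  => ["gems/ruby_gem/"],
   }.freeze

   # Given the paths changed upstream, decide which projects need their jobs run.
   def projects_to_build(changed_paths)
     DEPENDENCIES.select do |_project, prefixes|
       changed_paths.any? { |path| prefixes.any? { |prefix| path.start_with?(prefix) } }
     end.keys
   end

   projects_to_build(["shared/helpers.rb"]) # => ["web_app"]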
   The CLI handles notifications, artifact generation, and deployment triggers.
   As we stated in our requirements for Coach in the first post, we believe
   there should be one way to test code, and one way to deploy it. To get there
   we had to make all of our Java apps respond to the same set of commands, and
   all of our Ruby apps to do the same. Our CLI and the accompanying conventions
   make that possible. When before it could take weeks of both product
   engineering and SRE time to set up CI for an application or service within a
   complex ecosystem of bash scripts and Jenkins jobs and application
   configuration, now it takes minutes. When before it could take days or weeks
   to add a new step to a CI pipeline, now it takes hours of simple code review.
   We think engineers should focus on what they care about the most, shipping
   great features quickly and reliably. And we think we made it a little easier
   for them (and us) to do just that. What’s Next? Now that we’ve wrangled our
   CI process and encoded the best practices into a tool, we’re ready to tackle
   our Continuous Deployment pipeline. We’re excited to see how the model of
   projects and project types that we built for CI will evolve to help us
   templatize our Kubernetes deployments. Stay tuned.
   11 min read


 * CI/CD: SHORTENING THE FEEDBACK LOOP
   
   CI/CD: Shortening the Feedback Loop As we improve and scale our CD platform,
   shortening the feedback loop with notifications was a small, effective, and
   important piece. Continuous Delivery (CD) at scale is hard to get right. At
   Betterment, we define CD as the process of making every small change to our
   system shippable as soon as it’s been built and tested. It’s part of the
   CI/CD (continuous integration and continuous delivery) process. We’ve been
   doing CD at Betterment for a long time, but it had grown to be quite a
   cumbersome process over the last few years because our infrastructure and
   tools hadn’t evolved to meet the needs of our growing engineering team. We
   reinvented our Site Reliability Engineering (SRE) team last fall with our
   sights set on building software to help developers move faster, be happier,
   and feel empowered. The focus of our work has been on delivering a platform
   as a service to make sense of the complex process of CD. Coach is the
   beginning of that platform. Think of something like Heroku, but for engineers
   here at Betterment. We wanted to build a thoughtfully composed platform based
   on the tried and true principles of 12-factor apps. In order to build this,
   we needed to do two overhauls: 1) Build a new CI pipeline and 2) Build a new
   CD pipeline. Continuous Integration — Our Principles For years, we
   used Jenkins, an open-source tool for automation, and a mess of scripts to
   provide CI/CD to our engineers. Jenkins is a powerful tool and well-used in
   the industry, but we decided to cut it because the way that we were using it
   was wrong, we weren’t pleased with its feature set, and there was too much
   technical debt to overcome. Tests were flakey and we didn’t know if it was
   our Jenkins setup, the tests themselves, or both. Dozens of engineers
   contribute to our biggest repository every day and as the code base and
   engineering team have grown, the complexity of our CI story has increased and
   our existing pipeline couldn’t keep up. There were task forces cobbled
   together to drive up reliability of the test suite, to stamp out flakes, to
   rewrite, and to refactor. This put a band-aid on the problem for a short
   while. It wasn’t enough. We decided to start fresh with CircleCI, an
   alternative to Jenkins that comes with a lot more opinions, far fewer rough
   edges, and a lot more stability built-in. We built a tool (Coach) to make the
   way that we build and test code conventional across all of our apps,
   regardless of language, application owner, or business unit. As an added
   bonus, since our CI process itself was defined in code, if we ever need to
   switch platforms again, it would be much easier. Coach was designed and built
   with these principles: Standardize the pipeline — there should be one way to
   test code, and one way to deploy it Test code often — code should be tested
   as often as it’s committed Build artifacts often — code should be built as
   often as it’s tested so that it can be deployed at any time Be environment
   agnostic — artifacts should be built in an environment-agnostic way with
   maximum portability Give consistent feedback — the CI output should be
   consistent no matter the language runtime Shorten the feedback
   loop — engineers should receive actionable feedback as soon as possible
   Standardizing CI was critical to our growth as an organization for a number
   of reasons. It ensures that new features can be shipped more quickly, it
   allows new services to adopt our standardized CI strategy with ease, and it
   lets us recover faster in the face of disaster — a hurricane causing a power
   outage at one of our data centers. Our goal was to replace the old way of
   building and testing our applications (what we called the “Old World”) and
   start fresh with these principles in mind (what we deemed the “New World”).
   Using our new platform to build and test code would allow our engineers to
   receive automated feedback sooner so they could iterate faster. One of our
   primary aims in building this platform was to increase developer velocity, so
   we needed to eliminate any friction from commit to deploy. Friction here
   refers to ambiguity of CI results and the uncertainty of knowing where your
   code is in the CI/CD process. Shortening the feedback loop was one of the
   first steps we took in building out our new platform, and we’re excited to
   share the story of how we designed that solution. Our Principles in Action:
   Shortening the Feedback Loop The feedback loop in the Old World run by
   Jenkins was one of the biggest hurdles to overcome. Engineers never really
   knew where their code was in the pipeline. We use Slack, like a lot of other
   companies, so that part of the messaging story wouldn’t change, but there
   were bugs we needed to fix and design flaws we needed to update. How much
   feedback should we give? When do we want to give feedback? How detailed
   should our messages be? These were some of the questions we asked ourselves
   during this part of the design phase. What our Engineers Needed For pull
   requests, developers would commit code and push it up to GitHub and then
   eventually they would receive a Slack message that said “BAD” for every test
   suite that failed, or “GOOD” if everything passed, or nothing at all in the
   case of a Jenkins agent getting stuck and hanging forever. The notifications
   were slightly more nuanced than good/bad, but you get the idea. We valued
   sending Slack messages to our engineers, as that’s how the company
   communicates most effectively, but we didn’t like the rate of communication
   or the content of those messages. We knew both of those would need to change.
   As for merges into master, the way we sent Slack messages to communicate to
   engineering teams (as opposed to just individuals) was limited because of how
   our CI/CD process was constructed. The entire CI and CD process happened as a
   series of interwoven Jenkins freestyle jobs. We never got the logic quite
   right around determining whose code was being deployed — the deploy logic was
   contingent on a pretty rough shell script called from inside a Jenkins job. The
   best we had was a Slack message that was sent roughly five minutes before a
   deploy began, tagging a good estimation of contributors but often missing
   someone if their Github email address was different from their Slack email
   address. More critically, the one-off script solution wasn’t stored in source
   control, therefore it wasn’t tested. We had no idea when it failed or missed
   tagging some contributors. We liked notifying engineers when a deploy began,
   but we needed to be more accurate about who we were notifying. What our SRE
   Team Needed Our design and UX was informed by what our engineers using our
   platform needed, but Coach was built based on our needs. What did we need?
   Well-tested code stored in version control that could easily be changed and
   developed. All of the code that handles changesets and messaging logic in the
   New World is written in one central location, and it’s tested in isolation.
   Our CI/CD process invokes this code when it needs to, and it works great. We
   can be confident that the right people are notified at the right time because
   we wrote code that does that and we tested it. It’s no longer just a script
   that sometimes works and sometimes doesn’t. Because it’s in source control
   and it runs through its own CI process, we can also easily roll out changes
   to notifications without breaking things. We wanted to build our platform
   around what our engineers would need to know, when they need to know it, and
   how often. And so one of the first components we built out was this new
   communication pipeline. Next we’ll explore in more detail some of our design
   choices regarding the content of our messages and the rate at which we send
   them. Make sure our engineers don’t mute their Slack notifications In leaving
   the Old World of inconsistent and contextually sparse communication we looked
   at our blank canvas and initially thought “every time the tests pass, send a
   notification! That will reduce friction!” So we tried that. If we merged code
   into a tracked branch — a branch that multiple engineers contribute to, like
   master — for one of our biggest repos, which contained 20 apps and 20 test
   suites, we would be notified at every transition: every rubocop failure,
   every flakey occurrence of a feature test. We quickly realized it was too
   much. We sat back and thought really hard about what we would want,
   considering we were dogfooding our own pipeline. How often did we want to be
   notified by the notification system when our tests that tested the code that
   built the notification system, succeeded? Sheesh, that’s a mouthful. Our
   Slack bot could barely keep up! We decided it was necessary to be told
   only once when everything ran successfully. However, for failures, we didn’t
   want to sit around for five minutes crossing our fingers hoping that
   everything was successful only to be told that we could have known three
   minutes earlier that we’d forgotten a newline at the end of one of our files.
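   A stripped-down sketch of that behavior (a hypothetical class, not the code
   Coach actually runs) might look like:
   # Speak up on the first red result, stay quiet about repeats, and send a
   # single message once every job in the workflow has finished green.
   class BuildNotifier
     def initialize(slack)
       @slack = slack
       @failure_reported = false
     end

     def job_finished(job_name, status:, all_done:)
       if status == :failed && !@failure_reported
         @failure_reported = true
         @slack.post("#{job_name} failed - details in CircleCI")
       elsif all_done && !@failure_reported
         @slack.post("All checks passed!")
       end
     end
   end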
   Additionally, in CircleCI where we can easily parallelize our test suites, we
   realized we wouldn’t want to notify someone for every chunk of the test suite
   that failed, just the first time a failure happened for the suite. We came up
   with a few rules to design this part of the system: Let the author know as
   soon as possible when something is red but don’t overdo it for redundant
   failures within the same job (e.g. if unit tests ran on 20 containers and 18
   of them saw failures, only notify once) Only notify once about all the green
   things Give as much context as possible without being overwhelming:
   be concise but clear Next we’ll explore the changes we made in content. What
   to say when things fail This is what engineers would see in the Old World
   when tests failed for an open pull request: Among other deficiencies, there’s
   only one link and it takes us to a Jenkins job. There’s no context to orient
   us quickly to what the notification is for. After considering what we were
   currently sending our engineers, we realized that 1) context and
   2) status were the most important things to communicate, which were the
   aspects of our old messaging that were suffering the most. Here’s what we
   came up with: Thanks Coach bot! Right away we know what’s happened. A PR
   build failed. It failed for a specific GitHub
   branch (“what-to-say-when-things-fail-branch”), in a specific
   repo (“Betterment/coach”), for a specific PR (#430), for a specific job in the
   test suite (“coach_cli — lint (Gemfile)”). We can click on any of these links
   and know exactly where they go based on the logo of the service. Messages
   about failures are now actionable and full of context, prompting the engineer
   to participate in CI, to go directly to their failures or to their PR. And
   this bounty of information helps a lot if the engineer has multiple PRs open
   and needs to quickly switch context. The messaging that happened for failures
   when you merged a pull request into master was a little different in that it
   included mentions for the relevant contributors (maybe all of them, if we
   were lucky!): The New World is cleaner, easier to grok, and more immediately
   helpful: The link title to GitHub is the commit diff itself, and it takes you
   to the compare URL for that changeset. The CircleCI info includes the title
   of the job that failed (“coach_cli — lint (Gemfile)”), the build number
   (“#11389”) to reference for context in case there are multiple occurrences of
   the failure in multiple workflows, a link to the top-level “Workflow”, and @s
   for each contributor. What to say when things succeed We didn’t change the
   frequency of messaging for success — we got that right the first time around.
   You got one notification message when everything succeeded and you still do.
   But in the Old World there wasn’t enough context to make the message
   immediately useful. Another disappointment we had with the old messaging was
   that it didn’t make us feel very good when our tests passed. It was just a
   moment in time that came and went: In the New World we wanted to proclaim
   loudly (or as loudly as you can proclaim in a Slack message) that the pull
   request was successful in CI: Tada! We did it! We wanted to maintain the same
   format as the new failure messages for consistency and ease of reading. The
   links to the various services we use are in the same order as our new failure
   messages, but the link to CircleCI only goes to the workflow that shows the
   graph of all the tests and jobs that ran. It’s delightful and easy to parse
   and has just the right amount of information. What’s next? We have big dreams
   for the future of this platform with more and more engineers using our
   product. Shortening the feedback loop with notifications is only one small,
   but rather important, part of our CD platform. In the next post of this
   series on CD, we’ll explore how we committed 5,000-line configuration files to
   our repositories with confidence by standardizing CI for different runtimes,
   automating config generation in code, and testing that code generation. We
   believe in a world where shipping code, even in really large codebases with
   lots of contributors, should be done dozens of times a day. Where engineers
   can experience feedback about their code with delight and simplicity. We’re
   building that at Betterment.
   12 min read


 * SHH… IT’S A SECRET: MANAGING SECRETS AT BETTERMENT
   
   Shh… It’s a Secret: Managing Secrets at Betterment Opinionated secrets
   management that helps us sleep at night. Secrets management is one of those
   things that is talked about quite frequently, but there seems to be little
   consensus on how to actually go about it. In order to understand our journey,
   we first have to establish what secrets management means (and doesn’t mean)
   to us. What is Secrets Management? Secrets management is the process of
   ensuring passwords, API keys, certificates, etc. are kept secure at every
   stage of the software development lifecycle. Secrets management does NOT mean
   attempting to write our own crypto libraries or cipher algorithms. Rolling
   your own crypto isn’t a great idea. Suffice it to say, crypto will not be the
   focus of this post. There’s such a wide spectrum of secrets management
   implementations out there ranging from powerful solutions that require a
   significant amount of operational overhead, like Hashicorp Vault, to
   solutions that require little to no operational overhead, like a .env file.
   No matter where they fall on that spectrum, each of these solutions has
   tradeoffs in its approach. Understanding these tradeoffs is what helped our
   Engineering team at Betterment decide on a solution that made the most sense
   for our applications. In this post, we’ll be sharing that journey. How it
   used to work We started out using Ansible Vault. One thing we liked about
   Ansible Vault is that it allows you to encrypt a whole file or just a string.
   We valued the ability to encrypt just the secret values themselves and leave
   the variable name in plain-text. We believe this is important so that we can
   quickly tell which secrets an app is dependent on just by opening the file.
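   To illustrate the shape of that, here is a concept-only sketch (a local AES
   key standing in for what Ansible Vault, and later sops with KMS, actually
   do): the variable names stay readable, only the values are ciphertext.
   require "openssl"
   require "base64"
   require "yaml"

   key = OpenSSL::Cipher.new("aes-256-gcm").random_key

   def encrypt(value, key)
     cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
     cipher.key = key
     iv = cipher.random_iv
     ciphertext = cipher.update(value) + cipher.final
     Base64.strict_encode64(iv + cipher.auth_tag + ciphertext)
   end

   secrets = { "PAYMENT_API_KEY" => "not_a_real_key", "DB_PASSWORD" => "hunter2" }
   puts YAML.dump(secrets.transform_values { |v| encrypt(v, key) })
   # PAYMENT_API_KEY: <base64 blob>   <- names stay greppable
   # DB_PASSWORD: <base64 blob>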
   So the string option was appealing to us, but that workflow didn’t have the
   best editing experience as it required multiple steps in order to encrypt a
   value, insert it into the correct file, and then export it into the
   environment like the 12-factor app methodology tells us we should. At the
   time, we also couldn’t find a way to federate permissions with Ansible Vault
   in a way that didn’t hinder our workflow by causing a bottleneck for
   developers. To assist us in expediting this workflow, we had an alias in our
   bash_profiles that allowed us to run a shortcut at the command line to
   encrypt the secret value from our clipboard and then insert that secret value
   in the appropriate Ansible variables file for the appropriate environment.
   alias prod-encrypt="pbpaste | ansible-vault encrypt_string --vault-password-file=~/ansible-vault/production.key"
   This wasn’t the worst
   setup, but didn’t scale well as we grew. As we created more applications and
   hired more engineers, this workflow became a bit much for our small SRE team
   to manage and introduced some key-person risk, also known as the Bus Factor.
   We needed a workflow with less of a bottleneck, but allowing every developer
   access to all the secrets across the organization was not an acceptable
   answer. We needed a solution that not only maintained our security posture
   throughout the software development lifecycle, but also enforced our opinions
   about how secrets should be managed across environments. Decisions,
   decisions… While researching our options, we happened upon a tool
   called sops. Maintained and open-sourced by Mozilla, sops is a command line
   utility written in Go that facilitates slick encryption and decryption
   workflows by using your terminal’s default editor. Sops encrypts and decrypts
   your secret values using your cloud provider’s Key Management Service (AWS
   KMS, GCP KMS, Azure Key Vault) and PGP as a backup in the event those
   services are not available. It leaves the variable name in plain-text while
   only encrypting the secret value itself and supports YAML, JSON, or binary
   format. We use the YAML format because of its readability and terseness. See
   a demo of how it works. We think this tool works well with the way we think
   about secrets management. Secrets are code. Code defines how your application
   behaves. Secrets also define how your application behaves. So if you can
   encrypt them safely, you can ship your secrets with your code and have a
   single change management workflow. Github pull request reviews do software
   change management right. YAML does human readable key/value storage right.
   AWS KMS does anchored encryption right. AWS Regions do resilience right. PGP
   does irreversible encryption better than anything else readily available and
   is broadly supported. In sops, we’ve found a tool that combines all of these
   things enabling a workflow that makes secrets management easier. Who’s
   allowed to do what? Sops is a great tool by itself, but operations security
   is hard. Key handling and authorization policy design is tricky to get right
   and sops doesn’t do it all for us. To help us with that, we took things a
   step further and wrote a wrapper around sops we call sopsorific. Sopsorific,
   also written in Go, makes a few assumptions about application environments.
   Most teams need to deploy to multiple environments: production, staging,
   feature branches, sales demos, etc. Sopsorific uses the term “ecosystem” to
   describe this concept, as well as collectively describe a suite of apps that
   make up a working Betterment system. Some ecosystems are ephemeral and some
   are durable, but there is only one true production ecosystem holding
   sensitive PII (Personally Identifiable Information) and that ecosystem must
   be held to a higher standard of access control than all others. To capture
   that idea, we introduced a concept we call “security zones” into sopsorific.
   There are only two security zones per GitHub repository — sensitive, and
   non-sensitive — even if there are multiple apps in a repository. In the case
   of mono-repos, if an app in that repository shouldn’t have its secrets
   visible to all engineers who work in that repository, then the app belongs in
   a different repository. With sopsorific, secrets for the non-sensitive zone
   can be made accessible to a broader subset of the app team than sensitive
   zone secrets, helping to eliminate some of the bottleneck issues we’ve experienced
   with our previous workflow. By default, sopsorific wants to be configured
   with a production (sensitive zone) secrets file and a default (non-sensitive
   zone) secrets file. The default file makes it easy to spin up new
   non-sensitive one-off ecosystems without having to redefine every secret in
   every ecosystem. It should “just work” unless there are secrets that have
   different values than already configured in the default file. In that case,
   we would just need to define the secrets that have different values in a
   separate secrets file like devin_test.yml below, where devin_test is the name of
   the ecosystem. Here’s an example of the basic directory structure:
   .sops.yaml
   app/
   |_ deployment_secrets/
      |_ sensitive/
         |_ production.yml
      |_ nonsensitive/
         |_ default.yml
         |_ devin_test.yml
   The security zone concept allows a more
   granular access control policy as we can federate decrypt permissions on a
   per application and per security zone basis by granting or revoking access to
   KMS keys with AWS Identity and Access Management (IAM) roles. Sopsorific
   bootstraps these KMS keys and IAM roles for a given application. It generates
   a secret-editor role that privileged humans can assume to manage the secrets
   and an application role for the application to assume at runtime to decrypt
   the secrets. Following the principle of least privilege, our engineering team
   leads are app owners of the specific applications they maintain. App owners
   have permissions to assume the secret-editor role for sensitive ecosystems of
   their specific application. Non app owners have the ability to assume the
   secret-editor role for non-sensitive ecosystems only. How it works now Now
   that we know who can do what, let’s talk about how they can do what they can
   do. Explaining how we use sopsorific is best done by exploring how our
   secrets management workflow plays out for each stage of the software
   development lifecycle. Development Engineers have permissions to assume the
   secret-editor role for the security zones they have access to. Secret-editor
   roles are named after their corresponding IAM role which includes the
   security zone and the name of the GitHub repository. For
   example, secret_editor_sensitive_coach where coach is the name of the
   repository. We use a little command line utility to assume the role and are
   dropped into a secret-editor session where they use sops to add or edit
   secrets with their editor in the same way they add or edit code in a feature
   branch.
   assuming a secret-editor role
   The sops command will open and decrypt
   the secrets in their editor and, if changed, encrypt them and save them back
   to the file’s original location. All of these steps, apart from the editing,
   are transparent to the engineer editing the secret. Any changes are then
   reviewed in a pull request along with the rest of the code. Editing a file is
   as simple as:
   sops deployment_secrets/sensitive/production.yml
   Testing We
   built a series of validations into sopsorific to further enforce our opinions
   about secrets management. Some of these are: Secrets are unguessable — Short
   strings like “password” are not really secrets, and this check enforces
   strings with at least 128 bits of entropy expressed in unpadded base64.
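   A check along those lines is small; here is a sketch of the idea (not
   sopsorific’s actual implementation):
   require "base64"
   require "securerandom"

   # 128 bits is 16 bytes, so the decoded secret must be at least 16 bytes long.
   def unguessable?(secret)
     Base64.urlsafe_decode64(secret).bytesize >= 16
   rescue ArgumentError
     false
   end

   unguessable?("password")                      # => false
   unguessable?(SecureRandom.urlsafe_base64(16)) # => true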
   Each ecosystem defines a comprehensive set of secrets — The 12-factor app
   methodology reminds us that all environments should resemble production as
   closely as possible. When a secret is added to production, we have a check
   that makes sure that same secret is also added to all other ecosystems so
   that they continue to function properly. All crypto keys match — There are
   checks to ensure the multi-region KMS key ARNs and backup PGP key fingerprint
   in the sops config file matches the intended security zones. These
   validations are run as a step in our Continuous Integration suite. Running
   these checks is a completely offline operation and doesn’t require access to
   the KMS keys making it trivially secure. Developers can also run these
   validations locally:
   sopsorific check
   Deployment The application server is
   configured with the instance profile generated by sopsorific so that it can
   assume the IAM role that it needs to decrypt the secrets at runtime. Then, we
   configure our init system, upstart, to execute the process wrapped in the
   sopsorific run command. sopsorific run is another custom command we built to
   make our usage of sops seamless. When the app starts up, the decrypted
   secrets will be available as environment variables only to the process
   running the application instead of being available system wide. This makes
   our secrets less likely to unintentionally leak and our security team a
   little happier. Here’s a simplified version of our upstart configuration.
   start on starting web-app
   stop on stopping web-app
   respawn
   exec su -s /bin/bash -l -c '\
     cd /var/www/web-app; \
     exec "$0" "$@"' web-app-owner -- sopsorific run 'bundle exec puma -C config/puma.rb' >> /var/log/upstart.log 2>&1
   Operations The 12-factor app methodology reminds us that sometimes
   developers need to be able to run one-off admin tasks by starting up a
   console on a live running server. This can be accomplished by establishing a
   secure session on the server and running what you would normally run to get a
   console with the sopsorific run command. For our Ruby on Rails apps, that
   looks like this:
   sopsorific run 'bundle exec rails c'
   What did we learn?
   Throughout this journey, we learned many things along the way. One of these
   things was that having an opinionated tool to help us manage secrets helped to
   make sure we didn’t accidentally leave around low-entropy secrets from when
   we were developing or testing out a feature. Having a tool to protect
   ourselves from ourselves is vital to our workflow. Another thing we learned
   was that some vendors provide secrets with lower entropy than we’d like for
   API tokens or access keys and they don’t provide the option to choose
   stronger secrets. As a result, we had to build features into sopsorific to
   allow vendor provided secrets that didn’t meet the sopsorific standards by
   default to be accepted by sopsorific’s checks. In the process of adopting
   sops and building sopsorific, we discovered the welcoming community and
   thoughtful maintainers of sops. We had the pleasure of contributing a few
   changes to sops, and that left us feeling like we left the community a little
   bit better than we found it. In doing all of these things, we’ve reduced
   bottlenecks for developers so they can focus more on shipping features and
   less on managing secrets.
   11 min read


 * HOW WE DEVELOP DESIGN COMPONENTS IN RAILS
   
   How We Develop Design Components in Rails Learn how we use Rails components
   to keep our code D.R.Y. (Don’t Repeat Yourself) and to implement UX design
   changes effectively and uniformly.. A little over a year ago, we rebranded
   our entire site . And we've even written on why we did it. We were able to
   achieve a polished and consistent visual identity under a tight deadline
   which was pretty great, but when we had our project retrospective, we
   realized there was a pain point that still loomed over us. We still lacked a
   good way to share markup across all our apps. We repeated multiple styles and
   page elements throughout the app to make the experience consistent, but we
   didn’t have a great way to reuse the common elements. We used Rails
   partials in an effort to keep the code DRY (Don’t Repeat Yourself) while
   sharing the same chunks of code and that got us pretty far, but it had its
   limitations. There were aspects of the page elements (our shared chunks) that
   needed to change based on their context or the page where they were being
   rendered. Since these contexts change, we found ourselves either altering the
   partials or copying and pasting their code into new views where additional
   context-specific code could be added. This resulted in app code (the
   content-specific code) becoming entangled with “system” (the base HTML) code.
   Aside from partials, there was corresponding styling, or CSS, that was being
   copied and sometimes changed when these shared partials were altered. This
   meant when the designs were changed, we needed to find all of the places this
   code was used to update it. Not only was this frustrating, but it was
   inefficient. To find a solution, we drew inspiration from the component
   approach used by modern design systems and JavaScript frameworks. A component
   is a reusable code building block. Pages are built from a collection of
   components that are shared across pages, but can be expanded upon or
   manipulated in the context of the page they’re on. To implement our component
   system, we created our internal gem, Style Closet. There are a few other
   advantages and problems this system solves too: We’re able to make global
   changes in a pretty painless way. If we need to change our brand colors,
   let’s say, we can just change the CSS in Style Closet instead of scraping our
   codebase and making sure we catch it everywhere. Reusable parts of code
   remove the burden from engineers for things like CSS and allow time to focus
   on and tackle other problems. Engineers and designers can be confident
   they’re using something that’s been tested and validated across browsers.
   We’re able to write tests specific to the component without worrying about
   the use-case or increasing testing time for our apps. Every component is on
   brand and consistent with every other app, feels polished, high quality and
   requires lower effort to implement. It allows room for future growth which
   will inevitably happen. The need for new elements in our views is not going
   to simply vanish because we rebranded, so this makes us more prepared for the
   future. How does it work? Below is an example of one of our components, the
   flash. A flash message/warning is something you may use throughout your app
   in different colors and with different text, but you want it to look
   consistent. In our view, or the page where we write our HTML, we would write
   the following to render what you see above: Here’s a breakdown of how that
   one line translates into what you see on the page. The component consists of
   3 parts: structure, behavior and appearance. The view (the structure): a
   familiar html.erb file that looks very similar to what would exist without a
   component but a little more flexible since it doesn’t have its content hard
   coded in. These views can also leverage Rails’ view yield functionality when
   needed. Here’s the view partial from Style Closet: You can see how the
   component.message is passed into the dedicated space/slot, keeping this code
   flexible for reuse. A Ruby class (the behavior aside from any JavaScript):
   the class holds the “props” the component allows to be passed in as well as
   any methods needed for the view, similar to a presenter model. The props are
   a fancier attr_accessor with the bonus of being able to assign defaults.
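   A pared-down sketch of what such a class can look like (a hypothetical prop
   macro and names, not Style Closet’s actual internals):
   # Base class providing a tiny "prop" macro: an attr_accessor with a default.
   class Component
     def self.prop(name, default: nil)
       define_method(name) { instance_variable_get("@#{name}") || default }
       define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
     end

     def initialize(**props, &block)
       props.each { |prop_name, value| public_send("#{prop_name}=", value) }
       @block = block
     end

     # The block becomes the component's content, yielded to the view partial.
     def message
       @block && @block.call
     end
   end

   class Flash < Component
     prop :style, default: "notice"
   end

   flash = Flash.new(style: "warning") { "Your settings have been saved." }
   flash.style   # => "warning"
   flash.message # => "Your settings have been saved."
   In the real gem each component also has its html.erb partial and its CSS,
   per the three parts described above.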
   Additionally, all components can take a block, which is typically the content
   for the component. This allows the view to be reusable. CSS (the appearance):
   In this example, we use it to set things like the color, alignment and the
   border. A note on behavior: Currently, if we need to add some JS behavior, we
   use unobtrusive JavaScript or UJS sprinkles. When we add new components or
   make changes, we update the gem (as well as the docs site associated with
   Style Closet) and simply release the new version. As we develop and
   experiment with new types of components, we test these bigger changes out in
   the real world by putting them behind a feature flag using our open source
   split testing framework, Test Track. What does the future hold? We’ve
   used UJS sprinkles in similar fashion to the rest of the Rails world over the
   years, but that has its limitations as we begin to design more complex
   behaviors and elements of our apps. Currently we’re focusing on building more
   intricate and interactive components using React. A bonus of Style Closet
   is how well it’s able to host these React components since they can simply be
   incorporated into a view by being wrapped in a Style Closet component. This
   allows us to continue composing a UI with self contained building blocks.
   We’re always iterating on our solutions, so if you’re interested in expanding
   on or solving these types of problems with us, check out our career page!
   Additional information Since we introduced our internal Rails component code, a
   fantastic open-source project emerged, Komponent, as well as a really great
   and in-depth blog post on component systems in Rails from Evil Martians.
   6 min read


 * ENGINEERING THE LAUNCH OF A NEW BRAND FOR BETTERMENT
   
   Engineering the Launch of a New Brand for Betterment In 2017, Betterment set
   out to launch a new brand to better define the voice and feel of our product.
   After months of planning across all teams at the company, it was time for our
   engineering team to implement new and responsive designs across all user
   experiences. The key to the success of this project was to keep the build
   simple, maintain a low risk of regressions, and ensure a clear path to remove
   the legacy brand code after launch. Our team learned a lot, but a few key
   takeaways come to mind. Relieving Launch Day Stress with Feature Flags
   Embarking on this rebrand project, we wanted to keep our designs under wraps
   until launch day. This would entail a lot of code changes, however, as an
   engineering team we believe deeply in carving up big endeavors into small
   pieces. We’re constantly shipping small, vertical slices of work hidden
   behind feature flags and we’ve even built our own open-source
   system, TestTrack, to help us do so. This project would be no exception. On
   day one, we created a feature flag and started shipping rebranded code to
   production. Our team could then use TestTrack’s browser plugin to preview and
   QA the new views along the way. When the day of the big reveal arrived, all
   that would be left to do was toggle the flag to unveil the code we’d shipped
   and tested weeks before. We then turned to the challenge of rebranding our
   entire user experience. Isolating New Code with ActionPack Variants
   ActionPack variants provide an elegant solution to rolling out significant
   front end changes. Typically, variants are prescribed to help render distinct
   views for different device types, but they are equally powerful when
   rendering distinct HTML/CSS for any significant redesign. We created a
   variant for our rebrand, which would be exposed based on the status of our
   new feature flag. Our variant also required a new CSS file, where all our new
   styles would live. Rails provides rich template resolver logic at every level
   of the view hierarchy, and we were able to easily hook into it by simply
   modifying the extensions of our new layout files. The rebranded version of
   our application’s core layout imported the new CSS file and just like that,
   we were in business. Implementing the Rebrand without a Spaghetti of “IF”
   Statements Our rebranded experience would become the default at launch time,
   so another challenge we faced was maintaining two worlds without creating
   unneeded complexity. The “rebrand” variant and correlating template file
   helped us avoid a tangled web of conditionals, and instead boiled down the
   overhead to a toggle in our ApplicationController. This created a clean
   separation between the old and new world and protected us against regressions
   between the two. Rebranding a feature involved adding new styles to the
   application_rebrand.css and implementing them in new rebrand view files.
   Anything that didn’t get a new, rebranded template stayed in the world of
   plain old production. This freedom from legacy stylesheets and markup were
   critical to building and clearly demonstrating the new brand and value
   proposition we wanted to demonstrate to the world. De-scoping with a
   Lightweight Reskin To rebrand hundreds of pages in time, we had to iron out
   the precise requirements of what it meant for our views to be “on brand”.
   Working with our product team, we determined that the minimum amount of
   change to consider a page rebranded was adoption of the new header, footer,
   colors, and fonts. These guidelines constituted our “opted out”
   experience — views that would receive this lightweight reskin immediately but
   not the full rebrand treatment. This light coat of paint was applied to our
   production layer, so any experience that couldn’t be fully redesigned within
   our timeline would still get a fresh header and the fonts and colors that
   reflected our new brand. As we neared the finish line, the rebranded world
   became our default and this opt-out world became a variant. A
   controller-level hook allowed us to easily distinguish which views were to
   display opt-out mode with a single line of code. We wrote a controller-level
   hook to update the variant and render the new layout files, reskinning
   the package. Using a separate CSS manifest with the core changes enumerated
   above, we felt free to dedicate resources to more thoroughly rebranding our
   high traffic experiences, deferring improvements to pages that received the
   initial reskin until after launch. As we’ve circled back to clean up these
   lower-traffic views and give them the full rebrand treatment, we’ve come
   closer to deleting the opt_out CSS manifest and deprecating our legacy
   stylesheets for good. Designing an Off Ramp Just as we are committed to
   rolling out large changes in small portions, we are careful to avoid huge
   changesets on the other side of a release. Fortunately, variants made
   removing legacy code quite straightforward. After flipping the feature flag
   and establishing “rebrand” as the permanent variant context, all that
   remained was to destroy the legacy files that were no longer being rendered
   and remove the variant name from the file extension of the new primary view
   template. Controllers utilizing the opt_out hook made their way onto a to-do
   list for this work without the stress of a deadline. The Other Side of the
   Launch As the big day arrived, we enjoyed a smooth rebrand launch thanks to
   the thoughtful implementation of our existing tools and techniques. We
   leveraged ActionPack variants built into Rails and feature flags from
   TestTrack in new ways, ensuring we didn’t need to make any architecture
   changes. The end result: a completely fresh set of views and a new brand
   we’re excited to share with the world at large.
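   For the curious, the variant toggle described above boils down to something
   like this sketch (the rebrand_enabled? helper is hypothetical; ours was
   backed by a TestTrack feature flag):
   class ApplicationController < ActionController::Base
     before_action :set_rebrand_variant

     private

     # When the flag is on, Rails' template resolver prefers variant templates
     # such as app/views/layouts/application.html+rebrand.erb.
     def set_rebrand_variant
       request.variant = :rebrand if rebrand_enabled?
     end
   end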
   5 min read


 * REFLECTING ON OUR ENGINEERING APPRENTICESHIP PROGRAM
   
   Reflecting on Our Engineering Apprenticeship Program Betterment piloted an
   Apprentice Program to add junior talent to our engineering organization in
   2017, and it couldn’t have been more successful or rewarding for all of us.
   One year later, we’ve asked them to reflect on their experiences. In Spring
   of 2017, Betterment’s Diversity & Inclusion Steering Committee partnered with
   our Engineering Team to bring on two developers with non-traditional
   backgrounds. We hired Jess Harrelson (Betterment for Advisors Team) and Fidel
   Severino (Retail Team) for a 90 day Apprentice Program. Following their
   apprenticeship, they joined us as full-time Junior Engineers. I’m Jess, a
   recruiter here at Betterment, and I had the immense pleasure of working
   closely with these two. It’s been an incredible journey, so I sat down with
   them to hear first hand about their experiences. Tell us a bit about your
   life before Betterment. Jess Harrelson: I was born and raised in Wyoming and
   spent a lot of time exploring the outdoors. I moved to Nashville to study
   songwriting and music business, and started a small label through which I
   released my band’s album. I moved to New York after getting an opportunity at
   Sony and worked for a year producing video content. Fidel Severino: I’m
   originally from the Dominican Republic and moved to the United States at age
   15. After graduation from Manhattan Center for Science and Mathematics High
   School, I completed a semester at Lehman College before unfortunate family
   circumstances required me to go back to the Dominican Republic. When I
   returned to the United States, I worked in the retail sector for a few years.
   While working, I would take any available time for courses on websites like
   Codecademy and Team Treehouse. Can we talk about why you decided to become an
   Engineer? Jess Harrelson: Coding became a hobby for me when I would make
   websites for my bands in Nashville, but after meeting up with more and more
   people in tech in the city, I knew it was something I wanted to do as a
   career. I found coding super similar from a composition and structure
   perspective, which allowed me to tap into the creative side of coding. I
   started applying to every bootcamp scholarship I could find and received a
   full scholarship to Flatiron School. I made the jump to start becoming an
   engineer. Fidel Severino: I have always been
   interested in technology. I was one of those kids who “broke” their toys in
   order to find out how they worked. I’ve always had a curious mind. My
   interactions with technology prior to learning about programming had always
   been as a consumer. I cherished the opportunity and the challenge that comes
   with building with code. The feeling of solving a bug you’ve been stuck on
   for a while is satisfaction at its best. Those bootcamps changed all of our
   lives! You learned how to be talented, dynamic engineers and we reap the
   benefit. Let’s talk about why you chose Betterment. Jess Harrelson: I first
   heard of Betterment by attending the Women Who Code — Algorithms meetup
   hosted at HQ. Paddy, who hosts the meetups, let us know that Betterment was
   launching an apprenticeship program and after the meetup I asked how I could
   get involved and applied for the program. I was also applying for another
   apprenticeship program, but throughout the transparent,
   straightforward interview process, the Betterment apprenticeship quickly
   became my first choice. Fidel Severino: The opportunity to join Betterment’s
   Apprenticeship program came via the Flatiron School. One of the main reasons
   I was ecstatic to join Betterment was how I felt throughout the recruiting
   process. At no point did I feel the pressure that’s normally associated with
   landing a job. Keep in mind, this was an opportunity unlike any other I had
   up to this point in my life, but once I got to talking with the interviewers,
   the conversation just flowed. The way the final interview was set up made me
   rave about it to pretty much everyone I knew. Here was a company that wasn’t
   solely focused on the traditional Computer Science education when hiring an
   apprentice/junior engineer. The interview was centered around how well you
   communicate, work with others, and problem solve. I had a blast pair
   programming with 3 engineers, who I’m glad to say are now my co-workers! We
   are so lucky to have you! What would you say has been the most rewarding part
   of your experience so far? Jess Harrelson: The direct mentorship during my
   apprenticeship and exposure to a large production codebase. Prior to
   Betterment, I only had experience with super small codebases that I built
   myself or with friends. Working with Betterment’s applications gave me a
   hands-on understanding of concepts that are hard to reproduce on a smaller,
   personal application level. Being surrounded by a bunch of smart, helpful
   people has also been super amazing and helped me grow as an engineer. Fidel
   Severino: Oh man! There’s so many things I would love to list here. However,
   you asked for the most rewarding, and I would have to say without a
   doubt — the mentorship. As someone with only self-taught and Bootcamp
   experience, I didn’t know how much I didn’t know. I had two exceptional
   mentors who went above and beyond and removed any blocks preventing me from
   accomplishing tasks. On a related note, the entire company has a
   collaborative culture that is contagious. You want to help others whenever
   you can; and it has been the case that I’ve received plenty of help from
   others who aren’t even directly on my team. What’s kept you here? Fidel
   Severino: The people. The collaborative environment. The culture of learning.
   The unlimited supply of iced coffee. Great office dogs. All of the above!
   Jess Harrelson: Seriously though, it was the combination of all that plus so
   many other things. Getting to work with talented, smart people who want to
   make a difference.
   6 min read


 * A JOURNEY TO TRULY SAFE HTML RENDERING
   
   A Journey to Truly Safe HTML Rendering We leverage Rubocop’s OutputSafety
   check to ensure we’re being diligent about safe HTML rendering, so when we
   found vulnerabilities, we fixed them. As developers of financial software on
   the web, one of our biggest responsibilities is to keep our applications
   secure. One area we need to be conscious of is how we render HTML. If we
   don’t escape content properly, we could open ourselves and our customers up
   to security risks. We take this seriously at Betterment, so we use tools like
   Rubocop, the Ruby static analysis tool, to keep us on the right track. When
   we found that Rubocop’s OutputSafety check had some holes, we plugged them.
   What does it mean to escape content? Escaping content simply means replacing
   special characters with entities so that HTML understands to print those
   characters rather than act upon their special meanings. For example,
   the < character is escaped using &lt;, the > character is escaped using &gt;,
   and the & character is escaped using &amp;. What could happen if we don’t
   escape content? We escape content primarily to avoid opening ourselves up to
   XSS (cross-site scripting) attacks. If we were to inject user-provided
   content onto a page without escaping it, we’d be vulnerable to executing
   malicious code in the user’s browser, allowing an attacker full control over
   a customer’s session. This resource is helpful to learn more about XSS. Rails
   makes escaping content easier Rails escapes content by default in some
   scenarios, including when tag helpers are used. In addition, Rails has a few
   methods that provide help in escaping content. safe_join escapes the content
   and returns a SafeBuffer (a String flagged as safe) containing it. On the
   other hand, some methods are just a means for us to mark content as already
   safe. For example, the <%== interpolation token renders content as is,
   and raw, html_safe, and safe_concat simply return a SafeBuffer containing the
   original content as is, which poses a security risk. If content is inside
   a SafeBuffer, Rails won’t try to escape it upon rendering. Some examples:
   html_safe:
   [1] pry(main)> include ActionView::Helpers::OutputSafetyHelper => Object
   [2] pry(main)> result = "hi".html_safe => "hi"
   [3] pry(main)> result.class => ActiveSupport::SafeBuffer
   raw:
   [1] pry(main)> result = raw("hi") => "hi"
   [2] pry(main)> result.class => ActiveSupport::SafeBuffer
   safe_concat:
   [1] pry(main)> include ActionView::Helpers::TextHelper => Object
   [2] pry(main)> buffer1 = "hi".html_safe => "hi"
   [3] pry(main)> result = buffer1.safe_concat("bye") => "hibye"
   [4] pry(main)> result.class => ActiveSupport::SafeBuffer
   safe_join:
   [1] pry(main)> include ActionView::Helpers::OutputSafetyHelper => Object
   [2] pry(main)> result = safe_join(["<p>hi</p>", "<p>bye</p>"]) => "&lt;p&gt;hi&lt;/p&gt;&lt;p&gt;bye&lt;/p&gt;"
   [3] pry(main)> result.class => ActiveSupport::SafeBuffer
   Rubocop:
   we’re safe! As demonstrated, Rails provides some methods that mark content as
   safe without escaping it for us. Rubocop, a popular Ruby static analysis
   tool, provides a cop (which is what Rubocop calls a “check”) to alert us when
   we’re using these methods: Rails/OutputSafety. At Betterment, we explicitly
   enable this cop in our Rubocop configurations so if a developer wants to mark
   content as safe, they will need to explicitly disable the cop. This forces
   extra thought and extra conversation in code review to ensure that the usage
   is in fact safe. … Almost We were thrilled about the introduction of this
   cop — we had actually written custom cops prior to its introduction to
   protect us against using the methods that don’t escape content. However, we
   realized there were some issues with the opinions the cop held about some of
   these methods. The first of these issues was that the cop allowed usage
   of raw and html_safe when the usages were wrapped in safe_join. The problem with
   this is that when raw or html_safe are used to mark content as already safe by
   putting it in a SafeBuffer as is, safe_join will not actually do anything
   additional to escape the content (see the sketch at the end of this section).
   This means that these usages of raw and html_safe should still be violations.
   The second of these issues was that the cop prevented usages of raw and
   html_safe, but did not prevent usages of safe_concat. safe_concat has the same
   functionality as raw and html_safe — it simply marks the content safe as is by returning it
   in a SafeBuffer. Therefore, the cop should hold the same opinions
   about safe_concat as it does about the other two methods. So, we fixed it
   Rather than continue to use our custom cops, we decided to give back to the
   community and fix the issues we had found with the Rails/OutputSafety cop. We
   began with this pull request to patch the first issue — change the behavior
   of the cop to recognize raw and html_safe as violations regardless of being
   wrapped in safe_join. We found the Rubocop community to be welcoming, making
   only minor suggestions before merging our contribution. We followed up
   shortly after with a pull request to patch the second issue — change the
   behavior of the cop to disallow usages of safe_concat. This contribution was
   merged as well. Contributing to Rubocop was such a nice experience that when
   we later found that we’d like to add a configuration option to an unrelated
   cop, we felt great about opening a pull request to do so, which was merged as
   well. And here we are! Our engineering team here at Betterment takes security
   seriously. We leverage tools like Rubocop and Brakeman, a static analysis
   tool specifically focused on security, to make our software safe by default
   against many of the most common security errors, even for code we haven’t
   written yet. We now rely on Rubocop’s Rails/OutputSafety cop (instead of our
   custom cop) to help ensure that our team is making good decisions about
   escaping HTML content. Along the way, we were able to contribute back to a
   great community.
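   Here is the small Ruby sketch referenced above, illustrating the first issue.
   It assumes a Rails console or any environment with action_view loaded, and
   user_comment is a hypothetical stand-in for user-provided text.

     require "action_view"
     include ActionView::Helpers::OutputSafetyHelper

     user_comment = "<script>alert('xss')</script>"

     # Dangerous: html_safe wraps the raw string in a SafeBuffer without escaping
     # it, and safe_join leaves already-"safe" content untouched, so the script
     # tag would reach the page intact. This is why such usages should still be
     # flagged by the cop.
     safe_join([user_comment.html_safe])
     # => "<script>alert('xss')</script>"  (flagged as html-safe)

     # Safer: pass the plain string and let safe_join escape it for us.
     safe_join([user_comment])
     # => "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"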
   5 min read


 * BUILDING BETTER SOFTWARE FASTER WITH SHARED PRINCIPLES
   
   Building Better Software Faster with Shared Principles Betterment’s playbook
   for extending the golden hour of startup innovation at scale. Betterment’s
   promise to customers rests on our ability to execute. To fulfill that
   promise, we need to deliver the best product and tools available and then
   improve them indefinitely, which, when you think about it, sounds incredibly
   ambitious or even foolhardy. For a problem space as large as ours, we can’t
   fulfill that promise with a single two pizza team. But a scaled engineering
   org presents other challenges that could just as easily put the goal out of
   reach. Centralizing architectural decision-making would kill ownership and
   autonomy, and ensure your best people leave or never join in the first place.
   On the other hand, shared-nothing teams can lead to information silos,
   wheel-reinventing, and integration nightmares when an initiative is too big
   for a squad to deliver alone. To meet those challenges, we believe it’s
   essential to share more than languages, libraries, and context-free best
   practices. We can collectively build and share a body of interrelated
   principles driven by insights that our industry as a whole hasn’t yet
   realized or is just beginning to understand. Those principles can form chains
   of reasoning that allow us to run fearlessly, in parallel, and arrive at
   coherent solutions better than the sum of their parts. I gave a talk about
   Betterment’s engineering principles at a Rails at Scale meetup earlier last
   year and promised to share them after our diligent legal team finished
   reviewing. (Legal helpfully reviewed these principles months ago, but then I
   had my first child, and, as you can imagine, priorities shifted.) Without any
   further ado, here are Betterment’s Engineering Principles. You can also watch
   my Rails at Scale talk to learn why we developed them and how we maintain
   them. Parting Thoughts on Our Principles Our principles aren’t permanent
   as-written. Our principles are a living document in an actual git repository
   that we’ll continue to add to and revise as we learn and grow. Our principles
   derive from and are matched to Betterment’s collective experience and
   context. We don’t expect these principles to appeal to everybody. But we do
   believe strongly that there’s more to agree about than our industry has been
   able to establish so far. Consider these principles, along with our current
   and future open source work, part of our contribution to that conversation.
   What are the principles that your team shares?
   3 min read


 * SUPPORTING FACE ID ON THE IPHONE X
   
   Supporting Face ID on the iPhone X We look at how Betterment's mobile
   engineering team developed Face ID for the latest phones, like iPhone X.
   Helping people do what’s best with their money requires providing them with
   responsible security measures to protect their private financial data. In
   Betterment’s mobile apps, this means including trustworthy but convenient
   local authentication options for resuming active login sessions. Three years
   ago, in 2014, we implemented Touch ID support as an alternative to using PIN
   entry in our iOS app. Today, on its first day, we’re thrilled to announce
   that the Betterment iOS app fully supports Apple’s new Face ID technology on
   the iPhone X. Trusting the Secure Enclave While we’re certainly proud of
   shipping this feature quickly, a lot of credit is due to Apple for how
   seriously the company takes device security and data privacy as a whole. The
   hardware feature of the Secure Enclave, included on iPhones since the 5S, makes
   for a readily trustworthy connection to the device and its operating system.
   From an application’s perspective, this relationship between a biometric
   scanner and the Secure Enclave is simplified to a boolean response. When
   requested through the Local Authentication framework, the biometry evaluation
   either succeeds or fails, separate from any given state of an application, via
   the “reply” completion closure of evaluatePolicy(_:localizedReason:reply:). This
   made testing from the iOS Simulator a viable option for gaining a reasonable
   degree of certainty that our application would behave as expected when
   running on a device, thus allowing us to prepare a build in advance of having
   a device to test on. LABiometryType Since we’ve been securely using Touch ID
   for years, adapting our existing implementation to include Face ID was a
   relatively minor change. Thanks primarily to the simple addition of
   the LABiometryType enum newly available in iOS 11, it’s easy for our
   application to determine which biometry feature, if any, is available on a
   given device. This is such a minor change, in fact, that we were able to
   reuse all of our same view controllers that we had built for Touch ID with
   only a handful of string values that are now determined at runtime. One
   challenge we have that most existing iOS apps share is the need to still
   support older iOS versions. For this reason, we chose to
   wrap LABiometryType behind our own BiometryType enum. This allows us to
   encapsulate both the need to use an iOS 11 compiler flag and the need to
   call canEvaluatePolicy(_:error:) on an instance of LAContext before accessing
   its biometryType property into a single calculated property: See the Gist.
   NSFaceIDUsageDescription The other difference with Face ID is the
   new NSFaceIDUsageDescription privacy string that should be included in the
   application’s Info.plist file. This is a departure from Touch ID which does
   not require a separate privacy permission, and which uses
   the localizedReason string parameter when showing its evaluation prompt.
   [Image: Touch ID evaluation prompt displaying the localized reason]
   While Face ID does not seem to make use of that localizedReason string during evaluation,
   without the privacy string the iPhone X will run the application’s Local
   Authentication feature in compatibility mode. This informs the user that the
   application should work with Face ID but may do so imperfectly.
   [Image: Face ID permissions prompt without (left) and with (right) an NSFaceIDUsageDescription string included in the Info.plist]
   This compatibility
   mode prompt is undesirable enough on its own, but it also clued us into the
   need to check for potential security concerns opened up by this
   forwards-compatibility-by-default from Apple. Thankfully, the changes to the
   Local Authentication framework were done in such a way that we determined
   there wasn’t a security risk, but it did leave a problematic user experience
   in reaching a potentially-inescapable screen when selecting “Don’t Allow” on
   the privacy permission prompt. Since we believe strongly in our users’ right
   to say “no”, resolving this design issue was the primary reason we
   prioritized shipping this update. Ship It If your mobile iOS app also
   displays sensitive information and uses Touch ID for biometry-based local
   authentication, join us in making the easy adaptation to delight your users
   with full support for Face ID on the iPhone X.
   4 min read


 * FROM 1 TO N: DISTRIBUTED DATA PROCESSING WITH AIRFLOW
   
   From 1 to N: Distributed Data Processing with Airflow Betterment has built a
   highly available data processing platform to power new product features and
   backend processing needs using Airflow. Betterment’s data platform is unique
   in that it not only supports offline needs such as analytics, but also powers
   our consumer-facing product. Features such as Time Weighted
   Returns and Betterment for Business balances rely on our data platform
   working throughout the day. Additionally, we have regulatory obligations to
   report complex data to third parties daily, making data engineering a mission
   critical part of what we do at Betterment. We originally ran our data
   platform on a single machine in 2015 when we ingested far less data with
   fewer consumer-facing requirements. However, recent customer and data growth
   coupled with new business requirements require us to now scale horizontally
   with high availability. Transitioning from Luigi to Airflow Our single-server
   approach used Luigi, a Python module created to orchestrate long-running
   batch jobs with dependencies. While we could achieve high availability with
   Luigi, it’s now 2017 and the data engineering landscape has shifted. We
   turned to Airflow because it has emerged as a full-featured workflow
   management framework better suited to orchestrate frequent tasks throughout
   the day. To migrate to Airflow, we’re deprecating our Luigi solution on two
   fronts: cross-database replication and task orchestration. We’re using
   Amazon’s Database Migration Service (DMS) to replace our Luigi-implemented
   replication solution and re-building all other Luigi workflows in Airflow.
   We’ll dive into each of these pieces below to explain how Airflow mediated
   this transition. Cross-Database Replication with DMS We used Luigi to extract
   and load source data from multiple internal databases into our Redshift data
   warehouse on an ongoing basis. We recently adopted Amazon’s DMS for
   continuous cross-database replication to Redshift, moving away from our
   internally-built solution. The only downside of DMS is that we are not aware
   of how recent source data is in Redshift. For example, a task computing all
   of a prior day’s activity executed at midnight would be inaccurate if
   Redshift were missing data from DMS at midnight due to lag. In Luigi, we knew
   when the data was pulled and only then would we trigger a task. However, in
   Airflow we reversed our thinking to embrace DMS, using Airflow’s sensor
   operators to wait for rows to be pushed from DMS before carrying on with
   dependent tasks. High Availability in Airflow While Airflow doesn’t claim to
   be highly available out of the box, we built an infrastructure to get as
   close as possible. We’re running Airflow’s database on Amazon’s Relational
   Database Service and using Amazon’s Elasticache for Redis queuing. Both of
   these solutions come with high availability and automatic failover as add-ons
   Amazon provides. Additionally, we always deploy multiple baseline Airflow
   workers in case one fails, in which case we use automated deploys to stand up
   any part of the Airflow cluster on new hardware. There is still one single
   point of failure left in our Airflow architecture though: the scheduler.
   While we may implement a hot-standby backup in the future, we simply accept
   it as a known risk and set our monitoring system to notify a team member of
   any deviations. Cost-Effective Scalability Since our processing needs
   fluctuate throughout the day, we were paying for computing power we didn’t
   actually need during non-peak times on a single machine, as shown in our
   Luigi server’s load. Distributed workers used with Amazon’s Auto Scaling
   Groups allow us to automatically add and remove workers based on outstanding
   tasks in our queues. Effectively, this means maintaining only a baseline
   level of workers throughout the day and scaling up during peaks when our
   workload increases. Airflow queues allow us to designate certain tasks to run
   on particular hardware (e.g. CPU optimized) to further reduce costs. We found
   just a few hardware type queues to be effective. For instance, tasks that
   saturate CPU are best run on a compute optimized worker with concurrency set
   to the number of cores. Non-CPU intensive tasks (e.g. polling a database) can
   run on higher concurrency per CPU core to save overall resources. Extending
   Airflow Code Airflow tasks that pass data to each other can run on different
   machines, presenting a new challenge versus running everything on a single
   machine. For example, one Airflow task may write a file and a subsequent task
   may need to email that file even though the dependent task ran on another machine. To
   implement this pattern, we use Amazon S3 as a persistent storage tier.
   Fortunately, Airflow already maintains a wide selection of hooks to work with
   remote sources such as S3. While S3 is great for production, it’s a little
   difficult to work with in development and testing where we prefer to use the
   local filesystem. We implemented a “local fallback” mixin for Airflow
   maintained hooks that uses the local filesystem for development and testing,
   deferring to the actual hook’s remote functionality only on production.
   Development & Deployment We mimic our production cluster as closely as
   possible for development & testing to identify any issues that may arise with
   multiple workers. This is why we adopted Docker to run a production-like
   Airflow cluster from the ground up on our development machines. We use
   containers to simulate multiple physical worker machines that connect to
   officially maintained local Redis and PostgreSQL containers. Development and
   testing also require us to stand up the Airflow database with predefined
   objects such as connections and pools for the code under test to function
   properly. To solve this programmatically, we adopted Alembic database
   migrations to manage these objects through code, allowing us to keep our
   development, testing, and production Airflow databases consistent. Graceful
   Worker Shutdown Upon each deploy, we use Ansible to launch new worker
   instances and terminate existing workers. But what happens when our workers
   are busy with other work during a deploy? We don’t want to terminate workers
   while they’re finishing something up and instead want them to terminate after
   the work is done (not accepting new work in the interim).
   Fortunately, Celery supports this shutdown behavior and will stop accepting
   new work after receiving an initial TERM signal, letting old work finish up.
   We use Upstart to define all Airflow services and simply wrap the TERM
   behavior in our worker’s post-stop script, sending the TERM signal first,
   waiting until we see the Celery process stopped, and then finally powering off the
   machine. Conclusion The path to building a highly available data processing
   service was not straightforward, requiring us to build a few specific but
   critical additions to Airflow. Investing the time to run Airflow as a cluster
   versus a single machine allows us to run work in a more elastic manner,
   saving costs and using optimized hardware for particular jobs. Implementing a
   local fallback for remote hooks made our code much more testable and easier
   to work with locally, while still allowing us to run with Airflow-maintained
   functionality in production. While migrating from Luigi to Airflow is not yet
   complete, Airflow has already offered us a solid foundation. We look forward
   to continuing to build upon Airflow and contributing back to the community.
   6 min read


 * A FUNCTIONAL APPROACH TO PENNY-PRECISE ALLOCATION
   
   A Functional Approach to Penny-Precise Allocation How we solved the problem
   of allocating a sum of money proportionally across multiple buckets by leaning
   on functional programming. An easy trap to fall into as an object-oriented
   developer is to get too caught up in the idea that everything has to be an
   object. I work in Ruby, for example, where the first thing you learn is
   that everything is an object. Some problems, however, are better solved by
   taking a functional approach. For instance, at Betterment, we faced the
   challenge of allocating a sum of money proportionally across multiple
   buckets. In this post, I’ll share how we solved the problem by leaning on
   functional programming to allocate money precisely across proportional
   buckets. The Problem Proportional allocation comes up often throughout our
   codebase, but it’s easiest to explain using a fictional example: Suppose your
   paychecks are $1000 each, and you always allocate them to your different
   savings accounts as follows: College savings fund: $310 Buy a car fund: $350
   Buy a house fund: $200 Safety net: $140 Now suppose you’re an awesome
   employee and received a bonus of $1234.56. You want to allocate your bonus
   proportionally in the same way you allocate your regular paychecks. How much
   money do you put in each account? You may be thinking, isn’t this a simple
   math problem? Let’s say it is. To get each amount, take the ratio of the
   contribution from your normal paycheck to the total of your normal paycheck,
   and multiply that by your bonus. So, your college savings fund would get:
   (310/1000)*1234.56 = 382.7136 We can do the same for your other three
   accounts, but you may have noticed a problem. We can’t split a penny into
   fractions, so we can’t give your college savings fund the exact proportional
   amount. More generally, how do we take an inflow of money and allocate it to
   weighted buckets in a fair, penny-precise way? The Mathematical Solution:
   Integer Allocation We chose to tackle the problem by working with integers
   instead of decimal numbers in order to avoid rounding. This is easy to do
   with money — we can just work in cents instead of dollars. Next, we settled
   on an algorithm which pays out buckets fairly, and guarantees that the total
   payments exactly sum to the desired payout. This algorithm is called
   the Largest Remainder Method: (1) multiply the inflow (the payout in the
   example above) by each weight (where the weights are the integer amounts of
   the buckets, i.e. the per-account contributions in our example above) and
   divide each of these products by the sum of the weights, finding the integer
   quotient and integer remainder; (2) find the number of pennies left over to
   allocate by taking the inflow minus the total of the integer quotients;
   (3) sort the remainders in descending order and allocate any leftover
   pennies to the buckets in that order. The idea here is that the quotients
   represent the amounts we should give each bucket aside from the leftover
   pennies. Then we figure out which bucket deserves the leftover pennies. Let’s
   walk through this process for our example: Remember that we’re working in
   cents, so our inflow is 123456 and we need to allocate it across bucket
   weights of [31000, 35000, 20000, 14000]. We find each integer quotient and
   remainder by multiplying the inflow by the weight and dividing by the total
   weight. We took advantage of the divmod method in Ruby to grab the integer
   quotient and remainder in one shot, like so:
     buckets.map do |bucket|
       (inflow * bucket).divmod(total_bucket_weight)
     end
   This gives us 123456*31000/100000, 123456*35000/100000, 123456*20000/100000,
   and 123456*14000/100000. The integer
   quotients with their respective remainders are [38271, 36000], [43209,
   60000], [24691, 20000], [17283, 84000]. Next, we find the leftover pennies by
   taking the inflow minus the total of the integer quotients, which is
   123456 — (38271 + 43209 + 24691 + 17283) = 2. Finally, we sort our buckets in
   descending remainder order (because the buckets with the highest remainders
   are most deserving of extra pennies) and allocate the leftover pennies we
   have in this order. It’s worth noting that in our case, we’re using Ruby’s
   sort_by method, which gives us a nondeterministic order in the case where
   remainders are equal. In this case, our fourth bucket and second bucket,
   respectively, are most deserving. Our final allocations are therefore [38271,
   43210, 24691, 17284]. This means that your college savings fund gets $382.71,
   your car fund gets $432.10, your house fund gets $246.91, and your safety net
   gets $172.84. The Code Solution: Make It Functional Given we have to manage
   penny allocations between a person’s goals often throughout our codebase, the
   last thing we’d want is to have to bake penny-pushing logic throughout our
   domain logic. Therefore, we decided to extract our allocation code into a
   module function. Then, we took it even further. Our allocation code doesn’t
   need to care that we’re looking to allocate money, just that we’re looking to
   allocate integers. What we ended up with was a black box ‘Allocator’ module,
   with a public module function to which you could pass 2 arguments: an inflow,
   and an array of weightings. Our Ruby code looks like this; a simplified
   sketch of the idea appears at the end of this section. The takeaway The
   biggest lesson to learn from this experience is that, as an engineer, you
   should not be afraid to take a functional approach when it makes sense. In
   this case, we were able to extract a solution to a complicated problem and
   keep our OO domain-specific logic clean.
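   For illustration, here is a simplified Ruby sketch of such an Allocator
   module function, following the largest remainder method walked through
   above. The module and method names are assumptions for the example, not the
   production code.

     # Simplified largest-remainder allocator sketch.
     module Allocator
       module_function

       # inflow: integer number of cents to allocate
       # weightings: array of integer weights, one per bucket
       # Returns integer allocations, one per bucket, summing exactly to inflow.
       def allocate(inflow, weightings)
         total_weight = weightings.sum

         quotients_and_remainders = weightings.map do |weight|
           (inflow * weight).divmod(total_weight)
         end

         allocations = quotients_and_remainders.map(&:first)
         leftover_pennies = inflow - allocations.sum

         # Hand the leftover pennies to the buckets with the largest remainders.
         quotients_and_remainders
           .each_index
           .sort_by { |i| -quotients_and_remainders[i].last }
           .first(leftover_pennies)
           .each { |i| allocations[i] += 1 }

         allocations
       end
     end

     Allocator.allocate(123_456, [31_000, 35_000, 20_000, 14_000])
     # => [38271, 43210, 24691, 17284]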
   5 min read


 * HOW WE BUILT TWO-FACTOR AUTHENTICATION FOR BETTERMENT ACCOUNTS
   
   How We Built Two-Factor Authentication for Betterment Accounts Betterment
   engineers implemented Two-Factor Authentication across all our apps,
   simplifying and strengthening our authentication code in the process. Big
   change is more stressful than small change for people and software systems
   alike. Dividing a big software project into small pieces is one of the most
   effective ways to reduce the risk of introducing bugs. As we incorporated
   Two-Factor Authentication (2FA) into our security codebase, we used a phased
   rollout strategy to validate portions of the picture before moving on.
   Throughout the project, we leaned heavily on our collaborative review
   processes to both strengthen and simplify our authentication patterns. Along
   the way, we realized that we could integrate our new code more easily if we
   reworked surrounding access patterns with 2FA in mind. In other words, the 2F
   itself was relatively easy. Getting the surrounding A right was much
   trickier.
   [Photo: Lead software engineer Chris LoPresto (right) helped lead the team in building Two-Factor Authentication and App Passwords for Betterment accounts.]
   What We Built Two-factor authentication is a security scheme in
   which users must provide two separate pieces of evidence to verify their
   identity prior to being granted access. We recently introduced two different
   forms of 2FA for Betterment apps: TOTP (Time-based One-Time Passwords) using
   an authenticator app like Google Authenticator or Authy, and SMS verification
   codes. While SMS is not as secure as an authenticator app, we decided the
   increased 2FA adoption it facilitated was worthwhile. Two authentication
   factors are better than one, and it is our hope that all customers consider
   taking advantage of TOTP. To Build or Not To Build When designing new
   software features, there is a set of tradeoffs between writing your own code
   and integrating someone else's. Even if you have an expert team of
   developers, it can be quicker and more cost-efficient to use a third-party
   service to set up something complex like an authentication service. We don't
   suffer from Not Invented Here Syndrome at Betterment, so we evaluated
   products like Authy and Duo at the start of this project. Both services offer
   a robust set of authentication features that provide tremendous value with
   minimal development effort. But as we envisioned integrating either service
   into our apps, we realized we had work to do on our end. Betterment has
   multiple applications for consumers, financial advisors, and 401(k)
   participants that were built at different times with different technologies.
   Unifying the authentication patterns in these apps was a necessary first step
   in our 2FA project and would involve far more time and thought than building
   the 2F handshake itself. This realization, coupled with the desire to build a
   tightly integrated user experience, led to our decision to build 2FA
   ourselves. Validating the Approach Once we decide to build something, we also
   need to learn what not to build. Typically the best way to do that is to
   build something disposable, throw it away, and start over atop freshly
   learned lessons. To estimate the level of effort involved in building 2FA
   user interactions, we built some rough prototypes. For our TOTP prototype we
   generated a secret key, formatted it as a TOTP provisioning URI, and ran that
   through a QR code gem. SMS required a third-party provider, Twilio, whose
   client gem made it almost too easy to text each other "status updates." In
   short order, we were confident in our ability to deliver 2FA functionality
   that would work well. The quick ramp-up time and successful outcome of such
   experiments are among the reasons we value working within the mature,
   developer-friendly Rails ecosystem. While our initial prototypes were naive
   and didn’t actually integrate with our auth systems, they formed the core of
   the two-factor approaches that ultimately landed in our production codebase.
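   As a rough illustration of the TOTP prototype described above, here is a
   hedged Ruby sketch using the rotp and rqrcode gems. The method names reflect
   recent gem versions and the issuer and email values are placeholders, so
   treat this as a sketch of the approach rather than the original prototype.

     require "rotp"
     require "rqrcode"

     # Generate a random Base32 secret for the user.
     secret = ROTP::Base32.random

     # Build a provisioning URI that authenticator apps understand.
     totp = ROTP::TOTP.new(secret, issuer: "ExampleApp")
     uri = totp.provisioning_uri("user@example.com")

     # Render the URI as a QR code the user can scan during setup.
     File.write("totp_qr.svg", RQRCode::QRCode.new(uri).as_svg)

     # Later, check a submitted 6-digit code (returns a timestamp on success, nil otherwise).
     totp.verify("123456")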
   Introducing Concepts Before Behaviors Before 2FA entered the picture, our
   authentication systems performed several tasks when a Betterment user
   attempted to log in: verify the provided email address matches an existing
   user account; hash the provided password with the user’s salt and verify that
   it matches the hashed password stored for the user account; verify the user
   account is not locked for security reasons (e.g., too many incorrect password
   attempts); and create persistent authorization context (e.g., browser cookie,
   mobile token) to allow the user in the door. Our authentication codebase
   handled all of these tasks in response to a single user action (the act of
   providing an email and password). As we began reworking this code to handle a
   potential second user action (the act of providing a login challenge code)
   the resultant branching logic became overly complex and difficult to
   understand. Many of our prior design assumptions no longer held, so we paused
   2FA development and spun our chairs around for an impromptu design meeting.
   With 2FA requirements in mind, we decided to redesign our existing password
   verification as the first of two potential authentication factors. We built,
   tested, and released this new code independently. Our test suite gave us
   confidence that our existing password and user state validations remained
   unchanged within the new notion of a “first authentication factor.” Taking
   this remodeling detour enabled us to deliver the concept of authentication
   factors separately from any new system behaviors that relied on them. When we
   resumed work on 2FA, the proposed “second authentication factor”
   functionality now fell neatly into place. As a result, we delivered the new
   2FA features far more safely and quickly than we could have if we attempted
   to do everything in one fell swoop. Adding App Passwords Betterment customers
   have the option of connecting their account to third-party services like
   TurboTax and Mint. In keeping with our design principle of authorization
   through impossibility, we created a dedicated API authentication strategy for
   this use case, separate from our user-focused web authentication strategy.
   Dedicated endpoints for these services provide read-only access to the bare
   minimum content (e.g., account balances, transaction information). This
   strict separation of concerns helps to keep our customers’ data safe and our
   code simple. However, in order to connect to third-party services, our
   customers still had to share their account password with these third parties.
   While these institutions may be trustworthy, it is best to eliminate shared
   trust wherever possible when designing secure systems. Because these services
   do not support 2FA, it was now time to build a more secure password scheme
   for third-party apps. We started by designing a simple process for customers
   to generate app passwords for each service they wish to connect. These app
   passwords are complex enough for safe usage yet employ an alphabet scheme
   easily transcribed by our customers during setup. We then rewrote our API
   authentication code to accept app passwords and to reject account passwords
   for users with 2FA enabled. Our customers can now provide (and revoke) unique
   read-only passwords for third party services they connect to Betterment.
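   As an illustration of the kind of scheme described above, here is a hedged
   Ruby sketch of generating an app password from a transcription-friendly
   alphabet. The alphabet, length, and method name are assumptions for the
   example, not the production implementation.

     require "securerandom"

     # Lowercase letters and digits 2-9, minus characters that are easy to
     # confuse when copying by hand (l and o).
     APP_PASSWORD_ALPHABET = (("a".."z").to_a + ("2".."9").to_a) - %w[l o]

     def generate_app_password(length: 20)
       Array.new(length) do
         APP_PASSWORD_ALPHABET[SecureRandom.random_number(APP_PASSWORD_ALPHABET.size)]
       end.join
     end

     generate_app_password # => e.g. "x7qverw93kp2mhc4tbzu"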
   Crucially, our app password scheme is compatible right out of the gate with
   the new 2FA features we just launched. Slicing Up Deliverables Building 2FA
   and app passwords involved a complex set of coordinated changes to sensitive
   security-related code. To minimize the level of risk in this ambitious
   project, we used the feature-toggling built into our open-source
   split-testing framework TestTrack. By hiding the new functionality behind a
   feature flag, we were able to launch and validate features over the course
   of months before publicly unveiling them to retail customers. Even
   experienced programmers sometimes resist the “extra” work necessary to devise
   a phased approach to a problem. Sometimes we struggle to disentangle pieces
   that are ready for a partial launch from pieces that aren’t. But the point
   cannot be overstated: Feature flags are our friends. At Betterment, we use
   them to orchestrate the partial rollout of big features. We validate new
   functionality before unveiling it to our user base at large. By facilitating
   a series of small, testable code changes, feature flags provide one of the
   most effective means of mitigating risks associated with shipping large
   features. At the beginning of the 2FA project, we created a feature flag for
   the engineers working on the project. As the project progressed, we flipped
   the flag on for Betterment employees followed by a set of external beta
   testers. By the time we announced 2FA in the release notes for our mobile
   apps, the “new” code had been battle tested for months. Help Us Iterate The
   final step of our 2FA project was to delete the aforementioned feature flag
   from our codebase. While that was a truly satisfying moment, we all know that
   our work is never done. If you’re interested in approaching our next set of
   tricky projects in a nimble, iterative fashion, go ahead and apply.
   8 min read


 * HOW WE ENGINEERED BETTERMENT’S TAX-COORDINATED PORTFOLIO™
   
   How We Engineered Betterment’s Tax-Coordinated Portfolio™ For our latest
   tax-efficiency feature, Tax Coordination, Betterment’s solver-based portfolio
   management system enabled us to manage and test our most complex algorithms.
   Tax efficiency is a key consideration of Betterment’s portfolio management
   philosophy. With our new Tax Coordination feature, we’re continuing the
   mission to help our customers’ portfolios become as tax efficient as
   possible. While new products can often be achieved using our existing
   engineering abstractions, our Tax-Coordinated Portfolio (TCP) brought the engineering team a new level of
   complexity that required us to rethink how parts of our portfolio management
   system were built. Here’s how we did it. A Primer on Tax Coordination
   Betterment’s TCP feature is our very own, fully automated version of an
   investment strategy known as asset location. If you’re not familiar with
   asset location, it is a strategy designed to optimize after-tax returns by
   placing tax-inefficient securities into more tax-advantaged accounts, such as
   401(k)s and Individual Retirement Accounts (IRAs). Before we built TCP,
   Betterment customers had each account managed as a separate, standalone
   portfolio. For example, customers could set up a Roth IRA with a portfolio of
   90% stocks and 10% bonds to save for retirement. Separately, they could set
   up a taxable retirement account invested likewise in 90% stocks and 10%
   bonds. Now, Betterment customers can turn on TCP in their accounts, and their
   holdings in multiple investment accounts will be managed as a single
   portfolio allocation, but rearranged in such a way that the holdings across
   those accounts seek to maximize the overall portfolio’s after-tax returns. To
   illustrate, let’s suppose you’re a Betterment customer with three different
   accounts: a Roth IRA, a traditional IRA, and a taxable retirement account.
   Let’s say that each account holds $50,000, for a total of $150,000 in
   investments. Now assume that the $50,000 in each account is invested into a
   portfolio of 70% stocks and 30% bonds. For reference, consider the diagram.
   The circles represent various asset classes, and the bar shows the allocation
   for all the accounts, if added together. Each account has a 70/30 allocation,
   and the accounts will add up to 70/30 in the aggregate, but we can do better
   when it comes to maximizing after-tax returns. We can maintain the aggregate
   70/30 asset allocation, but use the available balances of $50,000 each, to
   rearrange the securities in such a way that places the most tax-efficient
   holdings into a taxable account, and the most tax-inefficient ones into IRAs.
   Here’s a simple animation solely for illustrative purposes:
   [Animation: Asset Location in Action]
   The result is the same 70/30 allocation overall, except TCP has now
   redistributed the assets unevenly, to reduce future taxes. How We Modeled the
   Problem The fundamental questions the engineering team tried to answer were:
   How do we get our customers to this optimal state, and how do we maintain it
   in the presence of daily account activity? We could have attempted to
   construct a procedural-style heuristic solution to this, but the complexity
   of the problem led us to believe this approach would be hard to implement and
   challenging to maintain. Instead, we opted to model our problem as a linear
   program. This made the problem provably solvable and quick to compute—on the
   order of milliseconds per customer. Let’s consider a hypothetical customer
   account example. Meet Joe Joe is a hypothetical Betterment customer. When he
   signed up for Betterment, he opened a Roth IRA account. As an avid saver, Joe
   quickly reached his annual Roth IRA contribution limit of $5,500. Wanting to
   save more for his retirement, he decided to open up a Betterment taxable
   account, which he funded with an additional $11,000. Note that the
   contribution limits mentioned in this example are as of the time this article
   was published. Limits are subject to change from year to year, so please
   defer to IRS guidelines for current IRA and 401(k) contribution
   limits. Joe isn’t one to take huge risks, so he opted for a moderate asset
   allocation of 50% stocks and 50% bonds in both his Roth IRA and taxable
   accounts. To make things simple, let’s assume that both portfolios are only
   invested in two asset classes: U.S. total market stocks and emerging markets
   bonds. In his taxable account, Joe holds $5,500 worth of U.S. total market
   stocks in VTI (Vanguard Total Stock Market ETF), and $5,500 worth of emerging
   markets bonds in VWOB (Vanguard Emerging Markets Bond ETF). Let’s say that
   his Roth IRA holds $2,750 of VTI, and $2,750 of VWOB. Below is a table
   summarizing Joe’s holdings:
   Account Type        VTI (U.S. Total Market)   VWOB (Emerging Markets Bonds)   Account Total
   Taxable             $5,500                    $5,500                          $11,000
   Roth                $2,750                    $2,750                          $5,500
   Asset Class Total   $8,250                    $8,250                          $16,500
   To begin to
   construct our model for an optimal asset location strategy, we need to
   consider the relative value of each fund in both accounts. A number of
   factors are used to determine this, but most importantly each fund’s tax
   efficiency and expected returns. Let’s assume we already know that VTI has a
   higher expected value in Joe’s taxable account, and that VWOB has a higher
   expected value in his Roth IRA. To be more concrete about this, let’s define
   some variables.   Each variable represents the expected value of holding a
   particular fund in a particular account. For example, we’re representing the
   expected value of holding VTI in your Taxable as which we’ve defined to be
   0.07. More generally, Let’s let be the expected value of holding fund F in
   account A. Circling back to the original problem, we want to rearrange the
   holdings in Joe’s accounts in a way that’s maximally valuable in the future.
   Linear programs try to optimize the value of an objective function. In this
   example, we want to maximize the expected value of the holdings in Joe’s
   accounts. The overall value of Joe’s holdings are a function of the specific
   funds in which he has investments. Let’s define that objective function.  
   You’ll notice the familiar terms—measuring the expected value of holding each
   fund in each account, but also you’ll notice variables of the form Precisely,
   this variable represents the balance of fund F in account A. These are our
   decision variables—variables that we’re trying to solve for. Let’s plug in
   some balances to see what the expected value of V is with Joe’s current
   holdings: V=0.07*5500+0.04*5500+0.06*2750+0.05*2750=907.5   Certainly, we can
   do better. We cannot just assign arbitrarily large values to the decision
   variables due to two restrictions which cannot be violated: Joe must maintain
   $11,000 in his taxable account and $5,500 in his Roth IRA. We cannot assign
   Joe more money than he already has, nor can we move money between his Roth
   IRA and taxable accounts. Joe’s overall portfolio must also maintain its
   allocation of 50% stocks and 50% bonds—the risk profile he selected. We don’t
   want to invest all of his money into a single fund. Mathematically, it’s
   straightforward to represent the first restriction as two linear constraints.
   Simply put, we’ve asserted that the sum of the balances of every fund in
   Joe’s taxable account must remain at $11,000. Similarly, the sum of the
   balances of every fund in his Roth IRA must remain at $5,500. The second
   restriction—maintaining the portfolio allocation of 50% stocks and 50%
   bonds—might seem straightforward, but there’s a catch. You might guess that
   you can express it as follows: The above statements assert that the sum of
   the balances of VTI across Joe’s accounts must be equal to half of his total
   balance. Similarly, we’re also asserting that the sum of the balances of VWOB
   across Joe’s accounts must be equal to the remaining half of his total
   balance. While this will certainly work for this particular example,
   enforcing that the portfolio allocation is exactly on target when determining
   optimality turns out to be too restrictive. In certain scenarios, it’s
   undesirable to buy or to sell a specific fund because of tax consequences.
   These restrictions require us to allow for some portfolio drift—some
   deviation from the target allocation. We made the decision to maximize the
   expected after-tax value of a customer’s holdings after having achieved the
   minimum possible drift. To accomplish this, we need to define new decision
   variables and add them to our objective function, penalizing any drift:
   DriftAbove(AC) is the dollar amount above the target balance in asset class AC.
   Similarly, DriftBelow(AC) is the dollar amount below the target balance in
   asset class AC. For instance, DriftAbove(Emerging Markets Bonds) is the dollar
   amount above the target balance in emerging markets bonds—the asset class to
   where VWOB belongs. We still want to maximize our objective function V.
   However, with the introduction of the drift terms, we want every dollar
   allocated toward a single fund to incur a penalty if it moves the target
   balance for that fund’s asset class below or above its target amount. To do
   this, we can relate the B(F, A) terms with the drift terms using linear constraints:
   B(VTI, Taxable) + B(VTI, Roth) + DriftBelow(US Total Market) - DriftAbove(US Total Market) = 0.5 * 16,500
   B(VWOB, Taxable) + B(VWOB, Roth) + DriftBelow(Emerging Markets Bonds) - DriftAbove(Emerging Markets Bonds) = 0.5 * 16,500
   As shown above, we’ve asserted that the sum of the balances in funds including
   U.S. total market stocks (in this case, only VTI), plus some net drift amount
   in that asset class, must be equal to the target balance of that asset class
   in the portfolio (which in this case, is 50% of Joe’s total holdings).
   Similarly, we’ve also done this for emerging markets bonds. This way, if we
   can’t achieve perfect allocation, we have a buffer that we can fill—albeit at
   a penalty. Now that we have our objective function and constraints set up, we
   just need to solve these equations. For this we can use a mathematical
   programming solver. Here’s the optimal solution: Managing Engineering
   Complexity Reaching the optimal balances would require our system to buy and
   sell securities in Joe’s investment accounts. It’s not always free for Joe to
   go from his current holdings to optimal ones because buying and selling
   securities can have tax consequences. For example, if our system sold
   something at a short-term capital gain in Joe’s taxable account, or bought a
   security in his Roth IRA that was sold at a loss in the last 30
   days—triggering the wash-sale rule, we would be negatively impacting his
   after-tax return. In the simple example above with two accounts and two
   funds, there are a total of four constraints. Our production model is orders
   of magnitude more complex, and considers each Betterment customer’s
   individual tax lots, which introduces hundreds of individual constraints to
   our model. Generating these constraints that ultimately determine buying and
   selling decisions can often involve tricky business logic that examines a
   variety of data in our system. In addition, we knew that as our work on TCP
   progressed, we were going to need to iterate on our mathematical model.
   Before diving head first into the code, we made it a priority to be cognizant
   of the engineering challenges we would face. As a result, we wanted to make sure that
   the software we built respected four key principles, which are: Isolation
   from third-party solver APIs. Ability to keep pace with changes to the
   mathematical model, e.g., adding, removing, and changing the constraints and
   the objective function must be quick and painless. Separation of concerns
   between how we accessed data in our system and the business logic defining
   algorithmic behavior. Easy and comprehensive testing. We built our own
   internal framework for modeling mathematical programs that was not tied to
   our trading system’s domain-specific business logic. This gave us the
   flexibility to switch easily between a variety of third-party mathematical
   programming solvers. Our business logic that generates the model knows only
   about objects defined by our framework, and not about third-party APIs. To
   incorporate a third-party solver into our system, we built a translation
   layer that received our system-generated constraints and objective function
   as inputs, and utilized those inputs to solve the model using a third-party
   API. Switching between third-party solvers simply meant switching
   implementations of a single solver interface. We wanted that same level of
   flexibility in changing our mathematical model. Changing the objective
   function and adding new constraints needed to be easy to do. We did this by
   providing well-defined interfaces that give engineers access to core system
   data needed to generate our model. This means that an engineer implementing a
   change to the model would only need to worry about implementing algorithmic
   behavior, and not about how to retrieve the data needed to do that. To add a
   new set of constraints, engineers simply provide an implementation of a
   TradingConstraintGenerator. Each TradingConstraintGenerator knows about all
   of the system related data it needs to generate constraints. Through
   dependency injection, the new generator is included among the set of
   generators used to generate constraints; a simplified sketch of this pattern
   appears at the end of this section. With hundreds of constraints
   and hundreds of thousands of unique tax profiles across our customer base, we
   needed to be confident that our system made the right decisions in the right
   situations. For us, that meant having clear, readable tests that were a joy
   to write. We wrote tests in Groovy that set up fixture data mimicking the
   exact situation in our “Meet Joe” example. We not only had unit
   tests such as these to test simple scenarios where a human could
   calculate the outcome, but we also ran the optimizer in a simulated
   production-like environment, through hundreds of thousands of scenarios that
   closely resembled real ones. During testing, we often ran into scenarios
   where our model had no feasible solution—usually due to a bug we had
   introduced. As soon as the bug was fixed, we wanted to ensure that we had
   automated tests to handle a similar issue in the future. However, with so
   many sources of input affecting the optimized result, writing tests to cover
   these cases was very labor-intensive. Instead, we automated the test setup by
   building tools that could snapshot our input data as of the time the error
   occurred. The input data was serialized and automatically fed back into our
test fixtures.
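As a rough illustration of that capture-and-replay idea (hypothetical names, not Betterment’s actual tooling), in Ruby:

   require "json"

   # Illustrative sketch only. When the optimizer fails, snapshot every input
   # that fed the model so the exact scenario can be replayed as a test fixture.
   class OptimizationSnapshot
     def self.capture(inputs, path)
       # inputs: a plain Hash of everything the model builder consumed
       File.write(path, JSON.pretty_generate(inputs))
     end

     def self.load_fixture(path)
       JSON.parse(File.read(path))
     end
   end

A failed production run then becomes a permanent regression test with almost no manual setup.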
Striving for Simplicity

At Betterment, we aim to build products that help our customers reach their financial goals. Building new
   products can often be done using our existing engineering abstractions.
   However, TCP brought a new level of complexity that required us to rethink
   the way parts of our trading system were built. Modeling and implementing our
   portfolio management algorithms using linear programming was not easy, but it
   ultimately resulted in the simplest possible system needed to reliably pursue
   optimal after-tax returns. To learn more about engineering at Betterment,
   visit the engineering page on the Betterment Resource Center. All return
   examples and return figures mentioned above are for illustrative purposes
   only. For much more on our TCP research, including additional considerations
   on the suitability of TCP to your circumstances, please see our white paper.
   See full disclosure for our estimates and Tax Coordination in general.
   13 min read


 * WHAT’S THE BEST AUTHORIZATION FRAMEWORK? NONE AT ALL
   
   What’s the Best Authorization Framework? None At All Betterment’s engineering
   team builds software more securely by forgoing complicated authorization
   frameworks. As a financial institution, we take authorization—deciding who is
   allowed to do what—extremely seriously. But you don't need an authorization
   framework to build an application with robust security and access control. In
   fact, the increased complexity and indirection that authorization frameworks
   require can actually make your software less secure. At Betterment, we follow
   key principles to avoid authorization frameworks altogether in many of our
   applications. Of course, it would be impractical to completely avoid
   authorization features in internal applications that support our team's
   diverse responsibilities. For these apps, Betterment reframed the problem and
   built a radically simpler authorization framework by following a few simple
ground rules.

The Downside of Frameworks

Application security is tough to get
   right. Some problems, like cryptography, are so thorny that even implementing
   a well-known algorithm yourself would be malpractice. Professional teams lean
   on proven libraries and frameworks to solve hard problems, such as NaCl for
   crypto and Devise for authentication. But authorization isn't like crypto or
   authentication. At Betterment, we've found that authorization rules emerge
   naturally from our business logic, and we believe that's where they belong.
   Most authorization frameworks blur the lines around this crucial piece of a
   business domain, leaving engineers to wonder whether and how to leverage the
   authorization framework versus treating a given condition as a regular
   business rule. Over time, these hundreds or thousands of successive decisions
   can result in a minefield of inconsistent, unauditable semantics, and
   ultimately the confusion can lead to bugs and vulnerabilities. Betterment has
   structured our entire platform around the security of our customers. By
   following the principles in this article, we've simplified the authorization
   problem, making decisions easy and accountable, and achieving even higher
confidence in our systems’ safety.

Authorization Without the Framework

Here are the principles that keep Betterment's most security-critical apps free of authorization frameworks:

Authorization Through Impossibility

The most
   fundamental authorization rule of an app like Betterment’s is that users
   should only be able to see their own financial profiles. That could be
   modeled in an authorization framework by specifying that users only have
   access to their own profiles, and then querying the framework before display.
   But there's a better way: Make it impossible. We simply don't have an
   endpoint to allow somebody to request another user's information. The most
secure code is the code that never got written.

Authorization Through Navigability

Most things that could be described as authorization rules
   emerge naturally from relationships. For instance, if I'm co-owner of a joint
   account opened by my spouse, then I am allowed to see that account. Rather
   than add another layer of indirection, we simply rely on our data model, and
   only expose data that can be reached through the app's natural relationships.
The app can’t even locate data that should be inaccessible.
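In Rails terms, this usually amounts to scoping every query through the authenticated user's associations; a minimal sketch, assuming hypothetical model names:

   # Illustrative sketch only; controller and model names are hypothetical.
   class AccountsController < ApplicationController
     def show
       # Scoped lookup: starting from current_user means anyone else's account
       # id simply raises ActiveRecord::RecordNotFound. Records outside the
       # user's natural relationships are unreachable by construction.
       @account = current_user.accounts.find(params[:id])
     end
   end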
Authorization Through Application Boundaries

Many arguments for heavyweight authorization
   arise from administrative access. What if a Customer Support representative
   needs the ability to help a customer make a change to an account? Shouldn’t
   there be a simple override available to her? No, at least not within the same
   app. Each application should have a single audience. For instance, our
   consumer-facing app lets customers view and manage their investments. Our
   internal Customer Support app allows our Customer Support representatives to
   look up the accounts of customers they’re assisting. Our Ops app gives our
   broker-dealer operations team the tools to monitor risk systems and manage
   transactions. This isn’t just a boon for security—it's better software. Each
   app is built for a specific team with exactly the tools and information it
needs.

But Sometimes You Need a Framework

The real world is complicated. At
   Betterment, where we’re approaching 200 employees across many disciplines, we
   know this well. Some tasks require a senior team member. Some trainees only
   need limited access to a system. As an engineering organization, you could
   build a new app for every single title and level within your company, but
   it'd be confusing for team members whose jobs are more similar than they are
   different, and mind-bogglingly expensive to maintain. How do you move forward
   without going whole-hog on heavyweight authorization? By setting a few ground
   rules for ourselves, we were able to design a lightweight, auditable, and
   intuitive approach to authorization that has scaled with our team and stayed
dead simple. Here were the rules we followed:

1. Privilege Levels Are Named After the People Who Use the Software

As Phil Karlton once said, naming
   things is one of the two hard things in computer science. Software is built
   by people, for people. To tend toward security over the long term, the names
   we use must be intuitive to both the engineers building the software and its
   users. For instance, our Customer Support app has the levels trainee, staff,
   and manager. As the organization grows and matures, internal jargon will
   change too. It's crucial to make sure these names remain meaningful, updating
them as needed.

2. Privilege Levels Are Linear

Once you've built a separate
   app for each audience, you don't need to support multiple orthogonal roles—a
   single ladder is enough. In our Customer Support app, staff can do a superset
   of what trainees can do, and managers can do a superset of what staff can do.
   In combination with the naming rule, this means that you can easily add
   levels above, below, or between the existing levels as your team grows
   without rethinking every privilege in the system. Eventually, you may find
   that a single ladder isn’t enough. This is a great opportunity to force the
   conversation about how roles within your team are diverging, and build
software to match.

3. REST Resources Are the Only Resources, and HTTP Verbs Are the Only Actions

At their core, all authorization systems determine
   whether a user has permission to perform an action on a resource. Much of the
   complexity in a traditional authorization system comes from defining those
   resources and their relationships to users, usually in terms of database
   entities. RESTful applications have the concepts of resources and actions in
   their DNA, so there's no need to reinvent that wheel. REST doesn't just give
   us the ability to define simple resources like accounts that correspond to
   database tables. We can build up semantic concepts like a search resource to
   enable basic user lookup, and a secure_search resource that allows senior
   team members to query by sensitive details like Social Security number. By
   treating HTTP verbs as our actions, we can easily allow a trainee to GET an
account but not PATCH it.

4. Authorization Framework Calls Are Simple, and Stay in the Controllers and Views

If the only way to initiate an action is through a REST endpoint, there's no need to add complexity to your business logic layer. The authorization framework has only two features: (1) answering whether a user can request a given resource with a given verb, which app developers use to customize views (e.g., show or hide a button), and (2) aborting the request if the answer is no. And that's all you need.
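A minimal sketch of what such a framework can boil down to, assuming the linear levels and REST resources described above (hypothetical names and grants, not Betterment's implementation):

   # Illustrative sketch only; levels and grants are hypothetical.
   LEVELS = %w[trainee staff manager].freeze

   GRANTS = {
     "accounts"      => { "GET" => "trainee", "PATCH" => "staff" },
     "search"        => { "GET" => "trainee" },
     "secure_search" => { "GET" => "manager" }
   }.freeze

   # Feature 1: answer whether a user may hit a resource with a verb
   # (used in views to show or hide a button).
   def permitted?(level, resource, verb)
     required = GRANTS.dig(resource, verb)
     !required.nil? && LEVELS.index(level) >= LEVELS.index(required)
   end

   # Feature 2: abort the request if the answer is no, e.g. in a before_action:
   #   head :forbidden unless permitted?(current_user.level, controller_name, request.method)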
Help Us Solve the Hard Problems

Security is a mindset, philosophy, and practice more
   than a set of tools or solutions, and many challenges lie ahead. If you’d
   like to help Betterment design, build, and spread radically simpler and more
   secure solutions to the hard problems our customers and team face, go ahead
   and apply.
   7 min read


 * THE EVOLUTION OF THE BETTERMENT ENGINEERING INTERVIEW
   
   The Evolution of the Betterment Engineering Interview Betterment’s
   engineering interview now includes a pair programming experience where
   candidates are tested on their collaboration and technical skills. Building
   and maintaining the world’s largest independent robo-advisor requires a
   world-class team of human engineers. This means we must continuously iterate
   on our recruiting process to remain competitive in attracting and hiring top
talent. As our team has grown from five to more than 50 engineers in just the last three years, we've been able to make clearer hiring decisions and have shortened our total hiring timeline.

Back in the Day

Here's how our interview process once looked:
  Resumé review
  Initial phone screen
  Technical phone screen
  Onsite, Day 1:
    Technical interview (computer science fundamentals)
    Technical interview (modelling and app design)
    Hiring manager interview
  Onsite, Day 2:
    Product and design interview
    Company founder interview
    Company executive interview

While this process helped in growing our engineering team, it began
   showing some cracks along the way. The main recurring issue was that hiring
   managers were left uncertain as to whether a candidate truly possessed the
   technical aptitude and skills to justify making them an employment offer.
   While we tried to construct computer science and data modelling problems that
   led to informative interviews, watching candidates solve these problems still
   wasn’t getting to the heart of whether they’d be successful engineers once at
   Betterment. In addition to problems arising from the types of questions
   asked, we saw that one of our primary interview tools, the whiteboard, was
   actually getting in the way; many candidates struggled to communicate their
   solutions using a whiteboard in an interview setting. The last straw for
   using whiteboards came from feedback provided by Betterment’s Women in
   Technology group. When I sat down with them to solicit feedback on our entire
   hiring process, they pointed to the whiteboard problem-solving dynamics (one
   to two engineers sitting, observing, and judging the candidate standing at a
   whiteboard) as unnatural and awkward. It was clear this part of the
   interviewing process needed to go. We decided to allow candidates the choice
   of using a whiteboard if they wished, but it would no longer be the default
   method for presenting one’s skills. If we did away with the whiteboard, then
   what would we use? The most obvious alternative was a computer, but then many
   of our engineers expressed concerns with this method, having had bad
   experiences with computer-based interviews in the past. After spirited
   internal discussions we landed on a simple principle: We should provide
   candidates the most natural setting possible to demonstrate their abilities.
   As such, our technical interviews switched from whiteboards to computers.
   Within the boundaries of that principle, we considered multiple interview
   formats, including take-home and online assessments, and several variations
   of pair programming interviews. In the end, we landed on our own flavor of a
pair programming interview.

Today: A Better Interview

Here's our revised interview process:
  Resumé review
  Initial phone screen
  Technical phone screen
  Onsite:
    Technical interview 1:
      Ask the candidate to describe a recent technical challenge in detail
      Set up the candidate's laptop
      Introduce the pair programming problem and explore the problem
      Pair programming (optional, time permitting)
    Technical interview 2: Pair programming
    Technical interview 3: Pair programming
    Ask-Me-Anything session
    Product and design interview
    Hiring manager interview
    Company executive interview

While an interview setting may
   not offer pair programming in its purest sense, our interviewers truly
   participate in the process of writing software with the candidates. Instead
   of simply instructing and watching candidates as they program, interviewers
   can now work with them on a real-world problem, and they take turns in
   control of the keyboard. This approach puts candidates at ease, and feels
   closer to typical pair programming than one might expect. As a result, in
   addition to learning how well a candidate can write code, we learn how well
   they collaborate. We also split the main programming portion of our original
   interview into separate sections with different interviewers. It’s nice to
give candidates a short break in between interviews, but the main reason for the separation is to evaluate the handoff: how well a candidate explains their design decisions and progress from one interviewer to the next.

Other Improvements

We also streamlined our question-asking process
   and hiring timeline, and added an opportunity for candidates to speak with
non-interviewers.

Questions

Interviews are now more prescriptive regarding
   non-technical questions. Instead of multiple interviewers asking a candidate
   about the same questions based on their resumé, we prescribe topics based on
   the most important core competencies of successful (Betterment) engineers.
   Each interviewer knows which competencies (e.g., software craftsmanship) to
   evaluate. Sample questions, not scripts, are provided, and interviewers are
   encouraged to tailor the competency questions to the candidates based on
their backgrounds.

Timeline

Another change is that the entire onsite
   interview is completed in a single day. This can make scheduling difficult,
   but in a city as competitive as New York is for engineering talent, we’ve
   found it valuable to get to the final offer stage as quickly as possible.
Discussion

Finally, we’ve added an Ask-Me-Anything (AMA) session—another idea
   provided by our Women in Technology group. While we encourage candidates to
   ask questions of everyone they meet, the AMA provides an opportunity to meet
   with a Betterment engineer who has zero input on whether or not to hire them.
   Those “interviewers” don’t fill out a scorecard, and our hiring managers are
forbidden from discussing candidates with them.

Ship It

Our first run of this
   new process took place in November 2015. Since then, the team has met several
   times to gather feedback and implement tweaks, but the broad strokes have
   remained unchanged. As of July 2016, all full-stack, mobile, and
   site-reliability engineering roles have adopted this new approach. We’re
   continually evaluating whether to adopt this process for other roles, as
   well. Our hiring managers now report that they have a much clearer
   understanding of what each candidate brings to the table. In addition, we’ve
   consistently received high marks from candidates and interviewers alike, who
   prefer our revamped approach. While we didn’t run a scientifically valid
   split-test for the new process versus the old (it would’ve taken years to
   reach statistical significance), our hiring metrics have improved across the
   board. We’re happy with the changes to our process, and we feel that it does
   a great job of fully and honestly evaluating a candidate’s abilities, which
   helps Betterment to continue growing its world-class team. For more
   information about working at Betterment, please visit our Careers page. More
   from Betterment: Server Javascript: A Single-Page App To…A Single-Page App
   Going to Work at Betterment Engineering at Betterment: Do You Have to Be a
   Financial Expert? Determination of largest independent robo-advisor reflects
   Betterment LLC’s distinction of having highest number of assets under
   management, based on Betterment’s review of assets self-reported in the SEC’s
   Form ADV, across Betterment’s survey of independent robo-advisor investing
   services as of March 15, 2016. As used here, “independent” means that a
   robo-advisor has no affiliation with the financial products it recommends to
   its clients.
   6 min read


 * SERVER JAVASCRIPT: A SINGLE-PAGE APP TO…A SINGLE-PAGE APP
   
   Server JavaScript: A Single-Page App To…A Single-Page App Betterment
   engineers recently migrated a single-page backbone app to a server-driven
   Rails experience. Betterment engineers (l-r): Arielle Sullivan, J.P.
   Patrizio, Harris Effron, and Paddy Estridge We recently changed the way we
   organize our major business objects. All the new features we’re working on
   for customers with multiple accounts—be they Individual Retirement Accounts
   (IRAs), taxable investment accounts, trusts, joint accounts, or even synced
   outside accounts—required this change. We were also required to rename
   several core concepts, and make some big changes to the way we display data
   to our customers. Currently, our Web application is a JavaScript single-page
app that uses a frontend MVC framework, backed by a JSON API. We use
   Marionette.js, a framework built on top of Backbone.js, to help us organize
   our JavaScript and manage page state. It was built out over the past few
   years, with many different paradigms and patterns. After some time, we found
   ourselves with an application that had a lot of complexity and splintered
   code practices throughout. The complexity partly arose from the fact that we
   needed to duplicate business logic from the backend and the frontend. By only
   using the server as a JSON API, the frontend needed to know exactly what to
   do with that JSON. It needed to be able to organize the different server
   endpoints (and its data) into models, as well as know how to take those
   models and render them into views. For example, a core concept such as “an
   account has some money in it” needed to be separately represented in the
   frontend codebase, as well as the server. This led to maintenance issues, and
   it made our application harder to test. The additional layer of frontend
   complexity made it even harder for new hires to be productive from day one.
   When we first saw this project on the horizon, we realized it would end up
requiring a substantial refactor of our web app. We had a few options: (1) rewrite the JavaScript in a way that makes it simpler and easier to use, or (2) don't rewrite the JavaScript at all. We went with option 2. Instead of using a client
   side MVC framework to help enable us to write a single page app, we opted to
   use our Rails server to render views, and we used server generated JavaScript
   responses to make the app feel just as snappy for our customers. We achieved
the same UX wins as a single page app with a fraction of the code.

Method to the Madness

The crux of our new pattern is this: We use Rails’ unobtrusive
   JavaScript (ujs) library to declare that forms and links should be submitted
   using AJAX. Our server then gets an AJAX rest request as usual, but instead
   of rendering the data as JSON, it responds to the request with a snippet of
JavaScript. That JavaScript gets evaluated by the browser. The “trick” here is that the JavaScript is simply a call to jQuery’s html method, and we use Rails’ built-in partial view rendering to respond with all the HTML we need. Now, the frontend just needs to blindly listen to the server and render the HTML as instructed.
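As a sketch of the pattern (hypothetical controller, view, and helper names, not Betterment's code), a server-generated JavaScript response looks roughly like this:

   # app/controllers/addresses_controller.rb (illustrative sketch)
   class AddressesController < ApplicationController
     def update
       @address = current_user.address
       @address.update(address_params) # strong-parameters helper omitted
       respond_to do |format|
         # For the AJAX request, Rails renders update.js.erb instead of JSON
         # or a full page.
         format.js
       end
     end
   end

   # app/views/addresses/update.js.erb (illustrative) -- the entire response:
   #   $("#address-form").html("<%= escape_javascript(render 'addresses/form') %>");

The browser evaluates that one line, swapping in HTML the server already knows how to render.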
An Example

As a simple example, let’s say we want to edit a user’s home address. Using the JavaScript single-page app framework, we
   would need a few things. First, we want an address model, which we map to our
“/addresses” endpoint. Next, we need a view that represents our form for
   editing the address. We need a frontend template for that view. Then, we need
   a route in our frontend for navigating to this page. And for our server, we
   need to add a route, a controller, a model, and a jbuilder to render that
model as JSON.

A Better Way

With our new paradigm, we can skip most of this.
   All we need is the server. We still have our route, controller, and model,
   but instead of a jbuilder for returning JSON, we can port our template to
   embedded Ruby, and let the server do all the work. Using UJS patterns, our
view can live completely on the server.
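Continuing the hypothetical address example, the server-side view is just an ERB template whose form the ujs driver submits over AJAX:

   <%# app/views/addresses/edit.html.erb (illustrative sketch) %>
   <%# `remote: true` is what tells the ujs driver to submit this form via AJAX. %>
   <%= form_for @address, remote: true do |f| %>
     <%= f.text_field :street %>
     <%= f.text_field :city %>
     <%= f.submit "Save" %>
   <% end %>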
There are a few major wins here:

Unifying our business logic. The server is responsible for knowing about (1)
   our data, (2) how to wrap that data into rich domain models that own our
   business logic, (3) how to render those models into views, and (4) how to
render those views on the page. The client needs to know almost nothing.

Less JavaScript. We aren’t getting rid of all the JavaScript in our application.
   Certain snappy user experience elements don’t work as well without
   JavaScript. Interactive elements, some delightful animations, and other
   frontend behaviors still need it. For these things, we are using HTML data
   elements to specify behaviors. For example, we can tag an element with a
   data-behavior-dropdown, and then we have some simple, well organized global
   JavaScript that knows how to wrap that element in some code that makes it
   more interactive. We are hoping that by using these patterns, we can limit
   our use of JavaScript to only know about how to enhance HTML, not how
   to automatically calculate net income when trying to distribute excess tax
   year contributions from an IRA (something that our frontend JavaScript used
to know how to do).

We can do this migration in small pieces. Even with this
   plan, migrating a highly complex web application isn’t easy. We decided to
   tackle it using a tab-by-tab approach. We’ve written a few useful helpers
   that allow us to easily plug in our new server-driven style into our existing
   Marionette application. By doing this piecemeal, we are hoping to bake in
   useful patterns early on, which we can iterate and use to make migrating the
   next part even simpler. If we do this right, we will be able to swap
   everything to a normal Rails app with minimal effort. Once we migrate to
   Rails 5, we should even be able to easily take advantage of Turbolinks 3,
   which is a conventionalized way to do regional AJAX updates. This new pattern
   will make building out newer and even more sophisticated features easier, so
   we can focus on encapsulating the business logic once. Onboarding new hires
   familiar with the Rails framework will be faster, and those who aren’t
   familiar can find great external (and internal) resources to learn it. We
   think that our Web app will be just as pleasant to use, and we can more
   quickly enhance and build new features going forward.
   6 min read


 * MODERN DATA ANALYSIS: DON’T TRUST YOUR SPREADSHEET
   
Modern Data Analysis: Don’t Trust Your Spreadsheet To conduct research in business, you need statistical computing that you can easily reproduce, scale, and make accessible to many stakeholders. Just as the Ford Motor Company
   created efficiency with assembly line production and Pixar opened up new
   worlds by computerizing animation, companies now are innovating and improving
   the craft of using data to do business. Betterment is one of them. We are
   built from the ground up on a foundation of data. It’s only been about three
   decades since companies started using any kind of computer-assisted data
   analysis. The introduction of the spreadsheet defined the beginning of the
   business analytics era, but the scale and complexity of today’s data has
   outgrown that origin. To avoid time-consuming manual processes, and the human
   error typical of that approach, analytics has become a programming
   discipline. Companies like Betterment are hiring data scientists and analysts
   who use software development techniques to reliably answer business questions
   which have quickly expanded in scale and complexity. To do good data work
   today, you need to use a system that is reproducible, versionable, scalable,
   and open. Our analytics and data science team at Betterment uses these data
   best practices to quickly produce reliable and sophisticated insights to
drive product and business decisions.

A Short History of Data in Business

First, a step back in the business time machine. With VisiCalc, the first-ever spreadsheet program, in 1979 and Excel in 1987, the business world entered a new era in which any employee could manage large amounts of
   data. The bottlenecks in business analytics had been the speed of human
   arithmetic or the hours available on corporate mainframes operated by only a
   few specialists. With spreadsheet software in every cubicle, analytical
   horsepower was commoditized and Excel jockeys were crowned as the arbiters of
   truth in business. But the era of the spreadsheet is over. The data is too
   large, the analyses are too complex, and mistakes are too dangerous to trust
   to our dear old friend the spreadsheet. Ask Carmen Reinhart and Kenneth
   Rogoff, two Harvard economists who published an influential paper on
   sovereign debt and economic growth, only to find out that the results rested
   in part on the accidental omission of five cells from an average. Or ask the
   execs at JPMorgan who lost $6 billion in the ‘London Whale’ trading debacle,
also due in part to poor data practices in Excel. More broadly, a 2015 survey
   of large businesses in the UK reported that 17% had experienced direct
   financial losses because of spreadsheet errors. It’s a new era with a new
   scale of data, and it’s time to define new norms around management of and
inferences from business data.

Requirements for Modern Data Analysis

To do good data work today, you need a system with properties that spreadsheets fundamentally lack. That system must be:

Reproducible. It’s not personal, but I don’t trust any number that comes
   without supporting code. That code should take me from the raw data to the
   conclusions. Most analyses contain too many important detailed steps to
   plausibly communicate in an email or during a meeting. Worse yet, it’s
   impossible to remember exactly what you’ve done in a point and click
   environment, so doing it the same way again next time is a crap shoot.
   Reproducible also means efficient. When an input or an assumption changes, it
should be as easy as re-running the whole thing.

Versionable. Code versioning
   frameworks, such as git, are now a staple in the workflow of most technical
   teams. Teams without versioning are constantly asking questions like, “Did
   Jim send the latest file?”, “Can I be sure that my teammate selected all
   columns when he re-sorted?”, or “The bottom line numbers are different in
   this report; what exactly changed since the first draft?” These
   inefficiencies in collaboration and uncertainties about the calculations can
   be deadly to a data team. Sharing code in a common environment also enables
   the reuse of modular analysis components. Instead of four analysts all
   inventing their own method for loading and cleaning a table of users, you can
   share as a group the utils/LoadUsers() function and ensure you are talking
about the same people at every meeting.

Scalable. There are hard technical
   limits to how large an analysis you can do in a spreadsheet. Excel 2013 is
   capped at just more than 1 million rows. It doesn’t take a very large
   business these days to collect more than 1 million observations of customer
   interactions or transactions. There are also feasibility limits. How long
   does it take your computer to open a million row spreadsheet?  How likely is
   it that you’ll spot a copy-paste error at row 403,658? Ideally, the same
   tools you build to understand your data when you’re at 10 employees should
scale and evolve through your IPO.

Open. Many analyses meet the above ideals
   but have been produced with expensive, proprietary statistical software that
   inhibits sharing and reproducibility. If I do an analysis with open-source
   tools like R or Python, I can post full end-to-end instructions that anyone
   in the world can reproduce, check, and expand upon. If I do the same in SAS,
   only people willing to spend $10,000 (or more if particular modules are
   required) can review or extend the project. Platforms that introduce
   compatibility problems between versions and save their data in proprietary
   formats may limit access to your own work even if you are paying for the
   privilege. This may seem less important inside a corporate bubble where
   everyone has access to the same proprietary platform, but it is at the very
   least a turnoff to most new talent in the field. I don’t hear anyone saying
that expensive proprietary data solutions are the future.

What to Use, and How

Short answer: R or Python. Longer answer: Here at Betterment, we use
   both. We use Python more for data pipeline processes and R more for modeling,
   analyses, and reporting. But this article is not about the relative merits of
   these popular modern solutions. It is about the merits of using one of them
   (or any of the smaller alternatives). To get the most out of a programmatic
   data analysis workflow, it should be truly end-to-end, or as close as you can
   get in your environment. If you are new to one or both of these environments,
   it can be daunting to sort through all of the tools and figure out what does
what. These are some of the most popular tools in each language, organized by their layer in your full-stack analysis workflow:

  Environment: R: RStudio; Python: iPython / Jupyter, PyCharm
  Sourcing data: R: RMySQL, rpostgresql, rvest, RCurl, httr; Python: MySQLdb, requests, bs4
  Cleaning, reshaping, and summarizing: R: data.table, dplyr; Python: pandas
  Analysis, model building, learning: R: see CRAN Task Views; Python: NumPy, SciPy, Statsmodels, Scikit-learn
  Visualization: R: ggplot2, ggvis, rCharts; Python: matplotlib, d3py, Bokeh
  Reporting: R: RMarkdown, knitr, shiny, rpubs; Python: IPython notebook

Sourcing Data

If there is any ambiguity in this
   step, the whole analysis stack can collapse on the foundation. It must be
   precise and clear where you got your data, and I don’t mean conversationally
   clear. Whether it’s a database query, a Web-scraping function, a MapReduce
   job, or a PDF extraction, script it and include it in your reproducible
   process. You’ll thank yourself when you need to update the input data, and
   your successors and colleagues will be thankful they know what you’re basing
your conclusions on.

Cleaning, Reshaping, Summarizing

Every dataset includes
   some amount of errant, corrupted, or outlying observations. A good analysis
   excludes them based on objective rules from the beginning and then tests for
   sensitivity to these exclusions later. Dropping observations is also one of
   the easiest ways for two people doing similar analyses to reach different
   conclusions. Putting this process in code keeps everyone accountable and
removes ambiguity about how the final analysis set was reached.

Analysis, Model Building, Learning

You’ll probably only present one or two of the
   scores of models and variants you build and test. Develop a process where
   your code organizes and saves these variants rather than discarding the ones
   that didn’t work. You never know when you’ll want to circle back. Try to
   organize analyses in a structure similar to how you present them so that the
connection from claims to details is easy to make.

Visualization, Reporting
   Careful, a trap is looming. So many times, the chain of reproducibility is
   broken right before the finish line when plots and statistical summaries are
   copied onto PowerPoint slides. Doing so introduces errors, breaks the link
   between claims and process, and generates huge amounts of work in the
   inevitable event of revisions. R and Python both have great tools to produce
   finished reports as static HTML or PDF documents, or even interactive
   reporting and visualization products. It might take some time to convince the
   rest of your organization to receive reports in these more modern formats.
   Moving your organization towards these ideals is likely to be an imperfect
   and gradual process. If you’re the first convert, absolutism is probably not
   the right approach. If you have influence in the hiring process, try to push
   for candidates who understand and respect these principles of data science.
   In the near term, look for smaller pieces of the analytical workflow which
   would benefit especially from the efficiencies of reproducible, programmatic
   analysis and reporting. Good candidates are reports that are updated
   frequently, require extensive collaboration, or are constantly hung up on
   discussions over details of implementation or interpretation. Changing
   workflows and acquiring new skills is always an investment, but the dividends
   here are better collaboration, efficient iteration, transparency in process
   and confidence in the claims and recommendations you make.  It’s worth it.
   9 min read


 * ENGINEERING AT BETTERMENT: DO YOU HAVE TO BE A FINANCIAL EXPERT?
   
   Engineering at Betterment: Do You Have to Be a Financial Expert? When I
   started my engineering internship at Betterment, I barely knew anything about
   finance. By the end of the summer, I was working on a tool to check for money
   launderers and fraudsters. Last summer, I built an avatar creator for K-12
   students. Now, a year later, I’m working on a tool to check for money
   launderers and fraudsters. How did I go from creating avatars with Pikachu
   ears to improving detection of financial criminals? Well, it was one part
   versatility of software engineering, one part courage to work in an industry
   I knew nothing about, and a dash of eagerness to learn as much as I could. I
   was on the verge of taking another internship in educational technology,
   commonly referred to as ‘edtech.’ But when I got the opportunity to work at
   Betterment, a rapidly growing company, I had to take it. Before my
   internship, finance, to me, was a field in which some of my peers would work
   more hours than I had hours of consciousness. Definitely not my cup of tea. I
   knew I didn’t want to work at a big bank, but I did want to learn more about
   the industry that employed 16.6% of my classmates at Yale. The name
   Betterment jumped out at me on a job listings page because it sounded like it
   would make my life ‘better.’ Betterment is a financial technology, or
   ‘fintech,’ company; while it provides financial services, it’s an engineering
   company at its core. Working here offered me the opportunity to learn about
   finance while still being immersed in tech startup culture. I was nervous to
   work in an industry I knew nothing about. But I soon realized it was just the
   opposite: Knowing less about finance motivated me to learn—quickly. When I
   started working at Betterment, I barely knew anything about finance. I
   couldn’t tell you what a dividend was. I didn’t know 401(k)s were
   employer-sponsored. My first task involved DTC participants, CUSIPs, and
   ACATS—all terms that I’d never heard before. (For the record, they stand for
   The Depository Trust Company, Committee on Uniform Security Identification
   Procedures, and Automated Customer Account Transfer Service, respectively.) A
   few days into my internship, I sat through a meeting about traditional and
   Roth IRAs wondering, what does IRA stand for? The unfortunate thing is that
   this is common for people my age. Personal finance is not something many
   college students think about—partially because it’s not taught in school and
   partially because we don’t have any money to worry about anyway. (Besides, no
   one wants to be an adult, right?) As a result, only 26% of 20-somethings have
   any money invested in stocks. At first, I thought my lack of exposure to
   finance put me at a disadvantage. I was nervous to work in an industry I knew
   nothing about. But I soon realized it was just the opposite: Knowing less
   about finance motivated me to learn—quickly. I started reading Robert
   Shiller’s Finance and the Good Society, a book my dad recommended to me
   months earlier. I searched every new term I came across and, when that wasn’t
   enough, asked my co-workers for help. Many of them took the time to draw
   diagrams and timelines to accompany their explanations. Soon enough, I had
   not only expanded my knowledge of engineering best practices, but I learned
   about dividends, tax loss harvesting, and IRAs (it stands for individual
   retirement account, in case you were wondering). The friendly atmosphere at
   Betterment and the helpfulness of the people here nurtured my nascent
   understanding of finance and turned me into someone who is passionate about
   investing. Before working at Betterment, I didn’t think finance was relevant
   to me. It took eight hours a day of working on a personal finance product for
   me to notice that the iceberg was even there. Now, I know that my money
   (well, the money I will hopefully have in the future) ideally should work
   hard for me instead of just sitting in a savings account. Luckily, I won’t
   have to struggle with building an investment portfolio or worry about
   unreasonable fees. I’ll just use Betterment.
   4 min read


 * WOMEN WHO CODE: AN ENGINEERING Q&A WITH VENMO
   
   Women Who Code: An Engineering Q&A with Venmo Betterment recently hosted a
   Women in Tech meetup with Venmo developer Cassidy Williams, who spoke about
   impostor syndrome. Growing up, I watched my dad work as an electrical
   engineer. Every time I went with him on Take Your Child to Work Day, it
   became more and more clear that I wanted to be an engineer, too. In 2012, I
   graduated from the University of Portland with a degree in computer science
   and promptly moved to the Bay Area. I got my first job at Intel, where I
   worked as a Scala developer. I stayed there for several years until last May,
   when I uprooted my life to New York for Betterment, and I haven’t looked back
   since. As an engineer, I not only love building products from the ground up,
   but I’m passionate about bringing awareness to diversity in tech, an
   important topic that has soared to the forefront of social justice issues.
   People nationwide have chimed in on the conversation. Most recently, Isis
   Wenger, a San Francisco-based platform engineer, sparked the
   #ILookLikeAnEngineer campaign, a Twitter initiative designed to combat gender
   inequality in tech. At Betterment, we’re working on our own set of
   initiatives to drive the conversation. We’ve started an internal roundtable
   to voice our concerns about gender inequality in the workplace, we’ve
   sponsored and hosted Women in Tech meetups, and we’re starting to collaborate
   with other companies to bring awareness to the issue. Cassidy Williams, a
   software engineer at mobile payments company Venmo, recently came in to
   speak. She gave a talk on impostor syndrome, a psychological phenomenon in
   which people are unable to internalize their accomplishments. The phenomenon,
   Williams said, is something that she has seen particularly among
   high-achieving women—where self-doubt becomes an obstacle for professional
   development. For example, they think they’re ‘frauds,’ or unqualified for
   their jobs, regardless of their achievements. Williams’ goal is to help women
   recognize the characteristic and empower them to overcome it. Williams has
   been included as one of Glamour Magazine's 35 Women Under 35 Who Are Changing
   the Tech Industry and listed in the Innotribe Power Women in FinTech Index.
As an engineer myself, I was excited to speak with her after the event about coding, women in tech, and fintech trends.

[Photo: Cassidy Williams, Venmo engineer, said impostor syndrome tends to be more common in high-achieving women. Photo credit: Christine Meintjes]

Abi: Can you speak about a time in
   your life where ‘impostor syndrome’ was limiting in your own career? How did
   you overcome that feeling? Cassidy: For a while at work, I was very nervous
   that I was the least knowledgeable person in the room, and that I was going
   to get fired because of it. I avoided commenting on projects and making
   suggestions because I thought that my insight would just be dumb, and not
   necessary. But at one point (fairly recently, honestly), it just clicked that
   I knew what I was doing. Someone asked for my help on something, and then I
   discussed something with him, and suddenly I just felt so much more secure in
   my job. Can you speak to some techniques that have personally proven
   effective for you in overcoming impostor syndrome? Asking questions,
   definitely. It does make you feel vulnerable, but it keeps you moving
   forward. It's better to ask a question and move forward with your problem
   than it is to struggle over an answer. As a fellow software engineer, I can
   personally attest to experiencing this phenomenon in tech, but I’ve also
   heard from friends and colleagues that it can be present in non-technical
   backgrounds, as well. What are some ways we can all work together to empower
each other in overcoming impostor syndrome? It's cliché, but just getting to
   know one another and sharing how you feel about certain situations at work is
   such a great way to empower yourself and empower others. It gets you both
   vulnerable, which helps you build a relationship that can lead to a stronger
   team overall. Whose Twitter feed do you religiously follow? InfoSec Taylor
   Swift. It's a joke feed, but they have some great tech and security points
   and articles shared there. In a few anecdotes throughout your talk, you
   mentioned the importance of having mentors and role models. Who are your
   biggest inspirations in the industry? Jennifer Arguello - I met Jennifer at
   the White House Tech Inclusion Summit back in 2013, where we hit it off
   talking about diversity in tech and her time with the Latino Startup
   Alliance. I made sure to keep in touch because I would be interning in the
   Bay Area, where she’s located, and we’ve been chatting ever since. Kelly Hoey
   - I met Kelly at a women in tech hackathon during my last summer as a student
   in 2013, and then she ended up being on my team on the British Airways
   UnGrounded Thinking hackathon. She and I both live in NYC now, and we see
   each other regularly at speaking engagements and chat over email about
   networking and inclusion. Rane Johnson - I met Rane at the Grace Hopper
   Celebration for Women in Computing in 2011, and then again when I interned at
   Microsoft in 2012. She and I started emailing and video chatting each other
   during my senior year of college, when I started working with her on the Big
   Dream Documentary and the International Women’s Hackathon at the USA Science
   and Engineering Festival. Ruthe Farmer - I first met Ruthe back in 2010
   during my senior year of high school when I won the Illinois NCWIT
   Aspirations Award. She and I have been talking with each other at events and
   conferences and meetups (and even just online) almost weekly since then about
   getting more girls into tech, working, and everything in between. One of the
   things we chatted about after the talk was how empowering it is to have the
   resources and movements of our generation to bring more diversity to the tech
   industry. The solutions that come out of that awareness are game-changing.
   What are some specific ways in which companies can contribute to these
movements and promote a healthier and more inclusive work culture?
  Work with nonprofits: Groups like NCWIT, the YWCA, the Anita Borg Institute, the Scientista Foundation, and several others are so great for community outreach and company morale.
  Educate everyone, not just women and minorities: When everyone is aware and discussing inclusion in the workplace, it builds and maintains a great company culture.
  Form small groups: People are more open to
   talking closely with smaller groups than a large discussion roundtable.
   Building those small, tight-knit groups promotes relationships that can help
   the company over time. It’s a really exciting time to be a software engineer,
   especially in fintech. What do you think are the biggest trends of our time
   in this space? Everyone's going mobile! What behavioral and market shifts can
   we expect to see from fintech in the next five to 10 years? I definitely
   think that even though cash is going nowhere fast, fewer and fewer people
   will ever need to make a trip to the bank again, and everything will be on
   our devices. What genre of music do you listen to when you’re coding? I
   switch between 80s music, Broadway show tunes, Christian music, and classical
   music. Depends on my feelings about the problem I'm working on. ;) IDE of
   choice? Vim! iOS or Android? Too tough to call.
   7 min read


 * HOW WE BUILT BETTERMENT'S RETIREMENT PLANNING TOOL IN R AND JAVASCRIPT
   
   How We Built Betterment's Retirement Planning Tool in R and JavaScript
   Engineering Betterment’s new retirement planning tool meant finding a way to
   translate financial simulations into a delightful Web experience. In this
   post, we’ll dive into some of the engineering that took place to build
   RetireGuide™ and our strategy for building an accurate, responsive, and
   easy-to-use advice tool that implements sophisticated financial calculations.
   The most significant engineering challenge in building RetireGuide was
   turning a complex, research-driven financial model into a personalized Web
   application. If we used a research-first approach to build RetireGuide, the
   result could have been a planning tool that was mathematically sound but hard
   for our customers to use. On the other hand, only thinking of user experience
   might have led to a beautiful design without quantitative substance. At
   Betterment, our end goal is to always combine both. Striking the right
   balance between these priorities and thoroughly executing both is paramount
   to RetireGuide’s success, and we didn’t want to miss the mark on either
dimension.

Engineering Background

RetireGuide started its journey as a set of
   functions written in the R programming language, which Betterment’s
   investment analytics team uses extensively for internal research. The team
   uses R to rapidly prototype financial simulations and visualize the results,
   taking advantage of R’s built-in statistical functions and broad set of
   pre-built packages. The investment analytics team combined their R functions
   using Shiny, a tool for building user interfaces in R, and released
   Betterment’s IRA calculator as a precursor to RetireGuide. The IRA calculator
   runs primarily in R, computing its advice on a Shiny server. This interactive
   tool was a great start, but it lives in isolation, away from the holistic
   Betterment experience. The calculator focuses on just one part of the broader
   set of retirement calculations, and doesn’t have the functionality to
   automatically import customers’ existing information. It also doesn’t assist
   users in acting on the results it gives. From an engineering standpoint, the
   end goal was to integrate much of the original IRA calculator’s code, plus
   additional calculations, into Betterment’s Web application to create
   RetireGuide as a consumer-facing tool. The result would let us offer a
   permanent home for our retirement advice that would be “always on” for our
   end customers. However, to complete this integration, we needed to migrate
   the entire advice tool from our R codebase into the Betterment Web
   application ecosystem. We considered two approaches: (1) Run the existing R
   code directly server-side, or (2) port our R code to JavaScript to integrate
   it into our Web application. Option 1: Continue Running R Directly Our first
   plan was to reuse the research code in R and let it continue to run
   server-side, building an API on top of the core functions. While this
   approach enabled us to reuse our existing R code, it also introduced lag and
   server performance concerns. Unlike our original IRA calculator, RetireGuide
   needed to follow the core product principles of the Betterment experience:
   efficiency, real-time feedback, and delight. Variable server response times
   do not provide an optimal user experience, especially when performing
   personalized financial projections. Customers looking to fine-tune their
   desired annual savings and retirement age in real time would have to wait for
   our server to respond to each scenario—those added seconds become noticeable
and can impair functionality. Furthermore, because of the CPU-intensive nature of our calculations, heavy bursts of simultaneous customers could
compromise a given server’s response time. While running R server-side was a win for code reuse, it was a loss for scalability and user experience. Those concerns, along with the new infrastructure overhead, motivated us to rethink our approach, prioritizing the user experience and minimizing engineering overhead.

Option 2: Port the R Code to JavaScript

Because our Web
   application already makes extensive use of JavaScript, another option was to
   implement our R financial models in JavaScript and run all calculations
   client-side, on the end user’s Web browser. Eliminating this potential server
   lag solved both our CPU-scaling and usability concerns. However,
   reimplementing our financial models in a very different language exposed a
   number of engineering concerns. It eliminated the potential for any code
   reuse and meant it would take us longer to implement. However, in keeping
   with the company mission to provide smarter investing, it was clear that
   re-engineering our code was essential to creating a better product. Our
   process was heavily test-driven, during which product engineering
   reimplemented many of the R tests in JavaScript, understood the R code’s
   intent, and ported the code while modifying for client-side performance wins.
   Throughout the process, we identified several discrepancies between
   JavaScript and R function outputs, so we regularly reconciled the
   differences. This process added extra validation, testing, and optimizations,
   helping us to create the most accurate advice in our end product. The cost of
   maintaining a separate codebase is well worth the benefits to our customers
and our code quality.

A Win for Customers and Engineering

Building
   RetireGuide—from R to JavaScript—helped reinforce the fact that no
   engineering principle is correct in all cases. While optimizing for code
   reuse is generally desirable, rewriting our financial models in JavaScript
   benefited the product in two noticeable ways: It increased testing and
   organizational understanding. Rewriting R to JavaScript enabled knowledge
   sharing and further code vetting across teams to ensure our calculations are
   100% accurate. It made an optimal user experience possible. Being able to run
   our financial models within our customers’ Web browsers ensures an instant
   user experience and eliminates any server lag or CPU-concerns.
   5 min read


 * MEET BLAZER: A NEW OPEN-SOURCE PROJECT FROM BETTERMENT (VIDEO)
   
Meet Blazer: A New Open-Source Project from Betterment (video) We created an open-source project called Blazer to work as an extension of the Backbone router. All
   teams at Betterment are responsible for teasing apart complex financial
   concepts and then presenting them in a coherent manner, enabling our
   customers to make informed financial decisions. One of the tools we use to
approach this challenge on the engineering team is a popular JavaScript
   framework called Backbone. While we love the simplicity and flexibility of
   Backbone, we’ve recently encountered situations where the Backbone router
   didn’t perfectly fit the needs of our increasingly sophisticated application.
   To meet these needs, we created Blazer, an extension of the Backbone router.
   In the spirit of open-source software, we are sharing Blazer with the
community. To learn more, we encourage you to watch the video below, featuring
   Betterment’s Sam Moore, a lead engineer, who reveals the new framework at a
   Meetup in Betterment’s NYC offices. Take a look at Blazer.
   https://www.youtube.com/embed/F32QhaHFn1k
   2 min read


 * DEALING WITH THE UNCERTAINTY OF LEGACY CODE
   
   Dealing With the Uncertainty of Legacy Code To complete our portfolio
   optimization, we had to tackle a lot of legacy code. And then we applied our
   learnings going forward. Last fall, Betterment optimized its portfolio,
   moving from the original platform to an upgraded trading platform that
   included more asset classes and the ability to weight exposure of each asset
   class differently for every level of risk. For Betterment engineers, it meant
   restructuring the underlying portfolio data model for increased flexibility.
   For our customers, it should result in better expected, risk-adjusted returns
   for investments. However, as our data model changed, pieces of the trading
   system also had to change to account for the new structure.  While most of
   this transition was smooth, there were a few cases where legacy code slowed
   our progress. To be sure, we don't take changing our system lightly. While we
   want to iterate rapidly, we never compromise the security of our customers
   nor the correctness of our code. For this reason, we have a robust testing
   infrastructure and only peer-reviewed, thoroughly-tested code gets pushed
through to production.

What is legacy code?

While there are plenty of
   metaphors and ways to define legacy code, it has this common feature: It’s
always tricky to work with. The biggest problem is that you're not always sure of the original purpose of older code. Either the code is poorly
   designed, the code has no tests around it to specify its behavior, or both.
   Uncertainty like this makes it hard to build new and awesome features into a
   product. Engineers' productivity and happiness decrease as even the smallest
   tasks can be frustrating and time-consuming.  Thus, it’s important for
   engineers to do two things well: (a) be able to remove existing legacy code
   and (b) not to write code that is likely to become legacy code in the future.
   Legacy code is a form of technical debt—the sooner it gets fixed, the less
time it will take to fix in the future.

How to remove legacy code

During our portfolio optimization, we had to come up with a framework for dealing with
   pieces of old code. Here’s what we considered: We made sure we knew its
   purpose.  If the code is not on any active or planned future development
   paths and has been working for years, it probably isn’t worth removing.  Legacy
   code can take a long time to properly test and remove. We made a good effort
   to understand it.  We talked to other developers who might be more familiar
   with it.  During the portfolio update project, we routinely brought a few
   engineers together to diagram trading system flow on a whiteboard. We wrote
   tests around the methods in question.  It's important to have tests in place
   before changing code to be as confident as possible that the behavior of the
   code is not changing during refactoring. Hopefully, it is possible to write
   unit tests for at least a part of the method's behavior.  Write unit tests
   for a piece of the method, then refactor that piece. Test, refactor, repeat. Once
   the tests are passing, write more tests for the next piece, and repeat the
   test, refactor, test, refactor process.  Fortunately, we were able to get rid
   of most of the legacy code encountered during the portfolio optimization
   project using this method. Then there are outliers Yet sometimes even the
   best practices still didn’t apply to a piece of legacy code. In fact,
   sometimes it was hard to even know where to start to make changes. In my
   experience, the best approach was to jump in and rewrite a small piece of
   code that was not tested, and then add tests for the rewritten portion
   appropriately. Write characterization tests We also experimented with
   characterization tests. First proposed by Michael Feathers (who wrote the
   bible on working with legacy code), these tests simply take a set of verified
   inputs/outputs from the existing production legacy code and then assert that
   the output of the new code is the same as the legacy code under the same
   inputs (a minimal sketch appears at the end of this post). Several times we
   ran into corner cases around old users, test users,
   and other anomalous data that caused false positive failures in our
   characterization tests.  These in turn led to lengthy investigations that
   consumed a lot of valuable development time. For this reason, if you do write
   characterization tests, we recommend not going too far with them. Handle a
   few basic cases and be done with them.  Get better unit or integration tests
   in place as soon as possible. Build extra time into project estimates Legacy
   code can also be tricky when it comes to project estimates.  It is
   notoriously hard to estimate the complexity of a task when it needs to be
   built into or on top of a legacy system. In our experience, it has always
   taken longer than expected.  The portfolio optimization project took longer
   than initially estimated.  Also, if database changes are part of the project
   (e.g. dropping a database column that no longer makes sense in the current
   code structure), it's safe to assume that there will be data issues that will
   consume a significant portion of developer time, especially with older data.
   Apply the learnings to the future The less legacy code we have, the less we
   have to deal with the aforementioned processes.  The best way to avoid legacy
   code is to make a best effort at not writing it in the first place.  For
   example, we follow a set of pragmatic design principles drawn
   from SOLID (the acronym Michael Feathers coined for Robert C. Martin’s design
   principles) to help ensure code quality.
    All code is peer reviewed and does not go to production if there is not
   adequate test coverage or if the code is not up to design standards.  Our
   unit tests are not only to test behavior and drive good design, but should
   also be readable to the extent that they help document the code itself.  When
   writing code, we try to keep in mind that we probably won't come back later
   and clean up the code, and that we never know who the next person to touch
   this code will be.  Betterment has also established a "debt day" where once
   every month or two, all developers take one day to pay down technical debt,
   including legacy code. The Results It's important to take a pragmatic
   approach to refactoring legacy code.  Taking the time to understand the code
   and write tests before refactoring will save you headaches in the future.
    Companies should strive for a fair balance between adding new features and
   refactoring legacy code, and should establish a culture where thoughtful code
   design is a priority.  By incorporating many of these practices, it is
   steadily becoming more and more fun to develop on the Betterment platform.
   And the Betterment engineering team is avoiding the dreaded productivity and
   happiness suck that happens when working on systems with too much legacy
   code. Interested in engineering at Betterment? Betterment is an
   engineering-driven company that has developed the most-trusted online
   financial advisor based on the principles of optimization and efficiency.
   Learn more about engineering jobs and our culture. Determination of most
   trusted online financial advisor reflects Betterment LLC's distinction of
   having the most customers in the industry, made in reliance on customer
   counts, self-reported pursuant to SEC rules, across all online-only
   registered investment advisors.
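   As mentioned above, here is a minimal characterization-test sketch, written
   in Python for brevity. It is illustrative only: the allocation functions and
   fixture values are hypothetical stand-ins rather than Betterment’s actual
   trading code; the point is just the shape of the test, which replays verified
   inputs and asserts that the new code matches the recorded legacy outputs.

   def legacy_allocate(deposit_cents, stock_pct):
       # Stand-in for the old, untested routine being replaced.
       stock = deposit_cents * stock_pct // 100
       return {"stock": stock, "bond": deposit_cents - stock}

   # Verified inputs/outputs captured from the legacy code. In practice these
   # would be recorded from production and saved to a fixture file rather than
   # generated in place like this.
   FIXTURES = [
       {"input": (100_00, 90), "expected": legacy_allocate(100_00, 90)},
       {"input": (250_37, 70), "expected": legacy_allocate(250_37, 70)},
   ]

   def new_allocate(deposit_cents, stock_pct):
       # The refactored implementation must reproduce the recorded behavior.
       stock = deposit_cents * stock_pct // 100
       return {"stock": stock, "bond": deposit_cents - stock}

   def test_new_code_matches_legacy_behavior():
       for case in FIXTURES:
           assert new_allocate(*case["input"]) == case["expected"]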
   7 min read


 * THIS IS HOW YOU BOOTSTRAP A DATA TEAM
   
   This Is How You Bootstrap a Data Team Data alone is not enough—we needed the
   right storytellers. Six months ago, I packed up my travel-sized toothbrush
   kit, my favorite coffee mug now filled with pens and business cards, and a
   duffel bag full of gym socks and free conference tee-shirts.  With my
   start-up survival kit in tow, it was time to move on from my job as a
   back-office engineer. From the left: Avi Lederman, data warehousing engineer;
   Yuriy Goldman, engineering lead; Jon Mauney, data analyst; Nick Petri, data
   analyst; and Andrew Weisgall, marketing analyst. I dragged my chair ten feet
   across the office and began my new life as the engineering lead of
   Betterment’s nascent data team—my new mates included two talented data
   analysts, a data warehousing engineer and a marketing analyst, also the
   product owner. I was thrilled. There was a lot for us to do. In our new
   roles, we are now informing and guiding many of the ongoing product and
   marketing efforts at Betterment. Thinking big, we decided to dub ourselves
   Team Polaris after the North Star. Creating a tighter feedback loop
   Even though our move to create an in-house data team was a natural part of
   our own engineering team evolution here at Betterment, it’s still something
   of a risky unknown for most companies. Business intelligence tooling has
   traditionally been something that comes at a great upfront cost to an
   organization (it can reach into the millions of dollars)—but as a startup, we
   instead looked carefully at how we could leverage our homegrown talent and
   resources to build a team to seamlessly integrate into the existing company
   architecture. Specifically, we wanted  a tight feedback loop between the
   business and technology so that we could experiment and figure out what
   worked before committing real dollars to a solution—aka high-frequency
   hypothesis testing. We needed a team responsible for collecting, curating and
   presenting the data—and our data had to be trustworthy for objective
   metric-level reporting to the organization. Our work consisted of
   collaborating with our marketing, analytics, and product teams to establish
   systems and practices that: Measure progress towards high level goals
   Optimize growth and conversion Support product and project strategy Improve
   customer outcome A guide to tactical decisions With these requirements in
   mind, here are some of the tactical decisions we made from the start to get
   our new data team off the ground. In the future, expect to read more from our
   team about how we use our data insights to drive product and growth
   development at Betterment. 1. Define our process For us the obvious first
   order of business was to deliver continuous, incremental value and gradual
   transition from legacy systems to new ones. Our initial task was to interview
   internal stakeholders to get at their data-related pain points.  We sent out
   questionnaires in advance but collected answers through face-to-face
   dialogue. A couple of hours of focused conversation defined a six-month
   tactical focus for the team. Then, with our meticulous notes compiled, it
   became clear to us that our major challenges lay with the accessibility to
   and reliability of key performance metrics. With the interviews in hand, the
   team sat down to pen a manifesto and define pillars by which we would measure
   our progress. We came up with ACES: Automated, Consistent, Efficient, and
   Self-serviced as the motifs by which we could create a measurable feedback
   loop. 2. Inform the roadmap Within three weeks of operations, it became clear
   that we could use turn-around time metrics from ad-hoc or advisory requests
   to inform us where we needed to invest in project cycles and technology. Yet,
   busy with data projects, we were feeling the pain ourselves.  We needed more
   easily accessible business measures with sufficient context by which we and
   our colleagues could roll up or slice and dice our data.  We knew that a star
   schema approach would help us clarify a data narrative and give all of us a
   consistent view of truth.  But there was no way for us to do it all at once.
   3. Limit disruption while we build To limit disruption to our colleagues
   while delivering incremental improvements, we implemented a clever and
   completely practical transition plan within MySQL’s native feature set.
    Specifically, we set up a new database server dedicated to reporting and
   ad-hoc workloads.  This dedicated MySQL instance consisted of three database
   schemas we now refer to as our Triumvirate Data Warehouse. The first member
   of this triad is betterment_live. This database is a complete, real-time,
   read-only replica of our production database.  It’s just native MySQL
   master-slave replication; easy to set up and maintain on dedicated hardware
   or in the cloud. The second member is client_analytics.  It is a read-write
   schema to which our colleagues have full privileges.  The usage pattern is
   for folks to connect to client_analytics and from there to: cross-query
   against the betterment_live schema, import/export and manipulate custom
   datasets with Python or R, perform regression and analysis, etc. (see the
   sketch at the end of this post).  Everybody
   wins.  Our data workers retain their ability to run existing processes until
   we can transition them to a “better” way while the engineering team has
   successfully expelled business users out of an already busy production
   environment. Last but certainly not least is our new baby, the data
   warehouse.  It is a read-only, star-schema representation of fact and
   dimensional tables for growth subject areas.  We’ve pushed the aforementioned
   nuisance and complexity into our data pipeline (ETL) process and are able to
   synthesize atomic and summary metrics in a format that is more intuitive for
   our business users. Legacy workloads that are complex and underperforming can
   now be transitioned over to the data warehouse schema incrementally.
    Further, because all three schemas live in the same MySQL server,
   client_analytics becomes a central hub from which our colleagues can join
   tables that have not yet been modeled in the warehouse with key dimensions
   that have been.  They get the best of both worlds while we look to what comes
   next  Finally, transition is prioritized in-stream with the needs of the
   organization and we never bite off more than we can chew. 4. Standardize and
   educate A major part of our data warehouse build out was in clarifying
   definitions of business terms and key metrics present in our daily parlance.
    Maintaining a Data Dictionary wiki became a part of our Definition of Done.
    Our dashboards, displayed on large screen TVs and visible by all, were the
   first to be relabeled and remodeled.  Reports available to the entire office
   were next.  Cleaning up the most looked at metrics helped the organization
   speak to and understand key data in a consistent manner. 5. Maintain a tight
   feedback loop The team follows an agile process familiar to modern technology
   organizations.  We Scrum, we Git, and we Jenkins.  We stay in regular contact
   with stakeholders throughout a build-out and iterate over MVPs. Now, back to
   the future These are just the first few bootstrapping steps.  In future posts
   I will be tempted to wax technical and provide more color on the choices
   we’ve made and why.  I will also share our vision for an Event Narrative Data
   Warehouse and how we are leveraging start-up friendly partners such as
   MixPanel for real-time event processing, funneling, and segmentation.
    Finally, we will share some tactics for enabling data scientists to be more
   collaborative and presentational with their R or Python visualizations. At
   Betterment, our ultimate goal is to continue developing products that change
   the investing world—and that starts with data. But data alone is not
   enough—we needed the right storytellers. As we see it, the members of Team
   Polaris are the bards of a data narrative that help the organization grow
   while delivering a top-tier product. Interested in engineering at
   Betterment? Betterment is an engineering-driven company that has developed
   the most trusted online financial advisor based on the principles of
   optimization and efficiency. Learn more about engineering jobs and our
   culture. Determination of most trusted online financial advisor reflects
   Betterment LLC's distinction of having the most customers in the industry,
   made in reliance on customer counts, self-reported pursuant to SEC rules,
   across all online-only registered investment advisors.  
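   As referenced in the client_analytics discussion above, here is a minimal
   sketch of the cross-schema usage pattern. It assumes the pymysql driver, and
   the host, credentials, and table/column names are illustrative placeholders,
   not our actual schema.

   import pymysql

   # Connect to the dedicated reporting server's read-write schema.
   conn = pymysql.connect(host="reporting-db.example.internal",
                          user="analyst", password="********",
                          database="client_analytics")
   try:
       with conn.cursor() as cur:
           # Join a scratch table in client_analytics against a table replicated
           # from production by qualifying it with the betterment_live schema.
           cur.execute("""
               SELECT c.cohort, COUNT(*) AS signups
               FROM client_analytics.marketing_cohorts AS c
               JOIN betterment_live.users AS u ON u.id = c.user_id
               GROUP BY c.cohort
           """)
           for cohort, signups in cur.fetchall():
               print(cohort, signups)
   finally:
       conn.close()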
   7 min read


 * ONE MASSIVE MONTE CARLO, ONE VERY EFFICIENT SOLUTION
   
   One Massive Monte Carlo, One Very Efficient Solution We optimized our
   portfolio management algorithms in six hours for less than $500. Here’s how
   we did it. Optimal portfolio management requires managing a portfolio in
   real-time, including taxes, rebalancing, risk, and circumstantial variables
   like cashflows. It’s our job to fine-tune these to help our clients, and it’s
   very important we have these decisions be robust to the widest possible array
   of potential futures they might face. We recently re-optimized our portfolio
   to include more complex asset allocations and risk models (and it will soon
   be available). Next up was optimizing our portfolio management algorithms,
   which manage cashflows, rebalances, and tax exposures. It’s as if we
   optimized the engine for a car, and now we needed to test it on the race
   track with different weather conditions, tires, and drivers. Normally, this
   is a process that can literally take years (and may explain why legacy
   investing services are slow to switch to algorithmic asset allocation and
   advice). But we did things a little differently, which saved us thousands of
   computing hours and hundreds of thousands of dollars. First, the Monte Carlo
   The testing framework we used to assess our algorithmic strategies needed to
   fulfill a number of criteria to ensure we were making robust and informed
   decisions. It needed to: Include many different potential futures Include
   many different cash-flow patterns Respect path dependence (taxes you pay this
   year can’t be invested next year) Accurately test how the algorithm would
   perform if run live. To test our algorithms-as-strategies, we simulated the
   thousands of potential futures they might encounter. Each set of strategies
   was confronted with both bootstrapped historical data and novel simulated
   data. Bootstrapping is a process by which you take random chunks of
   historical data and re-order it. This made our results robust to the risk of
   solely optimizing for the past, a common error in the analysis of strategies.
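   For illustration, a block bootstrap can be sketched in a few lines of Python;
   the block size and the made-up return series below are hypothetical, not the
   parameters we actually used.

   # Illustrative block bootstrap: resample contiguous chunks of a historical
   # return series so that autocorrelation within each chunk is preserved.
   import numpy as np

   def block_bootstrap(returns, n_periods, block_size=20, seed=None):
       rng = np.random.default_rng(seed)
       blocks, total = [], 0
       while total < n_periods:
           start = rng.integers(0, len(returns) - block_size)
           blocks.append(returns[start:start + block_size])
           total += block_size
       return np.concatenate(blocks)[:n_periods]

   # Example: build one simulated 20-year path of daily returns (252 * 20 days)
   # from a made-up 30-year historical series.
   historical = np.random.default_rng(0).normal(0.0003, 0.01, size=252 * 30)
   simulated_path = block_bootstrap(historical, n_periods=252 * 20, seed=1)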
   We used both historic and simulated data because they complement each other
   in making future-looking decisions: The historical data allows us to include
   important aspects of return movements, like auto-correlation, volatility
   clustering, correlation regimes, skew, and fat tails. It is bootstrapped
   (sampled in chunks) to help generate potential futures. The simulated data
   allows us to generate novel potential outcomes, like market crashes bigger
   than previous ones, and generally, futures different than the past. The
   simulations were detailed enough to replicate how they’d run in our live
   systems, and included, for example, annual tax payments due to capital gains
   over losses, cashflows from dividends and the client saving or withdrawing.
   It also showed how an asset allocation would perform over the lifetime of an
   investment. During our testing, we ran over 200,000 simulations of daily-level
   returns for our 12 asset classes, covering 20 years’ worth of returns. We
   included realistic dividends at an asset class level. In short, we tested a
   heckuva lot of data. Normally, running this Monte Carlo would have taken
   nearly a full year to complete on a single computer, but we created a far
   more nimble system by piecing together a number of existing technologies. By
   harnessing the power of Amazon Web Services (specifically EC2 and S3) and a
   cloud-based message queue called IronMQ we reduced that testing time to just
   six hours—and for a total cost of less than $500. How we did it 1. Create an
   input queue: We created a bucket with every simulation—more than 200,000—we
   wanted to run. We used IronMQ to manage the queue, which  allows individual
   worker nodes to pull inputs themselves instead of relying on a system to
   monitor worker nodes and push work to them. This solved the problem found in
   traditional systems where a single node acts as the gatekeeper, which can get
   backed up, either breaking the system or leading to idle testing time. 2.
   Create 1,000 worker instances: With Amazon Web Services (EC2), we signed up to
   access time on 1,000 virtual machines. This increased our computing power a
   thousandfold, and buying time on these machines is cheap. We employed
   m1.small instances, relying on quantity over quality. 3. Each machine pulls
   a simulation: Thanks to the maturation of modern message queues, it is simpler
   and more advantageous to orchestrate jobs in a pull-based fashion than with
   the old push system, as we mentioned above (a minimal sketch of this worker
   loop appears at the end of this post).  In this model there is no single
   controller. Instead, each worker acts independently.  When the worker is idle
   and ready for more work, it takes it upon itself to go out and find it.  When
   there’s no more work to be had, the worker shuts itself down. 4. Store
   results in central location: We used another Amazon Cloud service called S3
   to store the results of each simulation. Each file — with detailed asset
   allocation, tax, trading and returns information — was archived inexpensively
   in the cloud. Each file was also named algorithmically to allow us to refer
   back to it and do granular audits of each run. 5. Download results for local
   analysis: From S3, we could download the summarized results of each of our
   simulations for analysis on a "regular" computer. The resulting analytical
   master file was still large, but small enough to fit on a regular MacBook
   Pro. We ran the Monte Carlo simulations over two weekends. Keeping our
   overhead low, while delivering top-of-the-line portfolio analysis and
   optimization is a key way we keep investment fees as low as possible.  This
   is just one more example of where our quest for efficiency—and your
   happiness—paid off. This post was written with Dan Egan.
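   As referenced in step 3 above, here is a minimal sketch of the pull-based
   worker loop. In production the input queue was IronMQ and results were
   archived to S3; an in-memory queue and local JSON files stand in here so the
   sketch runs as-is, and the message fields are made up.

   import json
   import pathlib
   from collections import deque

   # Stand-in for the IronMQ input queue holding one message per simulation.
   work_queue = deque(json.dumps({"scenario_id": i, "strategy": "rebalance-v2"})
                      for i in range(5))

   RESULTS_DIR = pathlib.Path("simulation_results")
   RESULTS_DIR.mkdir(exist_ok=True)

   def run_simulation(params):
       # Placeholder for the real portfolio simulation.
       return {"scenario_id": params["scenario_id"], "final_value": 1.0}

   def worker_loop():
       while True:
           try:
               # Each worker pulls its own work; no central controller pushes jobs.
               message = work_queue.popleft()
           except IndexError:
               # No more work to be had: the worker shuts itself down.
               break
           params = json.loads(message)
           results = run_simulation(params)
           # Name each result algorithmically so individual runs can be audited
           # later (in production, an S3 key instead of a local filename).
           out = RESULTS_DIR / f"scenario_{params['scenario_id']}.json"
           out.write_text(json.dumps(results))

   worker_loop()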
   5 min read


 * ENGINEERING THE TRADING PLATFORM: INSIDE BETTERMENT’S PORTFOLIO OPTIMIZATION
   
   Engineering the Trading Platform: Inside Betterment’s Portfolio Optimization
   To complete the portfolio optimization, Betterment engineers needed to enhance
   the code in our existing trading platform. Here's how they did it. In just a
   few weeks, Betterment is launching an updated portfolio -- one that has been
   optimized for better expected returns. The optimization will be partly driven
   by a more sophisticated asset allocation algorithm, which will dynamically
   vary individual asset allocations within the stock and bond basket based on a
   goal’s overall allocation. This new flexible set of asset allocations
   significantly affects our current trading processes. Until now, we executed
   transactions based on fixed weights or a precise allocation of assets to
   every level of risk. Now, in our updated portfolio with a more sophisticated
   way to allocate, we are using a matrix to manage asset weights—and that
   requires more complex trading logic. From an engineering perspective, this
   means we needed to enhance the code in our existing trading platform to
   accommodate dynamic asset allocation, with an eye towards future enhancements
   in our pipeline. Here's how we did it. 1. Build a killer testing framework
   When dealing with legacy code, one of our top priorities is to preserve
   existing functionality. Failure to do so could mean anything from creating a
   minor inconvenience to blocking trades from executing. That means the next
   step was to build a killer testing framework. The novelty of our approach was
   to essentially build partial, precise scaffolding around our current
   platform. This kind of scaffolding allowed us to go in and out of the
    current platform to capture and store precise inputs and outputs, while
   isolating them away from any unnecessary stuff that wasn’t relevant to the
   core trading processes. 2. Isolate the right information With this
   abstraction, we were able to isolate the absolute core objects that we need
   to perform trades, and ignore the rest. This did two things: it took testing
   off the developers’ plates early in the process, allowing  them to focus on
   writing production code, and also helped isolate the central objects that
   required most of their attention. The parent object of any activity inside
   the Betterment platform is a “user transaction” — that includes deposits or
   withdrawals to a goal, dividends, allocation changes, and transfers of money
   between goals. These were our inputs. In most cases, a user
   transaction will eventually be the parent of several trade objects. These
   were our outputs.  In our updated portfolio, the number of possible
   transaction types did not change. What did change, however, was how each
   transaction type was translated into trading activity, which is what we
   wanted to test exhaustively. We captured a mass of user transaction objects
   from production for use in testing. However, a user transaction object
   contains a host of data that isn’t relevant to the trades that will
   eventually be created, and is associated with other objects that are also not
   relevant.  So stripping out all non-trading data was the key to focusing on
   the right things to test for this project. 3. Use SQLite database to be
   efficient The best way to store the user transaction objects was to use JSON,
   a human-readable serialization format. To do this, we used GSON,
   which lets you convert Java objects into JSON, and vice versa. We didn’t want
   to store the JSON in a MySQL database, because managing it would be
   unnecessary overhead for this purpose. Instead, we stored them in a flat
   SQLite database. On the way into SQLite, GSON allowed us to “flatten” the
   objects, leaving only the bits that pertained to trading and discarding the
   rest. Then, we could rearrange these chunks to replicate all sorts of trading
   activity patterns. On the way out, GSON would re-inflate the JSON back into
   Java objects, using dummy values for the irrelevant fields, providing us with
   test inputs ready to be pushed through our system. We did the same for
   outputs, which were also full of “noise” for our purposes. We’d shrink the
   expected results we got from production, then re-inflate and compare them to
   what our tests produced (a rough sketch of this flow appears at the end of
   this post). 4. Do no harm to others' work At Betterment, we are
   constantly pushing through new features and enhancements, some visible to
   customers, but many not. Development on these is concurrent,  sometimes
   impacting global objects and schemas, and it was essential to insulate the
   team working on core trading functionality from all other development being
   done at the company. Just the portfolio transition work alone includes
   significant new code for front-end enhancements which have nothing to do with
   trading. The GSON/JSON/SQLite testing framework helped the trading team
   maintain laser focus on their task, as they worked under the hood. Otherwise,
   we’d be putting a sweet new set of tires on a car that won’t start!
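   As referenced above, here is a rough sketch of the capture/flatten/re-inflate
   flow. The real implementation used GSON and Java objects; this Python version
   with sqlite3 and json only illustrates the shape of the approach, and the
   field names are made up.

   import json
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE user_transactions (id INTEGER PRIMARY KEY, payload TEXT)")

   def flatten(user_transaction):
       # Keep only the fields that matter for trading; discard the rest.
       keep = ("transaction_type", "goal_id", "amount_cents", "allocation")
       return {k: user_transaction[k] for k in keep}

   def capture(user_transaction):
       conn.execute("INSERT INTO user_transactions (payload) VALUES (?)",
                    (json.dumps(flatten(user_transaction)),))

   def load_test_inputs():
       # Re-inflate stored JSON into objects, supplying dummy values for the
       # fields that were stripped out on the way in.
       for (payload,) in conn.execute("SELECT payload FROM user_transactions"):
           yield {"created_by": "test-dummy", **json.loads(payload)}

   # Example: capture a production-like transaction, then read it back for a test.
   capture({"transaction_type": "deposit", "goal_id": 42, "amount_cents": 10_000,
            "allocation": 0.9, "created_by": "prod-user", "audit_trail": ["..."]})
   print(list(load_test_inputs()))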
   5 min read


 * THREE THINGS I LEARNED IN MY ENGINEERING INTERNSHIP
   
   Three Things I Learned In My Engineering Internship I knew I had a lot to
   learn about how a Web app works, but I never imagined that it involved as
   much as it does. This post is part of series of articles written by
   Betterment’s 2013 summer interns. This summer, I had the privilege of
   participating in a software engineering internship with Betterment. My
   assignment was to give everyone in the office a visual snapshot of how the
   company is doing. This would be accomplished through the use of dashboards
   displayed on TV screens inside the office. We wanted to highlight metrics
   such as net deposits, assets under management, and conversions from visitors
   to the site into Betterment customers. Coming in with experience in only
   Java, this was definitely a challenging project to tackle. Now that the
   summer has ended, I have accomplished my goal — I created five dashboards
   displaying charts, numbers and maps with valuable data that everyone can see.
   From this experience, there are three very important things that I’ve
   learned. 1. School has taught me nothing. Maybe this is a bit of an
   exaggeration. As a computer science major, school has taught me how to code
   in Java, and maybe some of the theoretical stuff that I’ve had drilled into
   my head will come in handy at some point in my life. However, writing
   mathematical proofs and small Java programs that complete standalone tasks seems
   pretty pointless now that I’ve experienced the real world of software
   development. There are so many links in the development chain, and what I
   have learned in school barely covers half of a link. Not to mention almost
   everything else I needed I was able to learn through Google, which makes me
   wonder whether I could have learned Java through the Internet in a few weeks
   rather than spending the past two years in school. Needless to say, I
   definitely wish I could stay and work with Betterment rather than going back
   to school next week, but today’s society is under the strange impression that
   a college degree is important, so I guess I’ll finish it out. 2. The
   structure of a Web app is a lot more complex than what the user sees on the
   page. Before I began my internship, I had never worked on a Web app before. I
   knew I had a lot to learn about how it all works, but I never imagined that
   it involved as much as it does. There’s a database on the bottom, then the
   backend code is layered on top of that — and then that is broken up into
   multiple levels in order to keep different kinds of logic separate. And on
   top of all that, is the front end code. All of it is kept together with
   frameworks that allow the different pieces to communicate with each other,
   and there are servers that the app needs to run on. This was extremely
   eye-opening for me, and I’m so glad that the engineers at Betterment spent
   time during my first week getting me up to speed on all of it. I was able to
   build my dashboards as a Web app, so I not only needed to understand this
   structure, but I needed to implement it as well. 3. A software engineer needs
   to be multilingual. I’m not talking about spoken languages. The different
   pieces in the structure of a web app are usually written in different
   computer languages. Being that Java only covered a small piece of this
   structure, I had a lot of languages to learn. Accessing the database requires
   knowledge of SQL, a lot of scripts are written in Python, front end structure
   and design is written in HTML and CSS, and front end animation is written in
   JavaScript. In order to effectively work on multiple pieces of an app, an
   engineer needs to be fluent in multiple different languages. Thankfully, the
   Internet makes learning languages quick and easy, and I was able to pick up
   on so many new languages throughout the summer. My experience this summer has
   been invaluable, and I will be returning to school with a brand new view on
   software development and what a career in this awesome field will be like.
   4 min read


 * KEEPING OUR CODE BASE SIMPLE, OPTIMALLY
   
   Keeping Our Code Base Simple, Optimally Betterment engineers turned
   regulatory compliance rules into an optimization problem to keep the code
   base simple. Here's how they did it. At Betterment, staying compliant with
   regulators, such as the Securities and Exchange Commission, is a part of
   everyday life.  We’ve talked before about how making sure everything is
   running perfectly -- especially given all the cases we need to handle --
   makes us cringe at the cyclomatic complexity of some of our methods. It’s a
   constant battle to keep things maintainable, readable, testable, and
   efficient. We recently put some code into production that uses an optimizer
   to cut down on the  amount of code we’re maintaining ourselves, and it turned
   out to be pretty darn cool. It makes communicating with our regulators
   easier, and is doing so in a pretty impressive fashion. We were tasked with
   coming up with an algorithm that, at first pass, made me nervous about all
   the different cases it would need to handle in order to do things
   intelligently. Late one night, we started bouncing ideas off each other on
   how to pull it off. We needed to make decisions at a granular level, test how
   they affected the big picture, and then adjust accordingly. To use a
   Seinfeld analogy, the decisions we would make for Jerry had an effect on
   what the best decisions were for Elaine. But, if Elaine was set up a certain
   way, we wanted to go back to Jerry and adjust the decisions we made for him.
    Then George. Then Newman. Then Kramer. Soon we had thought about so many
   if-statements that they no longer seemed like if-statements, and all the
   abstractions I was formulating were already leaking. Then a light came on. We
   could not only make good decisions for Elaine, Jerry, and Newman, we could
   make those decisions optimally. A little bit of disclaimer here before we
   start digging in a little more: I can barely scratch the surface of how
   solvers work. I just happen to know that it was a tool available to us, and
   it happened to model the problem we needed to solve very well. This is meant
   as an introduction to using one specific solver as a way to model and solve a
   problem. An example Let’s say at the last minute, the Soup Nazi is out to
   make the biggest batch of soup he possibly can. For his recipe he needs a
   ratio of: 40% chicken 12% carrots 8% thyme 15% onions 15% noodles 5% garlic
   5% parsley All of the stores around him only keep limited amounts in stock.
   He calls around to all the stores just to see what they have in stock and puts
   together each store’s inventory:

   Ingredients in stock (lbs)    Elaine’s   George’s   Jerry’s   Newman’s
   Chicken                       5          6          2         3
   Carrots                       1          8          5         2
   Thyme                         3          19         16        6
   Onions                        6          12         10        4
   Noodles                       5          0          3         9
   Garlic                        2          1          1         0
   Parsley                       3          6          2         1

   Also, the quality of the
   bags at all of the stores vary, limiting the total number of pounds of food
   the Soup Nazi can carry back. (We’re also assuming he only wants to make at
   most one visit to each store.)

   Pounds of food he can carry back
   Elaine’s    12
   George’s    8
   Jerry’s     15
   Newman’s    17

   With the optimizer, the function that we are trying to
   minimize or maximize is called the objective function. In this example, we
   are trying to maximize the number of pounds of ingredients he can buy because
   that will result in the most soup. If we say that,
   a1 = pounds of chicken purchased from Elaine’s
   a2 = pounds of carrots purchased from Elaine’s
   a3 = pounds of thyme purchased from Elaine’s …
   a7 = pounds of parsley purchased from Elaine’s
   b1 = pounds of chicken purchased from George’s …
   c1 = pounds of chicken purchased from Jerry’s …
   d1 = pounds of chicken purchased from Newman’s … We’re looking to maximize,
   a1 + a2 + a3 … + b1 + … + d7 = total pounds We then have to throw in all of
   the constraints to our problem. First to make sure the Soup Nazi gets the
   ratio of ingredients he needs: .40 * total pounds = a1 + b1 + c1 + d1
   .12 * total pounds = a2 + b2 + c2 + d2 .08 * total pounds = a3 + b3 + c3 + d3
   .15 * total pounds = a4 + b4 + c4 + d4 .15 * total pounds = a5 + b5 + c5 + d5
   .05 * total pounds = a6 + b6 + c6 + d6 .05 * total pounds = a7 + b7 + c7 + d7
   Then to make sure that the Soup Nazi doesn’t buy more pounds of food from one
   store than he can carry back: a1 + a2 + … + a7 <= 12 b1 + b2 + … + b7 <= 8
   c1 + c2 + … + c7 <= 15 d1 + d2 + … + d7 <= 17 We then have to put bounds on
   all of our variables to say that we can’t take more pounds of any ingredient
   than any store has in stock. 0 <= a1 <= 5 0 <= a2 <= 1 0 <= a3 <= 3
   0 <= a4 <= 6 … 0 <= d7 <= 1 That expresses all of the constraints and bounds
   to our problem and the optimizer works to maximize or minimize the objective
   function subject to those bounds and constraints. The optimization package
   we’re using in this example, Python’s scipy.optimize, provides a very
   expressive interface for specifying all of those bounds and constraints.
   Translating the problem into code If you want to jump right in, check out the
   full sample code. However, there are still a few more things to note: Get
   numpy and scipy installed. The variables we’re solving for are put into a
   single list. That means, x = [a1, a2, … , a7, b1, b2 … d7]. With python, it’s
   helpful to know that we can pull the pounds of food for a particular
   ingredient out of x,  i.e, [a1, b1, c1, d1] with
   x[ingredient_index :: num_of_ingredients] Likewise, we can pull out the
   ingredients for a given store with
   x[store_index * num_of_ingredients : store_index * num_of_ingredients + num_of_ingredients]
   e.g., [b1, b2, b3, b4, b5, b6, b7] For this example, we’re using the
   scipy.optimize.minimize function with the ‘SLSQP’ method. Arguments provided
   to the minimize function Objective function With the package we’re using,
   there is no option to maximize. This might seem like a show stopper, but we
   get around it by negating our objective function, minimizing, and then
   negating the results. Therefore our objective function becomes,  
   −a1 − a2 − a3 − a4 − … − d6 − d7 And expressing that with numpy is pretty
   painless: numpy.sum(x) * -1.0 Bounds Bounds make sure that we don’t take more
   of any one ingredient than the store has in stock. The minimize function
   takes this in as a list of tuples where the indices line up with x. We can’t
   take negative ingredients from the store, so the lower bound is always 0.
   Therefore, [(0, 5), (0, 1) … (0, 1)] In the code example, for readability, I
   threw all of the inputs to the program into some global dictionaries.
   Therefore, we can calculate our bounds with:

   def calc_bounds():
       bounds = []
       for s in stores:
           for i in ingredients:
               bounds.append((0, store_inventory[s][i]))
       return bounds

   Guess Providing a
   good initial guess can go a long way in getting you to a desirable solution.
   It can also dramatically reduce the amount of time it takes to solve a
   problem. If you’re not seeing numbers you expect, or it is taking a long time
   to come up with a solution, the initial guess is often the first place to
   start. For this problem, we made our initial guess to be  what each store had
   in stock, and we supplied it to the minimize method as a list. Constraints
   One thing to note is that for the package we’re using, constraints only deal
   with ‘ineq’ and ‘eq’, where ‘ineq’ means greater than or equal to. The right
   hand side of the equation is assumed to be zero. Also, we are providing the
   constraints as a tuple of dictionaries.
   (a1 + b1 + c1 + d1) − (.40 * total pounds) >= 0 ...
   (a7 + b7 + c7 + d7) − (.05 * total pounds) >= 0 Note here that I changed the
   constraints from equal-to to greater-than because comparing floats to be
   exactly equal is a hard problem when you’re multiplying and adding numbers.
   Therefore, to make sure we limit chicken to 40% of the overall ingredients,
   one element of the constraints tuple will be,
   {'type': 'ineq',
    'fun': lambda x: sum(extract_ingredient_specific_pounds(x, chicken)) - (calc_total_pounds_of_food(x) * .4)}
   Making sure the Soup Nazi is able to carry everything back from the store:
   12 − a1 − a2 − … − a7 >= 0 … 17 − d1 − d2 − … − d7 >= 0 Leads to,
   {'type': 'ineq',
    'fun': lambda x: max_per_store[store] - np.sum(extract_store_specific_pounds(x, store))}
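   Putting the pieces together, here is a condensed, self-contained version of
   the example. The stock and carry-limit numbers come from the tables above;
   the slicing expressions stand in for the helper functions
   (extract_ingredient_specific_pounds, etc.) referenced in the post, and it
   should recover roughly the 40 lbs total shown in the results below.

   import numpy as np
   from scipy.optimize import minimize

   ratios = [0.40, 0.12, 0.08, 0.15, 0.15, 0.05, 0.05]   # chicken .. parsley
   stock = np.array([                    # rows: Elaine, George, Jerry, Newman
       [5, 1, 3, 6, 5, 2, 3],
       [6, 8, 19, 12, 0, 1, 6],
       [2, 5, 16, 10, 3, 1, 2],
       [3, 2, 6, 4, 9, 0, 1],
   ], dtype=float)
   carry_limit = np.array([12, 8, 15, 17], dtype=float)
   n_stores, n_ingredients = stock.shape

   def objective(x):
       # Negate so that minimizing maximizes total pounds purchased.
       return -np.sum(x)

   constraints = []
   for i, r in enumerate(ratios):
       # Each ingredient must be at least its target share of the total pounds.
       constraints.append({"type": "ineq",
                           "fun": lambda x, i=i, r=r: np.sum(x[i::n_ingredients]) - r * np.sum(x)})
   for s in range(n_stores):
       # Can't carry more out of a store than the bags will hold.
       constraints.append({"type": "ineq",
                           "fun": lambda x, s=s: carry_limit[s] - np.sum(x[s * n_ingredients:(s + 1) * n_ingredients])})

   bounds = [(0, stock[s, i]) for s in range(n_stores) for i in range(n_ingredients)]
   x0 = stock.flatten()   # initial guess: buy everything each store has

   result = minimize(objective, x0, method="SLSQP", bounds=bounds, constraints=constraints)
   print(f"Total pounds purchased: {-result.fun:.2f}")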
   Hopefully this gives you enough information to make sense of the code
   example. The Results? Pretty awesome. The Soup Nazi should only buy a total
   of 40 lbs worth of ingredients because Elaine, George, Jerry, and Newman just
   don’t have enough chicken.
   9.830 lbs of food from Elaine's. Able to carry 12.0 pounds.
   chicken: 5.000 lbs (5.0 in stock) carrots: 0.000 lbs (1.0 in stock)
   thyme: 0.000 lbs (3.0 in stock) onions: 0.699 lbs (6.0 in stock)
   noodles: 1.000 lbs (5.0 in stock) garlic: 1.565 lbs (2.0 in stock)
   parsley: 1.565 lbs (3.0 in stock)
   7.582 lbs of food from George's. Able to carry 8.0 pounds.
   chicken: 6.000 lbs (6.0 in stock) carrots: 0.667 lbs (8.0 in stock)
   thyme: 0.183 lbs (19.0 in stock) onions: 0.733 lbs (12.0 in stock)
   noodles: 0.000 lbs (0.0 in stock) garlic: 0.000 lbs (1.0 in stock)
   parsley: 0.000 lbs (6.0 in stock)
   13.956 lbs of food from Jerry's. Able to carry 15.0 pounds.
   chicken: 2.000 lbs (2.0 in stock) carrots: 3.501 lbs (5.0 in stock)
   thyme: 3.017 lbs (16.0 in stock) onions: 4.568 lbs (10.0 in stock)
   noodles: 0.000 lbs (3.0 in stock) garlic: 0.435 lbs (1.0 in stock)
   parsley: 0.435 lbs (2.0 in stock)
   8.632 lbs of food from Newman's. Able to carry 17.0 pounds.
   chicken: 3.000 lbs (3.0 in stock) carrots: 0.632 lbs (2.0 in stock)
   thyme: 0.000 lbs (6.0 in stock) onions: 0.000 lbs (4.0 in stock)
   noodles: 5.000 lbs (9.0 in stock) garlic: 0.000 lbs (0.0 in stock)
   parsley: 0.000 lbs (1.0 in stock)
   16.000 lbs of chicken. 16.0 available across all stores. 40.00%
   4.800 lbs of carrots. 16.0 available across all stores. 12.00%
   3.200 lbs of thyme. 44.0 available across all stores. 8.00%
   6.000 lbs of onions. 32.0 available across all stores. 15.00%
   6.000 lbs of noodles. 17.0 available across all stores. 15.00%
   2.000 lbs of garlic. 4.0 available across all stores. 5.00%
   2.000 lbs of parsley. 12.0 available across all stores. 5.00% Bringing it all
   together Hopefully this gives you a taste of the types of problems optimizers
   can be used for. At Betterment, instead of picking pounds of ingredients from
   a given store, we are using it to piece together a mix of securities, in
   order to keep us compliant with certain regulatory specifications. While
   there was a lot of work involved in making our actual implementation
   production-ready (and a lot more work can be done to improve it), being able
   to express rules coming out of a regulatory document as a series of bounds
   and constraints via anonymous functions was a win for the readability of our
   code base. I’m also hoping that it will make tacking on additional rules
   painless in comparison to weaving them into a one-off algorithm.
   8 min read





JOIN OUR OPEN SOURCE PROJECTS


 * TEST_TRACK
   
   Server app for the TestTrack multi-platform split-testing and feature-gating
   system.
   
   See it on GitHub


 * WEBVALVE
   
   Betterment’s framework for locally developing and testing service-oriented
   apps in isolation with WebMock and Sinatra-based fakes.
   
   See it on GitHub


 * BETTER_TEST_REPORTER
   
   Tooling and libraries for processing dart test output into dev-friendly
   formats.
   
   See it on GitHub


 * DELAYED
   
   A multi-threaded, SQL-driven ActiveJob backend used at Betterment to process
   millions of background jobs per day.
   
   See it on GitHub



