
ENGINEERING AT BETTERMENT



High quality code. Beautiful, practical design. Innovative problem solving.
Explore our engineering community and nerd out with us on all things tech.





RECENT ARTICLES



 * FINDING A MIDDLE GROUND BETWEEN SCREEN AND UI TESTING IN FLUTTER
   
   Finding a Middle Ground Between Screen and UI Testing in Flutter We outline
   the struggles we had testing our Flutter app, the approaches we took to those
   challenges, and the solutions we arrived at. Flutter provides good solutions
   for both screen testing and UI testing, but what about the middle ground?
   With integration testing being a key level of the
   testing pyramid, we needed to find a way to test how features in our app
   interacted without the overhead involved with setting up UI tests. I’m going
   to take you through our testing journey from a limited native automated
   testing suite and heavy dependence on manual testing, to trying Flutter’s
   integration testing solutions, to ultimately deciding to build out our own
   framework to increase confidence in the integration of our components. The
   beginning of our Flutter testing journey Up until early 2020, our mobile app
   was entirely native with separate Android and iOS codebases. At the onset of
   our migration to Flutter, the major testing pain point was that a large
   amount of manual regression testing was required in order to approve each
   release. This manual testing was tedious and time-consuming for engineers,
   whose time is expensive. Alongside this manual testing pain, the automated
   testing in the existing iOS and Android codebases was inconsistent. iOS had a
   larger unit testing suite than Android did, but neither had integration
   tests. iOS also had some tests that were flaky, causing CI builds to fail
   unexpectedly. As we transitioned to Flutter, we made unit/screen testing and
   code testability a high priority, pushing for thorough coverage. That said,
   we still relied heavily on the manual testing checklist to ensure the user
   experience was as expected. This led us to pursue an integration testing
   solution for Flutter. In planning out integration testing, we had a few key
   requirements for our integration testing suite:
   - Easily runnable in CI upon each commit
   - An API that would be familiar to developers who are used to writing
     Flutter screen tests
   - The ability to test the integration between features within the system
     without needing to set up the entire app.
   The Flutter integration testing landscape At the very beginning of our
   transition to Flutter, we started trying to write integration tests for our
   features using Flutter’s solution at the time: flutter_driver. The benefit we
   found in
   flutter_driver was that we could run it in our production-like environment
   against preset test users. This meant there was minimal test environment
   setup. We ran into quite a few issues with flutter_driver though. Firstly,
   there wasn’t a true entry point we could launch the app into because our app
   is add-to-app, meaning that the flutter code is embedded into our iOS and
   Android native applications rather than being a pure flutter app runnable
   from a main.dart entry point. Second, `flutter_driver` is more about UI/E2E
   testing rather than integration testing, meaning we’d need to run an instance
   of the app on a device, navigate to a flow we wanted to test, and then test
   the flow. Also, the flutter_driver API worked differently than the screen
   testing API and was generally more difficult to use. Finally, flutter_driver
   is not built to run a suite of tests or to run easily in CI. While possible
   to run in CI, it would be incredibly costly to run on each commit since the
   tests need to run on actual devices. These barriers led us to not pursue
   `flutter_driver` tests as our solution. We then pivoted to investigating
   Flutter’s newer replacement for flutter_driver: integration_test.
   Unfortunately `integration_test` was very similar to flutter_driver, in that
   it took the same UI/E2E approach, which meant that it had the same benefits
   and drawbacks that flutter_driver had. The one additional advantage of
   `integration_test` is that it uses the same API as screen tests do, so
   writing tests with it feels more familiar for developers experienced with
   writing screen tests. Regardless, given that it has the same problems that
   flutter_driver does, we decided not to pursue `integration_test` as our
   framework. Our custom solution to integration testing After trying Flutter’s
   solutions fruitlessly, we decided to build out a solution of our own. Before
   we dive into how we built it, let’s revisit our requirements from above:
   - Easily runnable in CI upon each commit
   - An API that would be familiar to developers who are used to writing
     Flutter screen tests
   - The ability to test the integration between features within the system
     without needing to set up the entire app.
   Given those requirements, we took a step back to make a few
   overarching design decisions. First, we needed to decide what pieces of code
   we were interested in testing and which parts we were fine with stubbing.
   Because we didn’t want to run the whole app with these tests in order to keep
   the tests lightweight enough to run on each commit, we decided to stub out a
   few problem areas. The first was our flutter/native boundary. With our app
   being add-to-app and utilizing plugins, we didn’t want to have to run
   anything native in our testing. We stubbed out the plugins by writing
   lightweight wrappers around them and then providing them to the app at a high
   level that we could easily override with fakes for the purpose of integration
   testing. The add-to-app boundary was similar. The second area we wanted to
   stub out was the network. In order to do this, we built out a fake http
   client that allows us to configure network responses for given requests. We
   chose to fake the http client since it is the very edge of our network layer.
   Faking it left as much of our code as possible under test. The next thing we
   needed to decide was what user experiences we actually wanted to test with
   our integration tests. Because integration tests are more expensive to write
   and maintain than screen tests, we wanted to make sure the flows we were
   testing were the most impactful. Knowing this, we decided to focus on “happy
   paths” of flows. Happy paths are non-exceptional flows (flows not based on
   bad user state or input). Sad paths, on the other hand, tend to be less
   impactful, and they usually give feedback on the same screen as the input,
   meaning those sad-path cases are usually better tested at the screen test
   level anyway. From here,
   we set out to break down responsibilities of the components of our
   integration tests. We wanted to have a test harness that we could use to set
   up the app under test and the world that the app would run in, however we
   knew this configuration code would be mildly complicated and something that
   would be in flux. We also wanted a consistent framework by which we could
   write these tests. In order to ensure changes to our test harness didn’t have
   far reaching effects on the underlying framework, we decided to split out the
   testing framework into an independent package that is completely agnostic to
   how our app operates. This keeps the tests feeling familiar to normal screen
   tests since the exposed interface is very similar to how widget tests are
   written. The remaining test harness code was put in our normal codebase where
   it can be iterated on freely. The other separation we wanted to make was
   between the screen interactions and the tests themselves. For this we used a
   modified version of Very Good Ventures' robot testing pattern that would
   allow us to reuse screen interactions across multiple tests while also making
   our tests very readable from even a non-engineering perspective. In order to
   fulfill two of our main requirements (running as part of our normal test
   suite in CI and having a familiar API), we knew we’d need to build our
   framework on top of Flutter’s existing screen test framework. Being able to
   integrate (ba dum tss) these new tests into our existing test suite was
   excellent because it meant that we would get quick feedback when code breaks
   while developing. The last of our requirements was to be able to launch into
   a specific feature rather than having to navigate through the whole app. We
   were able to do this by having our app widget that handles dependency setup
   take a child, then pumping the app widget wrapped around whatever feature
   widget we wanted to test. With all these decisions made, we arrived at a
   well-defined integration testing framework that isolated our concerns and
   fulfilled our testing requirements. The Nitty Gritty Details In order to
   describe how our integration tests work, let's start by describing an example
   app that we may want to test. Let's imagine a simple social network app,
   igrastam, that has an activity feed screen, a profile screen, a flow for
   updating your profile information, and a flow for posting images. For this
   example, we’ll say we’re most interested in testing the profile information
   edit flows to start. First, how would we want to make a test harness for this
   app? We know it has some sort of network interactions for fetching profile
   info and posts as well as for posting images and editing a profile. For that,
   our app has a thin wrapper around the http package called HttpClient. We may
   also have some interactions with native code through a plugin such as
   image_cropper. In order to have control over that plugin, this app has also
   made a thin wrapper service for that. This leaves our app looking something
   like this: Given that this is approximately what the app looks like, the test
   harness needs to grant control of the HttpClient and the ImageCropperService.
   We can do that by just passing our own fake versions into the app. Awesome,
   now that we have an app and a harness we can use to test it, how are the
   tests actually written?  Let’s start out by exploring that robot testing
   technique I mentioned earlier. Say that we want to start by testing the
   profile edit flow. One path through this flow contains a screen for changing
   your name and byline, then it bounces out to picking and cropping a profile
   image, then allows you to choose a preset border to put on your profile
   picture. For the screen for changing your name and byline, we can build a
   robot to interact with the screen that looks something like this: By using
   this pattern, we are able to reuse test code pertaining to this screen across
   many tests. It also keeps the test file clean of WidgetTester interaction,
   making the tests read more like a series of human actions rather than a
   series of code instructions. Okay, we’ve got an app, a test harness, and
   robots to interact with the screens. Let’s put it all together now into an
   actual test. The tests end up looking incredibly simple once all of these
   things are in place (which was the goal!). This test would go on to have a few
   more steps detailing the interactions on the subsequent screens. With that,
   we’ve been able to test the integration of all the components for a given
   flow, all written in widget-test-like style without needing to build out the
   entire app. This test could be added into our suite of other tests and run
   with each commit. Back to the bigger picture Integration testing in Flutter
   can be daunting due to how heavy the `flutter_driver`/`integration_test`
   solutions are with their UI testing strategies. We were able to overcome this
   and begin filling out the middle level of our testing pyramid by adding
   structure on top of the widget testing API that allows us to test full flows
   from start to finish. When pursuing this ourselves, we found it valuable to
   evaluate our testing strategy deficits, identify clear-cut boundaries around
   what code we wanted to test, and establish standards around what flows
   through the app should be tested. By going down the path of integration
   testing, we’ve been able to increase confidence in everyday changes as well
   as map out a plan for eliminating our manual test cases.
   11 min read


 * WHY (AND HOW) BETTERMENT IS USING JULIA
   
   Why (And How) Betterment Is Using Julia Betterment is using Julia to solve
   our own version of the “two-language problem.” At Betterment, we’re
   using Julia to power the projections and recommendations we provide to help
   our customers achieve their financial goals. We’ve found it to be a great
   solution to our own version of the “two-language problem”–the idea that the
   language in which it is most convenient to write a program is not necessarily
   the language in which it makes the most sense to run that program. We’re
   excited to share the approach we took to incorporating it into our stack and
   the challenges we encountered along the way. Working behind the scenes, the
   members of our Quantitative Investing team bring our customers the
   projections and recommendations they rely on for keeping their goals
   on-track. These hard-working and talented individuals spend a large portion
   of their time developing models, researching new investment ideas and
   maintaining our research libraries. While they’re not engineers, their jobs
   definitely involve a good amount of coding. Historically, the team has
   written code mostly in a research environment, implementing proof-of-concept
   models that are later translated into production code with help from the
   engineering team. Recently, however, we’ve invested significant resources in
   modernizing this research pipeline by converting our codebase from R to Julia
   and we’re now able to ship updates to our quantitative models quicker, and
   with less risk of errors being introduced in translation. Currently, Julia
   powers all the projections shown inside our app, as well as a lot of the
   advice we provide to our customers. The Julia library we built for this
   purpose serves around 18 million requests per day, and very efficiently at
   that. Examples of projections and recommendations at Betterment. Does not
   reflect any actual portfolio and is not a guarantee of performance. Why
   Julia? At QCon London 2019, Steve Klabnik gave a great talk on how the
   developers of the Rust programming language view tradeoffs in programming
   language design. The whole talk is worth a watch, but one idea that really
   resonated with us is that programming language design—and programming
   language choice—is a reflection of what the end-users of that language value
   and not a reflection of the objective superiority of one language over
   another. Julia is a newer language that looked like a perfect fit for the
   investing team for a number of reasons: Speed. If you’ve heard one thing
   about Julia, it’s probably about its blazingly fast performance. For us,
   speed is important as we need to be able to provide real-time advice to our
   customers by incorporating their most up-to-date financial scenario in our
   projections and recommendations. It is also important in our research code
   where the iterative nature of research means we often have to re-run
   financial simulations or models multiple times with slight tweaks.
   Dynamism. While speed of execution is important, we also require a dynamic
   language that allows us to test out new ideas and prototype rapidly. Julia
   ticks the box for this requirement as well by using a just-in-time
   compiler that accommodates both interactive and non-interactive workflows
   well. Julia also has a very rich type system where researchers can build
   prototypes without type declarations and then later refactor the code
   where needed, adding type declarations for dispatch or clarity. In either case,
   Julia is usually able to generate performant compiled code that we can run in
   production. Relevant ecosystem. While the nascency of Julia as a language
   means that the community and ecosystem is much smaller than those of other
   languages, we found that the code and community oversamples on the type of
   libraries that we care about. Julia has excellent support for technical
   computing and mathematical modelling. Given these reasons, Julia is the
   perfect language to serve as a solution to the “two-language problem”. This
   concept is oft-quoted in Julian circles and is perfectly exemplified by the
   previous workflow of our team: Investing Subject Matter Experts (SMEs) write
   domain-specific code that’s solely meant to serve as research code, and that
   code then has to be translated into some more performant language for use in
   production. Julia solves this issue by making it very simple to take a piece
   of research code and refactor it for production use. Our approach We decided
   to build our Julia codebase inside a monorepo, with separate packages for
   each conceptual project we might work on, such as interest rate models,
   projections, social security amount calculations and so on. This works well
   from a development perspective, but we soon faced the question of how best to
   integrate this code with our production code, which is mostly developed in
   Ruby. We identified two viable alternatives:
   1. Build a thin web service that will accept HTTP requests, call the
      underlying Julia functions, and then return an HTTP response.
   2. Compile the Julia code into a shared library, and call it directly from
      Ruby using FFI.
   Option 1 is a very common pattern, and
   actually quite similar to what had been the status quo at Betterment, as most
   of the projections and recommendation code existed in a JavaScript service.
   It may be surprising then to learn that we actually went with Option 2. We
   were deeply attracted to the idea of being able to fully integration-test our
   projections and recommendations working within our actual app (i.e. without
   the complication of a service boundary). Additionally, we wanted an
   integration that we could spin-up quickly and with low ongoing cost; there’s
   some fixed cost to getting an FFI embed working right—but once you do, it’s an
   exceedingly low-cost integration to maintain. Fully-fledged services require
   infrastructure to run and are (ideally) supported by a full team of
   engineers. That said, we recognize the attractive properties of the more
   well-trodden Option 1 path and believe it could be the right solution in a
   lot of scenarios (and may become the right solution for us as our usage of
   Julia continues to evolve). Implementation Given how new Julia is, there was
   minimal literature on true interoperability with other programming languages
   (particularly high-level languages–Ruby, Python, etc). But we saw that the
   right building blocks existed to do what we wanted and proceeded with the
   confidence that it was theoretically possible. As mentioned earlier, Julia is
   a just-in-time compiled language, but it’s possible to compile Julia code
   ahead-of-time using PackageCompiler.jl. We built an additional package into
   our monorepo whose sole purpose was to expose an API for our Ruby
   application, as well as compile that exposed code into a C shared library.
   The code in this package is the glue between our pure Julia functions and the
   lower level library interface—it’s responsible for defining the functions
   that will be exported by the shared library and doing any necessary
   conversions on input/output. As an example, consider the following simple
   Julia function which sorts an array of numbers using the insertion
   sort algorithm: In order to be able to expose this in a shared library, we
   would wrap it like this: Here we’ve simplified memory management by requiring
   the caller to allocate memory for the result, and implemented primitive
   exception handling (see Challenges & Pitfalls below). On the Ruby end, we
   built a gem which wraps our Julia library and attaches to it using Ruby-FFI.
   The gem includes a tiny Julia project with the API library as its only
   dependency. Upon gem installation, we fetch the Julia source and compile it
   as a native extension. Attaching to our example function with Ruby-FFI is
   straightforward:
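
   A minimal sketch of that attachment, assuming a compiled library named
   libsort.so that exports an insertion_sort function taking an input pointer,
   a caller-allocated output pointer, and a length (the path, name, and
   signature here are illustrative, not the actual Betterment API):

     require "ffi"

     module JuliaSort
       extend FFI::Library

       # The Julia runtime needs these loader flags (see the notes below).
       ffi_lib_flags :lazy, :global
       ffi_lib "./libsort.so" # hypothetical PackageCompiler-built library

       # Assumed C signature:
       #   void insertion_sort(const double* input, double* output, int64_t len)
       attach_function :insertion_sort, [:pointer, :pointer, :int64], :void
     end
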
   From here, we could begin using our function, but it wouldn’t be entirely
   pleasant to work with–converting an input array to a pointer and processing
   the result would require some tedious boilerplate. Luckily, we can use Ruby’s
   powerful metaprogramming abilities to abstract all that away–creating a
   declarative way to wrap an arbitrary Julia function, which results in a
   familiar and easy-to-use interface for Ruby developers. In practice, that
   might look something like this:
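
   Building on the sketch above, the declarative wrapper could look roughly
   like this (the module and method names are invented for illustration; the
   real gem's interface isn't shown here):

     module JuliaWrapper
       # Defines a Ruby method that hides the FFI boilerplate: allocate an
       # output buffer, copy the input array in, call the attached native
       # function, and read the sorted values back out.
       def wrap_sort_function(name, native:)
         define_singleton_method(name) do |values|
           input  = FFI::MemoryPointer.new(:double, values.length)
           output = FFI::MemoryPointer.new(:double, values.length)
           input.write_array_of_double(values)
           public_send(native, input, output, values.length)
           output.read_array_of_double(values.length)
         end
       end
     end

     JuliaSort.extend(JuliaWrapper)
     JuliaSort.wrap_sort_function(:sorted, native: :insertion_sort)

     JuliaSort.sorted([3.0, 1.0, 2.0]) # => [1.0, 2.0, 3.0]
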
   Resulting in a function for which the fact that the underlying
   implementation is in Julia has been completely abstracted away.
   Challenges & Pitfalls Debugging an FFI
   integration can be challenging; any misconfiguration is likely to result in
   the dreaded segmentation fault–the cause of which can be difficult to hunt
   down. Here are a few notes for practitioners about some nuanced issues we ran
   into, that will hopefully save you some headaches down the line: The Julia
   runtime has to be initialized before calling the shared library. When loading
   the dynamic library (whether through Ruby-FFI or some other invocation of
   `dlopen`), make sure to pass the flags `RTLD_LAZY` and `RTLD_GLOBAL`
   (`ffi_lib_flags :lazy, :global` in Ruby-FFI). If embedding your Julia library
   into a multi-threaded application, you’ll need additional tooling to only
   initialize and make calls into the Julia library from a single thread, as
   multiple calls to `jl_init` will error. We use a multi-threaded web server
   for our production application, and so when we make a call into the Julia
   shared library, we push that call onto a queue where it gets picked up and
   performed by a single executor thread, which then communicates the result
   back to the calling thread using a promise object.
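
   A stripped-down sketch of that pattern using only the Ruby standard library
   (a reply Queue stands in for the promise object, and the class and method
   names are illustrative):

     class JuliaExecutor
       Call = Struct.new(:fn, :args, :reply)

       def initialize
         @calls = Queue.new
         @executor = Thread.new do
           # Runtime initialization (the jl_init step) happens once, here,
           # before any calls are serviced.
           loop do
             call = @calls.pop
             call.reply.push(JuliaSort.public_send(call.fn, *call.args))
           end
         end
       end

       # Called from any web server thread; blocks until the single executor
       # thread has performed the Julia call and pushed back the result.
       def call(fn, *args)
         reply = Queue.new
         @calls.push(Call.new(fn, args, reply))
         reply.pop
       end
     end
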
   Memory management–if you’ll be passing anything other than primitive types
   back from Julia to Ruby (e.g. pointers to more complex objects), you’ll need
   to take care to ensure the memory containing the data you’re passing back
   isn’t cleared by the Julia garbage collector prior to being read on the Ruby
   side. Different approaches are possible. Perhaps the simplest is to have the
   Ruby side allocate the memory into which the Julia function should write its
   result (and pass the
   Julia function a pointer to that memory). Alternatively, if you want to
   actually pass complex objects out, you’ll have to ensure Julia holds a
   reference to the objects beyond the life of the function, in order to keep
   them from being garbage collected. And then you’ll probably want to expose a
   way for Ruby to instruct Julia to clean up that reference (i.e. free the
   memory) when it’s done with it (Ruby-FFI has good support for triggering a
   callback when an object goes out-of-scope on the Ruby side). Exception
   handling–conveying unhandled exceptions across the FFI boundary is generally
   not possible. This means any unhandled exception occurring in your Julia code
   will result in a segmentation fault. To avoid this, you’ll probably want to
   implement catch-all exception handling in your shared library exposed
   functions that will catch any exceptions that occur and return some context
   about the error to the caller (minimally, a boolean indicator of
   success/failure). Tooling To simplify development, we use a lot of tooling
   and infrastructure developed both in-house and by the Julia community. Since
   one of the draws of using Julia in the first place is the performance of the
   code, we make sure to benchmark our code during every pull request for
   potential performance regressions using the BenchmarkTools.jl package. To
   facilitate versioning and sharing of our Julia packages internally (e.g. to
   share a version of the Ruby-API package with the Ruby gem which wraps it) we
   also maintain a private package registry. The registry is a separate GitHub
   repository, and we use tooling from the Registrator.jl package to register
   new versions. To process registration events, we maintain a registry server
   on an EC2 instance provisioned through Terraform, so updates to the
   configuration are as easy as running a single `terraform apply` command. Once
   a new registration event is received, the registry server opens a pull
   request to the Julia registry. There, we have built in automated testing that
   resolves the version of the package that is being tested, looks up any
   reverse dependencies of that package, resolves the compatibility bounds of
   those packages to see if the newly registered version could lead to a
   breaking change, and if so, runs the full test suites of the reverse
   dependencies. By doing this, we can ensure at registration time that when we
   release a patch or minor version of one of our packages, it won’t break any
   packages that depend on it. If it would, the user is instead forced to either
   fix the changes that led to a downstream breakage,
   or to modify the registration to be a major version increase. Takeaways
   Though our venture into the Julia world is still relatively young compared to
   most of the other code at Betterment, we have found Julia to be a perfect fit
   in solving our two-language problem within the Investing team. Getting the
   infrastructure into a production-ready format took a bit of tweaking, but we
   are now starting to realize a lot of the benefits we hoped for when setting
   out on this journey, including faster development of production ready models,
   and a clear separation of responsibilities between the SMEs on the Investing
   team who are best suited for designing and specifying the models, and the
   engineering team who have the knowledge on how to scale that code into a
   production-grade library. The switch to Julia has allowed us not only to
   optimize and speed up our code by multiple orders of magnitude, but also has
   given us the environment and ecosystem to explore ideas that would simply not
   be possible in our previous implementations.
   11 min read


 * INTRODUCING “DELAYED”: RESILIENT BACKGROUND JOBS ON RAILS
   
   Introducing “Delayed”: Resilient Background Jobs on Rails In the past 24
   hours, a Ruby on Rails application at Betterment performed somewhere on the
   order of 10 million asynchronous tasks. While many of these tasks merely sent
   a transactional email, or fired off an iOS or Android push notification,
   plenty involved the actual movement of money—deposits, withdrawals,
   transfers, rollovers, you name it—while others kept Betterment’s information
   systems up-to-date—syncing customers’ linked account information, logging
   events to downstream data consumers, the list goes on. What all of these
   tasks had in common (aside from being, well, really important to our
   business) is that they were executed via a database-backed job-execution
   framework called Delayed, a newly-open-sourced library that we’re excited to
   announce… right now, as part of this blog post! And, yes, you heard that
   right. We run millions of these so-called “background jobs” daily using a
   SQL-backed queue—not Redis, or RabbitMQ, or Kafka, or, um, you get the
   point—and we’ve very intentionally made this choice, for reasons that will
   soon be explained! But first, let’s back up a little and answer a few basic
   questions. Why Background Jobs? In other words, what purpose do these
   background jobs serve? And how does running millions of them per day help us?
   Well, when building web applications, we (as web application developers)
   strive to build pages that respond quickly and reliably to web requests. One
   might say that this is the primary goal of any webapp—to provide a set of
   HTTP endpoints that reliably handle all the success and failure cases within
   a specified amount of time, and that don’t topple over under high-traffic
   conditions. This is made possible, at least in part, by the ability to
   perform units of work asynchronously. In our case, via background jobs. At
   Betterment, we rely on said jobs extensively, to limit the amount of work
   performed during the “critical path” of each web request, and also to perform
   scheduled tasks at regular intervals. Our reliance on background jobs even
   allows us to guarantee the eventual consistency of our distributed systems,
   but more on that later. First, let’s take a look at the underlying framework
   we use for enqueuing and executing said jobs. Frameworks Galore! And, boy
   howdy, are there plenty of available frameworks for doing this kind of thing!
   Ruby on Rails developers have the choice of resque, sidekiq, que, good_job,
   delayed_job, and now... delayed, Betterment’s own flavor of job queue!
   Thankfully, Rails provides an abstraction layer on top of these, in the form
   of the Active Job framework. This, in theory, means that all jobs can be
   written in more or less the same way, regardless of the job-execution
   backend. Write some jobs, pick a queue backend with a few desirable features
   (priorities, queues, etc.), run some job worker processes, and we’re off to
   the races!
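
   In ActiveJob terms that looks roughly like this (the job and mailer names
   are made up for illustration):

     class WelcomeEmailJob < ApplicationJob
       queue_as :default

       def perform(user_id)
         # Runs later, on a worker process, outside the web request.
         UserMailer.welcome(user_id).deliver_now
       end
     end

     # Somewhere in a controller action:
     WelcomeEmailJob.perform_later(current_user.id)
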
   Sounds simple enough! Unfortunately, if it were so simple we wouldn’t be
   here, several paragraphs into a blog post on the topic. In
   practice, deciding on a job queue is more complicated than that. Quite a bit
   more complicated, because each backend framework provides its own set of
   trade-offs and guarantees, many of which will have far-reaching implications
   in our codebase. So we’ll need to consider carefully! How To Choose A Job
   Framework The delayed rubygem is a fork of both delayed_job and
   delayed_job_active_record, with several targeted changes and additions, including numerous
   performance & scalability optimizations that we’ll cover towards the end of
   this post. But first, in order to explain how Betterment arrived where we
   did, we must explain what it is that we need our job queue to be capable of,
   starting with the jobs themselves. You see, a background job essentially
   represents a tiny contract. Each consists of some action being taken for / by
   / on behalf of / in the interest of one or more of our customers, and that
   must be completed within an appropriate amount of time. Betterment’s
   engineers decided, therefore, that it was critical to our mission that we be
   capable of handling each and every contract as reliably as possible. In other
   words, every job we attempt to enqueue must, eventually, reach some form of
   resolution. Of course, job “resolution” doesn’t necessarily mean success.
   Plenty of jobs may complete in failure, or simply fail to complete, and may
   require some form of automated or manual intervention. But the point is that
   jobs are never simply dropped, or silently deleted, or lost to the
   cyber-aether, at any point, from the moment we enqueue them to their eventual
   resolution. This general property—the ability to enqueue jobs safely and
   ensure their eventual resolution—is the core feature that we have optimized
   for. Let’s call it resilience. Optimizing For Resilience Now, you might be
   thinking, shouldn’t all of these ActiveJob backends be, at the very least,
   safe to use? Isn’t “resilience” a basic feature of every backend, except
   maybe the test/development ones? And, yeah, it’s a fair question. As the
   author of this post, my tactful attempt at an answer is that, well, not all
   queue backends optimize for the specific kind of end-to-end resilience that
   we look for. Namely, the guarantee of at-least-once execution. Granted,
   having “exactly-once” semantics would be preferable, but if we cannot be sure
   that our jobs run at least once, then we must ask ourselves: how would we
   know if something didn’t run at all? What kind of monitoring would be
   necessary to detect such a failure, across all the features of our app, and
   all the types of jobs it might try to run? These questions open up an
   entirely different can of worms, one that we would prefer remained firmly
   sealed. Remember, jobs are contracts. A web request was made, code was
   executed, and by enqueuing a job, we said we'd eventually do something. Not
   doing it would be... bad. Not even knowing we didn't do it... very bad. So,
   at the very least, we need the guarantee of at-least-once execution. Building
   on at-least-once guarantees If we know for sure that we’ll fully execute all
   jobs at least once, then we can write our jobs in such a way that makes the
   at-least-once approach reliable and resilient to failure. Specifically, we’ll
   want to make our jobs idempotent—basically, safely retryable, or
   resumable—and that is on us as application developers to ensure on a
   case-by-case basis.
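
   As a sketch of what that means in practice (the model and job names are
   hypothetical), an idempotent job checks durable state before acting, so a
   retried attempt is a no-op rather than a double execution:

     class SettleTransferJob < ApplicationJob
       def perform(transfer_id)
         transfer = Transfer.find(transfer_id)
         return if transfer.settled? # a retry after success does nothing

         # Row-level lock so two overlapping attempts can't both settle it.
         transfer.with_lock do
           transfer.settle! unless transfer.settled?
         end
       end
     end
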
   Once we solve this very solvable idempotency problem, then we’re on track
   for the same net result as an “exactly-once” approach,
   even if it takes a couple extra attempts to get there. Furthermore, this
   combination of at-least-once execution and idempotency can then be used in a
   distributed systems context, to ensure the eventual consistency of changes
   across multiple apps and databases. Whenever a change occurs in one system,
   we can enqueue idempotent jobs notifying the other systems, and retry them
   until they succeed, or until we are left with stuck jobs that must be
   addressed operationally. We still concern ourselves with other distributed
   systems pitfalls like event ordering, but we don’t have to worry about
   messages or events disappearing without a trace due to infrastructure blips.
   So, suffice it to say, at-least-once semantics are crucial in more ways than
   one, and not all ActiveJob backends provide them. Redis-based queues, for
   example, can only be as durable (the “D” in “ACID”) as the underlying
   datastore, and most Redis deployments intentionally trade off some durability
   for speed and availability. Plus, even when running in the most durable mode,
   Redis-based ActiveJob backends tend to dequeue jobs before they are executed,
   meaning that if a worker process crashes at the wrong moment, or is
   terminated during a code deployment, the job is lost. These frameworks have
   recently begun to move away from this LPOP-based approach, in favor of using
   RPOPLPUSH (to atomically move jobs to a queue that can then be monitored for
   orphaned jobs), but outside of Sidekiq Pro, this strategy doesn’t yet seem to
   be broadly available. And these job execution guarantees aren’t the only area
   where a background queue might fail to be resilient. Another big resilience
   failure happens far earlier, during the enqueue step. Enqueues and
   Transactions See, there’s a major “gotcha” that may not be obvious from the
   list of ActiveJob backends. Specifically, it’s that some queues rely on an
   app’s primary database connection—they are “database-backed,” against the
   app’s own database—whereas others rely on a separate datastore, like Redis.
   And therein lies the rub, because whether or not our job queue is colocated
   with our application data will greatly inform the way that we write any
   job-adjacent code. More precisely, when we make use of database transactions
   (which, when we use ActiveRecord, we assuredly do whether we realize it or
   not), a database-backed queue will ensure that enqueued jobs will either
   commit or roll back with the rest of our ActiveRecord-based changes. This is
   extremely convenient, to say the least, since most jobs are enqueued as part
   of operations that persist other changes to our database, and we can in turn
   rely on the all-or-nothing nature of transactions to ensure that neither the
   job nor the data mutation is persisted without the other. Meanwhile, if our
   queue existed in a separate datastore, our enqueues would be completely
   unaware of the transaction, and we’d run the risk of enqueuing a job that
   acts on data that was never committed, or (even worse) we’d fail to enqueue a
   job even when the rest of the transactional data was committed. This would
   fundamentally undermine our at-least-once execution guarantees!
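
   A small sketch of the difference (hypothetical model and job names, and
   assuming ActiveJob is configured with a database-backed adapter such as
   delayed's): the enqueue is just another row written inside the same
   transaction, so the job and the data change commit or roll back together.

     ApplicationRecord.transaction do
       transfer = Transfer.create!(amount_cents: 10_000)

       # With a database-backed queue this is an INSERT into the jobs table in
       # the same transaction: if create! raises, no orphaned job is left
       # behind, and once the transaction commits, the job cannot be lost.
       SettleTransferJob.perform_later(transfer.id)
     end
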
   We already use ACID-compliant datastores to solve these precise kinds of
   data
   persistence issues, so with the exception of really, really high volume
   operations (where a lot of noise and data loss can—or must—be tolerated),
   there’s really no reason not to enqueue jobs co-transactionally with other
   data changes. And this is precisely why, at Betterment, we start each
   application off with a database-backed queue, co-located with the rest of the
   app’s data, with the guarantee of at-least-once job execution. By the way,
   this is a topic I could talk about endlessly, so I’ll leave it there for now.
   If you’re interested in hearing me say even more about resilient data
   persistence and job execution, feel free to check out Can I break this?, a
   talk I gave at RailsConf 2021! But in addition to the resiliency guarantees
   outlined above, we’ve also given a lot of attention to the operability and
   the scalability of our queue. Let’s cover operability first. Maintaining a
   Queue in the Long Run Operating a queue means being able to respond to errors
   and recover from failures, and also being generally able to tell when things
   are falling behind. (Essentially, it means keeping our on-call engineers
   happy.) We do this in two ways: with dashboards, and with alerts. Our
   dashboards come in a few parts. Firstly, we host a private fork of
   delayed_job_web, a web UI that allows us to see the state of our queues in real
   time and drill down to specific jobs. We’ve extended the gem with information
   on “erroring” jobs (jobs that are in the process of retrying but have not yet
   permanently failed), as well as the ability to filter by additional fields
   such as job name, priority, and the owning team (which we store in an
   additional column). We also maintain two other dashboards in our cloud
   monitoring service, Datadog. These are powered by instrumentation and
   continuous monitoring features that we have added directly to the delayed gem
   itself. When jobs run, they emit ActiveSupport::Notifications events that we
   subscribe to and then forward along to a StatsD emitter, typically as
   “distribution” or “increment” metrics.
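
   As an illustrative sketch (the event name, metric names, and tags are
   examples rather than the gem's exact instrumentation), the forwarding can
   be a plain ActiveSupport::Notifications subscriber that emits to a
   DogStatsD client:

     require "datadog/statsd"

     statsd = Datadog::Statsd.new("localhost", 8125)

     # ActiveJob publishes "perform.active_job" around each job execution.
     ActiveSupport::Notifications.subscribe("perform.active_job") do |_name, started, finished, _id, payload|
       job = payload[:job]
       statsd.distribution("jobs.runtime_ms", (finished - started) * 1000.0,
                           tags: ["job:#{job.class.name}", "queue:#{job.queue_name}"])
       statsd.increment("jobs.performed", tags: ["job:#{job.class.name}"])
     end
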
   Additionally, we’ve included a continuous monitoring process that runs
   aggregate queries, tagged and grouped
   by queue and priority, and that emits similar notifications that become
   “gauge” metrics. Once all of these metrics make it to Datadog, we’re able to
   display a comprehensive timeboard that graphs things like average job
   runtime, throughput, time spent waiting in the queue, error rates, pickup
   query performance, and even some top 10 lists of slowest and most erroring
   jobs. On the alerting side, we have Datadog monitors in place for overall
   queue statistics, like max age SLA violations, so that we can alert and page
   ourselves when queues aren’t working off jobs quickly enough. Our SLAs are
   actually defined on a per-priority basis, and we’ve added a feature to the
   delayed gem called “named priorities” that allows us to define
   priority-specific configs. These represent integer ranges (entirely
   orthogonal to queues), and default to “interactive” (0-9), “user visible”
   (10-19), “eventual” (20-29), and “reporting” (30+), with default alerting
   thresholds focused on retry attempts and runtime. There are plenty of other
   features that we’ve built that haven’t made it into the delayed gem quite
   yet. These include the ability for apps to share a job queue but run separate
   workers (i.e. multi-tenancy), team-level job ownership annotations, resumable
   bulk orchestration and batch enqueuing of millions of jobs at once,
   forward-scheduled job throttling, and also the ability to encrypt the inputs
   to jobs so that they aren’t visible in plaintext in the database. Any of
   these might be the topic for a future post, and might someday make their way
   upstream into a public release! But Does It Scale? As we've grown, we've had
   to push at the limits of what a database-backed queue can accomplish. We’ve
   baked several improvements into the delayed gem, including a highly
   optimized, SKIP LOCKED-based pickup query, multithreaded workers, and a novel
   “max percent of max age” metric that we use to automatically scale our worker
   pool up to ~3x its baseline size when queues need additional concurrency.
   Eventually, we could explore ways of feeding jobs through to higher
   performance queues downstream, far away from the database-backed workers. We
   already do something like this for some jobs with our journaled gem, which
   uses AWS Kinesis to funnel event payloads out to our data warehouse (while at
   the same time benefiting from the same at-least-once delivery guarantees as
   our other jobs!). Perhaps we’d want to generalize the approach even further.
   But the reality of even a fully "scaled up" queue solution is that, if it is
   doing anything particularly interesting, it is likely to be database-bound. A
   Redis-based queue will still introduce DB pressure if its jobs execute
   anything involving ActiveRecord models, and solutions must exist to throttle
   or rate limit these jobs. So even if your queue lives in an entirely separate
   datastore, it can be effectively coupled to your DB's IOPS and CPU
   limitations. So does the delayed approach scale? To answer that question,
   I’ll leave you with one last takeaway. A nice property that we’ve observed at
   Betterment, and that might apply to you as well, is that the number of jobs
   tends to scale proportionally with the number of customers and accounts. This
   means that when we naturally hit vertical scaling limits, we could, for
   example, shard or partition our job table alongside our users table. Then,
   instead of operating one giant queue, we’ll have broken things down to a
   number of smaller queues, each with their own worker pools, emitting metrics
   that can be aggregated with almost the same observability story we have
   today. But we’re getting into pretty uncharted territory here, and, as
   always, your mileage may vary! Try it out! If you’ve read this far, we’d
   encourage you to take the leap and test out the delayed gem for yourself!
   Again, it combines both DelayedJob and its ActiveRecord backend, and should
   be more or less compatible with Rails apps that already use ActiveJob or
   DelayedJob. Of course, it may require a bit of tuning on your part, and we’d
   love to hear how it goes! We’ve also built an equivalent library in Java,
   which may also see a public release at some point. (To any Java devs reading
   this: let us know if that interests you!) Already tried it out? Any features
   you’d like to see added? Let us know what you think!
   14 min read


 * FOCUSING ON WHAT MATTERS: USING SLOS TO PURSUE USER HAPPINESS
   
   Focusing on What Matters: Using SLOs to Pursue User Happiness Proper
   reliability is the greatest operational requirement for any service. If the
   service doesn’t work as intended, no user (or engineer) will be happy. This
   is where SLOs come in. The umbrella term “observability” covers all manner of
   subjects, from basic telemetry to logging, to making claims about longer-term
   performance in the shape of service level objectives (SLOs) and occasionally
   service level agreements (SLAs). Here I’d like to discuss some philosophical
   approaches to defining SLOs, explain how they help with prioritization, and
   outline the tooling currently available to Betterment Engineers to make this
   process a little easier. What is an SLO? At a high level, a service level
   objective is a way of measuring the performance of, correctness of, validity
   of, or efficacy of some component of a service over time by comparing the
   functionality of specific service level indicators (metrics of some kind)
   against a target goal. For example, 99.9% of requests complete with a 2xx,
   3xx or 4xx HTTP code within 2000ms over a 30 day period The service level
   indicator (SLI) in this example is a request completing with a status code of
   2xx, 3xx or 4xx and with a response time of at most 2000ms. The SLO is the
   target percentage, 99.9%. We reach our SLO goal if, during a 30 day period,
   99.9% of all requests completed with one of those status codes and within
   that range of latency. If our service didn’t succeed at that goal, the
   violation overflow — called an “error budget” — shows us by how much we fell
   short. With a goal of 99.9%, we have 40 minutes and 19 seconds of downtime
   available to us every 28 days. Check out more error budget math here. If we
   fail to meet our goals, it’s worthwhile to step back and understand why. Was
   the error budget consumed by real failures? Did we notice a number of false
   positives? Maybe we need to reevaluate the metrics we’re collecting, or
   perhaps we’re okay with setting a lower target goal because there are other
   targets that will be more important to our customers. It’s all about the
   customer This is where the philosophy of defining and keeping track of SLOs
   comes into play. It starts with our users - Betterment users - and trying to
   provide them with a certain quality of service. Any error budget we set
   should account for our fiduciary responsibilities, and should guarantee that
   we do not cause an irresponsible impact to our customers. We also assume that
   there is a baseline degree of software quality baked-in, so error budgets
   should help us prioritize positive impact opportunities that go beyond these
   baselines. Sometimes there are a few layers of indirection between a service
   and a Betterment customer, and it takes a bit of creativity to understand
   what aspects of the service directly affects them. For example, an engineer
   on a backend or data-engineering team provides services that a user-facing
   component consumes indirectly. Or perhaps the users for a service are
   Betterment engineers, and it’s really unclear how that work affects the
   people who use our company’s products. It isn’t that much of a stretch to
   claim that an engineer’s level of happiness does have some effect on the
   level of service they’re capable of providing a Betterment customer! Let’s
   say we’ve defined some SLOs and notice they are falling behind over time. We
   might take a look at the metrics we’re using (the SLIs), the failures that
   chipped away at our target goal, and, if necessary, re-evaluate the relevancy
   of what we’re measuring. Do error rates for this particular endpoint directly
   reflect an experience of a user in some way - be it a customer, a
   customer-facing API, or a Betterment engineer? Have we violated our error
   budget every month for the past three months? Has there been an increase in
   Customer Service requests to resolve problems related to this specific aspect
   of our service? Perhaps it is time to dedicate a sprint or two to
   understanding what’s causing degradation of service. Or perhaps we notice
   that what we’re measuring is becoming increasingly irrelevant to a customer
   experience, and we can get rid of the SLO entirely! Benefits of measuring the
   right things, and staying on target The goal of an SLO based approach to
   engineering is to provide data points with which to have a reasonable
   conversation about priorities (a point that Alex Hidalgo drives home in his
   book Implementing Service Level Objectives). In the case of services not
   performing well over time, the conversation might be “focus on improving
   reliability for service XYZ.” But what happens if our users are super happy,
   our SLOs are exceptionally well-defined and well-achieved, and we’re ahead of
   our roadmap? Do we try to get that extra 9 in our target - or do we use the
   time to take some creative risks with the product (feature-flagged, of
   course)? Sometimes it’s not in our best interest to be too focused on
   performance, and we can instead “use up our error budget” by rolling out some
   new A/B test, or upgrading a library we’ve been putting off for a while, or
   testing out a new language in a user-facing component that we might not
   otherwise have had the chance to explore. The tools to get us there Let’s
   dive into some tooling that the SRE team at Betterment has built to help
   Betterment engineers easily start to measure things. Collecting the SLIs and
   Creating the SLOs The SRE team has a web-app and CLI called Coach that we use
   to manage continuous integration (CI) and continuous delivery (CD), among
   other things. We’ve talked about Coach in the past here and here. At a high
   level, the Coach CLI generates a lot of yaml files that are used in all sorts
   of places to help manage operational complexity and cloud resources for
   consumer-facing web-apps. In the case of service level indicators (basically
   metrics collection), the Coach CLI provides commands that generate yaml files
   to be stored in GitHub alongside application code. At deploy time, the Coach
web-app consumes these files and idempotently creates Datadog monitors, which
   can be used as SLIs (service level indicators) to inform SLOs, or as
   standalone alerts that need immediate triage every time they're triggered. In
   addition to Coach explicitly providing a config-driven interface for
monitors, we’ve also written a couple of handy runtime-specific methods that
   result in automatic instrumentation for Rails or Java endpoints. I’ll discuss
   these more below. We also manage a separate repository for SLO definitions.
   We left this outside of application code so that teams can modify SLO target
   goals and details without having to redeploy the application itself. It also
made visibility easier in terms of sharing and communicating different teams’
   SLO definitions across the org. Monitors in code Engineers can choose either
   StatsD or Micrometer to measure complicated experiences with custom metrics,
and there are various approaches to turning those metrics directly into
   monitors within Datadog. We use Coach CLI driven yaml files to support metric
   or APM monitor types directly in the code base. Those are stored in a file
named .coach/datadog_monitors.yml and look like this:

monitors:
  - type: metric
    metric: "coach.ci_notification_sent.completed.95percentile"
    name: "coach.ci_notification_sent.completed.95percentile SLO"
    aggregate: max
    owner: sre
    alert_time_aggr: on_average
    alert_period: last_5m
    alert_comparison: above
    alert_threshold: 5500
  - type: apm
    name: "Pull Requests API endpoint violating SLO"
    resource_name: api::v1::pullrequestscontroller_show
    max_response_time: 900ms
    service_name: coach
    page: false
    slack: false

It wasn’t simple to make
   this abstraction intuitive between a Datadog monitor configuration and a user
   interface. But this kind of explicit, attribute-heavy approach helped us get
   this tooling off the ground while we developed (and continue to develop)
   in-code annotation approaches. The APM monitor type was simple enough to turn
   into both a Java annotation and a tiny domain specific language (DSL) for
Rails controllers, giving us nice symmetry across our platforms. This owner
method for Rails apps results in all logs, error reports, and metrics being
tagged with the team’s name, and at deploy time it’s aggregated by a Coach
CLI command and turned into latency monitors with reasonable defaults for
optional parameters, essentially doing the same thing as our config-driven
approach but from within the code itself:

class DeploysController < ApplicationController
  owner "sre", max_response_time: "10000ms", only: [:index], slack: false
end

For Java apps we have a similar interface (with
reasonable defaults as well) in a tidy little annotation:

@Sla
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface CustodySla {
  @AliasFor(annotation = Sla.class)
  long amount() default 25_000;

  @AliasFor(annotation = Sla.class)
  ChronoUnit unit() default ChronoUnit.MILLIS;

  @AliasFor(annotation = Sla.class)
  String service() default "custody-web";

  @AliasFor(annotation = Sla.class)
  String slackChannelName() default "java-team-alerts";

  @AliasFor(annotation = Sla.class)
  boolean shouldPage() default false;

  @AliasFor(annotation = Sla.class)
  String owner() default "java-team";
}

Then usage is just as simple as adding the annotation to the controller:

@WebController("/api/stuff/v1/service_we_care_about")
public class ServiceWeCareAboutController {
  @PostMapping("/search")
  @CustodySla(amount = 500)
  public SearchResponse search(@RequestBody @Valid SearchRequest request) {...}
}

At deploy time, these annotations are scanned
   and converted into monitors along with the config-driven definitions, just
   like our Ruby implementation. SLOs in code Now that we have our metrics
   flowing, our engineers can define SLOs. If an engineer has a monitor tied to
   metrics or APM, then they just need to plug in the monitor ID directly into
our SLO yaml interface:

- last_updated_date: "2021-02-18"
  approval_date: "2021-03-02"
  next_revisit_date: "2021-03-15"
  category: latency
  type: monitor
  description: This SLO covers latency for our CI notifications system -
    whether it's the github context updates on your PRs or the slack
    notifications you receive.
  tags:
    - team:sre
  thresholds:
    - target: 99.5
      timeframe: 30d
      warning_target: 99.99
  monitor_ids:
    - 30842606

The interface
   supports metrics directly as well (mirroring Datadog’s SLO types) so an
   engineer can reference any metric directly in their SLO definition, as seen
here:

# availability
- last_updated_date: "2021-02-16"
  approval_date: "2021-03-02"
  next_revisit_date: "2021-03-15"
  category: availability
  tags:
    - team:sre
  thresholds:
    - target: 99.9
      timeframe: 30d
      warning_target: 99.99
  type: metric
  description: 99.9% of manual deploys will complete successfully over a 30-day period.
  query:
    # (total_events - bad_events) over total_events == good_events/total_events
    numerator: sum:trace.rack.request.hits{service:coach,env:production,resource_name:deployscontroller_create}.as_count()-sum:trace.rack.request.errors{service:coach,env:production,resource_name:deployscontroller_create}.as_count()
    denominator: sum:trace.rack.request.hits{service:coach,resource_name:deployscontroller_create}.as_count()
   We love having these SLOs defined in GitHub because we can track who's
   changing them, how they're changing, and get review from peers. It's not
   quite the interactive experience of the Datadog UI, but it's fairly
   straightforward to fiddle in the UI and then extract the resulting
   configuration and add it to our config file. Notifications When we merge our
   SLO templates into this repository, Coach will manage creating SLO resources
   in Datadog and accompanying SLO alerts (that ping slack channels of our
   choice) if and when our SLOs violate their target goals. This is the slightly
   nicer part of SLOs versus simple monitors - we aren’t going to be pinged for
   every latency failure or error rate spike. We’ll only be notified if, over 7
days or 30 days or even longer, we violate the target goal we’ve defined for
   our service. We can also set a “warning threshold” if we want to be notified
   earlier when we’re using up our error budget. Fewer alerts means the alerts
   should be something to take note of, and possibly take action on. This is a
   great way to get a good signal while reducing unnecessary noise. If, for
example, our user research says we should aim for 99.5% uptime, that’s 3h 21m 36s of downtime available per 28 days - a lot of time during which we don’t have to react to every individual failure. If we alert only once that limit is exceeded, instead of on each of those 3 hours of errors, then we can direct our attention toward new product features, platform improvements, or learning and development.
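As a quick sanity check of that arithmetic, here is a throwaway sketch (not part of our tooling) that recomputes the budget:

# Back-of-the-envelope error budget for a 99.5% uptime target over 28 days.
window_seconds = 28 * 24 * 60 * 60          # 2,419,200 seconds in the window
budget_seconds = window_seconds * 5 / 1000  # the 0.5% we are allowed to miss
puts budget_seconds                         # => 12096, i.e. 3h 21m 36s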
The last part of defining our SLOs is including a date when we
   plan to revisit that SLO specification. Coach will send us a message when
   that date rolls around to encourage us to take a deeper look at our
   measurements and possibly reevaluate our goals around measuring this part of
   our service. What if SLOs don’t make sense yet? It’s definitely the case that
   a team might not be at the level of operational maturity where defining
   product or user-specific service level objectives is in the cards. Maybe
   their on-call is really busy, maybe there are a lot of manual interventions
   needed to keep their services running, maybe they’re still putting out fires
   and building out their team’s systems. Whatever the case may be, this
   shouldn’t deter them from collecting data. They can define what is called an
   “aspirational” SLO - basically an SLO for an important component in their
   system - to start collecting data over time. They don’t need to define an
   error budget policy, and they don’t need to take action when they fail their
   aspirational SLO. Just keep an eye on it. Another option is to start tracking
   the level of operational complexity for their systems. Perhaps they can set
   goals around "Bug Tracker Inbox Zero" or "Failed Background Jobs Zero" within
   a certain time frame, a week or a month for example. Or they can define some
   SLOs around types of on-call tasks that their team tackles each week. These
   aren’t necessarily true-to-form SLOs but engineers can use this framework and
   tooling provided to collect data around how their systems are operating and
   have conversations on prioritization based on what they discover, beginning
to build a culture of observability and accountability. Conclusion Betterment
   is at a point in its growth where prioritization has become more difficult
   and more important. Our systems are generally stable, and feature development
is paramount to business success. But so are reliability and performance.
Proper reliability is the greatest operational requirement for any service [2].
   If the service doesn’t work as intended, no user (or engineer) will be happy.
   This is where SLOs come in. SLOs should align with business objectives and
   needs, which will help Product and Engineering Managers understand the direct
   business impact of engineering efforts. SLOs will ensure that we have a solid
   understanding of the state of our services in terms of reliability, and they
   empower us to focus on user happiness. If our SLOs don’t align directly with
   business objectives and needs, they should align indirectly via tracking
   operational complexity and maturity. So, how do we choose where to spend our
time? SLOs and their error budgets permit our product engineering teams to have the right conversations and make the right decisions about prioritization and resourcing, so that we can balance effort spent on reliability against new product features, helping to ensure the long-term happiness and confidence of our users (and engineers).

[2] Alex Hidalgo, Implementing Service Level Objectives
   13 min read


 * FINDING AND PREVENTING RAILS AUTHORIZATION BUGS
   
   Finding and Preventing Rails Authorization Bugs This article walks through
   finding and fixing common Rails authorization bugs. At Betterment, we build
   public facing applications without an authorization framework by following
   three principles, discussed in another blog post. Those three principles are:
   Authorization through Impossibility Authorization through Navigability
   Authorization through Application Boundaries This post will explore the first
   two principles and provide examples of common patterns that can lead to
   vulnerabilities as well as guidance for how to fix them. We will also cover
   the custom tools we’ve built to help avoid these patterns before they can
   lead to vulnerabilities. If you’d like, you can skip ahead to the tools
   before continuing on to the rest of this post. Authorization through
   Impossibility This principle might feel intuitive, but it’s worth reiterating
   that at Betterment we never build endpoints that allow users to access
   another user’s data. There is no /api/socialsecuritynumbers endpoint because
   it is a prime target for third-party abuse and developer error. Similarly,
   even our authorized endpoints never allow one user to peer into another
   user’s object graph. This principle keeps us from ever having the opportunity
   to make some of the mistakes addressed in our next section. We acknowledge
   that many applications out there can’t make the same design decisions about
   users’ data, but as a general principle we recommend reducing the ways in
   which that data can be accessed. If an application absolutely needs to be
   able to show certain data, consider structuring the endpoint in a way such
   that a client can’t even attempt to request another user’s data.
   Authorization through Navigability Rule #1: Authorization should happen in
   the controller and should emerge naturally from table relationships
   originating from the authenticated user, i.e. the “trust root chain”. This
   rule is applicable for all controller actions and is a critical component of
   our security story. If you remember nothing else, remember this. What is a
   “trust root chain”? It’s a term we’ve co-opted from ssl certificate lingo,
   and it’s meant to imply a chain of ownership from the authenticated user to a
   target resource. We can enforce access rules by using the affordances of our
   relational data without the need for any additional “permission” framework.
   Note that association does not imply authorization, and the onus is on the
   developer to ensure that associations are used properly. Consider the
following controller:
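A minimal sketch of the controller in question - the class and parameter names are assumed for illustration, mirroring the RuboCop output later in this post:

class DocumentsController < ApplicationController
  def show
    # Nothing ties the requested document to the authenticated user.
    @document = Document.find(params[:document_id])
  end
end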
So long as a user is authenticated, they can perform the show action on any document (including documents belonging to others!) provided they know or can guess its ID - not great! This becomes even more dangerous if the Documents table uses sequential IDs, as that would make it easy for an attacker to start combing through the entire table. This is why Betterment has a rule requiring UUIDs for all new tables. This type of bug is typically referred to as an Insecure Direct Object Reference vulnerability. In short, these bugs allow attackers to access data directly using its unique identifiers – even if that data belongs to someone else – because the application fails to take authorization into account. We can use our database relationships to ensure that users can only see their own documents. Assuming a User has many Documents, we would change our controller to the following:
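Sticking with the same assumed names, the scoped version would look something like this:

class DocumentsController < ApplicationController
  def show
    # Scoping through the authenticated user's associations means any
    # document outside their object graph raises RecordNotFound (a 404).
    @document = current_user.documents.find(params[:document_id])
  end
end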
Now any document_id that doesn’t exist in the user’s object graph will raise a 404, and we’ve provided authorization for this endpoint without a framework - easy peasy. Rule #2: Controllers should
   pass ActiveRecord models, rather than ids, into the model layer. As a
   corollary to Rule #1, we should ensure that all authorization happens in the
   controller by disallowing model initialization with *_id attributes. This
   rule speaks to the broader goal of authorization being obvious in our code.
   We want to minimize the hops and jumps required to figure out what we’re
   granting access to, so we make sure that it all happens in the controller.
   Consider a controller that links attachments to a given document. Let’s
   assume that a User has many Attachments that can be attached to a Document
   they own. Take a minute and review this controller - what jumps out to you?
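Here is a minimal sketch of that controller - the helper methods and strong params are assumptions for illustration, but the create line matches the RuboCop output later in this post:

class Documents::AttachmentsController < ApplicationController
  def create
    AttachmentLink.new(create_params.merge(document: document)).save!
    head :created
  end

  private

  def document
    # Rule #1 is respected here: the document comes from the user's graph.
    current_user.documents.find(params[:document_id])
  end

  def create_params
    # But attachment_id is accepted straight from the client...
    params.permit(:attachment_id)
  end
end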
   At first glance, it looks like the developer has taken the right steps to
adhere to Rule #1 via the document method and we’re using strong params - is
   that enough? Unfortunately, it’s not. There’s actually a critical security
   bug here that allows the client to specify any attachment_id, even if they
don’t own that attachment - eek! Here’s a simple way to resolve our bug:
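One way to sketch the fix, keeping the assumed names from above:

class Documents::AttachmentsController < ApplicationController
  def create
    AttachmentLink.new(document: document, attachment: attachment).save!
    head :created
  end

  private

  def document
    current_user.documents.find(params[:document_id])
  end

  def attachment
    # Look the attachment up through the user's own associations rather
    # than trusting a raw attachment_id parameter.
    current_user.attachments.find(params[:attachment_id])
  end
end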
Now before we create a new AttachmentLink, we verify that the attachment_id
   specified actually belongs to the user and our code will raise a 404
   otherwise - perfect! By keeping the authorization up front in the controller
   and out of the model, we’ve made it easier to reason about. If we buried the
   authorization within the model, it would be difficult to ensure that the
   trust-root chain is being enforced – especially if the model is used by
   multiple controllers that handle authorization inconsistently. Reading the
   AttachmentLink model code, it would be clear that it takes an attachment_id
   but whether authorization has been handled or not would remain a bit of a
   mystery. Automatically Detecting Vulnerabilities At Betterment, we strive to
   make it easy for engineers to do the right thing – especially when it comes
   to security practices. Given the formulaic patterns of these bugs, we decided
   static analysis would be a worthwhile endeavor. Static analysis can help not
   only with finding existing instances of these vulnerabilities, but also
   prevent new ones from being introduced. By automating detection of these “low
   hanging fruit” vulnerabilities, we can free up engineering effort during
   security reviews and focus on more interesting and complex issues. We decided
   to lean on RuboCop for this work. As a Rails shop, we already make heavy use
   of RuboCop. We like it because it’s easy to introduce to a codebase,
   violations break builds in clear and actionable ways, and disabling specific
   checks requires engineers to comment their code in a way that makes it easy
   to surface during code review. Keeping rules #1 and #2 in mind, we’ve created
   two cops: Betterment/UnscopedFind and Betterment/AuthorizationInController;
   these will flag any models being retrieved and created in potentially unsafe
   ways, respectively. At a high level, these cops track user input (via
   params.permit et al.) and raise offenses if any of these values get passed
   into methods that could lead to a vulnerability (e.g. model initialization,
   find calls, etc). You can find these cops here. We’ve been using these cops
   for over a year now and have had a lot of success with them. In addition to
   these two, the Betterlint repository contains other custom cops we’ve written
   to enforce certain patterns -- both security related as well as more general
   ones. We use these cops in conjunction with the default RuboCop
   configurations for all of our Ruby projects. Let’s run the first cop,
Betterment/UnscopedFind against DocumentsController from above:

$ rubocop app/controllers/documents_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents_controller.rb:3:17: C: Betterment/UnscopedFind: Records are being retrieved directly using user input. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining").
  INSTEAD OF THIS:
    Post.find(params[:post_id])
  DO THIS:
    current_user.posts.find(params[:post_id])
  See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentunscopedfind
  @document = Document.find(params[:document_id])
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected
   The cop successfully located the vulnerability. If we attempted to deploy
   this code, RuboCop would fail the build, preventing the code from going out
   while letting reviewers know exactly why. Now let’s try running
   Betterment/AuthorizationInController on the AttachmentLink example from
earlier:

$ rubocop app/controllers/documents/attachments_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents/attachments_controller.rb:3:24: C: Betterment/AuthorizationInController: Model created/updated using unsafe parameters. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining"), and then pass the resulting object into your model instead of the unsafe parameter.
  INSTEAD OF THIS:
    post_parameters = params.permit(:album_id, :caption)
    Post.new(post_parameters)
  DO THIS:
    album = current_user.albums.find(params[:album_id])
    post_parameters = params.permit(:caption).merge(album: album)
    Post.new(post_parameters)
  See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentauthorizationincontroller
  AttachmentLink.new(create_params.merge(document: document)).save!
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected

The model initialization was flagged because it was seen using
   create_params, which contains user input. Like with the other cop, this would
   fail the build and prevent the code from making it to production. You may
   have noticed that unlike the previous example, the vulnerable code doesn’t
   directly reference a params.permit call or any of the parameter names, but
   the code was still flagged. This is because both of the cops keep a little
   bit of state to ensure they have the appropriate context necessary when
   analyzing potentially unsafe function calls. We also made sure that when
   developing these cops that we tested them with real code samples and not just
   contrived scenarios that no developer would actually ever attempt. False
   Positives With any type of static analysis, there’s bound to be false
   positives. When working on these cops, we narrowed down false positives to
   two scenarios: The flagged code could be considered insecure only in other
   contexts: e.g. the application or models in question don’t have a concept of
   “private” data The flagged code isn’t actually insecure: e.g. the
   initialization happens to take a parameter whose name ends in _id but it
   doesn’t refer to a unique identifier for any objects In both these cases, the
   developer should feel empowered to either rewrite the line in question or
   locally disable the cop, both of which will prevent the code from being
   flagged. Normally we’d consider opting out of security analysis to be an
   unsafe thing to do, but we actually like the way RuboCop handles this because
   it can help reduce some code review effort; the first solution eliminates the
   vulnerable-looking pattern (even if it wasn’t a vulnerability to begin with)
   while the second one signals to reviewers that they should confirm this code
   is actually safe (making it easy to pinpoint areas of focus). Testing & Code
   Review Strategies Rubocop and Rails tooling can only get us so far in
   mitigating authorization bugs. The remainder falls on the shoulders of the
   developer and their peers to be cognizant of the choices they are making when
   shipping new application controllers. In light of that, we’ll cover some
   helpful strategies for keeping authorization front of mind. Testing When
   writing request specs for a controller action, write a negative test case to
   prove that attempts to circumvent your authorization measures return a 404.
   For example, consider a request spec for our
Documents::AttachmentsController:
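A sketch of such a negative test case - the factories, route helper, and sign_in helper are assumptions for illustration:

RSpec.describe "Documents::Attachments", type: :request do
  it "returns a 404 when the document belongs to another user" do
    user = create(:user)
    other_users_document = create(:document)       # owned by someone else
    attachment = create(:attachment, user: user)
    sign_in user

    post document_attachments_path(other_users_document),
         params: { attachment_id: attachment.id }

    expect(response).to have_http_status(:not_found)
  end
end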
These test cases are an inexpensive way to prove to yourself and your reviewers that you’ve considered the authorization
   context of your controller action and accounted for it properly. Like all of
   our tests, this functions both as regression prevention and as documentation
   of your intent. Code Review Our last line of defense is code review. Security
   is the responsibility of every engineer, and it’s critical that our reviewers
   keep authorization and security in mind when reviewing code. A few simple
   questions can facilitate effective security review of a PR that touches a
   controller action: Who is the authenticated user? What resource is the
   authenticated user operating on? Is the authenticated user authorized to
   operate on the resource in accordance with Rule #1? What parameters is the
   authenticated user submitting? Where are we authorizing the user’s access to
   those parameters? Do all associations navigated in the controller properly
   signify authorization? Getting in the habit of asking these questions during
   code review should lead to more frequent conversations about security and
   data access. Our hope is that linking out to this post and its associated
   Rules will reinforce a strong security posture in our application
   development. In Summary Unlike authentication, authorization is context
   specific and difficult to “abstract away” from the leaf nodes of application
   code. This means that application developers need to consider authorization
   with every controller we write or change. We’ve explored two new rules to
   encourage best practices when it comes to authorization in our application
   controllers: Authorization should happen in the controller and should emerge
   naturally from table relationships originating from the authenticated user,
   i.e. the “trust root chain”. Controllers should pass ActiveRecord models,
   rather than ids, into the model layer. We’ve also covered how our custom cops
   can help developers avoid antipatterns, resulting in safer and easier to read
   code. Keep these in mind when writing or reviewing application code that an
   authenticated user will utilize and remember that authorization should be
   clear and obvious.
   11 min read


 * USING TARGETED UNIVERSALISM TO BUILD INCLUSIVE FEATURES
   
   Using Targeted Universalism To Build Inclusive Features The best products are
   inclusive at every stage of the design and engineering process. Here's how we
   turned a request for more inclusion into a feature all Betterment customers
   can benefit from. Earlier this year, a coworker asked me how difficult it
   would be to add a preferred name option into our product. They showed me how
   we were getting quite a few requests from trans customers to quit deadnaming
   them. The simplest questions tend to be the hardest to answer. For me, simple
   questions bring to mind this interesting concept called The Illusion Of
   Explanatory Depth, which is when “people feel they understand complex
   phenomena with far greater precision, coherence, and depth than they really
   do.” Simple questions tend to shed light on subjects shrouded in this
   illusion and force you to confront your lack of knowledge. Asking for
   someone’s name is simple, but full of assumptions. Deadnaming is when,
   intentionally or not, you refer to a trans person by the name they used
   before transitioning. For many trans folks like myself, this is the name
   assigned at birth which means all legal and government issued IDs and
   documents use this non-affirming name. According to Healthline, because legal
   name changes are “expensive, inaccessible, and not completely effective at
   eliminating deadnaming”, institutions like Betterment can and should make
   changes to support our trans customers. This simple question from our trans
   customers “Can you quit deadnaming me?” was a sign that our original
   understanding of our customers' names was not quite right, and we were
   lacking knowledge around how names are commonly used. Now, our work involved
   dispelling our previous understanding of what a name is. How to turn simple
   questions into solutions. At Betterment, we’re required by the government to
   have a record of a customer’s legal first name, but that shouldn’t prevent us
   from letting customers share their preferred or chosen first name, and then
   using that name in the appropriate places. This was a wonderful opportunity
   to practice targeted universalism: a concept that explains how building
features specifically for a marginalized audience not only benefits the people
   in that marginalized group, but also people outside of it, which increases
   its broad impact. From a design standpoint, executing a preferred name
   feature was pretty straightforward—we needed to provide a user with a way to
   share their preferred name with us, and then start using it. The lead
   designer for this project, Crys, did a lovely job of incorporating
   compassionate design into how we show the user which legal name we have on
   file for them, without confronting that user with their deadname every time
   they go to change their settings. They accomplished that by hiding the user’s
   legal name in a dropdown accordion that is toggled closed by default. Crys
   also built out a delightful flow that shows the user why we require their
   legal name, that answers a few common questions, and allows them to edit
   their preferred first name in the future if needed. With a solid plan for
   gathering user input, we pivoted to the bigger question: Where should we use
   a customer’s preferred first name? From an engineering standpoint, this
   question revealed a few hurdles that we needed to clear up. First, I needed
   to provide a translation of my own understanding of legal first names and
   preferred first names to our codebase. The first step in this translation was
to deprecate our not-very-descriptively named #first_name method and push engineers to start using two new, descriptive methods called #legal_first_name and #common_first_name (#common_first_name is essentially a defaulting method that falls back to #legal_first_name if #preferred_first_name is not present for
that user).
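A minimal sketch of what that defaulting method might look like (assumed, not the actual implementation):

# On the User model: prefer the chosen name, fall back to the legal one.
def common_first_name
  preferred_first_name.presence || legal_first_name
end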
To do this, I used a tool built by our own Betterment engineer, Nathan, called Uncruft, which not only gave engineers a warning whenever they
   tried to use the old #first_name method but also created a list of all the
   places in our code where we were currently using that old method. This was
   essentially a map for us engineers to be able to reference and go update
   those old usages in our codebase whenever we wanted. This new map leads us to
   our second task: addressing those deprecated usages. At first glance the
   places where we used #firstname in-app seemed minimal—emails, in-app
   greetings, tax documents. But once we looked under the surface, #firstname
   was sprinkled nearly everywhere in our codebase. I identified the most
   visible spots where we address a user and changed them, but for less visible
   changes I took this new map and delegated cross-squad ownership of each
   usage. Then, a group of engineers from each squad began tackling each
   deprecation one by one. In order to help these engineers, we provided
   guidelines around where it was necessary to use a legal first name, but in
   general we pushed to use a customer’s preferred first name wherever possible.
   From a high level view I essentially split this large engineering lift into
   two different streams of work. There was the feature work stream which
   involved: Storing the user’s new name information. Building out the user
   interface. Updating the most visible spots in our application. Modifying our
   integration with SimonData in order to bulk update our outgoing emails, and
   Changing how we share a user’s name with our customer service (CX) team
   through a Zendesk integration, as well as in our internal CX application.
   Then there was the foundational work stream, which involved mapping out and
addressing every single deprecation. Thanks to Uncruft, once I generated
   that initial map of deprecations the large foundational work stream could
   then be further split into smaller brooks of work that could be tackled by
   different squads at different times. Enabling preferred first names moves us
   towards a more inclusive product. Once this feature went live, it was
   extremely rewarding to see our targeted universalism approach reveal its
   benefits. Our trans customers got the solution they needed, which makes this
   work crucial for that fact alone—but because of that, our cis customers also
   received a feature that delighted them. Ultimately, we now know that if
   people are given a tool to personalize their experience within our product,
   folks of many different backgrounds will use it.
   6 min read


 * GUIDELINES FOR TESTING RAILS APPLICATIONS
   
   Guidelines for Testing Rails Applications Discusses the different
   responsibilities of model, request, and system specs, and other high level
   guidelines for writing specs using RSpec & Capybara. Testing our Rails
   applications allows us to build features more quickly and confidently by
   proving that code does what we think it should, catching regression bugs, and
   serving as documentation for our code. We write our tests, called “specs”
   (short for specification) with RSpec and Capybara. Though there are many
   types of specs, in our workflow we focus on only three: model specs, request
   specs, and system specs. This blog post discusses the different
   responsibilities of these types of specs, and other related high level
   guidelines for specs. Model Specs Model specs test business logic. This
   includes validations, instance and class method inputs and outputs, Active
   Record callbacks, and other model behaviors. They are very specific, testing
   a small portion of the system (the model under test), and cover a wide range
   of corner cases in that area. They should generally give you confidence that
   a particular model will do exactly what you intended it to do across a range
   of possible circumstances. Make sure that the bulk of the logic you’re
   testing in a model spec is in the method you’re exercising (unless the
   underlying methods are private). This leads to less test setup and fewer
   tests per model to establish confidence that the code is behaving as
   expected. Model specs have a live database connection, but we like to think
   of our model specs as unit tests. We lean towards testing with a bit of
   mocking and minimal touches to the database. We need to be economical about
   what we insert into the database (and how often) to avoid slowing down the
   test suite too much over time. Don’t persist a model unless you have to. For
   a basic example, you generally won’t need to save a record to the database to
   test a validation. Also, model factories shouldn’t by default save associated
   models that aren’t required for that model’s persistence. At the same time,
   requiring a lot of mocks is generally a sign that the method under test
   either is doing too many different things, or the model is too highly coupled
   to other models in the codebase. Heavy mocking can make tests harder to read,
   harder to maintain, and provide less assurance that code is working as
   expected. We try to avoid testing declarations directly in model specs -
   we’ll talk more about that in a future blog post on testing model behavior,
   not testing declarations. Below is a model spec skeleton with some common
test cases:
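A sketch of the shape such a skeleton can take - the Deposit model, factory, and methods are assumptions for illustration:

RSpec.describe Deposit do
  describe "validations" do
    it "requires an amount greater than zero" do
      deposit = build(:deposit, amount: 0)

      expect(deposit).not_to be_valid
      expect(deposit.errors[:amount]).to be_present
    end
  end

  describe "#completed?" do
    it "returns true once the deposit has settled" do
      deposit = build(:deposit, settled_at: Time.current)

      expect(deposit).to be_completed
    end
  end
end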
System Specs System specs are like integration tests. They test the beginning-to-end workflow of a particular feature, verifying that the
   different components of an application interact with each other as intended.
   There is no need to test corner cases or very specific business logic in
   system specs (those assertions belong in model specs). We find that there is
   a lot of value in structuring a system spec as an intuitively sensible user
   story - with realistic user motivations and behavior, sometimes including the
   user making mistakes, correcting them, and ultimately being successful. There
   is a focus on asserting that the end user sees what we expect them to see.
   System specs are more performance intensive than the other spec types, so in
   most cases we lean towards fewer system specs that do more things, going
   against the convention that tests should be very granular with one assertion
   per test. One system spec that asserts the happy path will be sufficient for
   most features. Besides the performance benefits, reading a single system spec
   from beginning to end ends up being good high-level documentation of how the
   software is used. In the end, we want to verify the plumbing of user input
and business logic output through as few large specs per feature as we can
   get away with. If there is significant conditional behavior in the view layer
   and you are looking to make your system spec leaner, you may want to extract
   that conditional behavior to a presenter resource model and test that
   separately in a model spec so that you don’t need to worry about testing it
   in a system spec. We use SitePrism to abstract away bespoke page interactions
   and CSS selectors. It helps to make specs more readable and easier to fix if
   they break because of a UI or CSS change. We’ll dive more into system spec
   best practices in a future blog post. Below is an example system spec. Note
   that the error path and two common success paths are exercised in the same
spec.
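A sketch of that structure - the feature, paths, and copy are assumptions for illustration; note the user makes a mistake, corrects it, and then succeeds twice in one spec:

RSpec.describe "Making a deposit", type: :system do
  it "lets a customer correct a mistake and fund two goals" do
    customer = create(:customer)
    sign_in customer

    visit new_deposit_path
    fill_in "Amount", with: "-50"
    click_on "Deposit"
    expect(page).to have_content("Amount must be greater than 0")

    fill_in "Amount", with: "50"
    click_on "Deposit"
    expect(page).to have_content("You deposited $50.00")

    visit new_deposit_path
    fill_in "Amount", with: "25"
    click_on "Deposit"
    expect(page).to have_content("You deposited $25.00")
  end
end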
Request Specs Request specs test the traditional responsibilities of the controller. These include authentication, view rendering, selecting an
   http response code, redirecting, and setting cookies. It’s also ok to assert
   that the database was changed in some way in a request spec, but like system
   specs, there is no need for detailed assertions around object state or
   business logic. When controllers are thin and models are tested heavily,
   there should be no need to duplicate business logic test cases from a model
   spec in a request spec. Request specs are not mandatory if the controller
   code paths are exercised in a system spec and they are not doing something
   different from the average controller in your app. For example, a controller
   that has different authorization restrictions because the actions it is
   performing are more dangerous might require additional testing. The main
   exception to these guidelines is when your controller is an API controller
   serving data to another app. In that case, your request spec becomes like
   your system spec, and you should assert that the response body is correct for
   important use cases. API boundary tests are even allowed to be duplicative
   with underlying model specs if the behavior is explicitly important and
   apparent to the consuming application. Request specs for APIs are owned by
   the consuming app’s team to ensure that the invariants that they expect to
   hold are not broken. Below is an example request spec. We like to extract
   standard assertions such as ones relating to authentication into shared
examples. More on shared examples in the section below.
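A sketch of such a request spec - the routes, factories, and the shared example name are assumptions for illustration:

RSpec.describe "Deposits", type: :request do
  it_behaves_like "an authenticated endpoint", :post, "/deposits"

  it "creates a deposit and redirects" do
    user = create(:user)
    sign_in user

    post deposits_path, params: { deposit: { amount: 50 } }

    expect(response).to redirect_to(deposits_path)
    expect(user.deposits.count).to eq(1)
  end
end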
Why don’t we use Controller Specs? Controller specs are notably absent from our guide. We used
   to use controller specs instead of request specs. This was mainly because
   they were faster to run than request specs. However, in modern versions of
   Rails, that has changed. Under the covers, request specs are just a thin
   wrapper around Rails integration tests. In Rails 5+, integration tests have
   been made to run very fast. Rails is so confident in the improvements they’ve
   made to integration tests that they’ve removed controller tests from Rails
   core in Rails 5.1. Additionally, request specs are much more realistic than
   controller specs since they actually exercise the full request / response
   lifecycle – routing, middleware, etc – whereas controller specs circumvent
   much of that process. Given the changes in Rails and the limitations of
   controller specs, we’ve changed our stance. We no longer write controller
   specs. All of the things that we were testing in controller specs can instead
   be tested by some combination of system specs, model specs, and request
   specs. Why don’t we use Feature Specs? Feature specs are also absent from our
   guide. System specs were added to Rails 5.1 core and it is the core team’s
   preferred way to test client-side interactions. In addition, the RSpec team
   recommends using system specs instead of feature specs. In system specs, each
   test is wrapped in a database transaction because it’s run within a Rails
   process, which means we don’t need to use the  DatabaseCleaner gem anymore.
   This makes the tests run faster, and removes the need for having any special
   tables that don’t get cleaned out. Optimal Testing Because we use these three
   different categories of specs, it’s important to keep in mind what each type
   of spec is for to avoid over-testing. Don’t write the same test three times -
   for example, it is unnecessary to have a model spec, request spec, and a
   system spec that are all running assertions on the business logic
   responsibilities of the model. Over-testing takes more development time, can
   add additional work when refactoring or adding new features, slows down the
   overall test suite, and sets the wrong example for others when referencing
   existing tests. Think critically about what each type of spec is intended to
   be doing while writing specs. If you’re significantly exercising behavior not
   in the layer you’re writing a test for, you might be putting the test in the
   wrong place. Testing requires striking a fine balance - we don’t want to
   under-test either. Too little testing doesn’t give any confidence in system
   behavior and does not protect against regressions. Every situation is
   different and if you are unsure what the appropriate test coverage is for a
   particular feature, start a discussion with your team! Other Testing
   Recommendations Consider shared examples for last-mile regression coverage
   and repeated patterns. Examples include request authorization and common
validation/error handling:
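For instance, a shared example along these lines (matching the name assumed in the request spec sketch above) keeps authentication assertions in one place:

RSpec.shared_examples "an authenticated endpoint" do |method, path|
  it "redirects to sign-in when no one is signed in" do
    public_send(method, path)

    expect(response).to redirect_to(sign_in_path)
  end
end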
Each spec’s description begins with an action verb, not a helping verb like "should," "will," or something similar.
   8 min read


 * WEBVALVE – THE MAGIC YOU NEED FOR HTTP INTEGRATION
   
   WebValve – The Magic You Need for HTTP Integration Struggling with HTTP
   integrations locally? Use WebValve to define HTTP service fakes and toggle
   between real and fake services in non-production environments. When I started
   at Betterment (the company) five years ago, Betterment (the platform) was a
   monolithic Java application. As good companies tend to do, it began
   growing—not just in terms of users, but in terms of capabilities. And our
   platform needed to grow along with it. At the time, our application had no
   established patterns or tooling for the kinds of third-party integrations
   that customers were increasingly expecting from fintech products (e.g., like
   how Venmo connects to your bank to directly deposit and withdraw money). We
   were also feeling the classic pain points of a growing team contributing to a
   single application. To keep the momentum going, we needed to transition
   towards a service-oriented architecture that would allow the engineers of
   different business units to run in parallel against their specific business
   goals, creating even more demand for repeatable solutions to service
   integration. This brought up another problem (and the starting point for this
   blog post): in order to ensure tight feedback loops, we strongly believed
   that our devs should be able to do their work on a modern, modestly-specced
   laptop without internet connectivity. That meant no guaranteed connection to
   a cloud service mesh. And unfortunately, it’s not possible to run a local
   service mesh on a laptop without it melting. In short, our devs needed to be
   able to run individual services in isolation; by default they were set to
   communicate with one another, meaning an engineer would have to run all of
   the services locally in order to work on any one service. To solve this
   problem, we developed WebValve—a tool that allows us to define and register
   fake implementations of HTTP services and toggle between real and fake
   services in non-production environments. I’m going to walk you through how we
   got there. Start with the test Here’s a look at what a test would look like
   to see if a deposit from a bank was initiated: The five lines of code on the
   bottom is the meat of the test. Easy right? Not quite. Notice the two WebMock
   stub_requests calls at the top. The second one has the syntax you’d expect to
   execute the test itself. But take a look at the first one—notice the 100+
   lines of (omitted) code. Without getting into the gory details, this
   essentially requires us, for every test we write, to stub a request for user
   data—with differences across minor things like ID values, we can’t share
   these stubs between tests. In short it’s a sloppy feature spec. So how do we
   narrow this feature spec down to something like this? Through the magic of
   libraries. First things first—defining our view of the problem space. The
success of projects like these doesn’t come down to the code itself—it comes
   down to the ‘design’ of the solution based on its specific needs. In this
   case, it meant paring the conditions down to making it work using just rails.
   Those come to life in four major principles, which guide how we engage with
the problem space for our shift to a service-oriented architecture:
  * We use HTTP & REST to communicate with collaborator services
  * We define the boundaries and limit the testing of integrations with contract tests
  * We don't share code across service boundaries
  * Engineers must remain nimble and building features must remain enjoyable.
A little bit of color on each,
   starting with HTTP and REST. For APIs that we build for ourselves (e.g.
   internal services) we have full control over how we build them, so using HTTP
   and REST is no issue. We have a strong preference to use a single integration
   pattern for both internal and external service integrations; this reduces
   cognitive overhead for devs. When we’re communicating with external services,
   we have less control, but HTTP is the protocol of the web and REST has been
   around since 2000—the dawn of modern web applications— so the majority of
   integrations we build will use them. REST is semantic, evolvable, limber, and
   very familiar to us as Rails developers —a natural ‘other side of the coin’
   for HTTP to make up the lingua franca of the web. Secondly, we need to define
   the boundaries in terms of ‘contracts.’ Contracts are a point of exchange
   between the consumption side (the app) and producer side (the collaborator
   service). The contract defines the expectations of input and output for the
   exchange. They’re an alternative to the kind of high-level systems
   integration tests that would include a critical mass of components that would
   render the test slow and non-repeatable. Thirdly, we don't want to have
   shared code across service boundaries. Shared code between services creates
   shared ownership, and shared ownership leads to undesirable coupling. We want
   the API provider to own and version their APIs, and we want the API consumer
   to own their integration with each version of a collaborator service's API.
   If we were willing to accept tight coupling between our services,
   specifically in their API contracts, we'd be well-served by a tool like Pact.
   With Pact, you create a contract file based on the consumer's expectations of
   an API and you share it with the provider. The contract files themselves are
   about the syntax and structure of requests and responses rather than the
   interpretation. There's a human conversation and negotiation to be had about
   these contracts, and you can fool yourself into thinking you don't need to
   have that conversation if you've got a file that guarantees that you and your
   collaborator service are speaking the same language; you may be speaking the
   same words, but you might not infer the same meaning. Pact's docs encourage
   these human conversations, but as a tool it doesn't require them. By avoiding
   shared code between services, we force ourselves to have a conversation about
   every API we build with the consumers of those APIs. Finally, these tests’
   effectiveness is directly related to how we can apply them to reality, so we
   need to be simple—we want to be able to test and build features without
   connections to other features. We want them to be able to work without an
   internet connection, and if we do want to integrate with a real service in
   local development, we should be able to do that—meaning we should be able to
   test and integrate locally at will, without having to rely on cumbersome,
   extra-connected services (think Docker, Kubernetes; anything that pairs cloud
   features with the local environment.) Straightforward tests are easy to
   write, read, and maintain. That keeps us moving fast and not breaking things.
   So, to recap, there are four principles that will drive our solution: Service
   interactions happen over HTTP & REST Contract tests ensure that service
   interactions behave as expected Providing an API contract requires no shared
   code Building features remains fast and fun Okay, okay, but how? So we’ve
   established that we don’t want to hit external services in tests, which we
   can do through WebMock or similar libraries. The challenge becomes: how do we
   replicate the integration environment without the integration environment?
   Through fakes. We’ll fake the integration by using Sinatra to build a rack
   app that quacks like the real thing. In the rack app, we define the routes we
   care about for the things we normally would have stubbed in the tests. From
   here, we do the things we couldn’t do before—pull real parameters out of the
   requests and feed them back into the fake response to make it more realistic.
   Additionally, we can use things like ActiveRecord to make these fake
   responses even more realistic based on the data stored in our actual
   database. So what does the fake look like? It's a class with a route defined
   for each URL we care about faking. We can use WebMock to wire the fake to
   requests that match a certain pattern. If we receive a request for a URL we
   didn't define, it will 404. Simple. However, this doesn’t allow us to solve
However, this doesn’t allow us to solve all the things we were working for. What’s missing? First, an idiomatic setup
   stance. We want to be able to define fakes in a single place, so when we add
   a new one, we can easily find it and change it. In the same vein, we want to
   be able to answer similar questions about registering fakes in one spot.
   Finally, convention over configuration—if we can load, register, and wire-up
   a fake based on its name, for example, that would be handy. Secondly, it’s
   missing environment-specific behavior, which in this case, translates into
   the ability to toggle the library on and off and separately toggle the
   connection to specific collaborator services on and off. We need to be able
   to have the library active when running tests or doing local development, but
   do not want to have it running in a production environment—if it remains
   active in a real environment, it might affect real customer accounts, which
   we cannot afford. But, there will also be times when we're running in a local
   development environment and we want to communicate with a real collaborator
   service to do some true integration testing. Thirdly, we want to be able to
   autoload our fakes. If they’re in our codebase, we should be able to iterate
   on the fakes without having to restart our server; the behavior isn’t always
   right the first time, and restarting is tedious and it's not the Rails Way.
   Finally, to bolt this on to an IRL application, we need the ability to define
   fakes incrementally and migrate them into existing integrations that we have,
   one by one. Okay brass tacks. No existing library allows us to integrate this
   way and map HTTP requests to in-process fakes for integration and
   development. Hence, WebValve. TL;DR—WebValve is an open-source gem that uses
   Sinatra and WebMock to provide fake HTTP service behavior. The special sauce
   is that it works for more than just your tests. It allows you to run your
   fakes in your dev environment as well, providing functionality akin to real
   environments with the toggles we need to access the real thing when we need
   to. Let’s run it through the gauntlet to show how it works and how it solves
   for all our requirements. First we add the gem to our Gemfile and run bundle
   install. With the gem installed, we can use the  generator rails g
   webvalve:install to bootstrap a default config file where we can register our
   fakes. Then we can generate a fake for our "trading" collaborator service
   using rails generate webvalve:fake_service Trading. This gives us a class in
   a conventional location that inherits from WebValve::FakeService. This looks
   very similar to a Sinatra app, and that's because it is one—with some
   additional magic baked in. To make this fake work, all we have to do is
define the conventionally-named environment variable, TRADING_API_URL. That
   tells WebValve what requests to intercept and route to this fake. By
   inheriting from this WebValve class, we gain the ability to toggle the fake
   behavior on or off based on another conventionally-named environment
   variable, in this case TRADING_ENABLED. So let’s take our feature spec.
First, we configure our test suite to use WebValve with the RSpec config
   helper require 'webvalve/rspec'. Then, we look at the user API call—we define
   a new route for user, in FakeTrading. Then we flesh out that fake route by
   scooping out our json from the test file and probably making it a little more
   dynamic when we drop it into the fake. Then we do the same for the deposit
API call.
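Sketching what that fake might look like by this point (routes and payloads assumed):

class FakeTrading < WebValve::FakeService
  get "/api/users/:id" do
    content_type :json
    { id: params[:id], first_name: "Jane" }.to_json
  end

  post "/api/deposits" do
    content_type :json
    { id: SecureRandom.uuid, status: "initiated" }.to_json
  end
end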
And now our test, which doesn’t care about the specifics of either of those API calls, is much clearer. It looks just like our ideal spec from
   before: We leverage all the power of WebMock and Sinatra through our
   conventions and the teeniest configuration to provide all the same
   functionality as before, but we can write cleaner tests, we get the ability
   to use these fakes in local development instead of the real services—and we
   can enable a real service integration without missing a beat. We’ve achieved
   our goal—we’ve allowed for all the functionality of integration without the
   threats of actual integration. Check it out on GitHub. This article is part
   of Engineering at Betterment.
   11 min read


 * BUILDING FOR BETTER: GENDER INCLUSION AT BETTERMENT
   
   Building for Better: Gender Inclusion at Betterment Betterment sits at the
   intersection of two industries with large, historical gender gaps. We’re
   working to change that—for ourselves and our industries. Since our founding,
   we’ve maintained a commitment to consistently build a better company and
   product for our customers and our customers-to-be. Part of that commitment
   includes reflecting the diversity of those customers. Betterment sits at the
   intersection of finance and technology—two industries with large, historical
   diversity gaps, including women and underrepresented populations. We’re far
   from perfect, but this is what we’re doing to embrace the International
   Women’s Day charge and work toward better gender balance at Betterment and in
   our world. Building Diversity And Inclusion At Betterment Change starts at
   the heart of the matter. For Betterment, this means working to build a
   company of passionate individuals who reflect our customers and bring new and
   different perspectives to our work. Our internal Diversity and Inclusion
   Committee holds regular meetings to discuss current events and topics,
   highlights recognition months (like Black History and Women’s History
   Months), and celebrates the many backgrounds and experiences of our
   employees. We’ve also developed a partnership with Peoplism. According to
   Caitlin Tudor-Savin, HR Business Partner, “This is more than a check-the-box
   activity, more than a one-off meeting with an attendance sheet. By partnering
   with Peoplism and building a long-term, action-oriented plan, we’re working
   to create real change in a sustainable fashion.” One next step we’re excited
   about is an examination of our mentorship program to make sure that everyone
   at Betterment has access to mentors. The big idea: By building empathy and
   connection among ourselves, we can create an inclusive environment that
   cultivates innovative ideas and a better product for our customers. Engaging
The Tech Community At Large At Betterment, we’re working to create change
   in the tech industry and bringing women into our space. By hosting meetups
   for Women Who Code, a non-profit organization that empowers women through
   technology, we’re working to engage this community directly. Rather than
   getting together to hear presentations, meetups are designed to have a
   group-led dynamic. Members break out and solve problems together, sharing and
   honing skills, while building community and support. This also fosters
   conversation, natural networking, and the chance for women to get their foot
   in the door. Jess Harrelson, a Betterment Software Engineer, not only leads
our hosting events but also found a path to Betterment through Women Who Code.
   “Consistency is key,” said Jess. “Our Women Who Code meetups become a way to
   track your progression. It’s exciting to see how I’ve developed since I first
   started attending meetups, and how some of our long-time attendees have grown
   as engineers and as professionals.” Building A Community Of Our Own In 2018,
   our Women of Betterment group had an idea. They’d attended a number of
   networking and connection events, and the events never felt quite right. Too
   often, the events involved forced networking and stodgy PowerPoint
   presentations, with takeaways amounting to little more than a free glass of
   wine. Enter the SHARE (Support, Hire, Aspire, Relate, Empower) Series.
   Co-founder Emily Knutsen wanted “to build a network of diverse individuals
   and foster deeper connections among women in our community.” Through the
   SHARE Series, we hope to empower future leaders in our industry to reach
   their goals and develop important professional connections. While the series
   focuses on programming for women and those who identify as women, it is
   inclusive of everyone in our community who wishes to be an ally and support our
   mission. We developed the SHARE Series to create an authentic and
   conversational environment, one where attendees help guide the conversations
   and future event themes. Meetings thus far have included a panel discussion
   on breaking into tech from the corporate world and a small-group financial
   discussion led by financial experts from Betterment and beyond. “We’re
   excited that organizations are already reaching out to collaborate,” Emily
   said. “We’ve gotten such an enthusiastic response about designing future
   events around issues that women (and everyone!) face, such as salary
   negotiations.” Getting Involved Want to join us as we work to build a more
   inclusive and dynamic community? Our next SHARE Series event features CBS
   News Business Analyst and CFP® professional Jill Schlesinger, as we celebrate
   her new book, The Dumb Things Smart People Do with Their Money: Thirteen Ways
   to Right Your Financial Wrongs. You can also register to attend our Women Who
   Code meetups, and join engineers from all over New York as we grow, solve,
   and connect with one another.
   4 min read


 * CI/CD: STANDARDIZING THE INTERFACE
   
   CI/CD: Standardizing the Interface Meet our CI/CD platform, Coach, and learn
   how we increased consistent adoption of Continuous Integration (CI) across
   our engineering organization. And why that’s important. This is the second
   part of a series of posts about our new CI/CD platform, Coach. Part
   I explores several design choices we made in building out our notifications
   pipeline and describes how those choices are emblematic of our overarching
   engineering principles here at Betterment. Today I’d like to talk about how
   we increased consistent adoption of Continuous Integration (CI) across our
   engineering organization, and why. Our Principles in Action: Standardizing
   the Interface At Betterment, we want to empower our engineers to do their
   best work. CI plays an important role in all of our teams’ workflows. Over
   time, a handful of these teams formed deviating opinions on what kind of
   acceptance criteria they had for CI. While we love the concern that our
   engineers show toward solving these problems, these deviations became
   problematic for applications of the same runtime that should abide by the
   same set of rules; for example, all Ruby apps should run RSpec and Rubocop,
   not just some of them. In building a platform as a service (PaaS), we
   realized that in order to mitigate the problem of nurturing pets vs herding
   cattle we would need to identify a firm set of acceptance criteria for
   different runtimes. In the first post of this series we mention one of our
   principles, Standardize the Pipeline. In this post, we’ll explore that
   principle and dive into how we committed 5,000-line configuration files to our
   repositories with confidence by standardizing CI for different runtimes,
   automating configuration generation in code, and testing the process that
   generates that configuration. What’s so good about making everything
   the same? Our goals in standardizing the CI interface were to: Make it easier
   to distribute new CI features more quickly across the organization. Onboard
   new applications more quickly. Ensure the same set of acceptance criteria is
   in place for all codebases in the org. For example, by assuming that any Java
   library will run the PMD linter and unit tests in a certain way we can
   bootstrap a new repository with very little effort. Allow folks outside of
   the SRE team to contribute to CI. In general, our CI platform categorizes
   projects into applications and libraries and divides those up further by
   language runtime. Combined together we call this a project_type. When we make
   improvements to one project type’s base configuration, we can flip a switch
   and turn it on for everyone in the org at once. This lets us distribute
   changes across the org quickly. How we managed to actually execute on this
   will become clearer in the next section, but for the sake of
   hand-wavy-expediency, we have a way to run a few commands and distribute CI
   changes to every project in a matter of minutes. How did we do it? Because we
   use CircleCI for our CI pipelines, we knew we would have to define our
   workflows using their DSL inside a .circleci/config.yml file at the root of a
   project’s repository. With this blank slate in front of us we were able to
   iterate quickly by manually adding different jobs and steps to that file. We
   would receive immediate feedback in the CircleCI interface when those jobs
   ran, and this feedback loop helped us iterate even faster. Soon we were
   solving for our acceptance criteria requirements left and right — that Java
   app needs the PMD linter! This Ruby app needs to run integration tests! And
   then we reached the point where manual changes were hindering our
   productivity. The .circleci/config.yml file was getting longer than a
   thousand lines fast, partly because we didn’t want to use any YAML shortcuts
   to hide away what was being run, and partly because there were no
   higher-level mechanisms available at the time for re-use when writing YAML
   (e.g. CircleCI’s orbs). Defining the system Our solution to this problem was
   to build a system, a Coach CLI for our Coach app, designed according to CLI
   12-factor conventions. This system’s primary goal is to
   create .circleci/config.yml files for repositories to encapsulate the
   necessary configuration for a project’s CI pipeline. The CLI reads a small
   project-level configuration definition file (coach.yml) located in a
   project’s directory and extrapolates information to create the much larger
   repo-level CircleCI specific configuration file (.circleci/config.yml), which
   we were previously editing ourselves. To clarify the hierarchy of how we
   thought about CI, here are the high level terms and components of our Coach
   CLI system: There are projects. Each project needs a configuration definition
   file (coach.yml) that declares its project_type. We
   support wordpress_app, java_library, java_app, ruby_gem, ruby_app,
   and javascript_library for now. There are repos; each repo has one or more
   projects of any type. There needs to be a way to set up a new project. There
   needs to be a way to idempotently generate the CircleCI configuration
   (.circleci/config.yml) for all the projects in a repo at once. Each project
   needs to be built, tested, and linted. We realized that the dependency graph
   of repository → projects → project jobs was complicated enough that we would
   need to recreate the entire .circleci/config.yml file whenever we needed to
   update it, instead of just modifying the YAML file in place. This was one
   reason for automating the process, but the downsides of human-managed
   software were another. Manual updates to this file allow the configuration
   for infrequently-modified projects to drift. And leaving it up to engineers
   to own their own configuration lets folks modify the file in an unsupported
   way which could break their CI process. And then we’re back to square one. We
   decided to create that large file by ostensibly concatenating smaller
   components together. Each of those smaller components would be the output of
   specific functions, and each of those functions would be written in code and
   be tested. The end result was a lot of small files that look a little like
   this:
   https://gist.github.com/agirlnamedsophia/4b4a11acbe5a78022ecba62cb99aa85a
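   As a rough sketch of that approach (hypothetical class and job names, not
   Coach’s actual code), one of those small components can be a plain Ruby
   object whose output is dumped to YAML and asserted on in a unit test:
   require "yaml"

   # One small, testable piece of the generated CircleCI configuration.
   class RubyLintJob
     def initialize(project_name)
       @project_name = project_name
     end

     # Returns plain data; the CLI concatenates many of these and dumps the
     # result into .circleci/config.yml.
     def to_h
       {
         "#{@project_name}_lint" => {
           "docker" => [{ "image" => "circleci/ruby:2.6" }],
           "steps"  => ["checkout", { "run" => "bundle exec rubocop" }],
         },
       }
     end
   end

   puts YAML.dump(RubyLintJob.new("coach").to_h)
   A unit test can assert on that hash (or the dumped YAML) without ever
   touching CircleCI, which is what makes regenerating the whole file safe.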
   Every time we make a change to the Coach CLI codebase we are confident that
   the thousands of lines of YAML that are idempotently generated as a result of
   the coach update ci command will work as expected because they’re already
   tested in isolation, in unit tests. We also have a few heftier integration
   tests to confirm our expectations. And no one needs to manually edit
   the .circleci/config.yml file again. Defining the Interface In order to
   generate the .circleci/config.yml that details which jobs to run and what
   code to execute we first needed to determine what our acceptance criteria
   was. For each project type we knew we would need to support: Static code
   analysis Unit tests Integration tests Build steps Test reports We define the
   specific jobs a project will run during CI by looking at
   the project_type value inside a project’s coach.yml. If the value
   for project_type is ruby_app then the .circleci/config.yml generator will
   follow certain conventions for Ruby programs, like including a job to run
   tests with RSpec or including a job to run static analysis commands
   like Rubocop and Brakeman. For Java apps and libraries we run integration and
   unit tests by default as well as PMD as part of our static code analysis.
   Here’s an example configuration section for a single job, the linter job for
   our Coach repository:
   https://gist.github.com/agirlnamedsophia/4b4a11acbe5a78022ecba62cb99aa85a And
   here’s an example of the Ruby code that helps generate that result:
   https://gist.github.com/agirlnamedsophia/a96f3a79239988298207b7ec72e2ed04 For
   each job that is defined in the .circleci/config.yml file, according to the
   project type’s list of acceptance criteria, we include additional steps to
   handle notifications and test reporting. By knowing that the Coach app is
   a ruby_app we know how many jobs will need to be run and when. By writing that
   YAML inside of Ruby classes we can grow and expand our pipeline as needed,
   trusting that our tests confirm the YAML looks how we expect it to look. If
   our acceptance criteria change, because everything is written in code, adding
   a new job involves a simple code change and a few tests, and that’s it. We’ll
   go into contributing to our platform in more detail below. Onboarding a
   new project One of the main reasons for standardizing the interface and
   automating the configuration generation was to onboard new applications more
   quickly. To set up a new app all you need to do is be in the directory for
   your project and then run coach create project --type $project_type.
   -> % coach create project --type ruby_app
   'coach.yml' configuration file added -- update it based on your project’s needs
   When you run that, the CLI creates
   the small coach.yml configuration definition file discussed earlier. Here’s
   what an example Ruby app’s coach.yml looks like:
   https://gist.github.com/agirlnamedsophia/2f966ab69ba1c7895ce312aec511aa6b The
   CLI will refer back to a project’s coach.yml to decide what kind of CircleCI
   DSL needs to be written to the .circleci/config.yml file to wire up the right
   jobs to run at the right time. Though our contract with projects of different
   types is standardized, we permit some level of customization.
   The coach.yml file allows our users to define certain characteristics of
   their CI flow that vary and require more domain knowledge about a specific
   project: like the level of test parallelism their application test suite
   requires, or the list of databases required for tests to run, or an attribute
   composed of a matrix of Ruby versions and Gemfiles to run the whole test
   suite against. Using this declarative configuration is more extensible and
   more user friendly and doesn’t break the contract we’ve put in place for
   projects that use our CI platform. Contributing to CI Before, if you wanted
   to add an additional linter or CI tool to our pipeline, it would require
   adding a few lines of untested bash code to an existing Jenkins job, or
   adding a new job to a precarious graph of jobs, and crossing your fingers
   that it would “just work.” The addition couldn’t be tested and it was often
   only available to one project or one repository at a time. It couldn’t scale
   out to the rest of the org with ease. Now, updating CI requires opening a PR
   to make the change. We encourage all engineers who want to add to their own
   CI pipeline to make changes on a branch from our Coach repository, where all
   the configuration generation magic happens, verify its effectiveness for
   their use-case, and open a pull request. If it’s a reasonable addition to CI,
   our thought is that everyone should benefit. By having these changes in
   version control, each addition to the CI pipeline goes through code review
   and requires tests be written. We therefore have the added benefit of knowing
   that updates to CI have been tested and are deemed valid and working before
   they’re distributed, and we can prevent folks from removing a feature without
   considering the impact it may have. When a PR is merged, our team takes care
   of redistributing the new version of the library so engineers can update
   their configuration. CI is now a mechanism for instantly sharing the benefits
   of discovery made in isolated exploration, with everyone. Putting it
   all together Our configuration generator is doing a lot more than just taping
   together jobs in a workflow — we evaluate dependency graphs and only run
   certain jobs that have upstream changes or are triggered themselves. We built
   our Coach CLI into the Docker images we use in CircleCI and so those Coach
   CLI commands are available to us from inside the .circleci/config.yml file.
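   The dependency-graph piece can be sketched roughly like this (hypothetical
   data and method names; the real CLI is considerably more involved):
   # Map each project to the path prefixes it depends on.
   DEPENDENCIES = {
     "coach_cli" => ["coach_cli/"],
     "web_app"   => ["web_app/", "shared/"],
     "ruby_gem"  => ["gems/ruby_gem/"],
   }.freeze

   # Given the paths changed upstream, decide which projects need their jobs run.
   def projects_to_build(changed_paths)
     DEPENDENCIES.select do |_project, prefixes|
       changed_paths.any? { |path| prefixes.any? { |prefix| path.start_with?(prefix) } }
     end.keys
   end

   projects_to_build(["shared/helpers.rb"]) # => ["web_app"]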
   The CLI handles notifications, artifact generation, and deployment triggers.
   As we stated in our requirements for Coach in the first post, we believe
   there should be one way to test code, and one way to deploy it. To get there
   we had to make all of our Java apps respond to the same set of commands, and
   all of our Ruby apps to do the same. Our CLI and the accompanying conventions
   make that possible. When before it could take weeks of both product
   engineering and SRE time to set up CI for an application or service within a
   complex ecosystem of bash scripts and Jenkins jobs and application
   configuration, now it takes minutes. When before it could take days or weeks
   to add a new step to a CI pipeline, now it takes hours of simple code review.
   We think engineers should focus on what they care about the most, shipping
   great features quickly and reliably. And we think we made it a little easier
   for them (and us) to do just that. What’s Next? Now that we’ve wrangled our
   CI process and encoded the best practices into a tool, we’re ready to tackle
   our Continuous Deployment pipeline. We’re excited to see how the model of
   projects and project types that we built for CI will evolve to help us
   templatize our Kubernetes deployments. Stay tuned.
   11 min read


 * CI/CD: SHORTENING THE FEEDBACK LOOP
   
   CI/CD: Shortening the Feedback Loop As we improve and scale our CD platform,
   shortening the feedback loop with notifications was a small, effective, and
   important piece. Continuous Delivery (CD) at scale is hard to get right. At
   Betterment, we define CD as the process of making every small change to our
   system shippable as soon as it’s been built and tested. It’s part of the
   CI/CD (continuous integration and continuous delivery) process. We’ve been
   doing CD at Betterment for a long time, but it had grown to be quite a
   cumbersome process over the last few years because our infrastructure and
   tools hadn’t evolved to meet the needs of our growing engineering team. We
   reinvented our Site Reliability Engineering (SRE) team last fall with our
   sights set on building software to help developers move faster, be happier,
   and feel empowered. The focus of our work has been on delivering a platform
   as a service to make sense of the complex process of CD. Coach is the
   beginning of that platform. Think of something like Heroku, but for engineers
   here at Betterment. We wanted to build a thoughtfully composed platform based
   on the tried and true principles of 12-factor apps. In order to build this,
   we needed to do two overhauls: 1) Build a new CI pipeline and 2) Build a new
   CD pipeline. Continuous Integration — Our Principles For years, we
   used Jenkins, an open-source tool for automation, and a mess of scripts to
   provide CI/CD to our engineers. Jenkins is a powerful tool and well-used in
   the industry, but we decided to cut it because the way that we were using it
   was wrong, we weren’t pleased with its feature set, and there was too much
   technical debt to overcome. Tests were flakey and we didn’t know if it was
   our Jenkins setup, the tests themselves, or both. Dozens of engineers
   contribute to our biggest repository every day and as the code base and
   engineering team have grown, the complexity of our CI story has increased and
   our existing pipeline couldn’t keep up. There were task forces cobbled
   together to drive up reliability of the test suite, to stamp out flakes, to
   rewrite, and to refactor. This put a band-aid on the problem for a short
   while. It wasn’t enough. We decided to start fresh with CircleCI, an
   alternative to Jenkins that comes with a lot more opinions, far fewer rough
   edges, and a lot more stability built-in. We built a tool (Coach) to make the
   way that we build and test code conventional across all of our apps,
   regardless of language, application owner, or business unit. As an added
   bonus, since our CI process itself was defined in code, if we ever need to
   switch platforms again, it would be much easier. Coach was designed and built
   with these principles: Standardize the pipeline — there should be one way to
   test code, and one way to deploy it Test code often — code should be tested
   as often as it’s committed Build artifacts often — code should be built as
   often as it’s tested so that it can be deployed at any time Be environment
   agnostic — artifacts should be built in an environment-agnostic way with
   maximum portability Give consistent feedback — the CI output should be
   consistent no matter the language runtime Shorten the feedback
   loop — engineers should receive actionable feedback as soon as possible
   Standardizing CI was critical to our growth as an organization for a number
   of reasons. It ensures that new features can be shipped more quickly, it
   allows new services to adopt our standardized CI strategy with ease, and it
   lets us recover faster in the face of disaster — a hurricane causing a power
   outage at one of our data centers. Our goal was to replace the old way of
   building and testing our applications (what we called the “Old World”) and
   start fresh with these principles in mind (what we deemed the “New World”).
   Using our new platform to build and test code would allow our engineers to
   receive automated feedback sooner so they could iterate faster. One of our
   primary aims in building this platform was to increase developer velocity, so
   we needed to eliminate any friction from commit to deploy. Friction here
   refers to ambiguity of CI results and the uncertainty of knowing where your
   code is in the CI/CD process. Shortening the feedback loop was one of the
   first steps we took in building out our new platform, and we’re excited to
   share the story of how we designed that solution. Our Principles in Action:
   Shortening the Feedback Loop The feedback loop in the Old World run by
   Jenkins was one of the biggest hurdles to overcome. Engineers never really
   knew where their code was in the pipeline. We use Slack, like a lot of other
   companies, so that part of the messaging story wouldn’t change, but there
   were bugs we needed to fix and design flaws we needed to update. How much
   feedback should we give? When do we want to give feedback? How detailed
   should our messages be? These were some of the questions we asked ourselves
   during this part of the design phase. What our Engineers Needed For pull
   requests, developers would commit code and push it up to GitHub and then
   eventually they would receive a Slack message that said “BAD” for every test
   suite that failed, or “GOOD” if everything passed, or nothing at all in the
   case of a Jenkins agent getting stuck and hanging forever. The notifications
   were slightly more nuanced than good/bad, but you get the idea. We valued
   sending Slack messages to our engineers, as that’s how the company
   communicates most effectively, but we didn’t like the rate of communication
   or the content of those messages. We knew both of those would need to change.
   As for merges into master, the way we sent Slack messages to communicate to
   engineering teams (as opposed to just individuals) was limited because of how
   our CI/CD process was constructed. The entire CI and CD process happened as a
   series of interwoven Jenkins freestyle jobs. We never got the logic quite
   right around determining whose code was being deployed — the deploy logic was
   contingent on a pretty rough shell script called from inside a Jenkins job. The
   best we had was a Slack message that was sent roughly five minutes before a
   deploy began, tagging a good estimation of contributors but often missing
   someone if their Github email address was different from their Slack email
   address. More critically, the one-off script solution wasn’t stored in source
   control, therefore it wasn’t tested. We had no idea when it failed or missed
   tagging some contributors. We liked notifying engineers when a deploy began,
   but we needed to be more accurate about who we were notifying. What our SRE
   Team Needed Our design and UX was informed by what our engineers using our
   platform needed, but Coach was built based on our needs. What did we need?
   Well-tested code stored in version control that could easily be changed and
   developed. All of the code that handles changesets and messaging logic in the
   New World is written in one central location, and it’s tested in isolation.
   Our CI/CD process invokes this code when it needs to, and it works great. We
   can be confident that the right people are notified at the right time because
   we wrote code that does that and we tested it. It’s no longer just a script
   that sometimes works and sometimes doesn’t. Because it’s in source control
   and it runs through its own CI process, we can also easily roll out changes
   to notifications without breaking things. We wanted to build our platform
   around what our engineers would need to know, when they need to know it, and
   how often. And so one of the first components we built out was this new
   communication pipeline. Next we’ll explore in more detail some of our design
   choices regarding the content of our messages and the rate at which we send
   them. Make sure our engineers don’t mute their Slack notifications In leaving
   the Old World of inconsistent and contextually sparse communication we looked
   at our blank canvas and initially thought “every time the tests pass, send a
   notification! That will reduce friction!” So we tried that. If we merged code
   into a tracked branch — a branch that multiple engineers contribute to, like
   master — for one of our biggest repos, which contained 20 apps and 20 test
   suites, we would be notified at every transition: every rubocop failure,
   every flakey occurrence of a feature test. We quickly realized it was too
   much. We sat back and thought really hard about what we would want,
   considering we were dogfooding our own pipeline. How often did we want to be
   notified by the notification system when our tests that tested the code that
   built the notification system, succeeded? Sheesh, that’s a mouthful. Our
   Slack bot could barely keep up! We decided it was necessary to be told
   only once when everything ran successfully. However, for failures, we didn’t
   want to sit around for five minutes crossing our fingers hoping that
   everything was successful only to be told that we could have known three
   minutes earlier that we’d forgotten a newline at the end of one of our files.
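   A stripped-down sketch of that behavior (a hypothetical class, not the code
   Coach actually runs) might look like:
   # Speak up on the first red result, stay quiet about repeats, and send a
   # single message once every job in the workflow has finished green.
   class BuildNotifier
     def initialize(slack)
       @slack = slack
       @failure_reported = false
     end

     def job_finished(job_name, status:, all_done:)
       if status == :failed && !@failure_reported
         @failure_reported = true
         @slack.post("#{job_name} failed - details in CircleCI")
       elsif all_done && !@failure_reported
         @slack.post("All checks passed!")
       end
     end
   end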
   Additionally, in CircleCI where we can easily parallelize our test suites, we
   realized we wouldn’t want to notify someone for every chunk of the test suite
   that failed, just the first time a failure happened for the suite. We came up
   with a few rules to design this part of the system: Let the author know as
   soon as possible when something is red but don’t overdo it for redundant
   failures within the same job (e.g. if unit tests ran on 20 containers and 18
   of them saw failures, only notify once) Only notify once about all the green
   things Give as much context as possible without being overwhelming:
   be concise but clear Next we’ll explore the changes we made in content. What
   to say when things fail This is what engineers would see in the Old World
   when tests failed for an open pull request: Among other deficiencies, there’s
   only one link and it takes us to a Jenkins job. There’s no context to orient
   us quickly to what the notification is for. After considering what we were
   currently sending our engineers, we realized that 1) context and
   2) status were the most important things to communicate, which were the
   aspects of our old messaging that were suffering the most. Here’s what we
   came up with: Thanks Coach bot! Right away we know what’s happened. A PR
   build failed. It failed for a specific GitHub
   branch (“what-to-say-when-things-fail-branch”), in a specific
   repo (“Betterment/coach”), for a specific PR (#430), for a specific job in the
   test suite (“coach_cli — lint (Gemfile)”). We can click on any of these links
   and know exactly where they go based on the logo of the service. Messages
   about failures are now actionable and full of context, prompting the engineer
   to participate in CI, to go directly to their failures or to their PR. And
   this bounty of information helps a lot if the engineer has multiple PRs open
   and needs to quickly switch context. The messaging that happened for failures
   when you merged a pull request into master was a little different in that it
   included mentions for the relevant contributors (maybe all of them, if we
   were lucky!): The New World is cleaner, easier to grok, and more immediately
   helpful: The link title to GitHub is the commit diff itself, and it takes you
   to the compare URL for that changeset. The CircleCI info includes the title
   of the job that failed (“coach_cli — lint (Gemfile)”), the build number
   (“#11389”) to reference for context in case there are multiple occurrences of
   the failure in multiple workflows, a link to the top-level “Workflow”, and @s
   for each contributor. What to say when things succeed We didn’t change the
   frequency of messaging for success — we got that right the first time around.
   You got one notification message when everything succeeded and you still do.
   But in the Old World there wasn’t enough context to make the message
   immediately useful. Another disappointment we had with the old messaging was
   that it didn’t make us feel very good when our tests passed. It was just a
   moment in time that came and went: In the New World we wanted to proclaim
   loudly (or as loudly as you can proclaim in a Slack message) that the pull
   request was successful in CI: Tada! We did it! We wanted to maintain the same
   format as the new failure messages for consistency and ease of reading. The
   links to the various services we use are in the same order as our new failure
   messages, but the link to CircleCI only goes to the workflow that shows the
   graph of all the tests and jobs that ran. It’s delightful and easy to parse
   and has just the right amount of information. What’s next? We have big dreams
   for the future of this platform with more and more engineers using our
   product. Shortening the feedback loop with notifications is only one small,
   but rather important, part of our CD platform. In the next post of this
   series on CD, we’ll explore how we committed 5,000-line configuration files to
   our repositories with confidence by standardizing CI for different runtimes,
   automating config generation in code, and testing that code generation. We
   believe in a world where shipping code, even in really large codebases with
   lots of contributors, should be done dozens of times a day. Where engineers
   can experience feedback about their code with delight and simplicity. We’re
   building that at Betterment.
   12 min read


 * SHH… IT’S A SECRET: MANAGING SECRETS AT BETTERMENT
   
   Shh… It’s a Secret: Managing Secrets at Betterment Opinionated secrets
   management that helps us sleep at night. Secrets management is one of those
   things that is talked about quite frequently, but there seems to be little
   consensus on how to actually go about it. In order to understand our journey,
   we first have to establish what secrets management means (and doesn’t mean)
   to us. What is Secrets Management? Secrets management is the process of
   ensuring passwords, API keys, certificates, etc. are kept secure at every
   stage of the software development lifecycle. Secrets management does NOT mean
   attempting to write our own crypto libraries or cipher algorithms. Rolling
   your own crypto isn’t a great idea. Suffice it to say, crypto will not be the
   focus of this post. There’s such a wide spectrum of secrets management
   implementations out there ranging from powerful solutions that require a
   significant amount of operational overhead, like Hashicorp Vault, to
   solutions that require little to no operational overhead, like a .env file.
   No matter where they fall on that spectrum, each of these solutions has
   tradeoffs in its approach. Understanding these tradeoffs is what helped our
   Engineering team at Betterment decide on a solution that made the most sense
   for our applications. In this post, we’ll be sharing that journey. How it
   used to work We started out using Ansible Vault. One thing we liked about
   Ansible Vault is that it allows you to encrypt a whole file or just a string.
   We valued the ability to encrypt just the secret values themselves and leave
   the variable name in plain-text. We believe this is important so that we can
   quickly tell which secrets an app is dependent on just by opening the file.
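   To illustrate the shape of that, here is a concept-only sketch (a local AES
   key standing in for what Ansible Vault, and later sops with KMS, actually
   do): the variable names stay readable, only the values are ciphertext.
   require "openssl"
   require "base64"
   require "yaml"

   key = OpenSSL::Cipher.new("aes-256-gcm").random_key

   def encrypt(value, key)
     cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
     cipher.key = key
     iv = cipher.random_iv
     ciphertext = cipher.update(value) + cipher.final
     Base64.strict_encode64(iv + cipher.auth_tag + ciphertext)
   end

   secrets = { "PAYMENT_API_KEY" => "not_a_real_key", "DB_PASSWORD" => "hunter2" }
   puts YAML.dump(secrets.transform_values { |v| encrypt(v, key) })
   # PAYMENT_API_KEY: <base64 blob>   <- names stay greppable
   # DB_PASSWORD: <base64 blob>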
   So the string option was appealing to us, but that workflow didn’t have the
   best editing experience as it required multiple steps in order to encrypt a
   value, insert it into the correct file, and then export it into the
   environment like the 12-factor app methodology tells us we should. At the
   time, we also couldn’t find a way to federate permissions with Ansible Vault
   in a way that didn’t hinder our workflow by causing a bottleneck for
   developers. To assist us in expediting this workflow, we had an alias in our
   bash_profiles that allowed us to run a shortcut at the command line to
   encrypt the secret value from our clipboard and then insert that secret value
   in the appropriate Ansible variables file for the appropriate environment.
   alias prod-encrypt="pbpaste | ansible-vault encrypt_string --vault-password-file=~/ansible-vault/production.key"
   This wasn’t the worst
   setup, but didn’t scale well as we grew. As we created more applications and
   hired more engineers, this workflow became a bit much for our small SRE team
   to manage and introduced some key-person risk, also known as the Bus Factor.
   We needed a workflow with less of a bottleneck, but allowing every developer
   access to all the secrets across the organization was not an acceptable
   answer. We needed a solution that not only maintained our security posture
   throughout the software development lifecycle, but also enforced our opinions
   about how secrets should be managed across environments. Decisions,
   decisions… While researching our options, we happened upon a tool
   called sops. Maintained and open-sourced by Mozilla, sops is a command line
   utility written in Go that facilitates slick encryption and decryption
   workflows by using your terminal’s default editor. Sops encrypts and decrypts
   your secret values using your cloud provider’s Key Management Service (AWS
   KMS, GCP KMS, Azure Key Vault) and PGP as a backup in the event those
   services are not available. It leaves the variable name in plain-text while
   only encrypting the secret value itself and supports YAML, JSON, or binary
   format. We use the YAML format because of its readability and terseness. See
   a demo of how it works. We think this tool works well with the way we think
   about secrets management. Secrets are code. Code defines how your application
   behaves. Secrets also define how your application behaves. So if you can
   encrypt them safely, you can ship your secrets with your code and have a
   single change management workflow. Github pull request reviews do software
   change management right. YAML does human readable key/value storage right.
   AWS KMS does anchored encryption right. AWS Regions do resilience right. PGP
   does irreversible encryption better than anything else readily available and
   is broadly supported. In sops, we’ve found a tool that combines all of these
   things enabling a workflow that makes secrets management easier. Who’s
   allowed to do what? Sops is a great tool by itself, but operations security
   is hard. Key handling and authorization policy design is tricky to get right
   and sops doesn’t do it all for us. To help us with that, we took things a
   step further and wrote a wrapper around sops we call sopsorific. Sopsorific,
   also written in Go, makes a few assumptions about application environments.
   Most teams need to deploy to multiple environments: production, staging,
   feature branches, sales demos, etc. Sopsorific uses the term “ecosystem” to
   describe this concept, as well as collectively describe a suite of apps that
   make up a working Betterment system. Some ecosystems are ephemeral and some
   are durable, but there is only one true production ecosystem holding
   sensitive PII (Personally Identifiable Information) and that ecosystem must
   be held to a higher standard of access control than all others. To capture
   that idea, we introduced a concept we call “security zones” into sopsorific.
   There are only two security zones per GitHub repository — sensitive, and
   non-sensitive — even if there are multiple apps in a repository. In the case
   of mono-repos, if an app in that repository shouldn’t have its secrets
   visible to all engineers who work in that repository, then the app belongs in
   a different repository. With sopsorific, secrets for the non-sensitive zone
   can be made accessible to a broader subset of the app team than sensitive
   zone secrets, helping to eliminate some of the bottleneck issues we’ve experienced
   with our previous workflow. By default, sopsorific wants to be configured
   with a production (sensitive zone) secrets file and a default (non-sensitive
   zone) secrets file. The default file makes it easy to spin up new
   non-sensitive one-off ecosystems without having to redefine every secret in
   every ecosystem. It should “just work” unless there are secrets that have
   different values than already configured in the default file. In that case,
   we would just need to define the secrets that have different values in a
   separate secrets file like devin_test.yml below, where devin_test is the name of
   the ecosystem. Here’s an example of the basic directory structure:
   .sops.yaml
   app/
   |_ deployment_secrets/
      |_ sensitive/
         |_ production.yml
      |_ nonsensitive/
         |_ default.yml
         |_ devin_test.yml
   The security zone concept allows a more
   granular access control policy as we can federate decrypt permissions on a
   per application and per security zone basis by granting or revoking access to
   KMS keys with AWS Identity and Access Management (IAM) roles. Sopsorific
   bootstraps these KMS keys and IAM roles for a given application. It generates
   a secret-editor role that privileged humans can assume to manage the secrets
   and an application role for the application to assume at runtime to decrypt
   the secrets. Following the principle of least privilege, our engineering team
   leads are app owners of the specific applications they maintain. App owners
   have permissions to assume the secret-editor role for sensitive ecosystems of
   their specific application. Non app owners have the ability to assume the
   secret-editor role for non-sensitive ecosystems only. How it works now Now
   that we know who can do what, let’s talk about how they can do what they can
   do. Explaining how we use sopsorific is best done by exploring how our
   secrets management workflow plays out for each stage of the software
   development lifecycle. Development Engineers have permissions to assume the
   secret-editor role for the security zones they have access to. Secret-editor
   roles are named after their corresponding IAM role which includes the
   security zone and the name of the GitHub repository. For
   example, secret_editor_sensitive_coach where coach is the name of the
   repository. We use a little command line utility to assume the role and are
   dropped into a secret-editor session where they use sops to add or edit
   secrets with their editor in the same way they add or edit code in a feature
   branch.
   assuming a secret-editor role
   The sops command will open and decrypt
   the secrets in their editor and, if changed, encrypt them and save them back
   to the file’s original location. All of these steps, apart from the editing,
   are transparent to the engineer editing the secret. Any changes are then
   reviewed in a pull request along with the rest of the code. Editing a file is
   as simple as:
   sops deployment_secrets/sensitive/production.yml
   Testing We
   built a series of validations into sopsorific to further enforce our opinions
   about secrets management. Some of these are: Secrets are unguessable — Short
   strings like “password” are not really secrets, and this check enforces
   strings with at least 128 bits of entropy expressed in unpadded base64.
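   A check along those lines is small; here is a sketch of the idea (not
   sopsorific’s actual implementation):
   require "base64"
   require "securerandom"

   # 128 bits is 16 bytes, so the decoded secret must be at least 16 bytes long.
   def unguessable?(secret)
     Base64.urlsafe_decode64(secret).bytesize >= 16
   rescue ArgumentError
     false
   end

   unguessable?("password")                      # => false
   unguessable?(SecureRandom.urlsafe_base64(16)) # => true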
   Each ecosystem defines a comprehensive set of secrets — The 12-factor app
   methodology reminds us that all environments should resemble production as
   closely as possible. When a secret is added to production, we have a check
   that makes sure that same secret is also added to all other ecosystems so
   that they continue to function properly. All crypto keys match — There are
   checks to ensure the multi-region KMS key ARNs and backup PGP key fingerprint
   in the sops config file matches the intended security zones. These
   validations are run as a step in our Continuous Integration suite. Running
   these checks is a completely offline operation and doesn’t require access to
   the KMS keys making it trivially secure. Developers can also run these
   validations locally:
   sopsorific check
   Deployment The application server is
   configured with the instance profile generated by sopsorific so that it can
   assume the IAM role that it needs to decrypt the secrets at runtime. Then, we
   configure our init system, upstart, to execute the process wrapped in the
   sopsorific run command. sopsorific run is another custom command we built to
   make our usage of sops seamless. When the app starts up, the decrypted
   secrets will be available as environment variables only to the process
   running the application instead of being available system wide. This makes
   our secrets less likely to unintentionally leak and our security team a
   little happier. Here’s a simplified version of our upstart configuration.
   start on starting web-app
   stop on stopping web-app
   respawn
   exec su -s /bin/bash -l -c '\
     cd /var/www/web-app; \
     exec "$0" "$@"' web-app-owner -- sopsorific run 'bundle exec puma -C config/puma.rb' >> /var/log/upstart.log 2>&1
   Operations The 12-factor app methodology reminds us that sometimes
   developers need to be able to run one-off admin tasks by starting up a
   console on a live running server. This can be accomplished by establishing a
   secure session on the server and running what you would normally run to get a
   console with the sopsorific run command. For our Ruby on Rails apps, that
   looks like this:
   sopsorific run 'bundle exec rails c'
   What did we learn?
   Throughout this journey, we learned many things along the way. One of these
   things was that having an opinionated tool to help us manage secrets helped to
   make sure we didn’t accidentally leave around low-entropy secrets from when
   we were developing or testing out a feature. Having a tool to protect
   ourselves from ourselves is vital to our workflow. Another thing we learned
   was that some vendors provide secrets with lower entropy than we’d like for
   API tokens or access keys and they don’t provide the option to choose
   stronger secrets. As a result, we had to build features into sopsorific to
   allow vendor provided secrets that didn’t meet the sopsorific standards by
   default to be accepted by sopsorific’s checks. In the process of adopting
   sops and building sopsorific, we discovered the welcoming community and
   thoughtful maintainers of sops. We had the pleasure of contributing a few
   changes to sops, and that left us feeling like we left the community a little
   bit better than we found it. In doing all of these things, we’ve reduced
   bottlenecks for developers so they can focus more on shipping features and
   less on managing secrets.
   11 min read


 * HOW WE DEVELOP DESIGN COMPONENTS IN RAILS
   
   How We Develop Design Components in Rails Learn how we use Rails components
   to keep our code D.R.Y. (Don’t Repeat Yourself) and to implement UX design
   changes effectively and uniformly.. A little over a year ago, we rebranded
   our entire site . And we've even written on why we did it. We were able to
   achieve a polished and consistent visual identity under a tight deadline
   which was pretty great, but when we had our project retrospective, we
   realized there was a pain point that still loomed over us. We still lacked a
   good way to share markup across all our apps. We repeated multiple styles and
   page elements throughout the app to make the experience consistent, but we
   didn’t have a great way to reuse the common elements. We used Rails
   partials in an effort to keep the code DRY (Don’t Repeat Yourself) while
   sharing the same chunks of code and that got us pretty far, but it had its
   limitations. There were aspects of the page elements (our shared chunks) that
   needed to change based on their context or the page where they were being
   rendered. Since these contexts change, we found ourselves either altering the
   partials or copying and pasting their code into new views where additional
   context-specific code could be added. This resulted in app code (the
   content-specific code) becoming entangled with “system” (the base HTML) code.
   Aside from partials, there was corresponding styling, or CSS, that was being
   copied and sometimes changed when these shared partials were altered. This
   meant when the designs were changed, we needed to find all of the places this
   code was used to update it. Not only was this frustrating, but it was
   inefficient. To find a solution, we drew inspiration from the component
   approach used by modern design systems and JavaScript frameworks. A component
   is a reusable code building block. Pages are built from a collection of
   components that are shared across pages, but can be expanded upon or
   manipulated in the context of the page they’re on. To implement our component
   system, we created our internal gem, Style Closet. There are a few other
   advantages and problems this system solves too: We’re able to make global
   changes in a pretty painless way. If we need to change our brand colors,
   let’s say, we can just change the CSS in Style Closet instead of scraping our
   codebase and making sure we catch it everywhere. Reusable parts of code
   remove the burden from engineers for things like CSS and allow time to focus
   on and tackle other problems. Engineers and designers can be confident
   they’re using something that’s been tested and validated across browsers.
   We’re able to write tests specific to the component without worrying about
   the use-case or increasing testing time for our apps. Every component is on
   brand and consistent with every other app, feels polished, high quality and
   requires lower effort to implement. It allows room for future growth which
   will inevitably happen. The need for new elements in our views is not going
   to simply vanish because we rebranded, so this makes us more prepared for the
   future. How does it work? Below is an example of one of our components, the
   flash. A flash message/warning is something you may use throughout your app
   in different colors and with different text, but you want it to look
   consistent. In our view, or the page where we write our HTML, we would write
   the following to render what you see above: Here’s a breakdown of how that
   one line translates into what you see on the page. The component consists of
   3 parts: structure, behavior and appearance. The view (the structure): a
   familiar html.erb file that looks very similar to what would exist without a
   component but a little more flexible since it doesn’t have its content hard
   coded in. These views can also leverage Rails’ view yield functionality when
   needed. Here’s the view partial from Style Closet: You can see how the
   component.message is passed into the dedicated space/slot, keeping this code
   flexible for reuse. A Ruby class (the behavior aside from any JavaScript):
   the class holds the “props” the component allows to be passed in as well as
   any methods needed for the view, similar to a presenter model. The props are
   a fancier attr_accessor with the bonus of being able to assign defaults.
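   A pared-down sketch of what such a class can look like (a hypothetical prop
   macro and names, not Style Closet’s actual internals):
   # Base class providing a tiny "prop" macro: an attr_accessor with a default.
   class Component
     def self.prop(name, default: nil)
       define_method(name) { instance_variable_get("@#{name}") || default }
       define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
     end

     def initialize(**props, &block)
       props.each { |prop_name, value| public_send("#{prop_name}=", value) }
       @block = block
     end

     # The block becomes the component's content, yielded to the view partial.
     def message
       @block && @block.call
     end
   end

   class Flash < Component
     prop :style, default: "notice"
   end

   flash = Flash.new(style: "warning") { "Your settings have been saved." }
   flash.style   # => "warning"
   flash.message # => "Your settings have been saved."
   In the real gem each component also has its html.erb partial and its CSS,
   per the three parts described above.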
   Additionally, all components can take a block, which is typically the content
   for the component. This allows the view to be reusable. CSS (the appearance):
   In this example, we use it to set things like the color, alignment and the
   border. A note on behavior: Currently, if we need to add some JS behavior, we
   use unobtrusive JavaScript or UJS sprinkles. When we add new components or
   make changes, we update the gem (as well as the docs site associated with
   Style Closet) and simply release the new version. As we develop and
   experiment with new types of components, we test these bigger changes out in
   the real world by putting them behind a feature flag using our open source
   split testing framework, Test Track. What does the future hold? We’ve
   used UJS sprinkles in similar fashion to the rest of the Rails world over the
   years, but that has its limitations as we begin to design more complex
   behaviors and elements of our apps. Currently we’re focusing on building more
   intricate and interactive components using React. A bonus of Style Closet
   is how well it’s able to host these React components since they can simply be
   incorporated into a view by being wrapped in a Style Closet component. This
   allows us to continue composing a UI with self contained building blocks.
   We’re always iterating on our solutions, so if you’re interested in expanding
   on or solving these types of problems with us, check out our career page!
   Additional information Since we introduced our internal Rails component code, a
   fantastic open-source project emerged, Komponent, as well as a really great
   and in-depth blog post on component systems in Rails from Evil Martians.
   6 min read


 * ENGINEERING THE LAUNCH OF A NEW BRAND FOR BETTERMENT
   
   Engineering the Launch of a New Brand for Betterment In 2017, Betterment set
   out to launch a new brand to better define the voice and feel of our product.
   After months of planning across all teams at the company, it was time for our
   engineering team to implement new and responsive designs across all user
   experiences. The key to the success of this project was to keep the build
   simple, maintain a low risk of regressions, and ensure a clear path to remove
   the legacy brand code after launch. Our team learned a lot, but a few key
   takeaways come to mind. Relieving Launch Day Stress with Feature Flags
   Embarking on this rebrand project, we wanted to keep our designs under wraps
   until launch day. This would entail a lot of code changes, however, as an
   engineering team we believe deeply in carving up big endeavors into small
   pieces. We’re constantly shipping small, vertical slices of work hidden
   behind feature flags and we’ve even built our own open-source
   system, TestTrack, to help us do so. This project would be no exception. On
   day one, we created a feature flag and started shipping rebranded code to
   production. Our team could then use TestTrack’s browser plugin to preview and
   QA the new views along the way. When the day of the big reveal arrived, all
   that would be left to do was toggle the flag to unveil the code we’d shipped
   and tested weeks before. We then turned to the challenge of rebranding our
   entire user experience. Isolating New Code with ActionPack Variants
   ActionPack variants provide an elegant solution to rolling out significant
   front end changes. Typically, variants are prescribed to help render distinct
   views for different device types, but they are equally powerful when
   rendering distinct HTML/CSS for any significant redesign. We created a
   variant for our rebrand, which would be exposed based on the status of our
   new feature flag. Our variant also required a new CSS file, where all our new
   styles would live. Rails provides rich template resolver logic at every level
   of the view hierarchy, and we were able to easily hook into it by simply
   modifying the extensions of our new layout files. The rebranded version of
   our application’s core layout imported the new CSS file and just like that,
   we were in business. Implementing the Rebrand without a Spaghetti of “IF”
   Statements Our rebranded experience would become the default at launch time,
   so another challenge we faced was maintaining two worlds without creating
   unneeded complexity. The “rebrand” variant and correlating template file
   helped us avoid a tangled web of conditionals, and instead boiled down the
   overhead to a toggle in our ApplicationController. This created a clean
   separation between the old and new world and protected us against regressions
   between the two. Rebranding a feature involved adding new styles to the
   application_rebrand.css and implementing them in new rebrand view files.
   Anything that didn’t get a new, rebranded template stayed in the world of
   plain old production. This freedom from legacy stylesheets and markup were
   critical to building and clearly demonstrating the new brand and value
   proposition we wanted to demonstrate to the world. De-scoping with a
   Lightweight Reskin To rebrand hundreds of pages in time, we had to iron out
   the precise requirements of what it meant for our views to be “on brand”.
   Working with our product team, we determined that the minimum amount of
   change to consider a page rebranded was adoption of the new header, footer,
   colors, and fonts. These guidelines constituted our “opted out”
   experience — views that would receive this lightweight reskin immediately but
   not the full rebrand treatment. This light coat of paint was applied to our
   production layer, so any experience that couldn’t be fully redesigned within
   our timeline would still get a fresh header and the fonts and colors that
   reflected our new brand. As we neared the finish line, the rebranded world
   became our default and this opt-out world became a variant. A
   controller-level hook allowed us to easily distinguish which views were to
   display opt-out mode with a single line of code. We wrote a controller-level
   hook to update the variant and render the new layout files, reskinning
   the package. Using a separate CSS manifest with the core changes enumerated
   above, we felt free to dedicate resources to more thoroughly rebranding our
   high traffic experiences, deferring improvements to pages that received the
   initial reskin until after launch. As we’ve circled back to clean up these
   lower-traffic views and give them the full rebrand treatment, we’ve come
   closer to deleting the opt_out CSS manifest and deprecating our legacy
   stylesheets for good. Designing an Off Ramp Just as we are committed to
   rolling out large changes in small portions, we are careful to avoid huge
   changesets on the other side of a release. Fortunately, variants made
   removing legacy code quite straightforward. After flipping the feature flag
   and establishing “rebrand” as the permanent variant context, all that
   remained was to destroy the legacy files that were no longer being rendered
   and remove the variant name from the file extension of the new primary view
   template. Controllers utilizing the opt_out hook made their way onto a to-do
   list for this work without the stress of a deadline. The Other Side of the
   Launch As the big day arrived, we enjoyed a smooth rebrand launch thanks to
   the thoughtful implementation of our existing tools and techniques. We
   leveraged ActionPack variants built into Rails and feature flags from
   TestTrack in new ways, ensuring we didn’t need to make any architecture
   changes. The end result: a completely fresh set of views and a new brand
   we’re excited to share with the world at large.
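   For the curious, the variant toggle described above boils down to something
   like this sketch (the rebrand_enabled? helper is hypothetical; ours was
   backed by a TestTrack feature flag):
   class ApplicationController < ActionController::Base
     before_action :set_rebrand_variant

     private

     # When the flag is on, Rails' template resolver prefers variant templates
     # such as app/views/layouts/application.html+rebrand.erb.
     def set_rebrand_variant
       request.variant = :rebrand if rebrand_enabled?
     end
   end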
   5 min read


 * REFLECTING ON OUR ENGINEERING APPRENTICESHIP PROGRAM
   
   Reflecting on Our Engineering Apprenticeship Program Betterment piloted an
   Apprentice Program to add junior talent to our engineering organization in
   2017, and it couldn’t have been more successful or rewarding for all of us.
   One year later, we’ve asked them to reflect on their experiences. In Spring
   of 2017, Betterment’s Diversity & Inclusion Steering Committee partnered with
   our Engineering Team to bring on two developers with non-traditional
   backgrounds. We hired Jess Harrelson (Betterment for Advisors Team) and Fidel
   Severino (Retail Team) for a 90 day Apprentice Program. Following their
   apprenticeship, they joined us as full-time Junior Engineers. I’m Jess, a
   recruiter here at Betterment, and I had the immense pleasure of working
   closely with these two. It’s been an incredible journey, so I sat down with
   them to hear first hand about their experiences. Tell us a bit about your
   life before Betterment. Jess Harrelson: I was born and raised in Wyoming and
   spent a lot of time exploring the outdoors. I moved to Nashville to study
   songwriting and music business, and started a small label through which I
   released my band’s album. I moved to New York after getting an opportunity at
   Sony and worked for a year producing video content. Fidel Severino: I’m
   originally from the Dominican Republic and moved to the United States at age
   15. After graduation from Manhattan Center for Science and Mathematics High
   School, I completed a semester at Lehman College before unfortunate family
   circumstances required me to go back to the Dominican Republic. When I
   returned to the United States, I worked in the retail sector for a few years.
   While working, I would take any available time for courses on websites like
   Codecademy and Team Treehouse. Can we talk about why you decided to become an
   Engineer? Jess Harrelson: Coding became a hobby for me when I would make
   websites for my bands in Nashville, but after meeting up with more and more
   people in tech in the city, I knew it was something I wanted to do as a
   career. I found coding super similar from a composition and structure
   perspective, which allowed me to tap into the creative side of coding. I
   started applying to every bootcamp scholarship I could find and received a
   full scholarship to Flatiron School. I made the jump to start becoming an
   engineer. Fidel Severino: I have always been
   interested in technology. I was one of those kids who “broke” their toys in
   order to find out how they worked. I’ve always had a curious mind. My
   interactions with technology prior to learning about programming had always
   been as a consumer. I cherished the opportunity and the challenge that comes
   with building with code. The feeling of solving a bug you’ve been stuck on
   for a while is satisfaction at its best. Those bootcamps changed all of our
   lives! You learned how to be talented, dynamic engineers and we reap the
   benefit. Let’s talk about why you chose Betterment. Jess Harrelson: I first
   heard of Betterment by attending the Women Who Code — Algorithms meetup
   hosted at HQ. Paddy, who hosts the meetups, let us know that Betterment was
   launching an apprenticeship program and after the meetup I asked how I could
   get involved and applied for the program. I was also applying for another
   apprenticeship program, but throughout the transparent,
   straightforward interview process, the Betterment apprenticeship quickly
   became my first choice. Fidel Severino: The opportunity to join Betterment’s
   Apprenticeship program came via the Flatiron School. One of the main reasons
   I was ecstatic to join Betterment was how I felt throughout the recruiting
   process. At no point did I feel the pressure that’s normally associated with
   landing a job. Keep in mind, this was an opportunity unlike any other I had
   up to this point in my life, but once I got to talking with the interviewers,
   the conversation just flowed. The way the final interview was set up made me
   rave about it to pretty much everyone I knew. Here was a company that wasn’t
   solely focused on the traditional Computer Science education when hiring an
   apprentice/junior engineer. The interview was centered around how well you
   communicate, work with others, and problem solve. I had a blast pair
   programming with 3 engineers, who I’m glad to say are now my co-workers! We
   are so lucky to have you! What would you say has been the most rewarding part
   of your experience so far? Jess Harrelson: The direct mentorship during my
   apprenticeship and exposure to a large production codebase. Prior to
   Betterment, I only had experience with super small codebases that I built
   myself or with friends. Working with Betterment’s applications gave me a
   hands-on understanding of concepts that are hard to reproduce on a smaller,
   personal application level. Being surrounded by a bunch of smart, helpful
   people has also been super amazing and helped me grow as an engineer. Fidel
   Severino: Oh man! There’s so many things I would love to list here. However,
   you asked for the most rewarding, and I would have to say without a
   doubt — the mentorship. As someone with only self-taught and Bootcamp
   experience, I didn’t know how much I didn’t know. I had two exceptional
   mentors who went above and beyond and removed any blocks preventing me from
   accomplishing tasks. On a related note, the entire company has a
   collaborative culture that is contagious. You want to help others whenever
   you can; and it has been the case that I’ve received plenty of help from
   others who aren’t even directly on my team. What’s kept you here? Fidel
   Severino: The people. The collaborative environment. The culture of learning.
   The unlimited supply of iced coffee. Great office dogs. All of the above!
   Jess Harrelson: Seriously though, it was the combination of all that plus so
   many other things. Getting to work with talented, smart people who want to
   make a difference.
   6 min read


 * A JOURNEY TO TRULY SAFE HTML RENDERING
   
   A Journey to Truly Safe HTML Rendering We leverage Rubocop’s OutputSafety
   check to ensure we’re being diligent about safe HTML rendering, so when we
   found vulnerabilities, we fixed them. As developers of financial software on
   the web, one of our biggest responsibilities is to keep our applications
   secure. One area we need to be conscious of is how we render HTML. If we
   don’t escape content properly, we could open ourselves and our customers up
   to security risks. We take this seriously at Betterment, so we use tools like
   Rubocop, the Ruby static analysis tool, to keep us on the right track. When
   we found that Rubocop’s OutputSafety check had some holes, we plugged them.
   What does it mean to escape content? Escaping content simply means replacing
   special characters with entities so that HTML understands to print those
   characters rather than act upon their special meanings. For example,
   the < character is escaped using &lt;, the > character is escaped using &gt;,
   and the & character is escaped using &amp;. What could happen if we don’t
   escape content? We escape content primarily to avoid opening ourselves up to
   XSS (cross-site scripting) attacks. If we were to inject user-provided
   content onto a page without escaping it, we’d be vulnerable to executing
   malicious code in the user’s browser, allowing an attacker full control over
   a customer’s session. This resource is helpful to learn more about XSS. Rails
   makes escaping content easier Rails escapes content by default in some
   scenarios, including when tag helpers are used. In addition, Rails has a few
   methods that provide help in escaping content. safe_join escapes the content
   and returns a SafeBuffer (a String flagged as safe) containing it. On the
   other hand, some methods are just a means for us to mark content as already
   safe. For example, the <%== interpolation token renders content as is,
   and raw, html_safe, and safe_concat simply return a SafeBuffer containing the
   original content as is, which poses a security risk. If content is inside
   a SafeBuffer, Rails won’t try to escape it upon rendering. Some examples:
   html_safe:
   [1] pry(main)> include ActionView::Helpers::OutputSafetyHelper => Object
   [2] pry(main)> result = "hi".html_safe => "hi"
   [3] pry(main)> result.class => ActiveSupport::SafeBuffer
   raw:
   [1] pry(main)> result = raw("hi") => "hi"
   [2] pry(main)> result.class => ActiveSupport::SafeBuffer
   safe_concat:
   [1] pry(main)> include ActionView::Helpers::TextHelper => Object
   [2] pry(main)> buffer1 = "hi".html_safe => "hi"
   [3] pry(main)> result = buffer1.safe_concat("bye") => "hibye"
   [4] pry(main)> result.class => ActiveSupport::SafeBuffer
   safe_join:
   [1] pry(main)> include ActionView::Helpers::OutputSafetyHelper => Object
   [2] pry(main)> result = safe_join(["<p>hi</p>", "<p>bye</p>"]) => "&lt;p&gt;hi&lt;/p&gt;&lt;p&gt;bye&lt;/p&gt;"
   [3] pry(main)> result.class => ActiveSupport::SafeBuffer
   Rubocop:
   we’re safe! As demonstrated, Rails provides some methods that mark content as
   safe without escaping it for us. Rubocop, a popular Ruby static analysis
   tool, provides a cop (which is what Rubocop calls a “check”) to alert us when
   we’re using these methods: Rails/OutputSafety. At Betterment, we explicitly
   enable this cop in our Rubocop configurations so if a developer wants to mark
   content as safe, they will need to explicitly disable the cop. This forces
   extra thought and extra conversation in code review to ensure that the usage
   is in fact safe. … Almost We were thrilled about the introduction of this
   cop — we had actually written custom cops prior to its introduction to
   protect us against using the methods that don’t escape content. However, we
   realized there were some issues with the opinions the cop held about some of
   these methods. The first of these issues was that the cop allowed usage
   of raw and html_safe when the usages were wrapped in safe_join. The problem with
   this is that when raw or html_safe are used to mark content as already safe by
   putting it in a SafeBuffer as is, safe_join will not actually do anything
   additional to escape the content (see the sketch at the end of this section).
   This means that these usages of raw and html_safe should still be violations.
   The second of these issues was that the cop prevented usages of raw and
   html_safe, but did not prevent usages of safe_concat. safe_concat has the same
   functionality as raw and html_safe — it simply marks the content safe as is by returning it
   in a SafeBuffer. Therefore, the cop should hold the same opinions
   about safe_concat as it does about the other two methods. So, we fixed it
   Rather than continue to use our custom cops, we decided to give back to the
   community and fix the issues we had found with the Rails/OutputSafety cop. We
   began with this pull request to patch the first issue — change the behavior
   of the cop to recognize raw and html_safe as violations regardless of being
   wrapped in safe_join. We found the Rubocop community to be welcoming, making
   only minor suggestions before merging our contribution. We followed up
   shortly after with a pull request to patch the second issue — change the
   behavior of the cop to disallow usages of safe_concat. This contribution was
   merged as well. Contributing to Rubocop was such a nice experience that when
   we later found that we’d like to add a configuration option to an unrelated
   cop, we felt great about opening a pull request to do so, which was merged as
   well. And here we are! Our engineering team here at Betterment takes security
   seriously. We leverage tools like Rubocop and Brakeman, a static analysis
   tool specifically focused on security, to make our software safe by default
   against many of the most common security errors, even for code we haven’t
   written yet. We now rely on Rubocop’s Rails/OutputSafety cop (instead of our
   custom cop) to help ensure that our team is making good decisions about
   escaping HTML content. Along the way, we were able to contribute back to a
   great community.
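   Here is the small Ruby sketch referenced above, illustrating the first issue.
   It assumes a Rails console or any environment with action_view loaded, and
   user_comment is a hypothetical stand-in for user-provided text.

     require "action_view"
     include ActionView::Helpers::OutputSafetyHelper

     user_comment = "<script>alert('xss')</script>"

     # Dangerous: html_safe wraps the raw string in a SafeBuffer without escaping
     # it, and safe_join leaves already-"safe" content untouched, so the script
     # tag would reach the page intact. This is why such usages should still be
     # flagged by the cop.
     safe_join([user_comment.html_safe])
     # => "<script>alert('xss')</script>"  (flagged as html-safe)

     # Safer: pass the plain string and let safe_join escape it for us.
     safe_join([user_comment])
     # => "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"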
   5 min read


 * BUILDING BETTER SOFTWARE FASTER WITH SHARED PRINCIPLES
   
   Building Better Software Faster with Shared Principles Betterment’s playbook
   for extending the golden hour of startup innovation at scale. Betterment’s
   promise to customers rests on our ability to execute. To fulfill that
   promise, we need to deliver the best product and tools available and then
   improve them indefinitely, which, when you think about it, sounds incredibly
   ambitious or even foolhardy. For a problem space as large as ours, we can’t
   fulfill that promise with a single two pizza team. But a scaled engineering
   org presents other challenges that could just as easily put the goal out of
   reach. Centralizing architectural decision-making would kill ownership and
   autonomy, and ensure your best people leave or never join in the first place.
   On the other hand, shared-nothing teams can lead to information silos,
   wheel-reinventing, and integration nightmares when an initiative is too big
   for a squad to deliver alone. To meet those challenges, we believe it’s
   essential to share more than languages, libraries, and context-free best
   practices. We can collectively build and share a body of interrelated
   principles driven by insights that our industry as a whole hasn’t yet
   realized or is just beginning to understand. Those principles can form chains
   of reasoning that allow us to run fearlessly, in parallel, and arrive at
   coherent solutions better than the sum of their parts. I gave a talk about
   Betterment’s engineering principles at a Rails at Scale meetup earlier last
   year and promised to share them after our diligent legal team finished
   reviewing. (Legal helpfully reviewed these principles months ago, but then I
   had my first child, and, as you can imagine, priorities shifted.) Without any
   further ado, here are Betterment’s Engineering Principles. You can also watch
   my Rails at Scale talk to learn why we developed them and how we maintain
   them. Parting Thoughts on Our Principles Our principles aren’t permanent
   as-written. Our principles are a living document in an actual git repository
   that we’ll continue to add to and revise as we learn and grow. Our principles
   derive from and are matched to Betterment’s collective experience and
   context. We don’t expect these principles to appeal to everybody. But we do
   believe strongly that there’s more to agree about than our industry has been
   able to establish so far. Consider these principles, along with our current
   and future open source work, part of our contribution to that conversation.
   What are the principles that your team shares?
   3 min read


 * SUPPORTING FACE ID ON THE IPHONE X
   
   Supporting Face ID on the iPhone X We look at how Betterment's mobile
   engineering team developed Face ID for the latest phones, like iPhone X.
   Helping people do what’s best with their money requires providing them with
   responsible security measures to protect their private financial data. In
   Betterment’s mobile apps, this means including trustworthy but convenient
   local authentication options for resuming active login sessions. Three years
   ago, in 2014, we implemented Touch ID support as an alternative to using PIN
   entry in our iOS app. Today, on its first day, we’re thrilled to announce
   that the Betterment iOS app fully supports Apple’s new Face ID technology on
   the iPhone X. Trusting the Secure Enclave While we’re certainly proud of
   shipping this feature quickly, a lot of credit is due to Apple for how
   seriously the company takes device security and data privacy as a whole. The
   hardware feature of the Secure Enclave, included on iPhones since the 5S, makes
   for a readily trustworthy connection to the device and its operating system.
   From an application’s perspective, this relationship between a biometric
   scanner and the Secure Enclave is simplified to a boolean response. When
   requested through the Local Authentication framework, the biometry evaluation
   either succeeds or fails, separate from any given state of an application, via
   the “reply” completion closure of evaluatePolicy(_:localizedReason:reply:). This
   made testing from the iOS Simulator a viable option for gaining a reasonable
   degree of certainty that our application would behave as expected when
   running on a device, thus allowing us to prepare a build in advance of having
   a device to test on. LABiometryType Since we’ve been securely using Touch ID
   for years, adapting our existing implementation to include Face ID was a
   relatively minor change. Thanks primarily to the simple addition of
   the LABiometryType enum newly available in iOS 11, it’s easy for our
   application to determine which biometry feature, if any, is available on a
   given device. This is such a minor change, in fact, that we were able to
   reuse all of our same view controllers that we had built for Touch ID with
   only a handful of string values that are now determined at runtime. One
   challenge we have that most existing iOS apps share is the need to still
   support older iOS versions. For this reason, we chose to
   wrap LABiometryType behind our own BiometryType enum. This allows us to
   encapsulate both the need to use an iOS 11 compiler flag and the need to
   call canEvaluatePolicy(_:error:) on an instance of LAContext before accessing
   its biometryType property into a single calculated property: See the Gist.
   NSFaceIDUsageDescription The other difference with Face ID is the
   new NSFaceIDUsageDescription privacy string that should be included in the
   application’s Info.plist file. This is a departure from Touch ID which does
   not require a separate privacy permission, and which uses
   the localizedReason string parameter when showing its evaluation prompt.
   [Image: Touch ID evaluation prompt displaying the localized reason]
   While Face ID does not seem to make use of that localizedReason string during evaluation,
   without the privacy string the iPhone X will run the application’s Local
   Authentication feature in compatibility mode. This informs the user that the
   application should work with Face ID but may do so imperfectly.
   [Image: Face ID permissions prompt without (left) and with (right) an NSFaceIDUsageDescription string included in the Info.plist]
   This compatibility
   mode prompt is undesirable enough on its own, but it also clued us into the
   need to check for potential security concerns opened up by this
   forwards-compatibility-by-default from Apple. Thankfully, the changes to the
   Local Authentication framework were done in such a way that we determined
   there wasn’t a security risk, but it did leave a problematic user experience
   in reaching a potentially-inescapable screen when selecting “Don’t Allow” on
   the privacy permission prompt. Since we believe strongly in our users’ right
   to say “no”, resolving this design issue was the primary reason we
   prioritized shipping this update. Ship It If your mobile iOS app also
   displays sensitive information and uses Touch ID for biometry-based local
   authentication, join us in making the easy adaptation to delight your users
   with full support for Face ID on the iPhone X.
   4 min read


 * FROM 1 TO N: DISTRIBUTED DATA PROCESSING WITH AIRFLOW
   
   From 1 to N: Distributed Data Processing with Airflow Betterment has built a
   highly available data processing platform to power new product features and
   backend processing needs using Airflow. Betterment’s data platform is unique
   in that it not only supports offline needs such as analytics, but also powers
   our consumer-facing product. Features such as Time Weighted
   Returns and Betterment for Business balances rely on our data platform
   working throughout the day. Additionally, we have regulatory obligations to
   report complex data to third parties daily, making data engineering a mission
   critical part of what we do at Betterment. We originally ran our data
   platform on a single machine in 2015 when we ingested far less data with
   fewer consumer-facing requirements. However, recent customer and data growth
   coupled with new business requirements require us to now scale horizontally
   with high availability. Transitioning from Luigi to Airflow Our single-server
   approach used Luigi, a Python module created to orchestrate long-running
   batch jobs with dependencies. While we could achieve high availability with
   Luigi, it’s now 2017 and the data engineering landscape has shifted. We
   turned to Airflow because it has emerged as a full-featured workflow
   management framework better suited to orchestrate frequent tasks throughout
   the day. To migrate to Airflow, we’re deprecating our Luigi solution on two
   fronts: cross-database replication and task orchestration. We’re using
   Amazon’s Database Migration Service (DMS) to replace our Luigi-implemented
   replication solution and re-building all other Luigi workflows in Airflow.
   We’ll dive into each of these pieces below to explain how Airflow mediated
   this transition. Cross-Database Replication with DMS We used Luigi to extract
   and load source data from multiple internal databases into our Redshift data
   warehouse on an ongoing basis. We recently adopted Amazon’s DMS for
   continuous cross-database replication to Redshift, moving away from our
   internally-built solution. The only downside of DMS is that we are not aware
   of how recent source data is in Redshift. For example, a task computing all
   of a prior day’s activity executed at midnight would be inaccurate if
   Redshift were missing data from DMS at midnight due to lag. In Luigi, we knew
   when the data was pulled and only then would we trigger a task. However, in
   Airflow we reversed our thinking to embrace DMS, using Airflow’s sensor
   operators to wait for rows to be pushed from DMS before carrying on with
   dependent tasks. High Availability in Airflow While Airflow doesn’t claim to
   be highly available out of the box, we built an infrastructure to get as
   close as possible. We’re running Airflow’s database on Amazon’s Relational
   Database Service and using Amazon’s Elasticache for Redis queuing. Both of
   these solutions come with high availability and automatic failover as add-ons
   Amazon provides. Additionally, we always deploy multiple baseline Airflow
   workers in case one fails, in which case we use automated deploys to stand up
   any part of the Airflow cluster on new hardware. There is still one single
   point of failure left in our Airflow architecture though: the scheduler.
   While we may implement a hot-standby backup in the future, we simply accept
   it as a known risk and set our monitoring system to notify a team member of
   any deviations. Cost-Effective Scalability Since our processing needs
   fluctuate throughout the day, we were paying for computing power we didn’t
   actually need during non-peak times on a single machine, as shown in our
   Luigi server’s load. Distributed workers used with Amazon’s Auto Scaling
   Groups allow us to automatically add and remove workers based on outstanding
   tasks in our queues. Effectively, this means maintaining only a baseline
   level of workers throughout the day and scaling up during peaks when our
   workload increases. Airflow queues allow us to designate certain tasks to run
   on particular hardware (e.g. CPU optimized) to further reduce costs. We found
   just a few hardware type queues to be effective. For instance, tasks that
   saturate CPU are best run on a compute optimized worker with concurrency set
   to the number of cores. Non-CPU intensive tasks (e.g. polling a database) can
   run on higher concurrency per CPU core to save overall resources. Extending
   Airflow Code Airflow tasks that pass data to each other can run on different
   machines, presenting a new challenge versus running everything on a single
   machine. For example, one Airflow task may write a file and a subsequent task
   may need to email that file even though the dependent task ran on another machine. To
   implement this pattern, we use Amazon S3 as a persistent storage tier.
   Fortunately, Airflow already maintains a wide selection of hooks to work with
   remote sources such as S3. While S3 is great for production, it’s a little
   difficult to work with in development and testing where we prefer to use the
   local filesystem. We implemented a “local fallback” mixin for Airflow
   maintained hooks that uses the local filesystem for development and testing,
   deferring to the actual hook’s remote functionality only on production.
   Development & Deployment We mimic our production cluster as closely as
   possible for development & testing to identify any issues that may arise with
   multiple workers. This is why we adopted Docker to run a production-like
   Airflow cluster from the ground up on our development machines. We use
   containers to simulate multiple physical worker machines that connect to
   officially maintained local Redis and PostgreSQL containers. Development and
   testing also require us to stand up the Airflow database with predefined
   objects such as connections and pools for the code under test to function
   properly. To solve this programmatically, we adopted Alembic database
   migrations to manage these objects through code, allowing us to keep our
   development, testing, and production Airflow databases consistent. Graceful
   Worker Shutdown Upon each deploy, we use Ansible to launch new worker
   instances and terminate existing workers. But what happens when our workers
   are busy with other work during a deploy? We don’t want to terminate workers
   while they’re finishing something up and instead want them to terminate after
   the work is done (not accepting new work in the interim).
   Fortunately, Celery supports this shutdown behavior and will stop accepting
   new work after receiving an initial TERM signal, letting old work finish up.
   We use Upstart to define all Airflow services and simply wrap the TERM
   behavior in our worker’s post-stop script, sending the TERM signal first,
   waiting until we see the Celery process stopped, and then finally powering off the
   machine. Conclusion The path to building a highly available data processing
   service was not straightforward, requiring us to build a few specific but
   critical additions to Airflow. Investing the time to run Airflow as a cluster
   versus a single machine allows us to run work in a more elastic manner,
   saving costs and using optimized hardware for particular jobs. Implementing a
   local fallback for remote hooks made our code much more testable and easier
   to work with locally, while still allowing us to run with Airflow-maintained
   functionality in production. While migrating from Luigi to Airflow is not yet
   complete, Airflow has already offered us a solid foundation. We look forward
   to continuing to build upon Airflow and contributing back to the community.
   6 min read


 * A FUNCTIONAL APPROACH TO PENNY-PRECISE ALLOCATION
   
   A Functional Approach to Penny-Precise Allocation How we solved the problem
   of allocating a sum of money proportionally across multiple buckets by leaning
   on functional programming. An easy trap to fall into as an object-oriented
   developer is to get too caught up in the idea that everything has to be an
   object. I work in Ruby, for example, where the first thing you learn is
   that everything is an object. Some problems, however, are better solved by
   taking a functional approach. For instance, at Betterment, we faced the
   challenge of allocating a sum of money proportionally across multiple
   buckets. In this post, I’ll share how we solved the problem by leaning on
   functional programming to allocate money precisely across proportional
   buckets. The Problem Proportional allocation comes up often throughout our
   codebase, but it’s easiest to explain using a fictional example: Suppose your
   paychecks are $1000 each, and you always allocate them to your different
   savings accounts as follows: College savings fund: $310 Buy a car fund: $350
   Buy a house fund: $200 Safety net: $140 Now suppose you’re an awesome
   employee and received a bonus of $1234.56. You want to allocate your bonus
   proportionally in the same way you allocate your regular paychecks. How much
   money do you put in each account? You may be thinking, isn’t this a simple
   math problem? Let’s say it is. To get each amount, take the ratio of the
   contribution from your normal paycheck to the total of your normal paycheck,
   and multiply that by your bonus. So, your college savings fund would get:
   (310/1000)*1234.56 = 382.7136 We can do the same for your other three
   accounts, but you may have noticed a problem. We can’t split a penny into
   fractions, so we can’t give your college savings fund the exact proportional
   amount. More generally, how do we take an inflow of money and allocate it to
   weighted buckets in a fair, penny-precise way? The Mathematical Solution:
   Integer Allocation We chose to tackle the problem by working with integers
   instead of decimal numbers in order to avoid rounding. This is easy to do
   with money — we can just work in cents instead of dollars. Next, we settled
   on an algorithm which pays out buckets fairly, and guarantees that the total
   payments exactly sum to the desired payout. This algorithm is called
   the Largest Remainder Method: (1) multiply the inflow (the payout in the
   example above) by each weight (where the weights are the integer amounts of
   the buckets, i.e. the per-account contributions in our example above) and
   divide each of these products by the sum of the weights, finding the integer
   quotient and integer remainder; (2) find the number of pennies left over to
   allocate by taking the inflow minus the total of the integer quotients;
   (3) sort the remainders in descending order and allocate any leftover
   pennies to the buckets in that order. The idea here is that the quotients
   represent the amounts we should give each bucket aside from the leftover
   pennies. Then we figure out which bucket deserves the leftover pennies. Let’s
   walk through this process for our example: Remember that we’re working in
   cents, so our inflow is 123456 and we need to allocate it across bucket
   weights of [31000, 35000, 20000, 14000]. We find each integer quotient and
   remainder by multiplying the inflow by the weight and dividing by the total
   weight. We took advantage of the divmod method in Ruby to grab the integer
   quotient and remainder in one shot, like so:
     buckets.map do |bucket|
       (inflow * bucket).divmod(total_bucket_weight)
     end
   This gives us 123456*31000/100000, 123456*35000/100000, 123456*20000/100000,
   and 123456*14000/100000. The integer
   quotients with their respective remainders are [38271, 36000], [43209,
   60000], [24691, 20000], [17283, 84000]. Next, we find the leftover pennies by
   taking the inflow minus the total of the integer quotients, which is
   123456 — (38271 + 43209 + 24691 + 17283) = 2. Finally, we sort our buckets in
   descending remainder order (because the buckets with the highest remainders
   are most deserving of extra pennies) and allocate the leftover pennies we
   have in this order. It’s worth noting that in our case, we’re using Ruby’s
   sort_by method, which gives us a nondeterministic order in the case where
   remainders are equal. In this case, our fourth bucket and second bucket,
   respectively, are most deserving. Our final allocations are therefore [38271,
   43210, 24691, 17284]. This means that your college savings fund gets $382.71,
   your car fund gets $432.10, your house fund gets $246.91, and your safety net
   gets $172.84. The Code Solution: Make It Functional Given we have to manage
   penny allocations between a person’s goals often throughout our codebase, the
   last thing we’d want is to have to bake penny-pushing logic throughout our
   domain logic. Therefore, we decided to extract our allocation code into a
   module function. Then, we took it even further. Our allocation code doesn’t
   need to care that we’re looking to allocate money, just that we’re looking to
   allocate integers. What we ended up with was a black box ‘Allocator’ module,
   with a public module function to which you could pass 2 arguments: an inflow,
   and an array of weightings. Our Ruby code looks like this; a simplified
   sketch of the idea appears at the end of this section. The takeaway The
   biggest lesson to learn from this experience is that, as an engineer, you
   should not be afraid to take a functional approach when it makes sense. In
   this case, we were able to extract a solution to a complicated problem and
   keep our OO domain-specific logic clean.
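   For illustration, here is a simplified Ruby sketch of such an Allocator
   module function, following the largest remainder method walked through
   above. The module and method names are assumptions for the example, not the
   production code.

     # Simplified largest-remainder allocator sketch.
     module Allocator
       module_function

       # inflow: integer number of cents to allocate
       # weightings: array of integer weights, one per bucket
       # Returns integer allocations, one per bucket, summing exactly to inflow.
       def allocate(inflow, weightings)
         total_weight = weightings.sum

         quotients_and_remainders = weightings.map do |weight|
           (inflow * weight).divmod(total_weight)
         end

         allocations = quotients_and_remainders.map(&:first)
         leftover_pennies = inflow - allocations.sum

         # Hand the leftover pennies to the buckets with the largest remainders.
         quotients_and_remainders
           .each_index
           .sort_by { |i| -quotients_and_remainders[i].last }
           .first(leftover_pennies)
           .each { |i| allocations[i] += 1 }

         allocations
       end
     end

     Allocator.allocate(123_456, [31_000, 35_000, 20_000, 14_000])
     # => [38271, 43210, 24691, 17284]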
   5 min read


 * HOW WE BUILT TWO-FACTOR AUTHENTICATION FOR BETTERMENT ACCOUNTS
   
   How We Built Two-Factor Authentication for Betterment Accounts Betterment
   engineers implemented Two-Factor Authentication across all our apps,
   simplifying and strengthening our authentication code in the process. Big
   change is more stressful than small change for people and software systems
   alike. Dividing a big software project into small pieces is one of the most
   effective ways to reduce the risk of introducing bugs. As we incorporated
   Two-Factor Authentication (2FA) into our security codebase, we used a phased
   rollout strategy to validate portions of the picture before moving on.
   Throughout the project, we leaned heavily on our collaborative review
   processes to both strengthen and simplify our authentication patterns. Along
   the way, we realized that we could integrate our new code more easily if we
   reworked surrounding access patterns with 2FA in mind. In other words, the 2F
   itself was relatively easy. Getting the surrounding A right was much
   trickier.
   [Photo: Lead software engineer Chris LoPresto (right) helped lead the team in building Two-Factor Authentication and App Passwords for Betterment accounts.]
   What We Built Two-factor authentication is a security scheme in
   which users must provide two separate pieces of evidence to verify their
   identity prior to being granted access. We recently introduced two different
   forms of 2FA for Betterment apps: TOTP (Time-based One-Time Passwords) using
   an authenticator app like Google Authenticator or Authy, and SMS verification
   codes. While SMS is not as secure as an authenticator app, we decided the
   increased 2FA adoption it facilitated was worthwhile. Two authentication
   factors are better than one, and it is our hope that all customers consider
   taking advantage of TOTP. To Build or Not To Build When designing new
   software features, there is a set of tradeoffs between writing your own code
   and integrating someone else's. Even if you have an expert team of
   developers, it can be quicker and more cost-efficient to use a third-party
   service to set up something complex like an authentication service. We don't
   suffer from Not Invented Here Syndrome at Betterment, so we evaluated
   products like Authy and Duo at the start of this project. Both services offer
   a robust set of authentication features that provide tremendous value with
   minimal development effort. But as we envisioned integrating either service
   into our apps, we realized we had work to do on our end. Betterment has
   multiple applications for consumers, financial advisors, and 401(k)
   participants that were built at different times with different technologies.
   Unifying the authentication patterns in these apps was a necessary first step
   in our 2FA project and would involve far more time and thought than building
   the 2F handshake itself. This realization, coupled with the desire to build a
   tightly integrated user experience, led to our decision to build 2FA
   ourselves. Validating the Approach Once we decide to build something, we also
   need to learn what not to build. Typically the best way to do that is to
   build something disposable, throw it away, and start over atop freshly
   learned lessons. To estimate the level of effort involved in building 2FA
   user interactions, we built some rough prototypes. For our TOTP prototype we
   generated a secret key, formatted it as a TOTP provisioning URI, and ran that
   through a QR code gem. SMS required a third-party provider, Twilio, whose
   client gem made it almost too easy to text each other "status updates." In
   short order, we were confident in our ability to deliver 2FA functionality
   that would work well. The quick ramp-up time and successful outcome of such
   experiments are among the reasons we value working within the mature,
   developer-friendly Rails ecosystem. While our initial prototypes were naive
   and didn’t actually integrate with our auth systems, they formed the core of
   the two-factor approaches that ultimately landed in our production codebase.
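   As a rough illustration of the TOTP prototype described above, here is a
   hedged Ruby sketch using the rotp and rqrcode gems. The method names reflect
   recent gem versions and the issuer and email values are placeholders, so
   treat this as a sketch of the approach rather than the original prototype.

     require "rotp"
     require "rqrcode"

     # Generate a random Base32 secret for the user.
     secret = ROTP::Base32.random

     # Build a provisioning URI that authenticator apps understand.
     totp = ROTP::TOTP.new(secret, issuer: "ExampleApp")
     uri = totp.provisioning_uri("user@example.com")

     # Render the URI as a QR code the user can scan during setup.
     File.write("totp_qr.svg", RQRCode::QRCode.new(uri).as_svg)

     # Later, check a submitted 6-digit code (returns a timestamp on success, nil otherwise).
     totp.verify("123456")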
   Introducing Concepts Before Behaviors Before 2FA entered the picture, our
   authentication systems performed several tasks when a Betterment user
   attempted to log in: verify the provided email address matches an existing
   user account; hash the provided password with the user’s salt and verify that
   it matches the hashed password stored for the user account; verify the user
   account is not locked for security reasons (e.g., too many incorrect password
   attempts); and create persistent authorization context (e.g., browser cookie,
   mobile token) to allow the user in the door. Our authentication codebase
   handled all of these tasks in response to a single user action (the act of
   providing an email and password). As we began reworking this code to handle a
   potential second user action (the act of providing a login challenge code)
   the resultant branching logic became overly complex and difficult to
   understand. Many of our prior design assumptions no longer held, so we paused
   2FA development and spun our chairs around for an impromptu design meeting.
   With 2FA requirements in mind, we decided to redesign our existing password
   verification as the first of two potential authentication factors. We built,
   tested, and released this new code independently. Our test suite gave us
   confidence that our existing password and user state validations remained
   unchanged within the new notion of a “first authentication factor.” Taking
   this remodeling detour enabled us to deliver the concept of authentication
   factors separately from any new system behaviors that relied on them. When we
   resumed work on 2FA, the proposed “second authentication factor”
   functionality now fell neatly into place. As a result, we delivered the new
   2FA features far more safely and quickly than we could have if we attempted
   to do everything in one fell swoop. Adding App Passwords Betterment customers
   have the option of connecting their account to third-party services like
   TurboTax and Mint. In keeping with our design principle of authorization
   through impossibility, we created a dedicated API authentication strategy for
   this use case, separate from our user-focused web authentication strategy.
   Dedicated endpoints for these services provide read-only access to the bare
   minimum content (e.g., account balances, transaction information). This
   strict separation of concerns helps to keep our customers’ data safe and our
   code simple. However, in order to connect to third-party services, our
   customers still had to share their account password with these third parties.
   While these institutions may be trustworthy, it is best to eliminate shared
   trust wherever possible when designing secure systems. Because these services
   do not support 2FA, it was now time to build a more secure password scheme
   for third-party apps. We started by designing a simple process for customers
   to generate app passwords for each service they wish to connect. These app
   passwords are complex enough for safe usage yet employ an alphabet scheme
   easily transcribed by our customers during setup. We then rewrote our API
   authentication code to accept app passwords and to reject account passwords
   for users with 2FA enabled. Our customers can now provide (and revoke) unique
   read-only passwords for third party services they connect to Betterment.
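   As an illustration of the kind of scheme described above, here is a hedged
   Ruby sketch of generating an app password from a transcription-friendly
   alphabet. The alphabet, length, and method name are assumptions for the
   example, not the production implementation.

     require "securerandom"

     # Lowercase letters and digits 2-9, minus characters that are easy to
     # confuse when copying by hand (l and o).
     APP_PASSWORD_ALPHABET = (("a".."z").to_a + ("2".."9").to_a) - %w[l o]

     def generate_app_password(length: 20)
       Array.new(length) do
         APP_PASSWORD_ALPHABET[SecureRandom.random_number(APP_PASSWORD_ALPHABET.size)]
       end.join
     end

     generate_app_password # => e.g. "x7qverw93kp2mhc4tbzu"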
   Crucially, our app password scheme is compatible right out of the gate with
   the new 2FA features we just launched. Slicing Up Deliverables Building 2FA
   and app passwords involved a complex set of coordinated changes to sensitive
   security-related code. To minimize the level of risk in this ambitious
   project, we used the feature-toggling built into our open-source
   split-testing framework TestTrack. By hiding the new functionality behind a
   feature flag, we were able to launch and validate features over the course
   of months before publicly unveiling them to retail customers. Even
   experienced programmers sometimes resist the “extra” work necessary to devise
   a phased approach to a problem. Sometimes we struggle to disentangle pieces
   that are ready for a partial launch from pieces that aren’t. But the point
   cannot be overstated: Feature flags are our friends. At Betterment, we use
   them to orchestrate the partial rollout of big features. We validate new
   functionality before unveiling it to our user base at large. By facilitating
   a series of small, testable code changes, feature flags provide one of the
   most effective means of mitigating risks associated with shipping large
   features. At the beginning of the 2FA project, we created a feature flag for
   the engineers working on the project. As the project progressed, we flipped
   the flag on for Betterment employees followed by a set of external beta
   testers. By the time we announced 2FA in the release notes for our mobile
   apps, the “new” code had been battle tested for months. Help Us Iterate The
   final step of our 2FA project was to delete the aforementioned feature flag
   from our codebase. While that was a truly satisfying moment, we all know that
   our work is never done. If you’re interested in approaching our next set of
   tricky projects in a nimble, iterative fashion, go ahead and apply.
   8 min read


 * HOW WE ENGINEERED BETTERMENT’S TAX-COORDINATED PORTFOLIO™
   
   How We Engineered Betterment’s Tax-Coordinated Portfolio™ For our latest
   tax-efficiency feature, Tax Coordination, Betterment’s solver-based portfolio
   management system enabled us to manage and test our most complex algorithms.
   Tax efficiency is a key consideration of Betterment’s portfolio management
   philosophy. With our new Tax Coordination feature, we’re continuing the
   mission to help our customers’ portfolios become as tax efficient as
   possible. While new products can often be achieved using our existing
   engineering abstractions, our Tax-Coordinated Portfolio (TCP) brought the engineering team a new level of
   complexity that required us to rethink how parts of our portfolio management
   system were built. Here’s how we did it. A Primer on Tax Coordination
   Betterment’s TCP feature is our very own, fully automated version of an
   investment strategy known as asset location. If you’re not familiar with
   asset location, it is a strategy designed to optimize after-tax returns by
   placing tax-inefficient securities into more tax-advantaged accounts, such as
   401(k)s and Individual Retirement Accounts (IRAs). Before we built TCP,
   Betterment customers had each account managed as a separate, standalone
   portfolio. For example, customers could set up a Roth IRA with a portfolio of
   90% stocks and 10% bonds to save for retirement. Separately, they could set
   up a taxable retirement account invested likewise in 90% stocks and 10%
   bonds. Now, Betterment customers can turn on TCP in their accounts, and their
   holdings in multiple investment accounts will be managed as a single
   portfolio allocation, but rearranged in such a way that the holdings across
   those accounts seek to maximize the overall portfolio’s after-tax returns. To
   illustrate, let’s suppose you’re a Betterment customer with three different
   accounts: a Roth IRA, a traditional IRA, and a taxable retirement account.
   Let’s say that each account holds $50,000, for a total of $150,000 in
   investments. Now assume that the $50,000 in each account is invested into a
   portfolio of 70% stocks and 30% bonds. For reference, consider the diagram.
   The circles represent various asset classes, and the bar shows the allocation
   for all the accounts, if added together. Each account has a 70/30 allocation,
   and the accounts will add up to 70/30 in the aggregate, but we can do better
   when it comes to maximizing after-tax returns. We can maintain the aggregate
   70/30 asset allocation, but use the available balances of $50,000 each, to
   rearrange the securities in such a way that places the most tax-efficient
   holdings into a taxable account, and the most tax-inefficient ones into IRAs.
   Here’s a simple animation solely for illustrative purposes:
   [Animation: Asset Location in Action]
   The result is the same 70/30 allocation overall, except TCP has now
   redistributed the assets unevenly, to reduce future taxes. How We Modeled the
   Problem The fundamental questions the engineering team tried to answer were:
   How do we get our customers to this optimal state, and how do we maintain it
   in the presence of daily account activity? We could have attempted to
   construct a procedural-style heuristic solution to this, but the complexity
   of the problem led us to believe this approach would be hard to implement and
   challenging to maintain. Instead, we opted to model our problem as a linear
   program. This made the problem provably solvable and quick to compute—on the
   order of milliseconds per customer. Let’s consider a hypothetical customer
   account example. Meet Joe Joe is a hypothetical Betterment customer. When he
   signed up for Betterment, he opened a Roth IRA account. As an avid saver, Joe
   quickly reached his annual Roth IRA contribution limit of $5,500. Wanting to
   save more for his retirement, he decided to open up a Betterment taxable
   account, which he funded with an additional $11,000. Note that the
   contribution limits mentioned in this example are as of the time this article
   was published. Limits are subject to change from year to year, so please
   defer to IRS guidelines for current IRA and 401(k) contribution
   limits. Joe isn’t one to take huge risks, so he opted for a moderate asset
   allocation of 50% stocks and 50% bonds in both his Roth IRA and taxable
   accounts. To make things simple, let’s assume that both portfolios are only
   invested in two asset classes: U.S. total market stocks and emerging markets
   bonds. In his taxable account, Joe holds $5,500 worth of U.S. total market
   stocks in VTI (Vanguard Total Stock Market ETF), and $5,500 worth of emerging
   markets bonds in VWOB (Vanguard Emerging Markets Bond ETF). Let’s say that
   his Roth IRA holds $2,750 of VTI, and $2,750 of VWOB. Below is a table
   summarizing Joe’s holdings:
   Account Type        VTI (U.S. Total Market)   VWOB (Emerging Markets Bonds)   Account Total
   Taxable             $5,500                    $5,500                          $11,000
   Roth                $2,750                    $2,750                          $5,500
   Asset Class Total   $8,250                    $8,250                          $16,500
   To begin to
   construct our model for an optimal asset location strategy, we need to
   consider the relative value of each fund in both accounts. A number of
   factors are used to determine this, but most importantly each fund’s tax
   efficiency and expected returns. Let’s assume we already know that VTI has a
   higher expected value in Joe’s taxable account, and that VWOB has a higher
   expected value in his Roth IRA. To be more concrete about this, let’s define
   some variables.   Each variable represents the expected value of holding a
   particular fund in a particular account. For example, we’re representing the
   expected value of holding VTI in your Taxable as which we’ve defined to be
   0.07. More generally, Let’s let be the expected value of holding fund F in
   account A. Circling back to the original problem, we want to rearrange the
   holdings in Joe’s accounts in a way that’s maximally valuable in the future.
   Linear programs try to optimize the value of an objective function. In this
   example, we want to maximize the expected value of the holdings in Joe’s
   accounts. The overall value of Joe’s holdings are a function of the specific
   funds in which he has investments. Let’s define that objective function.  
   You’ll notice the familiar terms—measuring the expected value of holding each
   fund in each account, but also you’ll notice variables of the form Precisely,
   this variable represents the balance of fund F in account A. These are our
   decision variables—variables that we’re trying to solve for. Let’s plug in
   some balances to see what the expected value of V is with Joe’s current
   holdings: V=0.07*5500+0.04*5500+0.06*2750+0.05*2750=907.5   Certainly, we can
   do better. We cannot just assign arbitrarily large values to the decision
   variables due to two restrictions which cannot be violated: Joe must maintain
   $11,000 in his taxable account and $5,500 in his Roth IRA. We cannot assign
   Joe more money than he already has, nor can we move money between his Roth
   IRA and taxable accounts. Joe’s overall portfolio must also maintain its
   allocation of 50% stocks and 50% bonds—the risk profile he selected. We don’t
   want to invest all of his money into a single fund. Mathematically, it’s
   straightforward to represent the first restriction as two linear constraints.
   Simply put, we’ve asserted that the sum of the balances of every fund in
   Joe’s taxable account must remain at $11,000. Similarly, the sum of the
   balances of every fund in his Roth IRA must remain at $5,500. The second
   restriction—maintaining the portfolio allocation of 50% stocks and 50%
   bonds—might seem straightforward, but there’s a catch. You might guess that
   you can express it as follows: The above statements assert that the sum of
   the balances of VTI across Joe’s accounts must be equal to half of his total
   balance. Similarly, we’re also asserting that the sum of the balances of VWOB
   across Joe’s accounts must be equal to the remaining half of his total
   balance. While this will certainly work for this particular example,
   enforcing that the portfolio allocation is exactly on target when determining
   optimality turns out to be too restrictive. In certain scenarios, it’s
   undesirable to buy or to sell a specific fund because of tax consequences.
   These restrictions require us to allow for some portfolio drift—some
   deviation from the target allocation. We made the decision to maximize the
   expected after-tax value of a customer’s holdings after having achieved the
   minimum possible drift. To accomplish this, we need to define new decision
   variables and add them to our objective function, penalizing any drift:
   DriftAbove(AC) is the dollar amount above the target balance in asset class AC.
   Similarly, DriftBelow(AC) is the dollar amount below the target balance in
   asset class AC. For instance, DriftAbove(Emerging Markets Bonds) is the dollar
   amount above the target balance in emerging markets bonds—the asset class to
   where VWOB belongs. We still want to maximize our objective function V.
   However, with the introduction of the drift terms, we want every dollar
   allocated toward a single fund to incur a penalty if it moves the target
   balance for that fund’s asset class below or above its target amount. To do
   this, we can relate the B(F, A) terms with the drift terms using linear constraints:
   B(VTI, Taxable) + B(VTI, Roth) + DriftBelow(US Total Market) - DriftAbove(US Total Market) = 0.5 * 16,500
   B(VWOB, Taxable) + B(VWOB, Roth) + DriftBelow(Emerging Markets Bonds) - DriftAbove(Emerging Markets Bonds) = 0.5 * 16,500
   As shown above, we’ve asserted that the sum of the balances in funds including
   U.S. total market stocks (in this case, only VTI), plus some net drift amount
   in that asset class, must be equal to the target balance of that asset class
   in the portfolio (which in this case, is 50% of Joe’s total holdings).
   Similarly, we’ve also done this for emerging markets bonds. This way, if we
   can’t achieve perfect allocation, we have a buffer that we can fill—albeit at
   a penalty. Now that we have our objective function and constraints set up, we
   just need to solve these equations. For this we can use a mathematical
   programming solver. Here’s the optimal solution: Managing Engineering
   Complexity Reaching the optimal balances would require our system to buy and
   sell securities in Joe’s investment accounts. It’s not always free for Joe to
   go from his current holdings to optimal ones because buying and selling
   securities can have tax consequences. For example, if our system sold
   something at a short-term capital gain in Joe’s taxable account, or bought a
   security in his Roth IRA that was sold at a loss in the last 30
   days—triggering the wash-sale rule, we would be negatively impacting his
   after-tax return. In the simple example above with two accounts and two
   funds, there are a total of four constraints. Our production model is orders
   of magnitude more complex, and considers each Betterment customer’s
   individual tax lots, which introduces hundreds of individual constraints to
   our model. Generating these constraints that ultimately determine buying and
   selling decisions can often involve tricky business logic that examines a
   variety of data in our system. In addition, we knew that as our work on TCP
   progressed, we were going to need to iterate on our mathematical model.
   Before diving head first into the code, we made it a priority to be cognizant
   of the engineering challenges we would face. As a result, we wanted to make sure that
   the software we built respected four key principles, which are: Isolation
   from third-party solver APIs. Ability to keep pace with changes to the
   mathematical model, e.g., adding, removing, and changing the constraints and
   the objective function must be quick and painless. Separation of concerns
   between how we accessed data in our system and the business logic defining
   algorithmic behavior. Easy and comprehensive testing. We built our own
   internal framework for modeling mathematical programs that was not tied to
   our trading system’s domain-specific business logic. This gave us the
   flexibility to switch easily between a variety of third-party mathematical
   programming solvers. Our business logic that generates the model knows only
   about objects defined by our framework, and not about third-party APIs. To
   incorporate a third-party solver into our system, we built a translation
   layer that received our system-generated constraints and objective function
   as inputs, and utilized those inputs to solve the model using a third-party
   API. Switching between third-party solvers simply meant switching
   implementations of a single solver interface. We wanted that same level of
   flexibility in changing our mathematical model. Changing the objective
   function and adding new constraints needed to be easy to do. We did this by
   providing well-defined interfaces that give engineers access to core system
   data needed to generate our model. This means that an engineer implementing a
   change to the model would only need to worry about implementing algorithmic
   behavior, and not about how to retrieve the data needed to do that. To add a
   new set of constraints, engineers simply provide an implementation of a
   TradingConstraintGenerator. Each TradingConstraintGenerator knows about all
   of the system related data it needs to generate constraints. Through
   dependency injection, the new generator is included among the set of
   generators used to generate constraints; a simplified sketch of this pattern
   appears at the end of this section. With hundreds of constraints
   and hundreds of thousands of unique tax profiles across our customer base, we
   needed to be confident that our system made the right decisions in the right
   situations. For us, that meant having clear, readable tests that were a joy
   to write. We wrote tests in Groovy that set up fixture data mimicking the
   exact situation in our “Meet Joe” example. We not only had unit
   tests such as these to test simple scenarios where a human could
   calculate the outcome, but we also ran the optimizer in a simulated
   production-like environment, through hundreds of thousands of scenarios that
   closely resembled real ones. During testing, we often ran into scenarios
   where our model had no feasible solution—usually due to a bug we had
   introduced. As soon as the bug was fixed, we wanted to ensure that we had
   automated tests to handle a similar issue in the future. However, with so
   many sources of input affecting the optimized result, writing tests to cover
   these cases was very labor-intensive. Instead, we automated the test setup by
   building tools that could snapshot our input data as of the time the error
   occurred. The input data was serialized and automatically fed back into our
test fixtures.
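As a rough illustration of that capture-and-replay idea (hypothetical names, not Betterment’s actual tooling), in Ruby:

   require "json"

   # Illustrative sketch only. When the optimizer fails, snapshot every input
   # that fed the model so the exact scenario can be replayed as a test fixture.
   class OptimizationSnapshot
     def self.capture(inputs, path)
       # inputs: a plain Hash of everything the model builder consumed
       File.write(path, JSON.pretty_generate(inputs))
     end

     def self.load_fixture(path)
       JSON.parse(File.read(path))
     end
   end

A failed production run then becomes a permanent regression test with almost no manual setup.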
Striving for Simplicity

At Betterment, we aim to build products that help our customers reach their financial goals. Building new
   products can often be done using our existing engineering abstractions.
   However, TCP brought a new level of complexity that required us to rethink
   the way parts of our trading system were built. Modeling and implementing our
   portfolio management algorithms using linear programming was not easy, but it
   ultimately resulted in the simplest possible system needed to reliably pursue
   optimal after-tax returns. To learn more about engineering at Betterment,
   visit the engineering page on the Betterment Resource Center. All return
   examples and return figures mentioned above are for illustrative purposes
   only. For much more on our TCP research, including additional considerations
   on the suitability of TCP to your circumstances, please see our white paper.
   See full disclosure for our estimates and Tax Coordination in general.
   13 min read


 * WHAT’S THE BEST AUTHORIZATION FRAMEWORK? NONE AT ALL
   
   What’s the Best Authorization Framework? None At All Betterment’s engineering
   team builds software more securely by forgoing complicated authorization
   frameworks. As a financial institution, we take authorization—deciding who is
   allowed to do what—extremely seriously. But you don't need an authorization
   framework to build an application with robust security and access control. In
   fact, the increased complexity and indirection that authorization frameworks
   require can actually make your software less secure. At Betterment, we follow
   key principles to avoid authorization frameworks altogether in many of our
   applications. Of course, it would be impractical to completely avoid
   authorization features in internal applications that support our team's
   diverse responsibilities. For these apps, Betterment reframed the problem and
   built a radically simpler authorization framework by following a few simple
ground rules.

The Downside of Frameworks

Application security is tough to get
   right. Some problems, like cryptography, are so thorny that even implementing
   a well-known algorithm yourself would be malpractice. Professional teams lean
   on proven libraries and frameworks to solve hard problems, such as NaCl for
   crypto and Devise for authentication. But authorization isn't like crypto or
   authentication. At Betterment, we've found that authorization rules emerge
   naturally from our business logic, and we believe that's where they belong.
   Most authorization frameworks blur the lines around this crucial piece of a
   business domain, leaving engineers to wonder whether and how to leverage the
   authorization framework versus treating a given condition as a regular
   business rule. Over time, these hundreds or thousands of successive decisions
   can result in a minefield of inconsistent, unauditable semantics, and
   ultimately the confusion can lead to bugs and vulnerabilities. Betterment has
   structured our entire platform around the security of our customers. By
   following the principles in this article, we've simplified the authorization
   problem, making decisions easy and accountable, and achieving even higher
confidence in our systems’ safety.

Authorization Without the Framework

Here are the principles that keep Betterment's most security-critical apps free of authorization frameworks:

Authorization Through Impossibility

The most
   fundamental authorization rule of an app like Betterment’s is that users
   should only be able to see their own financial profiles. That could be
   modeled in an authorization framework by specifying that users only have
   access to their own profiles, and then querying the framework before display.
   But there's a better way: Make it impossible. We simply don't have an
   endpoint to allow somebody to request another user's information. The most
secure code is the code that never got written.

Authorization Through Navigability

Most things that could be described as authorization rules
   emerge naturally from relationships. For instance, if I'm co-owner of a joint
   account opened by my spouse, then I am allowed to see that account. Rather
   than add another layer of indirection, we simply rely on our data model, and
   only expose data that can be reached through the app's natural relationships.
The app can’t even locate data that should be inaccessible.
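In Rails terms, this usually amounts to scoping every query through the authenticated user's associations; a minimal sketch, assuming hypothetical model names:

   # Illustrative sketch only; controller and model names are hypothetical.
   class AccountsController < ApplicationController
     def show
       # Scoped lookup: starting from current_user means anyone else's account
       # id simply raises ActiveRecord::RecordNotFound. Records outside the
       # user's natural relationships are unreachable by construction.
       @account = current_user.accounts.find(params[:id])
     end
   end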
Authorization Through Application Boundaries

Many arguments for heavyweight authorization
   arise from administrative access. What if a Customer Support representative
   needs the ability to help a customer make a change to an account? Shouldn’t
   there be a simple override available to her? No, at least not within the same
   app. Each application should have a single audience. For instance, our
   consumer-facing app lets customers view and manage their investments. Our
   internal Customer Support app allows our Customer Support representatives to
   look up the accounts of customers they’re assisting. Our Ops app gives our
   broker-dealer operations team the tools to monitor risk systems and manage
   transactions. This isn’t just a boon for security—it's better software. Each
   app is built for a specific team with exactly the tools and information it
needs.

But Sometimes You Need a Framework

The real world is complicated. At
   Betterment, where we’re approaching 200 employees across many disciplines, we
   know this well. Some tasks require a senior team member. Some trainees only
   need limited access to a system. As an engineering organization, you could
   build a new app for every single title and level within your company, but
   it'd be confusing for team members whose jobs are more similar than they are
   different, and mind-bogglingly expensive to maintain. How do you move forward
   without going whole-hog on heavyweight authorization? By setting a few ground
   rules for ourselves, we were able to design a lightweight, auditable, and
   intuitive approach to authorization that has scaled with our team and stayed
dead simple. Here were the rules we followed:

1. Privilege Levels Are Named After the People Who Use the Software

As Phil Karlton once said, naming
   things is one of the two hard things in computer science. Software is built
   by people, for people. To tend toward security over the long term, the names
   we use must be intuitive to both the engineers building the software and its
   users. For instance, our Customer Support app has the levels trainee, staff,
   and manager. As the organization grows and matures, internal jargon will
   change too. It's crucial to make sure these names remain meaningful, updating
them as needed.

2. Privilege Levels Are Linear

Once you've built a separate
   app for each audience, you don't need to support multiple orthogonal roles—a
   single ladder is enough. In our Customer Support app, staff can do a superset
   of what trainees can do, and managers can do a superset of what staff can do.
   In combination with the naming rule, this means that you can easily add
   levels above, below, or between the existing levels as your team grows
   without rethinking every privilege in the system. Eventually, you may find
   that a single ladder isn’t enough. This is a great opportunity to force the
   conversation about how roles within your team are diverging, and build
software to match.

3. REST Resources Are the Only Resources, and HTTP Verbs Are the Only Actions

At their core, all authorization systems determine
   whether a user has permission to perform an action on a resource. Much of the
   complexity in a traditional authorization system comes from defining those
   resources and their relationships to users, usually in terms of database
   entities. RESTful applications have the concepts of resources and actions in
   their DNA, so there's no need to reinvent that wheel. REST doesn't just give
   us the ability to define simple resources like accounts that correspond to
   database tables. We can build up semantic concepts like a search resource to
   enable basic user lookup, and a secure_search resource that allows senior
   team members to query by sensitive details like Social Security number. By
   treating HTTP verbs as our actions, we can easily allow a trainee to GET an
account but not PATCH it.

4. Authorization Framework Calls Are Simple, and Stay in the Controllers and Views

If the only way to initiate an action is through a REST endpoint, there's no need to add complexity to your business logic layer. The authorization framework has only two features: (1) answering whether a user can request a given resource with a given verb, which app developers use to customize views (e.g., show or hide a button), and (2) aborting the request if the answer is no. And that's all you need.
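A minimal sketch of what such a framework can boil down to, assuming the linear levels and REST resources described above (hypothetical names and grants, not Betterment's implementation):

   # Illustrative sketch only; levels and grants are hypothetical.
   LEVELS = %w[trainee staff manager].freeze

   GRANTS = {
     "accounts"      => { "GET" => "trainee", "PATCH" => "staff" },
     "search"        => { "GET" => "trainee" },
     "secure_search" => { "GET" => "manager" }
   }.freeze

   # Feature 1: answer whether a user may hit a resource with a verb
   # (used in views to show or hide a button).
   def permitted?(level, resource, verb)
     required = GRANTS.dig(resource, verb)
     !required.nil? && LEVELS.index(level) >= LEVELS.index(required)
   end

   # Feature 2: abort the request if the answer is no, e.g. in a before_action:
   #   head :forbidden unless permitted?(current_user.level, controller_name, request.method)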
Help Us Solve the Hard Problems

Security is a mindset, philosophy, and practice more
   than a set of tools or solutions, and many challenges lie ahead. If you’d
   like to help Betterment design, build, and spread radically simpler and more
   secure solutions to the hard problems our customers and team face, go ahead
   and apply.
   7 min read


 * THE EVOLUTION OF THE BETTERMENT ENGINEERING INTERVIEW
   
   The Evolution of the Betterment Engineering Interview Betterment’s
   engineering interview now includes a pair programming experience where
   candidates are tested on their collaboration and technical skills. Building
   and maintaining the world’s largest independent robo-advisor requires a
   world-class team of human engineers. This means we must continuously iterate
   on our recruiting process to remain competitive in attracting and hiring top
talent. As our team has grown from five to more than 50 engineers in just the last three years, we've been able to make clearer hiring decisions and have shortened our total hiring timeline.

Back in the Day

Here's how our interview process once looked:
  Resumé review
  Initial phone screen
  Technical phone screen
  Onsite, Day 1:
    Technical interview (computer science fundamentals)
    Technical interview (modelling and app design)
    Hiring manager interview
  Onsite, Day 2:
    Product and design interview
    Company founder interview
    Company executive interview

While this process helped in growing our engineering team, it began
   showing some cracks along the way. The main recurring issue was that hiring
   managers were left uncertain as to whether a candidate truly possessed the
   technical aptitude and skills to justify making them an employment offer.
   While we tried to construct computer science and data modelling problems that
   led to informative interviews, watching candidates solve these problems still
   wasn’t getting to the heart of whether they’d be successful engineers once at
   Betterment. In addition to problems arising from the types of questions
   asked, we saw that one of our primary interview tools, the whiteboard, was
   actually getting in the way; many candidates struggled to communicate their
   solutions using a whiteboard in an interview setting. The last straw for
   using whiteboards came from feedback provided by Betterment’s Women in
   Technology group. When I sat down with them to solicit feedback on our entire
   hiring process, they pointed to the whiteboard problem-solving dynamics (one
   to two engineers sitting, observing, and judging the candidate standing at a
   whiteboard) as unnatural and awkward. It was clear this part of the
   interviewing process needed to go. We decided to allow candidates the choice
   of using a whiteboard if they wished, but it would no longer be the default
   method for presenting one’s skills. If we did away with the whiteboard, then
   what would we use? The most obvious alternative was a computer, but then many
   of our engineers expressed concerns with this method, having had bad
   experiences with computer-based interviews in the past. After spirited
   internal discussions we landed on a simple principle: We should provide
   candidates the most natural setting possible to demonstrate their abilities.
   As such, our technical interviews switched from whiteboards to computers.
   Within the boundaries of that principle, we considered multiple interview
   formats, including take-home and online assessments, and several variations
   of pair programming interviews. In the end, we landed on our own flavor of a
pair programming interview.

Today: A Better Interview

Here's our revised interview process:
  Resumé review
  Initial phone screen
  Technical phone screen
  Onsite:
    Technical interview 1:
      Ask the candidate to describe a recent technical challenge in detail
      Set up the candidate's laptop
      Introduce the pair programming problem and explore the problem
      Pair programming (optional, time permitting)
    Technical interview 2: Pair programming
    Technical interview 3: Pair programming
    Ask-Me-Anything session
    Product and design interview
    Hiring manager interview
    Company executive interview

While an interview setting may
   not offer pair programming in its purest sense, our interviewers truly
   participate in the process of writing software with the candidates. Instead
   of simply instructing and watching candidates as they program, interviewers
   can now work with them on a real-world problem, and they take turns in
   control of the keyboard. This approach puts candidates at ease, and feels
   closer to typical pair programming than one might expect. As a result, in
   addition to learning how well a candidate can write code, we learn how well
   they collaborate. We also split the main programming portion of our original
   interview into separate sections with different interviewers. It’s nice to
give candidates a short break in between interviews, but the main reason for the separation is to evaluate the handoff: how well a candidate explains their design decisions and progress from one interviewer to the next.

Other Improvements

We also streamlined our question-asking process
   and hiring timeline, and added an opportunity for candidates to speak with
non-interviewers.

Questions

Interviews are now more prescriptive regarding
   non-technical questions. Instead of multiple interviewers asking a candidate
   about the same questions based on their resumé, we prescribe topics based on
   the most important core competencies of successful (Betterment) engineers.
   Each interviewer knows which competencies (e.g., software craftsmanship) to
   evaluate. Sample questions, not scripts, are provided, and interviewers are
   encouraged to tailor the competency questions to the candidates based on
their backgrounds.

Timeline

Another change is that the entire onsite
   interview is completed in a single day. This can make scheduling difficult,
   but in a city as competitive as New York is for engineering talent, we’ve
   found it valuable to get to the final offer stage as quickly as possible.
Discussion

Finally, we’ve added an Ask-Me-Anything (AMA) session—another idea
   provided by our Women in Technology group. While we encourage candidates to
   ask questions of everyone they meet, the AMA provides an opportunity to meet
   with a Betterment engineer who has zero input on whether or not to hire them.
   Those “interviewers” don’t fill out a scorecard, and our hiring managers are
forbidden from discussing candidates with them.

Ship It

Our first run of this
   new process took place in November 2015. Since then, the team has met several
   times to gather feedback and implement tweaks, but the broad strokes have
   remained unchanged. As of July 2016, all full-stack, mobile, and
   site-reliability engineering roles have adopted this new approach. We’re
   continually evaluating whether to adopt this process for other roles, as
   well. Our hiring managers now report that they have a much clearer
   understanding of what each candidate brings to the table. In addition, we’ve
   consistently received high marks from candidates and interviewers alike, who
   prefer our revamped approach. While we didn’t run a scientifically valid
   split-test for the new process versus the old (it would’ve taken years to
   reach statistical significance), our hiring metrics have improved across the
   board. We’re happy with the changes to our process, and we feel that it does
   a great job of fully and honestly evaluating a candidate’s abilities, which
   helps Betterment to continue growing its world-class team. For more
   information about working at Betterment, please visit our Careers page. More
   from Betterment: Server Javascript: A Single-Page App To…A Single-Page App
   Going to Work at Betterment Engineering at Betterment: Do You Have to Be a
   Financial Expert? Determination of largest independent robo-advisor reflects
   Betterment LLC’s distinction of having highest number of assets under
   management, based on Betterment’s review of assets self-reported in the SEC’s
   Form ADV, across Betterment’s survey of independent robo-advisor investing
   services as of March 15, 2016. As used here, “independent” means that a
   robo-advisor has no affiliation with the financial products it recommends to
   its clients.
   6 min read


 * SERVER JAVASCRIPT: A SINGLE-PAGE APP TO…A SINGLE-PAGE APP
   
   Server JavaScript: A Single-Page App To…A Single-Page App Betterment
   engineers recently migrated a single-page backbone app to a server-driven
   Rails experience. Betterment engineers (l-r): Arielle Sullivan, J.P.
   Patrizio, Harris Effron, and Paddy Estridge We recently changed the way we
   organize our major business objects. All the new features we’re working on
   for customers with multiple accounts—be they Individual Retirement Accounts
   (IRAs), taxable investment accounts, trusts, joint accounts, or even synced
   outside accounts—required this change. We were also required to rename
   several core concepts, and make some big changes to the way we display data
   to our customers. Currently, our Web application is a JavaScript single-page
app that uses a frontend MVC framework, backed by a JSON API. We use
   Marionette.js, a framework built on top of Backbone.js, to help us organize
   our JavaScript and manage page state. It was built out over the past few
   years, with many different paradigms and patterns. After some time, we found
   ourselves with an application that had a lot of complexity and splintered
   code practices throughout. The complexity partly arose from the fact that we
   needed to duplicate business logic from the backend and the frontend. By only
   using the server as a JSON API, the frontend needed to know exactly what to
   do with that JSON. It needed to be able to organize the different server
   endpoints (and its data) into models, as well as know how to take those
   models and render them into views. For example, a core concept such as “an
   account has some money in it” needed to be separately represented in the
   frontend codebase, as well as the server. This led to maintenance issues, and
   it made our application harder to test. The additional layer of frontend
   complexity made it even harder for new hires to be productive from day one.
   When we first saw this project on the horizon, we realized it would end up
requiring a substantial refactor of our web app. We had a few options: (1) rewrite the JavaScript in a way that makes it simpler and easier to use, or (2) don't rewrite the JavaScript at all. We went with option 2. Instead of using a client
   side MVC framework to help enable us to write a single page app, we opted to
   use our Rails server to render views, and we used server generated JavaScript
   responses to make the app feel just as snappy for our customers. We achieved
the same UX wins as a single page app with a fraction of the code.

Method to the Madness

The crux of our new pattern is this: We use Rails’ unobtrusive
   JavaScript (ujs) library to declare that forms and links should be submitted
   using AJAX. Our server then gets an AJAX rest request as usual, but instead
   of rendering the data as JSON, it responds to the request with a snippet of
JavaScript. That JavaScript gets evaluated by the browser. The “trick” here is that the JavaScript is simply a call to jQuery’s html method, and we use Rails’ built-in partial view rendering to respond with all the HTML we need. Now, the frontend just needs to blindly listen to the server and render the HTML as instructed.
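As a sketch of the pattern (hypothetical controller, view, and helper names, not Betterment's code), a server-generated JavaScript response looks roughly like this:

   # app/controllers/addresses_controller.rb (illustrative sketch)
   class AddressesController < ApplicationController
     def update
       @address = current_user.address
       @address.update(address_params) # strong-parameters helper omitted
       respond_to do |format|
         # For the AJAX request, Rails renders update.js.erb instead of JSON
         # or a full page.
         format.js
       end
     end
   end

   # app/views/addresses/update.js.erb (illustrative) -- the entire response:
   #   $("#address-form").html("<%= escape_javascript(render 'addresses/form') %>");

The browser evaluates that one line, swapping in HTML the server already knows how to render.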
An Example

As a simple example, let’s say we want to edit a user’s home address. Using the JavaScript single-page app framework, we
   would need a few things. First, we want an address model, which we map to our
“/addresses” endpoint. Next, we need a view that represents our form for
   editing the address. We need a frontend template for that view. Then, we need
   a route in our frontend for navigating to this page. And for our server, we
   need to add a route, a controller, a model, and a jbuilder to render that
model as JSON.

A Better Way

With our new paradigm, we can skip most of this.
   All we need is the server. We still have our route, controller, and model,
   but instead of a jbuilder for returning JSON, we can port our template to
   embedded Ruby, and let the server do all the work. Using UJS patterns, our
view can live completely on the server.
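Continuing the hypothetical address example, the server-side view is just an ERB template whose form the ujs driver submits over AJAX:

   <%# app/views/addresses/edit.html.erb (illustrative sketch) %>
   <%# `remote: true` is what tells the ujs driver to submit this form via AJAX. %>
   <%= form_for @address, remote: true do |f| %>
     <%= f.text_field :street %>
     <%= f.text_field :city %>
     <%= f.submit "Save" %>
   <% end %>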
There are a few major wins here:

Unifying our business logic. The server is responsible for knowing about (1)
   our data, (2) how to wrap that data into rich domain models that own our
   business logic, (3) how to render those models into views, and (4) how to
render those views on the page. The client needs to know almost nothing.

Less JavaScript. We aren’t getting rid of all the JavaScript in our application.
   Certain snappy user experience elements don’t work as well without
   JavaScript. Interactive elements, some delightful animations, and other
   frontend behaviors still need it. For these things, we are using HTML data
   elements to specify behaviors. For example, we can tag an element with a
   data-behavior-dropdown, and then we have some simple, well organized global
   JavaScript that knows how to wrap that element in some code that makes it
   more interactive. We are hoping that by using these patterns, we can limit
   our use of JavaScript to only know about how to enhance HTML, not how
   to automatically calculate net income when trying to distribute excess tax
   year contributions from an IRA (something that our frontend JavaScript used
to know how to do).

We can do this migration in small pieces. Even with this
   plan, migrating a highly complex web application isn’t easy. We decided to
   tackle it using a tab-by-tab approach. We’ve written a few useful helpers
   that allow us to easily plug in our new server-driven style into our existing
   Marionette application. By doing this piecemeal, we are hoping to bake in
   useful patterns early on, which we can iterate and use to make migrating the
   next part even simpler. If we do this right, we will be able to swap
   everything to a normal Rails app with minimal effort. Once we migrate to
   Rails 5, we should even be able to easily take advantage of Turbolinks 3,
   which is a conventionalized way to do regional AJAX updates. This new pattern
   will make building out newer and even more sophisticated features easier, so
   we can focus on encapsulating the business logic once. Onboarding new hires
   familiar with the Rails framework will be faster, and those who aren’t
   familiar can find great external (and internal) resources to learn it. We
   think that our Web app will be just as pleasant to use, and we can more
   quickly enhance and build new features going forward.
   6 min read


 * MODERN DATA ANALYSIS: DON’T TRUST YOUR SPREADSHEET
   
Modern Data Analysis: Don’t Trust Your Spreadsheet To conduct research in business, you need statistical computing that you can easily reproduce, scale, and make accessible to many stakeholders. Just as the Ford Motor Company
   created efficiency with assembly line production and Pixar opened up new
   worlds by computerizing animation, companies now are innovating and improving
   the craft of using data to do business. Betterment is one of them. We are
   built from the ground up on a foundation of data. It’s only been about three
   decades since companies started using any kind of computer-assisted data
   analysis. The introduction of the spreadsheet defined the beginning of the
   business analytics era, but the scale and complexity of today’s data has
   outgrown that origin. To avoid time-consuming manual processes, and the human
   error typical of that approach, analytics has become a programming
   discipline. Companies like Betterment are hiring data scientists and analysts
   who use software development techniques to reliably answer business questions
   which have quickly expanded in scale and complexity. To do good data work
   today, you need to use a system that is reproducible, versionable, scalable,
   and open. Our analytics and data science team at Betterment uses these data
   best practices to quickly produce reliable and sophisticated insights to
drive product and business decisions.

A Short History of Data in Business

First, a step back in the business time machine. With VisiCalc, the first-ever spreadsheet program, in 1979 and Excel in 1987, the business world entered a new era in which any employee could manage large amounts of
   data. The bottlenecks in business analytics had been the speed of human
   arithmetic or the hours available on corporate mainframes operated by only a
   few specialists. With spreadsheet software in every cubicle, analytical
   horsepower was commoditized and Excel jockeys were crowned as the arbiters of
   truth in business. But the era of the spreadsheet is over. The data is too
   large, the analyses are too complex, and mistakes are too dangerous to trust
   to our dear old friend the spreadsheet. Ask Carmen Reinhart and Kenneth
   Rogoff, two Harvard economists who published an influential paper on
   sovereign debt and economic growth, only to find out that the results rested
   in part on the accidental omission of five cells from an average. Or ask the
   execs at JPMorgan who lost $6 billion in the ‘London Whale’ trading debacle,
also due in part to poor data practices in Excel. More broadly, a 2015 survey
   of large businesses in the UK reported that 17% had experienced direct
   financial losses because of spreadsheet errors. It’s a new era with a new
   scale of data, and it’s time to define new norms around management of and
inferences from business data.

Requirements for Modern Data Analysis

To do good data work today, you need a system with properties that spreadsheets fundamentally lack. That system must be:

Reproducible. It’s not personal, but I don’t trust any number that comes
   without supporting code. That code should take me from the raw data to the
   conclusions. Most analyses contain too many important detailed steps to
   plausibly communicate in an email or during a meeting. Worse yet, it’s
   impossible to remember exactly what you’ve done in a point and click
   environment, so doing it the same way again next time is a crap shoot.
   Reproducible also means efficient. When an input or an assumption changes, it
should be as easy as re-running the whole thing.

Versionable. Code versioning
   frameworks, such as git, are now a staple in the workflow of most technical
   teams. Teams without versioning are constantly asking questions like, “Did
   Jim send the latest file?”, “Can I be sure that my teammate selected all
   columns when he re-sorted?”, or “The bottom line numbers are different in
   this report; what exactly changed since the first draft?” These
   inefficiencies in collaboration and uncertainties about the calculations can
   be deadly to a data team. Sharing code in a common environment also enables
   the reuse of modular analysis components. Instead of four analysts all
   inventing their own method for loading and cleaning a table of users, you can
   share as a group the utils/LoadUsers() function and ensure you are talking
about the same people at every meeting.

Scalable. There are hard technical
   limits to how large an analysis you can do in a spreadsheet. Excel 2013 is
   capped at just more than 1 million rows. It doesn’t take a very large
   business these days to collect more than 1 million observations of customer
   interactions or transactions. There are also feasibility limits. How long
   does it take your computer to open a million row spreadsheet?  How likely is
   it that you’ll spot a copy-paste error at row 403,658? Ideally, the same
   tools you build to understand your data when you’re at 10 employees should
scale and evolve through your IPO.

Open. Many analyses meet the above ideals
   but have been produced with expensive, proprietary statistical software that
   inhibits sharing and reproducibility. If I do an analysis with open-source
   tools like R or Python, I can post full end-to-end instructions that anyone
   in the world can reproduce, check, and expand upon. If I do the same in SAS,
   only people willing to spend $10,000 (or more if particular modules are
   required) can review or extend the project. Platforms that introduce
   compatibility problems between versions and save their data in proprietary
   formats may limit access to your own work even if you are paying for the
   privilege. This may seem less important inside a corporate bubble where
   everyone has access to the same proprietary platform, but it is at the very
   least a turnoff to most new talent in the field. I don’t hear anyone saying
that expensive proprietary data solutions are the future.

What to Use, and How

Short answer: R or Python. Longer answer: Here at Betterment, we use
   both. We use Python more for data pipeline processes and R more for modeling,
   analyses, and reporting. But this article is not about the relative merits of
   these popular modern solutions. It is about the merits of using one of them
   (or any of the smaller alternatives). To get the most out of a programmatic
   data analysis workflow, it should be truly end-to-end, or as close as you can
   get in your environment. If you are new to one or both of these environments,
   it can be daunting to sort through all of the tools and figure out what does
what. These are some of the most popular tools in each language, organized by their layer in your full-stack analysis workflow:

  Environment: R: RStudio; Python: iPython / Jupyter, PyCharm
  Sourcing data: R: RMySQL, rpostgresql, rvest, RCurl, httr; Python: MySQLdb, requests, bs4
  Cleaning, reshaping, and summarizing: R: data.table, dplyr; Python: pandas
  Analysis, model building, learning: R: see CRAN Task Views; Python: NumPy, SciPy, Statsmodels, Scikit-learn
  Visualization: R: ggplot2, ggvis, rCharts; Python: matplotlib, d3py, Bokeh
  Reporting: R: RMarkdown, knitr, shiny, rpubs; Python: IPython notebook

Sourcing Data

If there is any ambiguity in this
   step, the whole analysis stack can collapse on the foundation. It must be
   precise and clear where you got your data, and I don’t mean conversationally
   clear. Whether it’s a database query, a Web-scraping function, a MapReduce
   job, or a PDF extraction, script it and include it in your reproducible
   process. You’ll thank yourself when you need to update the input data, and
   your successors and colleagues will be thankful they know what you’re basing
your conclusions on.

Cleaning, Reshaping, Summarizing

Every dataset includes
   some amount of errant, corrupted, or outlying observations. A good analysis
   excludes them based on objective rules from the beginning and then tests for
   sensitivity to these exclusions later. Dropping observations is also one of
   the easiest ways for two people doing similar analyses to reach different
   conclusions. Putting this process in code keeps everyone accountable and
removes ambiguity about how the final analysis set was reached.

Analysis, Model Building, Learning

You’ll probably only present one or two of the
   scores of models and variants you build and test. Develop a process where
   your code organizes and saves these variants rather than discarding the ones
   that didn’t work. You never know when you’ll want to circle back. Try to
   organize analyses in a structure similar to how you present them so that the
connection from claims to details is easy to make.

Visualization, Reporting
   Careful, a trap is looming. So many times, the chain of reproducibility is
   broken right before the finish line when plots and statistical summaries are
   copied onto PowerPoint slides. Doing so introduces errors, breaks the link
   between claims and process, and generates huge amounts of work in the
   inevitable event of revisions. R and Python both have great tools to produce
   finished reports as static HTML or PDF documents, or even interactive
   reporting and visualization products. It might take some time to convince the
   rest of your organization to receive reports in these more modern formats.
   Moving your organization towards these ideals is likely to be an imperfect
   and gradual process. If you’re the first convert, absolutism is probably not
   the right approach. If you have influence in the hiring process, try to push
   for candidates who understand and respect these principles of data science.
   In the near term, look for smaller pieces of the analytical workflow which
   would benefit especially from the efficiencies of reproducible, programmatic
   analysis and reporting. Good candidates are reports that are updated
   frequently, require extensive collaboration, or are constantly hung up on
   discussions over details of implementation or interpretation. Changing
   workflows and acquiring new skills is always an investment, but the dividends
   here are better collaboration, efficient iteration, transparency in process
   and confidence in the claims and recommendations you make.  It’s worth it.
   9 min read


 * ENGINEERING AT BETTERMENT: DO YOU HAVE TO BE A FINANCIAL EXPERT?
   
   Engineering at Betterment: Do You Have to Be a Financial Expert? When I
   started my engineering internship at Betterment, I barely knew anything about
   finance. By the end of the summer, I was working on a tool to check for money
   launderers and fraudsters. Last summer, I built an avatar creator for K-12
   students. Now, a year later, I’m working on a tool to check for money
   launderers and fraudsters. How did I go from creating avatars with Pikachu
   ears to improving detection of financial criminals? Well, it was one part
   versatility of software engineering, one part courage to work in an industry
   I knew nothing about, and a dash of eagerness to learn as much as I could. I
   was on the verge of taking another internship in educational technology,
   commonly referred to as ‘edtech.’ But when I got the opportunity to work at
   Betterment, a rapidly growing company, I had to take it. Before my
   internship, finance, to me, was a field in which some of my peers would work
   more hours than I had hours of consciousness. Definitely not my cup of tea. I
   knew I didn’t want to work at a big bank, but I did want to learn more about
   the industry that employed 16.6% of my classmates at Yale. The name
   Betterment jumped out at me on a job listings page because it sounded like it
   would make my life ‘better.’ Betterment is a financial technology, or
   ‘fintech,’ company; while it provides financial services, it’s an engineering
   company at its core. Working here offered me the opportunity to learn about
   finance while still being immersed in tech startup culture. I was nervous to
   work in an industry I knew nothing about. But I soon realized it was just the
   opposite: Knowing less about finance motivated me to learn—quickly. When I
   started working at Betterment, I barely knew anything about finance. I
   couldn’t tell you what a dividend was. I didn’t know 401(k)s were
   employer-sponsored. My first task involved DTC participants, CUSIPs, and
   ACATS—all terms that I’d never heard before. (For the record, they stand for
   The Depository Trust Company, Committee on Uniform Security Identification
   Procedures, and Automated Customer Account Transfer Service, respectively.) A
   few days into my internship, I sat through a meeting about traditional and
   Roth IRAs wondering, what does IRA stand for? The unfortunate thing is that
   this is common for people my age. Personal finance is not something many
   college students think about—partially because it’s not taught in school and
   partially because we don’t have any money to worry about anyway. (Besides, no
   one wants to be an adult, right?) As a result, only 26% of 20-somethings have
   any money invested in stocks. At first, I thought my lack of exposure to
   finance put me at a disadvantage. I was nervous to work in an industry I knew
   nothing about. But I soon realized it was just the opposite: Knowing less
   about finance motivated me to learn—quickly. I started reading Robert
   Shiller’s Finance and the Good Society, a book my dad recommended to me
   months earlier. I searched every new term I came across and, when that wasn’t
   enough, asked my co-workers for help. Many of them took the time to draw
   diagrams and timelines to accompany their explanations. Soon enough, I had
   not only expanded my knowledge of engineering best practices, but I learned
   about dividends, tax loss harvesting, and IRAs (it stands for individual
   retirement account, in case you were wondering). The friendly atmosphere at
   Betterment and the helpfulness of the people here nurtured my nascent
   understanding of finance and turned me into someone who is passionate about
   investing. Before working at Betterment, I didn’t think finance was relevant
   to me. It took eight hours a day of working on a personal finance product for
   me to notice that the iceberg was even there. Now, I know that my money
   (well, the money I will hopefully have in the future) ideally should work
   hard for me instead of just sitting in a savings account. Luckily, I won’t
   have to struggle with building an investment portfolio or worry about
   unreasonable fees. I’ll just use Betterment.
   4 min read


 * WOMEN WHO CODE: AN ENGINEERING Q&A WITH VENMO
   
   Women Who Code: An Engineering Q&A with Venmo Betterment recently hosted a
   Women in Tech meetup with Venmo developer Cassidy Williams, who spoke about
   impostor syndrome. Growing up, I watched my dad work as an electrical
   engineer. Every time I went with him on Take Your Child to Work Day, it
   became more and more clear that I wanted to be an engineer, too. In 2012, I
   graduated from the University of Portland with a degree in computer science
   and promptly moved to the Bay Area. I got my first job at Intel, where I
   worked as a Scala developer. I stayed there for several years until last May,
   when I uprooted my life to New York for Betterment, and I haven’t looked back
   since. As an engineer, I not only love building products from the ground up,
   but I’m passionate about bringing awareness to diversity in tech, an
   important topic that has soared to the forefront of social justice issues.
   People nationwide have chimed in on the conversation. Most recently, Isis
   Wenger, a San Francisco-based platform engineer, sparked the
   #ILookLikeAnEngineer campaign, a Twitter initiative designed to combat gender
   inequality in tech. At Betterment, we’re working on our own set of
   initiatives to drive the conversation. We’ve started an internal roundtable
   to voice our concerns about gender inequality in the workplace, we’ve
   sponsored and hosted Women in Tech meetups, and we’re starting to collaborate
   with other companies to bring awareness to the issue. Cassidy Williams, a
   software engineer at mobile payments company Venmo, recently came in to
   speak. She gave a talk on impostor syndrome, a psychological phenomenon in
   which people are unable to internalize their accomplishments. The phenomenon,
   Williams said, is something that she has seen particularly among
   high-achieving women—where self-doubt becomes an obstacle for professional
   development. For example, they think they’re ‘frauds,’ or unqualified for
   their jobs, regardless of their achievements. Williams’ goal is to help women
   recognize the characteristic and empower them to overcome it. Williams has
   been included as one of Glamour Magazine's 35 Women Under 35 Who Are Changing
   the Tech Industry and listed in the Innotribe Power Women in FinTech Index.
As an engineer myself, I was excited to speak with her after the event about coding, women in tech, and fintech trends.

[Photo: Cassidy Williams, Venmo engineer, said impostor syndrome tends to be more common in high-achieving women. Photo credit: Christine Meintjes]

Abi: Can you speak about a time in
   your life where ‘impostor syndrome’ was limiting in your own career? How did
   you overcome that feeling? Cassidy: For a while at work, I was very nervous
   that I was the least knowledgeable person in the room, and that I was going
   to get fired because of it. I avoided commenting on projects and making
   suggestions because I thought that my insight would just be dumb, and not
   necessary. But at one point (fairly recently, honestly), it just clicked that
   I knew what I was doing. Someone asked for my help on something, and then I
   discussed something with him, and suddenly I just felt so much more secure in
   my job. Can you speak to some techniques that have personally proven
   effective for you in overcoming impostor syndrome? Asking questions,
   definitely. It does make you feel vulnerable, but it keeps you moving
   forward. It's better to ask a question and move forward with your problem
   than it is to struggle over an answer. As a fellow software engineer, I can
   personally attest to experiencing this phenomenon in tech, but I’ve also
   heard from friends and colleagues that it can be present in non-technical
   backgrounds, as well. What are some ways we can all work together to empower
each other in overcoming impostor syndrome? It's cliché, but just getting to
   know one another and sharing how you feel about certain situations at work is
   such a great way to empower yourself and empower others. It gets you both
   vulnerable, which helps you build a relationship that can lead to a stronger
   team overall. Whose Twitter feed do you religiously follow? InfoSec Taylor
   Swift. It's a joke feed, but they have some great tech and security points
   and articles shared there. In a few anecdotes throughout your talk, you
   mentioned the importance of having mentors and role models. Who are your
   biggest inspirations in the industry? Jennifer Arguello - I met Jennifer at
   the White House Tech Inclusion Summit back in 2013, where we hit it off
   talking about diversity in tech and her time with the Latino Startup
   Alliance. I made sure to keep in touch because I would be interning in the
   Bay Area, where she’s located, and we’ve been chatting ever since. Kelly Hoey
   - I met Kelly at a women in tech hackathon during my last summer as a student
   in 2013, and then she ended up being on my team on the British Airways
   UnGrounded Thinking hackathon. She and I both live in NYC now, and we see
   each other regularly at speaking engagements and chat over email about
   networking and inclusion. Rane Johnson - I met Rane at the Grace Hopper
   Celebration for Women in Computing in 2011, and then again when I interned at
   Microsoft in 2012. She and I started emailing and video chatting each other
   during my senior year of college, when I started working with her on the Big
   Dream Documentary and the International Women’s Hackathon at the USA Science
   and Engineering Festival. Ruthe Farmer - I first met Ruthe back in 2010
   during my senior year of high school when I won the Illinois NCWIT
   Aspirations Award. She and I have been talking with each other at events and
   conferences and meetups (and even just online) almost weekly since then about
   getting more girls into tech, working, and everything in between. One of the
   things we chatted about after the talk was how empowering it is to have the
   resources and movements of our generation to bring more diversity to the tech
   industry. The solutions that come out of that awareness are game-changing.
   What are some specific ways in which companies can contribute to these
movements and promote a healthier and more inclusive work culture?
  Work with nonprofits: Groups like NCWIT, the YWCA, the Anita Borg Institute, the Scientista Foundation, and several others are so great for community outreach and company morale.
  Educate everyone, not just women and minorities: When everyone is aware and discussing inclusion in the workplace, it builds and maintains a great company culture.
  Form small groups: People are more open to
   talking closely with smaller groups than a large discussion roundtable.
   Building those small, tight-knit groups promotes relationships that can help
   the company over time. It’s a really exciting time to be a software engineer,
   especially in fintech. What do you think are the biggest trends of our time
   in this space? Everyone's going mobile! What behavioral and market shifts can
   we expect to see from fintech in the next five to 10 years? I definitely
   think that even though cash is going nowhere fast, fewer and fewer people
   will ever need to make a trip to the bank again, and everything will be on
   our devices. What genre of music do you listen to when you’re coding? I
   switch between 80s music, Broadway show tunes, Christian music, and classical
   music. Depends on my feelings about the problem I'm working on. ;) IDE of
   choice? Vim! iOS or Android? Too tough to call.
   7 min read


 * HOW WE BUILT BETTERMENT'S RETIREMENT PLANNING TOOL IN R AND JAVASCRIPT
   
   How We Built Betterment's Retirement Planning Tool in R and JavaScript
   Engineering Betterment’s new retirement planning tool meant finding a way to
   translate financial simulations into a delightful Web experience. In this
   post, we’ll dive into some of the engineering that took place to build
   RetireGuide™ and our strategy for building an accurate, responsive, and
   easy-to-use advice tool that implements sophisticated financial calculations.
   The most significant engineering challenge in building RetireGuide was
   turning a complex, research-driven financial model into a personalized Web
   application. If we used a research-first approach to build RetireGuide, the
   result could have been a planning tool that was mathematically sound but hard
   for our customers to use. On the other hand, only thinking of user experience
   might have led to a beautiful design without quantitative substance. At
   Betterment, our end goal is to always combine both. Striking the right
   balance between these priorities and thoroughly executing both is paramount
   to RetireGuide’s success, and we didn’t want to miss the mark on either
dimension.

Engineering Background

RetireGuide started its journey as a set of
   functions written in the R programming language, which Betterment’s
   investment analytics team uses extensively for internal research. The team
   uses R to rapidly prototype financial simulations and visualize the results,
   taking advantage of R’s built-in statistical functions and broad set of
   pre-built packages. The investment analytics team combined their R functions
   using Shiny, a tool for building user interfaces in R, and released
   Betterment’s IRA calculator as a precursor to RetireGuide. The IRA calculator
   runs primarily in R, computing its advice on a Shiny server. This interactive
   tool was a great start, but it lives in isolation, away from the holistic
   Betterment experience. The calculator focuses on just one part of the broader
   set of retirement calculations, and doesn’t have the functionality to
   automatically import customers’ existing information. It also doesn’t assist
   users in acting on the results it gives. From an engineering standpoint, the
   end goal was to integrate much of the original IRA calculator’s code, plus
   additional calculations, into Betterment’s Web application to create
   RetireGuide as a consumer-facing tool. The result would let us offer a
   permanent home for our retirement advice that would be “always on” for our
   end customers. However, to complete this integration, we needed to migrate
   the entire advice tool from our R codebase into the Betterment Web
   application ecosystem. We considered two approaches: (1) Run the existing R
   code directly server-side, or (2) port our R code to JavaScript to integrate
   it into our Web application. Option 1: Continue Running R Directly Our first
   plan was to reuse the research code in R and let it continue to run
   server-side, building an API on top of the core functions. While this
   approach enabled us to reuse our existing R code, it also introduced lag and
   server performance concerns. Unlike our original IRA calculator, RetireGuide
   needed to follow the core product principles of the Betterment experience:
   efficiency, real-time feedback, and delight. Variable server response times
   do not provide an optimal user experience, especially when performing
   personalized financial projections. Customers looking to fine-tune their
   desired annual savings and retirement age in real time would have to wait for
   our server to respond to each scenario—those added seconds become noticeable
and can impair functionality. Furthermore, because of the CPU-intensive nature of our calculations, heavy bursts of simultaneous customers could
compromise a given server’s response time. While running R server-side was a win for code reuse, it was a loss for scalability and user experience. Those concerns, along with the new infrastructure overhead, motivated us to rethink our approach, prioritizing the user experience and minimizing engineering overhead.

Option 2: Port the R Code to JavaScript

Because our Web
   application already makes extensive use of JavaScript, another option was to
   implement our R financial models in JavaScript and run all calculations
   client-side, on the end user’s Web browser. Eliminating this potential server
   lag solved both our CPU-scaling and usability concerns. However,
   reimplementing our financial models in a very different language exposed a
   number of engineering concerns. It eliminated the potential for any code
   reuse and meant it would take us longer to implement. However, in keeping
   with the company mission to provide smarter investing, it was clear that
   re-engineering our code was essential to creating a better product. Our
   process was heavily test-driven, during which product engineering
   reimplemented many of the R tests in JavaScript, understood the R code’s
   intent, and ported the code while modifying for client-side performance wins.
   Throughout the process, we identified several discrepancies between
   JavaScript and R function outputs, so we regularly reconciled the
   differences. This process added extra validation, testing, and optimizations,
   helping us to create the most accurate advice in our end product. The cost of
   maintaining a separate codebase is well worth the benefits to our customers
and our code quality.

A Win for Customers and Engineering

Building
   RetireGuide—from R to JavaScript—helped reinforce the fact that no
   engineering principle is correct in all cases. While optimizing for code
   reuse is generally desirable, rewriting our financial models in JavaScript
   benefited the product in two noticeable ways: It increased testing and
   organizational understanding. Rewriting R to JavaScript enabled knowledge
   sharing and further code vetting across teams to ensure our calculations are
   100% accurate. It made an optimal user experience possible. Being able to run
   our financial models within our customers’ Web browsers ensures an instant
   user experience and eliminates any server lag or CPU-concerns.
   5 min read


 * MEET BLAZER: A NEW OPEN-SOURCE PROJECT FROM BETTERMENT (VIDEO)
   
Meet Blazer: A New Open-Source Project from Betterment (video) We created an open-source project called Blazer to work as an extension of the Backbone router. All
   teams at Betterment are responsible for teasing apart complex financial
   concepts and then presenting them in a coherent manner, enabling our
   customers to make informed financial decisions. One of the tools we use to
approach this challenge on the engineering team is a popular JavaScript
   framework called Backbone. While we love the simplicity and flexibility of
   Backbone, we’ve recently encountered situations where the Backbone router
   didn’t perfectly fit the needs of our increasingly sophisticated application.
   To meet these needs, we created Blazer, an extension of the Backbone router.
   In the spirit of open-source software, we are sharing Blazer with the
community. To learn more, we encourage you to watch the video below, featuring
   Betterment’s Sam Moore, a lead engineer, who reveals the new framework at a
   Meetup in Betterment’s NYC offices. Take a look at Blazer.
   https://www.youtube.com/embed/F32QhaHFn1k
   2 min read


 * DEALING WITH THE UNCERTAINTY OF LEGACY CODE
   
   Dealing With the Uncertainty of Legacy Code To complete our portfolio
   optimization, we had to tackle a lot of legacy code. And then we applied our
   learnings going forward. Last fall, Betterment optimized its portfolio,
   moving from the original platform to an upgraded trading platform that
   included more asset classes and the ability to weight exposure of each asset
   class differently for every level of risk. For Betterment engineers, it meant
   restructuring the underlying portfolio data model for increased flexibility.
   For our customers, it should result in better expected, risk-adjusted returns
   for investments. However, as our data model changed, pieces of the trading
   system also had to change to account for the new structure.  While most of
   this transition was smooth, there were a few cases where legacy code slowed
   our progress. To be sure, we don't take changing our system lightly. While we
   want to iterate rapidly, we never compromise the security of our customers
   nor the correctness of our code. For this reason, we have a robust testing
   infrastructure and only peer-reviewed, thoroughly-tested code gets pushed
through to production.

What is legacy code?

While there are plenty of
   metaphors and ways to define legacy code, it has this common feature: It’s
always tricky to work with. The biggest problem is that you're not always sure of the original purpose of older code. Either the code is poorly
   designed, the code has no tests around it to specify its behavior, or both.
   Uncertainty like this makes it hard to build new and awesome features into a
   product. Engineers' productivity and happiness decrease as even the smallest
   tasks can be frustrating and time-consuming.  Thus, it’s important for
   engineers to do two things well: (a) be able to remove existing legacy code
   and (b) not to write code that is likely to become legacy code in the future.
   Legacy code is a form of technical debt—the sooner it gets fixed, the less
time it will take to fix in the future.

How to remove legacy code

During our portfolio optimization, we had to come up with a framework for dealing with
   pieces of old code. Here’s what we considered: We made sure we knew its
   purpose.  If the code is not on any active or planned future development
   paths and has been working for years, it probably isn’t worth removing.  Legacy
   code can take a long time to properly test and remove. We made a good effort
   to understand it.  We talked to other developers who might be more familiar
   with it.  During the portfolio update project, we routinely brought a few
   engineers together to diagram trading system flow on a whiteboard. We wrote
   tests around the methods in question.  It's important to have tests in place
   before changing code to be as confident as possible that the behavior of the
   code is not changing during refactoring. Hopefully, it is possible to write
   unit tests for at least a part of the method's behavior.  Write unit tests
   for a piece of the method, then refactor that piece. Test, refactor, repeat. Once
   the tests are passing, write more tests for the next piece, and repeat the
   test, refactor, test, refactor process.  Fortunately, we were able to get rid
   of most of the legacy code encountered during the portfolio optimization
   project using this method. Then there are outliers Yet sometimes even the
   best practices still didn’t apply to a piece of legacy code. In fact,
   sometimes it was hard to even know where to start to make changes. In my
   experience, the best approach was to jump in and rewrite a small piece of
   code that was not tested, and then add tests for the rewritten portion
   appropriately. Write characterization tests We also experimented with
   characterization tests. First proposed by Michael Feathers (who wrote the
   bible on working with legacy code), these tests simply take a set of verified
   inputs/outputs from the existing production legacy code and then assert that
   the output of the new code is the same as the legacy code under the same
   inputs (a minimal sketch appears at the end of this post). Several times we
   ran into corner cases around old users, test users,
   and other anomalous data that caused false positive failures in our
   characterization tests.  These in turn led to lengthy investigations that
   consumed a lot of valuable development time. For this reason, if you do write
   characterization tests, we recommend not going too far with them. Handle a
   few basic cases and be done with them.  Get better unit or integration tests
   in place as soon as possible. Build extra time into project estimates Legacy
   code can also be tricky when it comes to project estimates.  It is
   notoriously hard to estimate the complexity of a task when it needs to be
   built into or on top of a legacy system. In our experience, it has always
   taken longer than expected.  The portfolio optimization project took longer
   than initially estimated.  Also, if database changes are part of the project
   (e.g. dropping a database column that no longer makes sense in the current
   code structure), it's safe to assume that there will be data issues that will
   consume a significant portion of developer time, especially with older data.
   Apply the learnings to the future The less legacy code we have, the less we
   have to deal with the aforementioned processes.  The best way to avoid legacy
   code is to make a best effort at not writing it in the first place.  For
   example, we follow a set of pragmatic design principles drawn
   from SOLID (the acronym Michael Feathers coined for Robert C. Martin’s design
   principles) to help ensure code quality.
    All code is peer reviewed and does not go to production if there is not
   adequate test coverage or if the code is not up to design standards.  Our
   unit tests are not only to test behavior and drive good design, but should
   also be readable to the extent that they help document the code itself.  When
   writing code, we try to keep in mind that we probably won't come back later
   and clean up the code, and that we never know who the next person to touch
   this code will be.  Betterment has also established a "debt day" where once
   every month or two, all developers take one day to pay down technical debt,
   including legacy code. The Results It's important to take a pragmatic
   approach to refactoring legacy code.  Taking the time to understand the code
   and write tests before refactoring will save you headaches in the future.
    Companies should strive for a fair balance between adding new features and
   refactoring legacy code, and should establish a culture where thoughtful code
   design is a priority.  By incorporating many of these practices, it is
   steadily becoming more and more fun to develop on the Betterment platform.
   And the Betterment engineering team is avoiding the dreaded productivity and
   happiness suck that happens when working on systems with too much legacy
   code. Interested in engineering at Betterment? Betterment is an
   engineering-driven company that has developed the most-trusted online
   financial advisor based on the principles of optimization and efficiency.
   Learn more about engineering jobs and our culture. Determination of most
   trusted online financial advisor reflects Betterment LLC's distinction of
   having the most customers in the industry, made in reliance on customer
   counts, self-reported pursuant to SEC rules, across all online-only
   registered investment advisors.
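   As mentioned above, here is a minimal characterization-test sketch, written
   in Python for brevity. It is illustrative only: the allocation functions and
   fixture values are hypothetical stand-ins rather than Betterment’s actual
   trading code; the point is just the shape of the test, which replays verified
   inputs and asserts that the new code matches the recorded legacy outputs.

   def legacy_allocate(deposit_cents, stock_pct):
       # Stand-in for the old, untested routine being replaced.
       stock = deposit_cents * stock_pct // 100
       return {"stock": stock, "bond": deposit_cents - stock}

   # Verified inputs/outputs captured from the legacy code. In practice these
   # would be recorded from production and saved to a fixture file rather than
   # generated in place like this.
   FIXTURES = [
       {"input": (100_00, 90), "expected": legacy_allocate(100_00, 90)},
       {"input": (250_37, 70), "expected": legacy_allocate(250_37, 70)},
   ]

   def new_allocate(deposit_cents, stock_pct):
       # The refactored implementation must reproduce the recorded behavior.
       stock = deposit_cents * stock_pct // 100
       return {"stock": stock, "bond": deposit_cents - stock}

   def test_new_code_matches_legacy_behavior():
       for case in FIXTURES:
           assert new_allocate(*case["input"]) == case["expected"]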
   7 min read


 * THIS IS HOW YOU BOOTSTRAP A DATA TEAM
   
   This Is How You Bootstrap a Data Team Data alone is not enough—we needed the
   right storytellers. Six months ago, I packed up my travel-sized toothbrush
   kit, my favorite coffee mug now filled with pens and business cards, and a
   duffel bag full of gym socks and free conference tee-shirts.  With my
   start-up survival kit in tow, it was time to move on from my job as a
   back-office engineer. From the left: Avi Lederman, data warehousing engineer;
   Yuriy Goldman, engineering lead; Jon Mauney, data analyst; Nick Petri, data
   analyst; and Andrew Weisgall, marketing analyst. I dragged my chair ten feet
   across the office and began my new life as the engineering lead of
   Betterment’s nascent data team—my new mates included two talented data
   analysts, a data warehousing engineer and a marketing analyst, also the
   product owner. I was thrilled. There was a lot for us to do. In our new
   roles, we are now informing and guiding many of the ongoing product and
   marketing efforts at Betterment. Thinking big, we decided to dub ourselves
   Team Polaris after the North Star. Creating a tighter feedback loop
   Even though our move to create an in-house data team was a natural part of
   our own engineering team evolution here at Betterment, it’s still something
   of a risky unknown for most companies. Business intelligence tooling has
   traditionally been something that comes at a great upfront cost to an
   organization (it can reach into the millions of dollars)—but as a startup, we
   instead looked carefully at how we could leverage our homegrown talent and
   resources to build a team to seamlessly integrate into the existing company
   architecture. Specifically, we wanted  a tight feedback loop between the
   business and technology so that we could experiment and figure out what
   worked before committing real dollars to a solution—aka high-frequency
   hypothesis testing. We needed a team responsible for collecting, curating and
   presenting the data—and our data had to be trustworthy for objective
   metric-level reporting to the organization. Our work consisted of
   collaborating with our marketing, analytics, and product teams to establish
   systems and practices that: Measure progress towards high level goals
   Optimize growth and conversion Support product and project strategy Improve
   customer outcome A guide to tactical decisions With these requirements in
   mind, here are some of the tactical decisions we made from the start to get
   our new data team off the ground. In the future, expect to read more from our
   team about how we use our data insights to drive product and growth
   development at Betterment. 1. Define our process For us the obvious first
   order of business was to deliver continuous, incremental value and gradual
   transition from legacy systems to new ones. Our initial task was to interview
   internal stakeholders to get at their data-related pain points.  We sent out
   questionnaires in advance but collected answers through face-to-face
   dialogue. A couple of hours of focused conversation defined a six-month
   tactical focus for the team. Then, with our meticulous notes compiled, it
   became clear to us that our major challenges lay with the accessibility to
   and reliability of key performance metrics. With the interviews in hand, the
   team sat down to pen a manifesto and define pillars by which we would measure
   our progress. We came up with ACES: Automated, Consistent, Efficient, and
   Self-serviced as the motifs by which we could create a measurable feedback
   loop. 2. Inform the roadmap Within three weeks of operations, it became clear
   that we could use turn-around time metrics from ad-hoc or advisory requests
   to inform us where we needed to invest in project cycles and technology. Yet,
   busy with data projects, we were feeling the pain ourselves.  We needed more
   easily accessible business measures with sufficient context by which we and
   our colleagues could roll up or slice and dice our data.  We knew that a star
   schema approach would help us clarify a data narrative and give all of us a
   consistent view of truth.  But there was no way for us to do it all at once.
   3. Limit disruption while we build To limit disruption to our colleagues
   while delivering incremental improvements, we implemented a clever and
   completely practical transition plan within MySQL’s native feature set.
    Specifically, we set up a new database server dedicated to reporting and
   ad-hoc workloads.  This dedicated MySQL instance consisted of three database
   schemas we now refer to as our Triumvirate Data Warehouse. The first member
   of this triad is betterment_live. This database is a complete, real-time,
   read-only replica of our production database.  It’s just native MySQL
   master-slave replication; easy to set up and maintain on dedicated hardware
   or in the cloud. The second member is client_analytics.  It is a read-write
   schema to which our colleagues have full privileges.  The usage pattern is
   for folks to connect to client_analytics and from there to: cross-query
   against the betterment_live schema, import/export and manipulate custom
   datasets with Python or R, perform regression and analysis, etc. (see the
   sketch at the end of this post).  Everybody
   wins.  Our data workers retain their ability to run existing processes until
   we can transition them to a “better” way while the engineering team has
   successfully expelled business users out of an already busy production
   environment. Last but certainly not least is our new baby, the data
   warehouse.  It is a read-only, star-schema representation of fact and
   dimensional tables for growth subject areas.  We’ve pushed the aforementioned
   nuisance and complexity into our data pipeline (ETL) process and are able to
   synthesize atomic and summary metrics in a format that is more intuitive for
   our business users. Legacy workloads that are complex and underperforming can
   now be transitioned over to the data warehouse schema incrementally.
    Further, because all three schemas live in the same MySQL server,
   client_analytics becomes a central hub from which our colleagues can join
   tables that have not yet been modeled in the warehouse with key dimensions
   that have been.  They get the best of both worlds while we look to what comes
   next  Finally, transition is prioritized in-stream with the needs of the
   organization and we never bite off more than we can chew. 4. Standardize and
   educate A major part of our data warehouse build out was in clarifying
   definitions of business terms and key metrics present in our daily parlance.
    Maintaining a Data Dictionary wiki became a part of our Definition of Done.
    Our dashboards, displayed on large screen TVs and visible by all, were the
   first to be relabeled and remodeled.  Reports available to the entire office
   were next.  Cleaning up the most looked at metrics helped the organization
   speak to and understand key data in a consistent manner. 5. Maintain a tight
   feedback loop The team follows an agile process familiar to modern technology
   organizations.  We Scrum, we Git, and we Jenkins.  We stay in regular contact
   with stakeholders throughout a build-out and iterate over MVPs. Now, back to
   the future These are just the first few bootstrapping steps.  In future posts
   I will be tempted to wax technical and provide more color on the choices
   we’ve made and why.  I will also share our vision for an Event Narrative Data
   Warehouse and how we are leveraging start-up friendly partners such as
   MixPanel for real-time event processing, funneling, and segmentation.
    Finally, we will share some tactics for enabling data scientists to be more
   collaborative and presentational with their R or Python visualizations. At
   Betterment, our ultimate goal is to continue developing products that change
   the investing world—and that starts with data. But data alone is not
   enough—we needed the right storytellers. As we see it, the members of Team
   Polaris are the bards of a data narrative that help the organization grow
   while delivering a top-tier product. Interested in engineering at
   Betterment? Betterment is an engineering-driven company that has developed
   the most trusted online financial advisor based on the principles of
   optimization and efficiency. Learn more about engineering jobs and our
   culture. Determination of most trusted online financial advisor reflects
   Betterment LLC's distinction of having the most customers in the industry,
   made in reliance on customer counts, self-reported pursuant to SEC rules,
   across all online-only registered investment advisors.  
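   As referenced in the client_analytics discussion above, here is a minimal
   sketch of the cross-schema usage pattern. It assumes the pymysql driver, and
   the host, credentials, and table/column names are illustrative placeholders,
   not our actual schema.

   import pymysql

   # Connect to the dedicated reporting server's read-write schema.
   conn = pymysql.connect(host="reporting-db.example.internal",
                          user="analyst", password="********",
                          database="client_analytics")
   try:
       with conn.cursor() as cur:
           # Join a scratch table in client_analytics against a table replicated
           # from production by qualifying it with the betterment_live schema.
           cur.execute("""
               SELECT c.cohort, COUNT(*) AS signups
               FROM client_analytics.marketing_cohorts AS c
               JOIN betterment_live.users AS u ON u.id = c.user_id
               GROUP BY c.cohort
           """)
           for cohort, signups in cur.fetchall():
               print(cohort, signups)
   finally:
       conn.close()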
   7 min read


 * ONE MASSIVE MONTE CARLO, ONE VERY EFFICIENT SOLUTION
   
   One Massive Monte Carlo, One Very Efficient Solution We optimized our
   portfolio management algorithms in six hours for less than $500. Here’s how
   we did it. Optimal portfolio management requires managing a portfolio in
   real-time, including taxes, rebalancing, risk, and circumstantial variables
   like cashflows. It’s our job to fine-tune these to help our clients, and it’s
   very important we have these decisions be robust to the widest possible array
   of potential futures they might face. We recently re-optimized our portfolio
   to include more complex asset allocations and risk models (and it will soon
   be available). Next up was optimizing our portfolio management algorithms,
   which manage cashflows, rebalances, and tax exposures. It’s as if we
   optimized the engine for a car, and now we needed to test it on the race
   track with different weather conditions, tires, and drivers. Normally, this
   is a process that can literally take years (and may explain why legacy
   investing services are slow to switch to algorithmic asset allocation and
   advice). But we did things a little differently, which saved us thousands of
   computing hours and hundreds of thousands of dollars. First, the Monte Carlo
   The testing framework we used to assess our algorithmic strategies needed to
   fulfill a number of criteria to ensure we were making robust and informed
   decisions. It needed to: Include many different potential futures Include
   many different cash-flow patterns Respect path dependence (taxes you pay this
   year can’t be invested next year) Accurately test how the algorithm would
   perform if run live. To test our algorithms-as-strategies, we simulated the
   thousands of potential futures they might encounter. Each set of strategies
   was confronted with both bootstrapped historical data and novel simulated
   data. Bootstrapping is a process by which you take random chunks of
   historical data and re-order it. This made our results robust to the risk of
   solely optimizing for the past, a common error in the analysis of strategies.
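   For illustration, a block bootstrap can be sketched in a few lines of Python;
   the block size and the made-up return series below are hypothetical, not the
   parameters we actually used.

   # Illustrative block bootstrap: resample contiguous chunks of a historical
   # return series so that autocorrelation within each chunk is preserved.
   import numpy as np

   def block_bootstrap(returns, n_periods, block_size=20, seed=None):
       rng = np.random.default_rng(seed)
       blocks, total = [], 0
       while total < n_periods:
           start = rng.integers(0, len(returns) - block_size)
           blocks.append(returns[start:start + block_size])
           total += block_size
       return np.concatenate(blocks)[:n_periods]

   # Example: build one simulated 20-year path of daily returns (252 * 20 days)
   # from a made-up 30-year historical series.
   historical = np.random.default_rng(0).normal(0.0003, 0.01, size=252 * 30)
   simulated_path = block_bootstrap(historical, n_periods=252 * 20, seed=1)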
   We used both historic and simulated data because they complement each other
   in making future-looking decisions: The historical data allows us to include
   important aspects of return movements, like auto-correlation, volatility
   clustering, correlation regimes, skew, and fat tails. It is bootstrapped
   (sampled in chunks) to help generate potential futures. The simulated data
   allows us to generate novel potential outcomes, like market crashes bigger
   than previous ones, and generally, futures different than the past. The
   simulations were detailed enough to replicate how they’d run in our live
   systems, and included, for example, annual tax payments due to capital gains
   over losses, cashflows from dividends and the client saving or withdrawing.
   It also showed how an asset allocation would perform over the lifetime of an
   investment. During our testing, we ran over 200,000 simulations of daily-level
   returns for our 12 asset classes, covering 20 years’ worth of returns. We
   included realistic dividends at an asset class level. In short, we tested a
   heckuva lot of data. Normally, running this Monte Carlo would have taken
   nearly a full year to complete on a single computer, but we created a far
   more nimble system by piecing together a number of existing technologies. By
   harnessing the power of Amazon Web Services (specifically EC2 and S3) and a
   cloud-based message queue called IronMQ we reduced that testing time to just
   six hours—and for a total cost of less than $500. How we did it 1. Create an
   input queue: We created a bucket with every simulation—more than 200,000—we
   wanted to run. We used IronMQ to manage the queue, which  allows individual
   worker nodes to pull inputs themselves instead of relying on a system to
   monitor worker nodes and push work to them. This solved the problem found in
   traditional systems where a single node acts as the gatekeeper, which can get
   backed up, either breaking the system or leading to idle testing time. 2.
   Create 1,000 worker instances: With Amazon Web Services (EC2), we signed up to
   access time on 1,000 virtual machines. This increased our computing power a
   thousandfold, and buying time on these machines is cheap. We employed
   m1.small instances, relying on quantity over quality. 3. Each machine pulls
   a simulation: Thanks to the maturation of modern message queues, it is simpler
   and more advantageous to orchestrate jobs in a pull-based fashion than with
   the old push system, as we mentioned above (a minimal sketch of this worker
   loop appears at the end of this post).  In this model there is no single
   controller. Instead, each worker acts independently.  When the worker is idle
   and ready for more work, it takes it upon itself to go out and find it.  When
   there’s no more work to be had, the worker shuts itself down. 4. Store
   results in central location: We used another Amazon Cloud service called S3
   to store the results of each simulation. Each file — with detailed asset
   allocation, tax, trading and returns information — was archived inexpensively
   in the cloud. Each file was also named algorithmically to allow us to refer
   back to it and do granular audits of each run. 5. Download results for local
   analysis: From S3, we could download the summarized results of each of our
   simulations for analysis on a "regular" computer. The resulting analytical
   master file was still large, but small enough to fit on a regular MacBook
   Pro. We ran the Monte Carlo simulations over two weekends. Keeping our
   overhead low, while delivering top-of-the-line portfolio analysis and
   optimization is a key way we keep investment fees as low as possible.  This
   is just one more example of where our quest for efficiency—and your
   happiness—paid off. This post was written with Dan Egan.
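   As referenced in step 3 above, here is a minimal sketch of the pull-based
   worker loop. In production the input queue was IronMQ and results were
   archived to S3; an in-memory queue and local JSON files stand in here so the
   sketch runs as-is, and the message fields are made up.

   import json
   import pathlib
   from collections import deque

   # Stand-in for the IronMQ input queue holding one message per simulation.
   work_queue = deque(json.dumps({"scenario_id": i, "strategy": "rebalance-v2"})
                      for i in range(5))

   RESULTS_DIR = pathlib.Path("simulation_results")
   RESULTS_DIR.mkdir(exist_ok=True)

   def run_simulation(params):
       # Placeholder for the real portfolio simulation.
       return {"scenario_id": params["scenario_id"], "final_value": 1.0}

   def worker_loop():
       while True:
           try:
               # Each worker pulls its own work; no central controller pushes jobs.
               message = work_queue.popleft()
           except IndexError:
               # No more work to be had: the worker shuts itself down.
               break
           params = json.loads(message)
           results = run_simulation(params)
           # Name each result algorithmically so individual runs can be audited
           # later (in production, an S3 key instead of a local filename).
           out = RESULTS_DIR / f"scenario_{params['scenario_id']}.json"
           out.write_text(json.dumps(results))

   worker_loop()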
   5 min read


 * ENGINEERING THE TRADING PLATFORM: INSIDE BETTERMENT’S PORTFOLIO OPTIMIZATION
   
   Engineering the Trading Platform: Inside Betterment’s Portfolio Optimization
   To complete the portfolio optimization, Betterment engineers needed to enhance
   the code in our existing trading platform. Here's how they did it. In just a
   few weeks, Betterment is launching an updated portfolio -- one that has been
   optimized for better expected returns. The optimization will be partly driven
   by a more sophisticated asset allocation algorithm, which will dynamically
   vary individual asset allocations within the stock and bond basket based on a
   goal’s overall allocation. This new flexible set of asset allocations
   significantly affects our current trading processes. Until now, we executed
   transactions based on fixed weights or a precise allocation of assets to
   every level of risk. Now, in our updated portfolio with a more sophisticated
   way to allocate, we are using a matrix to manage asset weights—and that
   requires more complex trading logic. From an engineering perspective, this
   means we needed to enhance the code in our existing trading platform to
   accommodate dynamic asset allocation, with an eye towards future enhancements
   in our pipeline. Here's how we did it. 1. Build a killer testing framework
   When dealing with legacy code, one of our top priorities is to preserve
   existing functionality. Failure to do so could mean anything from creating a
   minor inconvenience to blocking trades from executing. That means the next
   step was to build a killer testing framework. The novelty of our approach was
   to essentially build partial, precise scaffolding around our current
   platform. This kind of scaffolding allowed us to go in and out of the
    current platform to capture and store precise inputs and outputs, while
   isolating them away from any unnecessary stuff that wasn’t relevant to the
   core trading processes. 2. Isolate the right information With this
   abstraction, we were able to isolate the absolute core objects that we need
   to perform trades, and ignore the rest. This did two things: it took testing
   off the developers’ plates early in the process, allowing  them to focus on
   writing production code, and also helped isolate the central objects that
   required most of their attention. The parent object of any activity inside
   the Betterment platform is a “user transaction” — that includes deposits or
   withdrawals to a goal, dividends, allocation changes, and transfers of money
   between goals. These were our inputs. In most cases, a user
   transaction will eventually be the parent of several trade objects. These
   were our outputs.  In our updated portfolio, the number of possible
   transaction types did not change. What did change, however, was how each
   transaction type was translated into trading activity, which is what we
   wanted to test exhaustively. We captured a mass of user transaction objects
   from production for use in testing. However, a user transaction object
   contains a host of data that isn’t relevant to the trades that will
   eventually be created, and is associated with other objects that are also not
   relevant.  So stripping out all non-trading data was the key to focusing on
   the right things to test for this project. 3. Use SQLite database to be
   efficient The best way to store the user transaction objects was to use JSON,
   a human-readable serialization format. To do this, we used GSON,
   which lets you convert Java objects into JSON, and vice versa. We didn’t want
   to store the JSON in a MySQL database, because managing it would be
   unnecessary overhead for this purpose. Instead, we stored them in a flat
   SQLite database. On the way into SQLite, GSON allowed us to “flatten” the
   objects, leaving only the bits that pertained to trading and discarding the
   rest. Then, we could rearrange these chunks to replicate all sorts of trading
   activity patterns. On the way out, GSON would re-inflate the JSON back into
   Java objects, using dummy values for the irrelevant fields, providing us with
   test inputs ready to be pushed through our system. We did the same for
   outputs, which were also full of “noise” for our purposes. We’d shrink the
   expected results we got from production, then re-inflate and compare them to
   what our tests produced (a rough sketch of this flow appears at the end of
   this post). 4. Do no harm to others' work At Betterment, we are
   constantly pushing through new features and enhancements, some visible to
   customers, but many not. Development on these is concurrent,  sometimes
   impacting global objects and schemas, and it was essential to insulate the
   team working on core trading functionality from all other development being
   done at the company. Just the portfolio transition work alone includes
   significant new code for front-end enhancements which have nothing to do with
   trading. The GSON/JSON/SQLite testing framework helped the trading team
   maintain laser focus on their task, as they worked under the hood. Otherwise,
   we’d be putting a sweet new set of tires on a car that won’t start!
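   As referenced above, here is a rough sketch of the capture/flatten/re-inflate
   flow. The real implementation used GSON and Java objects; this Python version
   with sqlite3 and json only illustrates the shape of the approach, and the
   field names are made up.

   import json
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE user_transactions (id INTEGER PRIMARY KEY, payload TEXT)")

   def flatten(user_transaction):
       # Keep only the fields that matter for trading; discard the rest.
       keep = ("transaction_type", "goal_id", "amount_cents", "allocation")
       return {k: user_transaction[k] for k in keep}

   def capture(user_transaction):
       conn.execute("INSERT INTO user_transactions (payload) VALUES (?)",
                    (json.dumps(flatten(user_transaction)),))

   def load_test_inputs():
       # Re-inflate stored JSON into objects, supplying dummy values for the
       # fields that were stripped out on the way in.
       for (payload,) in conn.execute("SELECT payload FROM user_transactions"):
           yield {"created_by": "test-dummy", **json.loads(payload)}

   # Example: capture a production-like transaction, then read it back for a test.
   capture({"transaction_type": "deposit", "goal_id": 42, "amount_cents": 10_000,
            "allocation": 0.9, "created_by": "prod-user", "audit_trail": ["..."]})
   print(list(load_test_inputs()))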
   5 min read


 * THREE THINGS I LEARNED IN MY ENGINEERING INTERNSHIP
   
   Three Things I Learned In My Engineering Internship I knew I had a lot to
   learn about how a Web app works, but I never imagined that it involved as
   much as it does. This post is part of series of articles written by
   Betterment’s 2013 summer interns. This summer, I had the privilege of
   participating in a software engineering internship with Betterment. My
   assignment was to give everyone in the office a visual snapshot of how the
   company is doing. This would be accomplished through the use of dashboards
   displayed on TV screens inside the office. We wanted to highlight metrics
   such as net deposits, assets under management, and conversions from visitors
   to the site into Betterment customers. Coming in with experience in only
   Java, this was definitely a challenging project to tackle. Now that the
   summer has ended, I have accomplished my goal — I created five dashboards
   displaying charts, numbers and maps with valuable data that everyone can see.
   From this experience, there are three very important things that I’ve
   learned. 1. School has taught me nothing. Maybe this is a bit of an
   exaggeration. As a computer science major, school has taught me how to code
   in Java, and maybe some of the theoretical stuff that I’ve had drilled into
   my head will come in handy at some point in my life. However, writing
   mathematical proofs and small Java programs that complete standalone tasks seems
   pretty pointless now that I’ve experienced the real world of software
   development. There are so many links in the development chain, and what I
   have learned in school barely covers half of a link. Not to mention almost
   everything else I needed I was able to learn through Google, which makes me
   wonder whether I could have learned Java through the Internet in a few weeks
   rather than spending the past two years in school. Needless to say, I
   definitely wish I could stay and work with Betterment rather than going back
   to school next week, but today’s society is under the strange impression that
   a college degree is important, so I guess I’ll finish it out. 2. The
   structure of a Web app is a lot more complex than what the user sees on the
   page. Before I began my internship, I had never worked on a Web app before. I
   knew I had a lot to learn about how it all works, but I never imagined that
   it involved as much as it does. There’s a database on the bottom, then the
   backend code is layered on top of that — and then that is broken up into
   multiple levels in order to keep different kinds of logic separate. And on
   top of all that, is the front end code. All of it is kept together with
   frameworks that allow the different pieces to communicate with each other,
   and there are servers that the app needs to run on. This was extremely
   eye-opening for me, and I’m so glad that the engineers at Betterment spent
   time during my first week getting me up to speed on all of it. I was able to
   build my dashboards as a Web app, so I not only needed to understand this
   structure, but I needed to implement it as well. 3. A software engineer needs
   to be multilingual. I’m not talking about spoken languages. The different
   pieces in the structure of a web app are usually written in different
   computer languages. Being that Java only covered a small piece of this
   structure, I had a lot of languages to learn. Accessing the database requires
   knowledge of SQL, a lot of scripts are written in Python, front end structure
   and design is written in HTML and CSS, and front end animation is written in
   JavaScript. In order to effectively work on multiple pieces of an app, an
   engineer needs to be fluent in multiple different languages. Thankfully, the
   Internet makes learning languages quick and easy, and I was able to pick up
   on so many new languages throughout the summer. My experience this summer has
   been invaluable, and I will be returning to school with a brand new view on
   software development and what a career in this awesome field will be like.
   4 min read


 * KEEPING OUR CODE BASE SIMPLE, OPTIMALLY
   
   Keeping Our Code Base Simple, Optimally Betterment engineers turned
   regulatory compliance rules into an optimization problem to keep the code
   base simple. Here's how they did it. At Betterment, staying compliant with
   regulators, such as the Securities and Exchange Commission, is a part of
   everyday life.  We’ve talked before about how making sure everything is
   running perfectly -- especially given all the cases we need to handle --
   makes us cringe at the cyclomatic complexity of some of our methods. It’s a
   constant battle to keep things maintainable, readable, testable, and
   efficient. We recently put some code into production that uses an optimizer
   to cut down on the  amount of code we’re maintaining ourselves, and it turned
   out to be pretty darn cool. It makes communicating with our regulators
   easier, and is doing so in a pretty impressive fashion. We were tasked with
   coming up with an algorithm that, at first pass, made me nervous about all
   the different cases it would need to handle in order to do things
   intelligently. Late one night, we started bouncing ideas off each other on
   how to pull it off. We needed to make decisions at a granular level, test how
   they affected the big picture, and then adjust accordingly. To use a
   Seinfeld analogy, the decisions we would make for Jerry had an effect on
   what the best decisions were for Elaine. But, if Elaine was set up a certain
   way, we wanted to go back to Jerry and adjust the decisions we made for him.
    Then George. Then Newman. Then Kramer. Soon we had thought about so many
   if-statements that they no longer seemed like if-statements, and all the
   abstractions I was formulating were already leaking. Then a light came on. We
   could not only make good decisions for Elaine, Jerry, and Newman, we could
   make those decisions optimally. A little bit of disclaimer here before we
   start digging in a little more: I can barely scratch the surface of how
   solvers work. I just happen to know that it was a tool available to us, and
   it happened to model the problem we needed to solve very well. This is meant
   as an introduction to using one specific solver as a way to model and solve a
   problem. An example Let’s say at the last minute, the Soup Nazi is out to
   make the biggest batch of soup he possibly can. For his recipe he needs a
   ratio of: 40% chicken 12% carrots 8% thyme 15% onions 15% noodles 5% garlic
   5% parsley All of the stores around him only keep limited amounts in stock.
   He calls around to all the stores just to see what they have in stock and puts
   together each store’s inventory:

   Ingredients in stock (lbs)    Elaine’s   George’s   Jerry’s   Newman’s
   Chicken                       5          6          2         3
   Carrots                       1          8          5         2
   Thyme                         3          19         16        6
   Onions                        6          12         10        4
   Noodles                       5          0          3         9
   Garlic                        2          1          1         0
   Parsley                       3          6          2         1

   Also, the quality of the
   bags at all of the stores vary, limiting the total number of pounds of food
   the Soup Nazi can carry back. (We’re also assuming he only wants to make at
   most one visit to each store.)

   Pounds of food he can carry back
   Elaine’s    12
   George’s    8
   Jerry’s     15
   Newman’s    17

   With the optimizer, the function that we are trying to
   minimize or maximize is called the objective function. In this example, we
   are trying to maximize the number of pounds of ingredients he can buy because
   that will result in the most soup. If we say that,
   a1 = pounds of chicken purchased from Elaine’s
   a2 = pounds of carrots purchased from Elaine’s
   a3 = pounds of thyme purchased from Elaine’s …
   a7 = pounds of parsley purchased from Elaine’s
   b1 = pounds of chicken purchased from George’s …
   c1 = pounds of chicken purchased from Jerry’s …
   d1 = pounds of chicken purchased from Newman’s … We’re looking to maximize,
   a1 + a2 + a3 … + b1 + … + d7 = total pounds We then have to throw in all of
   the constraints to our problem. First to make sure the Soup Nazi gets the
   ratio of ingredients he needs: .40 * total pounds = a1 + b1 + c1 + d1
   .12 * total pounds = a2 + b2 + c2 + d2 .08 * total pounds = a3 + b3 + c3 + d3
   .15 * total pounds = a4 + b4 + c4 + d4 .15 * total pounds = a5 + b5 + c5 + d5
   .05 * total pounds = a6 + b6 + c6 + d6 .05 * total pounds = a7 + b7 + c7 + d7
   Then to make sure that the Soup Nazi doesn’t buy more pounds of food from one
   store than he can carry back: a1 + a2 + … + a7 <= 12 b1 + b2 + … + b7 <= 8
   c1 + c2 + … + c7 <= 15 d1 + d2 + … + d7 <= 17 We then have to put bounds on
   all of our variables to say that we can’t take more pounds of any ingredient
   than any store has in stock. 0 <= a1 <= 5 0 <= a2 <= 1 0 <= a3 <= 3
   0 <= a4 <= 6 … 0 <= d7 <= 1 That expresses all of the constraints and bounds
   to our problem and the optimizer works to maximize or minimize the objective
   function subject to those bounds and constraints. The optimization package
   we’re using in this example, Python’s scipy.optimize, provides a very
   expressive interface for specifying all of those bounds and constraints.
   Translating the problem into code If you want to jump right in, check out the
   full sample code. However, there are still a few more things to note: Get
   numpy and scipy installed. The variables we’re solving for are put into a
   single list. That means, x = [a1, a2, … , a7, b1, b2 … d7]. With python, it’s
   helpful to know that we can pull the pounds of food for a particular
   ingredient out of x,  i.e, [a1, b1, c1, d1] with
   x[ingredient_index :: num_of_ingredients] Likewise, we can pull out the
   ingredients for a given store with
   x[store_index * num_of_ingredients : store_index * num_of_ingredients + num_of_ingredients]
   e.g., [b1, b2, b3, b4, b5, b6, b7] For this example, we’re using the
   scipy.optimize.minimize function with the ‘SLSQP’ method. Arguments provided
   to the minimize function Objective function With the package we’re using,
   there is no option to maximize. This might seem like a show stopper, but we
   get around it by negating our objective function, minimizing, and then
   negating the results. Therefore our objective function becomes,  
   −a1 − a2 − a3 − a4 − … − d6 − d7 And expressing that with numpy is pretty
   painless: numpy.sum(x) * -1.0 Bounds Bounds make sure that we don’t take more
   of any one ingredient than the store has in stock. The minimize function
   takes this in as a list of tuples where the indices line up with x. We can’t
   take negative ingredients from the store, so the lower bound is always 0.
   Therefore, [(0, 5), (0, 1) … (0, 1)] In the code example, for readability, I
   threw all of the inputs to the program into some global dictionaries.
   Therefore, we can calculate our bounds with:

   def calc_bounds():
       bounds = []
       for s in stores:
           for i in ingredients:
               bounds.append((0, store_inventory[s][i]))
       return bounds

   Guess Providing a
   good initial guess can go a long way in getting you to a desirable solution.
   It can also dramatically reduce the amount of time it takes to solve a
   problem. If you’re not seeing numbers you expect, or it is taking a long time
   to come up with a solution, the initial guess is often the first place to
   start. For this problem, we made our initial guess to be  what each store had
   in stock, and we supplied it to the minimize method as a list. Constraints
   One thing to note is that for the package we’re using, constraints only deal
   with ‘ineq’ and ‘eq’, where ‘ineq’ means greater than or equal to. The right
   hand side of the equation is assumed to be zero. Also, we are providing the
   constraints as a tuple of dictionaries.
   (a1 + b1 + c1 + d1) − (.40 * total pounds) >= 0 ...
   (a7 + b7 + c7 + d7) − (.05 * total pounds) >= 0 Note here that I changed the
   constraints from equal-to to greater-than because comparing floats to be
   exactly equal is a hard problem when you’re multiplying and adding numbers.
   Therefore, to make sure we limit chicken to 40% of the overall ingredients,
   one element of the constraints tuple will be,
   {'type': 'ineq',
    'fun': lambda x: sum(extract_ingredient_specific_pounds(x, chicken)) - (calc_total_pounds_of_food(x) * .4)}
   Making sure the Soup Nazi is able to carry everything back from the store:
   12 − a1 − a2 − … − a7 >= 0 … 17 − d1 − d2 − … − d7 >= 0 Leads to,
   {'type': 'ineq',
    'fun': lambda x: max_per_store[store] - np.sum(extract_store_specific_pounds(x, store))}
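   Putting the pieces together, here is a condensed, self-contained version of
   the example. The stock and carry-limit numbers come from the tables above;
   the slicing expressions stand in for the helper functions
   (extract_ingredient_specific_pounds, etc.) referenced in the post, and it
   should recover roughly the 40 lbs total shown in the results below.

   import numpy as np
   from scipy.optimize import minimize

   ratios = [0.40, 0.12, 0.08, 0.15, 0.15, 0.05, 0.05]   # chicken .. parsley
   stock = np.array([                    # rows: Elaine, George, Jerry, Newman
       [5, 1, 3, 6, 5, 2, 3],
       [6, 8, 19, 12, 0, 1, 6],
       [2, 5, 16, 10, 3, 1, 2],
       [3, 2, 6, 4, 9, 0, 1],
   ], dtype=float)
   carry_limit = np.array([12, 8, 15, 17], dtype=float)
   n_stores, n_ingredients = stock.shape

   def objective(x):
       # Negate so that minimizing maximizes total pounds purchased.
       return -np.sum(x)

   constraints = []
   for i, r in enumerate(ratios):
       # Each ingredient must be at least its target share of the total pounds.
       constraints.append({"type": "ineq",
                           "fun": lambda x, i=i, r=r: np.sum(x[i::n_ingredients]) - r * np.sum(x)})
   for s in range(n_stores):
       # Can't carry more out of a store than the bags will hold.
       constraints.append({"type": "ineq",
                           "fun": lambda x, s=s: carry_limit[s] - np.sum(x[s * n_ingredients:(s + 1) * n_ingredients])})

   bounds = [(0, stock[s, i]) for s in range(n_stores) for i in range(n_ingredients)]
   x0 = stock.flatten()   # initial guess: buy everything each store has

   result = minimize(objective, x0, method="SLSQP", bounds=bounds, constraints=constraints)
   print(f"Total pounds purchased: {-result.fun:.2f}")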
   Hopefully this gives you enough information to make sense of the code
   example. The Results? Pretty awesome. The Soup Nazi should only buy a total
   of 40 lbs worth of ingredients because Elaine, George, Jerry, and Newman just
   don’t have enough chicken.
   9.830 lbs of food from Elaine's. Able to carry 12.0 pounds.
   chicken: 5.000 lbs (5.0 in stock) carrots: 0.000 lbs (1.0 in stock)
   thyme: 0.000 lbs (3.0 in stock) onions: 0.699 lbs (6.0 in stock)
   noodles: 1.000 lbs (5.0 in stock) garlic: 1.565 lbs (2.0 in stock)
   parsley: 1.565 lbs (3.0 in stock)
   7.582 lbs of food from George's. Able to carry 8.0 pounds.
   chicken: 6.000 lbs (6.0 in stock) carrots: 0.667 lbs (8.0 in stock)
   thyme: 0.183 lbs (19.0 in stock) onions: 0.733 lbs (12.0 in stock)
   noodles: 0.000 lbs (0.0 in stock) garlic: 0.000 lbs (1.0 in stock)
   parsley: 0.000 lbs (6.0 in stock)
   13.956 lbs of food from Jerry's. Able to carry 15.0 pounds.
   chicken: 2.000 lbs (2.0 in stock) carrots: 3.501 lbs (5.0 in stock)
   thyme: 3.017 lbs (16.0 in stock) onions: 4.568 lbs (10.0 in stock)
   noodles: 0.000 lbs (3.0 in stock) garlic: 0.435 lbs (1.0 in stock)
   parsley: 0.435 lbs (2.0 in stock)
   8.632 lbs of food from Newman's. Able to carry 17.0 pounds.
   chicken: 3.000 lbs (3.0 in stock) carrots: 0.632 lbs (2.0 in stock)
   thyme: 0.000 lbs (6.0 in stock) onions: 0.000 lbs (4.0 in stock)
   noodles: 5.000 lbs (9.0 in stock) garlic: 0.000 lbs (0.0 in stock)
   parsley: 0.000 lbs (1.0 in stock)
   16.000 lbs of chicken. 16.0 available across all stores. 40.00%
   4.800 lbs of carrots. 16.0 available across all stores. 12.00%
   3.200 lbs of thyme. 44.0 available across all stores. 8.00%
   6.000 lbs of onions. 32.0 available across all stores. 15.00%
   6.000 lbs of noodles. 17.0 available across all stores. 15.00%
   2.000 lbs of garlic. 4.0 available across all stores. 5.00%
   2.000 lbs of parsley. 12.0 available across all stores. 5.00% Bringing it all
   together Hopefully this gives you a taste of the types of problems optimizers
   can be used for. At Betterment, instead of picking pounds of ingredients from
   a given store, we are using it to piece together a mix of securities, in
   order to keep us compliant with certain regulatory specifications. While
   there was a lot of work involved in making our actual implementation
   production-ready (and a lot more work can be done to improve it), being able
   to express rules coming out of a regulatory document as a series of bounds
   and constraints via anonymous functions was a win for the readability of our
   code base. I’m also hoping that it will make tacking on additional rules
   painless in comparison to weaving them into a one-off algorithm.
   8 min read





JOIN OUR OPEN SOURCE PROJECTS


 * TEST_TRACK
   
   Server app for the TestTrack multi-platform split-testing and feature-gating
   system.
   
   See it on GitHub


 * WEBVALVE
   
   Betterment’s framework for locally developing and testing service-oriented
   apps in isolation with WebMock and Sinatra-based fakes.
   
   See it on GitHub


 * BETTER_TEST_REPORTER
   
   Tooling and libraries for processing dart test output into dev-friendly
   formats.
   
   See it on GitHub


 * DELAYED
   
   A multi-threaded, SQL-driven ActiveJob backend used at Betterment to process
   millions of background jobs per day.
   
   See it on GitHub



