ENGINEERING AT BETTERMENT

High quality code. Beautiful, practical design. Innovative problem solving. Explore our engineering community and nerd out with us on all things tech.

RECENT ARTICLES

Finding a Middle Ground Between Screen and UI Testing in Flutter

We outline the struggles we had testing our Flutter app, our approaches to those challenges, and the solutions we arrived at to solve those problems.

Flutter provides good solutions for both screen testing and UI testing, but what about the middle ground? With integration testing being a key level of the testing pyramid, we needed to find a way to test how features in our app interacted without the overhead involved with setting up UI tests. I'm going to take you through our testing journey from a limited native automated testing suite and heavy dependence on manual testing, to trying Flutter's integration testing solutions, to ultimately deciding to build out our own framework to increase confidence in the integration of our components.

The beginning of our Flutter testing journey

Up until early 2020, our mobile app was entirely native with separate Android and iOS codebases. At the onset of our migration to Flutter, the major testing pain point was that a large amount of manual regression testing was required in order to approve each release. This manual testing was tedious and time consuming for engineers, whose time is expensive. Alongside this manual testing pain, the automated testing in the existing iOS and Android codebases was inconsistent. iOS had a larger unit testing suite than Android did, but neither had integration tests. iOS also had some tests that were flaky, causing CI builds to fail unexpectedly. As we transitioned to Flutter, we made unit/screen testing and code testability a high priority, pushing for thorough coverage. That said, we still relied heavily on the manual testing checklist to ensure the user experience was as expected. This led us to pursue an integration testing solution for Flutter.

In planning out integration testing, we had a few key requirements for our integration testing suite:

- Easily runnable in CI upon each commit
- An API that would be familiar to developers who are used to writing Flutter screen tests
- The ability to test the integration between features within the system without needing to set up the entire app.

The Flutter integration testing landscape

At the very beginning of our transition to Flutter, we started trying to write integration tests for our features using Flutter's solution at the time: flutter_driver. The benefit we found in flutter_driver was that we could run it in our production-like environment against preset test users. This meant there was minimal test environment setup. We ran into quite a few issues with flutter_driver though. Firstly, there wasn't a true entry point we could launch the app into, because our app is add-to-app, meaning that the Flutter code is embedded into our iOS and Android native applications rather than being a pure Flutter app runnable from a main.dart entry point.
Second, flutter_driver is more about UI/E2E testing than integration testing, meaning we'd need to run an instance of the app on a device, navigate to a flow we wanted to test, and then test the flow. Also, the flutter_driver API worked differently than the screen testing API and was generally more difficult to use. Finally, flutter_driver is not built to run a suite of tests or to run easily in CI. While possible to run in CI, it would be incredibly costly to run on each commit since the tests need to run on actual devices. These barriers led us to not pursue flutter_driver tests as our solution.

We then pivoted to investigating Flutter's newer replacement for flutter_driver: integration_test. Unfortunately, integration_test was very similar to flutter_driver, in that it took the same UI/E2E approach, which meant that it had the same benefits and drawbacks that flutter_driver had. The one additional advantage of integration_test is that it uses the same API as screen tests do, so writing tests with it feels more familiar for developers experienced with writing screen tests. Regardless, given that it has the same problems that flutter_driver does, we decided not to pursue integration_test as our framework.

Our custom solution to integration testing

After trying Flutter's solutions fruitlessly, we decided to build out a solution of our own. Before we dive into how we built it, let's revisit our requirements from above:

- Easily runnable in CI upon each commit
- An API that would be familiar to developers who are used to writing Flutter screen tests
- The ability to test the integration between features within the system without needing to set up the entire app.

Given those requirements, we took a step back to make a few overarching design decisions. First, we needed to decide what pieces of code we were interested in testing and which parts we were fine with stubbing. Because we didn't want to run the whole app with these tests, in order to keep the tests lightweight enough to run on each commit, we decided to stub out a few problem areas. The first was our Flutter/native boundary. With our app being add-to-app and utilizing plugins, we didn't want to have to run anything native in our testing. We stubbed out the plugins by writing lightweight wrappers around them, then providing them to the app at a high level that we could easily override with fakes for the purpose of integration testing. The add-to-app boundary was similar.

The second area we wanted to stub out was the network. In order to do this, we built out a fake http client that allows us to configure network responses for given requests. We chose to fake the http client since it is the very edge of our network layer. Faking it left as much of our code as possible under test.
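As a rough sketch of what such a fake might look like (the wrapper interface, class names, and stub format here are hypothetical, not Betterment's actual code):

    // A thin app-owned HTTP wrapper plus a configurable fake for integration tests.
    abstract class AppHttpClient {
      Future<JsonResponse> get(String path);
      Future<JsonResponse> post(String path, {Map<String, dynamic>? body});
    }

    class JsonResponse {
      JsonResponse(this.statusCode, this.json);
      final int statusCode;
      final Map<String, dynamic> json;
    }

    class FakeHttpClient implements AppHttpClient {
      final _stubs = <String, JsonResponse>{};

      // Configure a canned response for "GET /profile", "POST /profile", etc.
      void stub(String methodAndPath, JsonResponse response) {
        _stubs[methodAndPath] = response;
      }

      @override
      Future<JsonResponse> get(String path) async =>
          _stubs['GET $path'] ?? JsonResponse(404, {});

      @override
      Future<JsonResponse> post(String path, {Map<String, dynamic>? body}) async =>
          _stubs['POST $path'] ?? JsonResponse(404, {});
    }

Because the fake sits at the very edge of the network layer, everything above it (parsing, repositories, state management, widgets) still runs exactly as it would in production.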
The next thing we needed to decide was what user experiences we actually wanted to test with our integration tests. Because integration tests are more expensive to write and maintain than screen tests, we wanted to make sure the flows we were testing were the most impactful. Knowing this, we decided to focus on "happy paths" of flows. Happy paths are non-exceptional flows (flows not based on bad user state or input). On top of being less impactful, these sad paths usually give feedback on the same screen as the input, meaning those sad path cases are usually better tested at the screen test level anyway. From here, we set out to break down responsibilities of the components of our integration tests.

We wanted to have a test harness that we could use to set up the app under test and the world that the app would run in; however, we knew this configuration code would be mildly complicated and something that would be in flux. We also wanted a consistent framework by which we could write these tests. In order to ensure changes to our test harness didn't have far-reaching effects on the underlying framework, we decided to split out the testing framework into an independent package that is completely agnostic to how our app operates. This keeps the tests feeling familiar to normal screen tests since the exposed interface is very similar to how widget tests are written. The remaining test harness code was put in our normal codebase where it can be iterated on freely.

The other separation we wanted to make was between the screen interactions and the tests themselves. For this we used a modified version of Very Good Ventures' robot testing pattern that would allow us to reuse screen interactions across multiple tests while also making our tests very readable from even a non-engineering perspective.

In order to fulfill two of our main requirements, being able to run as part of our normal test suite in CI and having a familiar API, we knew we'd need to build our framework on top of Flutter's existing screen test framework. Being able to integrate (ba dum tss) these new tests into our existing test suite is excellent because it meant that we would get quick feedback when code breaks while developing. The last of our requirements was to be able to launch into a specific feature rather than having to navigate through the whole app. We were able to do this by having our app widget that handles dependency setup take a child, then pumping the app widget wrapped around whatever feature widget we wanted to test. With all these decisions made, we arrived at a well-defined integration testing framework that isolated our concerns and fulfilled our testing requirements.

The Nitty Gritty Details

In order to describe how our integration tests work, let's start by describing an example app that we may want to test. Let's imagine a simple social network app, igrastam, that has an activity feed screen, a profile screen, a flow for updating your profile information, and a flow for posting images. For this example, we'll say we're most interested in testing the profile information edit flows to start.

First, how would we want to make a test harness for this app? We know it has some sort of network interactions for fetching profile info and posts as well as for posting images and editing a profile. For that, our app has a thin wrapper around the http package called HttpClient. We may also have some interactions with native code through a plugin such as image_cropper. In order to have control over that plugin, this app has also made a thin wrapper service for that. This leaves our app looking something like this: Given that this is approximately what the app looks like, the test harness needs to grant control of the HttpClient and the ImageCropperService. We can do that by just passing our own fake versions into the app.

Awesome, now that we have an app and a harness we can use to test it, how are the tests actually written? Let's start out by exploring that robot testing technique I mentioned earlier. Say that we want to start by testing the profile edit flow. One path through this flow contains a screen for changing your name and byline, then it bounces out to picking and cropping a profile image, then allows you to choose a preset border to put on your profile picture. For the screen for changing your name and byline, we can build a robot to interact with the screen that looks something like this:
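(The following is an illustrative sketch; the robot name and widget Keys are hypothetical, not the actual Betterment code.)

    import 'package:flutter/material.dart';
    import 'package:flutter_test/flutter_test.dart';

    // A "robot" wraps all WidgetTester interaction for one screen so tests
    // read as a series of human actions.
    class EditNameAndBylineRobot {
      EditNameAndBylineRobot(this.tester);

      final WidgetTester tester;

      Future<void> enterName(String name) async {
        await tester.enterText(find.byKey(const Key('profile-name-field')), name);
      }

      Future<void> enterByline(String byline) async {
        await tester.enterText(find.byKey(const Key('profile-byline-field')), byline);
      }

      Future<void> tapNext() async {
        await tester.tap(find.byKey(const Key('profile-next-button')));
        await tester.pumpAndSettle();
      }

      void seesByline(String byline) {
        expect(find.text(byline), findsOneWidget);
      }
    }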
By using this pattern, we are able to reuse test code pertaining to this screen across many tests. It also keeps the test file clean of WidgetTester interaction, making the tests read more like a series of human actions rather than a series of code instructions.

Okay, we've got an app, a test harness, and robots to interact with the screens. Let's put it all together now into an actual test. The tests end up looking incredibly simple once all of these things are in place (which was the goal!). This test would go on to have a few more steps detailing the interactions on the subsequent screens. With that, we've been able to test the integration of all the components for a given flow, all written in widget-test-like style without needing to build out the entire app. This test could be added into our suite of other tests and run with each commit.

Back to the bigger picture

Integration testing in Flutter can be daunting due to how heavy the flutter_driver/integration_test solutions are with their UI testing strategies. We were able to overcome this and begin filling out the middle level of our testing pyramid by adding structure on top of the widget testing API that allows us to test full flows from start to finish. When pursuing this ourselves, we found it valuable to evaluate our testing strategy deficits, identify clear-cut boundaries around what code we wanted to test, and establish standards around what flows through the app should be tested. By going down the path of integration testing, we've been able to increase confidence in everyday changes as well as map out a plan for eliminating our manual test cases.

Why (And How) Betterment Is Using Julia

Betterment is using Julia to solve our own version of the "two-language problem."

At Betterment, we're using Julia to power the projections and recommendations we provide to help our customers achieve their financial goals. We've found it to be a great solution to our own version of the "two-language problem"–the idea that the language in which it is most convenient to write a program is not necessarily the language in which it makes the most sense to run that program. We're excited to share the approach we took to incorporating it into our stack and the challenges we encountered along the way.

Working behind the scenes, the members of our Quantitative Investing team bring our customers the projections and recommendations they rely on for keeping their goals on track. These hard-working and talented individuals spend a large portion of their time developing models, researching new investment ideas and maintaining our research libraries. While they're not engineers, their jobs definitely involve a good amount of coding. Historically, the team has written code mostly in a research environment, implementing proof-of-concept models that are later translated into production code with help from the engineering team.
Recently, however, we've invested significant resources in modernizing this research pipeline by converting our codebase from R to Julia, and we're now able to ship updates to our quantitative models quicker, and with less risk of errors being introduced in translation. Currently, Julia powers all the projections shown inside our app, as well as a lot of the advice we provide to our customers. The Julia library we built for this purpose serves around 18 million requests per day, and very efficiently at that.

Examples of projections and recommendations at Betterment. Does not reflect any actual portfolio and is not a guarantee of performance.

Why Julia?

At QCon London 2019, Steve Klabnik gave a great talk on how the developers of the Rust programming language view tradeoffs in programming language design. The whole talk is worth a watch, but one idea that really resonated with us is that programming language design—and programming language choice—is a reflection of what the end-users of that language value and not a reflection of the objective superiority of one language over another. Julia is a newer language that looked like a perfect fit for the investing team for a number of reasons:

- Speed. If you've heard one thing about Julia, it's probably about its blazingly fast performance. For us, speed is important as we need to be able to provide real-time advice to our customers by incorporating their most up-to-date financial scenario in our projections and recommendations. It is also important in our research code, where the iterative nature of research means we often have to re-run financial simulations or models multiple times with slight tweaks.

- Dynamicism. While speed of execution is important, we also require a dynamic language that allows us to test out new ideas and prototype rapidly. Julia ticks the box for this requirement as well by using a just-in-time compiler that accommodates both interactive and non-interactive workflows well. Julia also has a very rich type system where researchers can build prototypes without type declarations, and then later refactor the code where needed with type declarations for dispatch or clarity. In either case, Julia is usually able to generate performant compiled code that we can run in production.

- Relevant ecosystem. While the nascency of Julia as a language means that the community and ecosystem is much smaller than those of other languages, we found that the code and community oversamples on the type of libraries that we care about. Julia has excellent support for technical computing and mathematical modelling.

Given these reasons, Julia is the perfect language to serve as a solution to the "two-language problem". This concept is oft-quoted in Julian circles and is perfectly exemplified by the previous workflow of our team: Investing Subject Matter Experts (SMEs) write domain-specific code that's solely meant to serve as research code, and that code then has to be translated into some more performant language for use in production. Julia solves this issue by making it very simple to take a piece of research code and refactor it for production use.

Our approach

We decided to build our Julia codebase inside a monorepo, with separate packages for each conceptual project we might work on, such as interest rate models, projections, social security amount calculations and so on. This works well from a development perspective, but we soon faced the question of how best to integrate this code with our production code, which is mostly developed in Ruby.
We identified two viable alternatives:

- Build a thin web service that will accept HTTP requests, call the underlying Julia functions, and then return an HTTP response.
- Compile the Julia code into a shared library, and call it directly from Ruby using FFI.

Option 1 is a very common pattern, and actually quite similar to what had been the status quo at Betterment, as most of the projections and recommendation code existed in a JavaScript service. It may be surprising then to learn that we actually went with Option 2. We were deeply attracted to the idea of being able to fully integration-test our projections and recommendations working within our actual app (i.e. without the complication of a service boundary). Additionally, we wanted an integration that we could spin up quickly and with low ongoing cost; there's some fixed cost to getting an FFI embed working right—but once you do, it's an exceedingly low cost integration to maintain. Fully-fledged services require infrastructure to run and are (ideally) supported by a full team of engineers. That said, we recognize the attractive properties of the more well-trodden Option 1 path and believe it could be the right solution in a lot of scenarios (and may become the right solution for us as our usage of Julia continues to evolve).

Implementation

Given how new Julia is, there was minimal literature on true interoperability with other programming languages (particularly high-level languages–Ruby, Python, etc). But we saw that the right building blocks existed to do what we wanted and proceeded with the confidence that it was theoretically possible. As mentioned earlier, Julia is a just-in-time compiled language, but it's possible to compile Julia code ahead of time using PackageCompiler.jl. We built an additional package into our monorepo whose sole purpose was to expose an API for our Ruby application, as well as compile that exposed code into a C shared library. The code in this package is the glue between our pure Julia functions and the lower-level library interface—it's responsible for defining the functions that will be exported by the shared library and doing any necessary conversions on input/output.

As an example, consider the following simple Julia function which sorts an array of numbers using the insertion sort algorithm: In order to be able to expose this in a shared library, we would wrap it like this: Here we've simplified memory management by requiring the caller to allocate memory for the result, and implemented primitive exception handling (see Challenges & Pitfalls below).

On the Ruby end, we built a gem which wraps our Julia library and attaches to it using Ruby-FFI. The gem includes a tiny Julia project with the API library as its only dependency. Upon gem installation, we fetch the Julia source and compile it as a native extension. Attaching to our example function with Ruby-FFI is straightforward:
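(A minimal sketch using the ffi gem; the library path, the exported symbol sort_array, and its signature are hypothetical stand-ins for the real API.)

    require "ffi"

    module JuliaAPI
      extend FFI::Library

      # See "Challenges & Pitfalls" below: the library must be opened lazily and globally.
      ffi_lib_flags :lazy, :global
      ffi_lib "ext/libjulia_api.so" # wherever the compiled shared library lives

      # Assumed C signature:
      #   bool sort_array(const double* input, int64_t length, double* output)
      # The caller allocates the output buffer, and the boolean reports success/failure.
      attach_function :sort_array, [:pointer, :int64, :pointer], :bool
    end

    # Calling it by hand means juggling pointers:
    input  = FFI::MemoryPointer.new(:double, 3)
    output = FFI::MemoryPointer.new(:double, 3)
    input.write_array_of_double([3.0, 1.0, 2.0])
    JuliaAPI.sort_array(input, 3, output)
    output.read_array_of_double(3) # => [1.0, 2.0, 3.0]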
From here, we could begin using our function, but it wouldn't be entirely pleasant to work with–converting an input array to a pointer and processing the result would require some tedious boilerplate. Luckily, we can use Ruby's powerful metaprogramming abilities to abstract all that away–creating a declarative way to wrap an arbitrary Julia function which results in a familiar and easy-to-use interface for Ruby developers. In practice, that might look something like this: Resulting in a function for which the fact that the underlying implementation is in Julia has been completely abstracted away:

Challenges & Pitfalls

Debugging an FFI integration can be challenging; any misconfiguration is likely to result in the dreaded segmentation fault–the cause of which can be difficult to hunt down. Here are a few notes for practitioners about some nuanced issues we ran into, that will hopefully save you some headaches down the line:

- The Julia runtime has to be initialized before calling the shared library.

- When loading the dynamic library (whether through Ruby-FFI or some other invocation of dlopen), make sure to pass the flags RTLD_LAZY and RTLD_GLOBAL (ffi_lib_flags :lazy, :global in Ruby-FFI).

- If embedding your Julia library into a multi-threaded application, you'll need additional tooling to only initialize and make calls into the Julia library from a single thread, as multiple calls to jl_init will error. We use a multi-threaded web server for our production application, and so when we make a call into the Julia shared library, we push that call onto a queue where it gets picked up and performed by a single executor thread, which then communicates the result back to the calling thread using a promise object.

- Memory management–if you'll be passing anything other than primitive types back from Julia to Ruby (e.g. pointers to more complex objects), you'll need to take care to ensure the memory containing the data you're passing back isn't cleared by the Julia garbage collector prior to being read on the Ruby side. Different approaches are possible. Perhaps the simplest is to have the Ruby side allocate the memory into which the Julia function should write its result (and pass the Julia function a pointer to that memory). Alternatively, if you want to actually pass complex objects out, you'll have to ensure Julia holds a reference to the objects beyond the life of the function, in order to keep them from being garbage collected. And then you'll probably want to expose a way for Ruby to instruct Julia to clean up that reference (i.e. free the memory) when it's done with it (Ruby-FFI has good support for triggering a callback when an object goes out of scope on the Ruby side).

- Exception handling–conveying unhandled exceptions across the FFI boundary is generally not possible. This means any unhandled exception occurring in your Julia code will result in a segmentation fault. To avoid this, you'll probably want to implement catch-all exception handling in your shared library's exposed functions that will catch any exceptions that occur and return some context about the error to the caller (minimally, a boolean indicator of success/failure).

Tooling

To simplify development, we use a lot of tooling and infrastructure developed both in-house and by the Julia community. Since one of the draws of using Julia in the first place is the performance of the code, we make sure to benchmark our code during every pull request for potential performance regressions using the BenchmarkTools.jl package. To facilitate versioning and sharing of our Julia packages internally (e.g. to share a version of the Ruby-API package with the Ruby gem which wraps it), we also maintain a private package registry. The registry is a separate GitHub repository, and we use tooling from the Registrator.jl package to register new versions.
To process registration events, we maintain a registry server on an EC2 instance provisioned through Terraform, so updates to the configuration are as easy as running a single terraform apply command. Once a new registration event is received, the registry server opens a pull request to the Julia registry. There, we have built in automated testing that resolves the version of the package that is being tested, looks up any reverse dependencies of that package, resolves the compatibility bounds of those packages to see if the newly registered version could lead to a breaking change, and if so, runs the full test suites of the reverse dependencies. By doing this, we can ensure that when we release a patch or minor version of one of our packages, it won't break any packages that depend on it at registration time. If it would, the user is instead forced to either fix the changes that lead to a downstream breakage, or to modify the registration to be a major version increase.

Takeaways

Though our venture into the Julia world is still relatively young compared to most of the other code at Betterment, we have found Julia to be a perfect fit in solving our two-language problem within the Investing team. Getting the infrastructure into a production-ready format took a bit of tweaking, but we are now starting to realize a lot of the benefits we hoped for when setting out on this journey, including faster development of production-ready models, and a clear separation of responsibilities between the SMEs on the Investing team, who are best suited for designing and specifying the models, and the engineering team, who have the knowledge on how to scale that code into a production-grade library. The switch to Julia has allowed us not only to optimize and speed up our code by multiple orders of magnitude, but also has given us the environment and ecosystem to explore ideas that would simply not be possible in our previous implementations.

Introducing "Delayed": Resilient Background Jobs on Rails

In the past 24 hours, a Ruby on Rails application at Betterment performed somewhere on the order of 10 million asynchronous tasks. While many of these tasks merely sent a transactional email, or fired off an iOS or Android push notification, plenty involved the actual movement of money—deposits, withdrawals, transfers, rollovers, you name it—while others kept Betterment's information systems up-to-date—syncing customers' linked account information, logging events to downstream data consumers, the list goes on.

What all of these tasks had in common (aside from being, well, really important to our business) is that they were executed via a database-backed job-execution framework called Delayed, a newly-open-sourced library that we're excited to announce… right now, as part of this blog post! And, yes, you heard that right. We run millions of these so-called "background jobs" daily using a SQL-backed queue—not Redis, or RabbitMQ, or Kafka, or, um, you get the point—and we've very intentionally made this choice, for reasons that will soon be explained! But first, let's back up a little and answer a few basic questions.

Why Background Jobs?

In other words, what purpose do these background jobs serve? And how does running millions of them per day help us? Well, when building web applications, we (as web application developers) strive to build pages that respond quickly and reliably to web requests. One might say that this is the primary goal of any webapp—to provide a set of HTTP endpoints that reliably handle all the success and failure cases within a specified amount of time, and that don't topple over under high-traffic conditions. This is made possible, at least in part, by the ability to perform units of work asynchronously. In our case, via background jobs. At Betterment, we rely on said jobs extensively, to limit the amount of work performed during the "critical path" of each web request, and also to perform scheduled tasks at regular intervals.
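As a minimal illustration of that idea (the controller, job, and mailer names here are hypothetical, not from the actual codebase), the slow work moves out of the request and into a job:

    # The request only persists the record and enqueues; the email goes out later.
    class DepositsController < ApplicationController
      def create
        deposit = current_user.deposits.create!(deposit_params)
        DepositConfirmationJob.perform_later(deposit.id)
        render json: deposit, status: :created
      end
    end

    class DepositConfirmationJob < ApplicationJob
      queue_as :default

      def perform(deposit_id)
        deposit = Deposit.find(deposit_id)
        DepositMailer.confirmation(deposit).deliver_now
      end
    end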
Our reliance on background jobs even allows us to guarantee the eventual consistency of our distributed systems, but more on that later. First, let's take a look at the underlying framework we use for enqueuing and executing said jobs.

Frameworks Galore!

And, boy howdy, are there plenty of available frameworks for doing this kind of thing! Ruby on Rails developers have the choice of resque, sidekiq, que, good_job, delayed_job, and now... delayed, Betterment's own flavor of job queue! Thankfully, Rails provides an abstraction layer on top of these, in the form of the Active Job framework. This, in theory, means that all jobs can be written in more or less the same way, regardless of the job-execution backend. Write some jobs, pick a queue backend with a few desirable features (priorities, queues, etc), run some job worker processes, and we're off to the races! Sounds simple enough!

Unfortunately, if it were so simple we wouldn't be here, several paragraphs into a blog post on the topic. In practice, deciding on a job queue is more complicated than that. Quite a bit more complicated, because each backend framework provides its own set of trade-offs and guarantees, many of which will have far-reaching implications in our codebase. So we'll need to consider carefully!

How To Choose A Job Framework

The delayed rubygem is a fork of both delayed_job and delayed_job_active_record, with several targeted changes and additions, including numerous performance & scalability optimizations that we'll cover towards the end of this post. But first, in order to explain how Betterment arrived where we did, we must explain what it is that we need our job queue to be capable of, starting with the jobs themselves.

You see, a background job essentially represents a tiny contract. Each consists of some action being taken for / by / on behalf of / in the interest of one or more of our customers, and that must be completed within an appropriate amount of time. Betterment's engineers decided, therefore, that it was critical to our mission that we be capable of handling each and every contract as reliably as possible. In other words, every job we attempt to enqueue must, eventually, reach some form of resolution. Of course, job "resolution" doesn't necessarily mean success. Plenty of jobs may complete in failure, or simply fail to complete, and may require some form of automated or manual intervention. But the point is that jobs are never simply dropped, or silently deleted, or lost to the cyber-aether, at any point, from the moment we enqueue them to their eventual resolution. This general property—the ability to enqueue jobs safely and ensure their eventual resolution—is the core feature that we have optimized for. Let's call it resilience.

Optimizing For Resilience

Now, you might be thinking, shouldn't all of these ActiveJob backends be, at the very least, safe to use? Isn't "resilience" a basic feature of every backend, except maybe the test/development ones? And, yeah, it's a fair question. As the author of this post, my tactful attempt at an answer is that, well, not all queue backends optimize for the specific kind of end-to-end resilience that we look for. Namely, the guarantee of at-least-once execution.

Granted, having "exactly-once" semantics would be preferable, but if we cannot be sure that our jobs run at least once, then we must ask ourselves: how would we know if something didn't run at all? What kind of monitoring would be necessary to detect such a failure, across all the features of our app, and all the types of jobs it might try to run? These questions open up an entirely different can of worms, one that we would prefer remained firmly sealed. Remember, jobs are contracts. A web request was made, code was executed, and by enqueuing a job, we said we'd eventually do something. Not doing it would be... bad. Not even knowing we didn't do it... very bad. So, at the very least, we need the guarantee of at-least-once execution.

Building on at-least-once guarantees

If we know for sure that we'll fully execute all jobs at least once, then we can write our jobs in such a way that makes the at-least-once approach reliable and resilient to failure. Specifically, we'll want to make our jobs idempotent—basically, safely retryable, or resumable—and that is on us as application developers to ensure on a case-by-case basis.
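For example, one common way to make a job safely retryable is to guard against repeating work that an earlier attempt already finished. The sketch below is hypothetical (the model, column, and method names are invented for illustration):

    # Re-running this job after a partial failure is a no-op for completed batches,
    # so at-least-once execution behaves, in effect, like exactly-once.
    class SyncLinkedAccountJob < ApplicationJob
      def perform(linked_account_id, sync_batch_id)
        account = LinkedAccount.find(linked_account_id)

        # Guard clause: a previous attempt already completed this batch.
        return if account.completed_sync_batch_ids.include?(sync_batch_id)

        account.with_lock do
          account.refresh_balances!
          account.completed_sync_batch_ids += [sync_batch_id]
          account.save!
        end
      end
    end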
Once we solve this very solvable idempotency problem, then we're on track for the same net result as an "exactly-once" approach, even if it takes a couple extra attempts to get there. Furthermore, this combination of at-least-once execution and idempotency can then be used in a distributed systems context, to ensure the eventual consistency of changes across multiple apps and databases. Whenever a change occurs in one system, we can enqueue idempotent jobs notifying the other systems, and retry them until they succeed, or until we are left with stuck jobs that must be addressed operationally. We still concern ourselves with other distributed systems pitfalls like event ordering, but we don't have to worry about messages or events disappearing without a trace due to infrastructure blips.

So, suffice it to say, at-least-once semantics are crucial in more ways than one, and not all ActiveJob backends provide them. Redis-based queues, for example, can only be as durable (the "D" in "ACID") as the underlying datastore, and most Redis deployments intentionally trade off some durability for speed and availability. Plus, even when running in the most durable mode, Redis-based ActiveJob backends tend to dequeue jobs before they are executed, meaning that if a worker process crashes at the wrong moment, or is terminated during a code deployment, the job is lost. These frameworks have recently begun to move away from this LPOP-based approach, in favor of using RPOPLPUSH (to atomically move jobs to a queue that can then be monitored for orphaned jobs), but outside of Sidekiq Pro, this strategy doesn't yet seem to be broadly available.

And these job execution guarantees aren't the only area where a background queue might fail to be resilient. Another big resilience failure happens far earlier, during the enqueue step.

Enqueues and Transactions

See, there's a major "gotcha" that may not be obvious from the list of ActiveJob backends. Specifically, it's that some queues rely on an app's primary database connection—they are "database-backed," against the app's own database—whereas others rely on a separate datastore, like Redis. And therein lies the rub, because whether or not our job queue is colocated with our application data will greatly inform the way that we write any job-adjacent code. More precisely, when we make use of database transactions (which, when we use ActiveRecord, we assuredly do whether we realize it or not), a database-backed queue will ensure that enqueued jobs will either commit or roll back with the rest of our ActiveRecord-based changes. This is extremely convenient, to say the least, since most jobs are enqueued as part of operations that persist other changes to our database, and we can in turn rely on the all-or-nothing nature of transactions to ensure that neither the job nor the data mutation is persisted without the other.
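A sketch of what that looks like in practice (the model and job names are hypothetical):

    # With a database-backed queue, the enqueue writes a row in the same database
    # and the same transaction as the withdrawal: both commit, or neither does.
    class WithdrawalCreator
      def call(account, amount_cents)
        ApplicationRecord.transaction do
          withdrawal = account.withdrawals.create!(amount_cents: amount_cents)
          ProcessWithdrawalJob.perform_later(withdrawal.id)

          # If anything below raises, the transaction rolls back and the
          # enqueued job disappears along with the withdrawal record.
          account.update!(pending_withdrawal: true)
        end
      end
    end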
Meanwhile, if our queue existed in a separate datastore, our enqueues would be completely unaware of the transaction, and we'd run the risk of enqueuing a job that acts on data that was never committed, or (even worse) we'd fail to enqueue a job even when the rest of the transactional data was committed. This would fundamentally undermine our at-least-once execution guarantees!

We already use ACID-compliant datastores to solve these precise kinds of data persistence issues, so with the exception of really, really high volume operations (where a lot of noise and data loss can—or must—be tolerated), there's really no reason not to enqueue jobs co-transactionally with other data changes. And this is precisely why, at Betterment, we start each application off with a database-backed queue, co-located with the rest of the app's data, with the guarantee of at-least-once job execution.

By the way, this is a topic I could talk about endlessly, so I'll leave it there for now. If you're interested in hearing me say even more about resilient data persistence and job execution, feel free to check out Can I break this?, a talk I gave at RailsConf 2021! But in addition to the resiliency guarantees outlined above, we've also given a lot of attention to the operability and the scalability of our queue. Let's cover operability first.

Maintaining a Queue in the Long Run

Operating a queue means being able to respond to errors and recover from failures, and also being generally able to tell when things are falling behind. (Essentially, it means keeping our on-call engineers happy.) We do this in two ways: with dashboards, and with alerts.

Our dashboards come in a few parts. Firstly, we host a private fork of delayed_job_web, a web UI that allows us to see the state of our queues in real time and drill down to specific jobs. We've extended the gem with information on "erroring" jobs (jobs that are in the process of retrying but have not yet permanently failed), as well as the ability to filter by additional fields such as job name, priority, and the owning team (which we store in an additional column).

We also maintain two other dashboards in our cloud monitoring service, DataDog. These are powered by instrumentation and continuous monitoring features that we have added directly to the delayed gem itself. When jobs run, they emit ActiveSupport::Notification events that we subscribe to and then forward along to a StatsD emitter, typically as "distribution" or "increment" metrics. Additionally, we've included a continuous monitoring process that runs aggregate queries, tagged and grouped by queue and priority, and that emits similar notifications that become "gauge" metrics.
Once all of these metrics make it to DataDog, we're able to display a comprehensive timeboard that graphs things like average job runtime, throughput, time spent waiting in the queue, error rates, pickup query performance, and even some top 10 lists of slowest and most erroring jobs.

On the alerting side, we have DataDog monitors in place for overall queue statistics, like max age SLA violations, so that we can alert and page ourselves when queues aren't working off jobs quickly enough. Our SLAs are actually defined on a per-priority basis, and we've added a feature to the delayed gem called "named priorities" that allows us to define priority-specific configs. These represent integer ranges (entirely orthogonal to queues), and default to "interactive" (0-9), "user visible" (10-19), "eventual" (20-29), and "reporting" (30+), with default alerting thresholds focused on retry attempts and runtime.

There are plenty of other features that we've built that haven't made it into the delayed gem quite yet. These include the ability for apps to share a job queue but run separate workers (i.e. multi-tenancy), team-level job ownership annotations, resumable bulk orchestration and batch enqueuing of millions of jobs at once, forward-scheduled job throttling, and also the ability to encrypt the inputs to jobs so that they aren't visible in plaintext in the database. Any of these might be the topic for a future post, and might someday make their way upstream into a public release!

But Does It Scale?

As we've grown, we've had to push at the limits of what a database-backed queue can accomplish. We've baked several improvements into the delayed gem, including a highly optimized, SKIP LOCKED-based pickup query, multithreaded workers, and a novel "max percent of max age" metric that we use to automatically scale our worker pool up to ~3x its baseline size when queues need additional concurrency.

Eventually, we could explore ways of feeding jobs through to higher performance queues downstream, far away from the database-backed workers. We already do something like this for some jobs with our journaled gem, which uses AWS Kinesis to funnel event payloads out to our data warehouse (while at the same time benefiting from the same at-least-once delivery guarantees as our other jobs!). Perhaps we'd want to generalize the approach even further.

But the reality of even a fully "scaled up" queue solution is that, if it is doing anything particularly interesting, it is likely to be database-bound. A Redis-based queue will still introduce DB pressure if its jobs execute anything involving ActiveRecord models, and solutions must exist to throttle or rate limit these jobs. So even if your queue lives in an entirely separate datastore, it can be effectively coupled to your DB's IOPS and CPU limitations.

So does the delayed approach scale? To answer that question, I'll leave you with one last takeaway. A nice property that we've observed at Betterment, and that might apply to you as well, is that the number of jobs tends to scale proportionally with the number of customers and accounts. This means that when we naturally hit vertical scaling limits, we could, for example, shard or partition our job table alongside our users table. Then, instead of operating one giant queue, we'll have broken things down to a number of smaller queues, each with their own worker pools, emitting metrics that can be aggregated with almost the same observability story we have today.
But we're getting into pretty uncharted territory here, and, as always, your mileage may vary!

Try it out!

If you've read this far, we'd encourage you to take the leap and test out the delayed gem for yourself! Again, it combines both DelayedJob and its ActiveRecord backend, and should be more or less compatible with Rails apps that already use ActiveJob or DelayedJob. Of course, it may require a bit of tuning on your part, and we'd love to hear how it goes! We've also built an equivalent library in Java, which may also see a public release at some point. (To any Java devs reading this: let us know if that interests you!) Already tried it out? Any features you'd like to see added? Let us know what you think!

Focusing on What Matters: Using SLOs to Pursue User Happiness

Proper reliability is the greatest operational requirement for any service. If the service doesn't work as intended, no user (or engineer) will be happy. This is where SLOs come in.

The umbrella term "observability" covers all manner of subjects, from basic telemetry to logging, to making claims about longer-term performance in the shape of service level objectives (SLOs) and occasionally service level agreements (SLAs). Here I'd like to discuss some philosophical approaches to defining SLOs, explain how they help with prioritization, and outline the tooling currently available to Betterment Engineers to make this process a little easier.

What is an SLO?

At a high level, a service level objective is a way of measuring the performance of, correctness of, validity of, or efficacy of some component of a service over time, by comparing the functionality of specific service level indicators (metrics of some kind) against a target goal. For example:

99.9% of requests complete with a 2xx, 3xx or 4xx HTTP code within 2000ms over a 30 day period

The service level indicator (SLI) in this example is a request completing with a status code of 2xx, 3xx or 4xx and with a response time of at most 2000ms. The SLO is the target percentage, 99.9%. We reach our SLO goal if, during a 30 day period, 99.9% of all requests completed with one of those status codes and within that range of latency. If our service didn't succeed at that goal, the violation overflow — called an "error budget" — shows us by how much we fell short. With a goal of 99.9%, we have 40 minutes and 19 seconds of downtime available to us every 28 days. Check out more error budget math here.

If we fail to meet our goals, it's worthwhile to step back and understand why. Was the error budget consumed by real failures? Did we notice a number of false positives? Maybe we need to reevaluate the metrics we're collecting, or perhaps we're okay with setting a lower target goal because there are other targets that will be more important to our customers.

It's all about the customer

This is where the philosophy of defining and keeping track of SLOs comes into play. It starts with our users - Betterment users - and trying to provide them with a certain quality of service. Any error budget we set should account for our fiduciary responsibilities, and should guarantee that we do not cause an irresponsible impact to our customers. We also assume that there is a baseline degree of software quality baked in, so error budgets should help us prioritize positive impact opportunities that go beyond these baselines.
Sometimes there are a few layers of indirection between a service and a Betterment customer, and it takes a bit of creativity to understand what aspects of the service directly affect them. For example, an engineer on a backend or data-engineering team provides services that a user-facing component consumes indirectly. Or perhaps the users for a service are Betterment engineers, and it's really unclear how that work affects the people who use our company's products. It isn't that much of a stretch to claim that an engineer's level of happiness does have some effect on the level of service they're capable of providing a Betterment customer!

Let's say we've defined some SLOs and notice they are falling behind over time. We might take a look at the metrics we're using (the SLIs), the failures that chipped away at our target goal, and, if necessary, re-evaluate the relevancy of what we're measuring. Do error rates for this particular endpoint directly reflect an experience of a user in some way - be it a customer, a customer-facing API, or a Betterment engineer? Have we violated our error budget every month for the past three months? Has there been an increase in Customer Service requests to resolve problems related to this specific aspect of our service? Perhaps it is time to dedicate a sprint or two to understanding what's causing degradation of service. Or perhaps we notice that what we're measuring is becoming increasingly irrelevant to a customer experience, and we can get rid of the SLO entirely!

Benefits of measuring the right things, and staying on target

The goal of an SLO-based approach to engineering is to provide data points with which to have a reasonable conversation about priorities (a point that Alex Hidalgo drives home in his book Implementing Service Level Objectives). In the case of services not performing well over time, the conversation might be "focus on improving reliability for service XYZ." But what happens if our users are super happy, our SLOs are exceptionally well-defined and well-achieved, and we're ahead of our roadmap? Do we try to get that extra 9 in our target - or do we use the time to take some creative risks with the product (feature-flagged, of course)? Sometimes it's not in our best interest to be too focused on performance, and we can instead "use up our error budget" by rolling out some new A/B test, or upgrading a library we've been putting off for a while, or testing out a new language in a user-facing component that we might not otherwise have had the chance to explore.

The tools to get us there

Let's dive into some tooling that the SRE team at Betterment has built to help Betterment engineers easily start to measure things.

Collecting the SLIs and Creating the SLOs

The SRE team has a web-app and CLI called coach that we use to manage continuous integration (CI) and continuous delivery (CD), among other things. We've talked about Coach in the past here and here. At a high level, the Coach CLI generates a lot of yaml files that are used in all sorts of places to help manage operational complexity and cloud resources for consumer-facing web-apps. In the case of service level indicators (basically metrics collection), the Coach CLI provides commands that generate yaml files to be stored in GitHub alongside application code.
At deploy time, the Coach web-app consumes these files and idempotently creates Datadog monitors, which can be used as SLIs (service level indicators) to inform SLOs, or as standalone alerts that need immediate triage every time they're triggered. In addition to Coach explicitly providing a config-driven interface for monitors, we've also written a couple of handy runtime-specific methods that result in automatic instrumentation for Rails or Java endpoints. I'll discuss these more below.

We also manage a separate repository for SLO definitions. We left this outside of application code so that teams can modify SLO target goals and details without having to redeploy the application itself. It also made visibility easier in terms of sharing and communicating different teams' SLO definitions across the org.

Monitors in code

Engineers can choose either StatsD or Micrometer to measure complicated experiences with custom metrics, and there are various approaches to turning those metrics directly into monitors within Datadog. We use Coach CLI driven yaml files to support metric or APM monitor types directly in the code base. Those are stored in a file named .coach/datadog_monitors.yml and look like this:

    monitors:
      - type: metric
        metric: "coach.ci_notification_sent.completed.95percentile"
        name: "coach.ci_notification_sent.completed.95percentile SLO"
        aggregate: max
        owner: sre
        alert_time_aggr: on_average
        alert_period: last_5m
        alert_comparison: above
        alert_threshold: 5500
      - type: apm
        name: "Pull Requests API endpoint violating SLO"
        resource_name: api::v1::pullrequestscontroller_show
        max_response_time: 900ms
        service_name: coach
        page: false
        slack: false

It wasn't simple to make this abstraction intuitive between a Datadog monitor configuration and a user interface. But this kind of explicit, attribute-heavy approach helped us get this tooling off the ground while we developed (and continue to develop) in-code annotation approaches. The APM monitor type was simple enough to turn into both a Java annotation and a tiny domain specific language (DSL) for Rails controllers, giving us nice symmetry across our platforms.

This owner method for Rails apps results in all logs, error reports, and metrics being tagged with the team's name, and at deploy time it's aggregated by a Coach CLI command and turned into latency monitors with reasonable defaults for optional parameters; essentially doing the same thing as our config-driven approach but from within the code itself:

    class DeploysController < ApplicationController
      owner "sre", max_response_time: "10000ms", only: [:index], slack: false
    end

For Java apps we have a similar interface (with reasonable defaults as well) in a tidy little annotation.
@Sla
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Sla {
  @AliasFor(annotation = Sla.class)
  long amount() default 25_000;

  @AliasFor(annotation = Sla.class)
  ChronoUnit unit() default ChronoUnit.MILLIS;

  @AliasFor(annotation = Sla.class)
  String service() default "custody-web";

  @AliasFor(annotation = Sla.class)
  String slackChannelName() default "java-team-alerts";

  @AliasFor(annotation = Sla.class)
  boolean shouldPage() default false;

  @AliasFor(annotation = Sla.class)
  String owner() default "java-team";
}

Then usage is just as simple as adding the annotation to the controller:

@WebController("/api/stuff/v1/service_we_care_about")
public class ServiceWeCareAboutController {

  @PostMapping("/search")
  @CustodySla(amount = 500)
  public SearchResponse search(@RequestBody @Valid SearchRequest request) {...}
}

At deploy time, these annotations are scanned and converted into monitors along with the config-driven definitions, just like our Ruby implementation.

SLOs in code

Now that we have our metrics flowing, our engineers can define SLOs. If an engineer has a monitor tied to metrics or APM, then they just need to plug in the monitor ID directly into our SLO yaml interface.

- last_updated_date: "2021-02-18"
  approval_date: "2021-03-02"
  next_revisit_date: "2021-03-15"
  category: latency
  type: monitor
  description: This SLO covers latency for our CI notifications system - whether it's the github context updates on your PRs or the slack notifications you receive.
  tags:
    - team:sre
  thresholds:
    - target: 99.5
      timeframe: 30d
      warning_target: 99.99
  monitor_ids:
    - 30842606

The interface supports metrics directly as well (mirroring Datadog's SLO types) so an engineer can reference any metric directly in their SLO definition, as seen here:

# availability
- last_updated_date: "2021-02-16"
  approval_date: "2021-03-02"
  next_revisit_date: "2021-03-15"
  category: availability
  tags:
    - team:sre
  thresholds:
    - target: 99.9
      timeframe: 30d
      warning_target: 99.99
  type: metric
  description: 99.9% of manual deploys will complete successfully over a 30day period.
  query:
    # (total_events - bad_events) over total_events == good_events/total_events
    numerator: sum:trace.rack.request.hits{service:coach,env:production,resource_name:deployscontroller_create}.as_count()-sum:trace.rack.request.errors{service:coach,env:production,resource_name:deployscontroller_create}.as_count()
    denominator: sum:trace.rack.request.hits{service:coach,resource_name:deployscontroller_create}.as_count()

We love having these SLOs defined in GitHub because we can track who's changing them, how they're changing, and get review from peers. It's not quite the interactive experience of the Datadog UI, but it's fairly straightforward to fiddle in the UI and then extract the resulting configuration and add it to our config file.

Notifications

When we merge our SLO templates into this repository, Coach will manage creating SLO resources in Datadog and accompanying SLO alerts (that ping slack channels of our choice) if and when our SLOs violate their target goals. This is the slightly nicer part of SLOs versus simple monitors - we aren't going to be pinged for every latency failure or error rate spike. We'll only be notified if, over 7 days or 30 days or even longer, they exceed the target goal we've defined for our service. We can also set a "warning threshold" if we want to be notified earlier when we're using up our error budget. Fewer alerts means the alerts should be something to take note of, and possibly take action on.
This is a great way to get a good signal while reducing unnecessary noise. If, for example, our user research says we should aim for 99.5% uptime, that's 3h 21m 36s of downtime available per 28 days. That's a lot of time we can reasonably not react to failures. If we aren't alerting on those 3 hours of errors, and instead just once if we exceed that limit, then we can direct our attention toward new product features, platform improvements, or learning and development.

The last part of defining our SLOs is including a date when we plan to revisit that SLO specification. Coach will send us a message when that date rolls around to encourage us to take a deeper look at our measurements and possibly reevaluate our goals around measuring this part of our service.

What if SLOs don't make sense yet?

It's definitely the case that a team might not be at the level of operational maturity where defining product or user-specific service level objectives is in the cards. Maybe their on-call is really busy, maybe there are a lot of manual interventions needed to keep their services running, maybe they're still putting out fires and building out their team's systems. Whatever the case may be, this shouldn't deter them from collecting data. They can define what is called an "aspirational" SLO - basically an SLO for an important component in their system - to start collecting data over time. They don't need to define an error budget policy, and they don't need to take action when they fail their aspirational SLO. Just keep an eye on it.

Another option is to start tracking the level of operational complexity for their systems. Perhaps they can set goals around "Bug Tracker Inbox Zero" or "Failed Background Jobs Zero" within a certain time frame, a week or a month for example. Or they can define some SLOs around types of on-call tasks that their team tackles each week. These aren't necessarily true-to-form SLOs, but engineers can use this framework and the tooling provided to collect data around how their systems are operating and have conversations on prioritization based on what they discover, beginning to build a culture of observability and accountability.

Conclusion

Betterment is at a point in its growth where prioritization has become more difficult and more important. Our systems are generally stable, and feature development is paramount to business success. But so is reliability and performance. Proper reliability is the greatest operational requirement for any service.² If the service doesn't work as intended, no user (or engineer) will be happy. This is where SLOs come in. SLOs should align with business objectives and needs, which will help Product and Engineering Managers understand the direct business impact of engineering efforts. SLOs will ensure that we have a solid understanding of the state of our services in terms of reliability, and they empower us to focus on user happiness. If our SLOs don't align directly with business objectives and needs, they should align indirectly via tracking operational complexity and maturity.

So, how do we choose where to spend our time? SLOs (service level objectives) - including managing their error budgets - will permit us - our product engineering teams - to have the right conversations and make the right decisions about prioritization and resourcing so that we can balance our efforts spent on reliability and new product features, helping to ensure the long-term happiness and confidence of our users (and engineers).
² Alex Hidalgo, Implementing Service Level Objectives

Finding and Preventing Rails Authorization Bugs

This article walks through finding and fixing common Rails authorization bugs. At Betterment, we build public facing applications without an authorization framework by following three principles, discussed in another blog post. Those three principles are:

Authorization through Impossibility
Authorization through Navigability
Authorization through Application Boundaries

This post will explore the first two principles and provide examples of common patterns that can lead to vulnerabilities as well as guidance for how to fix them. We will also cover the custom tools we've built to help avoid these patterns before they can lead to vulnerabilities. If you'd like, you can skip ahead to the tools before continuing on to the rest of this post.

Authorization through Impossibility

This principle might feel intuitive, but it's worth reiterating that at Betterment we never build endpoints that allow users to access another user's data. There is no /api/social_security_numbers endpoint because it is a prime target for third-party abuse and developer error. Similarly, even our authorized endpoints never allow one user to peer into another user's object graph. This principle keeps us from ever having the opportunity to make some of the mistakes addressed in our next section. We acknowledge that many applications out there can't make the same design decisions about users' data, but as a general principle we recommend reducing the ways in which that data can be accessed. If an application absolutely needs to be able to show certain data, consider structuring the endpoint in a way such that a client can't even attempt to request another user's data.

Authorization through Navigability

Rule #1: Authorization should happen in the controller and should emerge naturally from table relationships originating from the authenticated user, i.e. the "trust root chain".

This rule is applicable for all controller actions and is a critical component of our security story. If you remember nothing else, remember this. What is a "trust root chain"? It's a term we've co-opted from SSL certificate lingo, and it's meant to imply a chain of ownership from the authenticated user to a target resource. We can enforce access rules by using the affordances of our relational data without the need for any additional "permission" framework. Note that association does not imply authorization, and the onus is on the developer to ensure that associations are used properly.

Consider a controller whose show action fetches a Document directly by the ID in the request (sketched at the end of this section). So long as a user is authenticated, they can perform the show action on any document (including documents belonging to others!) provided they know or can guess its ID - not great! This becomes even more dangerous if the Documents table uses sequential ids, as that would make it easy for an attacker to start combing through the entire table. This is why Betterment has a rule requiring UUIDs for all new tables. This type of bug is typically referred to as an Insecure Direct Object Reference vulnerability. In short, these bugs allow attackers to access data directly using its unique identifiers – even if that data belongs to someone else – because the application fails to take authorization into account. We can use our database relationships to ensure that users can only see their own documents.
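To make the pattern concrete, here is a minimal sketch of the two versions being discussed. The controller, model, and parameter names are illustrative assumptions based on the cop output shown later in the post, not Betterment's actual code.

# Vulnerable: any authenticated user can fetch any document by guessing an ID.
class DocumentsController < ApplicationController
  def show
    @document = Document.find(params[:document_id])
  end
end

# Scoped: the lookup traverses the authenticated user's associations, so a
# document outside the user's object graph raises RecordNotFound (a 404).
class DocumentsController < ApplicationController
  def show
    @document = current_user.documents.find(params[:document_id])
  end
end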
Assuming a User has many Documents, we would change our controller to scope the lookup through the user, as sketched above. Now any document_id that doesn't exist in the user's object graph will raise a 404 and we've provided authorization for this endpoint without a framework - easy peasy.

Rule #2: Controllers should pass ActiveRecord models, rather than ids, into the model layer.

As a corollary to Rule #1, we should ensure that all authorization happens in the controller by disallowing model initialization with *_id attributes. This rule speaks to the broader goal of authorization being obvious in our code. We want to minimize the hops and jumps required to figure out what we're granting access to, so we make sure that it all happens in the controller.

Consider a controller that links attachments to a given document. Let's assume that a User has many Attachments that can be attached to a Document they own. Take a minute and review this controller - what jumps out to you? (A sketch of this controller and the fix appears at the end of this section.) At first glance, it looks like the developer has taken the right steps to adhere to Rule #1 via the document method, and we're using strong params. Is that enough? Unfortunately, it's not. There's actually a critical security bug here that allows the client to specify any attachment_id, even if they don't own that attachment - eek!

Here's a simple way to resolve our bug: before we create a new AttachmentLink, we verify that the attachment_id specified actually belongs to the user, and our code will raise a 404 otherwise - perfect! By keeping the authorization up front in the controller and out of the model, we've made it easier to reason about. If we buried the authorization within the model, it would be difficult to ensure that the trust-root chain is being enforced – especially if the model is used by multiple controllers that handle authorization inconsistently. Reading the AttachmentLink model code, it would be clear that it takes an attachment_id but whether authorization has been handled or not would remain a bit of a mystery.

Automatically Detecting Vulnerabilities

At Betterment, we strive to make it easy for engineers to do the right thing – especially when it comes to security practices. Given the formulaic patterns of these bugs, we decided static analysis would be a worthwhile endeavor. Static analysis can help not only with finding existing instances of these vulnerabilities, but also prevent new ones from being introduced. By automating detection of these "low hanging fruit" vulnerabilities, we can free up engineering effort during security reviews and focus on more interesting and complex issues.

We decided to lean on RuboCop for this work. As a Rails shop, we already make heavy use of RuboCop. We like it because it's easy to introduce to a codebase, violations break builds in clear and actionable ways, and disabling specific checks requires engineers to comment their code in a way that makes it easy to surface during code review. Keeping rules #1 and #2 in mind, we've created two cops: Betterment/UnscopedFind and Betterment/AuthorizationInController; these will flag any models being retrieved and created in potentially unsafe ways, respectively. At a high level, these cops track user input (via params.permit et al.) and raise offenses if any of these values get passed into methods that could lead to a vulnerability (e.g. model initialization, find calls, etc). You can find these cops here. We've been using these cops for over a year now and have had a lot of success with them.
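For concreteness, here is a rough sketch of the attachments example discussed above, along with the fix. The class, action, and parameter names are illustrative assumptions rather than the post's actual code; the cop output in the next section flags exactly this kind of unscoped create.

# Vulnerable: attachment_id comes straight from user input, so a client can
# link an attachment they don't own to one of their documents.
class Documents::AttachmentsController < ApplicationController
  def create
    AttachmentLink.new(create_params.merge(document: document)).save!
    head :created
  end

  private

  def document
    current_user.documents.find(params[:document_id])
  end

  def create_params
    params.permit(:attachment_id)
  end
end

# Fixed: resolve the attachment through the authenticated user first, so an
# attachment outside the user's object graph raises RecordNotFound (a 404).
class Documents::AttachmentsController < ApplicationController
  def create
    attachment = current_user.attachments.find(params[:attachment_id])
    AttachmentLink.new(document: document, attachment: attachment).save!
    head :created
  end

  private

  def document
    current_user.documents.find(params[:document_id])
  end
end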
In addition to these two, the Betterlint repository contains other custom cops we've written to enforce certain patterns -- both security-related as well as more general ones. We use these cops in conjunction with the default RuboCop configurations for all of our Ruby projects.

Let's run the first cop, Betterment/UnscopedFind, against DocumentsController from above:

$ rubocop app/controllers/documents_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents_controller.rb:3:17: C: Betterment/UnscopedFind: Records are being retrieved directly using user input. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining").

INSTEAD OF THIS:
Post.find(params[:post_id])

DO THIS:
current_user.posts.find(params[:post_id])

See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentunscopedfind

  @document = Document.find(params[:document_id])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected

The cop successfully located the vulnerability. If we attempted to deploy this code, RuboCop would fail the build, preventing the code from going out while letting reviewers know exactly why.

Now let's try running Betterment/AuthorizationInController on the AttachmentLink example from earlier:

$ rubocop app/controllers/documents/attachments_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents/attachments_controller.rb:3:24: C: Betterment/AuthorizationInController: Model created/updated using unsafe parameters. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining"), and then pass the resulting object into your model instead of the unsafe parameter.

INSTEAD OF THIS:
post_parameters = params.permit(:album_id, :caption)
Post.new(post_parameters)

DO THIS:
album = current_user.albums.find(params[:album_id])
post_parameters = params.permit(:caption).merge(album: album)
Post.new(post_parameters)

See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentauthorizationincontroller

  AttachmentLink.new(create_params.merge(document: document)).save!
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected

The model initialization was flagged because it was seen using create_params, which contains user input. Like with the other cop, this would fail the build and prevent the code from making it to production. You may have noticed that unlike the previous example, the vulnerable code doesn't directly reference a params.permit call or any of the parameter names, but the code was still flagged. This is because both of the cops keep a little bit of state to ensure they have the appropriate context necessary when analyzing potentially unsafe function calls. We also made sure, when developing these cops, to test them with real code samples and not just contrived scenarios that no developer would actually ever attempt.

False Positives

With any type of static analysis, there are bound to be false positives. When working on these cops, we narrowed down false positives to two scenarios:

The flagged code could be considered insecure only in other contexts: e.g. the application or models in question don't have a concept of "private" data

The flagged code isn't actually insecure: e.g.
the initialization happens to take a parameter whose name ends in _id but it doesn't refer to a unique identifier for any objects.

In both these cases, the developer should feel empowered to either rewrite the line in question or locally disable the cop, both of which will prevent the code from being flagged. Normally we'd consider opting out of security analysis to be an unsafe thing to do, but we actually like the way RuboCop handles this because it can help reduce some code review effort; the first solution eliminates the vulnerable-looking pattern (even if it wasn't a vulnerability to begin with) while the second one signals to reviewers that they should confirm this code is actually safe (making it easy to pinpoint areas of focus).

Testing & Code Review Strategies

RuboCop and Rails tooling can only get us so far in mitigating authorization bugs. The remainder falls on the shoulders of the developer and their peers to be cognizant of the choices they are making when shipping new application controllers. In light of that, we'll cover some helpful strategies for keeping authorization front of mind.

Testing

When writing request specs for a controller action, write a negative test case to prove that attempts to circumvent your authorization measures return a 404. For example, consider a request spec for our Documents::AttachmentsController (a rough sketch appears at the end of this post): these test cases are an inexpensive way to prove to yourself and your reviewers that you've considered the authorization context of your controller action and accounted for it properly. Like all of our tests, this functions both as regression prevention and as documentation of your intent.

Code Review

Our last line of defense is code review. Security is the responsibility of every engineer, and it's critical that our reviewers keep authorization and security in mind when reviewing code. A few simple questions can facilitate effective security review of a PR that touches a controller action:

Who is the authenticated user?
What resource is the authenticated user operating on?
Is the authenticated user authorized to operate on the resource in accordance with Rule #1?
What parameters is the authenticated user submitting?
Where are we authorizing the user's access to those parameters?
Do all associations navigated in the controller properly signify authorization?

Getting in the habit of asking these questions during code review should lead to more frequent conversations about security and data access. Our hope is that linking out to this post and its associated Rules will reinforce a strong security posture in our application development.

In Summary

Unlike authentication, authorization is context specific and difficult to "abstract away" from the leaf nodes of application code. This means that application developers need to consider authorization with every controller we write or change. We've explored two new rules to encourage best practices when it comes to authorization in our application controllers:

Authorization should happen in the controller and should emerge naturally from table relationships originating from the authenticated user, i.e. the "trust root chain".
Controllers should pass ActiveRecord models, rather than ids, into the model layer.

We've also covered how our custom cops can help developers avoid antipatterns, resulting in safer and easier to read code. Keep these in mind when writing or reviewing application code that an authenticated user will utilize and remember that authorization should be clear and obvious.
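As a rough illustration of the negative test case described in the Testing section above, such a request spec might look something like the following. The route, factories, and sign_in helper are assumptions made for the sake of the example, not Betterment's actual test suite.

require "rails_helper"

RSpec.describe "Documents::Attachments", type: :request do
  it "returns a 404 when linking an attachment owned by another user" do
    user = create(:user)
    document = create(:document, user: user)
    other_users_attachment = create(:attachment) # belongs to a different user

    sign_in user
    post "/documents/#{document.id}/attachments",
         params: { attachment_id: other_users_attachment.id }

    # The scoped find in the controller should raise RecordNotFound,
    # which Rails renders as a 404.
    expect(response).to have_http_status(:not_found)
  end
end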
Using Targeted Universalism To Build Inclusive Features

The best products are inclusive at every stage of the design and engineering process. Here's how we turned a request for more inclusion into a feature all Betterment customers can benefit from.

Earlier this year, a coworker asked me how difficult it would be to add a preferred name option into our product. They showed me how we were getting quite a few requests from trans customers to quit deadnaming them. The simplest questions tend to be the hardest to answer. For me, simple questions bring to mind this interesting concept called The Illusion Of Explanatory Depth, which is when "people feel they understand complex phenomena with far greater precision, coherence, and depth than they really do." Simple questions tend to shed light on subjects shrouded in this illusion and force you to confront your lack of knowledge. Asking for someone's name is simple, but full of assumptions.

Deadnaming is when, intentionally or not, you refer to a trans person by the name they used before transitioning. For many trans folks like myself, this is the name assigned at birth, which means all legal and government issued IDs and documents use this non-affirming name. According to Healthline, because legal name changes are "expensive, inaccessible, and not completely effective at eliminating deadnaming", institutions like Betterment can and should make changes to support our trans customers. This simple question from our trans customers, "Can you quit deadnaming me?", was a sign that our original understanding of our customers' names was not quite right, and we were lacking knowledge around how names are commonly used. Now, our work involved dispelling our previous understanding of what a name is.

How to turn simple questions into solutions.

At Betterment, we're required by the government to have a record of a customer's legal first name, but that shouldn't prevent us from letting customers share their preferred or chosen first name, and then using that name in the appropriate places. This was a wonderful opportunity to practice targeted universalism: a concept that explains how building features specifically for a marginalized audience not only benefits the people in that marginalized group, but also people outside of it, which increases its broad impact.

From a design standpoint, executing a preferred name feature was pretty straightforward—we needed to provide a user with a way to share their preferred name with us, and then start using it. The lead designer for this project, Crys, did a lovely job of incorporating compassionate design into how we show the user which legal name we have on file for them, without confronting that user with their deadname every time they go to change their settings. They accomplished that by hiding the user's legal name in a dropdown accordion that is toggled closed by default. Crys also built out a delightful flow that shows the user why we require their legal name, that answers a few common questions, and allows them to edit their preferred first name in the future if needed.

With a solid plan for gathering user input, we pivoted to the bigger question: Where should we use a customer's preferred first name? From an engineering standpoint, this question revealed a few hurdles that we needed to clear up. First, I needed to provide a translation of my own understanding of legal first names and preferred first names to our codebase.
The first step in this translation was to deprecate our not-very-descriptively named #first_name method and push engineers to start using two new, descriptive methods called #legal_first_name and #common_first_name (#common_first_name is essentially a defaulting method that falls back to #legal_first_name if #preferred_first_name is not present for that user). To do this, I used a tool built by our own Betterment engineer, Nathan, called Uncruft, which not only gave engineers a warning whenever they tried to use the old #first_name method but also created a list of all the places in our code where we were currently using that old method. This was essentially a map for us engineers to be able to reference and go update those old usages in our codebase whenever we wanted.

This new map leads us to our second task: addressing those deprecated usages. At first glance the places where we used #first_name in-app seemed minimal—emails, in-app greetings, tax documents. But once we looked under the surface, #first_name was sprinkled nearly everywhere in our codebase. I identified the most visible spots where we address a user and changed them, but for less visible changes I took this new map and delegated cross-squad ownership of each usage. Then, a group of engineers from each squad began tackling each deprecation one by one. In order to help these engineers, we provided guidelines around where it was necessary to use a legal first name, but in general we pushed to use a customer's preferred first name wherever possible.

From a high level view I essentially split this large engineering lift into two different streams of work. There was the feature work stream, which involved:

Storing the user's new name information.
Building out the user interface.
Updating the most visible spots in our application.
Modifying our integration with SimonData in order to bulk update our outgoing emails, and
Changing how we share a user's name with our customer service (CX) team through a Zendesk integration, as well as in our internal CX application.

Then there was the foundational work stream, which involved mapping out and addressing every single deprecation. Thanks to Uncruft, once I generated that initial map of deprecations the large foundational work stream could then be further split into smaller brooks of work that could be tackled by different squads at different times.

Enabling preferred first names moves us towards a more inclusive product.

Once this feature went live, it was extremely rewarding to see our targeted universalism approach reveal its benefits. Our trans customers got the solution they needed, which makes this work crucial for that fact alone—but because of that, our cis customers also received a feature that delighted them. Ultimately, we now know that if people are given a tool to personalize their experience within our product, folks of many different backgrounds will use it.

Guidelines for Testing Rails Applications

Discusses the different responsibilities of model, request, and system specs, and other high level guidelines for writing specs using RSpec & Capybara.

Testing our Rails applications allows us to build features more quickly and confidently by proving that code does what we think it should, catching regression bugs, and serving as documentation for our code. We write our tests, called "specs" (short for specification), with RSpec and Capybara.
Though there are many types of specs, in our workflow we focus on only three: model specs, request specs, and system specs. This blog post discusses the different responsibilities of these types of specs, and other related high level guidelines for specs.

Model Specs

Model specs test business logic. This includes validations, instance and class method inputs and outputs, Active Record callbacks, and other model behaviors. They are very specific, testing a small portion of the system (the model under test), and cover a wide range of corner cases in that area. They should generally give you confidence that a particular model will do exactly what you intended it to do across a range of possible circumstances. Make sure that the bulk of the logic you're testing in a model spec is in the method you're exercising (unless the underlying methods are private). This leads to less test setup and fewer tests per model to establish confidence that the code is behaving as expected.

Model specs have a live database connection, but we like to think of our model specs as unit tests. We lean towards testing with a bit of mocking and minimal touches to the database. We need to be economical about what we insert into the database (and how often) to avoid slowing down the test suite too much over time. Don't persist a model unless you have to. For a basic example, you generally won't need to save a record to the database to test a validation. Also, model factories shouldn't by default save associated models that aren't required for that model's persistence. At the same time, requiring a lot of mocks is generally a sign that the method under test either is doing too many different things, or the model is too highly coupled to other models in the codebase. Heavy mocking can make tests harder to read, harder to maintain, and provide less assurance that code is working as expected. We try to avoid testing declarations directly in model specs - we'll talk more about that in a future blog post on testing model behavior, not testing declarations. Below is a model spec skeleton with some common test cases:

System Specs

System specs are like integration tests. They test the beginning to end workflow of a particular feature, verifying that the different components of an application interact with each other as intended. There is no need to test corner cases or very specific business logic in system specs (those assertions belong in model specs). We find that there is a lot of value in structuring a system spec as an intuitively sensible user story - with realistic user motivations and behavior, sometimes including the user making mistakes, correcting them, and ultimately being successful. There is a focus on asserting that the end user sees what we expect them to see.

System specs are more performance intensive than the other spec types, so in most cases we lean towards fewer system specs that do more things, going against the convention that tests should be very granular with one assertion per test. One system spec that asserts the happy path will be sufficient for most features. Besides the performance benefits, reading a single system spec from beginning to end ends up being good high-level documentation of how the software is used. In the end, we want to verify the plumbing of user input and business logic output through as few large specs per feature as we can get away with.
If there is significant conditional behavior in the view layer and you are looking to make your system spec leaner, you may want to extract that conditional behavior to a presenter resource model and test that separately in a model spec so that you don't need to worry about testing it in a system spec. We use SitePrism to abstract away bespoke page interactions and CSS selectors. It helps to make specs more readable and easier to fix if they break because of a UI or CSS change. We'll dive more into system spec best practices in a future blog post. Below is an example system spec. Note that the error path and two common success paths are exercised in the same spec.

Request Specs

Request specs test the traditional responsibilities of the controller. These include authentication, view rendering, selecting an HTTP response code, redirecting, and setting cookies. It's also OK to assert that the database was changed in some way in a request spec, but like system specs, there is no need for detailed assertions around object state or business logic. When controllers are thin and models are tested heavily, there should be no need to duplicate business logic test cases from a model spec in a request spec.

Request specs are not mandatory if the controller code paths are exercised in a system spec and they are not doing something different from the average controller in your app. For example, a controller that has different authorization restrictions because the actions it is performing are more dangerous might require additional testing. The main exception to these guidelines is when your controller is an API controller serving data to another app. In that case, your request spec becomes like your system spec, and you should assert that the response body is correct for important use cases. API boundary tests are even allowed to be duplicative with underlying model specs if the behavior is explicitly important and apparent to the consuming application. Request specs for APIs are owned by the consuming app's team to ensure that the invariants that they expect to hold are not broken. Below is an example request spec. We like to extract standard assertions such as ones relating to authentication into shared examples. More on shared examples in the section below.

Why don't we use Controller Specs?

Controller specs are notably absent from our guide. We used to use controller specs instead of request specs. This was mainly because they were faster to run than request specs. However, in modern versions of Rails, that has changed. Under the covers, request specs are just a thin wrapper around Rails integration tests. In Rails 5+, integration tests have been made to run very fast. Rails is so confident in the improvements they've made to integration tests that they've removed controller tests from Rails core in Rails 5.1. Additionally, request specs are much more realistic than controller specs since they actually exercise the full request / response lifecycle – routing, middleware, etc – whereas controller specs circumvent much of that process. Given the changes in Rails and the limitations of controller specs, we've changed our stance. We no longer write controller specs. All of the things that we were testing in controller specs can instead be tested by some combination of system specs, model specs, and request specs.

Why don't we use Feature Specs?

Feature specs are also absent from our guide.
System specs were added to Rails 5.1 core and they are the core team's preferred way to test client-side interactions. In addition, the RSpec team recommends using system specs instead of feature specs. In system specs, each test is wrapped in a database transaction because it's run within a Rails process, which means we don't need to use the DatabaseCleaner gem anymore. This makes the tests run faster, and removes the need for having any special tables that don't get cleaned out.

Optimal Testing

Because we use these three different categories of specs, it's important to keep in mind what each type of spec is for to avoid over-testing. Don't write the same test three times - for example, it is unnecessary to have a model spec, request spec, and a system spec that are all running assertions on the business logic responsibilities of the model. Over-testing takes more development time, can add additional work when refactoring or adding new features, slows down the overall test suite, and sets the wrong example for others when referencing existing tests. Think critically about what each type of spec is intended to be doing while writing specs. If you're significantly exercising behavior not in the layer you're writing a test for, you might be putting the test in the wrong place. Testing requires striking a fine balance - we don't want to under-test either. Too little testing doesn't give any confidence in system behavior and does not protect against regressions. Every situation is different and if you are unsure what the appropriate test coverage is for a particular feature, start a discussion with your team!

Other Testing Recommendations

Consider shared examples for last-mile regression coverage and repeated patterns. Examples include request authorization and common validation/error handling. Each spec's description begins with an action verb, not a helping verb like "should," "will" or something similar.

WebValve – The Magic You Need for HTTP Integration

Struggling with HTTP integrations locally? Use WebValve to define HTTP service fakes and toggle between real and fake services in non-production environments.

When I started at Betterment (the company) five years ago, Betterment (the platform) was a monolithic Java application. As good companies tend to do, it began growing—not just in terms of users, but in terms of capabilities. And our platform needed to grow along with it. At the time, our application had no established patterns or tooling for the kinds of third-party integrations that customers were increasingly expecting from fintech products (e.g., how Venmo connects to your bank to directly deposit and withdraw money). We were also feeling the classic pain points of a growing team contributing to a single application. To keep the momentum going, we needed to transition towards a service-oriented architecture that would allow the engineers of different business units to run in parallel against their specific business goals, creating even more demand for repeatable solutions to service integration.

This brought up another problem (and the starting point for this blog post): in order to ensure tight feedback loops, we strongly believed that our devs should be able to do their work on a modern, modestly-specced laptop without internet connectivity. That meant no guaranteed connection to a cloud service mesh. And unfortunately, it's not possible to run a local service mesh on a laptop without it melting.
In short, our devs needed to be able to run individual services in isolation; by default they were set to communicate with one another, meaning an engineer would have to run all of the services locally in order to work on any one service. To solve this problem, we developed WebValve—a tool that allows us to define and register fake implementations of HTTP services and toggle between real and fake services in non-production environments. I'm going to walk you through how we got there.

Start with the test

Here's a look at what a test would look like to see if a deposit from a bank was initiated: The five lines of code on the bottom are the meat of the test. Easy right? Not quite. Notice the two WebMock stub_requests calls at the top. The second one has the syntax you'd expect to execute the test itself. But take a look at the first one—notice the 100+ lines of (omitted) code. Without getting into the gory details, this essentially requires us, for every test we write, to stub a request for user data—with differences across minor things like ID values, we can't share these stubs between tests. In short it's a sloppy feature spec. So how do we narrow this feature spec down to something like this? Through the magic of libraries.

First things first—defining our view of the problem space. The success of projects like these doesn't come down to the code itself—it comes down to the 'design' of the solution based on its specific needs. In this case, it meant paring the conditions down to making it work using just Rails. Those come to life in four major principles, which guide how we engage with the problem space for our shift to a service-oriented architecture:

We use HTTP & REST to communicate with collaborator services
We define the boundaries and limit the testing of integrations with contract tests
We don't share code across service boundaries
Engineers must remain nimble and building features must remain enjoyable.

A little bit of color on each, starting with HTTP and REST. For APIs that we build for ourselves (e.g. internal services) we have full control over how we build them, so using HTTP and REST is no issue. We have a strong preference to use a single integration pattern for both internal and external service integrations; this reduces cognitive overhead for devs. When we're communicating with external services, we have less control, but HTTP is the protocol of the web and REST has been around since 2000—the dawn of modern web applications—so the majority of integrations we build will use them. REST is semantic, evolvable, limber, and very familiar to us as Rails developers—a natural 'other side of the coin' for HTTP to make up the lingua franca of the web.

Secondly, we need to define the boundaries in terms of 'contracts.' Contracts are a point of exchange between the consumption side (the app) and producer side (the collaborator service). The contract defines the expectations of input and output for the exchange. They're an alternative to the kind of high-level systems integration tests that would include a critical mass of components that would render the test slow and non-repeatable.

Thirdly, we don't want to have shared code across service boundaries. Shared code between services creates shared ownership, and shared ownership leads to undesirable coupling. We want the API provider to own and version their APIs, and we want the API consumer to own their integration with each version of a collaborator service's API.
If we were willing to accept tight coupling between our services, specifically in their API contracts, we'd be well-served by a tool like Pact. With Pact, you create a contract file based on the consumer's expectations of an API and you share it with the provider. The contract files themselves are about the syntax and structure of requests and responses rather than the interpretation. There's a human conversation and negotiation to be had about these contracts, and you can fool yourself into thinking you don't need to have that conversation if you've got a file that guarantees that you and your collaborator service are speaking the same language; you may be speaking the same words, but you might not infer the same meaning. Pact's docs encourage these human conversations, but as a tool it doesn't require them. By avoiding shared code between services, we force ourselves to have a conversation about every API we build with the consumers of those APIs.

Finally, these tests' effectiveness is directly related to how we can apply them to reality, so we need to be simple—we want to be able to test and build features without connections to other features. We want them to be able to work without an internet connection, and if we do want to integrate with a real service in local development, we should be able to do that—meaning we should be able to test and integrate locally at will, without having to rely on cumbersome, extra-connected services (think Docker, Kubernetes; anything that pairs cloud features with the local environment.) Straightforward tests are easy to write, read, and maintain. That keeps us moving fast and not breaking things.

So, to recap, there are four principles that will drive our solution:

Service interactions happen over HTTP & REST
Contract tests ensure that service interactions behave as expected
Providing an API contract requires no shared code
Building features remains fast and fun

Okay, okay, but how?

So we've established that we don't want to hit external services in tests, which we can do through WebMock or similar libraries. The challenge becomes: how do we replicate the integration environment without the integration environment? Through fakes. We'll fake the integration by using Sinatra to build a rack app that quacks like the real thing. In the rack app, we define the routes we care about for the things we normally would have stubbed in the tests. From here, we do the things we couldn't do before—pull real parameters out of the requests and feed them back into the fake response to make it more realistic. Additionally, we can use things like ActiveRecord to make these fake responses even more realistic based on the data stored in our actual database.

So what does the fake look like? It's a class with a route defined for each URL we care about faking. We can use WebMock to wire the fake to requests that match a certain pattern. If we receive a request for a URL we didn't define, it will 404. Simple. However, this doesn't allow us to solve all the things we were working for. What's missing? First, an idiomatic setup stance. We want to be able to define fakes in a single place, so when we add a new one, we can easily find it and change it. In the same vein, we want to be able to answer similar questions about registering fakes in one spot. Finally, convention over configuration—if we can load, register, and wire-up a fake based on its name, for example, that would be handy.
Secondly, it's missing environment-specific behavior, which in this case, translates into the ability to toggle the library on and off and separately toggle the connection to specific collaborator services on and off. We need to be able to have the library active when running tests or doing local development, but do not want to have it running in a production environment—if it remains active in a real environment, it might affect real customer accounts, which we cannot afford. But, there will also be times when we're running in a local development environment and we want to communicate with a real collaborator service to do some true integration testing.

Thirdly, we want to be able to autoload our fakes. If they're in our codebase, we should be able to iterate on the fakes without having to restart our server; the behavior isn't always right the first time, and restarting is tedious and it's not the Rails Way. Finally, to bolt this on to an IRL application, we need the ability to define fakes incrementally and migrate them into existing integrations that we have, one by one.

Okay brass tacks. No existing library allows us to integrate this way and map HTTP requests to in-process fakes for integration and development. Hence, WebValve. TL;DR—WebValve is an open-source gem that uses Sinatra and WebMock to provide fake HTTP service behavior. The special sauce is that it works for more than just your tests. It allows you to run your fakes in your dev environment as well, providing functionality akin to real environments with the toggles we need to access the real thing when we need to.

Let's run it through the gauntlet to show how it works and how it solves for all our requirements. First we add the gem to our Gemfile and run bundle install. With the gem installed, we can use the generator rails g webvalve:install to bootstrap a default config file where we can register our fakes. Then we can generate a fake for our "trading" collaborator service using rails generate webvalve:fake_service Trading. This gives us a class in a conventional location that inherits from WebValve::FakeService. This looks very similar to a Sinatra app, and that's because it is one—with some additional magic baked in. To make this fake work, all we have to do is define the conventionally-named environment variable, TRADING_API_URL. That tells WebValve what requests to intercept and route to this fake. By inheriting from this WebValve class, we gain the ability to toggle the fake behavior on or off based on another conventionally-named environment variable, in this case TRADING_ENABLED.

So let's take our feature spec. First, we configure our test suite to use WebValve with the RSpec config helper require 'webvalve/rspec'. Then, we look at the user API call—we define a new route for user, in FakeTrading. Then we flesh out that fake route by scooping out our json from the test file and probably making it a little more dynamic when we drop it into the fake. Then we do the same for the deposit API call. And now our test, which doesn't care about the specifics of either of those API calls, is much clearer. It looks just like our ideal spec from before: We leverage all the power of WebMock and Sinatra through our conventions and the teeniest configuration to provide all the same functionality as before, but we can write cleaner tests, we get the ability to use these fakes in local development instead of the real services—and we can enable a real service integration without missing a beat.
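To make that concrete, a fake along these lines might look roughly like the sketch below. The route paths and payloads are invented for illustration; only the WebValve::FakeService base class and the Sinatra-style routing described above are taken from the post.

class FakeTrading < WebValve::FakeService
  # Fake the user lookup that the feature spec used to stub with WebMock.
  get '/api/users/:id' do
    content_type :json
    { id: params[:id], first_name: 'Jane', last_name: 'Doe' }.to_json
  end

  # Fake the deposit endpoint, echoing back enough detail to feel realistic.
  post '/api/deposits' do
    content_type :json
    deposit = JSON.parse(request.body.read)
    { status: 'initiated', amount: deposit['amount'] }.to_json
  end
end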
We've achieved our goal—we've allowed for all the functionality of integration without the threats of actual integration. Check it out on GitHub. This article is part of Engineering at Betterment.

Building for Better: Gender Inclusion at Betterment

Betterment sits at the intersection of two industries with large, historical gender gaps. We're working to change that—for ourselves and our industries.

Since our founding, we've maintained a commitment to consistently build a better company and product for our customers and our customers-to-be. Part of that commitment includes reflecting the diversity of those customers. Betterment sits at the intersection of finance and technology—two industries with large, historical diversity gaps, including women and underrepresented populations. We're far from perfect, but this is what we're doing to embrace the International Women's Day charge and work toward better gender balance at Betterment and in our world.

Building Diversity And Inclusion At Betterment

Change starts at the heart of the matter. For Betterment, this means working to build a company of passionate individuals who reflect our customers and bring new and different perspectives to our work. Our internal Diversity and Inclusion Committee holds regular meetings to discuss current events and topics, highlights recognition months (like Black History and Women's History Months), and celebrates the many backgrounds and experiences of our employees. We've also developed a partnership with Peoplism. According to Caitlin Tudor-Savin, HR Business Partner, "This is more than a check-the-box activity, more than a one-off meeting with an attendance sheet. By partnering with Peoplism and building a long-term, action-oriented plan, we're working to create real change in a sustainable fashion." One next step we're excited about is an examination of our mentorship program to make sure that everyone at Betterment has access to mentors. The big idea: By building empathy and connection among ourselves, we can create an inclusive environment that cultivates innovative ideas and a better product for our customers.

Engaging The Tech Community At Large

At Betterment, we're working to create change in the tech industry and bring women into our space. By hosting meetups for Women Who Code, a non-profit organization that empowers women through technology, we're working to engage this community directly. Rather than getting together to hear presentations, meetups are designed to have a group-led dynamic. Members break out and solve problems together, sharing and honing skills, while building community and support. This also fosters conversation, natural networking, and the chance for women to get their foot in the door. Jess Harrelson, a Betterment Software Engineer, not only leads our hosting events but also found a path to Betterment through Women Who Code. "Consistency is key," said Jess. "Our Women Who Code meetups become a way to track your progression. It's exciting to see how I've developed since I first started attending meetups, and how some of our long-time attendees have grown as engineers and as professionals."

Building A Community Of Our Own

In 2018, our Women of Betterment group had an idea. They'd attended a number of networking and connection events, and the events never felt quite right.
Too often, the events involved forced networking and stodgy PowerPoint presentations, with takeaways amounting to little more than a free glass of wine. Enter the SHARE (Support, Hire, Aspire, Relate, Empower) Series. Co-founder Emily Knutsen wanted "to build a network of diverse individuals and foster deeper connections among women in our community." Through the SHARE Series, we hope to empower future leaders in our industry to reach their goals and develop important professional connections. While the series focuses on programming for women and those who identify as women, it is inclusive to everyone in our community who wishes to be an ally and support our mission. We developed the SHARE Series to create an authentic and conversational environment, one where attendees help guide the conversations and future event themes. Meetings thus far have included a panel discussion on breaking into tech from the corporate world and a small-group financial discussion led by financial experts from Betterment and beyond. "We're excited that organizations are already reaching out to collaborate," Emily said. "We've gotten such an enthusiastic response about designing future events around issues that women (and everyone!) face, such as salary negotiations."

Getting Involved

Want to join us as we work to build a more inclusive and dynamic community? Our next SHARE Series event features CBS News Business Analyst and CFP® professional Jill Schlesinger, as we celebrate her new book, The Dumb Things Smart People Do with Their Money: Thirteen Ways to Right Your Financial Wrongs. You can also register to attend our Women Who Code meetups, and join engineers from all over New York as we grow, solve, and connect with one another.

CI/CD: Standardizing the Interface

Meet our CI/CD platform, Coach, and learn how we increased consistent adoption of Continuous Integration (CI) across our engineering organization. And why that's important.

This is the second part of a series of posts about our new CI/CD platform, Coach. Part I explores several design choices we made in building out our notifications pipeline and describes how those choices are emblematic of our overarching engineering principles here at Betterment. Today I'd like to talk about how we increased consistent adoption of Continuous Integration (CI) across our engineering organization, and why.

Our Principles in Action: Standardizing the Interface

At Betterment, we want to empower our engineers to do their best work. CI plays an important role in all of our teams' workflows. Over time, a handful of these teams formed deviating opinions on what kind of acceptance criteria they had for CI. While we love the concern that our engineers show toward solving these problems, these deviations became problematic for applications of the same runtime that should abide by the same set of rules; for example, all Ruby apps should run RSpec and RuboCop, not just some of them. In building a platform as a service (PaaS), we realized that in order to mitigate the problem of nurturing pets vs herding cattle we would need to identify a firm set of acceptance criteria for different runtimes. In the first post of this series we mention one of our principles, Standardize the Pipeline.
In this post, we'll explore that principle and dive into how we committed 5,000-line configuration files to our repositories with confidence by standardizing CI for different runtimes, automating configuration generation in code, and testing the process that generates that configuration.

What's so good about making everything the same?

Our goals in standardizing the CI interface were to:

Make it easier to distribute new CI features more quickly across the organization.
Onboard new applications more quickly.
Ensure the same set of acceptance criteria is in place for all codebases in the org. For example, by assuming that any Java library will run the PMD linter and unit tests in a certain way we can bootstrap a new repository with very little effort.
Allow folks outside of the SRE team to contribute to CI.

In general, our CI platform categorizes projects into applications and libraries and divides those up further by language runtime. Combined together we call this a project_type. When we make improvements to one project type's base configuration, we can flip a switch and turn it on for everyone in the org at once. This lets us distribute changes across the org quickly. How we managed to actually execute on this will become clearer in the next section, but for the sake of hand-wavy expediency, we have a way to run a few commands and distribute CI changes to every project in a matter of minutes.

How did we do it?

Because we use CircleCI for our CI pipelines, we knew we would have to define our workflows using their DSL inside a .circleci/config.yml file at the root of a project's repository. With this blank slate in front of us we were able to iterate quickly by manually adding different jobs and steps to that file. We would receive immediate feedback in the CircleCI interface when those jobs ran, and this feedback loop helped us iterate even faster. Soon we were solving for our acceptance criteria requirements left and right — that Java app needs the PMD linter! This Ruby app needs to run integration tests! And then we reached the point where manual changes were hindering our productivity. The .circleci/config.yml file was getting longer than a thousand lines fast, partly because we didn't want to use any YAML shortcuts to hide away what was being run, and partly because there were no higher-level mechanisms available at the time for re-use when writing YAML (e.g. CircleCI's orbs).

Defining the system

Our solution to this problem was to build a system, a Coach CLI for our Coach app, designed according to CLI 12-factor conventions. This system's primary goal is to create .circleci/config.yml files for repositories to encapsulate the necessary configuration for a project's CI pipeline. The CLI reads a small project-level configuration definition file (coach.yml) located in a project's directory and extrapolates information to create the much larger repo-level CircleCI-specific configuration file (.circleci/config.yml), which we were previously editing ourselves.

To clarify the hierarchy of how we thought about CI, here are the high level terms and components of our Coach CLI system:

There are projects. Each project needs a configuration definition file (coach.yml) that declares its project_type. We support wordpress_app, java_library, java_app, ruby_gem, ruby_app, and javascript_library for now.
There are repos; each repo has one or more projects of any type.
There needs to be a way to set up a new project.
There needs to be a way to idempotently generate the CircleCI configuration (.circleci/config.yml) for all the projects in a repo at once. Each project needs to be built, tested, and linted. We realized that the dependency graph of repository → projects → project jobs was complicated enough that we would need to recreate the entire .circleci/config.yml file whenever we needed to update it, instead of just modifying the YAML file in place. This was one reason for automating the process, but the downsides of human-managed software were another. Manual updates to this file allow the configuration for infrequently-modified projects to drift. And leaving it up to engineers to own their own configuration lets folks modify the file in an unsupported way, which could break their CI process. And then we're back to square one. We decided to create that large file by essentially concatenating smaller components together. Each of those smaller components would be the output of specific functions, and each of those functions would be written in code and be tested. The end result was a lot of small files that look a little like this: https://gist.github.com/agirlnamedsophia/4b4a11acbe5a78022ecba62cb99aa85a Every time we make a change to the Coach CLI codebase we are confident that the thousands of lines of YAML that are idempotently generated as a result of the coach update ci command will work as expected, because they're already tested in isolation, in unit tests. We also have a few heftier integration tests to confirm our expectations. And no one needs to manually edit the .circleci/config.yml file again. Defining the Interface In order to generate the .circleci/config.yml that details which jobs to run and what code to execute, we first needed to determine what our acceptance criteria were. For each project type we knew we would need to support static code analysis, unit tests, integration tests, build steps, and test reports. We define the specific jobs a project will run during CI by looking at the project_type value inside a project's coach.yml. If the value for project_type is ruby_app, then the .circleci/config.yml generator will follow certain conventions for Ruby programs, like including a job to run tests with RSpec or including a job to run static analysis commands like Rubocop and Brakeman. For Java apps and libraries we run integration and unit tests by default as well as PMD as part of our static code analysis. Here's an example configuration section for a single job, the linter job for our Coach repository: https://gist.github.com/agirlnamedsophia/4b4a11acbe5a78022ecba62cb99aa85a And here's an example of the Ruby code that helps generate that result: https://gist.github.com/agirlnamedsophia/a96f3a79239988298207b7ec72e2ed04 For each job that is defined in the .circleci/config.yml file, according to the project type's list of acceptance criteria, we include additional steps to handle notifications and test reporting. By knowing that the Coach app is a ruby_app, we know how many jobs will need to be run and when. By writing that YAML inside of Ruby classes we can grow and expand our pipeline as needed, trusting that our tests confirm the YAML looks how we expect it to look. Because everything is written in code, if our acceptance criteria change, adding a new job involves a simple code change and a few tests, and that's it. We'll go into contributing to our platform in more detail below. 
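To make that concrete, here is a deliberately simplified sketch of the approach (not Coach's actual internals; the class name, Docker image, and commands are invented for illustration), showing how a small, unit-testable Ruby object can emit one fragment of CircleCI YAML to be merged into the final .circleci/config.yml:

require "yaml"

# A tiny value object that knows how to describe one CircleCI job as a hash.
# Because it is plain Ruby, it can be unit tested without ever running CI.
class LintJob
  def initialize(project_name)
    @project_name = project_name
  end

  def to_h
    {
      "#{@project_name}_lint" => {
        "docker" => [{ "image" => "circleci/ruby:2.6" }],
        "steps" => ["checkout", { "run" => "bundle exec rubocop" }],
      },
    }
  end
end

# A unit test can assert on the generated YAML directly, for example:
#   expect(YAML.dump(LintJob.new("coach").to_h)).to include("bundle exec rubocop")
puts YAML.dump(LintJob.new("coach").to_h)

Generating the whole file is then a matter of merging many such fragments and dumping the result, which is what makes regenerating it both idempotent and trustworthy.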
Onboarding a new project One of the main reasons for standardizing the interface and automating the configuration generation was to onboard new applications more quickly. To set up a new app all you need to do is be in the directory for your project and then run coach create project --type $project_type. -> % coach create project --type ruby_app 'coach.yml' configuration file added -- update it based on your project's needs When you run that, the CLI creates the small coach.yml configuration definition file discussed earlier. Here’s what an example Ruby app’s coach.yml looks like: https://gist.github.com/agirlnamedsophia/2f966ab69ba1c7895ce312aec511aa6b The CLI will refer back to a project’s coach.yml to decide what kind of CircleCI DSL needs to be written to the .circleci/config.yml file to wire up the right jobs to run at the right time. Though our contract with projects of different types is standardized, we permit some level of customization. The coach.yml file allows our users to define certain characteristics of their CI flow that vary and require more domain knowledge about a specific project: like the level of test parallelism their application test suite requires, or the list of databases required for tests to run, or an attribute composed of a matrix of Ruby versions and Gemfiles to run the whole test suite against. Using this declarative configuration is more extensible and more user friendly and doesn’t break the contract we’ve put in place for projects that use our CI platform. Contributing to CI Before, if you wanted to add an additional linter or CI tool to our pipeline, it would require adding a few lines of untested bash code to an existing Jenkins job, or adding a new job to a precarious graph of jobs, and crossing your fingers that it would “just work.” The addition couldn’t be tested and it was often only available to one project or one repository at a time. It couldn’t scale out to the rest of the org with ease. Now, updating CI requires opening a PR to make the change. We encourage all engineers who want to add to their own CI pipeline to make changes on a branch from our Coach repository, where all the configuration generation magic happens, verify its effectiveness for their use-case, and open a pull request. If it’s a reasonable addition to CI, our thought is that everyone should benefit. By having these changes in version control, each addition to the CI pipeline goes through code review and requires tests be written. We therefore have the added benefit of knowing that updates to CI have been tested and are deemed valid and working before they’re distributed, and we can prevent folks from removing a feature without considering the impact it may have. When a PR is merged, our team takes care of redistributing the new version of the library so engineers can update their configuration. CI is now a mechanism for instantly sharing the benefits of discovery made in isolated exploration, with everyone. Putting it all together Our configuration generator is doing a lot more than just taping together jobs in a workflow — we evaluate dependency graphs and only run certain jobs that have upstream changes or are triggered themselves. We built our Coach CLI into the Docker images we use in CircleCI and so those Coach CLI commands are available to us from inside the .circleci/config.yml file. The CLI handles notifications, artifact generation, and deployment triggers. 
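Before moving on, here is a rough illustration of the declarative side described above: a project's coach.yml might be read into a small configuration object along these lines. The keys shown here (parallelism, databases) are hypothetical stand-ins for the kinds of per-project settings mentioned, not the real schema.

require "yaml"

CoachConfig = Struct.new(:project_type, :parallelism, :databases, keyword_init: true)

# Reads a project-level coach.yml and applies conservative defaults.
def load_coach_config(path = "coach.yml")
  raw = YAML.safe_load(File.read(path))
  CoachConfig.new(
    project_type: raw.fetch("project_type"),
    parallelism: raw.fetch("parallelism", 1),
    databases: raw.fetch("databases", []),
  )
end

# config = load_coach_config
# config.project_type # => "ruby_app", which selects the Ruby conventions described earlier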
As we stated in our requirements for Coach in the first post, we believe there should be one way to test code, and one way to deploy it. To get there we had to make all of our Java apps respond to the same set of commands, and all of our Ruby apps to do the same. Our CLI and the accompanying conventions make that possible. When before it could take weeks of both product engineering and SRE time to set up CI for an application or service within a complex ecosystem of bash scripts and Jenkins jobs and application configuration, now it takes minutes. When before it could take days or weeks to add a new step to a CI pipeline, now it takes hours of simple code review. We think engineers should focus on what they care about the most, shipping great features quickly and reliably. And we think we made it a little easier for them (and us) to do just that. What’s Next? Now that we’ve wrangled our CI process and encoded the best practices into a tool, we’re ready to tackle our Continuous Deployment pipeline. We’re excited to see how the model of projects and project types that we built for CI will evolve to help us templatize our Kubernetes deployments. Stay tuned. 11 min read * CI/CD: SHORTENING THE FEEDBACK LOOP CI/CD: Shortening the Feedback Loop As we improve and scale our CD platform, shortening the feedback loop with notifications was a small, effective, and important piece. Continuous Delivery (CD) at scale is hard to get right. At Betterment, we define CD as the process of making every small change to our system shippable as soon as it’s been built and tested. It’s part of the CI/CD (continuous integration and continuous delivery) process. We’ve been doing CD at Betterment for a long time, but it had grown to be quite a cumbersome process over the last few years because our infrastructure and tools hadn’t evolved to meet the needs of our growing engineering team. We reinvented our Site Reliability Engineering (SRE) team last fall with our sights set on building software to help developers move faster, be happier, and feel empowered. The focus of our work has been on delivering a platform as a service to make sense of the complex process of CD. Coach is the beginning of that platform. Think of something like Heroku, but for engineers here at Betterment. We wanted to build a thoughtfully composed platform based on the tried and true principles of 12-factor apps. In order to build this, we needed to do two overhauls: 1) Build a new CI pipeline and 2) Build a new CD pipeline. Continuous Integration — Our Principles For years, we used Jenkins, an open-source tool for automation, and a mess of scripts to provide CI/CD to our engineers. Jenkins is a powerful tool and well-used in the industry, but we decided to cut it because the way that we were using it was wrong, we weren’t pleased with its feature set, and there was too much technical debt to overcome. Tests were flakey and we didn’t know if it was our Jenkins setup, the tests themselves, or both. Dozens of engineers contribute to our biggest repository every day and as the code base and engineering team have grown, the complexity of our CI story has increased and our existing pipeline couldn’t keep up. There were task forces cobbled together to drive up reliability of the test suite, to stamp out flakes, to rewrite, and to refactor. This put a band-aid on the problem for a short while. It wasn’t enough. 
We decided to start fresh with CircleCI, an alternative to Jenkins that comes with a lot more opinions, far fewer rough edges, and a lot more stability built in. We built a tool (Coach) to make the way that we build and test code conventional across all of our apps, regardless of language, application owner, or business unit. As an added bonus, since our CI process itself was defined in code, if we ever need to switch platforms again, it will be much easier. Coach was designed and built with these principles: Standardize the pipeline — there should be one way to test code, and one way to deploy it. Test code often — code should be tested as often as it's committed. Build artifacts often — code should be built as often as it's tested so that it can be deployed at any time. Be environment agnostic — artifacts should be built in an environment-agnostic way with maximum portability. Give consistent feedback — the CI output should be consistent no matter the language runtime. Shorten the feedback loop — engineers should receive actionable feedback as soon as possible. Standardizing CI was critical to our growth as an organization for a number of reasons. It ensures that new features can be shipped more quickly, it allows new services to adopt our standardized CI strategy with ease, and it lets us recover faster in the face of disaster, like a hurricane causing a power outage at one of our data centers. Our goal was to replace the old way of building and testing our applications (what we called the "Old World") and start fresh with these principles in mind (what we deemed the "New World"). Using our new platform to build and test code would allow our engineers to receive automated feedback sooner so they could iterate faster. One of our primary aims in building this platform was to increase developer velocity, so we needed to eliminate any friction from commit to deploy. Friction here refers to ambiguity of CI results and the uncertainty of knowing where your code is in the CI/CD process. Shortening the feedback loop was one of the first steps we took in building out our new platform, and we're excited to share the story of how we designed that solution. Our Principles in Action: Shortening the Feedback Loop The feedback loop in the Old World run by Jenkins was one of the biggest hurdles to overcome. Engineers never really knew where their code was in the pipeline. We use Slack, like a lot of other companies, so that part of the messaging story wouldn't change, but there were bugs we needed to fix and design flaws we needed to address. How much feedback should we give? When do we want to give feedback? How detailed should our messages be? These were some of the questions we asked ourselves during this part of the design phase. What our Engineers Needed For pull requests, developers would commit code and push it up to GitHub and then eventually they would receive a Slack message that said "BAD" for every test suite that failed, or "GOOD" if everything passed, or nothing at all in the case of a Jenkins agent getting stuck and hanging forever. The notifications were slightly more nuanced than good/bad, but you get the idea. We valued sending Slack messages to our engineers, as that's how the company communicates most effectively, but we didn't like the rate of communication or the content of those messages. We knew both of those would need to change. 
As for merges into master, the way we sent Slack messages to communicate to engineering teams (as opposed to just individuals) was limited because of how our CI/CD process was constructed. The entire CI and CD process happened as a series of interwoven Jenkins freestyle jobs. We never got the logic quite right around determining whose code was being deployed — the deploy logic was contingent on a pretty rough shell script called inside a Jenkins job. The best we had was a Slack message that was sent roughly five minutes before a deploy began, tagging a good estimation of contributors but often missing someone if their GitHub email address was different from their Slack email address. More critically, the one-off script solution wasn't stored in source control, and therefore it wasn't tested. We had no idea when it failed or missed tagging some contributors. We liked notifying engineers when a deploy began, but we needed to be more accurate about who we were notifying. What our SRE Team Needed Our design and UX were informed by what our engineers using our platform needed, but Coach was built based on our needs. What did we need? Well-tested code stored in version control that could easily be changed and developed. All of the code that handles changesets and messaging logic in the New World is written in one central location, and it's tested in isolation. Our CI/CD process invokes this code when it needs to, and it works great. We can be confident that the right people are notified at the right time because we wrote code that does that and we tested it. It's no longer just a script that sometimes works and sometimes doesn't. Because it's in source control and it runs through its own CI process, we can also easily roll out changes to notifications without breaking things. We wanted to build our platform around what our engineers would need to know, when they need to know it, and how often. And so one of the first components we built out was this new communication pipeline. Next we'll explore in more detail some of our design choices regarding the content of our messages and the rate at which we send them. Make sure our engineers don't mute their Slack notifications In leaving the Old World of inconsistent and contextually sparse communication we looked at our blank canvas and initially thought "every time the tests pass, send a notification! That will reduce friction!" So we tried that. If we merged code into a tracked branch — a branch that multiple engineers contribute to, like master — for one of our biggest repos, which contained 20 apps and 20 test suites, we would be notified at every transition: every Rubocop failure, every flakey occurrence of a feature test. We quickly realized it was too much. We sat back and thought really hard about what we would want, considering we were dogfooding our own pipeline. How often did we want to be notified by the notification system when our tests that tested the code that built the notification system succeeded? Sheesh, that's a mouthful. Our Slack bot could barely keep up! We decided it was necessary to be told only once when everything ran successfully. However, for failures, we didn't want to sit around for five minutes crossing our fingers hoping that everything was successful only to be told that we could have known three minutes earlier that we'd forgotten a newline at the end of one of our files. 
Additionally, in CircleCI, where we can easily parallelize our test suites, we realized we wouldn't want to notify someone for every chunk of the test suite that failed, just the first time a failure happened for the suite. We came up with a few rules to design this part of the system: Let the author know as soon as possible when something is red, but don't overdo it for redundant failures within the same job (e.g. if unit tests ran on 20 containers and 18 of them saw failures, only notify once). Only notify once about all the green things. Give as much context as possible without being overwhelming: be concise but clear. Next we'll explore the changes we made in content. What to say when things fail This is what engineers would see in the Old World when tests failed for an open pull request: Among other deficiencies, there's only one link and it takes us to a Jenkins job. There's no context to orient us quickly to what the notification is for. After considering what we were currently sending our engineers, we realized that 1) context and 2) status were the most important things to communicate, which were the aspects of our old messaging that were suffering the most. Here's what we came up with: Thanks Coach bot! Right away we know what's happened. A PR build failed. It failed for a specific GitHub branch ("what-to-say-when-things-fail-branch"), in a specific repo ("Betterment/coach"), for a specific PR (#430), for a specific job in the test suite ("coach_cli — lint (Gemfile)"). We can click on any of these links and know exactly where they go based on the logo of the service. Messages about failures are now actionable and full of context, prompting the engineer to participate in CI, to go directly to their failures or to their PR. And this bounty of information helps a lot if the engineer has multiple PRs open and needs to quickly switch context. The messaging that happened for failures when you merged a pull request into master was a little different in that it included mentions for the relevant contributors (maybe all of them, if we were lucky!): The New World is cleaner, easier to grok, and more immediately helpful: The link title to GitHub is the commit diff itself, and it takes you to the compare URL for that changeset. The CircleCI info includes the title of the job that failed ("coach_cli — lint (Gemfile)"), the build number ("#11389") to reference for context in case there are multiple occurrences of the failure in multiple workflows, a link to the top-level "Workflow", and @s for each contributor. What to say when things succeed We didn't change the frequency of messaging for success — we got that right the first time around. You got one notification message when everything succeeded and you still do. But in the Old World there wasn't enough context to make the message immediately useful. Another disappointment we had with the old messaging was that it didn't make us feel very good when our tests passed. It was just a moment in time that came and went: In the New World we wanted to proclaim loudly (or as loudly as you can proclaim in a Slack message) that the pull request was successful in CI: Tada! We did it! We wanted to maintain the same format as the new failure messages for consistency and ease of reading. The links to the various services we use are in the same order as our new failure messages, but the link to CircleCI only goes to the workflow that shows the graph of all the tests and jobs that ran. It's delightful and easy to parse and has just the right amount of information. 
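The notification rules above are easy to model in code. A toy sketch (not the real Coach notifier) that captures the deduplication behavior might look like this:

# Tracks which jobs have already produced a failure notification so that
# redundant failures (e.g. 18 of 20 parallel containers failing) stay quiet,
# and success is announced exactly once, only if nothing failed.
class NotificationPolicy
  def initialize
    @notified_jobs = {}
  end

  # Called for every failed chunk of a job; returns true only the first time.
  def notify_failure?(job_name)
    return false if @notified_jobs[job_name]

    @notified_jobs[job_name] = true
  end

  # Called once, when the whole workflow finishes.
  def notify_success?
    @notified_jobs.empty?
  end
end

policy = NotificationPolicy.new
policy.notify_failure?("coach_cli - lint (Gemfile)") # => true, ping the author
policy.notify_failure?("coach_cli - lint (Gemfile)") # => false, stay quiet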
What's next? We have big dreams for the future of this platform with more and more engineers using our product. Shortening the feedback loop with notifications is only one small, but rather important, part of our CD platform. In the next post of this series on CD, we'll explore how we committed 5000-line configuration files to our repositories with confidence by standardizing CI for different runtimes, automating config generation in code, and testing that code generation. We believe in a world where shipping code, even in really large codebases with lots of contributors, should be done dozens of times a day. Where engineers can experience feedback about their code with delight and simplicity. We're building that at Betterment. 12 min read * SHH… IT'S A SECRET: MANAGING SECRETS AT BETTERMENT Shh… It's a Secret: Managing Secrets at Betterment Opinionated secrets management that helps us sleep at night. Secrets management is one of those things that is talked about quite frequently, but there seems to be little consensus on how to actually go about it. In order to understand our journey, we first have to establish what secrets management means (and doesn't mean) to us. What is Secrets Management? Secrets management is the process of ensuring passwords, API keys, certificates, etc. are kept secure at every stage of the software development lifecycle. Secrets management does NOT mean attempting to write our own crypto libraries or cipher algorithms. Rolling your own crypto isn't a great idea. Suffice it to say, crypto will not be the focus of this post. There's such a wide spectrum of secrets management implementations out there, ranging from powerful solutions that require a significant amount of operational overhead, like HashiCorp Vault, to solutions that require little to no operational overhead, like a .env file. No matter where they fall on that spectrum, each of these solutions has tradeoffs in its approach. Understanding these tradeoffs is what helped our Engineering team at Betterment decide on a solution that made the most sense for our applications. In this post, we'll be sharing that journey. How it used to work We started out using Ansible Vault. One thing we liked about Ansible Vault is that it allows you to encrypt a whole file or just a string. We valued the ability to encrypt just the secret values themselves and leave the variable name in plain-text. We believe this is important so that we can quickly tell which secrets an app is dependent on just by opening the file. So the string option was appealing to us, but that workflow didn't have the best editing experience, as it required multiple steps in order to encrypt a value, insert it into the correct file, and then export it into the environment like the 12-factor app methodology tells us we should. At the time, we also couldn't find a way to federate permissions with Ansible Vault in a way that didn't hinder our workflow by causing a bottleneck for developers. To assist us in expediting this workflow, we had an alias in our bash_profiles that allowed us to run a shortcut at the command line to encrypt the secret value from our clipboard and then insert that secret value in the appropriate Ansible variables file for the appropriate environment. alias prod-encrypt="pbpaste | ansible-vault encrypt_string --vault-password-file=~/ansible-vault/production.key" This wasn't the worst setup, but it didn't scale well as we grew. 
As we created more applications and hired more engineers, this workflow became a bit much for our small SRE team to manage and introduced some key-person risk, also known as the Bus Factor. We needed a workflow with less of a bottleneck, but allowing every developer access to all the secrets across the organization was not an acceptable answer. We needed a solution that not only maintained our security posture throughout the software development lifecycle, but also enforced our opinions about how secrets should be managed across environments. Decisions, decisions… While researching our options, we happened upon a tool called sops. Maintained and open-sourced by Mozilla, sops is a command line utility written in Go that facilitates slick encryption and decryption workflows by using your terminal's default editor. Sops encrypts and decrypts your secret values using your cloud provider's Key Management Service (AWS KMS, GCP KMS, Azure Key Vault) and PGP as a backup in the event those services are not available. It leaves the variable name in plain-text while only encrypting the secret value itself, and it supports YAML, JSON, or binary format. We use the YAML format because of its readability and terseness. See a demo of how it works. We think this tool works well with the way we think about secrets management. Secrets are code. Code defines how your application behaves. Secrets also define how your application behaves. So if you can encrypt them safely, you can ship your secrets with your code and have a single change management workflow. GitHub pull request reviews do software change management right. YAML does human-readable key/value storage right. AWS KMS does anchored encryption right. AWS Regions do resilience right. PGP does irreversible encryption better than anything else readily available and is broadly supported. In sops, we've found a tool that combines all of these things, enabling a workflow that makes secrets management easier. Who's allowed to do what? Sops is a great tool by itself, but operations security is hard. Key handling and authorization policy design is tricky to get right and sops doesn't do it all for us. To help us with that, we took things a step further and wrote a wrapper around sops we call sopsorific. Sopsorific, also written in Go, makes a few assumptions about application environments. Most teams need to deploy to multiple environments: production, staging, feature branches, sales demos, etc. Sopsorific uses the term "ecosystem" to describe this concept, as well as to collectively describe a suite of apps that make up a working Betterment system. Some ecosystems are ephemeral and some are durable, but there is only one true production ecosystem holding sensitive PII (Personally Identifiable Information), and that ecosystem must be held to a higher standard of access control than all others. To capture that idea, we introduced a concept we call "security zones" into sopsorific. There are only two security zones per GitHub repository — sensitive and non-sensitive — even if there are multiple apps in a repository. In the case of mono-repos, if an app in that repository shouldn't have its secrets visible to all engineers who work in that repository, then the app belongs in a different repository. With sopsorific, secrets for the non-sensitive zone can be made accessible to a broader subset of the app team than sensitive-zone secrets, helping to eliminate some of the bottleneck issues we experienced with our previous workflow. 
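The property that makes this workflow pleasant is that an encrypted file can be decrypted on demand and handed to a process as ordinary environment variables. Sopsorific itself is written in Go, but a minimal Ruby sketch of that idea (using sops' -d flag; the file path and command below are placeholders) captures what the sopsorific run command described later in this post does conceptually:

require "yaml"
require "open3"

# Decrypts a sops-managed YAML file and runs a command with those key/value
# pairs added to the child process's environment only.
def run_with_secrets(secrets_file, *command)
  decrypted, status = Open3.capture2("sops", "-d", secrets_file)
  raise "unable to decrypt #{secrets_file}" unless status.success?

  secrets = YAML.safe_load(decrypted).transform_values(&:to_s)
  # The decrypted values are visible only to the spawned process, not system-wide.
  system(secrets, *command)
end

# run_with_secrets("deployment_secrets/nonsensitive/default.yml", "bundle", "exec", "puma")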
By default, sopsorific wants to be configured with a production (sensitive zone) secrets file and a default (non-sensitive zone) secrets file. The default file makes it easy to spin up new non-sensitive one-off ecosystems without having to redefine every secret in every ecosystem. It should "just work" unless there are secrets that have different values than already configured in the default file. In that case, we would just need to define the secrets that have different values in a separate secrets file like devin_test.yml below, where devin_test is the name of the ecosystem. Here's an example of the basic directory structure:

.sops.yaml
app/
|_ deployment_secrets/
   |_ sensitive/
      |_ production.yml
   |_ nonsensitive/
      |_ default.yml
      |_ devin_test.yml

The security zone concept allows a more granular access control policy, as we can federate decrypt permissions on a per-application and per-security-zone basis by granting or revoking access to KMS keys with AWS Identity and Access Management (IAM) roles. Sopsorific bootstraps these KMS keys and IAM roles for a given application. It generates a secret-editor role that privileged humans can assume to manage the secrets and an application role for the application to assume at runtime to decrypt the secrets. Following the principle of least privilege, our engineering team leads are app owners of the specific applications they maintain. App owners have permissions to assume the secret-editor role for sensitive ecosystems of their specific application. Non-app owners have the ability to assume the secret-editor role for non-sensitive ecosystems only. How it works now Now that we know who can do what, let's talk about how they can do what they can do. Explaining how we use sopsorific is best done by exploring how our secrets management workflow plays out for each stage of the software development lifecycle. Development Engineers have permissions to assume the secret-editor role for the security zones they have access to. Secret-editor roles are named after their corresponding IAM role, which includes the security zone and the name of the GitHub repository. For example, secret_editor_sensitive_coach, where coach is the name of the repository. Engineers use a little command line utility to assume the role and are dropped into a secret-editor session, where they use sops to add or edit secrets with their editor in the same way they add or edit code in a feature branch. assuming a secret-editor role The sops command will open and decrypt the secrets in their editor and, if changed, encrypt them and save them back to the file's original location. All of these steps, apart from the editing, are transparent to the engineer editing the secret. Any changes are then reviewed in a pull request along with the rest of the code. Editing a file is as simple as: sops deployment_secrets/sensitive/production.yml Testing We built a series of validations into sopsorific to further enforce our opinions about secrets management. Some of these are: Secrets are unguessable — Short strings like "password" are not really secrets, and this check enforces strings that have at least 128 bits of entropy, expressed in unpadded base64. Each ecosystem defines a comprehensive set of secrets — The 12-factor app methodology reminds us that all environments should resemble production as closely as possible. When a secret is added to production, we have a check that makes sure that same secret is also added to all other ecosystems so that they continue to function properly. 
All crypto keys match — There are checks to ensure the multi-region KMS key ARNs and backup PGP key fingerprint in the sops config file match the intended security zones. These validations are run as a step in our Continuous Integration suite. Running these checks is a completely offline operation and doesn't require access to the KMS keys, making it trivially secure. Developers can also run these validations locally: sopsorific check Deployment The application server is configured with the instance profile generated by sopsorific so that it can assume the IAM role that it needs to decrypt the secrets at runtime. Then, we configure our init system, upstart, to execute the process wrapped in the sopsorific run command. sopsorific run is another custom command we built to make our usage of sops seamless. When the app starts up, the decrypted secrets will be available as environment variables only to the process running the application instead of being available system-wide. This makes our secrets less likely to unintentionally leak and our security team a little happier. Here's a simplified version of our upstart configuration:

start on starting web-app
stop on stopping web-app
respawn
exec su -s /bin/bash -l -c '\
  cd /var/www/web-app; \
  exec "$0" "$@"' web-app-owner -- sopsorific run 'bundle exec puma -C config/puma.rb' >> /var/log/upstart.log 2>&1

Operations The 12-factor app methodology reminds us that sometimes developers need to be able to run one-off admin tasks by starting up a console on a live running server. This can be accomplished by establishing a secure session on the server and running what you would normally run to get a console with the sopsorific run command. For our Ruby on Rails apps, that looks like this: sopsorific run 'bundle exec rails c' What did we learn? We learned many things along the way. One of these things was that having an opinionated tool to help us manage secrets helped to make sure we didn't accidentally leave around low-entropy secrets from when we were developing or testing out a feature. Having a tool to protect ourselves from ourselves is vital to our workflow. Another thing we learned was that some vendors provide secrets with lower entropy than we'd like for API tokens or access keys, and they don't provide the option to choose stronger secrets. As a result, we had to build features into sopsorific to allow vendor-provided secrets that didn't meet the sopsorific standards by default to be accepted by sopsorific's checks. In the process of adopting sops and building sopsorific, we discovered the welcoming community and thoughtful maintainers of sops. We had the pleasure of contributing a few changes to sops, and that left us feeling like we left the community a little bit better than we found it. In doing all of these things, we've reduced bottlenecks for developers so they can focus more on shipping features and less on managing secrets. 11 min read * HOW WE DEVELOP DESIGN COMPONENTS IN RAILS How We Develop Design Components in Rails Learn how we use Rails components to keep our code D.R.Y. (Don't Repeat Yourself) and to implement UX design changes effectively and uniformly. A little over a year ago, we rebranded our entire site, and we've even written about why we did it. We were able to achieve a polished and consistent visual identity under a tight deadline, which was pretty great, but when we had our project retrospective, we realized there was a pain point that still loomed over us. 
We still lacked a good way to share markup across all our apps. We repeated multiple styles and page elements throughout the app to make the experience consistent, but we didn't have a great way to reuse the common elements. We used Rails partials in an effort to keep the code DRY (Don't Repeat Yourself) while sharing the same chunks of code, and that got us pretty far, but it had its limitations. There were aspects of the page elements (our shared chunks) that needed to change based on their context or the page where they were being rendered. Since these contexts change, we found ourselves either altering the partials or copying and pasting their code into new views where additional context-specific code could be added. This resulted in app code (the content-specific code) becoming entangled with "system" (the base HTML) code. Aside from partials, there was corresponding styling, or CSS, that was being copied and sometimes changed when these shared partials were altered. This meant when the designs were changed, we needed to find all of the places this code was used to update it. Not only was this frustrating, but it was inefficient. To find a solution, we drew inspiration from the component approach used by modern design systems and JavaScript frameworks. A component is a reusable code building block. Pages are built from a collection of components that are shared across pages, but can be expanded upon or manipulated in the context of the page they're on. To implement our component system, we created our internal gem, Style Closet. There are a few other advantages to this system, and problems it solves, too: We're able to make global changes in a pretty painless way. If we need to change our brand colors, let's say, we can just change the CSS in Style Closet instead of scraping our codebase and making sure we catch it everywhere. Reusable parts of code remove the burden from engineers for things like CSS and allow time to focus on and tackle other problems. Engineers and designers can be confident they're using something that's been tested and validated across browsers. We're able to write tests specific to the component without worrying about the use-case or increasing testing time for our apps. Every component is on brand and consistent with every other app, feels polished and high quality, and requires less effort to implement. It allows room for future growth, which will inevitably happen. The need for new elements in our views is not going to simply vanish because we rebranded, so this makes us more prepared for the future. How does it work? Below is an example of one of our components, the flash. A flash message/warning is something you may use throughout your app in different colors and with different text, but you want it to look consistent. In our view, or the page where we write our HTML, we would write the following to render what you see above: Here's a breakdown of how that one line translates into what you see on the page. The component consists of 3 parts: structure, behavior, and appearance. The view (the structure): a familiar html.erb file that looks very similar to what would exist without a component but a little more flexible, since it doesn't have its content hard-coded in. These views can also leverage Rails' view yield functionality when needed. Here's the view partial from Style Closet: You can see how the component.message is passed into the dedicated space/slot, keeping this code flexible for reuse. 
A Ruby class (the behavior, aside from any JavaScript): the class holds the "props" the component allows to be passed in as well as any methods needed for the view, similar to a presenter model. The props are a fancier attr_accessor with the bonus of being able to assign defaults. Additionally, all components can take a block, which is typically the content for the component. This allows the view to be reusable. CSS (the appearance): In this example, we use it to set things like the color, alignment and the border. A note on behavior: Currently, if we need to add some JS behavior, we use unobtrusive JavaScript or UJS sprinkles. When we add new components or make changes, we update the gem (as well as the docs site associated with Style Closet) and simply release the new version. As we develop and experiment with new types of components, we test these bigger changes out in the real world by putting them behind a feature flag using our open source split testing framework, TestTrack. What does the future hold? We've used UJS sprinkles in a similar fashion to the rest of the Rails world over the years, but that has its limitations as we begin to design more complex behaviors and elements of our apps. Currently we're focusing on building more intricate and interactive components using React. A bonus of Style Closet is how well it's able to host these React components, since they can simply be incorporated into a view by being wrapped in a Style Closet component. This allows us to continue composing a UI with self-contained building blocks. We're always iterating on our solutions, so if you're interested in expanding on or solving these types of problems with us, check out our career page! Additional information Since we introduced our internal Rails component code, a fantastic open-source project emerged, Komponent, as well as a really great and in-depth blog post on component systems in Rails from Evil Martians. 6 min read * ENGINEERING THE LAUNCH OF A NEW BRAND FOR BETTERMENT Engineering the Launch of a New Brand for Betterment In 2017, Betterment set out to launch a new brand to better define the voice and feel of our product. After months of planning across all teams at the company, it was time for our engineering team to implement new and responsive designs across all user experiences. The key to the success of this project was to keep the build simple, maintain a low risk of regressions, and ensure a clear path to remove the legacy brand code after launch. Our team learned a lot, but a few key takeaways come to mind. Relieving Launch Day Stress with Feature Flags Embarking on this rebrand project, we wanted to keep our designs under wraps until launch day. This would entail a lot of code changes; however, as an engineering team, we believe deeply in carving up big endeavors into small pieces. We're constantly shipping small, vertical slices of work hidden behind feature flags, and we've even built our own open-source system, TestTrack, to help us do so. This project would be no exception. On day one, we created a feature flag and started shipping rebranded code to production. Our team could then use TestTrack's browser plugin to preview and QA the new views along the way. When the day of the big reveal arrived, all that would be left to do was toggle the flag to unveil the code we'd shipped and tested weeks before. We then turned to the challenge of rebranding our entire user experience. 
Isolating New Code with ActionPack Variants ActionPack variants provide an elegant solution to rolling out significant front-end changes. Typically, variants are prescribed to help render distinct views for different device types, but they are equally powerful when rendering distinct HTML/CSS for any significant redesign. We created a variant for our rebrand, which would be exposed based on the status of our new feature flag. Our variant also required a new CSS file, where all our new styles would live. Rails provides rich template resolver logic at every level of the view hierarchy, and we were able to easily hook into it by simply modifying the extensions of our new layout files. The rebranded version of our application's core layout imported the new CSS file and, just like that, we were in business. Implementing the Rebrand without a Spaghetti of "IF" Statements Our rebranded experience would become the default at launch time, so another challenge we faced was maintaining two worlds without creating unneeded complexity. The "rebrand" variant and correlating template file helped us avoid a tangled web of conditionals, and instead boiled down the overhead to a toggle in our ApplicationController. This created a clean separation between the old and new worlds and protected us against regressions between the two. Rebranding a feature involved adding new styles to the application_rebrand.css and implementing them in new rebrand view files. Anything that didn't get a new, rebranded template stayed in the world of plain old production. This freedom from legacy stylesheets and markup was critical to building and clearly demonstrating the new brand and value proposition we wanted to share with the world. De-scoping with a Lightweight Reskin To rebrand hundreds of pages in time, we had to iron out the precise requirements of what it meant for our views to be "on brand". Working with our product team, we determined that the minimum amount of change to consider a page rebranded was adoption of the new header, footer, colors, and fonts. These guidelines constituted our "opted out" experience — views that would receive this lightweight reskin immediately but not the full rebrand treatment. This light coat of paint was applied to our production layer, so any experience that couldn't be fully redesigned within our timeline would still get a fresh header and the fonts and colors that reflected our new brand. As we neared the finish line, the rebranded world became our default and this opt-out world became a variant. A controller-level hook allowed us to distinguish, with a single line of code, which views were to display in opt-out mode: it updated the variant and rendered the new layout files, applying the reskin. Using a separate CSS manifest with the core changes enumerated above, we felt free to dedicate resources to more thoroughly rebranding our high-traffic experiences, deferring improvements to pages that received the initial reskin until after launch. As we've circled back to clean up these lower-traffic views and give them the full rebrand treatment, we've come closer to deleting the opt_out CSS manifest and deprecating our legacy stylesheets for good. Designing an Off Ramp Just as we are committed to rolling out large changes in small portions, we are careful to avoid huge changesets on the other side of a release. Fortunately, variants made removing legacy code quite straightforward. 
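To make the variant mechanics from earlier in this section concrete, here is a simplified sketch of exposing a variant based on a feature flag, assuming a hypothetical rebrand_enabled? helper in place of the real TestTrack-backed check:

class ApplicationController < ActionController::Base
  before_action :set_brand_variant

  private

  # When the flag is on, Rails' template resolver prefers +rebrand templates,
  # e.g. app/views/layouts/application.html+rebrand.erb over application.html.erb.
  def set_brand_variant
    request.variant = :rebrand if rebrand_enabled?
  end

  def rebrand_enabled?
    # Hypothetical feature-flag check; in our case this would be answered by TestTrack.
    true
  end
end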
After flipping the feature flag and establishing “rebrand” as the permanent variant context, all that remained was to destroy the legacy files that were no longer being rendered and remove the variant name from the file extension of the new primary view template. Controllers utilizing the opt_out hook made their way onto a to-do list for this work without the stress of a deadline. The Other Side of the Launch As the big day arrived, we enjoyed a smooth rebrand launch thanks to the thoughtful implementation of our existing tools and techniques. We leveraged ActionPack variants built into Rails and feature flags from TestTrack in new ways, ensuring we didn’t need to make any architecture changes. The end result: a completely fresh set of views and a new brand we’re excited to share with the world at large. 5 min read * REFLECTING ON OUR ENGINEERING APPRENTICESHIP PROGRAM Reflecting on Our Engineering Apprenticeship Program Betterment piloted an Apprentice Program to add junior talent to our engineering organization in 2017, and it couldn’t have been more successful or rewarding for all of us. One year later, we’ve asked them to reflect on their experiences. In Spring of 2017, Betterment’s Diversity & Inclusion Steering Committee partnered with our Engineering Team to bring on two developers with non-traditional backgrounds. We hired Jess Harrelson (Betterment for Advisors Team) and Fidel Severino (Retail Team) for a 90 day Apprentice Program. Following their apprenticeship, they joined us as full-time Junior Engineers. I’m Jess, a recruiter here at Betterment, and I had the immense pleasure of working closely with these two. It’s been an incredible journey, so I sat down with them to hear first hand about their experiences. Tell us a bit about your life before Betterment. Jess Harrelson: I was born and raised in Wyoming and spent a lot of time exploring the outdoors. I moved to Nashville to study songwriting and music business, and started a small label through which I released my band’s album. I moved to New York after getting an opportunity at Sony and worked for a year producing video content. Fidel Severino: I’m originally from the Dominican Republic and moved to the United States at age 15. After graduation from Manhattan Center for Science and Mathematics High School, I completed a semester at Lehman College before unfortunate family circumstances required me to go back to the Dominican Republic. When I returned to the United States, I worked in the retail sector for a few years. While working, I would take any available time for courses on websites like Codecademy and Team Treehouse. Can we talk about why you decided to become an Engineer? Jess Harrelson: Coding became a hobby for me when I would make websites for my bands in Nashville, but after meeting up with more and more people in tech in the city, I knew it was something I wanted to do as a career. I found coding super similar from a composition and structure perspective, which allowed me to tap into the creative side of coding. I started applying to every bootcamp scholarship I could find and received a full scholarship to Flatiron School. I made the jump to start becoming an engineer. Fidel Severino: While working, I would take any available time for courses on websites like Codecademy and Team Treehouse. I have always been interested in technology. I was one of those kids who “broke” their toys in order to find out how they worked. I’ve always had a curious mind. 
My interactions with technology prior to learning about programming had always been as a consumer. I cherished the opportunity and the challenge that comes with building with code. The feeling of solving a bug you've been stuck on for a while is satisfaction at its best. Those bootcamps changed all of our lives! You learned how to be talented, dynamic engineers and we reap the benefit. Let's talk about why you chose Betterment. Jess Harrelson: I first heard of Betterment by attending the Women Who Code — Algorithms meetup hosted at HQ. Paddy, who hosts the meetups, let us know that Betterment was launching an apprenticeship program, and after the meetup I asked how I could get involved and applied for the program. I was also applying for another apprenticeship program, but throughout the transparent, straightforward interview process, the Betterment apprenticeship quickly became my first choice. Fidel Severino: The opportunity to join Betterment's Apprenticeship program came via the Flatiron School. One of the main reasons I was ecstatic to join Betterment was how I felt throughout the recruiting process. At no point did I feel the pressure that's normally associated with landing a job. Keep in mind, this was an opportunity unlike any other I had up to this point in my life, but once I got to talking with the interviewers, the conversation just flowed. The way the final interview was set up made me rave about it to pretty much everyone I knew. Here was a company that wasn't solely focused on the traditional Computer Science education when hiring an apprentice/junior engineer. The interview was centered around how well you communicate, work with others, and problem solve. I had a blast pair programming with 3 engineers, who I'm glad to say are now my co-workers! We are so lucky to have you! What would you say has been the most rewarding part of your experience so far? Jess Harrelson: The direct mentorship during my apprenticeship and exposure to a large production codebase. Prior to Betterment, I only had experience with super small codebases that I built myself or with friends. Working with Betterment's applications gave me a hands-on understanding of concepts that are hard to reproduce on a smaller, personal application level. Being surrounded by a bunch of smart, helpful people has also been super amazing and helped me grow as an engineer. Fidel Severino: Oh man! There's so many things I would love to list here. However, you asked for the most rewarding, and I would have to say without a doubt — the mentorship. As someone with only self-taught and Bootcamp experience, I didn't know how much I didn't know. I had two exceptional mentors who went above and beyond and removed any blocks preventing me from accomplishing tasks. On a related note, the entire company has a collaborative culture that is contagious. You want to help others whenever you can, and it has been the case that I've received plenty of help from others who aren't even directly on my team. What's kept you here? Fidel Severino: The people. The collaborative environment. The culture of learning. The unlimited supply of iced coffee. Great office dogs. All of the above! Jess Harrelson: Seriously though, it was the combination of all that plus so many other things. Getting to work with talented, smart people who want to make a difference. This article is part of Engineering at Betterment. 
6 min read * A JOURNEY TO TRULY SAFE HTML RENDERING A Journey to Truly Safe HTML Rendering We leverage Rubocop's OutputSafety check to ensure we're being diligent about safe HTML rendering, so when we found vulnerabilities, we fixed them. As developers of financial software on the web, one of our biggest responsibilities is to keep our applications secure. One area we need to be conscious of is how we render HTML. If we don't escape content properly, we could open ourselves and our customers up to security risks. We take this seriously at Betterment, so we use tools like Rubocop, the Ruby static analysis tool, to keep us on the right track. When we found that Rubocop's OutputSafety check had some holes, we plugged them. What does it mean to escape content? Escaping content simply means replacing special characters with entities so that HTML understands to print those characters rather than act upon their special meanings. For example, the < character is escaped using &lt;, the > character is escaped using &gt;, and the & character is escaped using &amp;. What could happen if we don't escape content? We escape content primarily to avoid opening ourselves up to XSS (cross-site scripting) attacks. If we were to inject user-provided content onto a page without escaping it, we'd be vulnerable to executing malicious code in the user's browser, allowing an attacker full control over a customer's session. This resource is helpful to learn more about XSS. Rails makes escaping content easier Rails escapes content by default in some scenarios, including when tag helpers are used. In addition, Rails has a few methods that provide help in escaping content. safe_join escapes the content and returns a SafeBuffer (a String flagged as safe) containing it. On the other hand, some methods are just a means for us to mark content as already safe. For example, the <%== interpolation token renders content as is, and raw, html_safe, and safe_concat simply return a SafeBuffer containing the original content as is, which poses a security risk. If content is inside a SafeBuffer, Rails won't try to escape it upon rendering. Some examples:

html_safe:
[1] pry(main)> include ActionView::Helpers::OutputSafetyHelper
=> Object
[2] pry(main)> result = "hi".html_safe
=> "hi"
[3] pry(main)> result.class
=> ActiveSupport::SafeBuffer

raw:
[1] pry(main)> result = raw("hi")
=> "hi"
[2] pry(main)> result.class
=> ActiveSupport::SafeBuffer

safe_concat:
[1] pry(main)> include ActionView::Helpers::TextHelper
=> Object
[2] pry(main)> buffer1 = "hi".html_safe
=> "hi"
[3] pry(main)> result = buffer1.safe_concat("bye")
=> "hibye"
[4] pry(main)> result.class
=> ActiveSupport::SafeBuffer

safe_join:
[1] pry(main)> include ActionView::Helpers::OutputSafetyHelper
=> Object
[2] pry(main)> result = safe_join(["<p>hi</p>", "<p>bye</p>"])
=> "&lt;p&gt;hi&lt;/p&gt;&lt;p&gt;bye&lt;/p&gt;"
[3] pry(main)> result.class
=> ActiveSupport::SafeBuffer

Rubocop: we're safe! As demonstrated, Rails provides some methods that mark content as safe without escaping it for us. Rubocop, a popular Ruby static analysis tool, provides a cop (which is what Rubocop calls a "check") to alert us when we're using these methods: Rails/OutputSafety. At Betterment, we explicitly enable this cop in our Rubocop configurations, so if a developer wants to mark content as safe, they will need to explicitly disable the cop. This forces extra thought and extra conversation in code review to ensure that the usage is in fact safe. 
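To make that workflow tangible, here is an illustrative helper (not from the original post) showing both sides of the trade-off: escaped-by-default rendering built from content_tag and safe_join, and an html_safe call that has to carry an explicit, reviewable rubocop:disable comment:

require "action_view"

class DisclosureHelper
  include ActionView::Helpers::TagHelper
  include ActionView::Helpers::OutputSafetyHelper

  # User-provided content: escaped by default, so "<script>" renders as text.
  def user_comments_html(comments)
    safe_join(comments.map { |comment| content_tag(:p, comment) })
  end

  # Content we control end to end; marking it safe forces a visible cop disable,
  # which prompts the extra conversation in code review described above.
  def trusted_footer_html(footer_markup)
    footer_markup.html_safe # rubocop:disable Rails/OutputSafety
  end
end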
… Almost We were thrilled about the introduction of this cop — we had actually written custom cops prior to its introduction to protect us against using the methods that don't escape content. However, we realized there were some issues with the opinions the cop held about some of these methods. The first of these issues was that the cop allowed usage of raw and html_safe when the usages were wrapped in safe_join. The problem with this is that when raw or html_safe are used to mark content as already safe by putting it in a SafeBuffer as is, safe_join will not actually do anything additional to escape the content. This means that these usages of raw and html_safe should still be violations. The second of these issues was that the cop prevented usages of raw and html_safe, but did not prevent usages of safe_concat. safe_concat has the same functionality as raw and html_safe — it simply marks the content safe as is by returning it in a SafeBuffer. Therefore, the cop should hold the same opinions about safe_concat as it does about the other two methods. So, we fixed it Rather than continue to use our custom cops, we decided to give back to the community and fix the issues we had found with the Rails/OutputSafety cop. We began with this pull request to patch the first issue — change the behavior of the cop to recognize raw and html_safe as violations regardless of being wrapped in safe_join. We found the Rubocop community to be welcoming, making only minor suggestions before merging our contribution. We followed up shortly after with a pull request to patch the second issue — change the behavior of the cop to disallow usages of safe_concat. This contribution was merged as well. Contributing to Rubocop was such a nice experience that when we later found that we'd like to add a configuration option to an unrelated cop, we felt great about opening a pull request to do so, which was merged as well. And here we are! Our engineering team here at Betterment takes security seriously. We leverage tools like Rubocop and Brakeman, a static analysis tool specifically focused on security, to make our software safe by default against many of the most common security errors, even for code we haven't written yet. We now rely on Rubocop's Rails/OutputSafety cop (instead of our custom cop) to help ensure that our team is making good decisions about escaping HTML content. Along the way, we were able to contribute back to a great community. This article is part of Engineering at Betterment. 5 min read * BUILDING BETTER SOFTWARE FASTER WITH SHARED PRINCIPLES Building Better Software Faster with Shared Principles Betterment's playbook for extending the golden hour of startup innovation at scale. Betterment's promise to customers rests on our ability to execute. To fulfill that promise, we need to deliver the best product and tools available and then improve them indefinitely, which, when you think about it, sounds incredibly ambitious or even foolhardy. For a problem space as large as ours, we can't fulfill that promise with a single two-pizza team. But a scaled engineering org presents other challenges that could just as easily put the goal out of reach. 
Centralizing architectural decision-making would kill ownership and autonomy, and ensure your best people leave or never join in the first place. On the other hand, shared-nothing teams can lead to information silos, wheel-reinventing, and integration nightmares when an initiative is too big for a squad to deliver alone. To meet those challenges, we believe it’s essential to share more than languages, libraries, and context-free best practices. We can collectively build and share a body of interrelated principles driven by insights that our industry as a whole hasn’t yet realized or is just beginning to understand. Those principles can form chains of reasoning that allow us to run fearlessly, in parallel, and arrive at coherent solutions better than the sum of their parts. I gave a talk about Betterment’s engineering principles at a Rails at Scale meetup earlier last year and promised to share them after our diligent legal team finished reviewing. (Legal helpfully reviewed these principles months ago, but then I had my first child, and, as you can imagine, priorities shifted.) Without any further ado, here are Betterment’s Engineering Principles. You can also watch my Rails at Scale talk to learn why we developed them and how we maintain them. Parting Thoughts on Our Principles Our principles aren’t permanent as-written. Our principles are a living document in an actual git repository that we’ll continue to add to and revise as we learn and grow. Our principles derive from and are matched to Betterment’s collective experience and context. We don’t expect these principles to appeal to everybody. But we do believe strongly that there’s more to agree about than our industry has been able to establish so far. Consider these principles, along with our current and future open source work, part of our contribution to that conversation. What principles does your team share?

3 min read * SUPPORTING FACE ID ON THE IPHONE X Supporting Face ID on the iPhone X We look at how Betterment’s mobile engineering team developed Face ID support for the latest phones, like the iPhone X. Helping people do what’s best with their money requires providing them with responsible security measures to protect their private financial data. In Betterment’s mobile apps, this means including trustworthy but convenient local authentication options for resuming active login sessions. Three years ago, in 2014, we implemented Touch ID support as an alternative to using PIN entry in our iOS app. Today, on the iPhone X’s first day of availability, we’re thrilled to announce that the Betterment iOS app fully supports Apple’s new Face ID technology on the iPhone X.

Trusting the Secure Enclave While we’re certainly proud of shipping this feature quickly, a lot of credit is due to Apple for how seriously the company takes device security and data privacy as a whole. The Secure Enclave hardware feature, included on iPhones since the 5S, makes for a readily trustworthy connection to the device and its operating system. From an application’s perspective, this relationship between a biometric scanner and the Secure Enclave is simplified to a boolean response. When requested through the Local Authentication framework, the biometry evaluation either succeeds or fails separately from any given state of an application.
The “reply” completion closure of evaluatePolicy(_:localizedReason:reply:)
This made testing from the iOS Simulator a viable option for gaining a reasonable degree of certainty that our application would behave as expected when running on a device, thus allowing us to prepare a build in advance of having a device to test on.

LABiometryType Since we’ve been securely using Touch ID for years, adapting our existing implementation to include Face ID was a relatively minor change. Thanks primarily to the simple addition of the LABiometryType enum newly available in iOS 11, it’s easy for our application to determine which biometry feature, if any, is available on a given device. This is such a minor change, in fact, that we were able to reuse all of our same view controllers that we had built for Touch ID with only a handful of string values that are now determined at runtime. One challenge we have that most existing iOS apps share is the need to still support older iOS versions. For this reason, we chose to wrap LABiometryType behind our own BiometryType enum. This allows us to encapsulate both the need to use an iOS 11 compiler flag and the need to call canEvaluatePolicy(_:error:) on an instance of LAContext before accessing its biometryType property into a single computed property: See the Gist.

NSFaceIDUsageDescription The other difference with Face ID is the new NSFaceIDUsageDescription privacy string that should be included in the application’s Info.plist file. This is a departure from Touch ID, which does not require a separate privacy permission, and which uses the localizedReason string parameter when showing its evaluation prompt.
Touch ID evaluation prompt displaying the localized reason
While Face ID does not seem to make use of that localizedReason string during evaluation, without the privacy string the iPhone X will run the application’s Local Authentication feature in compatibility mode. This informs the user that the application should work with Face ID but may do so imperfectly.
Face ID permissions prompt without (left) and with (right) an NSFaceIDUsageDescription string included in the Info.plist
This compatibility mode prompt is undesirable enough on its own, but it also clued us in to the need to check for potential security concerns opened up by this forwards-compatibility-by-default from Apple. Thankfully, the changes to the Local Authentication framework were done in such a way that we determined there wasn’t a security risk, but it did leave a problematic user experience in reaching a potentially-inescapable screen when selecting “Don’t Allow” on the privacy permission prompt. Since we believe strongly in our users’ right to say “no”, resolving this design issue was the primary reason we prioritized shipping this update.

Ship It If your mobile iOS app also displays sensitive information and uses Touch ID for biometry-based local authentication, join us in making the easy adaptation to delight your users with full support for Face ID on the iPhone X.

4 min read * FROM 1 TO N: DISTRIBUTED DATA PROCESSING WITH AIRFLOW From 1 to N: Distributed Data Processing with Airflow Betterment has built a highly available data processing platform to power new product features and backend processing needs using Airflow. Betterment’s data platform is unique in that it not only supports offline needs such as analytics, but also powers our consumer-facing product. Features such as Time Weighted Returns and Betterment for Business balances rely on our data platform working throughout the day.
Additionally, we have regulatory obligations to report complex data to third parties daily, making data engineering a mission critical part of what we do at Betterment. We originally ran our data platform on a single machine in 2015 when we ingested far less data with fewer consumer-facing requirements. However, recent customer and data growth coupled with new business requirements require us to now scale horizontally with high availability. Transitioning from Luigi to Airflow Our single-server approach used Luigi, a Python module created to orchestrate long-running batch jobs with dependencies. While we could achieve high availability with Luigi, it’s now 2017 and the data engineering landscape has shifted. We turned to Airflow because it has emerged as a full-featured workflow management framework better suited to orchestrate frequent tasks throughout the day. To migrate to Airflow, we’re deprecating our Luigi solution on two fronts: cross-database replication and task orchestration. We’re using Amazon’s Database Migration Service (DMS) to replace our Luigi-implemented replication solution and re-building all other Luigi workflows in Airflow. We’ll dive into each of these pieces below to explain how Airflow mediated this transition. Cross-Database Replication with DMS We used Luigi to extract and load source data from multiple internal databases into our Redshift data warehouse on an ongoing basis. We recently adopted Amazon’s DMS for continuous cross-database replication to Redshift, moving away from our internally-built solution. The only downside of DMS is that we are not aware of how recent source data is in Redshift. For example, a task computing all of a prior day’s activity executed at midnight would be inaccurate if Redshift were missing data from DMS at midnight due to lag. In Luigi, we knew when the data was pulled and only then would we trigger a task. However, in Airflow we reversed our thinking to embrace DMS, using Airflow’s sensor operators to wait for rows to be pushed from DMS before carrying on with dependent tasks. High Availability in Airflow While Airflow doesn’t claim to be highly available out of the box, we built an infrastructure to get as close as possible. We’re running Airflow’s database on Amazon’s Relational Database Service and using Amazon’s Elasticache for Redis queuing. Both of these solutions come with high availability and automatic failover as add-ons Amazon provides. Additionally, we always deploy multiple baseline Airflow workers in case one fails, in which case we use automated deploys to stand up any part of the Airflow cluster on new hardware. There is still one single point of failure left in our Airflow architecture though: the scheduler. While we may implement a hot-standby backup in the future, we simply accept it as a known risk and set our monitoring system to notify a team member of any deviances. Cost-Effective Scalability Since our processing needs fluctuate throughout the day, we were paying for computing power we didn’t actually need during non-peak times on a single machine, as shown in our Luigi server’s load. Distributed workers used with Amazon’s Auto Scaling Groups allow us to automatically add and remove workers based on outstanding tasks in our queues. Effectively, this means maintaining only a baseline level of workers throughout the day and scaling up during peaks when our workload increases. Airflow queues allow us to designate certain tasks to run on particular hardware (e.g. CPU optimized) to further reduce costs. 
We found just a few hardware-type queues to be effective. For instance, tasks that saturate CPU are best run on a compute-optimized worker with concurrency set to the number of cores. Non-CPU-intensive tasks (e.g. polling a database) can run on higher concurrency per CPU core to save overall resources.

Extending Airflow Code Airflow tasks that pass data to each other can run on different machines, presenting a new challenge versus running everything on a single machine. For example, one Airflow task may write a file, and a subsequent task, running on another machine, may need to email that file. To implement this pattern, we use Amazon S3 as a persistent storage tier. Fortunately, Airflow already maintains a wide selection of hooks to work with remote sources such as S3. While S3 is great for production, it’s a little difficult to work with in development and testing where we prefer to use the local filesystem. We implemented a “local fallback” mixin for Airflow-maintained hooks that uses the local filesystem for development and testing, deferring to the actual hook’s remote functionality only in production.

Development & Deployment We mimic our production cluster as closely as possible for development & testing to identify any issues that may arise with multiple workers. This is why we adopted Docker to run a production-like Airflow cluster from the ground up on our development machines. We use containers to simulate multiple physical worker machines that connect to officially maintained local Redis and PostgreSQL containers. Development and testing also require us to stand up the Airflow database with predefined objects such as connections and pools for the code under test to function properly. To solve this programmatically, we adopted Alembic database migrations to manage these objects through code, allowing us to keep our development, testing, and production Airflow databases consistent.

Graceful Worker Shutdown Upon each deploy, we use Ansible to launch new worker instances and terminate existing workers. But what happens when our workers are busy with other work during a deploy? We don’t want to terminate workers while they’re finishing something up and instead want them to terminate after the work is done (not accepting new work in the interim). Fortunately, Celery supports this shutdown behavior and will stop accepting new work after receiving an initial TERM signal, letting old work finish up. We use Upstart to define all Airflow services and simply wrap the TERM behavior in our worker’s post-stop script, sending the TERM signal first, waiting until we see the Celery process has stopped, then finally powering off the machine.

Conclusion The path to building a highly available data processing service was not straightforward, requiring us to build a few specific but critical additions to Airflow. Investing the time to run Airflow as a cluster versus a single machine allows us to run work in a more elastic manner, saving costs and using optimized hardware for particular jobs. Implementing a local fallback for remote hooks made our code much more testable and easier to work with locally, while still allowing us to run with Airflow-maintained functionality in production. While migrating from Luigi to Airflow is not yet complete, Airflow has already offered us a solid foundation. We look forward to continuing to build upon Airflow and contributing back to the community. This article is part of Engineering at Betterment. These articles are maintained by Betterment Holdings Inc.
and they are not associated with Betterment, LLC or MTG, LLC. The content in this article is for informational and educational purposes only. © 2017–2019 Betterment Holdings Inc.

6 min read * A FUNCTIONAL APPROACH TO PENNY-PRECISE ALLOCATION A Functional Approach to Penny-Precise Allocation How we solved the problem of allocating a sum of money proportionally across multiple buckets by leaning on functional programming. An easy trap to fall into as an object-oriented developer is to get too caught up in the idea that everything has to be an object. I work in Ruby, for example, where the first thing you learn is that everything is an object. Some problems, however, are better solved by taking a functional approach. For instance, at Betterment, we faced the challenge of allocating a sum of money proportionally across multiple buckets. In this post, I’ll share how we solved the problem by leaning on functional programming to allocate money precisely across proportional buckets.

The Problem Proportional allocation comes up often throughout our codebase, but it’s easiest to explain using a fictional example: Suppose your paychecks are $1000 each, and you always allocate them to your different savings accounts as follows:
College savings fund: $310
Buy a car fund: $350
Buy a house fund: $200
Safety net: $140
Now suppose you’re an awesome employee and received a bonus of $1234.56. You want to allocate your bonus proportionally in the same way you allocate your regular paychecks. How much money do you put in each account? You may be thinking, isn’t this a simple math problem? Let’s say it is. To get each amount, take the ratio of the contribution from your normal paycheck to the total of your normal paycheck, and multiply that by your bonus. So, your college savings fund would get: (310/1000)*1234.56 = 382.7136 We can do the same for your other three accounts, but you may have noticed a problem. We can’t split a penny into fractions, so we can’t give your college savings fund the exact proportional amount. More generally, how do we take an inflow of money and allocate it to weighted buckets in a fair, penny-precise way?

The Mathematical Solution: Integer Allocation We chose to tackle the problem by working with integers instead of decimal numbers in order to avoid rounding. This is easy to do with money — we can just work in cents instead of dollars. Next, we settled on an algorithm that pays out buckets fairly, and guarantees that the total payments exactly sum to the desired payout. This algorithm is called the Largest Remainder Method.
1. Multiply the inflow (or the payout in the example above) by each weight (where the weights are the integer amounts of the buckets, so the contributions to each savings account in our example above), and divide each of these products by the sum of the buckets, finding the integer quotient and integer remainder.
2. Find the number of pennies that will be left over to allocate by taking the inflow minus the total of the integer quotients.
3. Sort the remainders in descending order and allocate any leftover pennies to the buckets in this order.
The idea here is that the quotients represent the amounts we should give each bucket aside from the leftover pennies. Then we figure out which bucket deserves the leftover pennies. Let’s walk through this process for our example: Remember that we’re working in cents, so our inflow is 123456 and we need to allocate it across bucket weights of [31000, 35000, 20000, 14000].
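Before stepping through the numbers, here is a compact Ruby sketch of the whole method as described above. It is illustrative only; the real Allocator module referenced later in this post is not shown here, and the names are ours, not Betterment’s.

```ruby
# A minimal sketch of the largest remainder method described above.
# Illustrative only -- not Betterment's actual Allocator module.
module Allocator
  module_function

  # inflow:  integer number of cents to distribute
  # weights: integer bucket weights (e.g. cents from the normal paycheck)
  def allocate(inflow, weights)
    total = weights.sum

    # Step 1: integer quotient and remainder for each bucket.
    quotients_and_remainders = weights.map { |w| (inflow * w).divmod(total) }
    quotients = quotients_and_remainders.map(&:first)

    # Step 2: pennies left over after handing out the quotients.
    leftover = inflow - quotients.sum

    # Step 3: hand one leftover penny to each bucket, in descending remainder order.
    order = quotients_and_remainders.each_index.sort_by { |i| -quotients_and_remainders[i].last }
    order.first(leftover).each { |i| quotients[i] += 1 }

    quotients
  end
end

Allocator.allocate(123_456, [31_000, 35_000, 20_000, 14_000])
# => [38271, 43210, 24691, 17284]
```

The walkthrough that follows traces exactly these three steps by hand.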
We find each integer quotient and remainder by multiplying the inflow by the weight and dividing by the total weight. We took advantage of the divmod method in Ruby to grab the integer quotient and remainder in one shot, like so:
buckets.map do |bucket|
  (inflow * bucket).divmod(total_bucket_weight)
end
This gives us 123456*31000/100000, 123456*35000/100000, 123456*20000/100000, and 123456*14000/100000. The integer quotients with their respective remainders are [38271, 36000], [43209, 60000], [24691, 20000], [17283, 84000]. Next, we find the leftover pennies by taking the inflow minus the total of the integer quotients, which is 123456 - (38271 + 43209 + 24691 + 17283) = 2. Finally, we sort our buckets in descending remainder order (because the buckets with the highest remainders are most deserving of extra pennies) and allocate the leftover pennies we have in this order. It’s worth noting that in our case, we’re using Ruby’s sort_by method, which gives us a nondeterministic order in the case where remainders are equal. In this case, our fourth bucket and second bucket, respectively, are most deserving. Our final allocations are therefore [38271, 43210, 24691, 17284]. This means that your college savings fund gets $382.71, your car fund gets $432.10, your house fund gets $246.91, and your safety net gets $172.84.

The Code Solution: Make It Functional Given we have to manage penny allocations between a person’s goals often throughout our codebase, the last thing we’d want is to have to bake penny-pushing logic throughout our domain logic. Therefore, we decided to extract our allocation code into a module function. Then, we took it even further. Our allocation code doesn’t need to care that we’re looking to allocate money, just that we’re looking to allocate integers. What we ended up with was a black box ‘Allocator’ module, with a public module function to which you could pass two arguments: an inflow, and an array of weightings. Our Ruby code looks like this.

The takeaway The biggest lesson to learn from this experience is that, as an engineer, you should not be afraid to take a functional approach when it makes sense. In this case, we were able to extract a solution to a complicated problem and keep our OO domain-specific logic clean.

5 min read * HOW WE BUILT TWO-FACTOR AUTHENTICATION FOR BETTERMENT ACCOUNTS How We Built Two-Factor Authentication for Betterment Accounts Betterment engineers implemented Two-Factor Authentication across all our apps, simplifying and strengthening our authentication code in the process. Big change is more stressful than small change for people and software systems alike. Dividing a big software project into small pieces is one of the most effective ways to reduce the risk of introducing bugs. As we incorporated Two-Factor Authentication (2FA) into our security codebase, we used a phased rollout strategy to validate portions of the picture before moving on. Throughout the project, we leaned heavily on our collaborative review processes to both strengthen and simplify our authentication patterns. Along the way, we realized that we could integrate our new code more easily if we reworked surrounding access patterns with 2FA in mind. In other words, the 2F itself was relatively easy. Getting the surrounding A right was much trickier. Lead software engineer Chris LoPresto (right) helped lead the team in building Two-Factor Authentication and App Passwords for Betterment accounts.
What We Built Two-factor authentication is a security scheme in which users must provide two separate pieces of evidence to verify their identity prior to being granted access. We recently introduced two different forms of 2FA for Betterment apps: TOTP (Time-based One-Time Passwords) using an authenticator app like Google Authenticator or Authy SMS verification codes While SMS is not as secure as an authenticator app, we decided the increased 2FA adoption it facilitated was worthwhile. Two authentication factors are better than one, and it is our hope that all customers consider taking advantage of TOTP. To Build or Not To Build When designing new software features, there is a set of tradeoffs between writing your own code and integrating someone else's. Even if you have an expert team of developers, it can be quicker and more cost-efficient to use a third-party service to set up something complex like an authentication service. We don't suffer from Not Invented Here Syndrome at Betterment, so we evaluated products like Authy and Duo at the start of this project. Both services offer a robust set of authentication features that provide tremendous value with minimal development effort. But as we envisioned integrating either service into our apps, we realized we had work to do on our end. Betterment has multiple applications for consumers, financial advisors, and 401(k) participants that were built at different times with different technologies. Unifying the authentication patterns in these apps was a necessary first step in our 2FA project and would involve far more time and thought than building the 2F handshake itself. This realization, coupled with the desire to build a tightly integrated user experience, led to our decision to build 2FA ourselves. Validating the Approach Once we decide to build something, we also need to learn what not to build. Typically the best way to do that is to build something disposable, throw it away, and start over atop freshly learned lessons. To estimate the level of effort involved in building 2FA user interactions, we built some rough prototypes. For our TOTP prototype we generated a secret key, formatted it as a TOTP provisioning URI, and ran that through a QR code gem. SMS required a third-party provider, Twilio, whose client gem made it almost too easy to text each other "status updates." In short order, we were confident in our ability to deliver 2FA functionality that would work well. The quick ramp-up time and successful outcome of such experiments are among the reasons we value working within the mature, developer-friendly Rails ecosystem. While our initial prototypes were naive and didn’t actually integrate with our auth systems, they formed the core of the two-factor approaches that ultimately landed in our production codebase. Introducing Concepts Before Behaviors Before 2FA entered the picture, our authentication systems performed several tasks when a Betterment user attempted to log in: Verify the provided email address matches an existing user account Hash the provided password with the user’s salt and verify that it matches the hashed password stored for the user account Verify the user account is not locked for security reasons (e.g., too many incorrect password attempts) Create persistent authorization context (e.g., browser cookie, mobile token) to allow the user in the door Our authentication codebase handled all of these tasks in response to a single user action (the act of providing an email and password). 
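For illustration, here is a condensed sketch of what such a single-action, pre-2FA login flow might look like. The class and method names are hypothetical, not Betterment’s code; it simply strings together the four tasks listed above.

```ruby
# Hypothetical sketch of a pre-2FA login flow: one user action
# (email + password) driving all four steps described above.
class PasswordAuthenticator
  def initialize(user_repository)
    @user_repository = user_repository
  end

  # Returns a session token on success, nil otherwise.
  def authenticate(email:, password:)
    user = @user_repository.find_by_email(email)        # 1. email matches an account
    return nil unless user
    return nil unless user.password_matches?(password)  # 2. hash & compare the password
    return nil if user.locked?                          # 3. account is not locked
    user.issue_session_token                            # 4. persistent auth context
  end
end
```

The trouble described next is exactly what happens when a second user action (the challenge code) has to be threaded through a flow shaped like this.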
As we began reworking this code to handle a potential second user action (the act of providing a login challenge code), the resultant branching logic became overly complex and difficult to understand. Many of our prior design assumptions no longer held, so we paused 2FA development and spun our chairs around for an impromptu design meeting. With 2FA requirements in mind, we decided to redesign our existing password verification as the first of two potential authentication factors. We built, tested, and released this new code independently. Our test suite gave us confidence that our existing password and user state validations remained unchanged within the new notion of a “first authentication factor.” Taking this remodeling detour enabled us to deliver the concept of authentication factors separately from any new system behaviors that relied on them. When we resumed work on 2FA, the proposed “second authentication factor” functionality now fell neatly into place. As a result, we delivered the new 2FA features far more safely and quickly than we could have if we attempted to do everything in one fell swoop.

Adding App Passwords Betterment customers have the option of connecting their account to third-party services like TurboTax and Mint. In keeping with our design principle of authorization through impossibility, we created a dedicated API authentication strategy for this use case, separate from our user-focused web authentication strategy. Dedicated endpoints for these services provide read-only access to the bare minimum content (e.g., account balances, transaction information). This strict separation of concerns helps to keep our customers’ data safe and our code simple. However, in order to connect to third-party services, our customers still had to share their account password with these third parties. While these institutions may be trustworthy, it is best to eliminate shared trust wherever possible when designing secure systems. Because these services do not support 2FA, it was now time to build a more secure password scheme for third-party apps. We started by designing a simple process for customers to generate app passwords for each service they wish to connect. These app passwords are complex enough for safe usage yet employ an alphabet scheme easily transcribed by our customers during setup. We then rewrote our API authentication code to accept app passwords and to reject account passwords for users with 2FA enabled. Our customers can now provide (and revoke) unique read-only passwords for third-party services they connect to Betterment. Crucially, our app password scheme is compatible right out of the gate with the new 2FA features we just launched.

Slicing Up Deliverables Building 2FA and app passwords involved a complex set of coordinated changes to sensitive security-related code. To minimize the level of risk in this ambitious project, we used the feature-toggling built into our open-source split-testing framework TestTrack. By hiding the new functionality behind a feature flag, we were able to launch and validate features over the course of months before publicly unveiling them to retail customers. Even experienced programmers sometimes resist the “extra” work necessary to devise a phased approach to a problem. Sometimes we struggle to disentangle pieces that are ready for a partial launch from pieces that aren’t. But the point cannot be overstated: Feature flags are our friends. At Betterment, we use them to orchestrate the partial rollout of big features.
We validate new functionality before unveiling it to our user base at large. By facilitating a series of small, testable code changes, feature flags provide one of the most effective means of mitigating risks associated with shipping large features. At the beginning of the 2FA project, we created a feature flag for the engineers working on the project. As the project progressed, we flipped the flag on for Betterment employees followed by a set of external beta testers. By the time we announced 2FA in the release notes for our mobile apps, the “new” code had been battle tested for months. Help Us Iterate The final step of our 2FA project was to delete the aforementioned feature flag from our codebase. While that was a truly satisfying moment, we all know that our work is never done. If you’re interested in approaching our next set of tricky projects in a nimble, iterative fashion, go ahead and apply. 8 min read * HOW WE ENGINEERED BETTERMENT’S TAX-COORDINATED PORTFOLIO™ How We Engineered Betterment’s Tax-Coordinated Portfolio™ For our latest tax-efficiency feature, Tax Coordination, Betterment’s solver-based portfolio management system enabled us to manage and test our most complex algorithms. Tax efficiency is a key consideration of Betterment’s portfolio management philosophy. With our new Tax Coordination feature, we’re continuing the mission to help our customers’ portfolios become as tax efficient as possible. While new products can often be achieved using our existing engineering abstractions, TCP brought the engineering team a new level of complexity that required us to rethink how parts of our portfolio management system were built. Here’s how we did it. A Primer on Tax Coordination Betterment’s TCP feature is our very own, fully automated version of an investment strategy known as asset location. If you’re not familiar with asset location, it is a strategy designed to optimize after-tax returns by placing tax-inefficient securities into more tax-advantaged accounts, such as 401(k)s and Individual Retirement Accounts (IRAs). Before we built TCP, Betterment customers had each account managed as a separate, standalone portfolio. For example, customers could set up a Roth IRA with a portfolio of 90% stocks and 10% bonds to save for retirement. Separately, they could set up a taxable retirement account invested likewise in 90% stocks and 10% bonds. Now, Betterment customers can turn on TCP in their accounts, and their holdings in multiple investment accounts will be managed as a single portfolio allocation, but rearranged in such a way that the holdings across those accounts seek to maximize the overall portfolio’s after-tax returns. To illustrate, let’s suppose you’re a Betterment customer with three different accounts: a Roth IRA, a traditional IRA, and a taxable retirement account. Let’s say that each account holds $50,000, for a total of $150,000 in investments. Now assume that the $50,000 in each account is invested into a portfolio of 70% stocks and 30% bonds. For reference, consider the diagram. The circles represent various asset classes, and the bar shows the allocation for all the accounts, if added together. Each account has a 70/30 allocation, and the accounts will add up to 70/30 in the aggregate, but we can do better when it comes to maximizing after-tax returns. 
We can maintain the aggregate 70/30 asset allocation, but use the available balances of $50,000 each to rearrange the securities in such a way that places the most tax-efficient holdings into a taxable account, and the most tax-inefficient ones into IRAs. Here’s a simple animation solely for illustrative purposes: Asset Location in Action The result is the same 70/30 allocation overall, except TCP has now redistributed the assets unevenly, to reduce future taxes.

How We Modeled the Problem The fundamental questions the engineering team tried to answer were: How do we get our customers to this optimal state, and how do we maintain it in the presence of daily account activity? We could have attempted to construct a procedural-style heuristic solution to this, but the complexity of the problem led us to believe this approach would be hard to implement and challenging to maintain. Instead, we opted to model our problem as a linear program. This made the problem provably solvable and quick to compute—on the order of milliseconds per customer. Let’s consider a hypothetical customer account example.

Meet Joe Joe is a hypothetical Betterment customer. When he signed up for Betterment, he opened a Roth IRA account. As an avid saver, Joe quickly reached his annual Roth IRA contribution limit of $5,500. Wanting to save more for his retirement, he decided to open up a Betterment taxable account, which he funded with an additional $11,000. Note that the contribution limits mentioned in this example are as of the time this article was published. Limits are subject to change from year to year, so please defer to IRS guidelines for current IRA and 401(k) limits. Joe isn’t one to take huge risks, so he opted for a moderate asset allocation of 50% stocks and 50% bonds in both his Roth IRA and taxable accounts. To make things simple, let’s assume that both portfolios are only invested in two asset classes: U.S. total market stocks and emerging markets bonds. In his taxable account, Joe holds $5,500 worth of U.S. total market stocks in VTI (Vanguard Total Stock Market ETF), and $5,500 worth of emerging markets bonds in VWOB (Vanguard Emerging Markets Bond ETF). Let’s say that his Roth IRA holds $2,750 of VTI, and $2,750 of VWOB. Below is a table summarizing Joe’s holdings:
Account Type | VTI (U.S. Total Market) | VWOB (Emerging Markets Bonds) | Account Total
Taxable | $5,500 | $5,500 | $11,000
Roth | $2,750 | $2,750 | $5,500
Asset Class Total | $8,250 | $8,250 | $16,500
To begin to construct our model for an optimal asset location strategy, we need to consider the relative value of each fund in both accounts. A number of factors are used to determine this, but most importantly each fund’s tax efficiency and expected returns. Let’s assume we already know that VTI has a higher expected value in Joe’s taxable account, and that VWOB has a higher expected value in his Roth IRA. To be more concrete about this, let’s define some variables. Each variable represents the expected value of holding a particular fund in a particular account. For example, we represent the expected value of holding VTI in Joe’s taxable account as E(VTI, Taxable), which we’ve defined to be 0.07. More generally, let E(F, A) be the expected value of holding fund F in account A. Circling back to the original problem, we want to rearrange the holdings in Joe’s accounts in a way that’s maximally valuable in the future. Linear programs try to optimize the value of an objective function.
In this example, we want to maximize the expected value of the holdings in Joe’s accounts. The overall value of Joe’s holdings is a function of the specific funds in which he has investments. Let’s define that objective function: V is the sum, over every fund F and account A, of E(F, A) multiplied by B(F, A). You’ll notice the familiar E(F, A) terms measuring the expected value of holding each fund in each account, but you’ll also notice variables of the form B(F, A). Precisely, this variable represents the balance of fund F in account A. These are our decision variables—variables that we’re trying to solve for. Let’s plug in some balances to see what the expected value of V is with Joe’s current holdings: V = 0.07*5500 + 0.04*5500 + 0.06*2750 + 0.05*2750 = 907.5 Certainly, we can do better. We cannot just assign arbitrarily large values to the decision variables due to two restrictions which cannot be violated: Joe must maintain $11,000 in his taxable account and $5,500 in his Roth IRA. We cannot assign Joe more money than he already has, nor can we move money between his Roth IRA and taxable accounts. Joe’s overall portfolio must also maintain its allocation of 50% stocks and 50% bonds—the risk profile he selected. We don’t want to invest all of his money into a single fund. Mathematically, it’s straightforward to represent the first restriction as two linear constraints. Simply put, we’ve asserted that the sum of the balances of every fund in Joe’s taxable account must remain at $11,000. Similarly, the sum of the balances of every fund in his Roth IRA must remain at $5,500. The second restriction—maintaining the portfolio allocation of 50% stocks and 50% bonds—might seem straightforward, but there’s a catch. You might guess that you can express it as two more equality constraints: that the sum of the balances of VTI across Joe’s accounts must be equal to half of his total balance, and similarly that the sum of the balances of VWOB across Joe’s accounts must be equal to the remaining half of his total balance. While this will certainly work for this particular example, enforcing that the portfolio allocation is exactly on target when determining optimality turns out to be too restrictive. In certain scenarios, it’s undesirable to buy or to sell a specific fund because of tax consequences. These restrictions require us to allow for some portfolio drift—some deviation from the target allocation. We made the decision to maximize the expected after-tax value of a customer’s holdings after having achieved the minimum possible drift. To accomplish this, we need to define new decision variables. Let’s add them to our objective function: d+(AC) is the dollar amount above the target balance in asset class AC. Similarly, d-(AC) is the dollar amount below the target balance in asset class AC. For instance, d+(EM Bonds) is the dollar amount above the target balance in emerging markets bonds—the asset class to which VWOB belongs. We still want to maximize our objective function V. However, with the introduction of the drift terms, we want every dollar allocated toward a single fund to incur a penalty if it moves the balance for that fund’s asset class below or above its target amount. To do this, we can relate the B(F, A) terms with the drift terms using linear constraints. Concretely, we assert that the sum of the balances in funds including U.S. total market stocks (in this case, only VTI), plus the net drift amount in that asset class, must be equal to the target balance of that asset class in the portfolio (which in this case is 50% of Joe’s total holdings).
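Because the original formula images do not survive in this text, here is one way to write down the full model in the notation introduced above. The drift penalty weight λ is an assumption standing in for the “minimize drift first, then maximize value” behavior described in the prose; it is not Betterment’s exact formulation, and both asset-class constraints are shown.

```latex
% Reconstruction of the model described in the prose; notation and the
% penalty weight lambda are assumptions, not Betterment's exact formulation.
\begin{align*}
\text{maximize}\quad
  V &= \sum_{A} \sum_{F} E(F, A)\, B(F, A)
      \;-\; \lambda \sum_{AC} \bigl( d^{+}(AC) + d^{-}(AC) \bigr) \\
\text{subject to}\quad
  & B(\text{VTI}, \text{Taxable}) + B(\text{VWOB}, \text{Taxable}) = 11{,}000 \\
  & B(\text{VTI}, \text{Roth}) + B(\text{VWOB}, \text{Roth}) = 5{,}500 \\
  & B(\text{VTI}, \text{Taxable}) + B(\text{VTI}, \text{Roth})
      + d^{-}(\text{US}) - d^{+}(\text{US}) = 8{,}250 \\
  & B(\text{VWOB}, \text{Taxable}) + B(\text{VWOB}, \text{Roth})
      + d^{-}(\text{EM}) - d^{+}(\text{EM}) = 8{,}250 \\
  & B(F, A) \ge 0, \qquad d^{+}(AC) \ge 0, \qquad d^{-}(AC) \ge 0
\end{align*}
```

Solving this small model by hand (with a penalty large enough that the drift terms stay at zero) moves as much VTI as possible into the taxable account: B(VTI, Taxable) = 8,250, B(VWOB, Taxable) = 2,750, B(VTI, Roth) = 0, B(VWOB, Roth) = 5,500, which raises V from 907.5 to 962.5.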
Similarly, we’ve also done this for emerging markets bonds. This way, if we can’t achieve perfect allocation, we have a buffer that we can fill—albeit at a penalty. Now that we have our objective function and constraints set up, we just need to solve these equations. For this we can use a mathematical programming solver. Here’s the optimal solution:

Managing Engineering Complexity Reaching the optimal balances would require our system to buy and sell securities in Joe’s investment accounts. It’s not always free for Joe to go from his current holdings to optimal ones because buying and selling securities can have tax consequences. For example, if our system sold something at a short-term capital gain in Joe’s taxable account, or bought a security in his Roth IRA that was sold at a loss in the last 30 days—triggering the wash-sale rule—we would be negatively impacting his after-tax return. In the simple example above with two accounts and two funds, there are a total of four constraints. Our production model is orders of magnitude more complex, and considers each Betterment customer’s individual tax lots, which introduces hundreds of individual constraints to our model. Generating these constraints that ultimately determine buying and selling decisions can often involve tricky business logic that examines a variety of data in our system. In addition, we knew that as our work on TCP progressed, we were going to need to iterate on our mathematical model. Before diving headfirst into the code, we made it a priority to be cognizant of the engineering challenges we would face. As a result, we wanted to make sure that the software we built respected four key principles, which are:
1. Isolation from third-party solver APIs.
2. Ability to keep pace with changes to the mathematical model, e.g., adding, removing, and changing the constraints and the objective function must be quick and painless.
3. Separation of concerns between how we accessed data in our system and the business logic defining algorithmic behavior.
4. Easy and comprehensive testing.
We built our own internal framework for modeling mathematical programs that was not tied to our trading system’s domain-specific business logic. This gave us the flexibility to switch easily between a variety of third-party mathematical programming solvers. Our business logic that generates the model knows only about objects defined by our framework, and not about third-party APIs. To incorporate a third-party solver into our system, we built a translation layer that received our system-generated constraints and objective function as inputs, and utilized those inputs to solve the model using a third-party API. Switching between third-party solvers simply meant switching implementations of the interface below. We wanted that same level of flexibility in changing our mathematical model. Changing the objective function and adding new constraints needed to be easy to do. We did this by providing well-defined interfaces that give engineers access to core system data needed to generate our model. This means that an engineer implementing a change to the model would only need to worry about implementing algorithmic behavior, and not about how to retrieve the data needed to do that. To add a new set of constraints, engineers simply provide an implementation of a TradingConstraintGenerator. Each TradingConstraintGenerator knows about all of the system-related data it needs to generate constraints.
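One way to picture the shape of those seams is the hypothetical sketch below. The names mirror the prose, but this is illustrative Ruby, not Betterment’s actual framework or code.

```ruby
# Hypothetical sketch of the generator/solver seams described above.

# A constraint is plain data: linear terms, a comparison, and a bound,
# e.g. ({"VTI_TAXABLE" => 1, "VWOB_TAXABLE" => 1}, :==, 11_000).
LinearConstraint = Struct.new(:terms, :operator, :bound)

# Each generator turns system data (balances, targets, tax lots...) into
# constraints, and knows nothing about any third-party solver API.
class AccountBalanceConstraintGenerator
  def initialize(accounts)
    @accounts = accounts
  end

  def constraints
    @accounts.map do |account|
      terms = account.holdings.to_h { |holding| [holding.variable_name, 1] }
      LinearConstraint.new(terms, :==, account.balance)
    end
  end
end

# The translation layer is the only code that touches a solver; swapping
# solvers means swapping implementations of this interface.
class SolverAdapter
  # Returns optimal values for the decision variables.
  def solve(objective:, constraints:)
    raise NotImplementedError
  end
end

def generate_constraints(generators)
  generators.flat_map(&:constraints)
end
```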
Through dependency injection, the new generator is included among the set of generators used to generate constraints. The sample code below illustrates how we generated the constraints for our model. With hundreds of constraints and hundreds of thousands of unique tax profiles across our customer base, we needed to be confident that our system made the right decisions in the right situations. For us, that meant having clear, readable tests that were a joy to write. Below is a test written in Groovy, which sets up fixture data that mimics the exact situation in our “Meet Joe” example. We not only had unit tests such as the one above to test simple scenarios where a human could calculate the outcome, but we also ran the optimizer in a simulated production-like environment, through hundreds of thousands of scenarios that closely resembled real ones. During testing, we often ran into scenarios where our model had no feasible solution—usually due to a bug we had introduced. As soon as the bug was fixed, we wanted to ensure that we had automated tests to handle a similar issue in the future. However, with so many sources of input affecting the optimized result, writing tests to cover these cases was very labor-intensive. Instead, we automated the test setup by building tools that could snapshot our input data as of the time the error occurred. The input data was serialized and automatically fed back into our test fixtures. Striving for Simplicity At Betterment, we aim to build products that help our customers reach their financial goals. Building new products can often be done using our existing engineering abstractions. However, TCP brought a new level of complexity that required us to rethink the way parts of our trading system were built. Modeling and implementing our portfolio management algorithms using linear programming was not easy, but it ultimately resulted in the simplest possible system needed to reliably pursue optimal after-tax returns. To learn more about engineering at Betterment, visit the engineering page on the Betterment Resource Center. All return examples and return figures mentioned above are for illustrative purposes only. For much more on our TCP research, including additional considerations on the suitability of TCP to your circumstances, please see our white paper. See full disclosure for our estimates and Tax Coordination in general. 13 min read * WHAT’S THE BEST AUTHORIZATION FRAMEWORK? NONE AT ALL What’s the Best Authorization Framework? None At All Betterment’s engineering team builds software more securely by forgoing complicated authorization frameworks. As a financial institution, we take authorization—deciding who is allowed to do what—extremely seriously. But you don't need an authorization framework to build an application with robust security and access control. In fact, the increased complexity and indirection that authorization frameworks require can actually make your software less secure. At Betterment, we follow key principles to avoid authorization frameworks altogether in many of our applications. Of course, it would be impractical to completely avoid authorization features in internal applications that support our team's diverse responsibilities. For these apps, Betterment reframed the problem and built a radically simpler authorization framework by following a few simple ground rules. The Downside of Frameworks Application security is tough to get right. 
Some problems, like cryptography, are so thorny that even implementing a well-known algorithm yourself would be malpractice. Professional teams lean on proven libraries and frameworks to solve hard problems, such as NaCl for crypto and Devise for authentication. But authorization isn't like crypto or authentication. At Betterment, we've found that authorization rules emerge naturally from our business logic, and we believe that's where they belong. Most authorization frameworks blur the lines around this crucial piece of a business domain, leaving engineers to wonder whether and how to leverage the authorization framework versus treating a given condition as a regular business rule. Over time, these hundreds or thousands of successive decisions can result in a minefield of inconsistent, unauditable semantics, and ultimately the confusion can lead to bugs and vulnerabilities. Betterment has structured our entire platform around the security of our customers. By following the principles in this article, we've simplified the authorization problem, making decisions easy and accountable, and achieving even higher confidence in our systems’ safety. Authorization Without the Framework Here are the principles that keep Betterment's most security-critical apps free of authorization frameworks: Authorization Through Impossibility The most fundamental authorization rule of an app like Betterment’s is that users should only be able to see their own financial profiles. That could be modeled in an authorization framework by specifying that users only have access to their own profiles, and then querying the framework before display. But there's a better way: Make it impossible. We simply don't have an endpoint to allow somebody to request another user's information. The most secure code is the code that never got written. Authorization Through Navigability Most things that could be described as authorization rules emerge naturally from relationships. For instance, if I'm co-owner of a joint account opened by my spouse, then I am allowed to see that account. Rather than add another layer of indirection, we simply rely on our data model, and only expose data that can be reached through the app's natural relationships. The app can’t even locate data that should be inaccessible. Authorization Through Application Boundaries Many arguments for heavyweight authorization arise from administrative access. What if a Customer Support representative needs the ability to help a customer make a change to an account? Shouldn’t there be a simple override available to her? No, at least not within the same app. Each application should have a single audience. For instance, our consumer-facing app lets customers view and manage their investments. Our internal Customer Support app allows our Customer Support representatives to look up the accounts of customers they’re assisting. Our Ops app gives our broker-dealer operations team the tools to monitor risk systems and manage transactions. This isn’t just a boon for security—it's better software. Each app is built for a specific team with exactly the tools and information it needs. But Sometimes You Need a Framework The real world is complicated. At Betterment, where we’re approaching 200 employees across many disciplines, we know this well. Some tasks require a senior team member. Some trainees only need limited access to a system. 
As an engineering organization, you could build a new app for every single title and level within your company, but it'd be confusing for team members whose jobs are more similar than they are different, and mind-bogglingly expensive to maintain. How do you move forward without going whole-hog on heavyweight authorization? By setting a few ground rules for ourselves, we were able to design a lightweight, auditable, and intuitive approach to authorization that has scaled with our team and stayed dead simple. Here were the rules we followed: 1. Privilege Levels Are Named After the People Who Use the Software As Phil Karlton once said, naming things is one of the two hard things in computer science. Software is built by people, for people. To tend toward security over the long term, the names we use must be intuitive to both the engineers building the software and its users. For instance, our Customer Support app has the levels trainee, staff, and manager. As the organization grows and matures, internal jargon will change too. It's crucial to make sure these names remain meaningful, updating them as needed. 2. Privilege Levels Are Linear Once you've built a separate app for each audience, you don't need to support multiple orthogonal roles—a single ladder is enough. In our Customer Support app, staff can do a superset of what trainees can do, and managers can do a superset of what staff can do. In combination with the naming rule, this means that you can easily add levels above, below, or between the existing levels as your team grows without rethinking every privilege in the system. Eventually, you may find that a single ladder isn’t enough. This is a great opportunity to force the conversation about how roles within your team are diverging, and build software to match. 3. REST Resources Are the Only Resources, and HTTP Verbs Are the Only Actions At their core, all authorization systems determine whether a user has permission to perform an action on a resource. Much of the complexity in a traditional authorization system comes from defining those resources and their relationships to users, usually in terms of database entities. RESTful applications have the concepts of resources and actions in their DNA, so there's no need to reinvent that wheel. REST doesn't just give us the ability to define simple resources like accounts that correspond to database tables. We can build up semantic concepts like a search resource to enable basic user lookup, and a secure_search resource that allows senior team members to query by sensitive details like Social Security number. By treating HTTP verbs as our actions, we can easily allow a trainee to GET an account but not PATCH it. 4. Authorization Framework Calls Are Simple, and Stay in the Controllers and Views If the only way to initiate an action is through a REST endpoint, there's no need to add complexity to your business logic layer. The authorization framework only has two features: To answer whether a user can request a given resource with a given verb. App developers use this feature to customize views (e.g., show or hide a button). To abort requests for a resource if the answer is no. And that's all you need. Help Us Solve the Hard Problems Security is a mindset, philosophy, and practice more than a set of tools or solutions, and many challenges lie ahead. If you’d like to help Betterment design, build, and spread radically simpler and more secure solutions to the hard problems our customers and team face, go ahead and apply. 
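To make ground rules 2 and 4 above concrete, here is a minimal sketch of what a linear privilege ladder and a controller-level check could look like in an internal, support-style Rails app. The code is hypothetical, not Betterment’s internal framework, and current_user and privilege_level are assumed helpers.

```ruby
# Hypothetical sketch of a linear privilege ladder plus a controller-level
# check -- not Betterment's internal framework.
module PrivilegeLevel
  # Named after the people who use the software, lowest to highest.
  LADDER = %i[trainee staff manager].freeze

  def self.at_least?(user_level, required_level)
    user_index = LADDER.index(user_level)
    required_index = LADDER.index(required_level)
    !user_index.nil? && !required_index.nil? && user_index >= required_index
  end
end

class AccountsController < ApplicationController
  # HTTP verbs are the only actions: trainees may GET, only staff may PATCH.
  before_action -> { require_level(:trainee) }, only: [:show]
  before_action -> { require_level(:staff) },   only: [:update]

  def show
    @account = Account.find(params[:id])
  end

  def update
    @account = Account.find(params[:id])
    @account.update!(account_params)
    redirect_to account_path(@account)
  end

  private

  # Abort the request if the answer is no; the only other framework feature
  # is asking the same question from a view to show or hide a button.
  def require_level(required_level)
    head :forbidden unless PrivilegeLevel.at_least?(current_user.privilege_level, required_level)
  end

  def account_params
    params.require(:account).permit(:note)
  end
end
```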
7 min read * THE EVOLUTION OF THE BETTERMENT ENGINEERING INTERVIEW The Evolution of the Betterment Engineering Interview Betterment’s engineering interview now includes a pair programming experience where candidates are tested on their collaboration and technical skills. Building and maintaining the world’s largest independent robo-advisor requires a world-class team of human engineers. This means we must continuously iterate on our recruiting process to remain competitive in attracting and hiring top talent. As our team has grown impressively from five to more than 50 engineers (and this was just in the last three years), we’ve significantly improved our abilities to make clearer hiring decisions, as well as shortened our total hiring timeline. Back in the Day Here’s how our interview process once looked: Resumé review Initial phone screen Technical phone screen Onsite: Day 1 Technical interview (computer science fundamentals) Technical interview (modelling and app design) Hiring manager interview Onsite: Day 2 Product and design interview Company founder interview Company executive interview While this process helped in growing our engineering team, it began showing some cracks along the way. The main recurring issue was that hiring managers were left uncertain as to whether a candidate truly possessed the technical aptitude and skills to justify making them an employment offer. While we tried to construct computer science and data modelling problems that led to informative interviews, watching candidates solve these problems still wasn’t getting to the heart of whether they’d be successful engineers once at Betterment. In addition to problems arising from the types of questions asked, we saw that one of our primary interview tools, the whiteboard, was actually getting in the way; many candidates struggled to communicate their solutions using a whiteboard in an interview setting. The last straw for using whiteboards came from feedback provided by Betterment’s Women in Technology group. When I sat down with them to solicit feedback on our entire hiring process, they pointed to the whiteboard problem-solving dynamics (one to two engineers sitting, observing, and judging the candidate standing at a whiteboard) as unnatural and awkward. It was clear this part of the interviewing process needed to go. We decided to allow candidates the choice of using a whiteboard if they wished, but it would no longer be the default method for presenting one’s skills. If we did away with the whiteboard, then what would we use? The most obvious alternative was a computer, but then many of our engineers expressed concerns with this method, having had bad experiences with computer-based interviews in the past. After spirited internal discussions we landed on a simple principle: We should provide candidates the most natural setting possible to demonstrate their abilities. As such, our technical interviews switched from whiteboards to computers. Within the boundaries of that principle, we considered multiple interview formats, including take-home and online assessments, and several variations of pair programming interviews. In the end, we landed on our own flavor of a pair programming interview. 
Today: A Better Interview Here’s our revised interview process: Resumé review Initial phone screen Technical phone screen Onsite: Technical interview 1 Ask the candidate to describe a recent technical challenge in detail Set up the candidate’s laptop Introduce the pair programming problem and explore the problem Pair programming (optional, time permitting) Technical interview 2 Pair programming Technical interview 3 Pair programming Ask-Me-Anything session Product and design interview Hiring manager interview Company executive interview While an interview setting may not offer pair programming in its purest sense, our interviewers truly participate in the process of writing software with the candidates. Instead of simply instructing and watching candidates as they program, interviewers can now work with them on a real-world problem, and they take turns in control of the keyboard. This approach puts candidates at ease, and feels closer to typical pair programming than one might expect. As a result, in addition to learning how well a candidate can write code, we learn how well they collaborate. We also split the main programming portion of our original interview into separate sections with different interviewers. It’s nice to give candidates a short break in between interviews, but the main reason for the separation is to evaluate the handoff. We like to evaluate how well a candidate explains the design decisions and progress from one interviewer to the next. Other Improvements We also streamlined our question-asking process and hiring timeline, and added an opportunity for candidates to speak with non-interviewers. Questions Interviews are now more prescriptive regarding non-technical questions. Instead of multiple interviewers asking a candidate about the same questions based on their resumé, we prescribe topics based on the most important core competencies of successful (Betterment) engineers. Each interviewer knows which competencies (e.g., software craftsmanship) to evaluate. Sample questions, not scripts, are provided, and interviewers are encouraged to tailor the competency questions to the candidates based on their backgrounds. Timeline Another change is that the entire onsite interview is completed in a single day. This can make scheduling difficult, but in a city as competitive as New York is for engineering talent, we’ve found it valuable to get to the final offer stage as quickly as possible. Discussion Finally, we’ve added an Ask-Me-Anything (AMA) session—another idea provided by our Women in Technology group. While we encourage candidates to ask questions of everyone they meet, the AMA provides an opportunity to meet with a Betterment engineer who has zero input on whether or not to hire them. Those “interviewers” don’t fill out a scorecard, and our hiring managers are forbidden from discussing candidates with them. Ship It Our first run of this new process took place in November 2015. Since then, the team has met several times to gather feedback and implement tweaks, but the broad strokes have remained unchanged. As of July 2016, all full-stack, mobile, and site-reliability engineering roles have adopted this new approach. We’re continually evaluating whether to adopt this process for other roles, as well. Our hiring managers now report that they have a much clearer understanding of what each candidate brings to the table. In addition, we’ve consistently received high marks from candidates and interviewers alike, who prefer our revamped approach. 
While we didn’t run a scientifically valid split-test for the new process versus the old (it would’ve taken years to reach statistical significance), our hiring metrics have improved across the board. We’re happy with the changes to our process, and we feel that it does a great job of fully and honestly evaluating a candidate’s abilities, which helps Betterment to continue growing its world-class team. For more information about working at Betterment, please visit our Careers page.

Determination of largest independent robo-advisor reflects Betterment LLC’s distinction of having the highest number of assets under management, based on Betterment’s review of assets self-reported in the SEC’s Form ADV, across Betterment’s survey of independent robo-advisor investing services as of March 15, 2016. As used here, “independent” means that a robo-advisor has no affiliation with the financial products it recommends to its clients.

6 min read

Server JavaScript: A Single-Page App To…A Single-Page App

Betterment engineers recently migrated a single-page Backbone app to a server-driven Rails experience.

Betterment engineers (l-r): Arielle Sullivan, J.P. Patrizio, Harris Effron, and Paddy Estridge

We recently changed the way we organize our major business objects. All the new features we’re working on for customers with multiple accounts—be they Individual Retirement Accounts (IRAs), taxable investment accounts, trusts, joint accounts, or even synced outside accounts—required this change. We were also required to rename several core concepts and make some big changes to the way we display data to our customers.

Currently, our Web application is a JavaScript single-page app that uses a frontend MVC framework, backed by a JSON API. We use Marionette.js, a framework built on top of Backbone.js, to help us organize our JavaScript and manage page state. It was built out over the past few years, with many different paradigms and patterns. After some time, we found ourselves with an application that had a lot of complexity and splintered code practices throughout.

The complexity partly arose from the fact that we needed to duplicate business logic between the backend and the frontend. By using the server only as a JSON API, the frontend needed to know exactly what to do with that JSON. It needed to be able to organize the different server endpoints (and their data) into models, as well as know how to take those models and render them into views. For example, a core concept such as “an account has some money in it” needed to be represented separately in the frontend codebase as well as on the server. This led to maintenance issues, and it made our application harder to test. The additional layer of frontend complexity made it even harder for new hires to be productive from day one.

When we first saw this project on the horizon, we realized it would end up requiring a substantial refactor of our web app. We had two options:

1. Rewrite the JavaScript in a way that makes it simpler and easier to use.
2. Don’t rewrite the JavaScript.

We went with option 2. Instead of using a client-side MVC framework to write a single-page app, we opted to use our Rails server to render views, and we used server-generated JavaScript responses to make the app feel just as snappy for our customers.
We achieved the same UX wins as a single page app with a fraction of the code. Method to the Madness The crux of our new pattern is this: We use Rails’ unobtrusive JavaScript (ujs) library to declare that forms and links should be submitted using AJAX. Our server then gets an AJAX rest request as usual, but instead of rendering the data as JSON, it responds to the request with a snippet of JavaScript. That JavaScript gets evaluated by the browser. The “trick” here is that JavaScript is a simple call to jQuery’s html method, and we use Rails’ built-in partial view rendering to respond with all the HTML we need. Now, the frontend just needs to blindly listen to the server, and render the HTML as instructed. An Example As a simple example, let’s say we want to edit a user’s home address. Using the JavaScript single page app framework, we would need a few things. First, we want an address model, which we map to our “/addresses” endpoint. Next, we need a View, that represents our form for editing the address. We need a frontend template for that view. Then, we need a route in our frontend for navigating to this page. And for our server, we need to add a route, a controller, a model, and a jbuilder to render that model as JSON. A Better Way With our new paradigm, we can skip most of this. All we need is the server. We still have our route, controller, and model, but instead of a jbuilder for returning JSON, we can port our template to embedded Ruby, and let the server do all the work. Using UJS patterns, our view can live completely on the server. There are a few major wins here: Unifying our business logic. The server is responsible for knowing about (1) our data, (2) how to wrap that data into rich domain models that own our business logic, (3) how to render those models into views, and (4) how to render those views on the page. The client needs to know almost nothing. Less JavaScript. We aren’t getting rid of all the JavaScript in our application. Certain snappy user experience elements don’t work as well without JavaScript. Interactive elements, some delightful animations, and other frontend behaviors still need it. For these things, we are using HTML data elements to specify behaviors. For example, we can tag an element with a data-behavior-dropdown, and then we have some simple, well organized global JavaScript that knows how to wrap that element in some code that makes it more interactive. We are hoping that by using these patterns, we can limit our use of JavaScript to only know about how to enhance HTML, not how to automatically calculate net income when trying to distribute excess tax year contributions from an IRA (something that our frontend JavaScript used to know how to do). We can do this migration in small pieces. Even with this plan, migrating a highly complex web application isn’t easy. We decided to tackle it using a tab-by-tab approach. We’ve written a few useful helpers that allow us to easily plug in our new server-driven style into our existing Marionette application. By doing this piecemeal, we are hoping to bake in useful patterns early on, which we can iterate and use to make migrating the next part even simpler. If we do this right, we will be able to swap everything to a normal Rails app with minimal effort. Once we migrate to Rails 5, we should even be able to easily take advantage of Turbolinks 3, which is a conventionalized way to do regional AJAX updates. 
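For readers who want to see the shape of this pattern, here is a rough sketch. Betterment’s version is built on Rails UJS, ERB partials, and server-generated JavaScript responses; the snippet below approximates the same request/response cycle in a small Flask app, with hypothetical routes and field names, purely as an illustration rather than the actual implementation.

```python
# Illustrative sketch only: Betterment uses Rails UJS and server-generated
# JavaScript responses; this Flask analogue (hypothetical routes and fields)
# shows the same shape of the pattern.
import json
from flask import Flask, render_template_string, request

app = Flask(__name__)

# The "partial": the markup for the address block lives on the server.
ADDRESS_PARTIAL = """
<div id="address">
  <p>{{ street }}, {{ city }}</p>
  <a href="/addresses/edit" data-remote="true">Edit</a>
</div>
"""

@app.route("/addresses", methods=["POST"])
def update_address():
    # A normal REST endpoint, hit via an AJAX form submission. Instead of
    # returning JSON for a client-side framework to interpret, render the
    # HTML fragment on the server...
    html = render_template_string(
        ADDRESS_PARTIAL,
        street=request.form.get("street", ""),
        city=request.form.get("city", ""),
    )
    # ...and respond with a tiny piece of JavaScript that swaps it into place.
    # The browser evaluates this, so the client stays "dumb".
    js = f"$('#address').html({json.dumps(html)});"
    return js, 200, {"Content-Type": "text/javascript"}
```

In the Rails version, the same idea is typically an ordinary controller action rendering a .js.erb view that wraps a partial in a call to jQuery’s html method, which is exactly the “trick” described above.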
This new pattern will make building out newer and even more sophisticated features easier, so we can focus on encapsulating the business logic once. Onboarding new hires familiar with the Rails framework will be faster, and those who aren’t familiar can find great external (and internal) resources to learn it. We think that our Web app will be just as pleasant to use, and we can more quickly enhance and build new features going forward. 6 min read * MODERN DATA ANALYSIS: DON’T TRUST YOUR SPREADSHEET Modern Data Analysis: Don’t Trust Your Spreadsheet To conduct research in business, you need statistical computing that you easily reproduce, scale, and make accessible to many stakeholders. Just as the Ford Motor Company created efficiency with assembly line production and Pixar opened up new worlds by computerizing animation, companies now are innovating and improving the craft of using data to do business. Betterment is one of them. We are built from the ground up on a foundation of data. It’s only been about three decades since companies started using any kind of computer-assisted data analysis. The introduction of the spreadsheet defined the beginning of the business analytics era, but the scale and complexity of today’s data has outgrown that origin. To avoid time-consuming manual processes, and the human error typical of that approach, analytics has become a programming discipline. Companies like Betterment are hiring data scientists and analysts who use software development techniques to reliably answer business questions which have quickly expanded in scale and complexity. To do good data work today, you need to use a system that is reproducible, versionable, scalable, and open. Our analytics and data science team at Betterment uses these data best practices to quickly produce reliable and sophisticated insights to drive product and business decisions. A Short History of Data in Business First, a step back in the business time machine. With VisiCalc, the first-ever spreadsheet program, in 1979 and Excel in 1987, the business world stepped into two new eras in which any employee could manage large amounts of data. The bottlenecks in business analytics had been the speed of human arithmetic or the hours available on corporate mainframes operated by only a few specialists. With spreadsheet software in every cubicle, analytical horsepower was commoditized and Excel jockeys were crowned as the arbiters of truth in business. But the era of the spreadsheet is over. The data is too large, the analyses are too complex, and mistakes are too dangerous to trust to our dear old friend the spreadsheet. Ask Carmen Reinhart and Kenneth Rogoff, two Harvard economists who published an influential paper on sovereign debt and economic growth, only to find out that the results rested in part on the accidental omission of five cells from an average. Or ask the execs at JPMorgan who lost $6 billion in the ‘London Whale’ trading debacle, also due in part of poor data practices in Excel. More broadly, a 2015 survey of large businesses in the UK reported that 17% had experienced direct financial losses because of spreadsheet errors. It’s a new era with a new scale of data, and it’s time to define new norms around management of and inferences from business data. Requirements for Modern Data Analysis Spreadsheets fundamentally lack these properties essential to modern data work. 
To do good data work today, you need to use a system that is: Reproducible It’s not personal, but I don’t trust any number that comes without supporting code. That code should take me from the raw data to the conclusions. Most analyses contain too many important detailed steps to plausibly communicate in an email or during a meeting. Worse yet, it’s impossible to remember exactly what you’ve done in a point and click environment, so doing it the same way again next time is a crap shoot. Reproducible also means efficient. When an input or an assumption changes, it should be as easy as re-running the whole thing. Versionable Code versioning frameworks, such as git, are now a staple in the workflow of most technical teams. Teams without versioning are constantly asking questions like, “Did Jim send the latest file?”, “Can I be sure that my teammate selected all columns when he re-sorted?”, or “The bottom line numbers are different in this report; what exactly changed since the first draft?” These inefficiencies in collaboration and uncertainties about the calculations can be deadly to a data team. Sharing code in a common environment also enables the reuse of modular analysis components. Instead of four analysts all inventing their own method for loading and cleaning a table of users, you can share as a group the utils/LoadUsers() function and ensure you are talking about the same people at every meeting. Scalable There are hard technical limits to how large an analysis you can do in a spreadsheet. Excel 2013 is capped at just more than 1 million rows. It doesn’t take a very large business these days to collect more than 1 million observations of customer interactions or transactions. There are also feasibility limits. How long does it take your computer to open a million row spreadsheet? How likely is it that you’ll spot a copy-paste error at row 403,658? Ideally, the same tools you build to understand your data when you’re at 10 employees should scale and evolve through your IPO. Open Many analyses meet the above ideals but have been produced with expensive, proprietary statistical software that inhibits sharing and reproducibility. If I do an analysis with open-source tools like R or Python, I can post full end-to-end instructions that anyone in the world can reproduce, check, and expand upon. If I do the same in SAS, only people willing to spend $10,000 (or more if particular modules are required) can review or extend the project. Platforms that introduce compatibility problems between versions and save their data in proprietary formats may limit access to your own work even if you are paying for the privilege. This may seem less important inside a corporate bubble where everyone has access to the same proprietary platform, but it is at the very least a turnoff to most new talent in the field. I don’t hear anyone saying that expensive proprietary data solutions are the future. What to Use, and How Short answer: R or Python. Longer answer: Here at Betterment, we use both. We use Python more for data pipeline processes and R more for modeling, analyses, and reporting. But this article is not about the relative merits of these popular modern solutions. It is about the merits of using one of them (or any of the smaller alternatives). To get the most out of a programmatic data analysis workflow, it should be truly end-to-end, or as close as you can get in your environment. 
If you are new to one or both of these environments, it can be daunting to sort through all of the tools and figure out what does what. These are some of the most popular tools in each language, organized by their layer in your full-stack analysis workflow:

Environment | R: RStudio | Python: iPython / Jupyter, PyCharm
Sourcing Data | R: RMySQL, rpostgresql, rvest, RCurl, httr | Python: MySQLdb, requests, bs4
Cleaning, Reshaping and Summarizing | R: data.table, dplyr | Python: pandas
Analysis, Model Building, Learning | R: see CRAN Task Views | Python: NumPy, SciPy, Statsmodels, Scikit-learn
Visualization | R: ggplot2, ggvis, rCharts | Python: matplotlib, d3py, Bokeh
Reporting | R: RMarkdown, knitr, shiny, rpubs | Python: IPython notebook

Sourcing Data

If there is any ambiguity in this step, the whole analysis stack can collapse on the foundation. It must be precise and clear where you got your data, and I don’t mean conversationally clear. Whether it’s a database query, a Web-scraping function, a MapReduce job, or a PDF extraction, script it and include it in your reproducible process. You’ll thank yourself when you need to update the input data, and your successors and colleagues will be thankful they know what you’re basing your conclusions on.

Cleaning, Reshaping, Summarizing

Every dataset includes some amount of errant, corrupted, or outlying observations. A good analysis excludes them based on objective rules from the beginning and then tests for sensitivity to these exclusions later. Dropping observations is also one of the easiest ways for two people doing similar analyses to reach different conclusions. Putting this process in code keeps everyone accountable and removes ambiguity about how the final analysis set was reached.

Analysis, Model Building, Learning

You’ll probably only present one or two of the scores of models and variants you build and test. Develop a process where your code organizes and saves these variants rather than discarding the ones that didn’t work. You never know when you’ll want to circle back. Try to organize analyses in a structure similar to how you present them so that the connection from claims to details is easy to make.

Visualization, Reporting

Careful, a trap is looming. So many times, the chain of reproducibility is broken right before the finish line, when plots and statistical summaries are copied onto PowerPoint slides. Doing so introduces errors, breaks the link between claims and process, and generates huge amounts of work in the inevitable event of revisions. R and Python both have great tools to produce finished reports as static HTML or PDF documents, or even interactive reporting and visualization products. It might take some time to convince the rest of your organization to receive reports in these more modern formats.

Moving your organization towards these ideals is likely to be an imperfect and gradual process. If you’re the first convert, absolutism is probably not the right approach. If you have influence in the hiring process, try to push for candidates who understand and respect these principles of data science. In the near term, look for smaller pieces of the analytical workflow which would benefit especially from the efficiencies of reproducible, programmatic analysis and reporting. Good candidates are reports that are updated frequently, require extensive collaboration, or are constantly hung up on discussions over details of implementation or interpretation.
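As a toy illustration of what such an end-to-end, scripted workflow can look like, here is a short Python/pandas sketch. The file URL, column names, and exclusion rule are hypothetical; the point is that sourcing, cleaning, and summarizing all live in code that anyone can re-run.

```python
# A minimal sketch of the end-to-end idea above, in Python with pandas.
# File URL, column names, and the exclusion rule are hypothetical.
import pandas as pd

RAW_URL = "https://example.com/transactions.csv"  # sourcing is scripted, not pasted

def load_raw(url: str = RAW_URL) -> pd.DataFrame:
    """Sourcing: every analysis starts from the same scripted pull."""
    return pd.read_csv(url, parse_dates=["created_at"])

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Cleaning: exclusions are objective rules in code, not manual deletes."""
    out = raw.dropna(subset=["amount"])
    return out[out["amount"].abs() < 1_000_000]  # documented outlier rule

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Summarizing: the table that ends up in the report is derived here."""
    return (
        df.assign(month=df["created_at"].dt.to_period("M"))
          .groupby("month")["amount"]
          .agg(total="sum", count="size")
          .reset_index()
    )

if __name__ == "__main__":
    report = summarize(clean(load_raw()))
    # Reporting: write an artifact that the report renders from, instead of
    # copy-pasting numbers into slides.
    report.to_csv("monthly_summary.csv", index=False)
```

When an input or an assumption changes, re-running this one script regenerates the report artifact, which is exactly the reproducibility property argued for above.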
Changing workflows and acquiring new skills is always an investment, but the dividends here are better collaboration, efficient iteration, transparency in process and confidence in the claims and recommendations you make. It’s worth it. 9 min read * ENGINEERING AT BETTERMENT: DO YOU HAVE TO BE A FINANCIAL EXPERT? Engineering at Betterment: Do You Have to Be a Financial Expert? When I started my engineering internship at Betterment, I barely knew anything about finance. By the end of the summer, I was working on a tool to check for money launderers and fraudsters. Last summer, I built an avatar creator for K-12 students. Now, a year later, I’m working on a tool to check for money launderers and fraudsters. How did I go from creating avatars with Pikachu ears to improving detection of financial criminals? Well, it was one part versatility of software engineering, one part courage to work in an industry I knew nothing about, and a dash of eagerness to learn as much as I could. I was on the verge of taking another internship in educational technology, commonly referred to as ‘edtech.’ But when I got the opportunity to work at Betterment, a rapidly growing company, I had to take it. Before my internship, finance, to me, was a field in which some of my peers would work more hours than I had hours of consciousness. Definitely not my cup of tea. I knew I didn’t want to work at a big bank, but I did want to learn more about the industry that employed 16.6% of my classmates at Yale. The name Betterment jumped out at me on a job listings page because it sounded like it would make my life ‘better.’ Betterment is a financial technology, or ‘fintech,’ company; while it provides financial services, it’s an engineering company at its core. Working here offered me the opportunity to learn about finance while still being immersed in tech startup culture. I was nervous to work in an industry I knew nothing about. But I soon realized it was just the opposite: Knowing less about finance motivated me to learn—quickly. When I started working at Betterment, I barely knew anything about finance. I couldn’t tell you what a dividend was. I didn’t know 401(k)s were employer-sponsored. My first task involved DTC participants, CUSIPs, and ACATS—all terms that I’d never heard before. (For the record, they stand for The Depository Trust Company, Committee on Uniform Security Identification Procedures, and Automated Customer Account Transfer Service, respectively.) A few days into my internship, I sat through a meeting about traditional and Roth IRAs wondering, what does IRA stand for? The unfortunate thing is that this is common for people my age. Personal finance is not something many college students think about—partially because it’s not taught in school and partially because we don’t have any money to worry about anyway. (Besides, no one wants to be an adult, right?) As a result, only 26% of 20-somethings have any money invested in stocks. At first, I thought my lack of exposure to finance put me at a disadvantage. I was nervous to work in an industry I knew nothing about. But I soon realized it was just the opposite: Knowing less about finance motivated me to learn—quickly. I started reading Robert Shiller’s Finance and the Good Society, a book my dad recommended to me months earlier. I searched every new term I came across and, when that wasn’t enough, asked my co-workers for help. Many of them took the time to draw diagrams and timelines to accompany their explanations. 
Soon enough, I had not only expanded my knowledge of engineering best practices, but I learned about dividends, tax loss harvesting, and IRAs (it stands for individual retirement account, in case you were wondering). The friendly atmosphere at Betterment and the helpfulness of the people here nurtured my nascent understanding of finance and turned me into someone who is passionate about investing. Before working at Betterment, I didn’t think finance was relevant to me. It took eight hours a day of working on a personal finance product for me to notice that the iceberg was even there. Now, I know that my money (well, the money I will hopefully have in the future) ideally should work hard for me instead of just sitting in a savings account. Luckily, I won’t have to struggle with building an investment portfolio or worry about unreasonable fees. I’ll just use Betterment. 4 min read * WOMEN WHO CODE: AN ENGINEERING Q&A WITH VENMO Women Who Code: An Engineering Q&A with Venmo Betterment recently hosted a Women in Tech meetup with Venmo developer Cassidy Williams, who spoke about impostor syndrome. Growing up, I watched my dad work as an electrical engineer. Every time I went with him on Take Your Child to Work Day, it became more and more clear that I wanted to be an engineer, too. In 2012, I graduated from the University of Portland with a degree in computer science and promptly moved to the Bay Area. I got my first job at Intel, where I worked as a Scala developer. I stayed there for several years until last May, when I uprooted my life to New York for Betterment, and I haven’t looked back since. As an engineer, I not only love building products from the ground up, but I’m passionate about bringing awareness to diversity in tech, an important topic that has soared to the forefront of social justice issues. People nationwide have chimed in on the conversation. Most recently, Isis Wenger, a San Francisco-based platform engineer, sparked the #ILookLikeAnEngineer campaign, a Twitter initiative designed to combat gender inequality in tech. At Betterment, we’re working on our own set of initiatives to drive the conversation. We’ve started an internal roundtable to voice our concerns about gender inequality in the workplace, we’ve sponsored and hosted Women in Tech meetups, and we’re starting to collaborate with other companies to bring awareness to the issue. Cassidy Williams, a software engineer at mobile payments company Venmo, recently came in to speak. She gave a talk on impostor syndrome, a psychological phenomenon in which people are unable to internalize their accomplishments. The phenomenon, Williams said, is something that she has seen particularly among high-achieving women—where self-doubt becomes an obstacle for professional development. For example, they think they’re ‘frauds,’ or unqualified for their jobs, regardless of their achievements. Williams’ goal is to help women recognize the characteristic and empower them to overcome it. Williams has been included as one of Glamour Magazine's 35 Women Under 35 Who Are Changing the Tech Industry and listed in the Innotribe Power Women in FinTech Index. As an engineer myself, I was excited to to speak with her after the event about coding, women in tech, and fintech trends. Cassidy Williams, Venmo engineer, said impostor syndrome tends to be more common in high-achieving women. Photo credit: Christine Meintjes Abi: Can you speak about a time in your life where ‘impostor syndrome’ was limiting in your own career? 
How did you overcome that feeling? Cassidy: For a while at work, I was very nervous that I was the least knowledgeable person in the room, and that I was going to get fired because of it. I avoided commenting on projects and making suggestions because I thought that my insight would just be dumb, and not necessary. But at one point (fairly recently, honestly), it just clicked that I knew what I was doing. Someone asked for my help on something, and then I discussed something with him, and suddenly I just felt so much more secure in my job. Can you speak to some techniques that have personally proven effective for you in overcoming impostor syndrome? Asking questions, definitely. It does make you feel vulnerable, but it keeps you moving forward. It's better to ask a question and move forward with your problem than it is to struggle over an answer. As a fellow software engineer, I can personally attest to experiencing this phenomenon in tech, but I’ve also heard from friends and colleagues that it can be present in non-technical backgrounds, as well. What are some ways we can all work together to empower each other in overcoming imposter syndrome? It's cliché, but just getting to know one another and sharing how you feel about certain situations at work is such a great way to empower yourself and empower others. It gets you both vulnerable, which helps you build a relationship that can lead to a stronger team overall. Whose Twitter feed do you religiously follow? InfoSec Taylor Swift. It's a joke feed, but they have some great tech and security points and articles shared there. In a few anecdotes throughout your talk, you mentioned the importance of having mentors and role models. Who are your biggest inspirations in the industry? Jennifer Arguello - I met Jennifer at the White House Tech Inclusion Summit back in 2013, where we hit it off talking about diversity in tech and her time with the Latino Startup Alliance. I made sure to keep in touch because I would be interning in the Bay Area, where she’s located, and we’ve been chatting ever since. Kelly Hoey - I met Kelly at a women in tech hackathon during my last summer as a student in 2013, and then she ended up being on my team on the British Airways UnGrounded Thinking hackathon. She and I both live in NYC now, and we see each other regularly at speaking engagements and chat over email about networking and inclusion. Rane Johnson - I met Rane at the Grace Hopper Celebration for Women in Computing in 2011, and then again when I interned at Microsoft in 2012. She and I started emailing and video chatting each other during my senior year of college, when I started working with her on the Big Dream Documentary and the International Women’s Hackathon at the USA Science and Engineering Festival. Ruthe Farmer - I first met Ruthe back in 2010 during my senior year of high school when I won the Illinois NCWIT Aspirations Award. She and I have been talking with each other at events and conferences and meetups (and even just online) almost weekly since then about getting more girls into tech, working, and everything in between. One of the things we chatted about after the talk was how empowering it is to have the resources and movements of our generation to bring more diversity to the tech industry. The solutions that come out of that awareness are game-changing. What are some specific ways in which companies can contribute to these movements and promote a healthier and more inclusive work culture? 
Work with nonprofits: Groups like NCWIT, the YWCA, the Anita Borg Institute, the Scientista Foundation, and several others are so great for community outreach and company morale. Educate everyone, not just women and minorities: When everyone is aware and discussing inclusion in the workplace, it builds and maintains a great company culture. Form small groups: People are more open to talking closely with smaller groups than a large discussion roundtable. Building those small, tight-knit groups promotes relationships that can help the company over time. It’s a really exciting time to be a software engineer, especially in fintech. What do you think are the biggest trends of our time in this space? Everyone's going mobile! What behavioral and market shifts can we expect to see from fintech in the next five to 10 years? I definitely think that even though cash is going nowhere fast, fewer and fewer people will ever need to make a trip to the bank again, and everything will be on our devices. What genre of music do you listen to when you’re coding? I switch between 80s music, Broadway show tunes, Christian music, and classical music. Depends on my feelings about the problem I'm working on. ;) IDE of choice? Vim! iOS or Android? Too tough to call. 7 min read * HOW WE BUILT BETTERMENT'S RETIREMENT PLANNING TOOL IN R AND JAVASCRIPT How We Built Betterment's Retirement Planning Tool in R and JavaScript Engineering Betterment’s new retirement planning tool meant finding a way to translate financial simulations into a delightful Web experience. In this post, we’ll dive into some of the engineering that took place to build RetireGuide™ and our strategy for building an accurate, responsive, and easy-to-use advice tool that implements sophisticated financial calculations. The most significant engineering challenge in building RetireGuide was turning a complex, research-driven financial model into a personalized Web application. If we used a research-first approach to build RetireGuide, the result could have been a planning tool that was mathematically sound but hard for our customers to use. On the other hand, only thinking of user experience might have led to a beautiful design without quantitative substance. At Betterment, our end goal is to always combine both. Striking the right balance between these priorities and thoroughly executing both is paramount to RetireGuide’s success, and we didn’t want to miss the mark on either dimension. Engineering Background RetireGuide started its journey as a set of functions written in the R programming language, which Betterment’s investment analytics team uses extensively for internal research. The team uses R to rapidly prototype financial simulations and visualize the results, taking advantage of R’s built-in statistical functions and broad set of pre-built packages. The investment analytics team combined their R functions using Shiny, a tool for building user interfaces in R, and released Betterment’s IRA calculator as a precursor to RetireGuide. The IRA calculator runs primarily in R, computing its advice on a Shiny server. This interactive tool was a great start, but it lives in isolation, away from the holistic Betterment experience. The calculator focuses on just one part of the broader set of retirement calculations, and doesn’t have the functionality to automatically import customers’ existing information. It also doesn’t assist users in acting on the results it gives. 
From an engineering standpoint, the end goal was to integrate much of the original IRA calculator’s code, plus additional calculations, into Betterment’s Web application to create RetireGuide as a consumer-facing tool. The result would let us offer a permanent home for our retirement advice that would be “always on” for our end customers. However, to complete this integration, we needed to migrate the entire advice tool from our R codebase into the Betterment Web application ecosystem. We considered two approaches: (1) Run the existing R code directly server-side, or (2) port our R code to JavaScript to integrate it into our Web application. Option 1: Continue Running R Directly Our first plan was to reuse the research code in R and let it continue to run server-side, building an API on top of the core functions. While this approach enabled us to reuse our existing R code, it also introduced lag and server performance concerns. Unlike our original IRA calculator, RetireGuide needed to follow the core product principles of the Betterment experience: efficiency, real-time feedback, and delight. Variable server response times do not provide an optimal user experience, especially when performing personalized financial projections. Customers looking to fine-tune their desired annual savings and retirement age in real time would have to wait for our server to respond to each scenario—those added seconds become noticeable and can impair functionality. Furthermore, because of the CPU-intensive nature behind our calculations, heavy bursts of simultaneous customers could compromise a given server’s response time. While running R server-side is a win on code-reuse, it’s a loss on scalability and user experience. Even though code reuse presented itself as a win, the larger concerns behind user experience, server lag, and new infrastructure overhead motivated us to rethink our approach, prioritizing the user experience and minimizing engineering overhead. Option 2: Port the R Code to JavaScript Because our Web application already makes extensive use of JavaScript, another option was to implement our R financial models in JavaScript and run all calculations client-side, on the end user’s Web browser. Eliminating this potential server lag solved both our CPU-scaling and usability concerns. However, reimplementing our financial models in a very different language exposed a number of engineering concerns. It eliminated the potential for any code reuse and meant it would take us longer to implement. However, in keeping with the company mission to provide smarter investing, it was clear that re-engineering our code was essential to creating a better product. Our process was heavily test-driven, during which product engineering reimplemented many of the R tests in JavaScript, understood the R code’s intent, and ported the code while modifying for client-side performance wins. Throughout the process, we identified several discrepancies between JavaScript and R function outputs, so we regularly reconciled the differences. This process added extra validation, testing, and optimizations, helping us to create the most accurate advice in our end product. The cost of maintaining a separate codebase is well worth the benefits to our customers and our code quality. A Win for Customers and Engineering Building RetireGuide—from R to JavaScript—helped reinforce the fact that no engineering principle is correct in all cases. 
While optimizing for code reuse is generally desirable, rewriting our financial models in JavaScript benefited the product in two noticeable ways: It increased testing and organizational understanding. Rewriting R to JavaScript enabled knowledge sharing and further code vetting across teams to ensure our calculations are 100% accurate. It made an optimal user experience possible. Being able to run our financial models within our customers’ Web browsers ensures an instant user experience and eliminates any server lag or CPU-concerns. 5 min read * MEET BLAZER: A NEW OPEN-SOURCE PROJECT FROM BETTERMENT (VIDEO) Meet Blazer: A New Open-Source Project from Betterment (video) While we love the simplicity and flexibility of Backbone, we’ve recently encountered situations where the Backbone router didn’t perfectly fit the needs of our increasingly sophisticated application. To meet these needs, we created Blazer, an extension of the Backbone router. We created an open-source project called Blazer to work as an extension of the Backbone router. All teams at Betterment are responsible for teasing apart complex financial concepts and then presenting them in a coherent manner, enabling our customers to make informed financial decisions. One of the tools we use to approach this challenge on the engineering team is a popular Javascript framework called Backbone. While we love the simplicity and flexibility of Backbone, we’ve recently encountered situations where the Backbone router didn’t perfectly fit the needs of our increasingly sophisticated application. To meet these needs, we created Blazer, an extension of the Backbone router. In the spirit of open-source software, we are sharing Blazer with the community. To learn more, we encourage you to watch the below video featuring Betterment’s Sam Moore, a lead engineer, who reveals the new framework at a Meetup in Betterment’s NYC offices. Take a look at Blazer. https://www.youtube.com/embed/F32QhaHFn1k 2 min read * DEALING WITH THE UNCERTAINTY OF LEGACY CODE Dealing With the Uncertainty of Legacy Code To complete our portfolio optimization, we had to tackle a lot of legacy code. And then we applied our learnings going forward. Last fall, Betterment optimized its portfolio, moving from the original platform to an upgraded trading platform that included more asset classes and the ability to weight exposure of each asset class differently for every level of risk. For Betterment engineers, it meant restructuring the underlying portfolio data model for increased flexibility. For our customers, it should result in better expected, risk-adjusted returns for investments. However, as our data model changed, pieces of the trading system also had to change to account for the new structure. While most of this transition was smooth, there were a few cases where legacy code slowed our progress. To be sure, we don't take changing our system lightly. While we want to iterate rapidly, we never compromise the security of our customers nor the correctness of our code. For this reason, we have a robust testing infrastructure and only peer-reviewed, thoroughly-tested code gets pushed through to production. What is legacy code? While there are plenty of metaphors and ways to define legacy code, it has this common feature: It’s always tricky to work with it. The biggest problem is that sometimes you're not always sure the original purpose of older code. Either the code is poorly designed, the code has no tests around it to specify its behavior, or both. 
Uncertainty like this makes it hard to build new and awesome features into a product. Engineers’ productivity and happiness decrease as even the smallest tasks can be frustrating and time-consuming. Thus, it’s important for engineers to do two things well: (a) be able to remove existing legacy code, and (b) not write code that is likely to become legacy code in the future. Legacy code is a form of technical debt—the sooner it gets fixed, the less time it will take to fix in the future.

How to remove legacy code

During our portfolio optimization, we had to come up with a framework for dealing with pieces of old code. Here’s what we considered:

We made sure we knew its purpose. If the code is not on any active or planned future development paths and has been working for years, removing it probably isn’t worth it. Legacy code can take a long time to properly test and remove.

We made a good effort to understand it. We talked to other developers who might be more familiar with it. During the portfolio update project, we routinely brought a few engineers together to diagram trading system flow on a whiteboard.

We wrote tests around the methods in question. It’s important to have tests in place before changing code, to be as confident as possible that the behavior of the code is not changing during refactoring. Hopefully, it is possible to write unit tests for at least a part of the method’s behavior. Write unit tests for a piece of the method, then refactor that piece. Once the tests are passing, write more tests for the next piece, and repeat the test, refactor, test, refactor process.

Fortunately, we were able to get rid of most of the legacy code encountered during the portfolio optimization project using this method.

Then there are outliers

Yet sometimes even the best practices still didn’t apply to a piece of legacy code. In fact, sometimes it was hard to even know where to start to make changes. In my experience, the best approach was to jump in and rewrite a small piece of untested code, and then add tests for the rewritten portion appropriately.

Write characterization tests

We also experimented with characterization tests. First proposed by Michael Feathers (who wrote the bible on working with legacy code), these tests simply take a set of verified inputs/outputs from the existing production legacy code and then assert that the output of the new code is the same as the legacy code’s under the same inputs. Several times we ran into corner cases around old users, test users, and other anomalous data that caused false positive failures in our characterization tests. These in turn led to lengthy investigations that consumed a lot of valuable development time. For this reason, if you do write characterization tests, we recommend not going too far with them. Handle a few basic cases and be done with them. Get better unit or integration tests in place as soon as possible.
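For illustration, a characterization test in this style might look like the following. This is a hypothetical sketch in Python with pytest, not Betterment’s code or stack; the recorded cases stand in for verified input/output pairs captured from the legacy code in production.

```python
# Illustrative only: a characterization test in the style described above,
# sketched with pytest and hypothetical names. The recorded fixtures would
# come from verified runs of the existing production (legacy) code.
import json
import pytest

from trading import allocate_trades  # hypothetical: the code being refactored

# Each fixture pairs a captured input with the output the legacy code produced.
with open("fixtures/legacy_allocations.json") as f:
    RECORDED_CASES = json.load(f)

@pytest.mark.parametrize("case", RECORDED_CASES)
def test_refactored_code_matches_legacy_behavior(case):
    # Keep these to a handful of basic cases; corner cases around anomalous
    # old data tend to produce false positives and eat development time.
    result = allocate_trades(case["input"])
    assert result == case["expected_output"]
```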
Build extra time into project estimates

Legacy code can also be tricky when it comes to project estimates. It is notoriously hard to estimate the complexity of a task when it needs to be built into or on top of a legacy system. In our experience, it has always taken longer than expected. The portfolio optimization project took longer than initially estimated. Also, if database changes are part of the project (e.g., dropping a database column that no longer makes sense in the current code structure), it’s safe to assume that there will be data issues that will consume a significant portion of developer time, especially with older data.

Apply the learnings to the future

The less legacy code we have, the less we have to deal with the aforementioned processes. The best way to avoid legacy code is to make a best effort at not writing it in the first place. For example, we follow a set of pragmatic design principles drawn from SOLID (the acronym Michael Feathers coined for Robert C. Martin’s design principles) to help ensure code quality. All code is peer reviewed and does not go to production if there is not adequate test coverage or if the code is not up to design standards. Our unit tests are not only there to test behavior and drive good design, but should also be readable to the extent that they help document the code itself. When writing code, we try to keep in mind that we probably won’t come back later and clean up the code, and that we never know who the next person to touch this code will be. Betterment has also established a “debt day”: once every month or two, all developers take one day to pay down technical debt, including legacy code.

The Results

It’s important to take a pragmatic approach to refactoring legacy code. Taking the time to understand the code and write tests before refactoring will save you headaches in the future. Companies should strive for a fair balance between adding new features and refactoring legacy code, and should establish a culture where thoughtful code design is a priority. By incorporating many of these practices, it is steadily becoming more and more fun to develop on the Betterment platform. And the Betterment engineering team is avoiding the dreaded productivity and happiness suck that happens when working on systems with too much legacy code.

Interested in engineering at Betterment? Betterment is an engineering-driven company that has developed the most-trusted online financial advisor based on the principles of optimization and efficiency. Learn more about engineering jobs and our culture.

Determination of most trusted online financial advisor reflects Betterment LLC’s distinction of having the most customers in the industry, made in reliance on customer counts, self-reported pursuant to SEC rules, across all online-only registered investment advisors.

7 min read

This Is How You Bootstrap a Data Team

Data alone is not enough—we needed the right storytellers.

Six months ago, I packed up my travel-sized toothbrush kit, my favorite coffee mug now filled with pens and business cards, and a duffel bag full of gym socks and free conference tee-shirts. With my start-up survival kit in tow, it was time to move on from my job as a back-office engineer.

From the left: Avi Lederman, data warehousing engineer; Yuriy Goldman, engineering lead; Jon Mauney, data analyst; Nick Petri, data analyst; and Andrew Weisgall, marketing analyst.

I dragged my chair ten feet across the office and began my new life as the engineering lead of Betterment’s nascent data team—my new mates included two talented data analysts, a data warehousing engineer and a marketing analyst, also the product owner. I was thrilled. There was a lot for us to do. In our new roles, we are now informing and guiding many of the ongoing product and marketing efforts at Betterment.
Thinking big, we decided to dub ourselves Team Polaris after the sky's brightest star. Creating a tighter feedback loop Even though our move to create an in-house data team was a natural part of our own engineering team evolution here at Betterment, it’s still something of a risky unknown for most companies. Business intelligence tooling has traditionally been something that comes at a great upfront cost to an organization (it can reach into the millions of dollars)—but as a startup, we instead looked carefully at how we could leverage our homegrown talent and resources to build a team to seamlessly integrate into the existing company architecture. Specifically, we wanted a tight feedback loop between the business and technology so that we could experiment and figure out what worked before committing real dollars to a solution—aka high-frequency hypothesis testing. We needed a team responsible for collecting, curating and presenting the data—and our data had to be trustworthy for objective metric-level reporting to the organization. Our work consisted of collaborating with our marketing, analytics, and product teams to establish systems and practices that: Measure progress towards high level goals Optimize growth and conversion Support product and project strategy Improve customer outcome A guide to tactical decisions With these requirements in mind, here are some of the tactical decisions we made from the start to get our new data team off the ground. In the future, expect to read more from our team about how we use our data insights to drive product and growth development at Betterment. 1. Define our process For us the obvious first order of business was to deliver continuous, incremental value and gradual transition from legacy systems to new ones. Our initial task was to interview internal stakeholders to get at their data-related pain points. We sent out questionnaires in advance but collected answers through face-to-face dialogue. A couple of hours of focused conversation defined a six-month tactical focus for the team. Then, with our meticulous notes compiled, it became clear to us that our major challenges lay with the accessibility to and reliability of key performance metrics. With the interviews in hand, the team sat down to pen a manifest and define pillars by which we would measure our progress. We came up with ACES: Automated, Consistent, Efficient, and Self-serviced as the motifs by which we could create a measurable feedback loop. 2. Inform the roadmap Within three weeks of operations, it became clear that we could use turn-around time metrics from ad-hoc or advisory requests to inform us where we need to invest in project cycles and technology. Yet busy with data projects we were feeling the pain ourselves. We needed more easily accessible business measures with sufficient context by which we and our colleagues could roll up or slice and dice our data. We knew that a star schema approach would help us clarify a data narrative and give all of us a consistent view of truth. But there was no way for us to do it all at once. 3. Limit disruption while we build To limit disruption to our colleagues while delivering incremental improvements, we implemented a clever and completely practical transition plan within MySQL’s native feature set. Specifically, we set up a new database server dedicated to reporting and ad-hoc workloads. This dedicated MySQL instance consisted of three database schemas we now refer to as our Triumvirate Data Warehouse. 
The first member of this triad is betterment_live. This database is a complete, real-time, read-only replica of our production database. It’s just native MySQL master-slave replication; easy to set up and maintain on dedicated hardware or in the cloud. The second member is client_analytics. It is a read-write schema to which our colleagues have full privileges. The usage pattern is for folks to connect to client_analytics and from there to: cross-query against the betterment_live schema, import/export and manipulate custom datasets with Python or R, perform regression and analysis, etc. Everybody wins. Our data workers retain their ability to run existing processes until we can transition them to a “better” way while the engineering team has successfully expelled business users out of an already busy production environment. Last but certainly not least is our new baby, the data warehouse. It is a read-only, star-schema representation of fact and dimensional tables for growth subject areas. We’ve pushed the aforementioned nuisance and complexity into our data pipeline (ETL) process and are able to synthesize atomic and summary metrics in a format that is more intuitive for our business users. Legacy workloads that are complex and underperforming can now be transitioned over to the data warehouse schema incrementally. Further, because all three schemas live in the same MySQL server, client_analytics becomes a central hub from which our colleagues can join tables that have not yet been modeled in the warehouse with key dimensions that have been. They get the best of both worlds while we look to what comes next Finally, transition is prioritized in-stream with the needs of the organization and we never bite off more than we can chew. 4. Standardize and educate A major part of our data warehouse build out was in clarifying definitions of business terms and key metrics present in our daily parlance. Maintaining a Data Dictionary wiki became a part of our Definition of Done. Our dashboards, displayed on large screen TVs and visible by all, were the first to be relabeled and remodeled. Reports available to the entire office were next. Cleaning up the most looked at metrics helped the organization speak to and understand key data in a consistent manner. 5. Maintain a tight feedback loop The team follows an agile process familiar to modern technology organizations. We Scrum, we Git, and we Jenkins. We stay in regular contact with stakeholders throughout a build-out and iterate over MVPs. Now, back to the future These are just the first few bootstrapping steps. In future posts I will be tempted to wax technical and provide more color on the choices we’ve made and why. I will also share our vision for an Event Narrative Data Warehouse and how we are leveraging start-up friendly partners such as MixPanel for real-time event processing, funneling, and segmentation. Finally, we will share some tactics for enabling data scientists to be more collaborative and presentational with their R or Python visualizations. At Betterment, our ultimate goal is to continue developing products that change the investing world—and that starts with data. But data alone is not enough—we needed the right storytellers. As we see it, the members of Team Polaris are the bards of a data narrative that help the organization grow while delivering a top-tier product. Interested in engineering at Betterment? 
Betterment is an engineering-driven company that has developed the most trusted online financial advisor based on the principles of optimization and efficiency. Learn more about engineering jobs and our culture. Determination of most trusted online financial advisor reflects Betterment LLC's distinction of having the most customers in the industry, made in reliance on customer counts, self-reported pursuant to SEC rules, across all online-only registered investment advisors. 7 min read * ONE MASSIVE MONTE CARLO, ONE VERY EFFICIENT SOLUTION One Massive Monte Carlo, One Very Efficient Solution We optimized our portfolio management algorithms in six hours for less than $500. Here’s how we did it. Optimal portfolio management requires managing a portfolio in real-time, including taxes, rebalancing, risk, and circumstantial variables like cashflows. It’s our job to fine-tune these to help our clients, and it’s very important we have these decisions be robust to the widest possible array of potential futures they might face. We recently re-optimized our portfolio to include more complex asset allocations and risk models (and it will soon be available). Next up was optimizing our portfolio management algorithms, which manage cashflows, rebalances, and tax exposures. It’s as if we optimized the engine for a car, and now we needed to test it on the race track with different weather conditions, tires, and drivers. Normally, this is a process that can literally take years (and may explain why legacy investing services are slow to switch to algorithmic asset allocation and advice.) But we did things a little differently, which saved us thousands of computing hours and hundreds of thousands of dollars. First, the Monte Carlo The testing framework we used to assess our algorithmic strategies needed to fulfill a number of criteria to ensure we were making robust and informed decisions. It needed to: Include many different potential futures Include many different cash-flow patterns Respect path dependence (taxes you pay this year can’t be invested next year) Accurately test how the algorithm would perform if run live. To test our algorithms-as-strategies, we simulated the thousands of potential futures they might encounter. Each set of strategies was confronted with both bootstrapped historical data and novel simulated data. Bootstrapping is a process by which you take random chunks of historical data and re-order it. This made our results robust to the risk of solely optimizing for the past, a common error in the analysis of strategies. We used both historic and simulated data because they complement each other in making future-looking decisions: The historical data allows us to include important aspects of return movements, like auto-correlation, volatility clustering, correlation regimes, skew, and fat tails. It is bootstrapped (sampled in chunks) to help generate potential futures. The simulated data allows us to generate novel potential outcomes, like market crashes bigger than previous ones, and generally, futures different than the past. The simulations were detailed enough to replicate how they’d run in our live systems, and included, for example, annual tax payments due to capital gains over losses, cashflows from dividends and the client saving or withdrawing. It also showed how an asset allocation would perform over the lifetime of an investment. During our testing, we ran over 200,000 simulations of 12 daily level returns of our 12 asset classes for 20 year's worth of returns. 
We included realistic dividends at an asset class level. In short, we tested a heck of a lot of data. Normally, running this Monte Carlo would have taken nearly a full year to complete on a single computer, but we created a far more nimble system by piecing together a number of existing technologies. By harnessing the power of Amazon Web Services (specifically EC2 and S3) and a cloud-based message queue called IronMQ, we reduced that testing time to just six hours—and for a total cost of less than $500.

How we did it

1. Create an input queue: We created a bucket with every simulation—more than 200,000—we wanted to run. We used IronMQ to manage the queue, which allows individual worker nodes to pull inputs themselves instead of relying on a system to monitor worker nodes and push work to them. This solved the problem found in traditional systems where a single node acts as the gatekeeper, which can get backed up, either breaking the system or leading to idle testing time.

2. Create 1,000 worker instances: With Amazon Web Services, we signed up to access time on 1,000 virtual machines. This increased our computing power a thousandfold, and buying time is cheap on these machines. We employed m1.small instances, relying on the quality of quantity.

3. Each machine pulls a simulation: Thanks to the maturation of modern message queues, it is more advantageous and simpler to orchestrate jobs in a pull-based fashion than with the old push system, as we mentioned above. In this model there is no single controller. Instead, each worker acts independently. When the worker is idle and ready for more work, it takes it upon itself to go out and find it. When there’s no more work to be had, the worker shuts itself down. (A sketch of this worker loop follows below.)

4. Store results in a central location: We used another Amazon Web Services product, S3, to store the results of each simulation. Each file — with detailed asset allocation, tax, trading and returns information — was archived inexpensively in the cloud. Each file was also named algorithmically to allow us to refer back to it and do granular audits of each run.

5. Download results for local analysis: From S3, we could download the summarized results of each of our simulations for analysis on a “regular” computer. The resulting analytical master file was still large, but small enough to fit on a regular MacBook Pro.
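As referenced in step 3, here is a minimal sketch of that pull-based worker loop. It is illustrative only: the queue client, bucket name, simulation ID field, and simulation function are hypothetical stand-ins (Betterment used IronMQ and EC2 workers, whose APIs differ), and the S3 upload uses boto3.

```python
# A sketch of the pull-based worker loop described above, with hypothetical
# names. The queue client is stubbed out; S3 uploads use boto3.
import json
import boto3

RESULTS_BUCKET = "simulation-results"  # hypothetical bucket name

def run_worker(queue, run_simulation):
    """queue.pop() -> dict of simulation params, or None when the queue is
    empty (a stand-in for the real message-queue client)."""
    s3 = boto3.client("s3")
    # No central controller: each idle worker pulls work for itself and
    # shuts itself down when the queue is drained.
    while True:
        params = queue.pop()
        if params is None:
            break  # no more work to be had
        # Detailed asset allocation, tax, trading, and returns output.
        result = run_simulation(params)
        # Name each result file algorithmically so individual runs can be
        # audited later.
        key = "sim-{0}.json".format(params["simulation_id"])
        s3.put_object(Bucket=RESULTS_BUCKET, Key=key, Body=json.dumps(result))
```

The design choice worth noting is that the workers ask for work rather than being assigned it, so there is no single gatekeeper node to back up or break.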
Until now, we executed transactions based on fixed weights: a precise allocation of assets for every level of risk. Now, in our updated portfolio with its more sophisticated approach to allocation, we are using a matrix to manage asset weights, and that requires more complex trading logic. From an engineering perspective, this means we needed to enhance the code in our existing trading platform to accommodate dynamic asset allocation, with an eye towards future enhancements in our pipeline. Here's how we did it.

1. Build a killer testing framework

When dealing with legacy code, one of our top priorities is to preserve existing functionality. Failure to do so could mean anything from creating a minor inconvenience to blocking trades from executing. That meant our first step was to build a killer testing framework. The novelty of our approach was to build partial, precise scaffolding around our current platform. This scaffolding allowed us to go in and out of the current platform to capture and store precise inputs and outputs, while isolating them from anything that wasn't relevant to the core trading processes.

2. Isolate the right information

With this abstraction, we were able to isolate the absolute core objects that we need to perform trades, and ignore the rest. This did two things: it took testing off the developers' plates early in the process, allowing them to focus on writing production code, and it helped isolate the central objects that required most of their attention.

The parent object of any activity inside the Betterment platform is a "user transaction": that includes deposits or withdrawals for a goal, dividends, allocation changes, transfers of money between goals, and so on. These were our inputs. In most cases, a user transaction will eventually be the parent of several trade objects. These were our outputs.

In our updated portfolio, the number of possible transaction types did not change. What did change, however, was how each transaction type was translated into trading activity, which is what we wanted to test exhaustively. We captured a mass of user transaction objects from production for use in testing. However, a user transaction object contains a host of data that isn't relevant to the trades that will eventually be created, and is associated with other objects that are also not relevant. Stripping out all non-trading data was the key to focusing on the right things to test for this project.

3. Use a SQLite database to be efficient

The best way to store the user transaction objects was as JSON, a human-readable representation of the objects. To do this, we used GSON, which lets you convert Java objects into JSON and vice versa. We didn't want to store the JSON in a MySQL database, because managing it would be unnecessary overhead for this purpose. Instead, we stored the objects in a flat SQLite database. On the way into SQLite, GSON allowed us to "flatten" the objects, leaving only the bits that pertained to trading and discarding the rest. Then, we could rearrange these chunks to replicate all sorts of trading activity patterns. On the way out, GSON would re-inflate the JSON back into Java objects, using dummy values for the irrelevant fields, providing us with test inputs ready to be pushed through our system.
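The production implementation uses Java and GSON, but the capture, flatten, and re-inflate pattern is simple enough to sketch in a few lines. Here is a rough Python analogue using the standard json and sqlite3 modules; the field names and dummy values are invented stand-ins, not our actual schema.

    import json
    import sqlite3

    # Hypothetical captured object: only some fields matter for trading.
    TRADING_FIELDS = ["transaction_type", "goal_id", "amount", "allocation"]

    def flatten(user_transaction):
        """Keep only trading-relevant fields and serialize to JSON."""
        return json.dumps({k: user_transaction[k] for k in TRADING_FIELDS})

    def reinflate(payload):
        """Rebuild a full-shaped object, filling irrelevant fields with dummies."""
        obj = {"account_id": -1, "created_at": None, "audit_trail": []}  # dummy values
        obj.update(json.loads(payload))
        return obj

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fixtures (id INTEGER PRIMARY KEY, payload TEXT)")

    captured = {"transaction_type": "DEPOSIT", "goal_id": 42, "amount": 250.0,
                "allocation": 0.7, "account_id": 9001, "created_at": "2013-06-01",
                "audit_trail": ["..."]}
    conn.execute("INSERT INTO fixtures (payload) VALUES (?)", (flatten(captured),))

    for (payload,) in conn.execute("SELECT payload FROM fixtures"):
        test_input = reinflate(payload)  # ready to push through the trading logic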
We did the same for outputs, which were also full of "noise" for our purposes. We'd shrink the expected results we got from production, then re-inflate and compare them to what our tests produced.

4. Do no harm to others' work

At Betterment, we are constantly pushing through new features and enhancements, some visible to customers, but many not. Development on these is concurrent, sometimes impacting global objects and schemas, and it was essential to insulate the team working on core trading functionality from all other development being done at the company. The portfolio transition work alone includes significant new code for front-end enhancements that have nothing to do with trading. The GSON/JSON/SQLite testing framework helped the trading team maintain laser focus on its task as it worked under the hood. Otherwise, we'd be putting a sweet new set of tires on a car that won't start!

Three Things I Learned In My Engineering Internship

I knew I had a lot to learn about how a Web app works, but I never imagined that it involved as much as it does.

This post is part of a series of articles written by Betterment's 2013 summer interns.

This summer, I had the privilege of participating in a software engineering internship with Betterment. My assignment was to give everyone in the office a visual snapshot of how the company is doing. This would be accomplished through dashboards displayed on TV screens inside the office. We wanted to highlight metrics such as net deposits, assets under management, and conversions of visitors to the site into Betterment customers. Coming in with experience in only Java, I found this a challenging project to tackle. Now that the summer has ended, I have accomplished my goal: I created five dashboards displaying charts, numbers, and maps with valuable data that everyone can see. From this experience, there are three very important things that I've learned.

1. School has taught me nothing.

Maybe this is a bit of an exaggeration. As a computer science major, school has taught me how to code in Java, and maybe some of the theoretical stuff that I've had drilled into my head will come in handy at some point in my life. However, writing mathematical proofs and small Java programs that complete standalone tasks seems pretty pointless now that I've experienced the real world of software development. There are so many links in the development chain, and what I have learned in school barely covers half of a link. Not to mention that almost everything else I needed I was able to learn through Google, which makes me wonder whether I could have learned Java through the Internet in a few weeks rather than spending the past two years in school. Needless to say, I definitely wish I could stay and work with Betterment rather than going back to school next week, but today's society is under the strange impression that a college degree is important, so I guess I'll finish it out.

2. The structure of a Web app is a lot more complex than what the user sees on the page.

Before I began my internship, I had never worked on a Web app before. I knew I had a lot to learn about how it all works, but I never imagined that it involved as much as it does. There's a database on the bottom, then the backend code is layered on top of that and broken up into multiple levels to keep different kinds of logic separate. And on top of all that is the front-end code.
All of it is kept together with frameworks that allow the different pieces to communicate with each other, and there are servers that the app needs to run on. This was extremely eye-opening for me, and I'm so glad that the engineers at Betterment spent time during my first week getting me up to speed on all of it. I built my dashboards as a Web app, so I not only needed to understand this structure, but I needed to implement it as well.

3. A software engineer needs to be multilingual.

I'm not talking about spoken languages. The different pieces in the structure of a Web app are usually written in different computer languages. Since Java covered only a small piece of this structure, I had a lot of languages to learn. Accessing the database requires knowledge of SQL, a lot of scripts are written in Python, front-end structure and design are written in HTML and CSS, and front-end animation is written in JavaScript. In order to work effectively on multiple pieces of an app, an engineer needs to be fluent in several languages. Thankfully, the Internet makes learning languages quick and easy, and I was able to pick up many new languages throughout the summer.

My experience this summer has been invaluable, and I will be returning to school with a brand new view of software development and what a career in this awesome field will be like.

Keeping Our Code Base Simple, Optimally

Betterment engineers turned regulatory compliance rules into an optimization problem to keep the code base simple. Here's how they did it.

At Betterment, staying compliant with regulators, such as the Securities and Exchange Commission, is a part of everyday life. We've talked before about how making sure everything is running perfectly, especially given all the cases we need to handle, makes us cringe at the cyclomatic complexity of some of our methods. It's a constant battle to keep things maintainable, readable, testable, and efficient. We recently put some code into production that uses an optimizer to cut down on the amount of code we're maintaining ourselves, and it turned out to be pretty darn cool. It makes communicating with our regulators easier, and it does so in a pretty impressive fashion.

We were tasked with coming up with an algorithm that, at first pass, made me nervous about all the different cases it would need to handle in order to do things intelligently. Late one night, we started bouncing ideas off each other on how to pull it off. We needed to make decisions at a granular level, test how they affected the big picture, and then adjust accordingly. To use a Seinfeld analogy, the decisions we would make for Jerry had an effect on what the best decisions were for Elaine. But if Elaine was set up a certain way, we wanted to go back to Jerry and adjust the decisions we made for him. Then George. Then Newman. Then Kramer. Soon we had thought about so many if-statements that they no longer seemed like if-statements, and all the abstractions I was formulating were already leaking.

Then a light came on: we could not only make good decisions for Elaine, Jerry, and Newman, we could make those decisions optimally.

A quick disclaimer before we dig in: I can barely scratch the surface of how solvers work. I just happen to know that a solver was a tool available to us, and it happened to model the problem we needed to solve very well.
This is meant as an introduction to using one specific solver as a way to model and solve a problem.

An example

Let's say that, at the last minute, the Soup Nazi is out to make the biggest batch of soup he possibly can. His recipe calls for a ratio of:

40% chicken
12% carrots
8% thyme
15% onions
15% noodles
5% garlic
5% parsley

All of the stores around him keep only limited amounts in stock. He calls around to all the stores to see what they have in stock and puts together each store's inventory:

Ingredients in stock (lbs)

    Ingredient   Elaine's   George's   Jerry's   Newman's
    Chicken          5          6          2          3
    Carrots          1          8          5          2
    Thyme            3         19         16          6
    Onions           6         12         10          4
    Noodles          5          0          3          9
    Garlic           2          1          1          0
    Parsley          3          6          2          1

Also, the quality of the bags varies from store to store, limiting the total number of pounds of food the Soup Nazi can carry back. (We're also assuming he wants to make at most one visit to each store.)

Pounds-of-food limits

    Elaine's    12
    George's     8
    Jerry's     15
    Newman's    17

With the optimizer, the function that we are trying to minimize or maximize is called the objective function. In this example, we are trying to maximize the number of pounds of ingredients he can buy, because that will result in the most soup. If we say that

a1 = pounds of chicken purchased from Elaine's
a2 = pounds of carrots purchased from Elaine's
a3 = pounds of thyme purchased from Elaine's
...
a7 = pounds of parsley purchased from Elaine's
b1 = pounds of chicken purchased from George's
...
c1 = pounds of chicken purchased from Jerry's
...
d1 = pounds of chicken purchased from Newman's
...

then we're looking to maximize

a1 + a2 + a3 + ... + b1 + ... + d7 = total pounds

We then have to throw in all of the constraints on our problem. First, to make sure the Soup Nazi gets the ratio of ingredients he needs:

.40 * total pounds = a1 + b1 + c1 + d1
.12 * total pounds = a2 + b2 + c2 + d2
.08 * total pounds = a3 + b3 + c3 + d3
.15 * total pounds = a4 + b4 + c4 + d4
.15 * total pounds = a5 + b5 + c5 + d5
.05 * total pounds = a6 + b6 + c6 + d6
.05 * total pounds = a7 + b7 + c7 + d7

Then, to make sure the Soup Nazi doesn't buy more pounds of food from any one store than he can carry back:

a1 + a2 + ... + a7 <= 12
b1 + b2 + ... + b7 <= 8
c1 + c2 + ... + c7 <= 15
d1 + d2 + ... + d7 <= 17

We then have to put bounds on all of our variables to say that we can't take more pounds of any ingredient than the store has in stock:

0 <= a1 <= 5
0 <= a2 <= 1
0 <= a3 <= 3
0 <= a4 <= 6
...
0 <= d7 <= 1

That expresses all of the constraints and bounds of our problem, and the optimizer works to maximize or minimize the objective function subject to them. The optimization package we're using in this example, Python's scipy.optimize, provides a very expressive interface for specifying all of those bounds and constraints.

Translating the problem into code

If you want to jump right in, check out the full sample code. However, there are still a few more things to note:

Get numpy and scipy installed.

The variables we're solving for are put into a single list: x = [a1, a2, ..., a7, b1, b2, ..., d7].

With Python slicing, we can pull the pounds of a particular ingredient across all stores out of x, i.e., [a1, b1, c1, d1], with x[ingredient_index :: num_of_ingredients]. Likewise, we can pull out the ingredients for a given store, e.g., [b1, b2, b3, b4, b5, b6, b7], with x[store_index * num_of_ingredients : store_index * num_of_ingredients + num_of_ingredients].

For this example, we're using the scipy.optimize.minimize function with the 'SLSQP' method.
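Before walking through each argument individually, here is a condensed, self-contained sketch of how the whole problem can be handed to scipy.optimize.minimize. It follows the approach described in this post, but it is not the full sample code linked above, and the helper names are mine.

    import numpy as np
    from scipy.optimize import minimize

    stores = ["Elaine's", "George's", "Jerry's", "Newman's"]
    ingredients = ["chicken", "carrots", "thyme", "onions", "noodles", "garlic", "parsley"]
    ratios = [0.40, 0.12, 0.08, 0.15, 0.15, 0.05, 0.05]

    # stock[s][i] = pounds of ingredient i in stock at store s
    stock = np.array([[5, 1, 3, 6, 5, 2, 3],
                      [6, 8, 19, 12, 0, 1, 6],
                      [2, 5, 16, 10, 3, 1, 2],
                      [3, 2, 6, 4, 9, 0, 1]], dtype=float)
    max_per_store = [12.0, 8.0, 15.0, 17.0]
    n_i = len(ingredients)

    def objective(x):
        # minimize the negative total to maximize the pounds purchased
        return -np.sum(x)

    # Bounds: can't buy more of any ingredient than a store has in stock.
    bounds = [(0, stock[s][i]) for s in range(len(stores)) for i in range(n_i)]

    constraints = []
    for i, ratio in enumerate(ratios):
        # ingredient i must be at least its share of the total ('ineq' means >= 0)
        constraints.append({"type": "ineq",
                            "fun": lambda x, i=i, r=ratio: np.sum(x[i::n_i]) - r * np.sum(x)})
    for s in range(len(stores)):
        # can't carry home more than the bag limit from store s
        constraints.append({"type": "ineq",
                            "fun": lambda x, s=s: max_per_store[s] - np.sum(x[s * n_i:(s + 1) * n_i])})

    x0 = stock.flatten()  # initial guess: everything each store has in stock
    result = minimize(objective, x0, method="SLSQP", bounds=bounds,
                      constraints=tuple(constraints))
    print(round(-result.fun, 2), "total pounds of ingredients purchased")

The sections below unpack each of these arguments in turn.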
Arguments provided to the minimize function

Objective function

With the package we're using, there is no option to maximize. This might seem like a showstopper, but we get around it by negating our objective function, minimizing, and then negating the result. Our objective function therefore becomes

-a1 - a2 - a3 - a4 - ... - d6 - d7

and expressing that with numpy is pretty painless:

    numpy.sum(x) * -1.0

Bounds

Bounds make sure that we don't take more of any one ingredient than the store has in stock. The minimize function takes these in as a list of tuples whose indices line up with x. We can't take negative ingredients from a store, so the lower bound is always 0:

    [(0, 5), (0, 1), ..., (0, 1)]

In the code example, for readability, I threw all of the program's inputs into some global dictionaries. We can therefore calculate our bounds with:

    def calc_bounds():
        bounds = []
        for s in stores:
            for i in ingredients:
                bounds.append((0, store_inventory[s][i]))
        return bounds

Guess

Providing a good initial guess can go a long way in getting you to a desirable solution, and it can dramatically reduce the amount of time it takes to solve a problem. If you're not seeing numbers you expect, or it is taking a long time to come up with a solution, the initial guess is often the first place to look. For this problem, we made our initial guess what each store had in stock, and we supplied it to the minimize method as a list.

Constraints

One thing to note is that for the package we're using, constraints come in only two types, 'ineq' and 'eq', where 'ineq' means the expression must be greater than or equal to zero; the right-hand side of the equation is assumed to be zero. Also, we provide the constraints as a tuple of dictionaries.

(a1 + b1 + c1 + d1) - (.40 * total pounds) >= 0
...
(a7 + b7 + c7 + d7) - (.05 * total pounds) >= 0

Note that I changed the ratio constraints from equal-to to greater-than, because testing floats for exact equality is a hard problem when you're multiplying and adding numbers. So, to make sure chicken makes up 40% of the overall ingredients, one element of the constraints tuple will be:

    {'type': 'ineq',
     'fun': lambda x: sum(extract_ingredient_specific_pounds(x, chicken)) - (calc_total_pounds_of_food(x) * .4)}

Making sure the Soup Nazi is able to carry everything back from each store:

12 - a1 - a2 - ... - a7 >= 0
...
17 - d1 - d2 - ... - d7 >= 0

leads to:

    {'type': 'ineq',
     'fun': lambda x: max_per_store[store] - np.sum(extract_store_specific_pounds(x, store))}

Hopefully this gives you enough information to make sense of the code example.

The Results?

Pretty awesome. The Soup Nazi should buy only 40 lbs worth of ingredients in total, because Elaine, George, Jerry, and Newman just don't have enough chicken.

9.830 lbs of food from Elaine's. Able to carry 12.0 pounds.
    chicken: 5.000 lbs (5.0 in stock)
    carrots: 0.000 lbs (1.0 in stock)
    thyme:   0.000 lbs (3.0 in stock)
    onions:  0.699 lbs (6.0 in stock)
    noodles: 1.000 lbs (5.0 in stock)
    garlic:  1.565 lbs (2.0 in stock)
    parsley: 1.565 lbs (3.0 in stock)

7.582 lbs of food from George's. Able to carry 8.0 pounds.
    chicken: 6.000 lbs (6.0 in stock)
    carrots: 0.667 lbs (8.0 in stock)
    thyme:   0.183 lbs (19.0 in stock)
    onions:  0.733 lbs (12.0 in stock)
    noodles: 0.000 lbs (0.0 in stock)
    garlic:  0.000 lbs (1.0 in stock)
    parsley: 0.000 lbs (6.0 in stock)

13.956 lbs of food from Jerry's. Able to carry 15.0 pounds.
    chicken: 2.000 lbs (2.0 in stock)
    carrots: 3.501 lbs (5.0 in stock)
    thyme:   3.017 lbs (16.0 in stock)
    onions:  4.568 lbs (10.0 in stock)
    noodles: 0.000 lbs (3.0 in stock)
    garlic:  0.435 lbs (1.0 in stock)
    parsley: 0.435 lbs (2.0 in stock)

8.632 lbs of food from Newman's. Able to carry 17.0 pounds.
    chicken: 3.000 lbs (3.0 in stock)
    carrots: 0.632 lbs (2.0 in stock)
    thyme:   0.000 lbs (6.0 in stock)
    onions:  0.000 lbs (4.0 in stock)
    noodles: 5.000 lbs (9.0 in stock)
    garlic:  0.000 lbs (0.0 in stock)
    parsley: 0.000 lbs (1.0 in stock)

16.000 lbs of chicken. 16.0 available across all stores. 40.00%
4.800 lbs of carrots. 16.0 available across all stores. 12.00%
3.200 lbs of thyme. 44.0 available across all stores. 8.00%
6.000 lbs of onions. 32.0 available across all stores. 15.00%
6.000 lbs of noodles. 17.0 available across all stores. 15.00%
2.000 lbs of garlic. 4.0 available across all stores. 5.00%
2.000 lbs of parsley. 12.0 available across all stores. 5.00%

Bringing it all together

Hopefully this gives you a taste of the types of problems optimizers can be used for. At Betterment, instead of picking pounds of ingredients from a given store, we are using one to piece together a mix of securities in order to keep us compliant with certain regulatory specifications. While there was a lot of work involved in making our actual implementation production-ready (and a lot more work can be done to improve it), being able to express rules coming out of a regulatory document as a series of bounds and constraints via anonymous functions was a win for the readability of our code base. I'm also hoping that it will make tacking on additional rules painless in comparison to weaving them into a one-off algorithm.

JOIN OUR OPEN SOURCE PROJECTS

TEST_TRACK: Server app for the TestTrack multi-platform split-testing and feature-gating system.
WEBVALVE: Betterment's framework for locally developing and testing service-oriented apps in isolation with WebMock and Sinatra-based fakes.
BETTER_TEST_REPORTER: Tooling and libraries for processing dart test output into dev-friendly formats.
DELAYED: A multi-threaded, SQL-driven ActiveJob backend used at Betterment to process millions of background jobs per day.

All four are on GitHub.