Wednesday, November 6, 2013

A simple demonstration of the benefits of minification on the Healthcare.gov Marketplace. What happened?

(This was originally a pull request on the healthcare.gov repo that was taken down - it has since been moved to its own repository)

Notes


CGI Federal has not released the source to the webapp powering Healthcare.gov. This pull request is not meant to be merged into this repository. For lack of a better place, I have put it here in hopes that it will get some eyes. This PR is directed at CGI Federal, not Development Seed, who has done some clean and responsible work on their part of the project. I hope they will allow me to occupy this space for a little while so this story can be told.

Note: I have moved this idea to its own repository in hopes of sourcing more fixes from the community. Please contribute!

What is this?


This commit is a quick demonstration of how badly CGI Federal has botched the Healthcare.gov Marketplace.

In less than two hours, with absolutely no advance knowledge of how Healthcare.gov works, I was able to build a simple system for the absolutely vital task of minifying and concatenating static application assets. CGI Federal's coding of the marketplace has so many fundamental errors that I was able to reduce the static payload size by 71% (2.5MB to 713KB) and reduce the number of requests from 79 to 17.

This means 62 fewer round trips, 71% fewer bytes on the wire, and a site that loads much more quickly with less than a quarter of the requests - crucial during the first frantic days of launch when web servers are struggling to meet demand.
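The figures above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes 1MB = 1000KB when converting the 2.5MB payload.

```javascript
// Percentage saved when going from `before` to `after` (same unit for both).
function percentReduction(before, after) {
  return (1 - after / before) * 100;
}

// Numbers quoted in the post; 2.5MB is taken as 2500KB (an assumption).
const payloadSaved = percentReduction(2500, 713); // about 71.5
const requestsSaved = percentReduction(79, 17);   // about 78.5
const roundTripsSaved = 79 - 17;                  // 62
const remainingShare = 17 / 79;                   // about 0.215, under a quarter

console.log(Math.round(payloadSaved) + '% fewer bytes');
console.log(roundTripsSaved + ' fewer requests (' + Math.round(requestsSaved) + '% fewer)');
```

Note that the request count drops by a larger fraction than the payload does, which is why the remaining 17 requests are less than a quarter of the original 79.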

I'm not any sort of fantastic coder. Most web developers would be able to easily complete this step. It is inexcusable that CGI Federal went to production without it, given the absurd amount of taxpayer money they were given to develop this system. Most of the JavaScript code that we are able to see was clearly written by inexperienced developers. If they can't even complete this simple step, we have to ask ourselves: is this the best $50+ million can buy? How can such an expensive, vital project be executed so poorly?

There are many other issues in the current system besides this one. This is merely a demonstration of the lack of care CGI Federal has put into this project. Simply put, a single programmer could have easily done this in a day and healthcare.gov would have stood a much better chance against the load this week. Clearly, there is a perverse set of incentives that has dominated the federal contracting system; delivering a quality product appears to be at the very end of their priority list.

Technical Details


The production app on healthcare.gov delivers a very large payload of JS and CSS without making any attempt to reduce load on its own servers. A great benefit could be realized by simply minifying and concatenating all source.

This commit adds a simple builder and test runner and rearranges the JS directory structure a bit so it makes more sense. It also refactors some inline JS into separate files so they can also be optimized.
Adding insult to injury is the delivery of nearly 160KB of unused test data to every consumer of the app (js/dummyData.js). How this made it to the final release is beyond me.

Healthcare.gov is not setting any caching headers, so all assets need to be re-downloaded on every visit. It seems that they intended for the site to work in a completely fluid manner without reloads, but that is clearly not the case. Every refresh (and there are many throughout the process) requires reloading 80+ files, a task that can take 30s or longer and strains healthcare.gov's webservers.
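To illustrate the caching point, here is a minimal sketch of how a Node server could attach Cache-Control headers to static assets. This is not part of the commit and not how healthcare.gov is served. The policy, the function names and the fingerprinted filename convention (for example app.3f2a9c1b.min.js) are all assumptions made for the example.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Pick a Cache-Control value from the request path (hypothetical policy).
function cacheControlFor(urlPath) {
  // Fingerprinted build output never changes under the same name,
  // so browsers can keep it for a long time without asking again.
  if (/\.[0-9a-f]{8,}\.(?:min\.)?(?:js|css)$/.test(urlPath)) {
    return 'public, max-age=' + ONE_YEAR_SECONDS + ', immutable';
  }
  // Other static assets: cache for an hour.
  if (/\.(?:js|css|png|jpg|gif|svg)$/.test(urlPath)) {
    return 'public, max-age=3600';
  }
  // HTML and API responses: always revalidate with the server.
  return 'no-cache';
}

// Handler fragment usable with Node's http.createServer.
// It only sets the header; serving the file body is left out.
function withCaching(req, res) {
  const pathOnly = req.url.split('?')[0];
  res.setHeader('Cache-Control', cacheControlFor(pathOnly));
}
```

Even the one-hour policy would spare returning visitors most of the repeat downloads described above; long-lived caching is only safe when the filename changes whenever the content does.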

To run (requires nodejs & npm):

git clone https://github.com/STRML/healthcare.gov.git
cd healthcare.gov/marketplaceApp
npm install -g grunt-cli
npm install
grunt build # concat/minification step
grunt connect # runs a webserver to view results

Load Graphs


Before (live site as of Thursday, Oct 10)
Note that the API call to CreateSaml2 triggers an inspector bug: the actual load time is ~28s, not 15980 days.

After (this pull request)
Load times are from localhost, so they are much faster than they would be otherwise. API calls fail because they are relative to the current domain.
