Wednesday, November 6, 2013

A simple demonstration of the benefits of minification on the Healthcare.gov Marketplace. What happened?

(This was originally a pull request on the healthcare.gov repo that was taken down - it has since been moved to its own repository)

Notes


CGI Federal has not released the source to the webapp powering Healthcare.gov. This pull request is not meant to be merged into this repository. For lack of a better place, I have put it here in hopes that it will get some eyes. This PR is directed at CGI Federal, not Development Seed, who has done some clean and responsible work on their part of the project. I hope they will allow me to occupy this space for a little while so this story can be told.

Note: I have moved this idea to its own repository in hopes of sourcing more fixes from the community. Please contribute!

What is this?


This commit is a quick demonstration of how badly CGI Federal has botched the Healthcare.gov Marketplace.

In less than two hours, with no advance knowledge of how Healthcare.gov works, I was able to build a simple system for the absolutely vital task of minifying and concatenating static application assets. CGI Federal's coding of the marketplace has so many fundamental errors that I was able to reduce the static payload size by 71% (2.5MB to 713KB) and cut the number of requests from 79 to 17.

This means 62 fewer round trips, 71% fewer bytes on the wire, and a site that loads much more quickly with less than a quarter of the requests - crucial during the first frantic days of launch when web servers are struggling to meet demand.

I'm not any sort of fantastic coder. Most web developers would be able to complete this step easily. It is inexcusable that CGI Federal went to production without it, given the enormous amount of taxpayer money they received to develop this system. Most of the JavaScript code that we are able to see was clearly written by inexperienced developers. If they can't even complete this simple step, we have to ask ourselves: is this the best $50+ million can buy? How can such an expensive, vital project be executed so poorly?

There are many other issues in the current system besides this one. This is merely a demonstration of the lack of care CGI Federal has put into this project. Simply put, a single programmer could have easily done this in a day, and healthcare.gov would have stood a much better chance against this week's load. Clearly, a perverse set of incentives dominates the federal contracting system; delivering a quality product appears to be at the very bottom of the priority list.

Technical Details


The production app on healthcare.gov delivers a very large payload of JS and CSS without making any attempt to reduce the load on its own servers. A great benefit could be realized simply by minifying and concatenating all source files.

This commit adds a simple builder and test runner and rearranges the JS directory structure a bit so it makes more sense. It also refactors some inline JS into separate files so they can be optimized as well.
Adding insult to injury is the delivery of nearly 160KB of unused test data to every consumer of the app (js/dummyData.js). How this made it into the final release is beyond me.
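
The build itself boils down to a small Gruntfile along these lines. This is a simplified sketch; the exact task names, plugin choices, and paths may differ from what is in the repo.

// Gruntfile.js - minimal concat/minify/serve setup (illustrative)
module.exports = function (grunt) {
  grunt.initConfig({
    // Concatenate all JS and CSS into single files
    concat: {
      js:  { src: ['js/**/*.js'],   dest: 'build/app.js'  },
      css: { src: ['css/**/*.css'], dest: 'build/app.css' }
    },
    // Minify the concatenated output
    uglify: {
      js: { files: { 'build/app.min.js': ['build/app.js'] } }
    },
    cssmin: {
      css: { files: { 'build/app.min.css': ['build/app.css'] } }
    },
    // Small static server for viewing the results locally
    connect: {
      server: { options: { port: 8000, base: '.', keepalive: true } }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-concat');
  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.loadNpmTasks('grunt-contrib-cssmin');
  grunt.loadNpmTasks('grunt-contrib-connect');

  grunt.registerTask('build', ['concat', 'uglify', 'cssmin']);
};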

Healthcare.gov sets no caching headers, so every asset must be re-downloaded on every visit. It seems they intended for the site to work in a completely fluid manner without reloads, but that is clearly not the case. Every refresh (and there are many throughout the process) requires reloading 80+ files, a task that can take 30s or longer and strains healthcare.gov's web servers.
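
For comparison, here is what proper caching could look like on a Node/Express static server. This is an illustration only; I have no knowledge of their actual stack, and the paths are made up.

// Illustration: serve the built, fingerprinted assets with a long max-age so
// repeat visits come from the browser cache instead of hitting the servers again.
var express = require('express');
var app = express();

app.use('/static', express.static(__dirname + '/build', {
  maxAge: 7 * 24 * 60 * 60 * 1000  // one week, in milliseconds
}));

app.listen(3000);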

To run (requires nodejs & npm):

git clone https://github.com/STRML/healthcare.gov.git
cd healthcare.gov/marketplaceApp
npm install -g grunt-cli # installs the grunt command-line runner
npm install
grunt build # concat/minification step
grunt connect # runs a webserver to view results

Load Graphs


Before (screenshot: live site as of Thursday, Oct 10): note that the API call to CreateSaml2 triggers an inspector bug; the actual load time is ~28s, not 15,980 days.

After (screenshot: this pull request): load times are from localhost, so they are much faster than they would be otherwise. API calls fail because they are relative to the current domain.

Sunday, January 20, 2013

Announcing the first client-encrypted in-browser file sharing service, Securesha.re


End-to-end encryption in a file sharing service


Securesha.re is a file sharing service much like MegaUpload or RapidShare, with an important twist: all files are encrypted in your browser. Browser support is wider than Mega's: we support Chrome, Firefox, Safari, and Chrome for Android. We cannot see your files' contents or names, and we have no means to access them. We do not log IPs and have no way to know what is being shared on our service.

Additionally, Securesha.re allows for automatic self-destruction of your files; by default, your files will self-destruct after 10 views or 7 days.

Securesha.re is not Mega - it is not meant for long-term file storage. Securesha.re is meant for end-to-end transmission and transient storage only. It is not currently possible to keep files on Securesha.re for longer than 7 days; this is a feature, not a bug.

We have been up since AngelHack DC, November 18th 2012.

We believe that the smaller an application is, the greater the chances of it being truly secure. To help with that, the client-side code is small and simple enough to understand and is intentionally kept public for review.

Why does this exist?


File sharing on the web has been fraught with problems from the very beginning.

When you share your files on almost any mainstream service, you implicitly trust the operators not to look at them. This introduces a number of problems if you don't want your data to be available to just anyone:

1. You must be very careful to store your file on a service that randomizes its URLs, so it cannot be found simply by chance.
2. You must store your files in an encrypted archive. To do this, you must have an archiver capable of encryption, and the knowledge to choose a secure algorithm.
3. You should be careful to delete your files after the intended recipient(s) have downloaded them.

It is unlikely that the average computer user has the knowledge to follow all three of these steps. Yet everyone has files they want to share privately. Many people encounter this on a regular basis with family, business partners, lawyers, landlords, and so on.

Email is a decent alternative, but you must be very careful to encrypt properly, as email is often transferred in plain text. In addition, your archive may live nearly forever in the recipient's mailbox (e.g. Gmail). If the data is truly sensitive, this may not be acceptable.

Securesha.re solves these problems in a way that is so simple, your grandmother can use it.

How to Use Securesha.re


Using the service is simple: attach a file and click Upload.

We offer only a minimal set of options. You may specify an expiration date and the number of views allowed on your file. The file will be deleted after receiving that number of views or once the date has been reached.

For now, the defaults and maximums are 10 views and 1 week. The maximum file size is 10MB. We may introduce a paid option that allows for more customization.
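
In simplified JavaScript, the self-destruct rule amounts to a check like the following before each download is served. This is a sketch only, not the exact server implementation, and the field names are illustrative.

// Runs on every download request; when it returns true, the stored
// (still-encrypted) blob is deleted and the link stops working.
function shouldSelfDestruct(file, now) {
  return file.views >= file.maxViews ||  // default maximum: 10 views
         now >= file.expiresAt;          // default maximum: 7 days after upload
}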

Securesha.re can easily be used as a simple encrypted end-to-end transfer service. Simply set your view limit to 1 and the file will be deleted as soon as the recipient has downloaded it.

Public Repo


For those of you who are security fanatics, all client-side code is purposely kept un-minified for inspection. You can see exactly how your data is sent. The simplest way to verify that we are not sent any compromising information is to watch your browser inspector's Network tab.

The client-side files are hosted in a public repo on GitHub. Pull requests are very welcome.

Technical Details


Uploaded files are read as binary strings using the HTML5 File API. They are then encrypted using CryptoJS's fast AES implementation. Files are encrypted and decrypted in 512KB chunks by up to 4 web workers for extra speed. Encryption/decryption reaches about 2MB/sec on an i5 MacBook Air.
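
In simplified form, the chunking and worker pool look something like this. This is a sketch, not the exact production code; the file names, message format, and CryptoJS build path are illustrative.

// worker.js (illustrative name): encrypts one chunk with CryptoJS AES.
importScripts('lib/cryptojs-aes.js'); // assumes a CryptoJS build is available to the worker

self.onmessage = function (e) {
  var bytes = CryptoJS.enc.Latin1.parse(e.data.chunk); // treat the binary string as raw bytes
  var ciphertext = CryptoJS.AES.encrypt(bytes, e.data.passphrase).toString();
  self.postMessage({ index: e.data.index, chunk: ciphertext });
};

// Main thread: slice the binary string into 512KB chunks and feed a pool of 4 workers.
var CHUNK_SIZE = 512 * 1024;
var WORKER_COUNT = 4;

function encryptFile(binaryString, passphrase, done) {
  var chunks = [];
  for (var i = 0; i < binaryString.length; i += CHUNK_SIZE) {
    chunks.push(binaryString.slice(i, i + CHUNK_SIZE));
  }

  var results = new Array(chunks.length);
  var next = 0, finished = 0;

  function feed(worker) {
    if (next >= chunks.length) { worker.terminate(); return; }
    worker.postMessage({ index: next, chunk: chunks[next], passphrase: passphrase });
    next++;
  }

  for (var w = 0; w < Math.min(WORKER_COUNT, chunks.length); w++) {
    var worker = new Worker('worker.js');
    worker.onmessage = function (e) {
      results[e.data.index] = e.data.chunk;
      finished++;
      if (finished === chunks.length) { e.target.terminate(); done(results); }
      else { feed(e.target); }
    };
    feed(worker);
  }
}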

Files are sent to our server, which stores them on S3. File contents and file name are encrypted separately. The MIME type is sent in the clear for analytics; this is the only analytics data we collect. We do not log IP addresses.

We are very careful not to include any plugins or outside JavaScript inside Securesha.re. That means no analytics, no Facebook connect, no Flash copy-paste, nothing. All assets are loaded from our server.

When downloading, the file is decrypted in chunks, reassembled, and transformed into a Blob object. Unfortunately, Safari doesn't support Blob URLs; the application will fall back to data URIs, but this tends to crash the browser after ~6MB. Securesha.re will notify you of this if it applies.
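
A sketch of that fallback, assuming the decrypted chunks arrive as Uint8Array pieces and the MIME type has already been recovered with the file (a simplified illustration, not the exact production code):

// Reassemble decrypted chunks and produce a URL the user can download from.
function makeDownloadUrl(chunks, mimeType) {
  var URL = window.URL || window.webkitURL;
  if (window.Blob && URL && URL.createObjectURL) {
    // Preferred path: build a Blob and hand out an object URL.
    return URL.createObjectURL(new Blob(chunks, { type: mimeType }));
  }
  // Fallback (e.g. Safari without Blob URLs): build a data URI instead. This keeps
  // the whole file in one string, which tends to crash the browser past a few MB.
  var binary = '';
  for (var c = 0; c < chunks.length; c++) {
    for (var i = 0; i < chunks[c].length; i++) {
      binary += String.fromCharCode(chunks[c][i]);
    }
  }
  return 'data:' + mimeType + ';base64,' + btoa(binary);
}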

Securesha.re does not exist to make money and is likely to remain as a proof-of-concept rather than a full-blown sharing service.

Who are you guys?


This project came out of AngelHack DC, a 24-hour hackathon hosted on November 17-18th. I am Samuel Reed; my partners are Kevin Ohashi and Sean Perkins.


Tuesday, November 13, 2012

WPEngine is smooth - but not that smooth

Over the last three months I've been developing a large site on WPEngine. The previous site was hosted on a simple VPS, and the customer wanted a faster, more hands-off solution. I like WPEngine for a number of reasons:
  • Hands-off caching
  • Simple staging
  • Simple backups
  • Easy domain configuration
  • Fast support
And so on. Finally, I thought, a service where you get what you pay for (and you do pay for it). And the site is fast. Really fast.

But the problems have started mounting. A quick preview of what I've seen just in the past week:
  • Uploaded images occasionally disappear. I don't know where they go. The user who uploads them still sees the images (they're in the browser cache), but nobody else can. That makes this very difficult for an author to catch.
  • You simply cannot set cookies from PHP. It will not happen. It will happen on your staging server, it will happen on your local server, but WPEngine's caching simply does not allow this. This should be a big red note in WPEngine's support garage. You don't simply disable all cookies and not tell anyone.
  • WP-Cron is broken by default. Posts miss their schedule all the time. I opened a support ticket with WPEngine; they said they have some internal defaults set that often break wp-cron and that they would reset mine to fix it. My posts still occasionally miss their schedule. More importantly, why is wp-cron broken by default? Isn't that something important to tell your customers?
  • Weird validation - WPEngine overrides core validation routines to do totally inexplicable things like ban capital letters in usernames. What the hell?! Since WP doesn't ban this by default, you get totally unhelpful validation messages like "Your username must only have alphanumeric characters." Adding a simple filter to sanitize_user fixed this, but why do I have to do that?
  • Staging is a mess. I use Wordless, a great plugin that lets you use HAML, comes with great helpers, and breaks functions.php into a folder of scripts. It simply doesn't work on staging, and WPEngine has no solution. On top of that, staging does not use the same caching as production, which means that even if my theme did work, I wouldn't catch many of the above bugs.
  • Import limits are agonizingly low. I wanted to import a bunch of posts of a certain category into another site on my WPMU network. Rather than pull the raw SQL, it would be nice to use the WP importer/exporter so I can pull media data, create taxonomies, and so on. After all, that's what it's built for. But WPEngine has a 256MB memory limit (!!!!!), which means my imports are capped at a paltry 1.6MB. What can I do with this? I ended up having to find a Python script to split my imports into manageable chunks. On other hosting I would simply raise the memory limit. This was a real pain.
  • Just as I was writing this post, I lost all of my restore points. Wow.


I'm sure I will find more.

A lot of this would be alleviated by a simple guide to WPEngine's quirks - a "What to watch out for in production" article. But no such thing exists. And until the staging environment is a proper staging environment, with the exact same caching as production, I will continue to find bugs in production and only in production.

Suffice it to say, WPEngine is anything but "Hassle-Free WordPress Hosting."

Edit: Since writing this post, I've come across yet another: WPEngine's aggressive site caching actually caches the blogname of my main site onto several of my child sites every morning. That's right, the title of my child sites actually CHANGES every morning, and I have to empty the cache daily to set it back. No database writes are done, and nothing is wrong in MySQL. This is WPEngine caching my blogname - and getting it wrong. I can't find a GIF to explain how I feel.

Tuesday, October 30, 2012

Uploading a whole directory to a remote server with LFTP

As any user of a restricted VPS or PaaS product (like WPEngine) knows, you sometimes can't use ssh or scp.

In my most recent case, I really needed to move about 30GB of tiny thumbnail files to WPEngine. I considered downloading them all to my machine with Cyberduck and then uploading them, but that would have been way too slow. Cyberduck goes into a 'preparing...' cycle that never seems to end and caps my CPU at 100%.

And in any case, SFTP doesn't support recursive put. What?!

Thankfully, there's LFTP. LFTP is available in just about any sane package manager and works in this fashion:

lftp sftp://user@mysub.domain.com

It uses the same syntax as sftp, plus a few awesome additions.
 
In my case, I wanted to move an entire wp-content/uploads folder over. This is really easy. Simply cd into your existing uploads directory, fire up lftp, and...

lftp user@site.wpengine.com:/wp-content/uploads/2012/10> mirror -R --parallel=20

This is the money shot. mirror -R means "reverse mirror", that is, copy all local files to the remote server. And the --parallel directive is extremely useful when moving tons of small files. I was seeing incredible (>6MB/sec) transfer rates of these tiny files between sites. And when uploading on my small Comcast connection, I was able to saturate the pipe with parallelism this high.

Simply put, no GUI tool exists that can pull this off so elegantly and quickly. Use it.