Monday, July 21, 2014

Managing & Massaging Data with ReactJS

I've been working on a new project. It's heavily data-oriented, and the data changes constantly. I believe it would have been very difficult to build a project like this performantly even 3-4 years ago; it's nearly the perfect use case for React, in my opinion.

I have about 8 data stores, and each client is processing 2-3 websocket messages *per second*, updating those stores. Each store update triggers a render and may be an insert, a modification, a deletion, or a complete replacement of a store. Each of these stores is linked to one or more widgets that must update immediately so that users always see the most up-to-date state of the system.

React is a great fit for this because I can modify the data, pipe the proper `props` hooks through the system, and call it a day. But React makes no assumptions about your data, and is completely hands-off about how you should manage it. To help out, I use Fluxxor, with some modifications, to manage my data stores. But even Flux/Fluxxor doesn't tell you how to manage your data. So I set about figuring out how best to store my data in the browser.

It appears that the "React Way" is to pass only raw data around to components. This has some distinct advantages, to be sure. Data is much easier to reason about when there are no wrappers getting in the way. However, `shouldComponentUpdate`, the lifecycle method that allows you to skip a rerender when a data change is insignificant, becomes a serious challenge with raw JS data. JavaScript's arrays and objects are mutable, which is the norm in most languages but becomes a serious hassle in the context of React. To determine whether data has changed, you may have to do a deep comparison of every array or object passed to your component, which can take almost as long as rebuilding the component (since virtual DOM diffing is quite fast).
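In plain terms, the trade-off looks like this; a sketch with hypothetical helper names, assuming data is replaced (never mutated in place) on change:

```javascript
// O(1): a reference check is enough when changed data is always a new object.
function shouldComponentUpdate(prevProps, nextProps) {
  return nextProps.items !== prevProps.items;
}

// O(n): the fallback you're stuck with when data is mutated in place.
// On large props this can cost nearly as much as just re-rendering.
function shouldComponentUpdateDeep(prevProps, nextProps) {
  return JSON.stringify(nextProps.items) !== JSON.stringify(prevProps.items);
}
```

The reference check is only sound if every change produces a new object, which is exactly what the immutable collections below enforce.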

I'm building an app that has real requirements, and eventually it becomes quite important to massage data. That means adding labels, changing column names for readability, adding derived/virtual properties that depend on other properties (and update properly when their dependencies change), and so on. I thought about this and got a flashback to Backbone - Backbone.Model is one of the best parts of Backbone. Maybe I could just use it raw?

I started working with Backbone as my Model/Collection abstraction, but it didn't offer as much as I wanted, had a lot of cruft I didn't need (Router, Views, History, etc.), and wasn't easy to update once I removed that cruft. Just about that time, a user on HN mentioned Ampersand.js, a refactored and enhanced version of Backbone's data components. It's much better, and if you're willing to leave pre-ES5 browsers behind, it does quite well with data getters, setters, deep model hierarchies, derived properties, session storage, and more.

Now, I like this, but a lot of it assumes that you want mutable data structures. I don't. So I set upon removing mutability from my collections:

// Collection.js, superclass for all collections
var AmpersandCollection = require('ampersand-collection');
var underscoreMixin = require('ampersand-collection-underscore-mixin');

// We always want to mix in underscore & a constructor override.
module.exports = function() {
  var args = [];

  // Remove mutation methods
  var constructor = AmpersandCollection.prototype.constructor;
  args[0] = {
    constructor: function(models, options) {

      // Call super., models, options);

      // Freeze this collection
      var me = this;
      ['add', 'set', 'remove', 'reset'].forEach(function(funcName) {
        me[funcName] = doNotUse.bind(null, funcName);
      });
    }
  };

  // Add underscore
  args[1] = underscoreMixin;

  // Add collection definition(s)
  for (var i = 0; i < arguments.length; i++) {
    args.push(arguments[i]);
  }

  return AmpersandCollection.extend.apply(AmpersandCollection, args);
};

function doNotUse(name) {
  throw new Error("Collections are immutable, do not use the method: " + name);
}

// For instanceof checks - necessary when extending this class.
// This allows components to call `new Collection(models, options);`
module.exports.prototype = AmpersandCollection.prototype;
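The freezing trick itself is independent of Ampersand; here is the same idea in isolation on a plain object (`freezeMutators` and the stand-in object are hypothetical, not part of the real Collection.js):

```javascript
// Stub out mutating methods so any accidental use throws loudly.
function doNotUse(name) {
  throw new Error("Collections are immutable, do not use the method: " + name);
}

function freezeMutators(collection) {
  ['add', 'set', 'remove', 'reset'].forEach(function(funcName) {
    collection[funcName] = doNotUse.bind(null, funcName);
  });
  return collection;
}
```

After freezing, any call to `add`/`set`/`remove`/`reset` throws instead of silently mutating, which is exactly the failure mode you want during development.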

This allows me to create a new collection every time I make a significant data change, making `shouldComponentUpdate` O(1) while giving me all the benefits that these Collections and Models provide: validation, virtual attributes, nested models, sorting, and so on.

In the end, I found that calling the Collection's constructor on every data change was far too expensive; I have some 100+ element arrays full of rich objects that often change one at a time. I added a helper:

// Lighter weight than creating a new collection entirely.
var _ = require('underscore');
var AmpersandCollection = require('ampersand-collection');

AmpersandCollection.prototype.clone = function(data, options) {
  if (!options) options = {};

  // Create a new object with the right prototype without running
  // the full constructor.
  function factory(){}
  factory.prototype = this.constructor.prototype;
  var newCollection = new factory();
  _.extend(newCollection, this);

  // Assign models
  newCollection.models =, function(datum) {
    return newCollection._prepareModel(datum);
  });

  // Sort if necessary.
  var sortable = this.comparator && options.sort !== false;
  if (sortable) newCollection.sort();

  // Remove all references on the old data so it can be GCed.
  // This adds some runtime cost but prevents memory from getting out of control.
  _.each(this.models, function(model) {;
  this.models = [];

  return newCollection;
};

This benchmarks quite well: I am able to replace a 150 element collection of large, rich models in less than 0.1ms.

So far, this has been working for me. It creates a fair bit of GC pressure, but I am careful to replace models only when they have actually changed, and to preserve those that have not. In a way, it's a lower-tech version of ClojureScript's structural sharing, which is certainly far superior to this. However, I haven't found a good FP-style replacement for what I'm doing.

Have any of you had experience doing this in a similar way, or using Mori instead? What have you found to be the pain points and benefits of your method?

Wednesday, November 6, 2013

A simple demonstration of the benefits of minification on the Marketplace. What happened?

(This was originally a pull request on the repo that was taken down - it has since been moved to its own repository)


CGI Federal has not released the source to the webapp powering This pull request is not meant to be merged into this repository. For lack of a better place, I have put it here in hopes that it will get some eyes. This PR is directed at CGI Federal, not Development Seed, who has done clean and responsible work on their part of the project. I hope they will allow me to occupy this space for a little while so this story can be told.

Note: I have moved this idea to its own repository in hopes of sourcing more fixes from the community. Please contribute!

What is this?

This commit is a quick demonstration of how badly CGI Federal has botched the Marketplace.

In less than two hours, with absolutely no advance knowledge of how works, I was able to build a simple system for the absolutely vital task of minifying and concatenating static application assets. CGI Federal's coding of the Marketplace has so many fundamental errors that I was able to reduce the static payload size by 71% (2.5MB to 713KB) and reduce the number of requests from 79 to 17.

This means 62 fewer round trips, 71% fewer bytes on the wire, and a site that loads much more quickly with less than a quarter of the requests - crucial during the first frantic days of launch when web servers are struggling to meet demand.

I'm not any sort of fantastic coder. Most web developers would be able to complete this step easily. It is inexcusable that CGI Federal went to production without it, given the absurd amount of taxpayer money they were given to develop this system. Most of the Javascript code that we are able to see was clearly written by inexperienced developers. If they can't even complete this simple step, we have to ask ourselves: is this the best $50+ million can buy? How can such an expensive, vital project be executed so poorly?

There are many other issues in the current system besides this one. This is merely a demonstration of the lack of care CGI Federal has put into this project. Simply put, a single programmer could have easily done this in a day and would have stood a much better chance against the load this week. Clearly, there is a perverse set of incentives that has dominated the federal contracting system; delivering a quality product appears to be at the very end of their priority list.

Technical Details

The production app on delivers a very large payload of JS and CSS without making any attempt to reduce load on its own servers. A great benefit could be realized simply by minifying and concatenating all source.

This commit adds a simple builder and test runner and rearranges the JS directory structure a bit so it makes more sense. It also refactors some inline JS into separate files so they can be optimized as well.
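The whole fix amounts to a small build config. Here is a minimal sketch using grunt-contrib-concat and grunt-contrib-uglify; the file paths are illustrative, not the actual repo layout:

```javascript
// Gruntfile.js - minimal concat + minify pipeline (paths are made up).
module.exports = function(grunt) {
  grunt.initConfig({
    concat: {
      dist: { src: ['js/**/*.js'], dest: 'build/app.js' }
    },
    uglify: {
      dist: { files: { 'build/app.min.js': ['build/app.js'] } }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-concat');
  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.registerTask('build', ['concat', 'uglify']);
};
```

That's it: one config file, two stock plugins, and the 79 requests collapse into a handful of minified bundles.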
Adding insult to injury is the delivery of nearly 160KB of unused test data to every consumer of the app (js/dummyData.js). How this made it into the final release is beyond me. is not setting any caching headers, so all assets must be re-downloaded on every visit. It seems they intended the site to work in a completely fluid manner without reloads, but that is clearly not the case. Every refresh (and there are many throughout the process) requires reloading 80+ files, a task that can take 30s or longer and strains's webservers.

To run (requires nodejs & npm):

git clone <repo-url>
npm install -g grunt
npm install
grunt build # concat/minification step
grunt connect # runs a webserver to view results

Load Graphs

Before (live site as of Thursday, Oct 10)
Note that the API call to CreateSaml2 triggers an inspector bug - the actual load time is ~28s, not 15980 days
This pull request
Load times are from localhost so they are much faster than they would be otherwise. API calls fail because they are relative to the current domain.

Sunday, January 20, 2013

Announcing the first client-encrypted in-browser file sharing service,

End-to-end encryption in a file sharing service is a file sharing service much like MegaUpload or RapidShare, with an important twist: all files are encrypted in your browser. Browser support is wider than Mega's: we support Chrome, Firefox, Safari, and Chrome for Android. We are not able to see your files' contents or names, and we have no means to access them. We do not log IPs and have no way of knowing what is being shared on our service.

Additionally, allows for automatic self-destruction of your files; by default, your files will self-destruct after 10 views or 7 days. is not Mega - it is not meant for long-term file storage. is meant for end-to-end transmission and transient storage only. It is not currently possible to keep files on for longer than 7 days; this is a feature, not a bug.

We have been up since AngelHack DC, November 18th 2012.

We believe that the smaller an application is, the greater the chances of it being truly secure. To help with that, the client-side code is small and simple enough to understand and is intentionally kept public for review.

Why does this exist?

File sharing on the web is fraught with problems from the very beginning.

On almost every mainstream service, sharing your files means implicitly trusting the operators not to look at them. This introduces a number of problems if you don't want your data to be available to just anyone:

1. You must be very careful to store your file on a service that randomizes its URLs, so it cannot be found simply by chance.
2. You must store your files in an encrypted archive. To do this, you must have an archiver capable of encryption, and the knowledge to choose a secure algorithm.
3. You should be careful to delete your files after the intended recipient(s) have downloaded them.

It is unlikely that the average computer user has the knowledge to follow all three of these steps. Yet everyone has files that they want to share privately. Many encounter this regularly with family, business partners, lawyers, landlords, and so on.

Email is a decent alternative, but you must be very careful to encrypt properly, as email is often transferred in plain text. In addition, your archive may live nearly forever in the recipient's mailbox (e.g. Gmail). If the data is truly sensitive, this may not be acceptable. solves these problems in a way so simple, your grandmother can use it.

How to Use

Using the service is simple: attach a file and click upload.

We have only a minimal set of options. You may specify a date and number of views allowed on your file. The file will be deleted after receiving that number of views, or once the date has been reached.

For now, the defaults and maximums are 10 views and 1 week. The file size maximum is 10MB. We may introduce a paid option that allows for more customization. can easily be used as a simple encrypted end-to-end transfer service: set your view limit to 1 and the file will be deleted as soon as the recipient has downloaded it.
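The self-destruct rule is simple enough to state as a predicate; this is a sketch of the policy as described above, not our actual server code, and all field names are hypothetical:

```javascript
// Returns true once a file has hit its view limit or its expiry date.
// Defaults mirror the ones above: 10 views, 7 days.
function shouldDelete(file, now) {
  var maxViews = file.maxViews || 10;
  var msPerDay = 24 * 60 * 60 * 1000;
  var expiresAt = file.uploadedAt + (file.ttlDays || 7) * msPerDay;
  return file.views >= maxViews || now >= expiresAt;
}
```

With `maxViews` set to 1, the file is eligible for deletion the moment the first download completes - the end-to-end transfer mode described above.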

Public Repo

For those of you who are security fanatics, all client-side code is purposely kept un-minified for inspection. You can see exactly how your data is sent. The simplest way to verify that we are not being sent any compromising information is to look at your inspector's Network tab.

The client-side files are hosted in a public repo on GitHub. Pull requests are very welcome.

Technical Details

Uploaded files are grabbed as binary strings using the HTML5 File API. They are then encrypted using CryptoJS's fast AES implementation. Files are encrypted and decrypted in 512KB chunks by up to 4 web workers for extra speed. Encryption/decryption reaches about 2MB/sec on an i5 MacBook Air.
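The chunking step itself is trivial; here is a sketch (the worker dispatch and the per-chunk CryptoJS.AES.encrypt call are omitted, and the function name is hypothetical):

```javascript
// Split a binary string into fixed-size chunks for per-chunk encryption.
// Each chunk would then be handed to a web worker running CryptoJS AES.
function chunkBinaryString(str, chunkSize) {
  var chunks = [];
  for (var i = 0; i < str.length; i += chunkSize) {
    chunks.push(str.slice(i, i + chunkSize));
  }
  return chunks;
}
```

Chunking at 512KB keeps each worker's unit of work small, so four workers can stay busy in parallel and the UI thread never blocks on a single huge encrypt call.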

Files are sent to our server, which stores them on S3. File contents and names are encrypted separately. The MIME type is sent in the clear for analytics; this is the only analytics we have. We do not log IP addresses.

We are very careful not to include any plugins or outside JavaScript inside That means no analytics, no Facebook Connect, no Flash copy-paste, nothing. All assets are loaded from our server.

When downloading, the file is decrypted in chunks, reassembled, and transformed into a Blob object. Unfortunately, Safari doesn't support Blob URLs - the application will fall back to data URIs, but this tends to crash the browser after ~6MB. will notify you if this applies to you. does not exist to make money and is likely to remain a proof of concept rather than a full-blown sharing service.

Who are you guys?

This project came out of AngelHack DC, a 24-hour hackathon hosted on November 17-18th. I am Samuel Reed; my partners are Kevin Ohashi and Sean Perkins.

Tuesday, November 13, 2012

WPEngine is smooth - but not that smooth

Over the last three months I've been developing a large site for WPEngine. The previous site was hosted on a simple VPS and the customer wanted a faster, more hands-off solution. I like WPEngine for a number of reasons:
  • Hands-off caching
  • Simple staging
  • Simple backups
  • Easy domain configuration
  • Fast support
And so on. Finally, I thought, a service where you get what you pay for (and you do pay for it). And the site is fast. Really fast.

But the problems have started mounting. A quick preview of what I've seen just in the past week:
  • Uploaded images occasionally disappear. I don't know where they go. The user who uploads them sees the images (they go into browser cache) but nobody else can see them. That makes it very difficult for an author to catch.
  • You simply cannot set cookies from PHP. It will not happen. It will work on your staging server, it will work on your local server, but WPEngine's caching simply does not allow it in production. This should be a big red note in WPEngine's Support Garage. You don't simply disable all cookies and not tell anyone.
  • WP-Cron is broken by default. Posts miss their schedule all the time. I registered a support ticket with WPEngine - they say that they have some internal defaults set that often break wp-cron, and that they would set mine back to fix it. My posts still occasionally miss their schedule. More importantly, why is wp-cron broken by default? Isn't that something important to tell your customers?
  • Weird validation - WPEngine overrides core validation routines to do totally inexplicable things like ban capital letters in usernames. What the hell?! Since WP doesn't ban this by default, you get totally unhelpful validation messages like "Your username must only have alphanumeric characters." Adding a simple filter to sanitize_user fixed this, but why do I have to do that?
  • Staging is a mess. I use Wordless, a great plugin that allows you to use HAML, comes with great helpers, and breaks functions.php into a folder of scripts. It doesn't work at all on staging, and WPEngine has no solution. On top of that, staging does not use the same caching as production, which means that even if my theme did work, I wouldn't catch many of the above bugs.
  • Import limits are agonizingly low. I wanted to import a bunch of posts of a certain category to another site on my WPMU network. Rather than pull the raw SQL, it would be nice to use the WP importer/exporter so I can pull media data, create taxonomies, and so on. After all, that's what it's built for. But WPEngine has a 256MB memory limit (!!!!!) which means that my imports have a paltry 1.6MB limit. What can I do with this? I ended up having to find a Python script to split my imports into manageable chunks. On other hosting I would simply raise the memory limit. This was a real pain.
  • Just as I was writing this post, I lost all of my restore points. Wow.

I'm sure I will find more.

A lot of this would be alleviated by having a simple guide to WPEngine's quirks - like a "What to watch out for in production" article. But no such thing exists. And until the staging environment is a proper staging environment, with the same exact caching, I will continue to find bugs in production & only in production.

Suffice it to say, WPEngine is anything but "Hassle-Free Wordpress Hosting."

Edit: Since this post, I've come across yet another: WPEngine's aggressive site caching actually caches the blogname of my main site onto several of my child sites every morning. That's right, the title of my child sites actually CHANGES every morning and I have to empty the cache daily to set it back. No database writes are done, nothing is wrong in MySQL. This is WPEngine caching my blogname - and getting it wrong. I can't find a GIF to explain how I feel.

Tuesday, October 30, 2012

Uploading a whole directory to a remote server with LFTP

As any user of a restricted VPS or PaaS product (like WPEngine) might know, there are sometimes restrictions that stop you from using ssh or scp.

In my most recent case I really needed to move about 30GB of tiny thumbnail files to WPEngine. I considered downloading them all to my machine with Cyberduck, then uploading them over, but that would have been way too slow. Cyberduck goes into a 'preparing...' cycle that seems to never end, and caps my CPU at 100%.

And in any case, SFTP doesn't support recursive put. What?!

Thankfully, there's LFTP. LFTP is available in just about any sane package manager and works in this fashion:

lftp sftp://user@host

It uses the same syntax as sftp, plus a few awesome additions.
In my case, I wanted to move an entire wp-content/uploads folder over. This is really easy. Simply cd into your existing uploads directory, fire up lftp, and...

lftp> mirror -R --parallel=20

This is the money shot. mirror -R means "mirror in reverse", that is, move all local files to the remote. And the parallel directive is extremely useful when moving tons of small files. I was seeing incredible (>6MB/sec) transfer rates for these tiny files between sites. And when uploading on my small Comcast connection, I was able to saturate the pipe with parallelism this high.
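For unattended runs, the same mirror can be scripted in one shot with lftp's -e flag; host, user, and paths below are placeholders, not real endpoints:

```shell
# Non-interactive reverse mirror; substitute your own host and paths.
lftp -u "$FTP_USER" "sftp://" -e "
  mirror -R --parallel=20 ./wp-content/uploads /wp-content/uploads;
  quit
"
```

This is handy in a deploy script or cron job, since lftp exits on its own once the mirror finishes.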

Simply put, no GUI tool exists that can pull this off as elegantly and quickly. Use it.