Monday, July 21, 2014

Managing & Massaging Data with ReactJS

I've been working on a new project.  It's heavily data-oriented, and data is changing constantly. I believe it would be very difficult to make a project like this work in a performant manner even 3-4 years ago; it's nearly the perfect use case for React, in my opinion.

I have about 8 data stores, and each client is processing 2-3 websocket messages *per second*, updating those stores. Each store update triggers a render that may be a insert, modify, delete, or complete replacement of a store. Each one of these stores is linked to one or more widgets that must update immediately so that users are informed of the most up-to-date state of the system.

React is a great fit for this because I can modify the data, pipe the proper `props` hooks through the system, and call it a day. But React makes no assumptions about your data, and is completely hands-off about how you should manage it. To help out, I use Fluxxor with some modifications to manage my data stores. But even Flux/Fluxxor does not tell you how to manage your data. So after some figuring, I set about figuring out how best to store my data in the browser.

It appears that the "React Way" is to pass only raw data around to components. This has some distinct advantages, to be sure. Data is much easier to reason about when there are no wrappers getting in the way. However, `shouldComponentUpdate`, the lifecycle event that allows you to skip a rerender in the case of an insignificant data change, because a serious challenge in the event of raw JS data. Javascript's arrays and objects are mutable, which is the norm in most languages but becomes a serious hassle in the context of React. In order to determine if data has changed, you may have to do a deep comparison of all arrays or objects passed to your component, which can take almost as long as rebuilding the component (as virtual DOM diffiing is quite fast).

I'm building an app that has real requirements, and eventually it becomes quite important to massage data. That means adding labels, changing column names for readability, adding derived/virtual properties that depend on other properties (and update properly when their dependencies change), and so on. I thought about this and got a flashback to Backbone - Backbone.Model is one of the best parts of Backbone. Maybe I could just use it raw?

I started working with Backbone as my Model/Collection abstraction, but it didn't offer as much as I wanted, had a lot of cruft I didn't need (Router, Views, History, etc.), and it wasn't easy to update if I removed that cruft. Just about that time, a user on HN mentioned ampersandJS, a refactored and enhanced version of Backbone's data components. It's much better, and if you're willing to leave < ES5 behind, it does quite well with data getters, setters, deep model hierarchies, derived properties, session storage, and more.

Now, I like this, but a lot of it assumes that you want mutable data structures. I don't. So I set upon removing mutability from my collections:



// Collection.js, superclass for all collections

// We always want to mix in underscore & a constructor override.
module.exports = function() {
  var args = [];

  // Remove mutation methods
  var constructor = AmpersandCollection.prototype.constructor;
  args[0] = {
    constructor: function(models, options) {

      // Call super.
      constructor.call(this, models, options);

      // Freeze this collection
      var me = this;
      ['add', 'set', 'remove', 'reset'].forEach(function(funcName){
        me[funcName] = doNotUse.bind(null, funcName);
      });

    }
  };

  // Add underscore
  args[1] = underscoreMixin;

  // Add collection definition
  for (var i = 0; i < arguments.length; i++) {
    args.push(arguments[i]);
  }
  return AmpersandCollection.extend.apply(AmpersandCollection, args);
};

function doNotUse(name) {
  throw new Error("Collections are immutable, do not use the method: " + 
    name);
}

// For instanceof checks - necessary when extending this class.
// This allows components to call `new Collection(models, options);`
module.exports.prototype = AmpersandCollection.prototype;




This allows me to create a new collection every time I make a significant data change, making `shouldComponentUpdate` O(1) while giving me all the benefits that these Collections and Models provide: validation, virtual attributes, nested models, sorting, and so on.

In the end, I found that calling the Collection's constructor on every data change was far too expensive; I have some 100+ element arrays full of rich objects that often change one at a time. I added a helper:



// Lighter weight than creating a new collection entirely.
AmpersandCollection.prototype.clone = function(data, options) {
  if (!options) options = {};
  // Create a new object.
  function factory(){}
  factory.prototype = this.constructor.prototype;
  var newCollection = new factory();
  _.extend(newCollection, this);

  // Assign models
  newCollection.models = _.map(data, function(datum) {
    var model =  newCollection._prepareModel(datum);
    newCollection._addReference(model);
    return model;
  });

  // Sort if necessary.
  var sortable = this.comparator && options.sort !== false;
  if (sortable) newCollection.sort();

  // Remove all references on the old data so it can be GCed.
  // This adds some runtime cost but prevents memory from getting out of control.
  this.off();
  _.each(this.models, function(model) {
    this._removeReference(model);
  }.bind(this));

  return newCollection;
};


This benchmarks quite well: I am able to replace a 150 element collection of large, rich models in less than 0.1ms.

So far, this has been working for me. It creates a fair bit of GC pressure but I am careful to only replace models themselves when they have changed as well, and to preserve those that have not. In a way, it's a lower-tech version of ClojureScript's structural sharing, which is certainly far superior than this. However, I haven't found a good FP-style replacement for what I'm doing.

Have any of you had experience doing this in a similar way, or using Mori instead? What have you found to be the pain points and benefits of your method?

4 comments:

  1. Hi there! I'd like to let you know that I have never had experience doing this in a similar way, so let me give it a try, first of all.

    ReplyDelete
  2. Processing and storing data is a priority, since how much you work will be very dependent on it.

    ReplyDelete
  3. The work in the system involves the data management to achieve the necessary goals. The data structure may also take some changes in doing so.

    ReplyDelete


  4. To keep current, you need to find trustworthy technology news sources that will provide you with up-to-date information.

    Download GTA 5 Full Game for Android without Verification

    ReplyDelete