Thursday, November 22, 2012

The Only Way to Benchmark Dart Async Code

Yesterday I used the benchmarking package for Dart to measure my current data-load scheme in dart-dirty. Today, I would like to do the same for a better approach. The problem is that the better approach is asynchronous, and I am unsure how to benchmark that.

Hearkening back to my efforts with async testing in Dart, it would be really nice if some of those unittest methods blocked. That is, if the expectAsync0 method blocked until the async method in question was called, then I could do something like wait until my DB has finished loading:
test_perf() {
  solo_test("performance", () {
    Dirty db;
    db = new Dirty('test/perf.db', onLoad: expectAsync0(() {
      print("[async] !!!");
      expect(
        db['everything'],
        equals({'answer': 42})
      );
    }));
  });
}
But that just falls straight through without ever hitting the expectAsync0 callback (expectAsync0 means "expect an async method with zero arguments to be called"). The problem is that none of the expectAsync methods in unittest will block until the expected async method is called. If the functions inside the test complete execution without invoking the expected callbacks, then the test fails. Actually, it seems to simply stop all tests, as my test suite returns nothing:
➜  dart-dirty git:(perf) ✗ dart test/dirty_test.dart
unittest-suite-wait-for-done
Unless I am mistaken, the Dart benchmarking package suffers from a similar problem. It expects only a run() method to be defined. In other words, there is no startTimer() / endTimer() pair of methods that I could call when the database load first starts and again when it completes.
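To make that concrete, a benchmark in that harness looks roughly like the following. This is a sketch from memory rather than yesterday's actual code, so the BenchmarkBase class, the benchmark_harness import, and the dirty import path are my assumptions:

import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:dirty/dirty.dart';

class LoadBenchmark extends BenchmarkBase {
  LoadBenchmark() : super("Dirty.load");

  // The harness times run() and nothing else. With an asynchronous load,
  // run() returns as soon as the constructor does, long before onLoad
  // fires, so the part I care about never makes it into the measurement.
  void run() {
    new Dirty('test/perf.db', onLoad: (db) { /* too late to stop a timer */ });
  }
}

main() {
  new LoadBenchmark().report();
}

The only hook is run(); there is nowhere to say "stop the clock when onLoad finally fires."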

For this stuff, I think that I am going to have to drop back to writing my own timer. Happily, Dart includes a microsecond stopwatch:
    // Start timing before the DB stream is even opened
    Stopwatch stopwatch = new Stopwatch()..start();
    var inputStream = db.openInputStream();
    var input = new StringInputStream(inputStream);
    // Read and parse each line as it becomes available
    input.onLine = () {
      var line = input.readLine();
      var rec = JSON.parse(line);
      _docs[rec['key']] = rec['val'];
    };
    // When the stream closes, stop the clock and report
    input.onClosed = () {
      stopwatch.stop();
      print("done!");
      print("Finished in ${stopwatch.elapsedMicroseconds} us");
      onLoad(this);
    };
What I do here is start the stopwatch before I open the DB stream. Defining an onLine handler causes data to be read from the file whenever a new line is available. When there is no more data to read, the onClosed handler is invoked, stopping my timer and reporting back to me.

Which results in:
➜  dart-comics git:(app.dart) ✗ dart app.dart
done!
Finished in 12700819 us
That's 13 seconds to read (and parse) 100,000 rows of JSON in a 23MB file. I really did not expect it to run that fast.

To compare with my slurp-the-whole-db-into-memory approach from the other night, I also add the same stopwatch:
    Stopwatch stopwatch = new Stopwatch()..start();

    var lines = db.readAsLinesSync();
    lines.forEach((line) {
      var rec = JSON.parse(line);
      _docs[rec['key']] = rec['val'];
    });

    stopwatch.stop();
    print("done!");
    print("Finshed in ${stopwatch.elapsedMicroseconds} us");
Now, when I run this code, I find:
➜  dart-comics git:(app.dart) ✗ dart app.dart
done!
Finished in 11015211 us
It finished in 11 seconds! Whoa. That was faster than my supposed optimization. I still might have memory issues with this approach, but speed is definitely not a bottleneck. And this is why I wanted to benchmark my code.
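For what it is worth, the stopwatch-in-the-callback trick generalizes nicely into a tiny helper for any callback-style operation. This is just a sketch of the pattern, not something from a package; the timeAsync name and its arguments are made up:

// Rough sketch of the general pattern: start the clock, hand the operation
// a "done" callback, and only stop the clock when that callback fires.
timeAsync(String label, void operation(void done())) {
  Stopwatch stopwatch = new Stopwatch()..start();
  operation(() {
    stopwatch.stop();
    print("${label} finished in ${stopwatch.elapsedMicroseconds} us");
  });
}

// Timing the async DB load might then look something like:
//
//   timeAsync("db load", (done) {
//     new Dirty('test/perf.db', onLoad: (db) { done(); });
//   });

It is not a real benchmark harness, with warm-ups and repeated runs, but for one-off comparisons like today's it does the job.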

There still may be some things I can do to improve the time of my data load. Now that I have some concrete numbers to compare with, I can get started on that tomorrow.


Day #577
