Wednesday, November 21, 2012

Benchmarking File IO In Dart

I now have dart dirty functioning as a persistent HashMap. That should allow me to convert the sample app that I use in Dart for Hipsters over to a pure Dart solution instead of the node.js backend that it uses now. There are still a few more things that I would like to fix before I call it good enough for a 0.1.0 release. Tonight, I hope to improve the loading algorithm.

The current implementation would break down for extremely large datasets. It uses readAsLinesSync() to slurp in the entire file, split it on newline characters, and then process each line as JSON. If the file is very large, the slurp-in-the-entire-file part is going to cause problems. First, there might not be enough memory on the system to read the entire file in. Second, it is going to take a long time to split tens of thousands of lines on a newline character, even if I use the non-blocking readAsLines() version of the method.
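For reference, each line of the DB file holds one self-contained JSON record with a key and a val field. A hypothetical two-record DB file (the actual records depend on the app) would look like this:
{"key":"everest","val":{"name":"Everest","height":8848}}
{"key":"k2","val":{"name":"K2","height":8611}}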

But first...

Since I am trying to improve performance, I need a baseline measure against which to compare the end result. As luck would have it, the Dart benchmarking article was updated just today. It's like the universe wants me to measure my code.

In a performance branch, I update my pubspec.yaml to include the benchmarking library:
name: dart_dirty
version: 0.0.1
description: Dirt simple nosql DB
author: Chris Strom 
homepage: https://github.com/eee-c/dart-dirty
dependencies:
  unittest:
    sdk: unittest
  benchmark_harness: ">=1.0.0 <2.0.0"
A quick pub install and I am ready to go:
➜  dart-dirty git:(perf) ✗ pub install 
Resolving dependencies...
Downloading benchmark_harness 1.0.0...
Dependencies installed!
The current implementation of the _load() method looks as follows:
class Dirty {
  // ...
  _load() {
    // ...
    // Slurp the entire DB file into memory, then treat each line
    // as a single JSON record
    var lines = db.readAsLinesSync();
    lines.forEach((line) {
      var rec = JSON.parse(line);
      _docs[rec['key']] = rec['val'];
    });
    // ...
  }
  // ...
}
Even if I use the non-blocking readAsLines() above, I am still faced with the problem that I am reading the entire DB file into memory and splitting it on the newline character. That is going to cause trouble when I read in a fixture file containing 100,000 records. For the time being, I stick with the blocking version, which will allow me to benchmark.
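To give a sense of where I am headed, a stream-based load might look something like the following. This is an untested sketch based on my reading of the dart:io StringInputStream docs, not working code:
_load() {
  var input = new StringInputStream(db.openInputStream());
  // Process records one line at a time instead of slurping the
  // whole file into memory
  input.onLine = () {
    var line = input.readLine();
    if (line == null) return;
    var rec = JSON.parse(line);
    _docs[rec['key']] = rec['val'];
  };
  input.onClosed = () {
    // At this point the DB is fully loaded
  };
}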

Speaking of benchmarking, that is fairly easy thanks to the tutorial. I need access to the class that I am testing as well as the benchmarking code. With those, I can implement a benchmark class for my dart-dirty DB:
import 'package:dart_dirty/dirty.dart';
import 'package:benchmark_harness/benchmark_harness.dart';

class DirtyBenchmark extends BenchmarkBase {
  const DirtyBenchmark() : super("Dirty");

  static void main() {
    new DirtyBenchmark().report();
  }

  // Each timed run loads the 100,000 record fixture file from disk
  void run() {
    Dirty db = new Dirty('test/perf.db');
  }
}

main() {
  DirtyBenchmark.main();
}
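Before looking at the numbers, it is worth noting what report() actually measures. As I read the benchmark_harness source, it warms up for 100 milliseconds, then times exercise() for at least two seconds, and exercise() simply calls run() ten times:
// From BenchmarkBase in benchmark_harness (per my reading of the source)
void exercise() {
  for (int i = 0; i < 10; i++) {
    run();
  }
}
So the RunTime that it reports is microseconds per ten loads of the fixture.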
The result is:
Dirty(RunTime): 87686000.0 us.
Those are microseconds, so it takes the benchmark 87.7 seconds to load 100,000 records ten times with the current algorithm, or just under 9 seconds per load.
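The test/perf.db fixture itself is nothing fancy: 100,000 of those one-record-per-line JSON entries. A throwaway script along the following lines can generate such a fixture (the record contents here are arbitrary, they just need the key/val shape):
import 'dart:io';
import 'dart:json';

// Hypothetical fixture generator: writes 100,000 one-line JSON
// records in the {"key": ..., "val": ...} shape that _load() expects
main() {
  var out = new File('test/perf.db').openSync(FileMode.WRITE);
  for (var i = 0; i < 100000; i++) {
    var rec = {'key': 'key-$i', 'val': {'number': i}};
    out.writeStringSync(JSON.stringify(rec));
    out.writeStringSync('\n');
  }
  out.closeSync();
}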

Hopefully I can improve on that tomorrow with a better loading algorithm. But first, I am going to need to figure out a way to invoke the non-blocking IO read methods in a blocking manner so that the synchronous benchmark harness can time them.


Day #576
