Thursday, July 31, 2014

Working Tests Again on Polymer.dart 0.12


Time to get back to Patterns in Polymer. It has been two months since the last release of the book and the last real changes to the code. So no doubt everything is broken.

Not that I'm complaining. I signed on for this when I started a book on a technology that was quite up front about being in the pre-alpha stages. Truth be told, that is part of the fun! So let's see what broke. The quickest way to figure this out is tests, so I start with the testing with Page Objects chapter. To ease my way back in, I start with the Dart version of the code.

After clearing up a spurious curly brace (seriously, how did I manage to commit code with an extra curly brace in it?), I make sure that my tests pass against the old version of Polymer (and Dart):
$ ./test/run.sh
CONSOLE WARNING: ShadowRoot.resetStyleInheritance and ShadowRoot.applyAuthorStyles now deprecated in dart:html.
Please remove them from your code.

CONSOLE MESSAGE: PASS: [defaults] it has no toppings
CONSOLE MESSAGE: PASS: [adding toppings] updates the pizza state accordingly
CONSOLE MESSAGE:
CONSOLE MESSAGE: All 2 tests passed.
I am none too surprised that applyAuthorStyles is deprecated (though I'm a little sad that I left it in there). That aside, everything appears to be in order. So let's change that.

I start a pub upgrade:
$ pub upgrade
Resolving dependencies... (4.9s)
...
> polymer 0.12.0 (was 0.10.0-pre.9) (1 newer version available)
> polymer_expressions 0.12.0 (was 0.10.0-pre.1)
...
> web_components 0.5.0 (was 0.3.3)
...
Changed 30 dependencies!
That is sure to break all kinds of things. And sure enough, my test suite fails to even start.

Fortunately, I have not been completely away from Polymer in the intervening time. James Hurford and I went through this pain when we updated the <ice-code-editor> element (from ICE Code Editor). From that experience, I rework the test context page to load the Polymer platform JavaScript source (previously it was Dart):
<head>
  <!-- Load Polymer -->
  <script src="packages/web_components/platform.js"></script>

  <!-- Load component(s) -->
  <link rel="import" href="packages/page_objects/elements/x-pizza.html">

  <!-- The actual tests -->
  <script type="application/dart" src="test.dart"></script>
  <script src="packages/unittest/test_controller.js"></script>
</head>
This code was built on an experimental version of Polymer.dart, so I also have to remove some mime-type tomfoolery. I also have to reintroduce an initPolymer() call to the test's main() entry point:
// ...
main() {
  initPolymer();
  // Actual tests go here...
}
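For context, a slightly fuller sketch of what that test entry point might look like (the group and test names come from the passing output above; the actual test bodies are not shown in this post):
import 'package:polymer/polymer.dart';
import 'package:unittest/unittest.dart';

main() {
  initPolymer();

  group('defaults', () {
    test('it has no toppings', () {
      // Query the <x-pizza> element under test and assert on its toppings...
    });
  });

  // The "adding toppings" group and the rest of the tests go here...
}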
Even with that, I was still getting a rather unhelpful error when I attempted to run the tests:
CONSOLE WARNING: line 213: FAIL
CONSOLE ERROR: Exception: NoSuchMethodError: method not found: 'whenPolymerReady'
Receiver: Instance of 'JsFunction'
Arguments: [Closure: () => dynamic]
This turns out to also be caused by a jump from a pre-release version of Polymer.dart to 0.12. The fix for this was to simply import Polymer into the top-level Polymer element definition (I am testing <x-pizza>):
<link rel="import" href="../../../packages/polymer/polymer.html">
<link rel="import" href="x-pizza-toppings.html">
<polymer-element name="x-pizza">
  <template>
    <!-- ... -->
  </template>
  <script type="application/dart" src="x_pizza.dart"></script>
</polymer-element>
With that, I have my tests passing again:
$ ./test/run.sh
...
CONSOLE WARNING: line 12: flushing %s elements
CONSOLE MESSAGE: unittest-suite-wait-for-done
CONSOLE MESSAGE: PASS: [defaults] it has no toppings
CONSOLE MESSAGE: PASS: [adding toppings] updates the pizza state accordingly
CONSOLE MESSAGE:
CONSOLE MESSAGE: All 2 tests passed.
I will worry about that “flushing %s elements” warning another time. For now, I am satisfied to have my tests passing.

I also find that I need to update the actual page that demonstrates this Polymer element. As in the test, I need to load the web components polyfill JavaScript code. I also have to initPolymer(), which is done under normal circumstances with the built-in init.dart:
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Load platform polyfills -->
    <script src="packages/web_components/platform.js"></script>

    <!-- Load component(s) -->
    <link rel="import"
          href="packages/page_objects/elements/x-pizza.html">

    <!-- Start Polymer -->
    <script type="application/dart">
      export 'package:polymer/init.dart';
    </script>
  </head>
  <body>
    <div class="container">
      <h1>Ye Olde Dart Pizza Shoppe</h1>
      <x-pizza></x-pizza>
    </div>
  </body>
</html>
With that, I have it working again in both Dart and JavaScript (the latter courtesy of Polymer.dart's built-in dart2js transformer):



I am back in business! No doubt I have other clean-up to do and some associated rewriting. And of course, screencasts for readers of the “Extras” version of Patterns in Polymer. For today, it is good to be back.


Day #139

Wednesday, July 30, 2014

Declaring and Immediately Invoking a Recursive Function in Dart


While performing some minor refactoring on Design Patterns in Dart code (public repo), I ran across an interesting problem. What is the cleanest way to define a recursive function and immediately call it?

I am nearing the end of preliminary research for the book so I am cleaning code up as much as possible for future me. One of the snippets that I lingered on was the loop in the Reactor Pattern code:
  // Reactor “loop” (handleEvent is recursive)
  new InitiationDispatcher().handleEvents();
I can't just make this into a for loop because the loop would always run in the current event frame—never allowing other code to send data into the loop nor allowing the event loop to perform asynchronous actions. So a recursive function seemed the next best idea. Each new call to the function would place it next on Dart's event loop, first allowing any other tasks on the queue to execute.

It seems perfect… except from a clean code standpoint. Why include the comment about the recursive function/method when I can put the actual code right there? As a bonus, this would better follow the code from the original Reactor Pattern paper.

This leads to the question that I started with: how do I define a recursive function in Dart and immediately call it? In JavaScript, the following will work:
(function foo() { setTimeout(foo, 1000); console.log('foo!'); })()
The outer parentheses encapsulate a function, which I immediately invoke with the call operator at the end of that line—the open/close parentheses. Inside the parentheses, I declare the function foo() and invoke it inside a setTimeout. Strictly speaking the setTimeout() is not necessary, but without it I would consume a heck of a lot of my computer's CPU to print out the word “foo” at the end of the function.

But if I try the same thing in Dart:
import 'dart:async';

final timeout = const Duration(milliseconds: 10);
main() {
  (
    foo(){
      new Timer(timeout, foo);
      print('foo!');
    }
  )();
}
Then I get an invalid syntax error:
'file:///home/chris/repos/design-patterns-in-dart/reactor/bin/reactor.dart': error: line 6 pos 10: ')' expected
    foo(){
         ^
It seems that Dart does not consider a function declaration an object that can be grouped by parentheses.

Similarly, it does not work to simply place the parentheses after the closing curly brace:
  foo(){
    new Timer(timeout, foo);
    print('foo!');
  }();
This results in another syntax error:
'file:///home/chris/repos/design-patterns-in-dart/reactor/bin/reactor.dart': error: line 8 pos 5: unexpected token ')'
  }();
    ^
What is interesting about the error with this approach is that it will work if the function is anonymous:
  (){
    new Timer(timeout, foo);
    print('foo!');
  }();
Well, “work” might not be the right term here. Not fail at the same place would be more apt. This does call the anonymous function right away, but I have not assigned foo, so I wind up with an error trying to invoke it in the Timer:
./bin/reactor.dart
foo!
Unhandled exception:
The null object does not have a method 'call'.

NoSuchMethodError: method not found: 'call'
Receiver: null
Arguments: []
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:45)
#1      _createTimer.<anonymous closure> (dart:async-patch/timer_patch.dart:9)
If I try to assign a variable foo to the anonymous function and then call it right away:
  var foo=(){
    new Timer(timeout, foo);
    print('foo!');
  }();
This gives me yet another error. This time, I cannot use a variable in the function definition when I am declaring that same variable:
'file:///home/chris/repos/design-patterns-in-dart/reactor/bin/reactor.dart': error: line 8 pos 7: initializer of 'foo' may not refer to itself
  var foo=(){
      ^
So, despite my efforts, the best solution that I can devise is to declare the variable elsewhere:
var foo;
final timeout = const Duration(milliseconds: 10);
main() {
  (foo=(){
    new Timer(timeout, foo);
    print('foo!');
  })();
}
Which is pretty ugly. It is so ugly that I might just stick with a plain old function declaration coupled with a separate call:
  foo(){
    new Timer(timeout, foo);
    print('foo!');
  }
  foo();
In just about all cases, I prefer Dart syntax to JavaScript. But I think I finally found one in which JavaScript has Dart beat.


Day #138

Tuesday, July 29, 2014

Minimum Viable Reactor Pattern (in a Reactor Pattern)


After last night, I believe that I finally have a good handle on what the Reactor Pattern is. I also have a decent implementation to illustrate the pattern in Dart (at 039b2b0b3b in the git repo). But it is a bit obtuse.

I patterned the code after the original implementation from the paper that introduced the Reactor Pattern. As the paper noted, the pattern does not apply directly to a single threaded environment like Dart. It does not help that Dart itself is based on the Reactor Pattern. Even so, I think that I have Dart code that mimics the spirit, if not the exact details, of the implementation from the paper. But how much, exactly, of that spirit is really necessary?

After some thought, I think I can get rid of two things: a “re-imagining” of select() and the use of Dart isolates. Neither removal is a slam dunk—both serve multiple purposes in the sample code. But both also introduce additional concepts into an already concept-laden design pattern. If I have learned anything in my writings, it is that pretty much everything can (and should) be sacrificed at the altar of minimizing the number of concepts. So let's slaughter a couple of sacrificial concepts and see if I am rewarded...

First up is removing the Dart isolate code, which can be seen on the very first line of the main entry point:
main() {
  // Spawn the message sending isolate and set its receive port as the source of
  // the fake select() messages
  Isolate.spawn(messageSender, connection());

  // Create (and register in constructor) event handler
  new LoggingAcceptor();

  // Reactor “loop” (handleEvent is recursive)
  new InitiationDispatcher().handleEvents();

  // NOTREACHED
  return 0;
}
Part of the benefit of using an isolate here is that communication takes place over “ports” which sound very much like the networking ports from the original C++ example. The comparison is not exactly one-to-one, however, and the send vs. receive ports can be confusing:
void messageSender(SendPort server) {
  var logClient = new ReceivePort();
  server.send({'type': 'log_connect', 'value': logClient.sendPort});
  logClient.
    first.
    then((SendPort s1) {
      // Send messages to the connected “application“ port
    });
}
The isolate was introduced to solve a problem that it did not actually solve. I kept it for the “ports,” but perhaps streams would suffice? It is impossible to escape streams in Dart, so using them here would hardly count as a new concept.

The problem is that, in order to mimic the network nature of the Reactor Pattern sample, I need different connections for different services. I had not realized it, but the isolates fit this fairly well. I can convert to streams, but the change does not seem to improve the readability of the example that much:
void messageSender(StreamController server) {
  var logClient = new StreamController();
  server.add({'type': 'log_connect', 'value': logClient});
  logClient.
    stream.
    first.
    then((StreamController s1) {
      s1.add({'type': 'log', 'value': 'howdy'});
      s1.add({'type': 'log', 'value': 'chris'});
      server.add({'type': 'log'});
      // ...
    });
}
Even with isolates, the send and receive ports are ultimately streams themselves, so maybe this pure stream approach is the best option. Still, actually using the companion nature of send and receive ports lends itself to networking discussion. I will have to think about this.
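That claim is easy to verify, by the way—ReceivePort literally implements Stream:
import 'dart:async';
import 'dart:isolate';

main() {
  var port = new ReceivePort();
  print(port is Stream);  // true
  port.close();
}
So the stream-based rewrite is not as big a conceptual leap as it might first appear.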

OK, so with the isolates removed, next up is the select() re-imagining. This is a tough thing to remove because it is at the heart of the Reactor Pattern. The real select() call is responsible for demultiplexing system calls like TCP/IP server connections, network messages, and system I/O. Ultimately these system messages go into a queue that can be processed via select() calls inside of the reactor loop. In both the article and my sample code, this takes the form of the handleEvents() method of the dispatcher:
class InitiationDispatcher implements Dispatcher {
  // ...
  handleEvents() {
    var event = select().fetch();
    if (event == null) { /* Jump to next loop to select again... */ }
    events[event.type].forEach((h)=> h.handleEvent(event));
  }
}
My select() is not even a real function—it just returns an object whose fetch() method behaves like the real select(). I ought to remove it just for that bit of confusion, but… this fake select() does have the advantage of being a very explicit Queue of events:
class Select {
  StreamController<Map> connection;

  static final Select _select = new Select._internal();
  factory Select()=> _select;
  Select._internal() {
    connection = new StreamController();

    connection.
      stream.
      listen((m) {
        add(new MessageEvent(m['type'], m['value']));
      });
  }

  Queue<MessageEvent> queue = new Queue();

  void add(MessageEvent e) { queue.addLast(e); }
  MessageEvent fetch() {
    if (queue.isEmpty) return null;
    return queue.removeFirst();
  }
}
That said, streams themselves are decent demultiplexers. So perhaps I can get rid of all of the Select queuing stuff and replace it with a simple stream?
StreamController _connect;
connect() {
  if (_connect == null) {
    _connect = new StreamController.broadcast();
  }
  return _connect;
}
The main() entry point and the message sending code do not have to change, but, unfortunately, the actual handleEvents() method that is called in the heart of the reactor is no longer a loop or a recursive function. Instead, it just becomes a stream listener:
class InitiationDispatcher implements Dispatcher {
  // ...
  handleEvents([int timeout=10]) {
    connect().
      stream.
      listen((event){
        print('[handleEvents] ${event}');
        events[event['type']].forEach((h)=> h.handleEvent(event));
      });
  }
}
At this point, the sample code is no longer demonstrating the Reactor Pattern. Instead it is a roundabout way to illustrate that Dart is using the Reactor Pattern under the covers. Even worse, I cannot send the messages ahead of time since they will no longer queue. Instead I have to further deviate from the example in the original paper and move the message sending after the dispatcher is created:
main() {
  new LoggingAcceptor();
  new InitiationDispatcher().handleEvents();
  messageSender(connect());
}
That is not a show stopper, but the lack of an actual loop (or recursive function) probably is.

So in the end, last night's isolate and fake select() code was probably fairly close to what I need. There is still some code that I might clean up in there, but perhaps not as many concepts to remove as I had thought.


Day #137

Monday, July 28, 2014

Deep, Isolated Reactors in Dart


I think that I have decent sample code for the Reactor Pattern in Dart. Furthermore, I think I have a decent explanation for why I cannot get closer to the C++ implementation in the original paper. I think, I think, I think… What I know is that I usually get into trouble when I think.

To have a better idea that I have replicated the spirit, if not the exact details, of the pattern described in the original paper, I would like to establish two communication channels between my isolate message sending code and the reactor loop from yesterday:
main() {
  // Spawn the message sending isolate and set its receive port as the source of
  // the fake select() messages
  var res = new ReceivePort();
  Select.source = res;
  Isolate.spawn(messageSender, res.sendPort);

  // Create (and register in constructor) event handler
  new WordAcceptor();

  // Reactor “loop” (handleEvent is recursive)
  new InitiationDispatcher().handleEvents();
}
Aside from the shenanigans with the select(), this looks very much like the main loop presented in the paper. The “word acceptor” waits for the reactor to inform it of new word sending connections, at which point it creates an instance of a “word handler” which simply prints words to STDOUT.

The problem with my current implementation is that I have the acceptor doing everything. Instead, it should follow the paper more closely—at least if I hope to better appreciate the implications. So I start with a much simpler WordAcceptor class:
class WordAcceptor implements EventHandler {
  WordAcceptor() {
    new InitiationDispatcher().registerHandler(this, 'word_connect');
  }

  void handleEvent(connection) {
    if (connection.type != 'word_connect') return;
    new WordPrinter(connection.value);
  }
}
As in the source paper, this registers itself with a singleton instance of the initiation dispatcher. It is looking to accept incoming connections (connections in this setup come from a separate isolate of code) with a type of word_connect. That is, it is accepting “word” client connections.

The initiation dispatcher invokes the acceptor's handleEvent() method whenever such an event occurs. I will supply a value of a send port. The port in this case is a connection back into the calling isolate—a way to communicate back with the isolate. That value is supplied to the WordPrinter which, until now, has not been pulling its weight.

The WordPrinter will mimic the functionality of the Logging_Handler from the paper. It will not have the low-level C interfaces into network devices, but it can use the isolate ports as a substitute:
class WordPrinter implements EventHandler {
  SendPort sendPort;
  ReceivePort receivePort;

  StreamSubscription handle;

  final timeout = const Duration(milliseconds: 2);

  WordPrinter(this.sendPort) {
    new InitiationDispatcher()
      ..registerHandler(this, 'word')
      ..registerHandler(this, 'word_close');
    // ...
  }
  // ...
}
Like the WordAcceptor, the WordPrinter needs to register itself with the reactor's dispatcher. In this case, it will handle word connections (to signal when words are available to be read) and word_close connections (to signal that there are no more words).

The sendPort is the mechanism by which this printer can communicate back to the isolate sending words to be printed. The WordPrinter only needs to send a means for the isolate to send words directly back to the WordPrinter. This is very much in the spirit of the Logging_Handler establishing a server connection for clients once the Logging_Acceptor receives a connection request. The WordPrinter establishes a local port via which a client (the message sending isolate) can send data.

To accomplish this, the WordPrinter needs its own receive port. So I add that in the constructor:
class WordPrinter implements EventHandler {
  // ...
  WordPrinter(this.sendPort) {
    new InitiationDispatcher()
      ..registerHandler(this, 'word')
      ..registerHandler(this, 'word_close');

    receivePort = new ReceivePort();
    sendPort.send(receivePort.sendPort);

    handle = receivePort.
      asBroadcastStream().
      timeout(timeout, onTimeout: (_){ handle.pause(); }).
      listen(write)
      ..pause();
  }
  // ...
}
The timeout and the pause are how I hope to better mimic the Reactor Pattern's usage of select() in C++. When data is ready to be read in the C++ version, the Reactor signals the concrete event handler to read as much data as is available for a non-blocking read. It then yields control back to the reactor so that it can wait for more data—either from the same connection or from somewhere else. This is the very heart of the pattern, so it seems useful to try to simulate it here.

So I pause the listener on the stream as soon as it is established so that control can go back to the reactor loop, which will wait until it sees a word event. When it does, the dispatcher invokes handleEvent(), which resumes the subscription handle, which reads all data available on the stream, sending it to write():
class WordPrinter implements EventHandler {
  // ...
  void handleEvent(event) {
    if (event.type == 'word') {
      read();
    }
    // ...
  }

  void read() {
    handle.resume();
  }

  void write(word) {
    print('[WordPrinter.read] $word');
  }
}
With that, I think I have a decent replication of the spirit of the Logging_Handler class from the paper, but now in Dart.

To call this code, I use the isolate's sendPort to send the word_connect message. The WordAcceptor then creates a WordPrinter which replies back with a send port of its own:
main() {
  // Spawn the message sending isolate and set its receive port as the source of
  // the fake select() messages
  var res = new ReceivePort();
  Select.source = res;
  Isolate.spawn(messageSender, res.sendPort);

  // Create (and register in constructor) event handler
  new WordAcceptor();

  // Reactor “loop” (handleEvent is recursive)
  new InitiationDispatcher().handleEvents();

  // NOTREACHED
  return 0;
}

void messageSender(SendPort port) {
  var wordSender = new ReceivePort();
  port.send({'type': 'word_connect', 'value': wordSender.sendPort});
  wordSender.
    first.
    then((SendPort s1) {
      s1.send({'type': 'word', 'value': 'howdy'});
      s1.send({'type': 'word', 'value': 'chris'});
      port.send({'type': 'word'});
    });
}
Once the WordPrinter's send port comes back, I can send as many words as I like. To actually tell the reactor that non-blocking data is ready, I send a connection of type word.

And that actually works. The reactor sees the messages, triggers the handleEvent() methods in its registered event handlers, which eventually results in the right connections and ultimately words being printed. It even works if I delay sending some of the words (and then send a follow-up word connection):
$ ./bin/reactor.dart
[handleEvents] word_connect
[handleEvents] word
[WordPrinter.read] {type: word, value: howdy}
[WordPrinter.read] {type: word, value: chris}
[handleEvents] word
[WordPrinter.read] {type: word, value: delayed}
[WordPrinter.read] {type: word, value: howdy}
[WordPrinter.read] {type: word, value: chris}
[handleEvents] word_close
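The delayed variation of messageSender is not shown above, but a hypothetical reconstruction that would produce that output looks something like this (the five millisecond delay and the repeated howdy/chris are guesses based on the log; it assumes a dart:async import for Future):
void messageSender(SendPort port) {
  var wordSender = new ReceivePort();
  port.send({'type': 'word_connect', 'value': wordSender.sendPort});
  wordSender.
    first.
    then((SendPort s1) {
      s1.send({'type': 'word', 'value': 'howdy'});
      s1.send({'type': 'word', 'value': 'chris'});
      port.send({'type': 'word'});

      // Later, send a few more words, signal the reactor again, then close.
      new Future.delayed(const Duration(milliseconds: 5), () {
        s1.send({'type': 'word', 'value': 'delayed'});
        s1.send({'type': 'word', 'value': 'howdy'});
        s1.send({'type': 'word', 'value': 'chris'});
        port.send({'type': 'word'});
        port.send({'type': 'word_close'});
      });
    });
}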
To be sure, this is a loooong way to go to print out words, but it does a decent job of capturing the intent of the Reactor Pattern in Dart. Of course, Dart itself is a Reactor Pattern, so absolutely none of this is needed. But it was still fun to implement.



Day #136

Sunday, July 27, 2014

Isolated Reactors in Dart


I am having a devil of a time implementing the Reactor Pattern in Dart.

Truth be told, I do not have to implement it in Dart since Dart is itself built on the Reactor Pattern. Even so, I would like to try my level best to get a reasonable facsimile of it going. The effort itself so far has been illuminating. I had not previously really understood it—that is, I could not teach it to someone else. But I would like to get it running for a more pragmatic reason: it would provide insight on how (and if) to include concurrency patterns in the forthcoming Design Patterns in Dart. And, for better or worse, I opted for the Reactor Pattern as my test case.

The problem is not the implementation of the reactor loop:
main() {
  _startRandomMessageSender();

  // Create (and register in constructor) event handler
  new WordAcceptor();

  // Reactor loop...
  for(;;) {
    new InitiationDispatcher().handleEvents();
  }
}
The problem is not even event handlers like the WordAcceptor above, which follows along with the original Reactor Pattern paper by creating new objects that then listen for the actual words being sent into the reactor loop.

The problem is getting the messages into the reactor loop. Messages sent before the loop are properly queued and properly processed by the loop. But, once the loop is running, nothing else in the Dart program will run. The loop is so tight that asynchronous code is never given a chance to be executed by Dart's own reactor loop. This, it would seem, is one of the hazards of attempting to build a Reactor Pattern on top of another Reactor Pattern.
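A stripped-down illustration of the problem (this is not the pattern code, just the smallest program I can think of that shows the effect):
import 'dart:async';

main() {
  Timer.run(()=> print('never printed'));

  // The "reactor" spins forever, so control never returns to Dart's own event
  // loop and the Timer callback above never gets a chance to run.
  for (;;) {}
}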

But I think the problem might be better phrased as that of a single threaded execution environment, which Dart is. Or, more precisely, Dart's isolates are single threaded. But nothing is preventing me from spawning a second isolate. So that is just what I do: spawn the message sending function in its own isolate so that it can send messages regardless of what the main isolate is doing:
// ...
import 'dart:isolate';
main() {
  var res = new ReceivePort();
  select().from = res;
  Isolate.
    spawn(messageSender, res.sendPort).
    then((_){
      // Create (and register in constructor) event handler
      new WordAcceptor();

      // Reactor loop...
      for(;;) {
        new InitiationDispatcher().handleEvents();
      }
    });
}
And… this has no effect whatsoever.

Well, it has some effect. The message sending code is now reached and can even be delayed. But the problem remains back in the main code isolate. The reactor loop is so tight that it prevents any other activity from reaching the top of Dart's internal reactor loop.

Oh, and in case anyone thinks like me, no, scheduleMicrotask() is not the answer. Unless the question was, ”how do I exhaust VM memory really, really fast?” Again, the main thread of execution never stops evaluating the for loop long enough to ask what the next microtask or regular event task might be.
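For the curious, a minimal reconstruction of that dead end (again, not the actual pattern code):
import 'dart:async';

main() {
  for (;;) {
    // The loop never yields, so none of these microtasks ever run—they just
    // pile up until the VM runs out of memory.
    scheduleMicrotask((){ /* handleEvents() would go here */ });
  }
}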

So, I think I have to abandon trying to implement this exactly as done in the paper. The for loop prevents any events from outside the loop from being executed, which means that any asynchronous code (like the Future-based isolates) will never be seen.

So I convert the handleEvents() call that used to be inside the loop to a recursive function call. I call handleEvents() once:
main() {
  var res = new ReceivePort();
  Isolate.
    spawn(messageSender, res.sendPort).
    then((_){
      // Create (and register in constructor) event handler
      new WordAcceptor();

      // Reactor “loop”...
      new InitiationDispatcher().handleEvents();
    });
}
Then change handleEvents() to a recursive call (adding a Timer delay for good measure):
class InitiationDispatcher {
  // ...
  handleEvents([int timeout=0]) {
    var event = select().fetch();
    if (event == null) {
      Timer.run((){this.handleEvents();});
      return;
    }
    events[event.type].forEach((h)=> h.handleEvent(event));
  }
}
And that does the trick. I can now send any messages in from the message sending isolate that I like. I probably do not even need the isolate any more.

I think this still honors the spirit of the Reactor Pattern, but I am not 100% positive. I am also not entirely sure what all of this would buy me. The code is hard to follow as-is—hardly sufficiently illustrative to belong in a book. But it feels as though it adds a lot of ceremony on top of something that Dart just gives us out of the box. Perhaps that is the point. If nothing else, this has given me additional information on which to ruminate.


Day #135

Saturday, July 26, 2014

(Not Quite) Simulating Select for a Dart Reactor


I still hope to build an illustrative implementation of the Reactor Pattern in Dart. The ultimate goal is to begin to understand the right abstraction level at which Design Patterns in Dart should approach concurrency patterns (or if it even should include them). Many concurrency patterns come baked into Dart (as does the Reactor Pattern), so it is very much an open question if I ought to cover them. Still, it is worth exploring…

After last night, I have the dispatcher and event handler interfaces fairly well defined as:
abstract class Dispatcher {
  void handleEvents([int timeout=0]);
  void registerHandler(EventHandler);
  void removeHandler(EventHandler);
}

abstract class EventHandler {
  void handleEvent(type);
  get handle;
}
The Dispatcher class serves as the interface for the InitiationDispatcher in the original paper. Similarly, EventHandler has the same structure as the interface in the paper.

From there, things got a little more difficult for me—partly because Dart is single threaded and already built on the Reactor Pattern but mostly because I still lack a deep understanding of the pattern. In the end, I was able to somewhat demonstrate the pattern with a STDIN KeyHandler:
main() {
  // Create (and register in constructor) event handler
  new HandleKey();

  // Reactor loop...
  for(;;) {
    new InitiationDispatcher().handleEvents();
  }
}
STDIN is a poor way to demonstrate a demultiplexing pattern since only one thing can be sent to it at a time. Also, the simple implementations of STDIN block in Dart, which rather defeats the purpose of the tight event loop. So let's see if I can come up with a more illustrative example.

I start by defining select() as a Queue on which I can push “system events”:
class Select {
  static final Select _select = new Select._internal();
  factory Select()=> _select;
  Select._internal();

  Queue queue = new Queue();

  void add(String type) { queue.addLast(type); }
  SystemEvent fetch() {
    if (queue.isEmpty) return null;
    return queue.removeFirst();
  }
}

select()=> new Select();
The Select class is a singleton, so whenever new stuff is pushed on there, it is pushed onto the same queue. Hopefully this will mimic the select() system call built into Linux sufficiently for my purposes.
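A quick check of that singleton behavior (my own illustration, not part of the pattern code—note that the Select class itself also needs a dart:collection import for Queue):
main() {
  // Both references point at the same underlying instance...
  print(identical(select(), new Select()));  // true

  // ...so anything added through one is visible through the other.
  select().add('word');
  print(select().queue.length);              // 1
}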

The handles that are read from (and written to) after an OS select have an ID. I again try to mimic that:
class Handle {
  static int nextNumber = 1;
  static Map<int,Handle> lookup = {};

  int number;
  String type;
  var stream;
  Handle(this.type, this.stream){
    number = Handle.nextNumber++;
    Handle.lookup[number] = this;
  }
}
Instead of storing the ID in the operating system, I am storing it in the class itself for later lookup.

The other side of the equation is building an “acceptor” and “processor” pair. This is not strictly necessary for demonstrating the Reactor Pattern, but it follows the original paper closely and—hopefully—will help me better understand things.

First up the acceptor, which accepts “word” connections:
class WordAcceptor implements EventHandler {
  WordAcceptor() {
    new InitiationDispatcher().registerHandler(this, 'word');
  }

  void handleEvent(event) {
    if (event.type != 'word') return;
    new WordPrinter(event.handleId);
  }
  get handle;
}
As with other reactor event handlers, it registers itself with the dispatcher and then handles events by creating a processor instance—in this case a simple word printer that will print the message to STDOUT. Yes, I am going a long way simply to print things out...

The processor reads from simple streams:
class WordPrinter  {
  int handleId;
  Handle handle;

  WordPrinter(this.handleId) {
    handle = Handle.lookup[handleId];
    read();
  }

  void read() {
    handle.stream.listen((word){
      print('[WordPrinter.read] $word');
    });
  }
}
Hopefully streams will serve as a nice stand-in for reading from low-level sockets.

Finally, I change the reactor loop to listen for “word” connections and start up a message sender:
main() {
  _startRandomMessageSender();

  // Create (and register in constructor) event handler
  new WordAcceptor();

  // Reactor loop...
  for(;;) {
    new InitiationDispatcher().handleEvents();
  }
}
But here, I am stumped. The reactor loop is too tight for my fake file system streams to be processed. I am effectively blocking in the reactor loop. I can get the message acceptor to fire because it adds to the System queue that is inside the reactor loop. But streams are outside of it.

I could probably replace the streams with something that also gets placed onto the system queue (or something similar). But I am still stumped about how to simulate random data coming in over time. If I try a Timer in the _startRandomMessageSender():
void _startRandomMessageSender() {
  // ...
  new Timer(
    new Duration(seconds: 1),
    () { print('yo'); }
  );
}
Then the print statement is never reached. Again, the reactor loop is too tight. And if I cannot print something 1 second later, how can I send messages to processors?

Bleh. This may be the hazards of attempting to simulate the Reactor Pattern on a platform that itself implements the Reactor Pattern. But, I think it may be more a matter of rethinking how I want to simulate the low-level handles. I will ruminate on this some more and approach it fresh tomorrow.



Day #134

Friday, July 25, 2014

The Reactor Pattern in Dart


I purposefully did not include the phrase “object oriented” in the title of Design Patterns in Dart. The reason, of course, is that I hope to explore other software design patterns in the book. Some of the patterns that most interest me have to do with concurrency—not because that is where the action is in Dart, but because I am so weak with them.

Let me start by stating that I do not really understand the difference between the actor model, the reactor pattern, regular old events, or the observer pattern. It is not that I have no idea, but rather that I know that I lack the understanding to be able to teach them or their differences. This is why I hope to include them in the book.

Part of the challenge before me is that Dart is built on many of these patterns. Normal operations under Dart occur in an event loop that (I believe) is the reactor pattern. Isolates are built on the actor model. Streams support publish and subscribe. And so forth. So how do I discuss these patterns when they are the building blocks of the language? I really have no good answer to that question, but I have an idea how I can start thinking about them.

So tonight, I implement the Reactor Pattern as described by Douglas C. Schmidt in his Reactor paper. Since the reactor pattern is running underneath in Dart itself, this may be a little weird, but I'm going to try it anyhow.

I start with the main dispatcher as:
class InitiationDispatcher {
  var events = {};

  static final InitiationDispatcher _dispatcher =
    new InitiationDispatcher._internal();

  factory InitiationDispatcher() {
    return _dispatcher;
  }

  InitiationDispatcher._internal();

  registerHandler(eventHandler, eventType) {
    events.putIfAbsent(eventType, ()=> []);
    events[eventType].add(eventHandler);
  }

  removeHandler(eventHandler, eventType) {}

  handleEvents([int timeout=0]) {
    // ...?
  }
}
Most of that is just setting up a singleton per the article. I will worry about removeHandler() another day. I am not quite sure how to handle events just yet, so I leave that open for a bit. I do know how to register a new handler—for a given type of event and corresponding handler object, I add the object to the list of handlers for that type. Adding objects instead of functions is different than the normal event listeners to which I am accustomed, but I roll with it here.

Then the event handler class is a simple keyboard reader:
import 'dart:io';
import 'InitiationDispatcher.dart';

class HandleKey {
  HandleKey(){
    new InitiationDispatcher().registerHandler(this, 'key');
  }

  handleEvent(eventType) {
    if (eventType == 'key') {
      print('[handled] $handle');
    }
  }

  get handle {
    return stdin.readLineSync();
  }
}
As in the paper, I exploit the singleton nature of InitiationDispatcher to get the one instance and register this new keyboard handler when it is constructed. I am still struggling with the Dart equivalent for a “handle” from the paper. Here, I start with reading from STDIN. If nothing else, this will give me something with which to experiment.

Finally main:
#!/usr/bin/env dart

import 'package:reactor_code/InitiationDispatcher.dart';
import 'package:reactor_code/HandleKey.dart';

main() {
  new HandleKey();

  for(;;) {
    new InitiationDispatcher().handleEvents();
  }
}
I create an instance of the HandleKey class (which registers itself with the reactor dispatcher). Then I enter into the infinite loop that tries to handleEvents().

And I remain stumped here on how to handle events in the reactor loop without stealing data from the actual handler—how can I see that there is stuff to be read from STDIN, but not actually read it? Not knowing what else to try, I define handleEvents() as:
class InitiationDispatcher {
  // ...
  handleEvents([int timeout=0]) {
    var line = stdin.readLineSync();
    print('[handleEvents] $line');
    events['key'].forEach((h)=> h.handleEvent('key'));
  }
}
The important part is that handleEvents() tells the appropriate handlers to do their things (the last line of the method). But actually waiting for an event blocks until a line is ready from STDIN—defeating the entire purpose of the Reactor Pattern. Still, for tracer bullet purposes, it works. If I type 1, 2, 3, 4, and so on, I get:
$ ./bin/reactor.dart
[handleEvents] 1
[handleEvent] 2
[handleEvents] 3
[handleEvent] 4
[handleEvents] 5
[handleEvent] 6
The first event triggers the dispatcher. The second triggers the actual handlers.

So I sort of have a Reactor Pattern in Dart. But it would be nicer to have one that does not block and that better mimics handles and select() from the paper. I do not necessarily need low-level I/O or networking access to accomplish this, but what I have now is insufficient. I will ruminate on that (or if someone has suggestions?) and pick back up tomorrow.


Day #133

Thursday, July 24, 2014

Logarithmic Scales: For When the Winner is That Big


Up tonight: whatever breaks when I try my all-encompassing Design Patterns in Dart benchmarking suite on a different pattern. I have been building it out on code for the Visitor Pattern and I think it is finally ready (more or less). So let's see what happens when I pull it back into the code for the Factory Method Pattern.

Note: the code is located at https://github.com/eee-c/design-patterns-in-dart. The current HEAD points to bec37fd41e.

What I found in the Visitor Pattern code was that the benchmark numbers were dependent on the number of times that a particular pattern was run through a loop. This could be specific to patterns that work on node structures like the Visitor Pattern. But for all I know, different approaches like storing a list of Factories in a Map might also be influenced by the number of times through a loop. There is only one way to find out…

I start with a little code re-organization. I think I have settled on storing the preferred solution directly in the lib directory of the particular pattern being explored while alternatives go in lib/alt. After that, I create a separate benchmark script for each approach in the tool directory:
$ tree tool
tool
├── benchmark.sh
├── benchmark.dart
├── benchmark_map_of_factories.dart
├── benchmark_mirrors.dart
└── packages -> ../packages
I also create a mostly-configuration benchmark.sh Bash script file that pulls in common code to run each of those Dart scripts. And it all works pretty brilliantly. Except for the 2 errors per script that cause each benchmark to crash:
Unhandled exception:
FileSystemException: Cannot open file, path = '/home/chris/repos/design-patterns-in-dart/factory_method/tool/packages/args/args.dart' (OS Error: No s)
#0      _rootHandleUncaughtError.<anonymous closure>. (dart:async/zone.dart:713)
...
The solution is fairly simple, though it does expose yet another gap in my thinking—this time regarding Dart Pub packages. Before moving the benchmarking code into a common package location, it was a development dependency for the Visitor Pattern code:
name: visitor_code
dev_dependencies:
  args: any
  benchmark_harness: any
That is no longer needed directly by the pattern code—the common benchmarking code parses command-line arguments now. Only it is clearly not being pulled into this factory method “application.” The factory method application depends on the common dpid_benchmarking package:
name: factory_method_code
dependencies:
  browser: any
dev_dependencies:
  dpid_benchmarking:
    path: ../packages/benchmarking
And dpid_benchmarking does depend on args:
name: dpid_benchmarking
dev_dependencies:
  args: any
  benchmark_harness: any
So why doesn't args get pulled into the application as well?

The answer is simple, really. It is a development dependency of dpid_benchmarking. In other words, it will not get installed when the package is used by other applications or packages (like my factory method code). As a development dependency, it is only installed when working directly with the package—when developing it in isolation.

This is not completely obvious. I had not given this much thought, but I realize now that I expected this package's development dependencies to be installed along with my application's development dependencies. Somewhere in the back of my mind I was expecting the development dependency on dpid_benchmarking to pull in dpid_benchmarking's development dependencies as well.

Now that I have exposed this flawed thinking, I acknowledge that it was erroneous. Development dependencies are for single package development only. Hopefully I will remember that.

That issue aside, everything else just works. I get nice graphs that seem to prove that the number of times that a Factory Method implementation is used has no impact on its performance:



It may be a little hard to see, but there is a number associated with the “classic” subclass implementation of the Factory Method pattern—it is just really small compared to the other two approaches. For this, I tell gnuplot to plot the y axis logarithmically:
set style data histogram
set style histogram clustered
set style fill solid border
set xtics rotate out
set key tmargin
set xlabel "Loop Size"
set ylabel "µs per run"

set logscale y

plot for [COL=2:4] filename using COL:xticlabels(1) title columnheader
That gives me:



Any way that you look at it, the classic subclass approach is the clear performance winner. There may be some maintainability wins for the other approaches, but they would have to be significant to warrant using them over the subclass approach. Even when compiled to JavaScript:



I think that will close out my initial research into object oriented design patterns for the book. I may investigate some concurrency patterns tomorrow. Or it may be time to switch back to Patterns in Polymer.


Day #132

Wednesday, July 23, 2014

Varying Only the Import in Dart


Tonight, I explore a new kind of Dart refactoring: varying the import statement.

This occurs in the benchmarking code for the Visitor Pattern as part of my research for the future Design Patterns in Dart. I have three different approaches that I would like to compare. After refactoring and refactoring and refactoring, I am finally down to very slim scripts that only vary by the implementation code being imported (and the name of the benchmark):
$ diff -u tool/benchmark.dart tool/benchmark_single_dispatch_iteration.dart
--- tool/benchmark.dart 2014-07-23 23:11:55.933361972 -0400
+++ tool/benchmark_single_dispatch_iteration.dart       2014-07-23 23:12:45.545363182 -0400
@@ -2,12 +2,12 @@
 
 import 'package:dpid_benchmarking/pattern_benchmark.dart';
 
-import 'package:visitor_code/visitor.dart';
+import 'package:visitor_code/alt/single_dispatch_iteration/visitor.dart';
 
 main(List<String> args) {
   BenchmarkRunner.main(
     args,
-    "Classic Visitor Pattern",
+    "Nodes iterate w/ single dispatch",
     _setup,
     _run
   );
The BenchmarkRunner.main() method signature is a little ugly, but aside from the minor quibble I feel pretty good about this. Except…

There are two dozen lines of code that follow this that are exactly identical in each of the three benchmark scripts. The setup and the actual code that is executed are 100% duplicated between the three files. It looks something like:
var visitor, nodes;

_setup() {
  visitor = new PricingVisitor();

  nodes = new InventoryCollection([mobile(), tablet(), laptop()]);
  // Add more sub-nodes to complexify the structure...
}

_run(loopSize) {
  // nodes used
  // visitor used
  // (nodes accept visitor a bunch of times)
}
I am creating a closure over visitor and nodes so that they can be shared between the setup and benchmark runner when executed by BenchmarkRunner.

What is important to me in this case is that the PricingVisitor and InventoryCollection classes have the same name, but are defined differently by the three different imported packages in the three different scripts that I have.

This is almost certainly unique to benchmarking considerations, but still, how can I move this duplicated code out into a single file that can be shared? Dart parts will not work because the file containing the common setup and run code would have to be the main file and only one of the three implementations could be part of it. Conditional imports do not work in Dart.

Unfortunately, I am more or less stumped on a good way to do this. Dart is not meant to be used like this (I'm not sure any language is). That said, I can come up with something that works. I have to use the Factory Method pattern to create the visitor and the node structure (I also have to pull in makers for the nodes within the structure). In the end, the overhead does not seem to save much in the way of lines of code:
import '_setup.dart' as Setup;

main(List<String> args) {
  Setup.visitorMaker = ()=> new PricingVisitor();
  Setup.inventoryCollectionMaker = (items) => new InventoryCollection(items);
  Setup.mobile = mobile;
  Setup.tablet = tablet;
  Setup.laptop = laptop;
  Setup.app = app;

  BenchmarkRunner.main(
    args,
    "Classic Visitor Pattern",
    Setup.setup,
    Setup.run
  );
}
What I do gain is the ability to keep the test setup and running logic in one place: _setup.dart. And in there, I simply define top-level variables to support the Setup.XXX setters:
library visitor_setup;

var visitorMaker;
var inventoryCollectionMaker;
var mobile;
var tablet;
var laptop;
var app;

var visitor, nodes;

setup() {
  visitor = visitorMaker();
  // ...
}
// ...
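The rest of _setup.dart is not shown here, but based on the commented _run() above, the shared run() amounts to little more than this (a hypothetical sketch—the accept() call is my guess at the visitor API):
run(loopSize) {
  // The nodes accept the visitor a bunch of times, just as in _run() above.
  for (var i = 0; i < loopSize; i++) {
    nodes.accept(visitor);
  }
}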
I am not satisfied with that, but it does have the advantage of keeping the common logic in one place. At the risk of rationalizing this too much, I note that the 6 Setup assignments are only needed because I am using 4 node types to intentionally create complexity in the data structure being tested.

I will leave this as “good enough” for the time being. Using this on other design patterns will ultimately decide if this approach is usable. So I will pick back up with that tomorrow.



Day #131

Tuesday, July 22, 2014

Internal Dart Packages for Organizing Codebases


This may very well apply only to me…

I would like to re-use some shell and Dart benchmarking code. I will not be duplicating code so I have to find a working solution. The problem with which I am faced is that I am not working on a single Dart package, but dozens—one for each pattern that will be covered in Design Patterns in Dart. Each package has its own internal Dart Pub structure, complete with pubspec.yaml and the usual subdirectories:
$ tree -L 2
.
├── factory_method
│   ├── build
│   ├── lib
│   ├── packages
│   ├── pubspec.lock
│   ├── pubspec.yaml
│   ├── tool
│   └── web
└── visitor
    ├── bin
    ├── lib
    ├── packages
    ├── pubspec.lock
    ├── pubspec.yaml
    └── tool
The code that I would like to share currently exists only in the Visitor Pattern's tool subdirectory. I suppose that I could create another top-level directory like helpers and then import the common code into visitor, factory_method and the still-to-be-written directories. That seems like a recipe for an unmaintainable codebase—I will wind up with crazy depths and amounts of relative imports (e.g. import '../../../../helpers/benchmark.dart') strewn about. And the coding gods help me should I ever want to rename things.

Instead, I think that I will create a top-level packages directory to hold my common benchmarking code, as well as any other common code that I might want to use. As the name suggests, I can create this as an actual Dart Pub package, but instead of publishing it to pub.dartlang.org, I can keep it local:
$ tree -L 2
.
├── factory_method
│   └── ...
├── packages
│   └── benchmarking
└── visitor
    └── ...
I am probably going to regret this, but I name the package's subdirectory with the relatively brief “benchmarking,” while naming the package dpid_benchmarking in pubspec.yaml. The idea here is to save a few keystrokes on the directory name, but ensure that my local package names do not conflict with any that might need to be used as dependencies. So in packages/benchmarking, I create a pubspec.yaml for my local-only package:
name: dpid_benchmarking
dev_dependencies:
  args: any
  benchmark_harness: any
There is nothing fancy there—it reads like any other package specification in Dart, which is nice.

The first bit of common code that I would like to pull in is not Dart. Rather it is the common _benchmark.sh Bash code from last night. There is nothing in Dart's packages that prevents packaging other languages, a fact that I exploit here:
$ git mv visitor/tool/_benchmark.sh packages/benchmarking/lib
I use the lib subdirectory in the package because that is the only location that is readily shared by Dart packages.

To use _benchmark.sh from the visitor code samples, I now need to declare that it depends on my local-only package. This can be done in pubspec.yaml with a dependency. Since this is benchmarking code, it is not a build dependency. Rather it is a development dependency. And, since this is a local-only package, I have to specify a path attribute for my development dependency:
name: visitor_code
dev_dependencies:
  dpid_benchmarking:
    path: ../packages/benchmarking
I suffer a single relative path in my pubspec.yaml because Pub rewards me with a common, non-relative path after pub install. Installing this local-only package creates a dpid_benchmarking symbolic link in the packages directory, pointing at the top-level packages/benchmarking/lib:
$ pwd
/home/chris/repos/design-patterns-in-dart/visitor
$ ls -l packages 
lrwxrwxrwx 1 chris chris 31 Jul 22 21:35 dpid_benchmarking -> ../../packages/benchmarking/lib
lrwxrwxrwx 1 chris chris  6 Jul 22 21:35 visitor_code -> ../lib
Especially useful here is that Dart Pub creates this packages directory in all of the standard pub subdirectories like tool:
$ cd tool
$ pwd
/home/chris/repos/design-patterns-in-dart/visitor/tool
$ ls -l packages/
lrwxrwxrwx 1 chris chris 31 Jul 22 21:35 dpid_benchmarking -> ../../packages/benchmarking/lib
lrwxrwxrwx 1 chris chris  6 Jul 22 21:35 visitor_code -> ../lib
This is wonderfully useful in my visitor pattern's benchmark.sh Bash script. Instead of sourcing _benchmark.sh in the current tool directory, I simply change it so that it sources it from the local-only package:
#!/bin/bash

source ./packages/dpid_benchmarking/_benchmark.sh

BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)

_run_benchmarks $1
The symbolic links from Dart Pub take care of the rest. Nice!

Of course, Pub is the Dart package manager, so it works with Dart code as well. I move some obvious candidates for common benchmarking from visitor/tool/src into the local-only package:
$ git mv visitor/tool/src/config.dart packages/benchmarking/lib/
$ git mv visitor/tool/src/score_emitters.dart packages/benchmarking/lib/
It is then a simple matter of changing the import statements to use the local-only package:
import 'package:dpid_benchmarking/config.dart';
import 'package:dpid_benchmarking/score_emitters.dart';
import 'package:visitor_code/visitor.dart';

main (List args) {
  // ...
}
Dart takes care of the rest!

Best of all, I rinse and repeat in the rest of my design patterns. I add the dpid_benchmarking local-only package as a development dependency, run pub install, then make use of this common code to ensure that I have beautiful data to back up some beautiful design patterns.

That is an all-around win thanks to Dart's Pub package manager. Yay!


Day #130

Monday, July 21, 2014

Refactoring Bash Scripts


I'll be honest here: I'm a pretty terrible Bash script coder. I find the man page too overwhelming to really get better at it. If I need to do something in Bash—even something simple like conditional statements—I grep through /etc/init.d scripts or fall back to the Google machine.

But tonight, there is no hiding. I have two Bash scripts (actually, just shell scripts at this point) that do nearly the same thing: benchmark.sh and benchmark_js.sh. Both perform a series of benchmarking runs of code for Design Patterns in Dart, the idea being that it might be useful to have actual numbers to back up some of the approaches that I include in the book. But, since this is Dart, it makes sense to benchmark both on the Dart VM and on a JavaScript VM (the latter because most Dart code will be compiled with dart2js). The two benchmark shell scripts are therefore responsible for running and generating summary results for Dart and JavaScript.

The problem with two scripts is twofold. First, I need to keep them in sync—any change made to one needs to go into the other. Second, if I want to generalize this for any design pattern, I have hard-coded way too much in both scripts. To the refactoring machine, Robin!

To get an idea where to start, I diff the two scripts. I have been working fairly hard to keep the two scripts in sync, so there are only two places that differ. The JavaScript version includes a section that compiles the Dart benchmarks into JavaScript:
$ diff -u1 tool/benchmark.sh tool/benchmark_js.sh 
--- tool/benchmark.sh   2014-07-21 22:32:44.047778498 -0400
+++ tool/benchmark_js.sh        2014-07-21 20:54:20.803634500 -0400
@@ -11,2 +11,16 @@
 
+# Compile
+wrapper='function dartMainRunner(main, args) { main(process.argv.slice(2)); }';
+dart2js -o tool/benchmark.dart.js \
+           tool/benchmark.dart
+echo $wrapper >> tool/benchmark.dart.js
+
+dart2js -o tool/benchmark_single_dispatch_iteration.dart.js \
+           tool/benchmark_single_dispatch_iteration.dart
+echo $wrapper >> tool/benchmark_single_dispatch_iteration.dart.js
+
+dart2js -o tool/benchmark_visitor_traverse.dart.js \
+           tool/benchmark_visitor_traverse.dart
+echo $wrapper >> tool/benchmark_visitor_traverse.dart.js
+
...
The other difference is actually running the benchmarks—the JavaScript version needs to run through node.js:
$ diff -u1 tool/benchmark.sh tool/benchmark_js.sh 
--- tool/benchmark.sh   2014-07-21 22:32:44.047778498 -0400
+++ tool/benchmark_js.sh        2014-07-21 20:54:20.803634500 -0400
...
@@ -15,7 +29,7 @@
 do
-    ./tool/benchmark.dart --loop-size=$X \
+    node ./tool/benchmark.dart.js --loop-size=$X \
         | tee -a $RESULTS_FILE
-    ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X \
+    node ./tool/benchmark_single_dispatch_iteration.dart.js --loop-size=$X \
         | tee -a $RESULTS_FILE
-    ./tool/benchmark_visitor_traverse.dart --loop-size=$X \
+    node ./tool/benchmark_visitor_traverse.dart.js --loop-size=$X \
         | tee -a $RESULTS_FILE
For refactoring purposes, I start with the latter difference. The former is a specialization that can be performed in a single conditional. The latter involves both uses of the script.

That will suffice for initial strategy; what about tactics? Glancing at the Dart benchmark.sh script, I see that I have a structure that looks like:
#!/bin/bash

RESULTS_FILE=tmp/benchmark_loop_runs.tsv
SUMMARY_FILE=tmp/benchmark_summary.tsv
LOOP_SIZES="10 100 1000 10000 100000"

# Initialize artifact directory
...

# Individual benchmark runs of different implementations
...

# Summarize results
...

# Visualization ready
...
Apparently I have been quite fastidious about commenting the code because those code section comments are actually there. They look like a nice first pass at a series of functions. Also of note here is that the script starts by defining some global settings, which seems like a good idea even after refactoring—I can use these and others to specify output filenames, benchmark scripts, and whether or not to use the JavaScript VM.

But first things first: extracting the code in each of those comment sections out into functions. I make a top-level function that will invoke all four functions-from-comment-sections:
_run_benchmarks () {
    initialize
    run_benchmarks
    summarize
    all_done
}
Then I create each function as:
# Initialize artifact directory
initialize () {
  # ...
}

# Individual benchmark runs of different implementations
run_benchmarks () {
  # ...
}

# Summarize results
summarize () {
  # ...
}

# Visualization ready
all_done () {
  # ...
}
Since each of those sections is relying on top-level global variables, this just works™ without any additional work from me.

Now for some actual refactoring. One of the goals here is to be able to use this same script not only for JavaScript and Dart benchmarking of the same pattern, but also for different patterns. That means I need to stop hard-coding the benchmark scripts inside the new run_benchmarks function:
run_benchmarks () {
    echo "Running benchmarks..."
    for X in 10 100 # 1000 10000 100000
    do
      ./tool/benchmark.dart --loop-size=$X \
          | tee -a $RESULTS_FILE
      ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X \
          | tee -a $RESULTS_FILE
      ./tool/benchmark_visitor_traverse.dart --loop-size=$X \
          | tee -a $RESULTS_FILE
    done
    echo "Done. Results stored in $results_file."
}
The only thing that is different between those three implementation benchmarks is the name of the benchmark file. So my next step is a global variable holding the list of files to loop over:
BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)
Up until this point, I think I could get away with regular Bourne shell scripting, but arrays like this are only available in Bash. With that, I can change run_benchmarks to:
run_benchmarks () {
    echo "Running benchmarks..."
    for X in 10 100 1000 10000 100000
    do
        for script in ${BENCHMARK_SCRIPTS[*]}
        do
            ./$script --loop-size=$X | tee -a $results_file
        done
    done
    echo "Done. Results stored in $results_file."
}
At this point, I would like to get a feel for what the common part of the script is and what specialized changes are needed for each new pattern benchmark. So I move all of my new functions out into a _benchmark.sh script that can be “sourced” by the specialized code:
#!/bin/bash

source ./tool/_benchmark.sh

BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)

RESULTS_FILE=benchmark_loop_runs.tsv
SUMMARY_FILE=benchmark_summary.tsv

_run_benchmarks
That is pretty nice. I can easily see how I would use this for other patterns—for each implementation being benchmarked, I would simply add it to the list of BENCHMARK_SCRIPTS.

Now that I see what is left of the code, I realize that the RESULTS_FILE and SUMMARY_FILE variables were only varied to keep from overwriting artifacts of the different VM runs. The basename for each is nearly the same between the two original scripts—the JavaScript artifacts just include an additional _js suffix. That is the kind of thing that can be moved into my new common _benchmark.sh file:
#...
results_basename="benchmark_loop_runs"
summary_basename="benchmark_summary"
results_file=""
summary_file=""
type="js"
# ...
initialize () {
    if [ "$type" = "js" ]; then
        results_file=$tmpdir/${results_basename}_js.tsv
        summary_file=$tmpdir/${summary_basename}_js.tsv
    else
        results_file=$tmpdir/${results_basename}.tsv
        summary_file=$tmpdir/${summary_basename}.tsv
    fi
    # ...
}
If _run_benchmarks is invoked with type set to "js", then the results_file variable will include _js in the basename. If "js" is not supplied, then the old basename is used.
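The rest of initialize is elided above, but presumably it still does what the original script did: create the artifact directory and truncate the output files. A minimal sketch, with tmpdir assumed to be plain old tmp (that variable appears above; its value is my guess):
tmpdir="tmp"

initialize () {
    # ... the results_file / summary_file conditional from above, then:
    mkdir -p $tmpdir
    cat /dev/null > $results_file
    cat /dev/null > $summary_file
}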

To obtain that information from the command-line, I read the first argument supplied when the script is called ($1) and send it to _run_benchmarks:
_run_benchmarks $1
I can then add a new (very simple) parse_options function to see if this script is running the Dart or JavaScript benchmarks:
_run_benchmarks () {
    parse_options $1
    initialize
    run_benchmarks
    summarize
    all_done
}

parse_options () {
  if [ "$1" = "-js" -o "$1" = "--javascript" ]
  then
      type="js"
  fi
}
# ...
The $1 inside _run_benchmarks is not the same as the $1 from the command-line. Inside a function, $1 refers to the first argument supplied to it.
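A throwaway snippet (not part of the benchmark code) to illustrate that scoping:
say_hello () {
    # $1 here is the first argument given to say_hello
    echo "hello, $1"
}

say_hello "world"   # prints "hello, world" regardless of the arguments
                    # supplied to the enclosing script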

I can also make use of this to pull in the last piece of refactoring—the compiling that I decided to save until later. Well, now it is later, and I can compile as needed when type has been parsed from the command line as "js":
_run_benchmarks () {
    parse_options $1
    initialize
    if [ "$type" = "dart" ]; then
        run_benchmarks
    else
        compile
        run_benchmarks_js
    fi
    summarize
    all_done
}
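I do not show compile and run_benchmarks_js above. A rough sketch of what they might look like, assuming that they loop over the same BENCHMARK_SCRIPTS list and reuse the dartMainRunner wrapper from the original benchmark_js.sh, would be:
# Sketch only: compile each Dart benchmark to JavaScript and append the
# node-friendly dartMainRunner wrapper.
compile () {
    local wrapper='function dartMainRunner(main, args) { main(process.argv.slice(2)); }'
    for script in ${BENCHMARK_SCRIPTS[*]}
    do
        dart2js -o $script.js $script
        echo $wrapper >> $script.js
    done
}

# Sketch only: the same loop as run_benchmarks, but through node.
run_benchmarks_js () {
    echo "Running benchmarks (JavaScript)..."
    for X in 10 100 1000 10000 100000
    do
        for script in ${BENCHMARK_SCRIPTS[*]}
        do
            node ./$script.js --loop-size=$X | tee -a $results_file
        done
    done
    echo "Done. Results stored in $results_file."
}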
With that, I have everything I need to keep the specialized versions of the benchmarking script small:
#!/bin/bash

source ./tool/_benchmark.sh

BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)

_run_benchmarks $1
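Invoking either flavor is then just a matter of the flag that parse_options understands (the script path here is my assumption):
$ ./tool/benchmark.sh                # benchmark on the Dart VM
$ ./tool/benchmark.sh --javascript   # compile with dart2js, then benchmark under node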
No doubt there is much upon which I could improve if I wanted to get really proficient at Bash scripting, but I am reasonably happy with this. It ought to suit me nicely when I switch to other patterns. Which I will try next. Tomorrow.



Day #129

Sunday, July 20, 2014

Reading Files from STDIN in Dart


I need to read from STDIN on the command-line. Not just a single line, which seems to be well understood in Dart, but an entire file redirected into a Dart script.

My benchmarking work has a life of its own now with several (sometimes competing) requirements. One of those requirements is that the code that actually produces the raw benchmark data cannot use dart:io for writing to files (because it is also compiled to JavaScript, which does not support dart:io). Given that, my benchmark results are sent to STDOUT and redirected to artifact files:
    ./tool/benchmark.dart --loop-size=$X \
        | tee -a $RESULTS_FILE
Or, the JavaScript version:
    node ./tool/benchmark.dart.js --loop-size=$X \
        | tee -a $RESULTS_FILE
That works great, mostly because Dart's print() sends its data to STDOUT regardless of whether it is run on the Dart VM or in Node.js after being put through dart2js.

What I need now is a quick way to summarize the raw results that works with either the Dart or the JavaScript output. Up until now, I have hard-coded the results filename inside the summary script, which is then run as:
./tool/summarize_results.dart > $SUMMARY_FILE
The problem with that is twofold: I should not hard-code a filename like that, and the artifact filenames should all be managed in a single location.

I could send the filename into ./tool/summarize_results.dart as an argument. That would, in fact, require a minimal amount of change. But I have a slight preference for consistency with shell file redirection—if STDOUT is being redirected out to a file, then the script should (at least) support having a file redirected or piped into it via STDIN.

Fortunately, this is fairly easy with dart:io's stdin top-level getter. It actually turns out to be very similar to the single-line-of-keyboard-input solutions that are already out there:
// Read the tab-separated benchmark records from STDIN (stdin is the dart:io top-level getter)
_readTotals() {
  var lines = [];

  var line;
  while ((line = stdin.readLineSync()) != null) {
    var fields = line.split('\t');
    lines.add({
      'name': fields[0],
      'score': fields[1],
      'loopSize': fields[2],
      'averageScore': fields[3]
    });
  }

  return lines;
}
The only real “trick” here is the null check at the end of the STDIN input stream. Pleasantly, that was easy to figure out since it is similar to how many other languages handle STDIN processing.
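With that in place, the benchmarking shell script can keep ownership of all the filenames and simply redirect the raw results into the summary script, something along these lines:
./tool/summarize_results.dart < $RESULTS_FILE > $SUMMARY_FILE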



Day #128

Saturday, July 19, 2014

Command Line Arguments with Node.js and Dart2js


How on earth did it come to this?

I need for my compiled Dart to respond to command line arguments when compiled to JavaScript and run under Node.js. I think that sentence reads as proper English.

It all seemed so simple at first. I want to benchmark code for Design Patterns in Dart. I may use benchmarks for every pattern, I may use them for none—either way I would like the ability to quickly get benchmarks so that I can compare different approaches that I might take to patterns. So I would like to benchmark.

So I benchmarked my Dart code. I added features to the command-line Dart scripts so that they can read command-line switches to vary the loop size (which seems to make a difference). Then I tried to compile this code to JavaScript to be run under node.js. That actually works. Except for the command-line options that vary the loop size.

And so here I am. I need to figure out how to get node.js running dart2js code to accept command-line (or environment) options.

Last night I found that the usual main() entry point simply does not see command line arguments when compiled via dart2js and run with node. The print() of the command line args is an empty List even when options are supplied:
main (List<String> args) {
  print(args);
  // ...
}
With the Dart VM, I see command line options:
$ dart ./tool/benchmark.dart --loop-size=666
[--loop-size=666]
And, with the very nice args package, I can even process them into useful things. But when run with node.js, I only get an empty list:
$ node ./tool/benchmark.dart.js --loop-size=666
[]
I had thought to use some of the process environment options, but they are all part of the dart:io package which is unsupported by dart2js. Even initializing a string from the environment does not work:
const String LOOP_SIZE = const String.fromEnvironment("LOOP_SIZE", defaultValue: "10");
No matter how I tried to set that environment variable, my compiled JavaScript would only use the default value.

So am I stuck?

The answer turns out to be embedded in the dart2js output. At the very top of the generated file are a couple of hints, including:
// dartMainRunner(main, args):
//    if this function is defined, the Dart [main] method will not be invoked
//    directly. Instead, a closure that will invoke [main], and its arguments
//    [args] is passed to [dartMainRunner].
It turns out that I can use this dartMainRunner() function to grab command line arguments the node.js way so that they can be supplied to the compiled Dart. In fact, I need almost no code to do it:
function dartMainRunner(main, args) {
  main(process.argv.slice(2)); 
}
I slice off the first two command line arguments, supplying the main() callback with the rest. In the case of my node.js code, the first two arguments are the node executable and the script name:
$ node ./tool/benchmark.dart.js --loop-size=666
With those chomped off, I am only supplying the remainder of the command line arguments—the switches that control how my benchmarks will run.

This is a bit of a hassle in my Dart benchmarks since I will need to add that wrapper code to each implementation every time that I make a change. I probably ought to put all of this into a Makefile (or equivalent). For now, however, I simply make it part of my benchmarking shell script:
#...
# Compile
wrapper='function dartMainRunner(main, args) { main(process.argv.slice(2)); }';
dart2js -o tool/benchmark.dart.js \
           tool/benchmark.dart
echo $wrapper >> tool/benchmark.dart.js

# More compiling then actual benchmarks ...
And that works!

Thanks to my already-in-place gnuplot graphing solution, I can even plot my three implementations of the Visitor Pattern compiled from Dart into JavaScript. The number of microseconds that a single run takes, relative to the number of loops, winds up looking like this:



That is a little different than last night's Dart numbers, but the bottom line seems to be that it does not matter too much which implementation I choose. At least for this pattern. Regardless, I am grateful to be able to get these numbers quickly now—even from node.js.


Day #127

Friday, July 18, 2014

Close, But Not Quite Visualizing dart2js Benchmark Comparisons


Thanks to benchmark_harness and gnuplot I can make quick, pretty, and useful visualizations of the performance of different code implementations:



The actual numbers are almost inconsequential at this point. I need to know that I can make and visualize them in order to know that I can choose the right solution for inclusion in Design Patterns in Dart. I can investigate the numbers later—for now I only need to know that I can generate them.

And I think that I finally have that sorted out. I think that I have eliminated inappropriate comparisons, loops, types, and just plain silly mistakes. To be sure, the code could be (and probably will need to be) better. But it works. More importantly, it can be run through a single tool/benchmark.sh script:
#!/bin/sh

# Initialize artifact directory
mkdir -p tmp
cat /dev/null > tmp/benchmark_loop_runs.tsv
cat /dev/null > tmp/benchmark_summary.tsv

# Individual benchmark runs of different implementations
echo "Running benchmarks..."
for X in 10 100 1000 10000 100000
do
    ./tool/benchmark.dart --loop-size=$X
    ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X
    ./tool/benchmark_visitor_traverse.dart --loop-size=$X
done
echo "Done. Results stored in tmp/benchmark_loop_runs.tsv."

# Summarize results
echo "Building summary..."
./tool/summarize_results.dart
echo "Done. Results stored in tmp/benchmark_summary.tsv."

# Visualization ready
echo ""
echo "To view in gnuplot, run tool/visualize.sh."
I am a little worried about the need for artifacts, but on some level they are needed—the VM has to be freshly started with each run for accurate numbers. To retain the data between runs—and between the individual runs and the summary—artifact files will need to store that information. I will worry about that another day.

My concern today is that I am benchmarking everything on the Dart VM. For the foreseeable future, however, the Dart VM will not be the primary runtime environment for Dart code. Instead, most Dart code will be compiled to JavaScript via dart2js. I cannot very well recommend a solution that works great in Dart, but fails miserably in JavaScript.

So how do I get these numbers and visualizations in JavaScript?

Well, I already know how to benchmark dart2js. I just compile to JavaScript and run with Node.js. Easy-peasey, right?
$ dart2js -o tool/benchmark.dart.js \
>            tool/benchmark.dart
tool/src/score_emitters.dart:3:8:
Error: Library not found 'dart:io'.
import 'dart:io';
       ^^^^^^^^^
tool/src/score_emitters.dart:39:18:
Warning: Cannot resolve 'File'.
  var file = new File(LOOP_RESULTS_FILE);
                 ^^^^
tool/src/score_emitters.dart:40:23:
Warning: Cannot resolve 'FileMode'.
  file.openSync(mode: FileMode.APPEND)
                      ^^^^^^^^
Error: Compilation failed.
Arrrgh. All of my careful File code is for naught if I want to try this out in JavaScript. On the bright side, this is why you try a solution in a bunch of different environments before generalizing.

This turns out to have a fairly easy solution thanks to tee. I change the score emitter to simply print to STDOUT in my Dart code:
recordTsvTotal(name, results, loopSize, numberOfRuns) {
  var averageScore = results.fold(0, (prev, element) => prev + element) /
    numberOfRuns;

  var tsv =
    '${name}\t'
    '${averageScore.toStringAsPrecision(4)}\t'
    '${loopSize}\t'
    '${(averageScore/loopSize).toStringAsPrecision(4)}';

  print(tsv);
}
Then I make benchmark.sh responsible for writing the tab-separated data to the appropriate file:
RESULTS_FILE=tmp/benchmark_loop_runs.tsv
# ...
dart2js -o tool/benchmark.dart.js \
           tool/benchmark.dart
# ...
for X in 10 100 1000 10000 100000
do
    ./tool/benchmark.dart --loop-size=$X | tee -a $RESULTS_FILE
    # ...
done
That works great—for the default case:
$ node tool/benchmark.dart.js
Classic Visitor Pattern 6465    10      646.5
Unfortunately, the args package does not work with Node.js. Supplying a different loop size still produces results for a loop size of 10:
$ node tool/benchmark.dart.js --loop-size=100
Classic Visitor Pattern 6543    10      654.3
My initial attempt at solving this is String.fromEnvironment():
const String LOOP_SIZE = const String.fromEnvironment("LOOP_SIZE", defaultValue: "10");

class Config {
  int loopSize, numberOfRuns;
  Config(List<String> args) {
    var conf = _parser.parse(args);
    loopSize = int.parse(LOOP_SIZE);
    // ...
  }
  // ...
}
But that does not work. When I run the script, I still get a loop size of 10:
$ LOOP_SIZE=100 node tool/benchmark.dart.js
Classic Visitor Pattern 6160    10      616.0
Stumped there, I call it a night.

I would have preferred to get all the way to a proper visualization, but the switch to tee for building the tab separated artifact is already a win. The filename is now located in a single script (the overall shell script) instead of two places (the shell script and Dart code). Hopefully I can figure out some way to read Node.js command line arguments from compiled Dart. Tomorrow.


Day #126

Thursday, July 17, 2014

Gnuplot Dart Benchmarks


I am embracing the idea of benchmarking the solutions that I add to Design Patterns in Dart. This early in the research, I am unsure if I will have a strong need for benchmarks. What I do know is that if I cannot make benchmarking easy, then I will almost certainly not use them.

So tonight, I take a quick look at the last piece of my puzzle: visualization. After last night, I have a solution that can store the average run time of a single benchmark run against the number of loops used to gather that data. The CSV output looks something like:
Loop_Size, Classic_Visitor_Pattern, Nodes_iterate_w/_single_dispatch, Visitor_Traverses
10, 122.4, 110.6, 117.5, 
100, 118.5, 112.4, 117.6, 
1000, 120.4, 112.4, 116.8, 
10000, 148.6, 148.2, 117.9, 
100000, 148.1, 148.7, 118.5, 
I can paste that data into a spreadsheet to produce nice looking graphs like:



But let's face it, if I have to open a spreadsheet every time that I want to look at my data, I am never going to do it. So unless I can find some way to quickly visualize that data, all of this would be for naught. Enter gnuplot.

The preferred data format for gnuplot seems to be tab-separated values, so I rework the row output in my Dart benchmark reporter to output tabs instead of commas:
    // ...
    var row = '${loopSize}\t';
    implementations.forEach((implementation){
      var rec = records.firstWhere((r)=> r['name'] == implementation);
      row += '${rec["averageScore"]}\t';
    });
    // ...
The graph type that I want in this case is a clustered histogram built from the data. In gnuplot parlance, that translates as:
set style data histogram
set style histogram clustered
I fiddle a bit with styles (and cull a few from Google / Stack Overflow) to settle on these settings:
set style fill solid border
set xtics rotate out
set key tmargin
The fill style makes the bars a little easier to distinguish, and the other two make the axis labels and the legend easier to read. Finally, I plot the data as:
plot for [COL=2:4] 'bar.tsv' using COL:xticlabels(1) title columnheader
The for [COL=2:4] iterates over data columns 2 through 4 (one for each implementation), plotting each against the loop-size labels from column 1 and pulling its title from the column header. Which results in:



That is quite nice for just a little bit of work. I do, however, note that this graph suffers from a pretty serious flaw—the same flaw from which my spreadsheet graph suffered, unnoticed by me until now. The y-axis does not start at zero, which greatly exaggerates the discrepancies between my three implementations. To start the y-axis at zero, I use:
set yrange [0:]
Now my graph looks like:



Now that I see the “jump” in values at the 10k loop size from that perspective, it does not seem nearly as worrisome.

If I put that all in a gnuplot script file:
# set terminal png size 1024,768
# set output  "bar.png"
set style data histogram
set style histogram clustered
set style fill solid border
set xtics rotate out
set key tmargin
set yrange [0:]
plot for [COL=2:4] 'bar.tsv' using COL:xticlabels(1) title columnheader
pause -1 "Hit any key to continue"
Then I can quickly see my benchmarking results whenever I need to. No excuses.


Day #125