February 2018

One of my work projects is publicly available! We call it Reflow. Written in Java, it's a library for composing individual units of work into a directed acyclic graph. My team has started using it to drive a bunch of our data processing, and now you can try it out too. Source and documentation can be found on GitHub, and as of today, we're also publishing build artifacts to JCenter.

Once a dependency graph has been defined, Reflow enables you to run it end to end in a single method call, with multiple tasks executing in parallel when possible. And there's more:

  • Each task can declare that it will produce some output (database tables, files on local disk, etc.), and a task can be skipped when its output is already present.
  • Task definitions are extremely flexible: the only real requirement is that each task be representable as a Java object. If your tasks happen to implement the Runnable interface, it's easy to get them scheduled on an Executor of your choosing, but you can opt to handle scheduling yourself and even schedule tasks outside of the JVM.
  • If you do schedule tasks externally, the state of the overall workflow can be serialized even while tasks are running. This allows you to bring down one “coordinator” process and bring up another without missing a beat.
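
To make the first two ideas concrete, here's a minimal sketch of running a dependency graph of Runnable tasks in parallel on an Executor while skipping tasks whose output already exists. To be clear, this is not Reflow's API: DagSketch, Task, outputPresent, runAll, and demo are names I made up for illustration, and the sketch uses only the JDK's CompletableFuture and ExecutorService. It also assumes the tasks are handed over in dependency order, which a real library would work out for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BooleanSupplier;

// Illustrative only: none of these names come from Reflow.
final class DagSketch {

    /** A unit of work, its upstream dependencies, and a check for existing output. */
    static final class Task {
        final String name;
        final List<Task> dependencies;
        final BooleanSupplier outputPresent; // hypothetical "is the output already there?" check
        final Runnable work;

        Task(String name, List<Task> dependencies, BooleanSupplier outputPresent, Runnable work) {
            this.name = name;
            this.dependencies = dependencies;
            this.outputPresent = outputPresent;
            this.work = work;
        }
    }

    /**
     * Runs every task once all of its dependencies have finished.
     * Assumes the list is ordered so that dependencies appear before dependents.
     * Blocks until the whole graph is done.
     */
    static void runAll(List<Task> tasksInDependencyOrder, ExecutorService executor) {
        Map<Task, CompletableFuture<Void>> futures = new LinkedHashMap<>();
        for (Task task : tasksInDependencyOrder) {
            // Collect the futures of all upstream tasks.
            CompletableFuture<?>[] upstream = task.dependencies.stream()
                    .map(dep -> Objects.requireNonNull(
                            futures.get(dep), "dependency listed after " + task.name))
                    .toArray(CompletableFuture[]::new);
            // Once they complete, run this task on the executor, unless its output exists.
            CompletableFuture<Void> future = CompletableFuture.allOf(upstream)
                    .thenRunAsync(() -> {
                        if (!task.outputPresent.getAsBoolean()) {
                            task.work.run();
                        }
                    }, executor);
            futures.put(task, future);
        }
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0])).join();
    }

    /** A diamond-shaped graph: extract -> (transformA, transformB) -> load. */
    static List<String> demo() {
        Queue<String> executed = new ConcurrentLinkedQueue<>();
        Task extract = new Task("extract", Arrays.asList(),
                () -> false, () -> executed.add("extract"));
        Task transformA = new Task("transformA", Arrays.asList(extract),
                () -> false, () -> executed.add("transformA"));
        // Pretend transformB's output is already present, so it gets skipped.
        Task transformB = new Task("transformB", Arrays.asList(extract),
                () -> true, () -> executed.add("transformB"));
        Task load = new Task("load", Arrays.asList(transformA, transformB),
                () -> false, () -> executed.add("load"));

        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            runAll(Arrays.asList(extract, transformA, transformB, load), executor);
        } finally {
            executor.shutdown();
        }
        return new ArrayList<>(executed);
    }
}
```

Calling DagSketch.demo() runs extract first, then transformA, then load; transformB never executes because its output check reports that the output is already present. The sketch leaves out the third bullet entirely: serializing workflow state while tasks run externally takes a good deal more machinery than this.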