My First Streambase Application

May 29th, 2007

This is a follow-up to my previous post about Streambase.

To test the capabilities of Streambase, I decided to create the basic functionality of a matching engine for an ECN. I picked this application for two reasons. First, it is an area in which I have considerable experience (from my past life at the NYSE), and second, it is somewhat atypical for a CEP application but not too far afield.

I gave myself the following basic requirements:

  1. The ECN will accept a stream of stock orders (BUY LMT or SELL LMT orders only) for any stock.
  2. The ECN will accept partial or full cancels for previously input orders.
  3. The ECN will accept requests to “look at the Book” for a given stock (i.e. get a summary of open orders by price and side).
  4. The ECN will output a new quote whenever the quote changes.
  5. The ECN will output a delta summary record whenever there is a change in quantity (available shares) at a price due to new orders, executions or cancels.
  6. The ECN will output Execution Report Records showing how much of an order was executed and what order it crossed with.
  7. The ECN will output Cancel Reports showing how much of an order was actually canceled and what was left.
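
To make requirements 3 and 4 a little more concrete, here is a rough sketch in plain Java (not Streambase code) of what a book summary by price and side amounts to. The class and field names are made up for this illustration, and prices are assumed to be held as integer cents.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical order record; the field names are assumptions for illustration.
final class OpenOrder {
    final String symbol;
    final boolean buy;       // true = BUY LMT, false = SELL LMT
    final long priceCents;   // limit price in cents, to avoid floating point
    final int openShares;    // shares not yet executed or canceled

    OpenOrder(String symbol, boolean buy, long priceCents, int openShares) {
        this.symbol = symbol;
        this.buy = buy;
        this.priceCents = priceCents;
        this.openShares = openShares;
    }
}

final class BookSummary {
    // Total open shares per price for one side of one stock, sorted by ascending price.
    static TreeMap<Long, Integer> sharesByPrice(List<OpenOrder> orders, String symbol, boolean buySide) {
        TreeMap<Long, Integer> levels = new TreeMap<>();
        for (OpenOrder o : orders) {
            if (o.symbol.equals(symbol) && o.buy == buySide && o.openShares > 0) {
                levels.merge(o.priceCents, o.openShares, Integer::sum);
            }
        }
        return levels;
    }

    // Best bid is the highest buy price; best offer is the lowest sell price.
    // Returns null when that side of the book is empty.
    static Long bestPrice(List<OpenOrder> orders, String symbol, boolean buySide) {
        TreeMap<Long, Integer> levels = sharesByPrice(orders, symbol, buySide);
        if (levels.isEmpty()) {
            return null;
        }
        return buySide ? levels.lastKey() : levels.firstKey();
    }
}
```

The quote in requirement 4 is then just the best price on each side together with the shares at that price, and a new quote is due whenever either value changes.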

Positive Impressions

I was very pleased with the speed with which the overall flow of the application could be assembled using Streambase. The built-in operators (maps, splits, filters, and aggregations) provided 40% of the basic logic. In-memory tables and materialized windows with associated logic provided most of the additional functionality. There was one component that I could not figure out how to build natively with the built-in Streambase components: the core execution logic, which determines the quantity of each order’s execution when an incoming executable order arrives. For this I needed to build a custom Java operator. (More on that below.)
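
For readers who have not worked on a matching engine, the kind of logic involved can be sketched in plain Java. This is an illustration only, not the operator I wrote. It assumes price-time priority, that the contra-side orders arrive already sorted best price first, and that executions happen at the resting order's price. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resting order on the contra side; names are assumptions.
final class RestingOrder {
    final long orderId;
    final long priceCents;
    int openShares;

    RestingOrder(long orderId, long priceCents, int openShares) {
        this.orderId = orderId;
        this.priceCents = priceCents;
        this.openShares = openShares;
    }
}

// One execution between the incoming order and a resting contra order.
final class Fill {
    final long incomingId;
    final long contraId;
    final long priceCents;
    final int shares;

    Fill(long incomingId, long contraId, long priceCents, int shares) {
        this.incomingId = incomingId;
        this.contraId = contraId;
        this.priceCents = priceCents;
        this.shares = shares;
    }
}

final class Matcher {
    // Executes an incoming limit order against contra orders that are assumed
    // to be sorted best price first, then oldest first. Reduces openShares on
    // the contra orders it executes against and returns one Fill per execution.
    static List<Fill> execute(long incomingId, boolean incomingIsBuy, long limitCents,
                              int quantity, List<RestingOrder> contraBestFirst) {
        List<Fill> fills = new ArrayList<>();
        int remaining = quantity;
        for (RestingOrder contra : contraBestFirst) {
            if (remaining == 0) {
                break;
            }
            // A buy crosses offers at or below its limit; a sell crosses bids at or above it.
            boolean crosses = incomingIsBuy
                    ? contra.priceCents <= limitCents
                    : contra.priceCents >= limitCents;
            if (!crosses) {
                break; // input is sorted, so no later order can cross either
            }
            int shares = Math.min(remaining, contra.openShares);
            if (shares == 0) {
                continue; // nothing left on this contra order
            }
            contra.openShares -= shares;
            remaining -= shares;
            fills.add(new Fill(incomingId, contra.orderId, contra.priceCents, shares));
        }
        return fills;
    }
}
```

Any part of the incoming quantity not covered by the returned fills would rest on the book as a new open order.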

Testing in the Streambase environment is very easy, which encourages you to test early and test often. However, the support is aimed primarily at interactive testing rather than automated unit testing (in the spirit of the xUnit frameworks).

Negative Impressions

Streambase’s query language strikes me as having some artificial limitations. For example, when doing a read query against an in-memory table, you can only order the resulting tuples by a primary or secondary key and, worse, you can only order at all if the query is a direct match on the specified index. Since a query against a table will always produce a bounded number of tuples, the ability to specify an arbitrary sort should be an option.

A common task in writing applications is to synthesize a unique id for records. Most relational databases (such as MySQL) provide built-in facilities to do this. Streambase provides some rather awkward ways to do it natively. In the end I wrote a simple Java function, but Streambase could probably provide a better solution internally.
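
A function of this kind can be very small. The sketch below is not the function I actually wrote. It assumes ids only need to be unique within a single running JVM (nothing is persisted across restarts), and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical id generator; ids are unique only within one running JVM.
final class IdGenerator {
    private static final AtomicLong NEXT = new AtomicLong(1);

    private IdGenerator() {
        // static use only
    }

    // Returns a new id on every call; safe to call from multiple threads.
    public static long nextId() {
        return NEXT.getAndIncrement();
    }
}
```

If ids had to survive a restart, the counter would need to be seeded from persistent state, which this sketch does not attempt.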

Streambase’s support for in-memory database tables is very primitive. Each table is isolated from every other table except through streaming operations. This means that there is no notion of a cross-table join. For these capabilities you must go outside to a real database via JDBC, but that seems to defeat some of the speed advantages that the streaming approach promises. Furthermore, an in-memory table can be accessed by only a single thread, so opportunities for exploiting concurrency are more limited when using tables. However, for those inexperienced with the subtleties of concurrency, this is probably a good thing.

Other Impressions

There is a tricky aspect to streaming or event-based applications that the uninitiated have to learn to deal with. Consider aggregation operations, or operations that otherwise collect a set of related records. In my application this occurred when an executable order arrived. Here the executable order acted as a stimulus for producing a stream of contra-side orders to execute against. In a typical hand-coded application you would probably use a vector or some other container to deal with the collection of contra orders as a whole. However, in a streaming application you can’t do this because the basic unit of processing is a single tuple. You can, of course, store tuples in a table, but when you process the table you end up with a stream of separate tuples again. Hence, you have to put a bit of thought into how related collections of tuples flowing through your app are demarcated. In the end, there are some simple solutions to these sorts of issues (you will use time, tuple counts or ids in some fashion), but it does force you to think about simple tasks in new ways. It strikes me that Streambase might provide some additional machinery along these lines, such as the notion of a transaction (although I do not yet have a completely formed idea of just how this would work).
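
One id-based way of demarcating a group can be sketched in plain Java. Each tuple carries the id of the stimulus that produced it, and a flag marks the last tuple of the group, so a downstream consumer knows when the group is complete. This is an illustration under assumptions rather than Streambase code: the names are hypothetical, tuples within a group are assumed to arrive in order, and different groups are allowed to interleave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tuple: groupId ties it to its stimulus, last marks the end of the group.
final class TaggedTuple {
    final long groupId;
    final String payload;
    final boolean last;

    TaggedTuple(long groupId, String payload, boolean last) {
        this.groupId = groupId;
        this.payload = payload;
        this.last = last;
    }
}

// Buffers tuples per group and releases a group only when its end marker arrives.
final class GroupCollector {
    private final Map<Long, List<String>> pending = new HashMap<>();

    // Returns the complete group when the end marker arrives, otherwise null.
    List<String> accept(TaggedTuple t) {
        List<String> group = pending.computeIfAbsent(t.groupId, id -> new ArrayList<>());
        group.add(t.payload);
        if (!t.last) {
            return null;
        }
        pending.remove(t.groupId);
        return group;
    }
}
```

A tuple count carried on the first tuple, or a timeout, could play the same role as the end flag. The trade-off is between an extra field on every tuple and extra state in the consumer.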

Extending Streambase

I mentioned earlier that I needed custom Java code to create an Executor operator. It turns out that this was fairly easy to do, although it does involve a bit of planning and an understanding of the Streambase Studio model that goes above and beyond the functionality of the operator itself. Luckily for Java developers, there is a nice Eclipse wizard that makes developing Java operators almost pleasant. There is no such facility for C++ extensions, so I would not recommend using C++ here.

The nice thing about the Java operator model is how the operator integrates seamlessly into Streambase if you code it properly. For example, the operator has a typecheck method that is called every time you change its state from Streambase Studio. This allows you to do validation of its custom parameters and report errors back to the user.

Conclusions

I still have a long way to go before I can make definitive statements about where I would recommend using Streambase. For one thing, I have not yet done much by way of performance measurement on the ECN app or in general. For now I can conclude that the streaming or event-based paradigm offers some unique advantages but, as with any new technology, there is still plenty of room for improvement.

2 Responses to “My First Streambase Application”

  1. Eddie Galvez Says:

    Hello Sal,

    My name is Eddie Galvez, co-founder engineer at StreamBase Systems, and lead for UI development. I found your blog and am delighted to be able to read another fresh and spontaneous public trial of our software, especially in light of the very real application you set as your goal.

    I read your prose with an open mind, and your input is invaluable to myself personally, as well as more formally for product management. We take our design methodologies very seriously and an unassisted user experience like yours speaks volumes. I take pride in our software’s power, ease of use and usability and strive to maintain that edge over all other products like ours. Many of the issues you’ve seen, such as determining the end of a set of tuples, are inherent to complex event processing, and are areas of interest to us as we look to improve our product.

    Now, I have my selfish motives for posting: I would love it if you could continue thinking out loud on some of the things you encountered while developing and testing your application, namely:

    – What would you like to see by way of improving the testing of StreamBase applications? Can you describe a workflow of unit testing that you expected to find?

    – Following up on your original post regarding the use of the text editor vs the graphical language; which one did you write this application in, after all? Did you try both, or even a mix using modules?

    Finally, I have shared your blog with other architects here at StreamBase for both your positives and negatives, and language improvements on the roadmap are likely to make things better for you and everyone else, and I can only recommend you stay tuned to our future.

    Best successes with the new world of streaming applications!

    / Eddie

  2. smangano Says:

    Thanks for your comments Eddie. Here is my follow-up to your questions.

    - What would you like to see by way of improving the testing of StreamBase applications? Can you describe a workflow of unit testing that you expected to find?

    I think you guys did a good job in building an interactive test environment. I especially like the feed simulator and the record and playback capabilities. However, it would be nice if one could also author automated unit tests. Basically one would define a test as a set of input tuples to one or more input streams or ports and a set of assertions about expected values at other output streams or ports. There should be a way to “hit a button” to run these tests at any time.

    - Following up on your original post regarding the use of the text editor vs the graphical language; which one did you write this application in, after all? Did you try both, or even a mix using modules?

    For now I stuck to the graphical approach exclusively. However, I just used the conversion tool to convert my sbapp to ssql so I could begin to gain some insight into the text language. In the process I noticed an error: “-- WARN: Infinite windows are not supported in StreamSQL”. I also remember from the class that there are other cases where the graphical language is more powerful than the textual one. This strikes me as odd since they are basically just different syntaxes (XML vs. Enhanced SQL). It would seem that they should be equivalent in power in all respects since the backend must support the ultimate feature set.