Streambase: First Impressions
I just completed two days of training on Streambase v 3.7, one of the leading Complex Event Processing (CEP) frameworks. What follows are my initial impressions. I plan to try to develop a non-trivial application in the next few days and will repost based on that experience.
The first thing that stands out about Streambase is its support for two approaches to application development. The first is a graphical approach where you build an application by constructing a graph of icons representing input streams, operations, tables, output streams and the like. The second approach is textual, using a SQL-like language (StreamSQL) with many extensions relevant to streaming applications.
My feeling before the class began was that the graphical approach was going to be rather lame and that I’d much prefer developing directly in the language. However, after spending the entire first day of class in the graphical environment, I have a somewhat more positive impression. To Streambase’s credit, they did not try to create a development environment from scratch but hosted it inside Eclipse’s Rich Client Platform. Developers familiar with Eclipse will therefore feel at home. I was told the next version of Streambase is going to use full-blown Eclipse. The most compelling aspect of the graphical environment is that it greatly flattens the learning curve for individuals who have little experience with CEP. The graphical environment lets you concentrate on the concepts in an intuitive fashion without having to deal with the minutiae of the syntax. It also helps that a streaming application maps well onto a directed graph. However, after mastering the product, I am fairly sure I would prefer the textual approach because an experienced developer will always be more productive in a text editor. This is especially true in the case of Streambase because its SQL-like language is very compact and expressive (several nodes and arcs in the graphical view often collapse to a single StreamSQL statement).
Developing a streaming application begins with defining Input Streams. An Input Stream is associated with a schema that defines the data that can arrive as input. Streambase only supports primitive types (int, double, string, boolean and timestamp). It does not support inputs that contain vectors or nested structures. Therefore, you’ll have to write an adapter to flatten the data if you plan to hook Streambase to some legacy feed that has more complex types. Streambase uses the term tuple to describe the individual records that flow through the stream, and therefore the schema is a description of a tuple.
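To make the flattening idea concrete, here is a minimal sketch in plain Python. It is not Streambase adapter code and uses no Streambase API. The schema, the field names and the nested message are all made up for illustration, and it only handles nested structures, not vectors.

```python
from datetime import datetime

# Hypothetical flat schema: field name -> primitive type, mirroring the
# primitive types listed above (int, double, string, boolean, timestamp).
QUOTE_SCHEMA = {
    "symbol": str,
    "bid_price": float,
    "bid_size": int,
    "ask_price": float,
    "ask_size": int,
    "is_open": bool,
    "quote_time": datetime,
}


def flatten(record, prefix=""):
    """Collapse nested dicts into a single level, joining keys with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_tuple(record, schema):
    """Flatten a record and check that it matches the schema exactly."""
    flat = flatten(record)
    if set(flat) != set(schema):
        raise ValueError(f"fields do not match schema: {sorted(flat)}")
    for name, expected in schema.items():
        if not isinstance(flat[name], expected):
            raise TypeError(f"{name} should be {expected.__name__}")
    # Emit the values in schema order, as one flat record.
    return tuple(flat[name] for name in schema)


# A made-up nested message from an imaginary legacy feed.
legacy_message = {
    "symbol": "IBM",
    "bid": {"price": 104.25, "size": 300},
    "ask": {"price": 104.31, "size": 500},
    "is_open": True,
    "quote_time": datetime(2007, 5, 24, 9, 30),
}

quote = to_tuple(legacy_message, QUOTE_SCHEMA)
```

The point is simply that the nested bid and ask structures become the flat fields bid_price, bid_size, ask_price and ask_size before the record ever reaches the stream.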
To input streams you will typically attach filters, map operators, and aggregation operators. A filter is a set of predicates that divert the stream to different paths based on the truth value of each predicate. A map is a way of transforming tuples to create new tuples with a possibly different schema. Aggregation involves performing operations like sums, counts, averages and the like on windows of tuples. Windows are a very rich concept in Streambase. They can be defined based on a fixed number of tuples, a time period or in more complex ways that are a function of the data flowing through the stream.
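For readers new to CEP, the three operators can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not StreamSQL and not how Streambase implements them. The function names and the trade data are hypothetical, and I am assuming a filter sends each tuple down the path of the first predicate it satisfies.

```python
from collections import deque


def route(tuples, predicates):
    """Filter: send each tuple down the path of the first predicate it satisfies."""
    outputs = {name: [] for name in predicates}
    for t in tuples:
        for name, predicate in predicates.items():
            if predicate(t):
                outputs[name].append(t)
                break
    return outputs


def add_notional(t):
    """Map: build a new tuple with a different schema (symbol, notional)."""
    return {"symbol": t["symbol"], "notional": t["price"] * t["size"]}


def windowed_average(values, size):
    """Aggregate: average over a sliding window of the last `size` values."""
    window = deque(maxlen=size)
    for v in values:
        window.append(v)
        if len(window) == size:  # emit only once the window is full
            yield sum(window) / size


# Made-up input tuples.
trades = [
    {"symbol": "IBM", "price": 100.0, "size": 200},
    {"symbol": "IBM", "price": 102.0, "size": 1500},
    {"symbol": "IBM", "price": 104.0, "size": 100},
    {"symbol": "IBM", "price": 106.0, "size": 2000},
]

paths = route(trades, {
    "large": lambda t: t["size"] >= 1000,
    "small": lambda t: t["size"] < 1000,
})
notionals = [add_notional(t) for t in paths["large"]]
averages = list(windowed_average((t["price"] for t in trades), size=3))
```

The window here is the simplest kind, a fixed number of tuples. Time-based and data-driven windows would need a different trigger for when to emit and when to drop old tuples.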
In addition, there are a variety of other constructs for performing unions, joins, sorts, merges, etc. It would take too long to describe them all here, but I’ll probably touch on some of these when I develop a Streambase application. My immediate plan is to construct a mini ECN. In the meantime, you can download your own evaluation version of Streambase and peruse their documentation.
Streambase is a highly extensible framework and you can create custom operations and adapters in Java and use them from within Streambase studio as if they were native components.
Another very impressive feature of Streambase is its Feed Simulation tool. This tool provides a rich language for describing input data in a statistical fashion so that tuples can be randomly generated to help test your app. Data can also be manually entered, fed from files or databases, or obtained directly from middleware or commercial feeds (like Reuters) via adapters available from Streambase.
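The idea of describing a feed statistically and generating tuples from that description can be sketched as follows. This is plain Python and has nothing to do with the actual Feed Simulation language, which I have not shown here. The spec format, the field names and the distributions are all assumptions made for illustration.

```python
import random


def make_feed(spec, count, seed=None):
    """Yield `count` random tuples, one value per field described in `spec`."""
    rng = random.Random(seed)  # a fixed seed makes a test run repeatable
    for _ in range(count):
        yield {name: generate(rng) for name, generate in spec.items()}


# Hypothetical description of a trade feed: each field maps to a function
# that draws one value from the given random source.
TRADE_SPEC = {
    "symbol": lambda rng: rng.choice(["IBM", "MSFT", "ORCL"]),
    "price": lambda rng: round(rng.gauss(100.0, 2.5), 2),  # mean 100, std dev 2.5
    "size": lambda rng: 100 * rng.randint(1, 50),  # round lots, 100 to 5000
}

sample = list(make_feed(TRADE_SPEC, count=5, seed=42))
```

Seeding the generator is a design choice worth copying in any test feed: it lets you replay exactly the same random tuples when you are chasing a bug.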
To sum up, my initial impressions are very positive. However, I do have some misgivings as to how the tool will scale to a complex application and development in the large. For example, there is presently no integration with version control, although that is promised in the next release. I’ll have much more definitive opinions after I try out something complex.



May 24th, 2007 at 5:17 pm
Hi Sal,
Glad you enjoyed the class, and thanks for the write-up.
I work for StreamBase as a Services Consultant; I’ve been doing little else besides developing StreamBase applications, clients, and plugins for the last 2.5 years. Yeah, yeah, I guess I’ve drunk the Kool-Aid, but on the other hand there aren’t many people in the world with more StreamBase application development experience than I have.
A few comments on your notes:
Yes, it’s true that the current StreamBase SBStudio IDE doesn’t have source control repository integration. But it does let you define a project whose local storage is external to the StreamBase Workspace. Which means you can use any external source control tool to manage SBStudio project files. On Windows, I personally find SBStudio and TortoiseSVN to be a nice combination. That’s not to say it won’t be a great day when SBStudio source control integration is released. But the lack of it shouldn’t give you much pause.
Second, I totally get what you’re saying about StreamSQL (the SQL-like text language) and Event Flow (the graphical language). Lots of people new to StreamBase think that they will like one or the other better. But I encourage you to give them both a chance. You may be surprised, and you may learn that “productivity” is a more nuanced concept than you now think . . . . Fortunately, with StreamBase you get to use either or both of the languages and you can use both in the same application if you want.
In terms of elevating productivity in Event Flow, check out the Save Schema and Schema Copy features of SBStudio. I tend to use the Saved Schemas View like a cut/paste clipboard for StreamBase Schema. These are often the unsung heroes of my development day.
In terms of encouraging “development in the large” in Event Flow, think on these 3 features: Modules, Module Reference Input Schema Overrides, and implicit-mode Operators. These things together make it possible to design segmentable applications that are resilient to schema evolutions, not to mention module reuse between projects. These features have analogues in StreamSQL, as well.
Have fun!
-Steve
May 24th, 2007 at 10:24 pm
Good points Steve.
The real problem I see with the graphical approach and programming in the large has to do with a complex project with multiple developers working out of version control. Since the XML files produced by the graphical approach encode metadata about the position of nodes and the like, it would be cumbersome for multiple developers to work in the graphical environment. Consider the case where you create a graphical application and check it in. I check it out and rearrange some of the icons to suit my tastes but make no other changes. Meanwhile you do the same. We will each have version control conflicts even though neither of us has made a change to the semantics of the app.
Even if you guys develop a way to separate the graphical aspects from the semantic aspects, the text approach will still be superior to a “real programmer”. Cut and paste, search and replace, regex matching, syntax highlighting, completion, and the like are all things developers master in their favorite editor. I type with just two fingers but code as fast as any developer because I have mastered my preferred editor. This is one reason why professional developers are so religious about their editor of choice.
Finally, there are some places where your present graphical environment is less helpful than it could be. I often find myself having to type in places where, in theory, the tool could offer a drop-down of choices. I’m sure the graphical environment will improve as you guys mature the tool.
Your other points about modules and the like are good, and they show that Streambase put some thought into some important aspects of programming in the large.
May 25th, 2007 at 1:28 pm
Ah, yes, there is a difference between source repository integration and “visual diffing” of Event Flow programs. Two separate features, and worth not mushing together. We’re doing work in both areas. All I can say in this forum is hang in there and see what we do. “Visual diffing” of graphical programs is not really a well-developed domain; it may take us a few iterations to get to “fantastic” but we’ll get to “very useful” fairly quickly.
I really, really do get absolutely everything you are saying about text vs. graphical programming in terms of code manipulation productivity and religiosity of editors and all that. I get it, I’ve lived it. I used Emacs for 20 years(!) before switching to Eclipse (and now SBStudio as well) as the heart of my day-to-day development toolset.
And I still say: just try Event Flow for a while. Suspend some disbelief. Try not to make up your mind after just a couple days. It’ll pay off.
It’s not apples to apples in terms of overall productivity. It’s not always “how fast can I manipulate the commas for these 97 fields?” Some days it is, “I just shortened what would have been a 3 hour meeting to 5 minutes because I could point the Business Analyst at the Event Flow graph and it was just obvious.”
So it’s not really one or the other . . . it’s both. It’s why we have both. The tension between them is still playing out.
-Steve
May 29th, 2007 at 7:49 pm
[...] This is a follow-up to my previous post about Streambase. [...]