Architecture

Must-haves for future grid-computing

October 2nd, 2008

I’ve been test-flying a few datagrid/datafabric products lately and had a nice opportunity to try them out on gig-ethernet and infiniband. The thing I noticed with all distributed-data systems is that synchronous replication is always a bottleneck, because you’re forced to wait on the replication ack before proceeding with the next operation. Because of this, gigabit-ethernet’s latency puts a ceiling of roughly 5k operations per second on any throughput you can achieve.

For any grid vendor, then, it’s crucial to roll out support for modern, ultra-low-latency interconnects, whose latency characteristics blow away any Gig-E numbers. For real-time price publishers and for algo-trading, latency is the key issue, more than throughput.
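
The ceiling follows from simple arithmetic: if each operation must wait for a replication ack, a single writer can complete at most one operation per network round trip. The sketch below works that out. The round-trip figures are illustrative assumptions chosen to match the ~5k number, not measurements, and the function name is hypothetical.

```python
def max_sync_ops_per_second(round_trip_seconds: float) -> float:
    """Upper bound on strictly serialized operations per second when
    every operation blocks until its replication ack comes back."""
    if round_trip_seconds <= 0:
        raise ValueError("round trip must be positive")
    return 1.0 / round_trip_seconds


# Assumed, illustrative round-trip times (not measured values).
ASSUMED_ROUND_TRIPS = {
    "gigabit-ethernet": 200e-6,          # 200 microseconds
    "low-latency interconnect": 20e-6,   # 20 microseconds
}


def ceilings() -> dict:
    """Per-writer throughput ceiling for each assumed interconnect."""
    return {name: max_sync_ops_per_second(rtt)
            for name, rtt in ASSUMED_ROUND_TRIPS.items()}


if __name__ == "__main__":
    for name, ops in ceilings().items():
        print(f"{name}: ~{ops:,.0f} ops/sec per synchronous writer")
```

Under these assumptions, cutting the round trip by a factor of ten raises the per-writer ceiling by the same factor, which is why latency matters more than raw bandwidth here.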

20-million messages per second?

September 24th, 2008

At a client-site I was privy to an impressive IBM presentation on their WebSphere MQ Low Latency Messaging platform (LLM), with some pretty exorbitant claims: 8-million messages per second on gig-ethernet, and 21-million per second on infiniband(!).

I’ll be interested to see if it stands up to scrutiny.

PureMVC

August 13th, 2008

Cross-language implementation of the MVC meta-pattern - PureMVC. Supported languages:

  • ActionScript 2
  • ActionScript 3
  • C#
  • ColdFusion
  • haXe
  • Java
  • Perl
  • PHP
  • Python
  • Ruby

Slimmer, trimmer messaging from Google

July 15th, 2008 / Joe on Computing

Google’s Protocol Buffers offer lightweight, language-independent object serialization. I love the design, especially as I’m increasingly seeing enterprise networks clogged with hordes of oversized XML messages. Protocol buffer bindings are available for C++, Java, and Python, but not C# yet. Once there is support for .NET, I think this could be a really interesting technology for financial applications.
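
To give a rough feel for why a compact binary encoding matters, here is a minimal Python sketch that serializes the same quote as XML and as a packed binary record. It is not the Protocol Buffers wire format and uses no Protocol Buffers API; the record layout and function names are assumptions made up for illustration, using only the standard library.

```python
import struct
import xml.etree.ElementTree as ET


def quote_as_xml(symbol: str, price: float, size: int) -> bytes:
    """Serialize a quote as a small XML document."""
    root = ET.Element("quote")
    ET.SubElement(root, "symbol").text = symbol
    ET.SubElement(root, "price").text = repr(price)
    ET.SubElement(root, "size").text = str(size)
    return ET.tostring(root, encoding="utf-8")


def quote_as_binary(symbol: str, price: float, size: int) -> bytes:
    """Serialize a quote as: 1-byte symbol length, symbol bytes,
    8-byte little-endian double, 4-byte little-endian unsigned int."""
    raw = symbol.encode("utf-8")
    return struct.pack(f"<B{len(raw)}sdI", len(raw), raw, price, size)


def quote_from_binary(payload: bytes):
    """Inverse of quote_as_binary."""
    n = payload[0]
    raw, price, size = struct.unpack_from(f"<{n}sdI", payload, 1)
    return raw.decode("utf-8"), price, size


if __name__ == "__main__":
    xml_bytes = quote_as_xml("IBM", 101.25, 500)
    bin_bytes = quote_as_binary("IBM", 101.25, 500)
    print(len(xml_bytes), "bytes as XML vs", len(bin_bytes), "bytes packed")
```

A hand-rolled layout like this is brittle, since every reader must know the exact field order. A schema-driven format aims to keep the size advantage while making the layout explicit and shareable across languages.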

Practical PLT Part 4: A Garbage Collector

June 24th, 2008

In the previous articles of this series we’ve seen how to create a simple Scheme interpreter, then how to write an S-Expression parser to feed the interpreter, and in the last article we saw how to most conveniently bind C++ functions to our interpreter with the strategic application of template classes and a “meta-circular” code-generator that built up a vital part of our interpreter (using our interpreter to do it).

Though we’ve come a long way, there’s still one glaring hole in our interpreter design: we’ve left the allocate<T> function unspecified.  This function is meant to allocate a garbage collected value of type T, which implies that we’ve got to write a garbage collector.  In this article, that’s exactly what we’ll do.

Features that Suck!

May 21st, 2008 / London Coder

Features come in many types: only one type really matters. The rest suck!

The one that matters is the User Requested Feature. Sadly, it’s apparent that this type of feature never crosses the mind of many of the folks that build applications and web sites. And even more sadly, these features tend to be complex, so they get dropped first when projects enter Phase 3.

The first type of feature that does not matter is the Developer Feature. These can range from outright bugs to messy APIs that can only be used if you know what’s going on behind the scenes. Diabolical UI Crimes also belong in this category. These confusion-inducing features come from a lack of User Requested Features.

The second type of feature that does not matter is the generic spec feature. These usually come about due to BAs guessing what users would like from their application. “Well we have a list of things… so… we’ll definitely need to sort by every column, …bound to be important”

Outlook is a perfect example: you can sort and group (slowly) all your email, but what you really want to do is search, which you can’t do. Contrast that with GMail. Give people what they want, not what you think they want. Lookout did just that and Microsoft bought them to hide their shame. Again, this type of feature comes from a lack of User Requested Features.

The third type of feature that is not important is the old technical expert feature. These come in the form of ropey architectural decisions like “we’ll use technology blah” from technologists that are now above programming so just deal out great wisdom… yawn. If you can’t code it, don’t suggest it.

The fourth (but certainly not final) type of feature that doesn’t matter is the infinite configurability feature. Whenever a decision point comes, you go both ways and then let the user configure which behavior they ‘want’. Let me tell you a secret: users don’t care, and being asked just angers them. Take as many decisions as possible, use intelligent defaults and don’t make users think!

User Requested Features are almost the only type of feature your software should have. The problem is that they can be complex, tricky to implement and usually require some creativity to solve. But they’re so neglected there is always some low hanging fruit.

So… if you’re a developer, try asking your users for a small feature they would like and …just add it. If you work for BigCo you’ll start making powerful friends and if you work on the Internet you’ll drive more traffic!

And who knows, you might just enjoy it…

(Short && Simple) == Sweet

May 20th, 2008 / London Coder

There’s a quote attributed to Blaise Pascal that goes:

“The present letter is a very long one, simply because I had no leisure to make it shorter.”

It’s an observation that brevity is more difficult to produce than verbosity.

However, modern programming ideologies encourage you to write your solutions in a verbose framework or with an X-first methodology (pick an X) or with restrictive rules to help you “be a better programmer”.

There are plenty of (typically aggressive) ideology pundits that will rattle off the usual straw-man arguments about using their strict set of rules: the power of sameness, easier maintenance, easily understood code… etc. etc. You can usually spot these people because conversations with them feel like you’re playing an old skool text-based adventure game …and you’re probably stuck in a loop.

The truth is that only Deliberate Practice will make you a better programmer. Only loose coupling and simple architecture will make a system maintainable. And the ONLY way to make good software is to build it for the people that will use it, with their feedback.

Having 7 classes where you could have had 2 is gold-plating. Building everything to an interface is gold-plating. Having more than 1 factory is gold-plating.

So the next time you’re tempted to build a system of abstractions, think of the words of Seneca:

“Love of bustle is not industry”

Aside: In Pascal’s day letters came in iterations because there were no word processors, perhaps a good thing we’ve lost…

Podcast: Lab49, ScaleOut, and Microsoft Talk About Distributed Cache


About three weeks ago, I had the opportunity to sit down with Bill Bain of ScaleOut Software and the two Joes, Joe Cleaver and Joe Rubino, from Microsoft’s Financial Services Industry Evangelism team after I gave my presentation on distributed caches at Microsoft’s 6th Annual Financial Services Developer Conference. The two Joes recorded a podcast of our conversation.

Bill, Joe, and Joe, thanks for the opportunity to talk with you guys.

Dataflow via Data Binding, Part 1: Introduction


Dataflow is about creating a software architecture that models a problem on the functional relationship between variables rather than on the sequence of steps required to update those variables. It’s about shifting control of evaluation away from code you write toward code written by someone else. It’s about changing the timing of recalculation from recalculate now to recalculate when something has changed. Sure, it’s a distinction that may have more to do with emphasis and point of view than with paradigm, but it can be a liberating distinction for certain problems in financial modeling.

If you work in finance, chances are you may already be expert in today’s preeminent dataflow modeling language: Microsoft Excel. Excel is the undisputed workhorse of financial applications, taught in every business school, run on every desk, wired into the infrastructure of nearly every bank, fund, or exchange in existence. The reason for Excel’s singularity in the black hole of finance is its ability to emancipate modeling from code (and thus developers) and empower analysts and business types alike to create models as interactive documents. Make no mistake — writing workbooks is still very much software development. But Excel’s emphasis on data rather than code, relationships rather than instructions, is something that fits with the work this industry does and the people that do it.

Briefly, when you model in Excel, you specify a cell’s output by filling it with either a constant value or a function. Functions are written in a lightweight language that allows function arguments to be either constant values or references to another cell’s output. In the typical workbook, cells may reference cells that in turn reference other cells, and so on, resulting in an arbitrarily sophisticated model that can span multiple worksheets and workbooks. The point though is that, rather than specifying your model as a sequence of steps that get executed when you say go, here you describe your model’s core data relationships to Excel, and Excel figures out how and when it should be executed.
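
As a rough illustration of that evaluation model, here is a minimal Python sketch of spreadsheet-style cells. It is not how Excel is implemented. The `Cell` class and its methods are hypothetical names, and the lazy recalculate-on-read strategy is just one simple way to get "recalculate when something has changed" behavior.

```python
class Cell:
    """A spreadsheet-like cell holding either a constant or a formula
    whose arguments are the outputs of other cells."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._formula = formula
        self._inputs = list(inputs)
        self._dependents = []
        self._value = value
        self._dirty = formula is not None   # formula cells start uncomputed
        self.evaluations = 0                # counts formula runs, for illustration
        for cell in self._inputs:
            cell._dependents.append(self)

    def set(self, value):
        """Change a constant cell and mark everything downstream as stale."""
        if self._formula is not None:
            raise TypeError("cannot assign a constant to a formula cell")
        if value != self._value:
            self._value = value
            self._invalidate_dependents()

    def _invalidate_dependents(self):
        for dep in self._dependents:
            if not dep._dirty:
                dep._dirty = True
                dep._invalidate_dependents()

    @property
    def value(self):
        """Recompute only if an upstream cell changed since the last read."""
        if self._dirty:
            self._value = self._formula(*(c.value for c in self._inputs))
            self._dirty = False
            self.evaluations += 1
        return self._value


if __name__ == "__main__":
    price = Cell(100.0)
    quantity = Cell(3)
    notional = Cell(formula=lambda p, q: p * q, inputs=[price, quantity])
    fee = Cell(formula=lambda n: n / 100, inputs=[notional])
    print(fee.value)      # computed on first read
    price.set(200.0)      # marks notional and fee as stale
    print(fee.value)      # recomputed on demand
```

The calling code never says when to recalculate `fee`. It only declares what `fee` depends on, and the cells work out the rest.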

Example: An Equities Market Simulation

Let’s say that we are writing a simulation for an equities (stock) market. Such a simulation could be used for testing a trading strategy or studying economic scenarios. The market is comprised of many equities, and each equity has many properties, some that change slowly over time (such as ticker symbol or inception date), and some that change frequently (such as last price or volume). Some properties may be functions of other properties of the same equity (such as high, low, or closing price), while others may be functions of properties on other equities (such as with haircuts, derivatives, or baskets).

As a starting point, we introduce a simulation clock. Each time the clock advances, the price of all equities gets updated. To update prices, we use a random walk driven by initial conditions (such as initial price S0, drift r, and volatility σ), a normally distributed random variable z, and a recurrence equation over n intervals of t years: 

S_{n} = S_{n-1} \cdot \exp(r t - 0.5 \sigma^2 t + \mathbf{z} \sigma \sqrt{t} )

Note: This equation provides a lognormal random walk [1,2], which means that instead of getting the next price by adding small random price changes to the previous price, we’re multiplying small random percentages against the previous price. This makes sense for things like prices since a) they can’t be negative, and b) the size of any price changes is proportional to the magnitude of the current price. In other words, penny stocks tend to move up and down by fractions of a penny while stocks trading at much higher prices tend to move up and down in dollars.
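
For readers who prefer code to notation, the recurrence above can be sketched in a few lines of Python. The function name, parameter names, and the optional `seed` argument are assumptions for illustration; only the formula itself comes from the text.

```python
import math
import random


def simulate_prices(s0, r, sigma, t, n, seed=None):
    """Return [S_0, S_1, ..., S_n] for a lognormal random walk.

    s0    -- initial price S_0
    r     -- drift
    sigma -- volatility
    t     -- length of one interval, in years
    n     -- number of intervals to simulate
    seed  -- optional seed so a run can be repeated
    """
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # standard normal draw
        growth = math.exp(r * t - 0.5 * sigma ** 2 * t + z * sigma * math.sqrt(t))
        prices.append(prices[-1] * growth)  # multiply, never add
    return prices


if __name__ == "__main__":
    # One simulated trading year in daily steps, with assumed parameters.
    path = simulate_prices(s0=100.0, r=0.05, sigma=0.2, t=1 / 252, n=252, seed=1)
    print(f"start {path[0]:.2f}, end {path[-1]:.2f}")
```

Because each step multiplies the previous price by a positive factor, the simulated price can never go negative, which matches point a) in the note.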

In Excel, you could model this market by plopping the value of the clock into a cell, setting up other cells to contain initial conditions, and then have a slew of other cells initialized with functions that reference the clock and initial conditions cells and that calculate a new price using the above equation for each virtual equity. And then hit F9.

But how would you write this in code? Would you just update the clock and then exhaustively recalculate all of the prices? If you had to incorporate equity derivatives or baskets, would your architecture break? How would you allow non-programming end-users to declaratively design their own simulation markets and the instruments within?

Recently, one of our financial services clients at Lab49 has been trying to solve a similar problem in .NET, and I had been suggesting to them that the problem is analogous to how Microsoft Windows Presentation Foundation (WPF) handles the flow of data from controller to model to view. Dependency properties, which form the basis of data binding in WPF applications, implement a dataflow model similar to Excel, and what I had in mind at first was a solution inspired by WPF. But the more I discussed this analogy with the client, the more I realized that we didn’t just have to use WPF as inspiration; we could actually use WPF.

In this series, I’ll dive further into creating the equities market simulation and look at how to use WPF data binding to create a dataflow implementation. Note that there are several considerations to this approach, and, under the category of just because you can doesn’t mean you should, we’ll evaluate whether or not this method has legs.

[to be continued]

Article published in GRIDtoday


The Marc Jacobs Utilization Meter has been pegged for at least two weeks now on a combination of client work, internal projects, recruiting, and writing (hence the appearance of my blog having fallen down a well.) It’s great to be busy, but I hate seeing the blog go stale.

In any event, I had an article published in GRIDtoday this morning entitled, “Grid in Financial Services: Past, Present, and Future”. Derrick Harris, the editor of GRIDtoday, reached out for an article after reading my multi-part series on “High Performance Computing: A Customer’s Perspective”. A big thanks to Derrick for giving me this opportunity.

Complex Event Processing: When Design Patterns Become Concrete


Over the past few months at Lab49, we’ve thrown ourselves into complex event processing (CEP) — aka event stream processing (ESP) — and have been formulating exactly how and when it fits into the larger, more comprehensive technology stack found in global financial services institutions. We’ve formed a number of interesting vendor partnerships, attended product training, sampled, compared, and teased apart many of the popular products, and we’ve created several CEP-based demo applications that have been shown at recent events like SIFMA.

Along the way, we’ve all learned a lot about CEP, and the more I learn, the more I dig it. The more I put CEP into practice, the more I foresee its ultimate dominance as an architectural design pattern for everyday development.

What’s fascinating to me about CEP isn’t that it’s a new idea, despite how it may be touted by vendors. Regardless of the hype, CEP isn’t the most revolutionary technology you’ve never heard of. What’s fascinating is that out from a decades-old, primordial soup of ideas, research, and trial-and-error that, in and beyond academia, has been trying to create architectural models around complex data problems with real-time constraints, enough best practices and design patterns have emerged to evolve an ecosystem of market entrants, seemingly all at once.

It’s not the first time that a bundle of quality design patterns took concrete form as a technology. Object pooling, lifetime management, transaction enlistment, and crash domains begat COM+, Microsoft Component Services, and J2EE application servers. Logging levels, external configuration, and adaptable logging sinks begat log4j, syslogd, and the Logging Application Block from the Microsoft Enterprise Library. Unit-testing and test-driven development begat JUnit and its children.

These transformations have been crucial. Once developers accepted these patterns and solutions as sufficiently solved and commoditized, they were saved considerable time and attention. Freed of coding logging libraries and unit testing frameworks for the umpteenth time, developers could focus more on the business problem being solved rather than the infrastructure details required to solve it.

But these transformations didn’t really upset the gross architecture of applications. They may have changed some of the design decisions and simplified the implementations, but they didn’t fundamentally change the abstraction you would use to model a problem and architect a solution.

CEP, on the other hand, does.

Instead of storing and indexing miles of cumulative data in a persistent store to service complex queries in batch/polling fashion, CEP inverts the whole shebang, storing and indexing the complex queries before streaming data across queries without storing a lick. The transformation of a business problem from tables, rows, and polling intervals into events, filters, triggers, and real-time reactions is not only quite enabling, it changes the very way you think about how business problems can be solved and which problems may have viable solutions.
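
The inversion is easier to see in miniature. The Python sketch below stores the queries and lets events flow past them, keeping none of the events. It is a toy under stated assumptions: `EventProcessor`, `StandingQuery`, and the event fields are hypothetical names, and it does not model time windows, joins, or query indexing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class StandingQuery:
    """A stored query: a filter plus the reaction to run on a match."""
    name: str
    predicate: Callable[[Event], bool]
    on_match: Callable[[Event], None]


class EventProcessor:
    """Holds the queries, not the data: each event is tested once, then dropped."""

    def __init__(self) -> None:
        self._queries: List[StandingQuery] = []

    def register(self, name, predicate, on_match) -> None:
        self._queries.append(StandingQuery(name, predicate, on_match))

    def publish(self, event: Event) -> List[str]:
        """Stream one event across all stored queries; return the names that fired."""
        fired = []
        for query in self._queries:
            if query.predicate(event):
                query.on_match(event)
                fired.append(query.name)
        return fired


if __name__ == "__main__":
    cep = EventProcessor()
    cep.register(
        "large-trade",
        lambda e: e["type"] == "trade" and e["size"] >= 10_000,
        lambda e: print("ALERT large trade:", e),
    )
    cep.publish({"type": "trade", "symbol": "IBM", "size": 25_000})
    cep.publish({"type": "trade", "symbol": "MSFT", "size": 100})
```

Contrast this with the batch approach, where both trades would first be written to a table and the large one found only when the next polling query ran.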

Over the next few weeks, I’ll delve a bit more into CEP and how it relates to technologies you might be more familiar with. In the meantime, check out some of the in-depth blog entries other folks from Lab49 have been writing about CEP.