Tuesday, November 11, 2008

Of OLAP and the importance of open standards

In these times of economic crisis, many companies will turn to business intelligence (BI) as a source of wisdom and counsel. Millions of dollars will be invested in an effort to understand the extent of their respective problems and find solutions based on accurate and decision-oriented datasets.

Since I have a fairly good amount of experience with work in heterogeneous environments and tackling data integration challenges, I thought I'd pitch in my two cents.

Why developers and project managers will have a hard time

The root of the problem is this. The Microsoft OLAP toolkit does not integrate well with anything other than .NET technologies. SAS offers a Java API, yet it is not ready for production. (I worked with it for two years, and believe me, they are still a fairly long way from production-quality code.) As a matter of fact, most software vendors in the OLAP world distribute some API to integrate their technologies, but you often end up with black boxes of questionable quality, flexibility or performance. Some even go as far as to obfuscate their libraries... this really doesn't help in the end.

Some vendors like Oracle went for the all-in-the-box solution. They offer a "complete" solution that can fit every possible need. Then again, what they are telling you is: "if we don't have it, you probably don't need it." "Probably"? You've got to be kidding. Since when do software vendors know what you need and what your future will be? Better to replace "probably" with "hopefully".

In the best case, in order to meet your needs, you'll hack your way through at the expense of your project specifications. The final result can be nothing but disappointing. Your celebration will be bitter and probably short-lived, I fear.

About the importance of collective work

You have a brand new application. Hooray! This is where the production phase kicks in.

What if you need to move your datamart to another OLAP server? What if there are not enough connection licenses to allow both production connections and all of your maintenance personnel on the OLAP server, and they are forced to take turns to debug? What if the CEO decides to migrate to a new platform? What if [insert random but oh so frequent unforeseen event here]? Your thousand-dollar code is now rendered useless; you can start crying now, you deserve it. In your quest to make more money, you've created a monster that was expensive and will continue to pump money out of your institution's pockets.

If you were good enough in systems design, you thought about a data layer. Even so, the data layer still has to be rewritten entirely, and it often represents at least a third of the overall effort. Close, but no cigar. This might sound like a catastrophic scenario, but it is oh so frequent.

Many people got tired of all this nonsense, so we decided to work together. We decided that enough time and money had been wasted on individual efforts that were ruined in the end. It was time to agree on standards and share the product of our collective effort.

Take Hibernate for example. It is now a de facto standard when it comes to data mappers. For the Java version alone, it represents 859 thousand lines of code worth 12.8 million dollars in work hours. Think you can top that with your in-house data layer in times of economic crisis?

About Java OLAP

OLAP is a world in itself. You can't take relational paradigms and apply them to the multidimensional world. The .NET toolbox does have very nice libraries to do some neat OLAP stuff, but then again, you're locked in to SSAS. This is a no-no.

On the Java side, things are even worse. There is currently a big void in the Java OLAP market. No OLAP standard emerged at all. Thanks to the selfishness of the big players of the industry, the JOLAP initiative was a total failure. It never reached the final version, so the JSR-69 specification died quietly.

We at Olap4j tried to fill that gap with an open initiative. Everyone can pitch in. And I mean EVERYONE.

What makes Olap4j so kewl

You know the expression vendor lock-in? I hope you do, I *really* do, or else you'll learn it the hard way. Olap4j aims at solving exactly this problem. You can develop applications on its API and switch the underlying OLAP engine without rewriting a single line of code. Not bad, eh? Olap4j is more than a database driver. It is an open API built right on top of the JDBC industry standard, where everyone collaborates to specify a common base onto which to build.

It even includes transformation libraries and testing facilities.

I want to kick the tires and use it right now

So far, it has two implementations ready to use. The Mondrian driver allows you to run the much-acclaimed Mondrian open source OLAP engine as an in-process data provider.

There is also the generic XML/A driver that can connect to pretty much anything that talks XML/A, whether it's over HTTP or anything else you fancy using. This particular driver allows you to build applications that can switch to and from any of these OLAP engines:

  • Hyperion Essbase

  • Microsoft SQL Server Analysis Services

  • Infor

  • Mondrian

  • Palo
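To give a feel for what "switching engines" means in practice, here is a minimal sketch, not the official way of doing it. It assumes the engine choice lives in configuration and that the only thing that changes between engines is the JDBC-style connection string (the `jdbc:mondrian:` and `jdbc:xmla:` prefixes used by the two drivers). The class name `OlapEngineSwitch`, the `olap.*` property keys and the server address are hypothetical names made up for this illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper: everything engine-specific is confined to one URL string.
final class OlapEngineSwitch {

    // Reads a setting and fails loudly if it is missing.
    static String required(Properties settings, String key) {
        String value = settings.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing setting: " + key);
        }
        return value;
    }

    // Builds the connection URL for the configured engine.
    // The "olap.*" keys are assumptions of this sketch, not olap4j settings.
    static String connectionUrl(Properties settings) {
        String engine = required(settings, "olap.engine");
        switch (engine) {
            case "mondrian":
                // In-process Mondrian: needs the relational source and a schema file.
                return "jdbc:mondrian:Jdbc=" + required(settings, "olap.jdbc")
                        + ";Catalog=" + required(settings, "olap.catalog");
            case "xmla":
                // Remote engine reached through its XML/A endpoint.
                return "jdbc:xmla:Server=" + required(settings, "olap.server");
            default:
                throw new IllegalArgumentException("Unknown OLAP engine: " + engine);
        }
    }

    // Opens a plain JDBC connection; with the olap4j jars on the classpath the
    // result can then be unwrapped to an OlapConnection to run MDX queries.
    static Connection open(Properties settings) throws SQLException {
        return DriverManager.getConnection(connectionUrl(settings));
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("olap.engine", "xmla");
        settings.setProperty("olap.server", "http://localhost:8080/mondrian/xmla");
        System.out.println("Connecting with " + connectionUrl(settings));
        try (Connection connection = open(settings)) {
            System.out.println("Connected: " + !connection.isClosed());
        } catch (SQLException e) {
            // Expected when no olap4j driver is installed or the server is down.
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```

The rest of the application only ever sees the connection it gets back, so moving from Mondrian to an XML/A server becomes a configuration change rather than a rewrite of the data layer.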

The Olap4j project is gaining momentum and we truly hope to see it become the standard in the Java world.