Tuesday, November 11, 2008

Of OLAP and the importance of open standards

In these times of economic crisis, many companies will turn to business intelligence (BI) as a source of wisdom and counsel. Millions of dollars will be invested in an effort to understand the extent of their respective problems and find solutions based on accurate and decision-oriented datasets.

Since I have a fairly good amount of experience with work in heterogeneous environments and tackling data integration challenges, I thought I'd pitch in my two cents.

Why developers and project managers will have a hard time


The root of the problem is this. The Microsoft OLAP toolkit does not integrate well with anything other than .NET technologies. SAS offers a Java API, yet it is not ready for production. (I worked with it for two years, and believe me, they are still a fairly long way from production-quality code.) As a matter of fact, most software vendors in the OLAP world distribute some API to integrate their technologies, but you often end up with black boxes of questionable quality, flexibility or performance. Some even go as far as to obfuscate their libraries... this really doesn't help in the end.

Some vendors like Oracle went for the all-in-the-box solution. They offer a "complete" solution that can fit every possible need. Then again, what they are telling you is: "if we don't have it, you probably don't need it." "Probably"? You've got to be kidding. Since when do software vendors know what you need and what your future will be? Better swap "probably" for "hopefully".

In the best case, in order to meet your needs, you'll hack your way through at the expense of your project specifications. The final result can be nothing but disappointing. Your celebration will be bitter and probably short-lived, I fear.

About the importance of collective work


You have a brand new application. Hooray! This is where the production phase kicks in.

What if you need to move your datamart to another OLAP server? What if there are not enough connection licenses to allow both production connections and all of your maintenance personnel on the OLAP server, and they are forced to take turns to debug? What if the CEO decides to migrate to a new platform? What if [insert random but oh so frequent unforeseen event here]? Your thousand-dollar code is now rendered useless; you can start crying now, you deserve it. In your quest for more money-making, you've created a monster that was expensive and will continue to pump money out of your institution's pockets.

If you were good enough at systems design, you thought about a data layer. Even so, the data layer still has to be rewritten entirely, and it often represents at least a third of the overall effort. Close, but no cigar. This might sound like a catastrophic scenario, but it is oh so frequent.

Many people got tired of all this nonsense, so we decided to work together. We decided that enough time and money had been wasted on individual efforts that were ruined in the end. It was time to agree on standards and share the product of our collective effort.

Take Hibernate for example. It is now a de facto standard when it comes to data mappers. For the Java version alone, it represents 859 thousand lines of code worth 12.8 million dollars in work hours. Think you can top that with your in-house data layer in times of economic crisis?

About Java OLAP


OLAP is a world in itself. You can't take relational paradigms and apply them to the multidimensional world. The .NET toolbox does have very nice libraries to do some neat OLAP stuff, but then again, you're locked in to SSAS. This is a no-no.

On the Java side, things are even worse. There is currently a big void in the Java OLAP market. No OLAP standard has emerged at all. Thanks to the selfishness of the big players of the industry, the JOLAP initiative was a total failure. It never reached a final version, so the JSR-69 specification died quietly.

We at Olap4j tried to fill that gap with an open initiative. Everyone can pitch in. And I mean EVERYONE.

What makes Olap4j so kewl


You know the expression vendor lock-in? I hope you do, I *really* do, or else you'll learn it the hard way. Olap4j aims at solving exactly this problem. You can develop applications on its API and switch the underlying OLAP engine without rewriting a single line of code. Not bad, eh? Olap4j is more than a database driver. It is an open API built right on top of the JDBC industry standard where everyone collaborates to specify a common base onto which to build.

It even includes transformation libraries and testing facilities.

I want to kick the tires and use it right now


So far, it has two implementations ready to use. The Mondrian driver allows you to run the much-acclaimed Mondrian open source OLAP engine as an in-process data provider.

There is also the generic XML/A driver that can connect to pretty much anything that talks XML/A, whether it's over HTTP or anything else you fancy using. This particular driver allows you to build applications that can switch to and from any of these OLAP engines:

  • Hyperion Essbase

  • Microsoft SQL Server Analysis Services

  • Infor

  • Mondrian

  • Palo
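To make the engine-switching idea concrete, here is a minimal sketch, not official Olap4j sample code. It assumes the connection-string prefixes used by the two drivers (jdbc:mondrian: and jdbc:xmla:Server=). The class name EngineSwitchSketch, the helper connectionUrl and the server address are made up for illustration. Only the URL-building part runs without the driver jars on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class EngineSwitchSketch {

    /**
     * Builds a JDBC-style URL for the chosen driver (hypothetical helper).
     * "mondrian" expects a Mondrian connect string, "xmla" an XML/A endpoint.
     */
    public static String connectionUrl(String engine, String location) {
        switch (engine) {
            case "mondrian":
                // In-process Mondrian engine
                return "jdbc:mondrian:" + location;
            case "xmla":
                // Any remote server that talks XML/A
                return "jdbc:xmla:Server=" + location;
            default:
                throw new IllegalArgumentException("Unknown engine: " + engine);
        }
    }

    /**
     * Opens a plain JDBC connection. This needs the matching driver jar on
     * the classpath (older drivers may also need to be loaded explicitly),
     * so it is not called from main() below.
     */
    public static Connection open(String engine, String location) throws SQLException {
        return DriverManager.getConnection(connectionUrl(engine, location));
    }

    public static void main(String[] args) {
        // The application only sees this string change when the engine changes.
        // The address is a placeholder, not a real server.
        System.out.println(connectionUrl("xmla", "http://localhost:8080/mondrian/xmla"));
        // prints "jdbc:xmla:Server=http://localhost:8080/mondrian/xmla"
    }
}
```

With the Olap4j and driver jars on the classpath, the returned Connection can be unwrapped with connection.unwrap(OlapConnection.class), and the rest of the application stays the same whichever engine sits behind it.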


The Olap4j project is gaining momentum and we truly hope to see it become the standard in the Java world.

3 comments:

  1. I agree with the importance of open standards.

    Some comments:

    1) The XML crowd seems to fare a bit better. They have DOM, an old and ugly cross-language API that is a pain to use... But at least, it is cross-language! Why am I limited to Java when I want an open API? Because that is the way Oracle went, and the way Mondrian went... and therefore everyone should be using Java?

    I want an API that I can use in Ruby and Python. Please.

    2) You guys need to write more. A lot more. Document, write papers and so on. Obviously, there is a lot of good (Java) code out there... yet, there are so few papers on it that you are never going to get any momentum within academia.

    Look at the XQuery people for example... they wrote lots of papers so that at least academia was aware of their idea and could teach it...

    Making decision makers and engineers aware of the technology is a good first step.

    Besides openness, what are the benefits of olap4j... why did you guys design it the way you did, and so on. We need to talk about these things.

  2. Interoperability between languages is not one of the aims of Olap4j. It was created to fill the gap between JDBC and multidimensional data. Interoperability should not be a requirement at this level. As far as cross-platform functionality is concerned, the limit resides in MDX as the query language and XML/A as the grammar.

    I think it's fine like that and we should not try to cross over this very thin line. But since you mentioned Ruby and Python, it is still possible to use Olap4j in Ruby or Python, thanks to JRuby and Jython (formerly JPython).

    You are right that we should "make more noise" and get Olap4j to the academics and the engineers of the IT world. We do lack documentation, we're aware of this. Olap4j has no official sponsor and is developed by community members only. This is a sorry excuse for a lack of documentation, I agree, but then again, this is the reality in which it evolves. More time will be spent on these issues as soon as we hit 1.0.

  3. First of all, sorry in advance about my poor English...

    I strongly agree with you in terms of technology, but in terms of project management I have had a hard time on Open Source-based projects.

    It's hard to find a good BI professional in the Brazilian market (I don't know about the others). Normally, good BI consultants have business knowledge, database modeling skills and experience with a commercial tool (like Cognos or BO)... but in Open Source environments it will be vital to have programming knowledge... Well, I am solving this by using a third resource, but I don't think that will be a good solution for all projects...
