Monday, April 13, 2009

Pentaho Analysis Tool - Final push for sprint 2

I've been really busy lately working on sprint 2 of Pentaho Analysis Tool. We've almost reached all of our goals for this sprint and are hoping to wrap it all up within the next two weeks. There is still time to add a few late requirements to this sprint, so if anyone has a very special wish, now is the time to express it.

About Pentaho Analysis Tool (PAT)


PAT is an attempt to replace the good ol' JPivot application, widely used in the Java world, as a web-based browser for OLAP data. There are quite a few similar projects out there, yet none of them quite makes the cut when it comes to enterprise requirements:

  • Ad-hoc connections

  • ACL management

  • User defined connections saved for later use

  • Saved queries

  • Multiple queries editing at once

  • [insert even more enterprise software mumbo jumbo here]


We're writing a Google Web Toolkit (GWT) front-end and a Spring-based backend as the core. All data manipulation is done through the Olap4j API. There was talk of a JSON bridge later on in development, but that requirement is not part of any sprint planning for now.

The project is hosted on Google Code and all the project management is done in the Jira tracker. If you have any further questions about our project or want to chat for whatever reason, we can be reached via the mailing list or ##pentaho.pat on freenode.

Wednesday, March 25, 2009

Of easy and painless systems monitoring

I'm not a systems administrator. I only have 8 servers to babysit, yet that used to be enough to be a time-consuming problem. You might not be a systems administrator either, nor have many machines, services or websites to monitor, but the fact remains that as IT professionals we need to keep a close eye on what's going on. I'm not talking about 99.999% uptime here, but 1% downtime is enough to make a lot of customers, clients and managers angry; especially since outages have a way of happening exactly when they should not.

What are your options? How much does it cost? What can you monitor? These are all questions I'll try to shed some light on. The solution I'm proposing today is one I have used myself for years. I'm not legally bound to five-nines availability, yet that is what I achieved at a total cost of 0. Yep, z.e.r.o. zero. El zilcho.

I'm not saying this will work for everybody, nor am I pretending to be an expert on the issue at hand, but I have learned a lot about the subject over the past few years, so here it is.

Monday, March 16, 2009

Evading (D)DoS attacks with Apache HTTPD

Just a quick tech tip. Ever wondered how to prevent your HTTPD server from being knocked off the net by a DoS (Denial of Service) attack? Check out this nifty little module.

mod_evasive


It's pretty easy to set up. Compile the module as you normally would for HTTPD modules and create a configuration file. There are many options available. Here's an example of how to configure it.




<IfModule mod_evasive20.c>
    # Size of the per-child hash table used to track client requests
    DOSHashTableSize 3097
    # Blacklist a client requesting the same page more than 6 times per page interval
    DOSPageCount 6
    # Blacklist a client making more than 100 requests to the site per site interval
    DOSSiteCount 100
    # Page and site intervals, in seconds
    DOSPageInterval 2
    DOSSiteInterval 2
    # How long (in seconds) an offending client stays blacklisted
    DOSBlockingPeriod 600
    # Send a notification email whenever an address gets blacklisted
    DOSEmailNotify "my-monitoring-contact@domain.com"
    # Addresses that should never be blacklisted
    DOSWhitelist 192.168.*.*
</IfModule>
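
If you don't let apxs activate it for you at compile time, the module also has to be loaded before this configuration is read. A minimal sketch, assuming the module identifier and path used by the stock mod_evasive 2.x build (adjust both to your own layout):

# Load the compiled module; the identifier and .so path may differ on your setup
LoadModule evasive20_module modules/mod_evasive20.so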


More details on the configuration and how each parameter affects the module's behavior can be found out there on the net.


Beware, though: before installing this, make sure you won't blacklist legitimate users. For example, if you have an AJAX application that sends a burst of requests once in a while, its users might get blacklisted. Make sure you test it in a development environment so you get the thresholds right.

Thursday, February 12, 2009

Of economic opinions and commentators

There is a lot of blogging being done out there in these times of economic crisis -- yes, it is a crisis. Most of it is utter garbage, mixing opinions with carefully picked facts to serve a given purpose, yet somehow I still find it important to read it all. There is an old saying that goes something like: "Fool is the one who ignores what he considers not worthy, for wise is the one who can learn from anything."

Most of the wisest things I read were published in my favorite monthly publication, Le Monde Diplomatique. On the other hand, a hellish lot of garbage can be found pretty much anywhere. Then again, during one of my many news scavenging sessions, I was genuinely surprised to find this little post from a man I had never noticed before. I do believe this man has a proper sense of economic and political analysis. Here's an excerpt.
US policymakers have ignored the fact that consumer demand in the 21st century has been driven, not by increases in real income, but by increased consumer indebtedness.  This fact makes it pointless to try to stimulate the economy by bailing out banks so that they can lend more to consumers.  The American consumers have no more capacity to borrow.

With the decline in the values of their principal assets--their homes--with the destruction of half of their pension assets, and with joblessness facing them, Americans cannot and will not spend.

Why bail out GM and Citibank when the firms are moving as many operations offshore as they possibly can?

(...)

The US government really has only two possibilities for financing its budget deficit.  One is a second collapse in the stock market, which would drive the surviving investors with what they have left into “safe” US Treasury bonds.  The other is for the Federal Reserve to monetize the Treasury debt.

Monetizing the debt means that when no one is willing or able to purchase the Treasury’s bonds, the Federal Reserve buys them by creating bank deposits for the Treasury’s account.  

In other words, the Fed “prints money” with which to buy the Treasury’s bonds.

Once this happens, the US dollar will cease to be the reserve currency.  

In addition, China, Japan and Saudi Arabia, countries that hold enormous quantities of US Treasury debt in addition to other US dollar assets, will sell, hoping to get out before others.  

The US dollar will become worthless, the currency of a banana republic.

I'll keep on the lookout for more interesting articles on this. I believe that the current economic difficulties are of enormous importance to us all. Not only are we at risk of losing big, decisions will soon be made that will dictate the governance of our everyday life for decades to come. I may not think much of the last decades of governance we just endured, but I certainly won't fall back into cynicism and apathy.

Comments? More reading suggestions?

Tuesday, November 11, 2008

Of OLAP and the importance of open standards

In these times of economic crisis, many companies will turn to business intelligence (BI) as a source of wisdom and counsel. Millions of dollars will be invested in an effort to understand the extent of their respective problems and find solutions based on accurate and decision-oriented datasets.

Since I have a fairly good amount of experience working in heterogeneous environments and tackling data integration challenges, I thought I'd pitch in my two cents.

Why developers and project managers will have a hard time


The root of the problem is this. The Microsoft OLAP toolkit does not integrate well with anything other than .NET technologies. SAS offers a Java API, yet it is not ready for production. (I worked with it for two years, and believe me, they are still a fairly long way from production-quality code.) As a matter of fact, most software vendors in the OLAP world distribute some API to integrate their technologies, but you often end up with black boxes of questionable quality, flexibility or performance. Some even go as far as to obfuscate their libraries... this really doesn't help in the end.

Some vendors, like Oracle, went for the all-in-the-box solution. They offer a "complete" solution that can fit every possible need. Then again, what they are telling you is: if we don't have it, you probably don't need it. "Probably"? You've got to be kidding. Since when do software vendors know what you need and what your future will be? Better swap "probably" for "hopefully".

In the best case, in order to meet your needs, you'll hack your way through at the expense of your project specifications. The final result can be nothing but disappointing. Your celebration will be bitter and probably short-lived, I fear.

About the importance of collective work


You have a brand new application. Hooray! This is where the production phase kicks in.

What if you need to move your datamart to another OLAP server? What if there are not enough connection licenses to allow both production connections and all of your maintenance personnel on the OLAP server, and they are forced to take turns to debug? What if the CEO decides to migrate to a new platform? What if [insert random but oh so frequent unforeseen event here]? Your thousand-dollar code is now rendered useless; you can start crying now, you deserve it. In your quest to make more money, you've created a monster that was expensive to build and will keep pumping money out of your institution's pockets.

If you were good enough at systems design, you thought about a data layer. Still, the data layer has to be rewritten entirely, and it often represents at least a third of the overall effort. Close, but no cigar. This might sound like a catastrophic scenario, but it is oh so frequent.

Many people got tired of all this nonsense, so we decided to work together. We decided that enough time and money had been wasted on individual efforts that were ruined in the end. It was time to agree on standards and share the product of our collective effort.

Take Hibernate, for example. It is now the de facto standard when it comes to data mappers. The Java version alone represents 859 thousand lines of code, worth 12.8 million dollars in work hours. Think you can top that with your in-house data layer in times of economic crisis?

About Java OLAP


OLAP is a world in itself. You can't take relational paradigms and apply them to the multidimensional world. The .NET toolbox does have very nice libraries to do some neat OLAP stuff, but then again, you're locked in with SSAS. This is a no-no.

On the Java side, things are even worse. There is currently a big void in the Java OLAP market. No OLAP standard has emerged at all. Thanks to the selfishness of the industry's big players, the JOLAP initiative was a total failure. It never reached a final version, and the JSR-69 specification died quietly.

We at Olap4j tried to fill that gap with an open initiative. Everyone can pitch in. And I mean EVERYONE.

What makes Olap4j so kewl


You know the expression vendor lock-in? I hope you do, I *really* do, or else you'll learn it the hard way. Olap4j aims to solve exactly this problem. You can develop applications on its API and switch the underlying OLAP engine without rewriting a single line of code. Not bad, eh? Olap4j is more than a database driver. It is an open API built right on top of the JDBC industry standard, where everyone collaborates to specify a common base onto which to build.

It even includes transformation libraries and testing facilities.
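
To give you an idea of what this looks like in code, here's a minimal sketch of running an MDX query through Olap4j against an in-process Mondrian engine. The connection string, schema path and query are placeholders I made up for the example; only the olap4j classes and the Mondrian driver name come from the projects themselves.

import java.sql.Connection;
import java.sql.DriverManager;

import org.olap4j.CellSet;
import org.olap4j.CellSetAxis;
import org.olap4j.OlapConnection;
import org.olap4j.OlapStatement;
import org.olap4j.OlapWrapper;
import org.olap4j.Position;

public class Olap4jSketch {
    public static void main(String[] args) throws Exception {
        // Register the Mondrian olap4j driver and open a plain JDBC connection.
        Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
        Connection connection = DriverManager.getConnection(
            "jdbc:mondrian:"
            + "Jdbc=jdbc:mysql://localhost/foodmart;"   // placeholder relational source
            + "Catalog=file:/path/to/FoodMart.xml;");   // placeholder Mondrian schema

        // Unwrap the JDBC connection into an olap4j connection.
        OlapConnection olapConnection =
            ((OlapWrapper) connection).unwrap(OlapConnection.class);

        // Run an MDX query and walk the resulting cell set.
        OlapStatement statement = olapConnection.createStatement();
        CellSet cellSet = statement.executeOlapQuery(
            "SELECT {[Measures].[Unit Sales]} ON COLUMNS, "
            + "{[Product].Children} ON ROWS "
            + "FROM [Sales]");

        CellSetAxis columns = cellSet.getAxes().get(0);
        CellSetAxis rows = cellSet.getAxes().get(1);
        for (Position row : rows.getPositions()) {
            for (Position column : columns.getPositions()) {
                System.out.println(
                    row.getMembers().get(0).getName() + " / "
                    + column.getMembers().get(0).getName() + " = "
                    + cellSet.getCell(column, row).getFormattedValue());
            }
        }
        connection.close();
    }
}

The same code runs unchanged against any other Olap4j driver; only the driver class and the connection URL differ.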

I want to kick the tires and use it right now


So far, there are two implementations ready to use. The Mondrian driver allows you to run the much-acclaimed Mondrian open source OLAP engine as an in-process data provider.

There is also the generic XML/A driver that can connect to pretty much anything that talks XML/A, whether it's over HTTP or anything else you fancy using. This particular driver allows you to build applications that can switch to and from any of these OLAP engines (a quick sketch of the driver switch follows the list):

  • Hyperion Essbase

  • Microsoft SQL Server Analysis Services

  • Infor

  • Mondrian

  • Palo
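
As hinted above, switching engines boils down to loading a different driver and changing the connection URL; the query code stays the same. The URLs below are placeholders, but the two driver classes are the ones shipped with the Mondrian and XML/A implementations:

import java.sql.Connection;
import java.sql.DriverManager;

public class DriverSwitchSketch {
    // In-process Mondrian: the OLAP engine runs inside your own JVM.
    static Connection openMondrian() throws Exception {
        Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
        return DriverManager.getConnection(
            "jdbc:mondrian:Jdbc=jdbc:mysql://localhost/foodmart;"
            + "Catalog=file:/path/to/FoodMart.xml;");       // placeholder schema location
    }

    // Generic XML/A: talks to any server exposing an XML/A endpoint
    // (SSAS, Essbase, Palo, a remote Mondrian instance, and so on).
    static Connection openXmla() throws Exception {
        Class.forName("org.olap4j.driver.xmla.XmlaOlap4jDriver");
        return DriverManager.getConnection(
            "jdbc:xmla:Server=http://bi.example.com/xmla"); // placeholder endpoint
    }
}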


The Olap4j project is gaining momentum and we truly hope to see it become the standard in the Java world.