Author Archives: ngoodman

Mark Rittman on Oracle BI Suite EE

Mark Rittman has worked up an excellent article on the new Oracle BI Landscape. He basically covers the whole range of products and technologies and how they’re being repackaged and rebranded into a new product set. Great read.
I thought the following was of interest:

The Analytic Server is effectively a ROLAP (Relational OLAP) engine, that connects to data sources but does not have any local storage in the traditional, database sense, although it does have a cache which holds copies of data previously retrieved, so that subsequent requests for the same data are returned faster.

What struck me about this statement is that, from a technology perspective, this is nearly identical to the architecture of Mondrian (the open source OLAP engine), even while proprietary vendors (Oracle included) have long touted the benefits of a true MOLAP solution over plain ROLAP. With processing headed to mid-tier servers (instead of OLAP crunching happening on a Pentium I desktop), one can leverage the mid tier for caching and analytical processing (MOLAP-esque stuff). In other words, you can get MOST of the benefits of MOLAP with a well designed and performant ROLAP server proactively caching/crunching on your behalf! Cool stuff!

UPSERTS using Kettle/MySQL

Oracle Readers: Think MERGE Statement. 🙂

It’s common to need to generate surrogate keys as part of a Data Warehouse / Business Intelligence solution. Insulating your users from source system changes, or handling time variant data, are just a couple of the reasons why one needs to generate a surrogate key when loading data into a DW.

Here is a simple example. I’m loading Product data from an XML file into a MySQL table. The transformation in Kettle looks like this:

[Image: upsert_kettle_transform.png, the Kettle transformation]

The output coming out of the XML file has been flattened using XQuery so that it looks like a flat table structure. Here’s a screen capture of a nifty feature of Kettle where you can preview the output of a step:

[Image: upsert_xml_data.png, previewing the XML step output]

OK, this is our source data. The ID you’re seeing is the OLTP key. We’ll refer to it as the Natural Key. This is the ORDER_ID, or CUSTOMER_ID, or, well, you get the idea. This is the key that your source application uses in its relational structures.

What we want is to INSERT or UPDATE (aka UPSERT) this record into our DW. We’ll INSERT, and generate a surrogate primary key, if we don’t already have this NATURAL key in the DW. We’ll UPDATE if we already have this NATURAL key in the warehouse.
Our table looks like:
[Image: upsert_product_erd.png, the INT_PRODUCT table]

The DDL to create the TABLE is important:

CREATE TABLE INT_PRODUCT (
  PRODUCT_ID INTEGER NOT NULL AUTO_INCREMENT,
  PRODUCT_NAT_ID INTEGER NOT NULL,
  TITLE TINYTEXT,
  CONSTRAINT INT_PRODUCT_PK PRIMARY KEY (PRODUCT_ID),
  UNIQUE INT_PRODUCT_NK (PRODUCT_NAT_ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Notice that our PRIMARY KEY is the surrogate we’ve set up as an AUTO_INCREMENT column. This means that MySQL will generate a surrogate when we INSERT a record. All we have to do is get Kettle to match on our unique constraint (INT_PRODUCT_NK) and either INSERT or UPDATE.
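
As an aside, you can see the same mechanics in raw MySQL with INSERT ... ON DUPLICATE KEY UPDATE, which leans on the very same unique key. This is a hand-written illustration (with a made-up title value), not what Kettle executes under the covers:

-- If PRODUCT_NAT_ID is new, the row is inserted and AUTO_INCREMENT
-- mints the surrogate PRODUCT_ID; if it collides with the unique key
-- INT_PRODUCT_NK, the existing row is updated in place instead.
INSERT INTO INT_PRODUCT (PRODUCT_NAT_ID, TITLE)
VALUES (1, 'Some Product Title')
ON DUPLICATE KEY UPDATE TITLE = VALUES(TITLE);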

I’ve used the INSERT / UPDATE operator in Kettle. The configuration looks like this:
[Image: upsert_upsert_config.png, the Insert / Update step configuration]
What we’re doing is configuring Kettle to do a lookup on the table to see if the natural key already exists. If it doesn’t, Kettle INSERTs; if it does, Kettle UPDATEs the fields listed in the lower window.
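
Conceptually, the step boils down to the following hand-rolled SQL (for illustration only; these are not the exact statements Kettle issues):

-- 1. Look up the row by natural key
SELECT PRODUCT_ID, TITLE FROM INT_PRODUCT WHERE PRODUCT_NAT_ID = 1;
-- 2a. No match: INSERT, letting AUTO_INCREMENT generate the surrogate
INSERT INTO INT_PRODUCT (PRODUCT_NAT_ID, TITLE) VALUES (1, 'Some Product Title');
-- 2b. Match, but fields changed: UPDATE in place, keeping the surrogate
UPDATE INT_PRODUCT SET TITLE = 'Some Product Title' WHERE PRODUCT_NAT_ID = 1;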

After running the transformation, my INT_PRODUCT table looks like this:
[Image: upsert_surrogate.png, INT_PRODUCT rows after the load]
Now, I realize I didn’t choose very illustrative natural keys (they all run 1, 2, 3), but the PRODUCT_ID values come from MySQL as surrogates.

Subsequent executions of this Kettle transformation will INSERT records as they arrive or UPDATE existing ones. MySQL and Kettle in combination make quick work of generating these surrogates!

I invite Kettle experts to chime in on any optimizations/quirks.

Connect to your own Database

I get asked frequently by people who evaluate Pentaho: How do I get started? Many people (tens of thousands on a monthly basis) download the Pre Configured application and immediately get a sense for the benefits of the Pentaho open source project.

What people almost always want to do next is get started with their OWN data. You, the reader of this blog, might find a pre-canned demo interesting, but you’ll want to reserve judgement on the leading open source BI project until you’ve seen your own data flowing into beautiful PDFs/Excel/OLAP pivot tables. I totally understand…

This is a very SIMPLE, BASIC, LIGHTWEIGHT guide on how to get the downloadable DEMO connected to YOUR DATABASE and executing YOUR SQL Query using the lynchpin of the Pentaho platform, an Action Sequence (refer to this blog for more on what Action Sequences are).

Disclaimer: Pentaho has development tools that, once installed, make the development of action sequences more productive than editing XML as below. In other words, this is just rough and dirty to get you going and connected… I highly suggest downloading the Development Workbench to actually build the Action Sequences.

Step 1:
Collect the following information for your database.

  • Vendor (Oracle/MySQL) provided JDBC Jar file (classes12.zip, mysql-connector-java-3.1.12-bin.jar)
  • JDBC Connection URL (jdbc:mysql://localhost/database)
  • JDBC User Name (scott)
  • JDBC Password (tiger)
  • A SQL query that you are certain works in your database (select id, title from status)

This is the information needed for ANY Java Application to connect to a database, and is not really Pentaho specific.

Step 2:
Create (using notepad or vi) a file named myfirst-ds.xml in $DEMO_BASE/jboss/server/default/deploy/

Put the following into that file to create a JNDI datasource with the JNDI-NAME “myfirst”

[Image: myfirst_001.png, contents of myfirst-ds.xml]
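
The original screenshot is gone, so here is a minimal sketch of what myfirst-ds.xml could contain: a standard JBoss local-tx datasource descriptor, filled in with the example values from Step 1 (swap in your own URL, driver class, user name, and password; the bundled *-ds.xml files in the same deploy directory show the authoritative pattern):

<?xml version="1.0" encoding="UTF-8"?>
<datasources>
  <local-tx-datasource>
    <!-- the JNDI name "myfirst" that the platform will look up -->
    <jndi-name>myfirst</jndi-name>
    <!-- JDBC details collected in Step 1 -->
    <connection-url>jdbc:mysql://localhost/database</connection-url>
    <driver-class>com.mysql.jdbc.Driver</driver-class>
    <user-name>scott</user-name>
    <password>tiger</password>
  </local-tx-datasource>
</datasources>
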
Note: You need to replace the above items with the corresponding JDBC information you collected in the first step.

Step 3:
Edit (using notepad or vi) the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml.
Add the following right below the Shark Connection entry

[Image: myfirst_002.png, the web.xml resource-ref entry]
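
That screenshot is missing too; the entry is a standard servlet resource-ref, so a sketch might look like the following. Whether the name carries a jdbc/ prefix should match the neighboring entries (the Shark one just above), so copy their pattern exactly:

<resource-ref>
  <description>myfirst datasource</description>
  <res-ref-name>jdbc/myfirst</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>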

Step 4:
Edit (using notepad or vi) the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/jboss-web.xml.
Add the following right below the Shark entry
[Image: myfirst_003.png, the jboss-web.xml resource-ref entry]
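
And a matching sketch for jboss-web.xml, which maps that resource-ref to the JNDI name JBoss bound for myfirst-ds.xml (local-tx datasources are bound under java:/ by default; again, mirror the existing Shark entry):

<resource-ref>
  <res-ref-name>jdbc/myfirst</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <jndi-name>java:/myfirst</jndi-name>
</resource-ref>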

Step 5:
Copy your vendor-provided JDBC driver library (classes12.zip) into the following directory:
$DEMO_BASE/jboss/server/default/lib

Congratulations, you’ve set up your JNDI datasource! Whew… Almost there.

Step 6:
Let’s create your first action sequence!
Create (using notepad or vi) a file named MyFirst.xaction in $DEMO_BASE/pentaho-solutions/samples/datasources
[Image: myfirst_004.png, contents of MyFirst.xaction]
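
Since the original screenshot is also lost, here is a rough sketch of a minimal query-only action sequence. The component and element names (SQLLookupRule, jndi, query) are my recollection of the samples of that era, so treat any bundled .xaction under pentaho-solutions/samples as the authority rather than this sketch:

<?xml version="1.0" encoding="UTF-8"?>
<action-sequence>
  <title>My First Query</title>
  <version>1</version>
  <inputs/>
  <outputs>
    <!-- expose the result set so ViewAction can render it -->
    <query_result type="result-set"/>
  </outputs>
  <actions>
    <action-definition>
      <component-name>SQLLookupRule</component-name>
      <action-outputs>
        <query-result name="query_result" type="result-set"/>
      </action-outputs>
      <component-definition>
        <!-- the JNDI datasource created in Steps 2 through 5 -->
        <jndi>myfirst</jndi>
        <!-- your known-good SQL from Step 1 -->
        <query><![CDATA[select id, title from status]]></query>
      </component-definition>
    </action-definition>
  </actions>
</action-sequence>
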
Make sure to replace the SELECT I have above with a statement that you KNOW works in your database.

Step 7:
Restart your server using the stop-pentaho/start-pentaho commands in $DEMO_BASE.

Step 8:
Open up Firefox (cough IE) and hit the following URL:
http://localhost:8080/pentaho/ViewAction?&solution=samples&path=datasources&action=MyFirst.xaction

You should see the basic output of the Action Sequence which, in this case, is the results of the SQL you provided.

Action Successful

id | title
1  | Open

That’s it! You’re connected to your own datasource, and now you can start to explore the immense power of Action Sequences with your data… There are approximately 50 example .xactions in the demo, so use them as a reference when exploring the features of the platform. I HIGHLY SUGGEST downloading the workbench as well. It’s a visual environment for authoring .xaction files. Trust me, you’ll appreciate the GUI!

Move to WordPress

I’ve made the switch from Movable Type to WordPress. I’ve used Movable Type for more than two years now, quite happily actually, but thought it high time to move to WordPress. I’m rather impressed with WordPress thus far; a very competitive product all in all. Some things in the installation were even EASIER than MT.

I turned on comments/trackbacks on my MT blog approximately 7 weeks ago. In those 7 weeks, I received more than 2,000 spam comments, trackbacks, etc. I plan to use a spam service on this blog to prevent such abuse again.

If you are encountering any feed or viewing issues please do let me know.

Oracle: We'll always have Paris

Nothing like running down a good movie quote with techno-babble.

Some people have inquired whether I’m heading to Pentaho because I dislike the Oracle tools; specifically, whether the significant delays (2+ years) on Paris have worn on me. I can emphatically state this is NOT the case! I’m moving to Pentaho because Open Source BI is REALLY REALLY cool, not because I was unhappy with Oracle Business Intelligence technologies.

I think Paris is a significant improvement and still believe it provides significant VALUE for Oracle customers. That’s one of the reasons I like Pentaho Data Integration (aka Kettle); it provides VALUE in different but very compelling ways.

So, Oracle Warehouse Builder remains a good value/choice in my book. I just think that Open Source BI/ETL/OLAP/DB shines in the “value” equation and I’m thrilled to help make that happen for real customers needing real solutions.

AM I WRONG ON INGRES?

I’ve been watching some of the news lately from Ingres. While their GPL strategy will inhibit their ISV/OEM adoption (among other reasons), they seem to be doing two things very, very well:

  1. Raising capital (through the divesture fund) for a major expansion/run
  2. Assembling top talent

There have also been murmurs of new products/strategies, such as a tightly coupled Ingres/Linux stack that will run on bare metal or in a VM.

So I ask to those that know… Am I wrong on the future of Ingres?

Open Source OLAP : Standards DO WORK!

A while back I had an impromptu collaboration with Chris Harrington to see if any of his XML/A clients could interoperate with Mondrian. Our initial results were encouraging, and Chris has since taken some time to put Mondrian through his XML/A compatibility tests.

His results, well, I’ll let them speak for themselves:

I have a cmd file full of xmla.wsf invocations to test many kinds of Discover calls and a few Execute calls. So I had this as a ready-made compatibility test. I just changed the XML/A URL and the catalog name (“FoodMart” instead of “FoodMart 2000”) and ran the script.
All of my XML/A invocations except one worked against Mondrian.

Apparently MSAS is slightly more forgiving on some syntax issues (brackets, curly braces and such), but overall Mondrian can actually BE an XML/A provider to clients.

Very encouraging… Considering the economics of inexpensive Lintel servers, bitmap indices and partitioning, and the aggregate table feature in Mondrian, building a reasonable OLAP solution looks well within reach. Check out ActiveInterface.com: Chris provides BI consulting services but also has a couple of cool XMLA clients.

PS – All of this because of investing in the XMLA standard.

Pentaho Data Integration (aka Kettle)

Pentaho added a key ingredient (no pun intended) of BI to their platform with the acquisition of Kettle. Unveiled at LinuxWorld, the deal gives Pentaho an exceptionally productive, effective, open source ETL (and EII) tool.

Watch this space over the next few weeks as I will be blogging about Kettle (in conjunction with other Pentaho technologies) and how it can be used to build full BI solutions.

What’s the net net?

Kettle feels more like a commercial ETL product than a typical open source project. It’s effective, feature rich (including support for fact and dimension tables), productive, and built by a talented consultant who builds DW/BI solutions on a daily basis.

It’s graphical, user friendly, and is 100% pure Java which makes it exceptionally flexible.

My first time with Kettle I had an “Oracle Table” to “Oracle Table” mapping with a filter in between up and running in LESS THAN 10 MINUTES. Matt Casters has done an excellent job and I’m thrilled he’s joining the Pentaho team!

There’s ZERO installation required: just download, unzip, and begin. Download it today and email me; let me know what you think!

RIP Ingres

Well, after more than a month, the folks at Ingres came back to the community with ABSOLUTELY NO CHANGES in their licensing. Ingres is indeed “viral.”

So, the head fake is complete (i.e., from the biz-friendly CA-TOSL to the viral GPL), and Ingres Corporation has charted its course to irrelevance. Again, this is just one opinionated blogger, but I think Ingres won’t gain traction in new markets now (see my previous post on why the GPL is bad for Ingres; not in general, just for Ingres).

I’m sad. Ingres is (should I say was?) cool. RIP.