Author Archives: ngoodman

Mark Rittman on Oracle BI Suite EE

Mark Rittman has worked up an excellent article on the new Oracle BI Landscape. He basically covers the whole range of products and technologies and how they’re being repackaged and rebranded into a new product set. Great read.
I thought the following was of interest:

The Analytic Server is effectively a ROLAP (Relational OLAP) engine, that connects to data sources but does not have any local storage in the traditional, database sense, although it does have a cache which holds copies of data previously retrieved, so that subsequent requests for the same data are returned faster.

What struck me about this statement is that, from a technology perspective, this is nearly identical to the architecture of Mondrian (the open source OLAP engine), even while proprietary vendors (Oracle included) have long touted the benefits of a true MOLAP solution over plain ROLAP. With processing headed to mid-tier servers (instead of OLAP crunching happening on a Pentium I desktop), one can leverage the mid tier for caching and analytical processing (MOLAP-esque stuff). In other words, you can get MOST of the benefits of MOLAP with a well designed and performant ROLAP server proactively caching/crunching on your behalf! Cool stuff!

UPSERTS using Kettle/MySQL

Oracle Readers: Think MERGE Statement. 🙂

It’s common to need to generate surrogate keys as part of a Data Warehouse / Business Intelligence solution. Insulating your users from source system changes, or handling time variant data, are just a couple of the reasons why one needs to generate a surrogate key when loading data into a DW.

Here is a simple example. I’m loading Product data from an XML file into a MySQL table. The transformation in Kettle looks like this:

[Image: upsert_kettle_transform.png, the Kettle transformation]

The output coming out of the XML file has been flattened using XQuery so that it looks like a flat table structure. Here’s a screen capture of a nifty feature of Kettle where you can preview the output of a step:

[Image: upsert_xml_data.png, previewing the XML step output]

OK, this is our source data. The ID you’re seeing is the OLTP key. We’ll refer to it as the Natural Key. This is the ORDER_ID, or CUSTOMER_ID, or, well, you get the idea. This is the key that your source application uses in its relational structures.

What we want is to INSERT or UPDATE (aka UPSERT) this record into our DW. We’ll INSERT, and generate a surrogate primary key, if we don’t already have this NATURAL key in the DW. We’ll UPDATE if we already have this NATURAL key in the warehouse.
Our table looks like:
[Image: upsert_product_erd.png, the INT_PRODUCT table]

The DDL to create the TABLE is important:

CREATE TABLE INT_PRODUCT (
  PRODUCT_ID INTEGER NOT NULL AUTO_INCREMENT,
  PRODUCT_NAT_ID INTEGER NOT NULL,
  TITLE TINYTEXT,
  CONSTRAINT INT_PRODUCT_PK PRIMARY KEY (PRODUCT_ID),
  UNIQUE INT_PRODUCT_NK (PRODUCT_NAT_ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Notice that our PRIMARY KEY is the surrogate we’ve set up as an AUTO_INCREMENT column. This means that MySQL will generate a surrogate when we INSERT a record. All we have to do is get Kettle to match on our unique constraint (INT_PRODUCT_NK) and either INSERT or UPDATE.
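
As an aside, you can see the same mechanics in raw MySQL with INSERT ... ON DUPLICATE KEY UPDATE, which leans on the very same unique key. This is a hand-written illustration (with a made-up title value), not what Kettle executes under the covers:

-- If PRODUCT_NAT_ID is new, the row is inserted and AUTO_INCREMENT
-- mints the surrogate PRODUCT_ID; if it collides with the unique key
-- INT_PRODUCT_NK, the existing row is updated in place instead.
INSERT INTO INT_PRODUCT (PRODUCT_NAT_ID, TITLE)
VALUES (1, 'Some Product Title')
ON DUPLICATE KEY UPDATE TITLE = VALUES(TITLE);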

I’ve used the INSERT / UPDATE operator in Kettle. The configuration looks like this:
[Image: upsert_upsert_config.png, the Insert / Update step configuration]
What we’re doing is configuring Kettle to do a lookup on the table to see if the natural key already exists. If it doesn’t, Kettle INSERTs; if it does, Kettle UPDATEs the fields listed in the lower window.
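
Conceptually, the step boils down to the following hand-rolled SQL (for illustration only; these are not the exact statements Kettle issues):

-- 1. Look up the row by natural key
SELECT PRODUCT_ID, TITLE FROM INT_PRODUCT WHERE PRODUCT_NAT_ID = 1;
-- 2a. No match: INSERT, letting AUTO_INCREMENT generate the surrogate
INSERT INTO INT_PRODUCT (PRODUCT_NAT_ID, TITLE) VALUES (1, 'Some Product Title');
-- 2b. Match, but fields changed: UPDATE in place, keeping the surrogate
UPDATE INT_PRODUCT SET TITLE = 'Some Product Title' WHERE PRODUCT_NAT_ID = 1;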

After running the transformation, my INT_PRODUCT table looks like this:
[Image: upsert_surrogate.png, INT_PRODUCT rows after the load]
Now, I realize I didn’t choose very illustrative natural keys (they all run 1, 2, 3), but the PRODUCT_ID values come from MySQL as surrogates.

Subsequent executions of this Kettle transformation will INSERT records as they arrive or UPDATE existing ones. MySQL and Kettle in combination make quick work of generating these surrogates!

I invite Kettle experts to chime in on any optimizations/quirks.

Connect to your own Database

I get asked frequently by people who evaluate Pentaho: How do I get started? Many people (tens of thousands on a monthly basis) download the Pre Configured application and immediately get a sense for the benefits of the Pentaho open source project.

What people almost always want to do next is get started with their OWN data. You, the reader of this blog, might find a pre-canned demo interesting, but you’ll want to reserve judgement on the leading open source BI project until you’ve seen your own data flowing into beautiful PDFs/Excel/OLAP pivot tables. I totally understand…

This is a very SIMPLE, BASIC, LIGHTWEIGHT guide on how to get the downloadable DEMO connected to YOUR DATABASE and executing YOUR SQL Query using the lynchpin of the Pentaho platform, an Action Sequence (refer to this blog for more on what Action Sequences are).

Disclaimer: Pentaho has development tools that, once installed, make the development of action sequences more productive than editing XML as below. In other words, this is just rough and dirty to get you going and connected… I highly suggest downloading the Development Workbench to actually build the Action Sequences.

Step 1:
Collect the following information for your database.

  • Vendor (Oracle/MySQL) provided JDBC Jar file (classes12.zip, mysql-connector-java-3.1.12-bin.jar)
  • JDBC Connection URL (jdbc:mysql://localhost/database)
  • JDBC User Name (scott)
  • JDBC Password (tiger)
  • A SQL query that you are certain works in your database (select id, title from status)

This is the information needed for ANY Java Application to connect to a database, and is not really Pentaho specific.

Step 2:
Create (using notepad or vi) a file named myfirst-ds.xml in $DEMO_BASE/jboss/server/default/deploy/

Put the following into that file to create a JNDI datasource with the JNDI-NAME “myfirst”

[Image: myfirst_001.png, contents of myfirst-ds.xml]
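
The original screenshot is gone, so here is a minimal sketch of what myfirst-ds.xml could contain: a standard JBoss local-tx datasource descriptor, filled in with the example values from Step 1 (swap in your own URL, driver class, user name, and password; the bundled *-ds.xml files in the same deploy directory show the authoritative pattern):

<?xml version="1.0" encoding="UTF-8"?>
<datasources>
  <local-tx-datasource>
    <!-- the JNDI name "myfirst" that the platform will look up -->
    <jndi-name>myfirst</jndi-name>
    <!-- JDBC details collected in Step 1 -->
    <connection-url>jdbc:mysql://localhost/database</connection-url>
    <driver-class>com.mysql.jdbc.Driver</driver-class>
    <user-name>scott</user-name>
    <password>tiger</password>
  </local-tx-datasource>
</datasources>
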
Note: You need to replace the above items with the corresponding JDBC information you collected in the first step.

Step 3:
Edit (using notepad or vi) the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml.
Add the following right below the Shark Connection entry

[Image: myfirst_002.png, the web.xml resource-ref entry]
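
That screenshot is missing too; the entry is a standard servlet resource-ref, so a sketch might look like the following. Whether the name carries a jdbc/ prefix should match the neighboring entries (the Shark one just above), so copy their pattern exactly:

<resource-ref>
  <description>myfirst datasource</description>
  <res-ref-name>jdbc/myfirst</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>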

Step 4:
Edit (using notepad or vi) the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/jboss-web.xml.
Add the following right below the Shark entry
[Image: myfirst_003.png, the jboss-web.xml resource-ref entry]
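
And a matching sketch for jboss-web.xml, which maps that resource-ref to the JNDI name JBoss bound for myfirst-ds.xml (local-tx datasources are bound under java:/ by default; again, mirror the existing Shark entry):

<resource-ref>
  <res-ref-name>jdbc/myfirst</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <jndi-name>java:/myfirst</jndi-name>
</resource-ref>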

Step 5:
Copy your vendor-provided JDBC driver library (classes12.zip) into the following directory:
$DEMO_BASE/jboss/server/default/lib

Congratulations, you’ve set up your JNDI datasource! Whew… Almost there.

Step 6:
Let’s create your first action sequence!
Create (using notepad or vi) a file named MyFirst.xaction in $DEMO_BASE/pentaho-solutions/samples/datasources
[Image: myfirst_004.png, contents of MyFirst.xaction]
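
Since the original screenshot is also lost, here is a rough sketch of a minimal query-only action sequence. The component and element names (SQLLookupRule, jndi, query) are my recollection of the samples of that era, so treat any bundled .xaction under pentaho-solutions/samples as the authority rather than this sketch:

<?xml version="1.0" encoding="UTF-8"?>
<action-sequence>
  <title>My First Query</title>
  <version>1</version>
  <inputs/>
  <outputs>
    <!-- expose the result set so ViewAction can render it -->
    <query_result type="result-set"/>
  </outputs>
  <actions>
    <action-definition>
      <component-name>SQLLookupRule</component-name>
      <action-outputs>
        <query-result name="query_result" type="result-set"/>
      </action-outputs>
      <component-definition>
        <!-- the JNDI datasource created in Steps 2 through 5 -->
        <jndi>myfirst</jndi>
        <!-- your known-good SQL from Step 1 -->
        <query><![CDATA[select id, title from status]]></query>
      </component-definition>
    </action-definition>
  </actions>
</action-sequence>
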
Make sure to replace the SELECT I have above with a statement that you KNOW works in your database.

Step 7:
Restart your server using the stop-pentaho/start-pentaho commands in $DEMO_BASE.

Step 8:
Open up Firefox (cough IE) and hit the following URL:
http://localhost:8080/pentaho/ViewAction?&solution=samples&path=datasources&action=MyFirst.xaction

You should see the basic output of the Action Sequence which, in this case, is the results of the SQL you provided.

Action Successful

id | title
1  | Open

That’s it! You’re connected to your own datasource, and now you can start to explore the immense power of Action Sequences with your data… There are approximately 50 example .xactions in the demo, so use them as a reference when exploring the features of the platform. I HIGHLY SUGGEST downloading the workbench as well. It’s a visual environment for authoring .xaction files. Trust me, you’ll appreciate the GUI!

Move to WordPress

I’ve made the switch from Movable Type to WordPress. I’ve used Movable Type for more than two years now, quite happily actually, but thought it high time to move to WordPress. I’m rather impressed with WordPress thus far; a very competitive product all in all. Some things in the installation were even EASIER than MT.

I turned on comments/trackbacks on my MT blog approximately 7 weeks ago. In those 7 weeks, I received more than 2,000 spam comments, trackbacks, etc. I plan to use a spam service on this blog to prevent such abuse again.

If you are encountering any feed or viewing issues please do let me know.

Oracle: We'll always have Paris

Nothing like running down a good movie quote with techno-babble.

Some people have inquired whether I’m heading to Pentaho because I dislike the Oracle tools; specifically, whether the significant delays (2+ years) on Paris have worn on me. I can emphatically state this is NOT the case! I’m moving to Pentaho because Open Source BI is REALLY REALLY cool, not because I was unhappy with Oracle Business Intelligence technologies.

I think Paris is a significant improvement and still believe it provides significant VALUE for Oracle customers. That’s one of the reasons I like Pentaho Data Integration (aka Kettle); it provides VALUE in different but very compelling ways.

So, Oracle Warehouse Builder remains a good value/choice in my book. I just think that Open Source BI/ETL/OLAP/DB shines in the “value” equation and I’m thrilled to help make that happen for real customers needing real solutions.

AM I WRONG ON INGRES?

I’ve been watching some of the news lately from Ingres. While their GPL strategy will inhibit their ISV/OEM adoption (among other reasons), they seem to be doing two things very, very well:

  1. Raising capital (through the divesture fund) for a major expansion/run
  2. Assembling top talent

There have also been murmurs of new products/strategies, such as a tightly coupled Ingres/Linux stack that will run on bare metal or in a VM.

So I ask to those that know… Am I wrong on the future of Ingres?

Open Source OLAP : Standards DO WORK!

A while back I had an impromptu collaboration with Chris Harrington to see if any of his XML/A clients could interoperate with Mondrian. Our initial results were encouraging, and Chris has since taken some time to put Mondrian through his XML/A compatibility tests.

His results, well, I’ll let them speak for themselves:

I have a cmd file full of xmla.wsf invocations to test many kinds of Discover calls and a few Execute calls. So I had this as a ready-made compatibility test. I just changed the XML/A URL and the catalog name (“FoodMart” instead of “FoodMart 2000”) and ran the script.
All of my XML/A invocations except one worked against Mondrian.

Apparently MSAS is slightly more forgiving on some syntax issues (brackets, curly braces and such), but overall Mondrian can actually BE an XML/A provider to clients.

Very encouraging… Considering the economics of inexpensive Lintel servers, bitmap indices and partitioning, and the aggregate table feature in Mondrian, building a reasonable OLAP solution looks well within reach. Check out ActiveInterface.com: Chris provides BI consulting services but also has a couple of cool XMLA clients.

PS – All of this because of investing in the XMLA standard.

Pentaho Data Integration (aka Kettle)

Pentaho added a key ingredient (no pun intended) of BI to their platform with the acquisition of Kettle. Unveiled at LinuxWorld, the deal gives Pentaho an exceptionally productive, effective, open source ETL (and EII) tool.

Watch this space over the next few weeks as I will be blogging about Kettle (in conjunction with other Pentaho technologies) and how it can be used to build full BI solutions.

What’s the net net?

Kettle feels more like a commercial ETL product than a typical open source project. It’s effective, feature rich (including support for fact and dimension tables), productive, and built by a talented consultant who builds DW/BI solutions on a daily basis.

It’s graphical, user friendly, and is 100% pure Java which makes it exceptionally flexible.

My first time with Kettle I had an “Oracle Table” to “Oracle Table” mapping with a filter in between up and running in LESS THAN 10 MINUTES. Matt Casters has done an excellent job and I’m thrilled he’s joining the Pentaho team!

There’s ZERO installation required: just download, unzip, and begin. Download it today and email me; let me know what you think!

RIP Ingres

Well, after more than a month, the folks at Ingres came back to the community with ABSOLUTELY NO CHANGES in their licensing. Ingres is indeed “viral.”

So, the head fake is complete (i.e., from the biz-friendly CA-TOSL to the viral GPL), and Ingres Corporation has charted its course to irrelevance. Again, this is just one opinionated blogger, but I think Ingres won’t gain traction in new markets now (see my previous post on why the GPL is bad for Ingres; not in general, just for Ingres).

I’m sad. Ingres is (should I say was?) cool. RIP.