
Microsoft doing good things with their money!

I’ll pay some praise to the gorilla from Redmond:

Brilliant and hilarious shorts featuring Ricky Gervais of The Office fame:

David Brent rules!

Great use of those profits!  🙂

Pentaho Linux .sh files

Small little tip:

The Pentaho build process doesn’t currently set the permissions on .sh files properly.  When you download the daily builds or other demo installations you may get some errors (bash: command not found, etc.).  You need to make all the .sh files in the installation executable.  Use the following command in the “pentaho-demo” directory.

for x in `find . -name '*.sh'`; do chmod +x $x; done
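
If any of the script paths contain spaces, that loop will split them.  A slightly more robust way to do the same thing (letting find invoke chmod directly; standard find syntax, nothing Pentaho-specific) is:

find . -name '*.sh' -exec chmod +x {} \;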

Hope you find this helpful!

Open Source is agile

I’m not talking about the methodology in particular; I’m just saying that, compared to traditional software engineering practices (customer advisory boards vetting major features, rounds of marketing approvals of features, etc.), open source moves fast.

For instance, I submitted a Jira case to the Pentaho development staff to include a jar in our demo application that is needed to run certain Pentaho Data Integration mappings.  Within 20 hours the jar had been included (already vetted for license, since it’s part of another project) and is now part of the daily builds.  This is the oil that makes the open source machine great: the ability of the software (Pentaho as a project) to respond to real customer needs (mine).  It’s awesome!

That reminds me: I haven’t highlighted some of the cool new “open-source-y” things at Pentaho yet:

  • Public Issue/Feature Roadmap:
    We have launched Jira as a place to track new feature requests, bug submissions, etc.  I greatly encourage you to register and begin using it to submit bugs / suggestions.  Can’t always say they’ll get fixed in 20 hours but they have a MUCH GREATER chance of being fixed if they’re in Jira in addition to the forums.
  • Public Source Control:
    While we’ve always published our source with every release, the source repository wasn’t available on an anonymous basis.  We’re now hosting a Subversion repository that allows easier access and contribution from our always-valued community.  Consider this an open invitation to dig in, build a cool plugin, etc.

I’m glad these two things have happened; I think they just make communication easier, more effective, and more transparent.  What do you think?

Finally, not in a lame-oh, music-devoid desktop

I’ve recently made the switch to Linux, as many of you who have read my previous blog entries on the matter know.

One of the things that I missed dearly, but that was not a critical priority, was getting streaming MP3 (SHOUTcast) on my headphones.  There were too many higher-priority things on my plate, but I finally got XMMS and the MP3 codecs installed.  What a pain those pesky patents have caused for end users like me.

977 the Kickin Country Channel never sounded so good!

Windows never looked so GOOD!

In my last blog entry I was clear: Windows had crashed on me for the last time. I was through with the operating system from Redmond…

Except…

It’s a Microsoft world and I’m pragmatic enough to understand that there are simply SOME things that cannot be done from Linux (device drivers for my all-in-one printer/scanner/fax are nonexistent, for example). VMware is invaluable in this regard, and while I’ve raved about it before, I’ll say it again: it’s about the best 150 USD you can spend if you’re a developer.

So… here’s how I’m using Windows in a way that suits me just fine, because a) it’s in VMware, so I only fire it up when need be, and b) I’m using XGL, so even Windows looks cool on the side of a 3D cube desktop.

Windows Looks Good

Last time Windows crashes on me

End of last week, Windows was kind enough to give me the annual “Blue Screen of Somehow I Screwed Up My Own Internals I Hope You Weren’t Doing Any Real Important Work Because You’ll Have to Reinstall the Operating System of Death.”  Gasp.

We’ve all been there.  What really bugged me is that when it happened, I sighed and just thought to myself that this is “the price of computing.”  This had become normal and acceptable to me… Then I shook myself a bit and became determined to rid myself, as much as possible, of the OS from Redmond.  No offense; I love Excel and think there’s some great usability in there, but it’s just not my cup of tea.

Eventually I’ll end up with a MacBook Pro; I feel the call of the siren as much as anybody.  Until then, I’m on the SUSE 10.1 desktop and so far I’m quite pleased.

I’ll blog again later on about the specifics of the setup, but I’ll just say that the XGL desktop is both wicked COOL and very functional.

Donating to Open Source: Gratitude

A while back I blogged about gratitude and generosity, which was mostly about how it made ME feel when I was experiencing those feelings in times of change and growth.  What’s the flip side of that coin, or the other end of that stick, or whatever metaphor you want to use?  How does it feel to have gratitude expressed to you for what you do?

Apparently pretty good; or at least good enough to respond with some very kind, personal notes of thanks.  A few weeks back I realized that I use two open source projects that provide exceptional products.  Truly, they’ve transcended the open source motto of "the code is the documentation and RTFM if there were one" and have created wonderful, easy to use products.  I realized that I had not given these people anything in return (I never encountered any bugs/etc to submit patches for!).

I donated, via their website instructions, to a Cygwin developer and to Gallery.  I received personal notes of thanks, expressing real gratitude.  It wasn’t about the money either (I donated 25 USD to each developer) but more a recognition of their contribution.  I get this.  If someone (me) is willing to pay someone they’ve never met, willing to seek out the method (donation pages and PayPal hoops), and to part with real money while under absolutely NO OBLIGATION or expectation to, it means that I think they did a great job.

Well they have! 

Have you ever considered donating to an open source project?  What open source projects do you get value from?  Consider dropping them $20 and see how good it makes them AND you feel!  I bet you’ll feel better giving $20 to the Apache foundation than paying your next enterprise software bill.

Kettle and Pentaho: 1+1=3

Like all great open source products, Pentaho Data Integration (Kettle) is a functional product in its own right.  It has a very productive UI and delivers exceptional value as a standalone tool.  Most pieces of the Pentaho platform reflect a desire to keep the large communities around the original projects (Mondrian, JFree, etc.) engaged; they remain complete components in and of themselves.

When used together, their value for building solutions increases and exceeds what they deliver independently.  I’ll be the first to admit that Pentaho is still fairly technical, but we’re rapidly building more and more graphical interfaces and usability features on top of the platform (many in the open source edition, though much is in the professional edition).  Much of this work involves making the “whole” (Pentaho) work together to exceed the value of the pieces (Mondrian, Kettle, JFree, …).

A few reasons immediately come to mind why Pentaho and Kettle together provide exceptional value compared to using them individually or with another open source reporting library:

  1. Pentaho abstracts data access (optionally) from report generation, which gives report developers the full POWER of Kettle for building reports.

    There are some things that are tough, if not downright impossible, to do in SQL.  Ever done an HTTP retrieval of an XML doc, slurped in a custom lookup from Excel, and done a few database joins and analytical calculations, all in one SQL statement?  I bet not.  Report developers are smart data dudes; having access to a tool that lets them sort/pivot/group/aggregate/lookup/iterate (the list goes on and on) empowers report developers in a way that a simple “JDBC” or “CSV” or “XQuery” source alone never could.
    How is this made possible?
    Pentaho abstracts the data retrievals (optionally; it isn’t forced on customers) into lookup components.  This allows BI developers to use a SQL lookup (DB), XQuery lookup (XML), MDX lookup (OLAP), or Kettle lookup (EII) to populate a “ResultSet.”  Here’s the beauty: reports are generated off a result set instead of directly accessing the sources.  This means that a user can use the same reporting templates, framework, designer, etc. and feed/calculate data from wherever they desire.  It truly opens a world of possibility where before there was “just SQL” or “ETL into DB tables.”

  2. Ability to manage the entire solution in one place

    Pentaho has invested greatly in the idea of the solution being a set of “things” that make up your BI, reporting, and DW solution.  This means you don’t have ETL in one repository, reports managed somewhere else, scheduling managed by a third party, etc.  It’s open source, so that’s obviously a choice, but we can add much value by ensuring that someone who has to transform data, schedule it, email and monitor, secure, build reports, administer email bursting, etc. can do so from one “solution repository.”  Managing an entire BI solution from one CVS repository?  Now that’s COOL (merge/diff/patch, anyone?).

  3. Configuration Management

    Kettle is quite flexible; the 2.3.0 release extends the scope and locations where you can use variable substitution.  From a practical standpoint this means that an entire Chef job can be parameterized and called from a Pentaho action sequence.  For instance, because you can run your DW load from inside Pentaho action sequences, you can secure it, schedule it, monitor it, initiate it from an outside workflow via web service, etc.  In one of my recent Kettle solutions, ALL OF THE PHYSICAL database, file, and security information was managed by Pentaho, so the Kettle mappings could literally be moved from place to place and still work inside of Pentaho.  (There’s a small sketch of what a parameterized job invocation looks like right after this list.)

  4. Metadata and Additional Integration

    Pentaho is investing in making the tools more seamless.  In practice (this is not a roadmap or product direction statement) this means being able to interact with tables, connections, and business views inside of Kettle in an identical (or at least similar) way in the report designer.  For example, if you’ve defined the business name for a column to be “Actual Sales”, Kettle and the Report Designer can key off that same metadata and present a consistent view to the report/ETL developer, instead of everyone having to know that “ACT_SL_STD_CURR” means actual sales.
    Another example is the planned additional Mondrian/Kettle integration to make building dimensions, cubes, and aggregates easier.
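As promised above, here’s a minimal sketch of what driving a parameterized Kettle job from the shell can look like, using Kitchen (Kettle’s command-line job runner).  The paths, variable names, and job file below are invented for illustration, and the exact option syntax and variable-substitution rules vary by Kettle version; inside the Pentaho platform the same variables would be supplied by the action sequence rather than by the shell.

# Illustrative only: paths, variable names, and the .kjb file are made up.
export DW_DB_HOST=dev-db.example.com   # referenced in the job via Kettle variable substitution
export DW_DB_NAME=warehouse_dev        # (e.g. a connection host field of ${DW_DB_HOST})

cd /opt/kettle-2.3.0                   # wherever Kettle is unpacked
./kitchen.sh -file=/solutions/dw/load_warehouse.kjb -level=Basic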

Free, Valuable DW Wisdom from the man in the lilac suit

For those that don’t know the connection between the lilac suit and Pete-S, just Google it.  🙂  It doesn’t really matter, though, when compared with the great set of articles that my friend Pete-S is pumping out from the other side of the Atlantic.

Pete has in-the-trenches, practical knowledge of building BI and DW systems.  He’s both sharp and practical (that’s rare, you know!).  He’s running a series called DW Wisdom and it’s some very useful content.

DW Wisdom

I like that Pete is comfortable enough with his own skills and abilities to question the “age-old wisdoms” of DW.  Even if they are found to be true, it’s good to see some real scientific “assumption breaking” to prove or disprove them.

  • Use as many small disks as possible – a 1TB disk would be a bad idea for a system that inherently reads large volumes of data, everything would go through a single IO point.

  • Keep all the OLTP tables separate from the DW systems; OLTP has lots of small, fast transactions, DW has slower, big reads. DW loves bitmap indexes, OLTP hates them.

  • Use high degrees of parallel processing

But are these truths still valid?

DI Wisdom (2) – more of the physical

A comparison of OLTP vs. DW (OLAP-esque), with a great reference table:

DI Wisdom (3) – departmental business

When people found that their transactional systems were unsuited for BI reporting (perhaps because of the performance impact of running BI on a transactional system, or the transactional system did not hold all of the data required for reporting) they started to look towards dedicated data warehouses.

DI Wisdom (4)

Enterprise DW moves away from the tactical departmental “point solutions” and into something that fits with the strategic aspirations of the enterprise. On the face of it, having a single solution across the enterprise has distinct advantages:

  • there is a single, consistent model of enterprise data
  • there is less duplication of data across the enterprise
  • it is possible to construct a security model such that the right people see all of the data that allows them to do their jobs, but not the information that is too sensitive for their job role
  • the origins of all of the business data can be traced back to source

In fact these aims are so laudable that they have been hijacked by other IT disciplines such as master data management, risk and compliance management, and business process reengineering.

DW Design (part 1)

I can’t agree with Pete more: a staging area, a 3NF warehouse, and then a presentation layer (marts) is, I think, a very practical way to separate concerns and avoid tight coupling between sources and reports.  À la the Corporate Information Factory.

For a long while I have favoured a three section data warehouse design: a staging area where raw fact and reference data is validated for referential integrity, a third-normal form layer to hold the reference data and historical fact, and finally a presentation layer to hold denormalised reference data and aggregated fact. The staging layer is ‘private’ to the data warehouse but user query access (subject to business security rules) to other layers is permitted. In some cases it will not be possible to use a denormalised layer; but if you can use one, you should.

DW Design (2) – staging data

As mentioned yesterday, the staging area of the data warehouse has three functional uses:

  • It is the initial target for data loads from source systems
  • It validates the incoming data for integrity
  • It is the data source for information to be published to the “user visible” layers of the data warehouse

Optionally, it may also be where the logic to transform incoming data is applied.

Great series, Pete!  Now if only this were in a book that I could tell all my blog readers and colleagues to purchase!  🙂