Author Archives: ngoodman

I think Exhibit B was the elephant in the room!

Over the past few weeks, the discussion of badgeware has percolated into a real public debate on whether these licenses meet the OSD.

Great!  Finally… A real debate on whether or not “badgeware” is open source!
PS – I just picked up the term badgeware.  I like that better than my slightly one-sided “forced UI attribution.”

New open source project: OWBScripts

I hadn’t had a chance to post yet, but Mark made mention of it on his blog so I figure it’s about time to post about it.

OMB is the TCL-based scripting language that comes with Oracle Warehouse Builder that allows you to do OWB “things” programmatically (i.e., without the GUI).  It is very useful for doing ETL generation, mass updates, deploying mappings, etc.  Basically, anything that you are doing repetitively is a good candidate for making into an OMB script.  OMB is a cure for “tennis elbow” from clicking for hours on end in the OWB GUI.

I’ve released a handful of OMB scripts that I used on consulting gigs, presentations, articles, etc.  There is nothing spectacular here, but hey, they’re not doing me any good!  If just one or two people find them useful it was worth the time to slap the Apache 2.0 license and upload them to http://sourceforge.net.

The release (initial and only, unless someone else out there wishes to take on the management/augmentation) includes scripts to:

a) Generate base SOURCE to STAGING Truncate/Staging mappings and tables.
b) Generate base STAGING to WAREHOUSE Insert/Update mappings, tables, and sequences.
c) Install repository and the standard CIF targets (Staging, Warehouse, AreaMart).

Let me know what you think and I do hope someone, somewhere finds it useful!
PS – I haven’t used OWB for nearly 9 months.  For something I used day in day out for YEARS that’s a long time to have not even touched it!

Turn Pentaho demo into a "server"

The standard Pentaho demo download is super quick and easy: there’s no installation and it just works.  You double click start-pentaho.bat and then it’s running in http://localhost:8080.

However, sometimes you may want to share this demo with others.  Roland Bouman has a nice blog entry on the specifics of how to change the demo install into a server. 

I add the following line to my start-pentaho.sh to make the hostname changing transparent. 

sed -i -e "s/http:\/\/.*:8080/http:\/\/`hostname -f`:8080/" jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml

This allows one to move this “pentaho” to any system and it will start up properly with http://actualhostname.company.com:8080 instead of http://localhost:8080.  I download a new build of Pentaho on about a weekly basis, in addition to preparing virtual machines and zipped installs for customers and partners.  This little shortcut is an absolute must for me; it doesn’t make sense in the actual code release for a variety of reasons.
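For anyone who would rather not depend on sed (on Windows, say), here is a minimal Python sketch of the same substitution. It is not part of the Pentaho download; the function name is made up for illustration, and it assumes the only URLs of interest in web.xml use port 8080, just like the sed line. Unlike the sed pattern, it matches only the host part rather than everything up to the last ":8080" on the line.

```python
import re


def rewrite_base_url(web_xml_text, hostname):
    """Return web_xml_text with the host of every http://<host>:8080 URL replaced.

    Hypothetical helper for illustration only. `hostname` is whatever name
    other machines should use to reach this box (what `hostname -f` prints).
    """
    # [^/:\s]+ matches only the host portion, so text after the port is untouched.
    pattern = re.compile(r"http://[^/:\s]+:8080")
    # A function replacement avoids any backslash interpretation in `hostname`.
    return pattern.sub(lambda match: "http://" + hostname + ":8080", web_xml_text)
```

To use it, read web.xml into a string, pass it through rewrite_base_url together with the output of socket.getfqdn(), and write the result back before starting the server.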

Perhaps someone else will find this little tidbit useful!  Enjoy!

Command line ETL Job Execution

I know this might seem pretty obvious to those that use Kettle frequently, but there’s a VERY easy way to execute Kettle jobs at the command line. Kitchen is the command line interface and is quite convenient for executing that ETL job you’ve built. Crontab anyone?

kitchen.sh -file=/mnt/pentaho-professional/pentaho-solutions/software-quality/data/etl/jira_do_everything.kjb
-OR-
kitchen.bat -file=c:\dir\jira_do_everything.kjb
kitchen.bat /file:"c:\dir\jira_do_everything.kjb" (from comments below, thanks!!!)
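If you schedule the job from cron or another scheduler, a thin wrapper makes failures easier to spot. The following is only a sketch: the function names are made up, it uses nothing beyond the -file= form shown above, and it assumes Kitchen exits with 0 on success, which you should verify against your own version.

```python
import subprocess


def kitchen_command(job_file, kitchen="kitchen.sh"):
    """Build the argument list for running one Kettle job with Kitchen.

    Hypothetical helper; it only uses the -file= option shown in the post.
    """
    return [kitchen, "-file=" + job_file]


def run_job(job_file, kitchen="kitchen.sh"):
    """Run the job and return Kitchen's exit code.

    Assumption: an exit code of 0 means the job finished without errors.
    """
    return subprocess.call(kitchen_command(job_file, kitchen))
```

A cron entry can then call a small script that runs run_job and sends mail or writes a log line whenever the returned code is not 0.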

Does anyone use kitchen or pan and have any best practices or suggestions to offer?

On bad things happening to good people

My friend Mark Rittman recently lost his entire library of 700 blogs/articles/etc.  He’s handling it with SOOO much grace; a testament to him as a gentleman and all-around great guy.  I know I personally would be furious, bitter, and livid (at least for a few weeks).

The worst part of the whole deal is that the hosting company is unapologetic enough to state:

Customers who have their own backups will be able to restore their own
data. Our terms and conditions advise customers to have their own
backups in case there is a catastrophic loss. This is the first time we
have suffered such a loss.

Mark, I’m soooo sorry.  Here’s hoping you get some of it back (BI Blogs, OraBlogs, etc). 

Pentaho tops 4.4 Million USD in open source code

Startups are interesting: Some days you love your job, other days you want to throw yourself out the window.  Yesterday I hated my job, for a variety of reasons.  One of the things that cheers me up when I’m in the midst of some tough stuff is looking at the big picture.

To date, Pentaho has built more than 313,981 lines of open source code.  It’s an estimated 81 person years.  At 55,000 USD / year for a developer, that equates to roughly 4,400,000 USD of “code” built and released under a business friendly, OSI approved, open source license (MPL).
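For the curious, the back-of-the-envelope arithmetic behind that figure, written out as a tiny Python sketch. The two inputs are the numbers quoted above; the variable names are just for illustration.

```python
# Inputs quoted in the post (the ohloh estimate and an assumed developer cost).
PERSON_YEARS = 81
COST_PER_YEAR_USD = 55_000

# Estimated value of the code = effort in person years * cost per person year.
total_usd = PERSON_YEARS * COST_PER_YEAR_USD

print(f"{total_usd:,} USD")  # prints "4,455,000 USD"
```

That is where the “4.4 Million” in the title comes from.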

WOW!

We put the vast majority of our stuff into the open source project (more than 80%); it’s a complete product in and of itself and that’s something I personally am proud of.  I’ve added the “OHLOH” badge for Pentaho to the upper right hand corner so there’s a ticker on this page to keep track of the breadth, size, and investment in the open source edition of Pentaho:

Incidentally, the metrics are calculated by a very cool upstart ohloh.  They slurp data from source control systems and display cool metrics about projects, like ours.  Check them out!

Open Source has a little secret: Exhibit B

UPDATE: I’ve submitted the WPL to OSI for approval. It’s a proxy for the Exhibit B licenses used by the companies listed below. We’ll find out soon enough if the OSI believes Exhibit B meets OSD.
UPDATE: OSI has refused to approve the WPL; not because it has actually been vetted to OSD but because the OSI does not want to consider a license from anyone except the original license author. That means, like now, the FOSS community won’t know if Exhibit B companies are actually releasing open source.
UPDATE: Someone told me that Mulesource is also using this. Added them to the list below.
UPDATE: It’s like a bad dream. Dimdim doesn’t care about “Open Source” either. Added them to the list below.
First of all, and no tongue in cheek, I’d like to attribute this as a follow on and continuation of the debate on attribution licenses started on “AC/OS.” In that spirit, I hope this is a “distribution” of that debate, and not a “fork.”

Second, I have nothing but respect for the principals at the companies that are using Exhibit B. I take issue with the substance of their license; nothing more. From my use/evaluation/understanding, their products are excellent. I have a sincere desire for them to be successful; I just believe they need to do it while adhering to the same principles adhered to by the majority of the open source community.

What is Exhibit B?

It’s a clause appended to the Mozilla Public License by some open source startups (listed below). In this blog we’ll consider a fictitious version of this license from “WhizbangAppCompany.”

What Exactly is Exhibit B?

It’s the second clause of a two clause addition to the Mozilla Public License that basically states:
a) You must include on each UI screen a tagline or logo reading “Powered by WHIZBANGAPPCOMPANY.”
b) You have no right to use the trademark WHIZBANGAPPCOMPANY even if it’s included in the UI.

Here’s the Exhibit B text from our WPL (this is just a copy and replace on actual Exhibit Bs), pasted here for reference:

WhizbangAppCompany Public License 1.0 – Exhibit B

Additional Terms applicable to the WhizbangAppCompany Public License.

I. Effect.

These additional terms described in this WhizbangAppCompany Public License – Additional Terms shall apply to the Covered Code under this License.

II. WhizbangAppCompany and logo.

This License does not grant any rights to use the trademarks “WhizbangAppCompany” and the “WhizbangAppCompany” logos even if such marks are included in the Original Code or Modifications.

However, in addition to the other notice obligations, all copies of the Covered Code in Executable and Source Code form distributed must, as a form of attribution of the original author, include on each user interface screen (i) the “WhizbangAppCompany Community” logo, (ii) the vendor disclaimer “Supplied free of charge with no support, no certification, no maintenance, no warranty and no indemnity by WhizbangAppCompany or its certified partners. Click here for support. And certified Versions” and (iii) the copyright notice in the same form as the latest version of the Covered Code distributed by WhizbangAppCompany at the time of distribution of such copy. In addition, the “WhizbangAppCompany Community” logo and vendor disclaimer must be visible to all users and be located at the very bottom left of each user interface screen. Notwithstanding the above, the dimensions of the “WhizbangAppCompany Community” logo must be at least 176 x 26 pixels. When users click on the “WhizbangAppCompany Community” logo it must direct them back to http://www.whizbangappcompany.com. When users click on the vendor disclaimer it must direct them to http://www.whizbangappcompany.com. In addition, the copyright notice must remain visible to all users at all times at the bottom of the user interface screen. When users click on the copyright notice, it must direct them back to http://www.whizbangappcompany.com.

What does that actually mean?

There are a lot of implications… Suffice it to say it means A LOT because it’s the difference between meeting the definition of Open Source (OSI approved) and not meeting the definition of Open Source (not OSI approved). The Exhibit B license is currently being evaluated, so whether these companies are actually releasing open source code is still in question.

Implication One: What the fork?!?

A long-term “litmus” test for user rights in open source is not being bound to one company or organization. Open Source must be able to fork, even though it’s often undesirable.

I’ll reiterate a scenario I posted earlier on how this could turn out really badly for customers “thinking” they have the benefit of open source when they implement and purchase services:

2007 – WhizbangAppCompany (company and project) flourishes. Acquires 1000 customers on the premise of Open Source.
2007 – WhizbangAppCompany (company) bought by big mean company where products go to die.
2008 – WhizbangAppCompany users and customers are unhappy. Partners, users, developers, customers are relieved they are using “open source.”
2008 – Coalition of users, customers, and developers “fork” and a new company is formed “EmailRulez”
2008 – EmailRulez screwed, customers LOCKED IN. Can’t remove references to WhizbangAppCompany (can’t remove from UI), but are threatened by large big mean company for Trademark infringement for distributing a product with WhizbangAppCompany trademark.

Probably the primary reason this doesn’t meet the open source definition is that a royalty or other fee (trademark) can be enforced against anyone who uses or distributes this product. Would these companies actually do this? Probably not, but they CAN. Exhibit B was conceived (in part) to prevent a fork; damned if you do (break license to remove trademarks), damned if you don’t (use a Trademark you don’t have a license for).

Only an attorney could put those two terms (can’t remove trademark, and you can’t use trademark) next to each other and take them seriously. No offense to attorneys who would recognize these two opposing stipulations and ring the “common sense” bell.

Implication Two: I’ve got that Exhibit B thing going around.

Exhibit B is MORE VIRAL than GPL. This has profound implications. Take for instance, a scenario, again, outlined on the original blog:

WhizbangAppCompany code is used as an integration/data transport engine in another open source project that does data profiling (data quality). WhizbangAppCompany code makes up approximately 5% of the code of that project. According to the LICENSE, intentions do NOT matter (they are tough to put into a license anyway; consider the long debate on derivative works). This project now has to SLAP WhizbangAppCompany on every UI screen. Now this data quality project must display the WhizbangAppCompany trademark and has no right to use that trademark.

Consider the implications: No matter what proportion of code you use, whether or not you even USE the project’s UI code (perhaps you used one of their libraries), you are now OBLIGED to place their “Powered by WhizbangAppCompany” on every UI screen in your application. You may not have to release your source (GPL) but now every product/project/mashup/integration/etc must have on EACH UI SCREEN the attribution.

Implication Three: Swing and a miss!

Right or wrong, this doesn’t even close the ASP Loophole.
The ASP loophole has long been discussed; smart web 2.0 and web companies use open source and benefit immensely, but don’t trigger the GPL provisions that would force them to contribute back. Ok, fine. Its ramifications are, I believe, still being determined as part of GPLv3 (fact check, can someone add clarity to this?).

Developers have long opined about how they want the Googles and Yahoos and Web 2.0 companies using and modifying their code to contribute that code back. Exhibit B forces those companies to place a trademark on their screen but STILL DOESN’T FORCE THEM TO RELEASE THE CODE. These companies are taking a dig at ASPs to get the code but don’t actually “write something” to get the code; just money for trademark licensing.

Implication Four: What happened to that “freedom” stuff?

Customers don’t have the freedom to make the code their own, even in a good old-fashioned, behind-the-firewall scenario: building an intranet and mashing up 5 open source projects to build an internal “Asset Tracking System” or a “Conference Room Scheduling” system.

Again, a scenario outlined on the original blog:

Joe just implemented the “community” version of WhizbangAppCompany. His managers invested 6 months of his time to build out this project, and he’s ready to go roll it into the corporate intranet. The corporate intranet, in which this product will be embedded, has its own UI. Joe has to remove the trademarks to “deploy” his application but…. Joe can’t deploy to his portal/intranet without getting code under a commercial license.

Exhibit B doesn’t “trigger” on some sort of distribution clause; it’s ALWAYS there. Is everyone listening? Customers: end-using, not making any money off selling any services/products, good ole fashioned support-yourself community customers, are violating the license if they do not place the Attribution on EACH UI screen in their application.

The pragmatist in me knows these companies wouldn’t enforce this; communities are their lifeblood. I’m just saying that according to the LICENSE, if Joe doesn’t put Powered by WhizbangAppCompany on every UI screen on his portal application he’s violated the LICENSE.

Common enterprise intranets, portals, and applications are aggregations of several 10s if not 100s of open source projects. XML Parsers, security implementations, regex libraries, jsp libraries, etc etc. It’s part of how we work; do something defined, do it well, and play nicely with others. Where would open source be if all these companies and individuals believed they were special enough to get attribution on the UI? This is a little dramatic but it makes the point:

(speaking of attribution, this is a Web 2.0 logos image not FOSS logos done by stablio-boss. View more of his work here)

If all those that came *before* (apache, xerces, hibernate, jboss, etc etc) these companies believed they needed UI attribution OR if OSI allows this UI attribution this screen COULD ACTUALLY BE REQUIRED. What happens when you allow people to dictate the use of this so called “free software?” You lose some of that magic “freedom ingredient,” yes?

Implication Five: Errr…. Big difference!

Calling it Mozilla causes a HUGE credibility gap. Learning the many open source licenses is tough; the reason we reuse licenses is so that we can quickly understand implications. Projects (and the companies with services supporting those projects) are often selected based on their license. GPL, Apache, BSD, LGPL, Mozilla. All known quantities, vetted by pundits, attorneys, industry. Claiming to be this when you’re not is dressing a wolf in lamb’s clothing.

Customers don’t know the actual bits arrive with Exhibit B when the advertisements at Sourceforge explicitly say Mozilla Public License 1.1 (NOTE: there is an option for “Custom License” so it’s not because they couldn’t pick another license type; they CHOSE to say they are Mozilla).

Remedy:

Work within the open source framework instead of “protecting your IP” with a cleverly disguised license. Apache, Eclipse, IBM, HP, Redhat, Oracle, JBoss, Sun have much varied stakes in Open Source; they’ve found licenses that meet the definition of open source.

Remember the vision and value you sell to customers: it’s not about the software license, the bits. It’s about the value, innovation, and service that comes from it. Protect your brand, not your IP or code. Each of these companies is a clear leader in the space they’re building; embrace that. You can be the Redhat of whatever. Redhat does just fine, even with CentOS and WhiteBox and all the other variants.

It’s not the BITs that matter. Get over it.

Users of Exhibit B companies (listed below)

  • Read the license, interpret yourself.
  • Better yet, since it’s not MPL and isn’t “well known” have your attorneys review it.
  • Ask your organization if it is willing to accept the “Powered by XYZ” on every application which uses any portion of that code (or pay $$ to remove it).
  • Ask your organization if it is willing to accept a license that is not OSI certified and consider it open source.
  • Ask the COMPANY: Why use a license that doesn’t meet the definition of OS? Why isn’t your license OSI certified?
  • Ask the COMPANY: Why not use regular MPL?
  • Ask the COMPANY: Why they think they need to change the definition of OS for their business?
  • Ask the COMPANY to use an OSI approved license.
  • Tell the OSI you are concerned about the implications of Exhibit B on open source.
  • Tell your friends… They may not know.

What’s the Net Net:

Assuming these companies drop their Exhibit B’s and become OSI certified I should say we all should applaud them for being responsible, open source community members, and valuable economic factors in our movement. Until then we should be honest and say they are not open source; community source, shared source, available source, public source, whatevernameyouwant source. Call a spade a spade, and there are definitions for that reason.

References:
OSI = Open Source Initiative http://www.opensource.org
OSD = Open Source Definition http://www.opensource.org/docs/definition.php
MPL = Mozilla Public License http://www.opensource.org/licenses/mozilla1.1.php
WPL = wget -O - http://dev.alfresco.com/legal/licensing/apl.txt | sed -e 's/[Aa]lfresco/WhizbangApplicationCompany/g' > wpl.txt

Exhibit B companies and Exhibit B’s:
Alfresco, SugarCRM, Zimbra, Jitterbit, MuleSource, DimDim

DISCLAIMER: These words are ENTIRELY my own. They in no way reflect my employer’s beliefs and should not be construed to speak for them in any way!

Passionate Career Change

One of my professional mentors, and my “boss” through my time at Matchlogic, Inc., recently took a leap from Software Development to Solar Energy.  Steve is an exceptional architect, developer, and all around skilled software engineer.  He’s built systems that are exceptionally functional and well designed.

While the Java world will mourn the loss of an exceptional technologist, the Solar Energy industry will benefit greatly from his talent.  I know that Steve will be successful in his new business; he’s smart, capable, and more than anything else he’s passionate about Solar Energy.

Check out his blog if you’re interested in Solar Energy.

Trend Lines in Mondrian

I often mouth off on the importance and power of getting your data into a star-schema and Mondrian; the power you have to respond to time variant and analytic needs of your users is immense.  In the next few weeks I’ll cover more about these powers in a more concrete form, showing specific examples instead of just alluding to them.

Let’s start with a relatively straightforward implementation of a Trend Line.  Traditionally a trend line is built using a good old-fashioned linear regression on a set of data, which is then used to calculate current and future X and Y coordinates.  This usually involves some knowledge about building the linear regression formula, and then calculating points based on it.  Fortunately for us, we can skip most of this tedious process and just use an MDX function, LinRegPoint, to sort out most of that difficulty and we’ll just enjoy a beautiful trend line on our graph.

Let’s start with the output, so it’s clear what we’re talking about:

The RED is the data set, and the BLUE is the trend line we’ve built using MDX.

The only thing you need to run through this tip is the Pentaho Demo download, available at http://www.pentaho.org/download/latest.  I used 1.2RC2 for this example, but it should work on versions more recent than that.  It’s zero installation and starts up with everything you need for this tip.

Start up pentaho (start-pentaho.bat) and hop into your web browser (http://localhost:8080).  Navigate to the “Samples” section and into “Steel Wheels.”  Steel Wheels is an example we’re shipping with the demo installation now which provides some great time variant data examples (needed to do interesting things with OLAP).  Steel Wheels data is the sample data provided by the BIRT folks at Eclipse, actually.

Navigate to the Analysis folder, and then to “01. Territory Analysis by Year.”  It doesn’t really matter which one, we just need to get into JPivot on our Steel Wheels cube. 

Click on the MDX button and paste the following MDX fragment to get a base “sales view” and hit Apply:

select {[Measures].[Sales]} ON COLUMNS,
  {[Time].[Months].Members} ON ROWS
from [SteelWheelsSales]
where [Customers].[All Customers]

You should get a result that looks like this:

Ok… Now it’s time to build our Calculated Member.  This is kind of hairy: it requires some technical prowess to get the MDX calculation correct.  Just remember, once you’ve got the calculation working properly you can include it as part of the Cube so your business users (using JPivot or Pentaho Spreadsheet Services) don’t see that complexity.

We’re going to use an MDX function named LinRegPoint (reference link). I think the best online tutorial was done by Mosha Pasumansky in his blog entry entitled “Using Linear Regression MDX functions for forecasting.”  I used his tutorial to help build the regression below!  I won’t get into the details of linear regressions; you can read the reference or do some other googling for Linear Regressions.

Basically, you rank Time to get straight numbers (X coordinates: 1,2,3,4,5), use your measure Sales as your value to regress (Y coordinates: 129754, 140836, …) and then you give it the ranked time as INPUT to your Linear Regression (which time is this) and it CALCULATES the Y output based on the Linear Regression it’s built.
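If the mechanics are fuzzy, here is a rough Python sketch of the calculation just described: fit a least-squares line through (rank, sales) pairs, then evaluate that line at a given rank. It is an illustration only, not how Mondrian implements the function; the function names and the sample sales numbers are made up, and it assumes an ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def lin_reg_point(x, xs, ys):
    """Evaluate the fitted line at x (the ranked time we are asking about)."""
    slope, intercept = fit_line(xs, ys)
    return slope * x + intercept


# Made-up sample data: ranks of four months and their sales.
ranks = [1, 2, 3, 4]
sales = [100.0, 120.0, 140.0, 160.0]
trend = [lin_reg_point(r, ranks, sales) for r in ranks]
```

Asking for a rank past the end of the data (for example lin_reg_point(5, ranks, sales)) gives the next point on the same line, which is how a trend line gets extended into the future.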

Our LinRegPoint MDX formula comes down to:

LinRegPoint(
  Rank(
     [Time].CurrentMember,
     [Time].CurrentMember.Level.Members),
  {[Time].CurrentMember.Level.Members},
   [Measures].[Sales],
   Rank(
       [Time].CurrentMember,
       [Time].CurrentMember.Level.Members)
)

Enter the following MDX Fragment into the MDX editor to see the results of the Linear regression on Steel Wheels.

with member [Measures].[Line] as
'LinRegPoint(Rank([Time].CurrentMember,
[Time].CurrentMember.Level.Members),
{[Time].CurrentMember.Level.Members}, [Measures].[Sales],
Rank([Time].CurrentMember, [Time].CurrentMember.Level.Members))'
select Crossjoin({[Markets].[All Markets]}, {[Measures].[Sales], [Measures].[Line]}) ON COLUMNS,
  {[Time].[Months].Members} ON ROWS
from [SteelWheelsSales]
where [Customers].[All Customers]

And you should see the following output:

Note: if you want to see the graph on the right, change the chart settings (icons at top of page) to be a Horizontal Line chart, Width = 300 and Height = 600.

The Steel Wheels sample only has data extending to [2005].[May].  If we had “time” members extending beyond our data set, the line would extend into the future.  Careful; a simple linear regression is not best practice for doing forecasting on MANY things.  However, business users like to see the overall trend and slope.

Was this helpful?  What would you like to see next?  Rolling Averages?  It’s VERY IMPORTANT to note that in most circumstances “MDX examples” for Microsoft Analysis Services work with Pentaho.  There’s a BUNCH of articles about MDX on MSAS… That’s a wealth of tutorials that apply to your work with Pentaho.