Every once in a while, I get to sound like a royal arse in front of a customer by saying something “I know” to be true about Pentaho that isn’t. Usually this is a REALLY good thing, because it’s some limitation or gotcha that has magically disappeared in the latest release. The danger of open source is that these things can change underneath you quickly, without any official fanfare, and leave you looking like a total dolt at a customer site. Bad for consultants like me, who constantly have to keep up with extraordinarily fast product development. Good for customers, because they get extraordinarily fast product development.
One of these experiences, which I was absolutely THRILLED to look like a dolt about, was this one:
“If you use variables for database connection information, the password will be clear text in kettle.properties.”
A huge issue for many security-conscious institutions. Customers faced a choice: use variables to centrally manage the database connection information (good thing), but then leave the password sitting in clear text (bad thing). No longer!
Our good friend Sven quietly committed this little gem nearly 18 months ago, and it’s been in the product since 3.0.2! It lets you put a variable holding an encrypted password into the password field of a database connection, and PDI decrypts it when it uses the connection.
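Conceptually, the feature comes down to an order of operations: substitute variables first, then check whether the resulting value is an encrypted password. The minimal Java sketch below shows that order. The `Encrypted ` prefix matches the format PDI writes for saved passwords. The class, its method names, and the pluggable `decoder` are hypothetical stand-ins and not PDI’s actual code; the real decoding lives in PDI’s own source.

```java
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// HYPOTHETICAL sketch: resolve ${VAR} references, then decode if the result is "Encrypted ...".
final class PasswordFieldResolver {
    static final String PREFIX = "Encrypted ";
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Step 1: replace each ${NAME} with its value; unknown names are left untouched.
    static String substitute(String field, Map<String, String> vars) {
        Matcher m = VAR.matcher(field);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Step 2: only the *resolved* value is checked for the prefix; plain values pass through.
    static String resolvePassword(String field, Map<String, String> vars,
                                  UnaryOperator<String> decoder) {
        String resolved = substitute(field, vars);
        return resolved.startsWith(PREFIX)
                ? decoder.apply(resolved.substring(PREFIX.length()))
                : resolved;
    }

    public static void main(String[] args) {
        // Toy decoder (string reversal) standing in for the real decryption routine.
        UnaryOperator<String> toyDecoder = s -> new StringBuilder(s).reverse().toString();
        Map<String, String> vars = Map.of("ENCRYPTED_PASSWORD", "Encrypted terces");
        System.out.println(resolvePassword("${ENCRYPTED_PASSWORD}", vars, toyDecoder)); // secret
    }
}
```

The point of the design is that the variable never has to hold the clear-text password; only the final, resolved password value is decoded.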
Let’s test it out. Our goal is to take a string such as “Encrypted jasiodfjasodifjaosdifjaodfj” (the encrypted form of the password), set it as a regular ole variable, and then use that variable as the “password” of a database connection.
We have one transformation that sets the variable and a second transformation that uses it.
The first one sets the variable ${ENCRYPTED_PASSWORD} from a text file. The string in that file is the encrypted password, “lifted” from a saved .ktr, where PDI stores the connection password in its encrypted form.
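As a rough illustration of what the first transformation does, here is a minimal Java sketch that reads the first line of a text file and publishes it under a variable name. The class and method names are hypothetical, and a plain `Map` stands in for PDI’s variable space. Note that the value stays in its “Encrypted …” form; nothing here ever sees the clear-text password.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL stand-in for a "read text file, then set variable" transformation.
final class SetVariablesFromFile {
    // Reads the first line of the file and stores it (trimmed) under varName.
    static Map<String, String> load(Path file, String varName) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            throw new IOException("empty file: " + file);
        }
        Map<String, String> vars = new HashMap<>();
        vars.put(varName, lines.get(0).trim());
        return vars;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("encrypted_password", ".txt");
        Files.write(file, List.of("Encrypted jasiodfjasodifjaosdifjaodfj"), StandardCharsets.UTF_8);
        // The variable holds the encrypted string, exactly as lifted from the .ktr.
        System.out.println(load(file, "ENCRYPTED_PASSWORD"));
        Files.delete(file);
    }
}
```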
The second transformation uses that variable as the connection password, queries the database, and writes the list of its tables to a text file.
Output – works like a charm!
Customers can now have the best of both worlds: centralize host/user/password in variables (including kettle.properties) and keep those passwords away from casual hackers. I say casual because PDI is open source, so anyone who wants to decrypt a password only needs to know Java and where to find the PDI SVN repository. 🙂
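To make “casual” concrete, here is a minimal Java sketch of the general idea behind this kind of protection: a reversible encoding whose key is baked into the code. It is NOT PDI’s actual algorithm. The repeating-key XOR and the `KEY` value are hypothetical choices made for illustration. The lesson carries over, though: once you can read the code, you can reverse the stored value.

```java
import java.nio.charset.StandardCharsets;

// HYPOTHETICAL obfuscation scheme (not PDI's): XOR with a key that ships in the code.
final class Obfuscation {
    private static final byte[] KEY = "not-a-real-secret".getBytes(StandardCharsets.UTF_8);
    static final String PREFIX = "Encrypted ";

    // XOR each byte with the repeating key, then hex-encode the result.
    static String obfuscate(String password) {
        byte[] in = password.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(PREFIX);
        for (int i = 0; i < in.length; i++) {
            int b = (in[i] ^ KEY[i % KEY.length]) & 0xff;
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Anyone holding the same key reverses it: hex-decode, then XOR again.
    static String reveal(String encoded) {
        String hex = encoded.substring(PREFIX.length());
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int b = Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            out[i] = (byte) (b ^ KEY[i % KEY.length]);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = obfuscate("s3cret");
        System.out.println(stored);         // looks opaque...
        System.out.println(reveal(stored)); // ...but prints s3cret
    }
}
```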
As always, example attached: encrypted_variables.zip
So, by “security conscious institution”, you meant “people clueless about how to read a lightly obfuscated field”?
OSMA – that’s precisely what I meant. 🙂 It’s meant to keep honest people honest, not to secure the system.
If you want to really secure the connections, use JNDI so that Kettle knows NOTHING about the passwords. We simply ask for a connection by name and someone else sorts out the rest. I’ve also seen one customer write custom .jars to handle password management and fetch the password in a bootstrapping xform (similar to the first xform above), so that the password is only ever held in memory at runtime.
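For illustration, here is a minimal Java sketch of the “ask for a connection by name” idea using the standard `javax.naming` API. The calling code only knows a logical name. The host, user, and password live with whichever JNDI provider is configured, for example an application server. The name `java:comp/env/jdbc/SalesDB` and the class are hypothetical. In a bare JVM with no provider configured, the lookup fails and the sketch returns an empty result.

```java
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// HYPOTHETICAL helper: the caller never sees host, user or password, only a JNDI name.
final class JndiConnections {
    static Optional<DataSource> lookup(String jndiName) {
        try {
            Object bound = new InitialContext().lookup(jndiName);
            return bound instanceof DataSource
                    ? Optional.of((DataSource) bound)
                    : Optional.empty();
        } catch (NamingException e) {
            // No JNDI provider configured, or nothing bound under that name.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<DataSource> ds = lookup("java:comp/env/jdbc/SalesDB");
        System.out.println(ds.isPresent() ? "got a DataSource" : "no DataSource bound");
    }
}
```

The design choice is that credentials are managed entirely outside the ETL code. Rotating a password then means changing the provider’s configuration, not any .ktr or kettle.properties file.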
There are ways to *actually* secure it, but the above isn’t one of ’em.