DynamoDB (aka LucidDB) is not just another column-store database. Our goal is to be the best database for actually doing Business Intelligence. That means being fast and handling large amounts of data, but there are a lot of other things BI consultants and developers need. I’ll continue to post about some of the great BI features DynamoDB offers the modern datasmith.
The first feature, and a dead easy one, is the built-in ability to generate a time dimension, including Fiscal Calendar attributes. If you’re using Mondrian (or, come to that, your own custom SQL on a star schema), you need a time dimension. Time is the most important dimension! Every OLAP model I’ve ever built uses one! It’s something that you, as a datasmith, will need to build on every project; that’s why we’ve built it right into our database.
Here’s a dead simple way to create a fully baked, ready-to-use Time Dimension for Mondrian.
-- Create a view that is our time dimension for 10 years, with our
-- Fiscal calendar starting in March (3).
-- Arguments: start year, month, day; end year, month, day; first month of the fiscal year
create view dim_time as
select * from table(applib.fiscal_time_dimension(2000, 1, 1, 2009, 12, 31, 3));
OK, that’s it. You’ve created a Time Dimension! (* See the NOTE at the end of the post.)
So we’ve created our time dimension, complete with a fiscal calendar for 10 years, in a single statement! Awesome, but what does it contain?
-- Structure of new time dimension
select "TABLE_NAME", "COLUMN_NAME", "DATATYPE"
from sys_root.dba_columns
where table_name = 'DIM_TIME';

+-------------+---------------------------------+-----------+
| TABLE_NAME  | COLUMN_NAME                     | DATATYPE  |
+-------------+---------------------------------+-----------+
| DIM_TIME    | FISCAL_YEAR_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_YEAR_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_QUARTER_NUMBER_IN_YEAR   | INTEGER   |
| DIM_TIME    | FISCAL_QUARTER_END_DATE         | DATE      |
| DIM_TIME    | FISCAL_QUARTER_START_DATE       | DATE      |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_YEAR     | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_QUARTER  | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_END_DATE           | DATE      |
| DIM_TIME    | FISCAL_MONTH_START_DATE         | DATE      |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_YEAR      | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_QUARTER   | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_MONTH     | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_WEEK_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_YEAR       | INTEGER   |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_QUARTER    | INTEGER   |
| DIM_TIME    | FISCAL_YEAR                     | INTEGER   |
| DIM_TIME    | YEAR_END_DATE                   | DATE      |
| DIM_TIME    | YEAR_START_DATE                 | DATE      |
| DIM_TIME    | QUARTER_END_DATE                | DATE      |
| DIM_TIME    | QUARTER_START_DATE              | DATE      |
| DIM_TIME    | MONTH_END_DATE                  | DATE      |
| DIM_TIME    | MONTH_START_DATE                | DATE      |
| DIM_TIME    | WEEK_END_DATE                   | DATE      |
| DIM_TIME    | WEEK_START_DATE                 | DATE      |
| DIM_TIME    | CALENDAR_QUARTER                | VARCHAR   |
| DIM_TIME    | YR                              | INTEGER   |
| DIM_TIME    | QUARTER                         | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_OVERALL            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_YEAR            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_QUARTER         | INTEGER   |
| DIM_TIME    | MONTH_NAME                      | VARCHAR   |
| DIM_TIME    | WEEK_NUMBER_OVERALL             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_YEAR             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_QUARTER          | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_MONTH            | INTEGER   |
| DIM_TIME    | DAY_FROM_JULIAN                 | INTEGER   |
| DIM_TIME    | DAY_NUMBER_OVERALL              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_YEAR              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_QUARTER           | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_MONTH             | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_WEEK              | INTEGER   |
| DIM_TIME    | WEEKEND                         | VARCHAR   |
| DIM_TIME    | DAY_OF_WEEK                     | VARCHAR   |
| DIM_TIME    | TIME_KEY                        | DATE      |
| DIM_TIME    | TIME_KEY_SEQ                    | INTEGER   |
+-------------+---------------------------------+-----------+

-- Let's look at a few rows (only the first 10 are shown below)
select time_key_seq, time_key, yr, month_number_in_year,
       fiscal_year, fiscal_month_number_in_year
from dim_time;

+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| TIME_KEY_SEQ  | TIME_KEY    | YR    | MONTH_NUMBER_IN_YEAR  | FISCAL_YEAR  | FISCAL_MONTH_NUMBER_IN_YEAR  |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| 1             | 2000-01-01  | 2000  | 1                     | 2000         | 11                           |
| 2             | 2000-01-02  | 2000  | 1                     | 2000         | 11                           |
| 3             | 2000-01-03  | 2000  | 1                     | 2000         | 11                           |
| 4             | 2000-01-04  | 2000  | 1                     | 2000         | 11                           |
| 5             | 2000-01-05  | 2000  | 1                     | 2000         | 11                           |
| 6             | 2000-01-06  | 2000  | 1                     | 2000         | 11                           |
| 7             | 2000-01-07  | 2000  | 1                     | 2000         | 11                           |
| 8             | 2000-01-08  | 2000  | 1                     | 2000         | 11                           |
| 9             | 2000-01-09  | 2000  | 1                     | 2000         | 11                           |
| 10            | 2000-01-10  | 2000  | 1                     | 2000         | 11                           |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
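The sample rows show how the fiscal columns relate to the calendar ones. With the fiscal year starting in March, January is fiscal month 11, and January 2000 falls in fiscal year 2000.

Below is a minimal Python sketch of that arithmetic. It is not the applib implementation: the function names are hypothetical, and only six of the columns are computed. It also assumes a fiscal year is labelled by the calendar year in which it ends. That assumption is consistent with the rows above, but it is not confirmed for other dates. Under it, March 2000 becomes fiscal month 1 of fiscal year 2001.

```python
from datetime import date, timedelta
from typing import Iterator


def fiscal_month_and_year(d: date, fiscal_start_month: int) -> tuple[int, int]:
    """Return (fiscal_month_number_in_year, fiscal_year) for a calendar date.

    Assumption (not confirmed by the applib docs): a fiscal year is labelled
    by the calendar year in which it ends.
    """
    # Count months from the fiscal start month, wrapping past December.
    fiscal_month = (d.month - fiscal_start_month) % 12 + 1
    # Months on or after the start month belong to the fiscal year that ends
    # in the next calendar year; a January start means fiscal == calendar year.
    if fiscal_start_month == 1 or d.month < fiscal_start_month:
        fiscal_year = d.year
    else:
        fiscal_year = d.year + 1
    return fiscal_month, fiscal_year


def time_dimension_rows(start: date, end: date, fiscal_start_month: int) -> Iterator[dict]:
    """Yield one row per day from start to end inclusive (a subset of columns)."""
    d, seq = start, 1
    while d <= end:
        fiscal_month, fiscal_year = fiscal_month_and_year(d, fiscal_start_month)
        yield {
            "time_key_seq": seq,
            "time_key": d,
            "yr": d.year,
            "month_number_in_year": d.month,
            "fiscal_year": fiscal_year,
            "fiscal_month_number_in_year": fiscal_month,
        }
        d += timedelta(days=1)
        seq += 1
```

Calling `list(time_dimension_rows(date(2000, 1, 1), date(2009, 12, 31), 3))` yields 3653 rows, one per day including the leap days in 2000, 2004 and 2008. The first ten rows match the ones shown above.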
The Time Dimension is generated using DynamoDB’s support for Java-based UDF table functions. Table functions are really powerful: they allow a BI developer to write custom functions that output a “table” that can be queried like ANY OTHER TABLE (mostly). Check out the FarragoUdx wiki page if you’re interested.
And of course: download LucidDB and give it a whirl!
NOTE: To be candid, doing it as a view isn’t the best approach. For anything beyond tiny (5 million+), we should actually create the table and do an INSERT INTO ... SELECT * FROM TABLE(applib.fiscal_time_dimension(...)).
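The NOTE's pattern is to create a real table once and fill it with INSERT INTO ... SELECT. The usual motivation is that a view re-runs the generator every time it is queried.

The sketch below illustrates that shape outside LucidDB, using Python's built-in sqlite3 as a stand-in. SQLite has no applib.fiscal_time_dimension, so a recursive CTE generates the day rows instead. Only four columns are created, and `build_dim_time` is a hypothetical helper. This is not LucidDB syntax: a real LucidDB dim_time table would need every column listed above.

```python
import sqlite3


def build_dim_time(conn: sqlite3.Connection) -> None:
    """Materialize a reduced, day-grain time dimension as a real table.

    SQLite stand-in: the recursive CTE plays the role that
    TABLE(applib.fiscal_time_dimension(...)) plays in LucidDB.
    """
    # Create the table once, rather than a view that regenerates rows per query.
    conn.execute(
        """
        CREATE TABLE dim_time (
            time_key_seq         INTEGER PRIMARY KEY,
            time_key             TEXT    NOT NULL,
            yr                   INTEGER NOT NULL,
            month_number_in_year INTEGER NOT NULL
        )
        """
    )
    # Load it with INSERT INTO ... SELECT from the generated day rows.
    conn.execute(
        """
        WITH RECURSIVE days(seq, d) AS (
            SELECT 1, DATE('2000-01-01')
            UNION ALL
            SELECT seq + 1, DATE(d, '+1 day') FROM days WHERE d < '2009-12-31'
        )
        INSERT INTO dim_time (time_key_seq, time_key, yr, month_number_in_year)
        SELECT seq, d,
               CAST(strftime('%Y', d) AS INTEGER),
               CAST(strftime('%m', d) AS INTEGER)
        FROM days
        """
    )
    conn.commit()
```

After `build_dim_time(sqlite3.connect(":memory:"))`, the rows exist once on disk (or in memory here). Fact-table joins then read stored rows instead of re-invoking the generator.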
That time dim stuff is a huge time saver…
It brings a glow to my heart
:)
Well, I’ll call you in 13,000 years, then, when I’ve reached the limit of day rows in that view/table. :)
Thomas,
Always a good laugh.
To be clear, I was talking about the volume of records in the fact table (5 million), not the dimension table; however, that isn’t what I wrote. I wrote exactly what you read!
So, I *do* expect you to give me some grief in 13,000 years! :)
Shouldn’t it be a “DATE”-dimension? I see no time in any of the columns.
Eric,
In the OLAP world, the time dimension doesn’t usually contain time-of-day elements, and we don’t usually go below day granularity. When we do, we typically have a separate dimension called “time of day” or something like it.
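To make that split concrete, consider a fact row that carries a full timestamp. It is keyed to two dimensions: one at day grain and one for time of day.

The minimal Python sketch below assumes a minute-grain time-of-day dimension keyed 0..1439. Both the grain and the key scheme are illustrative assumptions, and `split_timestamp` is a hypothetical helper.

```python
from datetime import date, datetime


def split_timestamp(ts: datetime) -> tuple[date, int]:
    """Split a fact timestamp into (date key, minute-of-day key).

    The date joins to a day-grain time dimension such as DIM_TIME.TIME_KEY.
    The minute of day (0..1439) would join to a separate, hypothetical
    time-of-day dimension. Seconds are dropped.
    """
    return ts.date(), ts.hour * 60 + ts.minute
```

For example, `split_timestamp(datetime(2009, 5, 14, 13, 45, 30))` returns `(date(2009, 5, 14), 825)`. The day-level dimension stays small, and sub-day analysis lives in its own small dimension.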
I think this is mostly because in OLAP systems we almost always talk about “show me fact (sales) OVER dimension (product).” When talking with business analysts and the like, it’s more natural to say things like “Sales over Time, Cost over Time,” which maps better to their language than “Cost over Date, Sales over Date.” This is prevalent: MSFT, in their MDX specification, define all the special “Time Dimension” features, but it’s all Months/Quarters/Years with no hours/minutes.
I’ve been asked this question many times, mostly by people who are *good* OLTP/3NF data modelers. I’m the first to acknowledge that naming it Time when there is no time element is technically less accurate. However, it gets us closer to the terms our users think in, which is a good thing.
Hi Nicholas
I’ve been playing around with LucidDB for a few days now. It is very fast, but I have run into problems creating schema XML files with the Mondrian Schema Workbench. I built the FoodMart database in LucidDB using the instructions at http://pub.eigenbase.org/wiki/LucidDbMondrianReplication.
Table columns do not appear when I use the JDBC Explorer in Schema Workbench against the FoodMart schema in LucidDB. This means I cannot properly create a schema XML file, because the system doesn’t correctly recognise the columns of the various tables in the FoodMart database in LucidDB.
I can see all the columns and data when I connect to the FoodMart schema in LucidDB with the SQuirreL SQL client.
The table columns do appear when I use the JDBC Explorer in Schema Workbench against the FoodMart schema in MySQL.
I was wondering whether this is a peculiarity of LucidDB with Mondrian Schema Workbench, and whether there is something I need to set in LucidDB to get Schema Workbench working properly.
I was also wondering whether you had recently used Schema Workbench against LucidDB, and whether you had experienced any issues.
Also, yes, I do know about the Foodmart.xml file that comes with Mondrian out of the box, but I still wanted to build another schema for use with Mondrian.
Any suggestions are greatly appreciated…
Regards
Richard
I’m wondering if I can have that time dimension localized as well, so I can pass something like de_AT for Austrian localization?
That would make it usable for me as well :)
But really handy stuff!! Congrats! It’s something so common, so why not build it into the DB? Makes absolute sense to me.