OWB 10gR2 : Real Time Data Warehousing

There’s lots of talk about real time, right time, period batch, message based in the Data Warehousing and BI circles these days. I think this is driven by quite a few reasons. Need for fresh data, need for unified reporting interfaces for users, etc. Mostly, I think it comes down to a TCO for IT assets. As the EAI/EII/ETL tools start to converge along with increased SOA-ee-ness of databases and middleware products there becomes quite a bit of overlap between the different product sets. Managing “one product” that does this data integration, calcuation, and movement between systems costs less to maintain than “multiple products.” Truthfully, I see little strategic (ie, warehouse and marts) data that needs to be computed in real time. Those cases do exist, and OWB 10gR2 has some new features for those that do have some Real Time DW/BI needs.

There are two major flavors of mappings in support of Real Time Data Warehousing in OWB:

  • PERIODIC BATCH: This is basically a batch process that runs frequently (say every minute or so) that reads data from a QUEUE or STREAM. While the data is pushed into the DW (real time), the system only processes when run (batch). These are regular mappings that use a Stream or Queue as a source instead traditional Tables/Views/etc.
  • TRICKLE FEED: This is much closer to what most people think of when we refer to real time data warehouse. Trickle feeds involve processing each individual record as it arrives, instead of waiting for them to collect. These are a special kind of OWB mapping called Real Time Mappings that run continuously and process records as they arrive.

Truthfully I’ve only kicked the tires with both of these types of mappings limitedly. I tested some of the features back in OWB Beta2 and built a conceptual mockup of how it would work for a customer of mine. What I’m presenting is a conceptual partially working mock up built using an early beta release. In other words, do not use it as reference or consider it a blueprint for how you should proceed. If there is enough interest I might submit an article to OTN on the subject. Anyone like the idea? Better yet, if you’re not one of my customers please do consider contacting me! I’d love to help build a Real Time DW solution with OWB!

OWB now includes the ability to define, deploy, and setup Streams, Queues, Queue Tables, UserDefinedTypes, and propogations within the GUI. There’s a whole set of screens that you’ll see when the community preview hits the shelves. Unlike regular OWB deployments there are some additional requirements around streams administration locations, permissions, etc, but they are easily surmountable. Also, if you’re going to be doing real time DW you need to understand a bit about the underlying technology anyhow (not tons, but enough to know why you need to have Archive Logging turned on, etc).

Refer to the following PDF for some greater details on the conceptual, but here’s a not so good screenshot:

I’ve created a mockup of a BI solution that is fed by a CRM (Customer Data Hub perhaps) and a Subscription Management Application for this example. You can see that conceptually this involves both systems sending messages either from the APPLICATION LEVEL (JMS or some other messaging technology) or the DATABASE LEVEL (with DML Stream Captures running in Oracle). In other words, we have multiple places we can get different pieces of data and the application doesn’t necessarily have to be “REAL TIME ENABLED” to send real time data. Oracle can do that on it’s behalf using the Streams technology!

Overall what this looks like is we setup the various Streams, Capture Processes (DML), Queue Tables, and Types (based on our source tables) to support our real time system. Note that the screenshot does not include the Streams on the source system or the Capture Process definitions. This only includes the DW side Streams, Queues, Dimensions, etc.

I’ve built three real time mappings (TRICKLE FEED) which in concert receive messages to add Dimension records (SCD2) and insert new Cube records (transactions). Notice this is a greatly simplified example entirely ignoring what I consider a best practice of loading into a normalized warehouse, then updating marts based on the warehouse (a la CIF methodology). Also these are all assuming to changes (ie, record corrections) just straight clean data! We should all be so lucky!

One receives updates from the CRM application and performs SCD on the appropriate Dimension objects.

The others receives event messages from a transaction based system and inserts records into Cubes.

This isn’t quite as much detail as I would like to have gone into, and I’ll quickly repeat my warning… This is just some mockups and conceptual work so don’t expect it to be accurate come OWB 10gR2 production time! I have some more thoughts on how to use this with Partition Exchange Loading to get a days “Cubes” built realtime throughout the day, and then at the end of the day move them over to the full history but that’s a whole nother article.

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

3 thoughts on “OWB 10gR2 : Real Time Data Warehousing

  1. Nicholas Goodman

    PS – The Customer Passed and didn’t go for the real time approach. Didn’t like the idea of building a real time mission critical system using a beta product! When OWB goes GA though… 🙂

    Reply
  2. Pete_S

    Just because the technology can do it does not mean we should! – one issue that sometimes gets overlooked is ‘what does my report mean?’ – if I run this report at 9:00am and again 11:00 and the figures are not the same… why?
    There is also a question as to why provide information faster than a business can react. I’m even not too sure if ‘real-time’ dashboards are that much use – if sales for a store dips at 10:00 is it beacuse of a problem with the sales strategy or the local highway is blocked by an accident. Perhaps we need to capture more raw information (weather, highway speed, sickness in the population) to enable better predicitive models ;_
    But I do have customers that are moving to realtime capture – it fits their 24×7 sales model.

    So perhaps I’m saying realtime BI needs a business culture change to use it

    Reply
  3. Nicholas Goodman

    Indeed Peter… I agree with you that there are very few cirumstances where users have valid BI needs for strategic data. While it can do this, I think the real benefit of Real Time DW lies in other areas currently.

    Area 1 : Smoooooothing that batch window by pre-processing much of the days transactions throughout the day so that come midnight you update some summaries and then exchange some partitions and you’re ready. No fighting 8am for reports to be ready.

    Area 2 : Drill to detail. The data has to get form source to warehouse for the strategic data processing. If you do it in real time you also have the atomic, current operational data there as well. Instead of purchasing and configuring some EII tool to understand data in warehouse and then drill to operational system on the other side of the planet, the DW can have both.

    Area 3 : “Killer App” I’m working on… 🙂 I’ll post some more about this later along.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *