I mentioned this previously, but I’ve been reading “Principles of Distributed Database Systems.” I’m enjoying it, and it’s helping me solidify many of the concepts I apply daily in my capacity as a Principal BI Solutions consultant. Database theory, especially as it relates to tuning, is part of any professional’s work with Oracle. We’ve all deciphered performance implications and clues for improvement from EXPLAIN PLAN. I’ve always been told you want to give Oracle the hints/configuration it needs to filter results (selection) before joining. That always made sense, but I had never fully understood the concepts behind the recommendation. Until now…
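To see why filtering before joining matters, here is a minimal back-of-the-envelope sketch. It assumes a sort-based join costing roughly n·log n operations (base-10 logs, matching the cost table below) and a hypothetical 1,000,000-row relation where a filter keeps 1% of the rows; the numbers are illustrative, not measurements from any real plan.

```python
import math

def join_cost(n):
    # Rough cost model: a sort-based join at O(n * log n), base-10 log
    return n * math.log10(n)

n = 1_000_000          # hypothetical relation size
filtered = n // 100    # suppose a selective predicate keeps 1% of rows

# Join first, filter afterwards: pay the full n*log n
cost_join_first = join_cost(n)

# Filter first (one O(n) scan), then join only the survivors
cost_filter_first = n + join_cost(filtered)

print(cost_join_first)    # 6,000,000 operations
print(cost_filter_first)  # 1,040,000 operations
```

Even with the full O(n) scan counted against the filter-first plan, it comes out several times cheaper, which is exactly the intuition behind "select before you join."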
I don’t claim to understand all of the complexity behind these issues, but I ran across a great reference chart. I wanted to post it here, along with some numbers for demonstration, as a quick reference for any professional who wants to match linear, linearithmic (n·log n), and quadratic growth rates against the operations we use on a day-to-day basis.
This is from the book mentioned above, however I’m sure it’s rather common knowledge.
| OPERATION | COMPLEXITY |
|---|---|
| SELECT | O(n) |
| PROJECT (without duplicate elimination) | O(n) |
| PROJECT (with duplicate elimination) | O(n·log n) |
| GROUP | O(n·log n) |
| JOIN | O(n·log n) |
| SEMIJOIN | O(n·log n) |
| DIVISION | O(n·log n) |
| SET OPERATORS | O(n·log n) |
| CARTESIAN PRODUCT | O(n²) |
A quick look at some numbers at these orders yields the following “costs,” in numbers of operations (logarithms are base 10).
| n | O(n) | O(n·log n) | O(n²) |
|---|---|---|---|
| 5 | 5 | 3.49 | 25 |
| 10 | 10 | 10 | 100 |
| 100 | 100 | 200 | 10,000 |
| 1,000 | 1,000 | 3,000 | 1,000,000 |
| 1,000,000 | 1,000,000 | 6,000,000 | 1×10¹² |
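If you want to play with these numbers yourself, the table can be reproduced in a few lines. This is just a sketch of the arithmetic above, with base-10 logarithms (which is what makes n·log n come out below n at n = 5):

```python
import math

def cost(n, order):
    """Rough operation count for the given complexity class (log base 10)."""
    if order == "O(n)":
        return n
    if order == "O(n*log n)":
        return n * math.log10(n)
    if order == "O(n^2)":
        return n ** 2
    raise ValueError(f"unknown order: {order}")

for n in (5, 10, 100, 1_000, 1_000_000):
    print(n, cost(n, "O(n)"),
          round(cost(n, "O(n*log n)"), 2),
          cost(n, "O(n^2)"))
```

Running this prints each row of the table, e.g. `5 5 3.49 25` and `1000000 1000000 6000000.0 1000000000000`.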
I hope this provides a useful way to match up the database operations we use daily with their theoretical costs. Cheers!