I mentioned this previously, but I’ve been reading “Principles of Distributed Database Systems.” I’m enjoying it, and it’s helping me solidify many of the concepts I apply daily in my capacity as a Principal BI Solutions consultant. Database theory, especially as it relates to tuning, is part of any professional’s work with Oracle. We’ve all deciphered performance implications and clues for improvement from EXPLAIN PLAN. I’ve always been told you want to give Oracle the hints/configuration it needs to filter results (selection) before joining. That always made sense, but I had never fully understood the concepts behind the recommendation. Until now…
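To see why filtering before joining matters, here is a minimal back-of-the-envelope sketch. It assumes a sort-based join costing roughly n·log n operations (base-10 logs, matching the cost table below) and a hypothetical 1,000,000-row relation where a filter keeps 1% of the rows; the numbers are illustrative, not measurements from any real plan.

```python
import math

def join_cost(n):
    # Rough cost model: a sort-based join at O(n * log n), base-10 log
    return n * math.log10(n)

n = 1_000_000          # hypothetical relation size
filtered = n // 100    # suppose a selective predicate keeps 1% of rows

# Join first, filter afterwards: pay the full n*log n
cost_join_first = join_cost(n)

# Filter first (one O(n) scan), then join only the survivors
cost_filter_first = n + join_cost(filtered)

print(cost_join_first)    # 6,000,000 operations
print(cost_filter_first)  # 1,040,000 operations
```

Even with the full O(n) scan counted against the filter-first plan, it comes out several times cheaper, which is exactly the intuition behind "select before you join."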
I don’t claim to understand all of the complexity behind these issues, but I ran across a great reference chart. I wanted to post it here, along with some numbers for demonstration, as a quick reference for any professional who wants to match linear, linearithmic (n·log n), and quadratic growth rates against the operations we use on a day-to-day basis.
This is from the book mentioned above, however I’m sure it’s rather common knowledge.
| OPERATION | COMPLEXITY |
|---|---|
| SELECT | O(n) |
| PROJECT (without duplicate elimination) | O(n) |
| PROJECT (with duplicate elimination) | O(n·log n) |
| GROUP | O(n·log n) |
| JOIN | O(n·log n) |
| SEMIJOIN | O(n·log n) |
| DIVISION | O(n·log n) |
| SET OPERATORS | O(n·log n) |
| CARTESIAN PRODUCT | O(n²) |
A quick look at some numbers at these orders yields the following “costs,” in numbers of operations (logarithms are base 10).
| n | O(n) | O(n·log n) | O(n²) |
|---|---|---|---|
| 5 | 5 | 3.49 | 25 |
| 10 | 10 | 10 | 100 |
| 100 | 100 | 200 | 10,000 |
| 1,000 | 1,000 | 3,000 | 1,000,000 |
| 1,000,000 | 1,000,000 | 6,000,000 | 1×10¹² |
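If you want to play with these numbers yourself, the table can be reproduced in a few lines. This is just a sketch of the arithmetic above, with base-10 logarithms (which is what makes n·log n come out below n at n = 5):

```python
import math

def cost(n, order):
    """Rough operation count for the given complexity class (log base 10)."""
    if order == "O(n)":
        return n
    if order == "O(n*log n)":
        return n * math.log10(n)
    if order == "O(n^2)":
        return n ** 2
    raise ValueError(f"unknown order: {order}")

for n in (5, 10, 100, 1_000, 1_000_000):
    print(n, cost(n, "O(n)"),
          round(cost(n, "O(n*log n)"), 2),
          cost(n, "O(n^2)"))
```

Running this prints each row of the table, e.g. `5 5 3.49 25` and `1000000 1000000 6000000.0 1000000000000`.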
I hope this provides a useful way to match up the database operations we use daily with their theoretical costs. Cheers!