owa_pattern.amatch gotcha

I’ve been using regular expressions for years. A few handy cut, sorts, egreps, and uniqs work wonders on datafiles for quick easy troubleshooting. I spent an hour or so tracking down what I think is a funny implementation of regular expression matching. Thought I would quickly post it, and hope google picks it up so that perhaps I can save some poor soul the couple hours I just lost to it.

There is limited information about owa_pattern available; one of the better resources I’ve found is this page on stormloader.com. One of the bits of advice on that page is to:

Or use the second function in owa_pattern, amatch. This function has an IN parameter that dictates where you want to start to match. For our purpose, we don’t care so set it to 0. What we want from it is the type of the return value, integer.

Examine the next two commands and their results, and perhaps you can glean what the issue is with setting the offset to 0
select DATE_UTC, owa_pattern.amatch(date_utc, 0, ‘2004’) from xxxx yields

[12/Jul/2004:00:07:27	0
[12/Jul/2004:00:07:37	0

select DATE_UTC, owa_pattern.amatch(date_utc, 9, ‘2004’) from xxxx yields

[12/Jul/2004:00:07:27	13
[12/Jul/2004:00:07:37	13

The offset, or where to start, doesn’t necessarily mean that that is where the pattern matching will begin to scan for the pattern. It most literally means that the pattern must start at that offset. I’m not a huge expert on regex, but I wonder why a function that is supposed to tell YOU where the regular expression exists in the string requires that it start on an offset specified. Anyone have a perspective or a different experience (perhaps I’m looking at this improperly)?

Goodman on BI

Thoughts on Open Source, Analytics

owa_pattern.amatch gotcha