(This post follows an earlier one on working with EQL to provide a dynamic filter.)
As described in Part 1 of this posting, implementing dynamic filters is an effective strategy for presenting metrics that reflect the most recent business time period.
One client running OEID 2.3 was particularly pleased, as the dynamic filter affected around 50 separate components and their IT support had been frustrated by the amount of effort required to update the filter each month. As the data volumes for one particular data store increased, however, we started to see a significant performance issue.
This wasn’t an issue for their other data stores, and at least to date I’ve not seen it with OEID 3.0. We also won’t see much of a performance difference with the GettingStarted data domain, given its small data volumes. I’ll step through a comparison of three approaches, and you can recreate the same tests at your site with larger data volumes.
To compare the different options, I started testing with a simple hardcoded query as follows:
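The original query isn’t reproduced in this extract. As a sketch of the hardcoded approach, an EQL query of this shape filters on a literal period value baked into the WHERE clause; the record source `SaleState` and the attributes `Sales_Amount` and `Fiscal_Period` are hypothetical names used for illustration only:

```sql
RETURN HardcodedMetric AS
SELECT SUM("Sales_Amount") AS "TotalSales"
FROM SaleState
WHERE "Fiscal_Period" = '2013-09'
GROUP
```

The literal `'2013-09'` is exactly what made this approach painful to maintain: every component carrying a filter like this had to be edited by hand each month.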
Before testing this EQL, and before every subsequent test, I made sure to flush out cached data. The performance issue occurred only sporadically, but always on the first clean-slate load of the Studio page.
Then I reset the server stats.
For each test I would test the EQL, load the metrics, and then retrieve my server stats.
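For reference, the reset-and-collect steps can be driven through the dgraph’s HTTP admin interface. The host, port, and operation names below follow the MDEX-style admin conventions and are an assumption to verify against the documentation for your own version:

```
# Hypothetical host/port; reset the server statistics before each test
http://server01:7770/admin?op=statsreset

# After testing the EQL and loading the metrics, retrieve the stats
http://server01:7770/admin?op=stats
```

Flushing cached data is a separate step from the stats reset; the statistics counters and the dgraph’s data cache are independent.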
For this first test the results were as follows; the first query is the one we’re most interested in, as it captures the activity after selecting “Test EQL”.
The hardcoded query is definitely the fastest, though, as described previously, it comes at a high price in maintenance effort and risk of error.
I’ll reference the statistics from my client site going forward, since we can’t effectively demonstrate the difference with the GettingStarted dataset. Their performance stats for a virtually identical EQL query against a much larger data volume were as follows (all numbers are in milliseconds):
Query 1 again represents the performance load from testing the EQL. This generates the largest performance hit, as the dgraph process reads all the data from disk into memory.
I updated the query to use my lookup function as follows:
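The lookup itself was covered in Part 1 and isn’t reproduced here. As a hedged sketch of the shape of this version, the current period is now derived by an expression inside the WHERE clause rather than a literal; `CurrentPeriodLookup()` below is a placeholder standing in for that Part 1 expression, and the other names remain the hypothetical ones from the earlier sketch:

```sql
RETURN LookupMetric AS
SELECT SUM("Sales_Amount") AS "TotalSales"
FROM SaleState
WHERE "Fiscal_Period" = CurrentPeriodLookup()
GROUP
```

The significant point is where the computation sits: the comparison in the WHERE clause is evaluated in the context of each record, which is consistent with the per-record behavior described below.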
After flushing the data and resetting the server stats my updated stats indicated the following:
What OES is doing under the hood isn’t clear; it feels as though the function in the WHERE clause is evaluated once per record. Whatever the reason, the performance was definitely unacceptable, particularly given that this was a single-metric query on a page containing many.
Resolving this issue was proving to be a challenge. Hardcoding the filter on each individual query was fast but impractical. Defining the filter as a refinement filter was also fast, but it impacts all components and breaks trending charts. Upgrading OES to 7.4 also had no impact.
Taking a page from the world of traditional database development, where query optimization starts with a focus on record sets, I tried replacing the WHERE clause with a JOIN.
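Again as a sketch rather than the client’s actual query, the pattern is to DEFINE a one-record statement that computes the current period once, then JOIN the metric query to it so the filter is applied as a set operation. Attribute and statement names are the same hypothetical ones as above, and the exact join syntax may vary by EQL version:

```sql
DEFINE CurrentPeriod AS
SELECT MAX("Fiscal_Period") AS "MaxPeriod"
FROM SaleState
GROUP;

RETURN JoinMetric AS
SELECT SUM("Sales_Amount") AS "TotalSales"
FROM SaleState JOIN CurrentPeriod
  ON ("Fiscal_Period" = CurrentPeriod."MaxPeriod")
GROUP
```

The aggregate in `CurrentPeriod` is computed once for the whole data set, so only the join needs to touch each record, rather than re-evaluating a lookup expression per record.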
When I refreshed the stats with this query I found the following:
The JOIN query was not as fast as the hardcoded version, but it was acceptably close and dramatically better than the WHERE-clause approach.
I’d be curious to learn whether this same behavior affects OEID 3.0. Meanwhile, whether you choose a local DEFINE statement or centralize it as a custom view through the View Manager, the performance improvement from applying a JOIN instead of a WHERE clause can be substantial.
Stay tuned for my next posting, where I’ll look at how to resolve metrics that disappear.
Update: Oracle provided the client in question with a patched version of OES 7.4 which appeared to resolve this performance issue.