Safe High Performance Sandboxes

Recently I attended a TDWI presentation to hear about some trends being seen within one of the Canadian government ministries. Largely it was a success story for a traditional BI warehouse. Data was being well governed, actively analyzed, and predictive analytics were producing results.

What was also emerging was a demand from business users for raw access to data and the freedom to perform their own research. As in many organizations, the IT department tries to satisfy this demand by granting “power users” the tools for ad hoc query access to the corporate data marts.

The presenter defined this area of BI as “high performance sandboxes”, and asked the question: “Is their architecture really optimized to support that need?”

Though every BI team would love to have completely satisfied customers, they are mandated to respect three critical responsibilities:

#1 Data Security & Privacy. The vault needs to have locks on it.

#2 Server Stability & Performance. The infrastructure needs to operate reliably for all users.

#3 Data Accuracy. The various reports, queries, and data models need to be correct.

Enforcing these responsibilities takes time. And data is arriving on user desktops in more formats and from a wider range of sources than ever before. So we see increasingly frustrated business users, slow moving analytics, and a popular focus on “self-serve” and “empowered users”.

The typical result is of course to use Excel. The tool is already on our desktops and lets us load and analyze data without having to wait for permission. The proverbial bird in the hand. This is what “self-serve” and “empowering” are supposed to deliver.

What Excel and other desktop analytical tools don’t do so well is meet those three critical responsibilities; they largely skip them. Spreadsheets are rogue outputs, with no visibility into how they’re shared and no assurance of accuracy. And our mailboxes and network folders are bursting at the seams with spreadsheets. I love Excel, but here’s an excellent article about Excel (https://davidmichaelross.com/blog/microsoft-excel-is-everywhere).

So let’s add some business challenges to this demand for “high performance sandboxes”.

#4 Faster Arriving Data. Whether it is extracted, user generated, shared by colleagues, or acquired independently, data is arriving faster than IT shops can respond.

#5 Data of All Formats. Data is not only increasingly unstructured; data in different formats also needs to be analyzed together.

#6 Data Collaboration. Whatever conclusions are reached, they need to engage a wider audience.

 

Oracle EID’s optional add-on “Provisioning Services” provides an alternative that can deliver that “high performance sandbox” while helping to meet both IT’s responsibilities and the business challenges of a sandbox.

 

Data Security & Privacy

When you load data through Provisioning Services, a data domain is automatically created on a centrally managed and secured Oracle Endeca Server. Access is only through EID Studio, where robust security features like SSL and LDAP can be applied. Rather than emailing copies of spreadsheets or saving them to crowded network folders, users can email a simple hyperlink to a Studio web page. This data is a corporate asset, and it is critical that it stays where IT can ensure only authorized users can access it.

Server Stability & Performance

That same centrally managed Oracle Endeca Server is a machine that IT can manage and control. IT staff can monitor and optimize performance to ensure all IT services perform reliably and optimally with each other. They can also automate backups to ensure business continuity of anything business users create. One user can’t run an ad hoc query that takes down a server, or produce a valuable report that gets accidentally deleted from their desktop.

Data Accuracy

With the data and analytics hosted in a centrally managed environment, IT staff have the option to support users with what they’ve created. Business decisions can be made off these analytics, and IT may need to be aware of them without necessarily being a bottleneck in their creation. And if or when a correction needs to be made, it’s done once, and we avoid having multiple versions to maintain or deal with.

Faster Arriving Data

Provisioning Services puts the process of ETL directly in the hands of the users. It doesn’t offer the range of capabilities available in formal ETL tools, but it still offers very easy to use options for cleansing, filtering & structuring the data being loaded. It is certainly more robust than loading data into Excel, but shares the same advantage of avoiding any delay waiting on IT processes.

Data of All Formats

Data is changing from controlled structured data to exploding volumes of unstructured data. Tools like Excel are very powerful for analytics, but mostly for crunching numbers. Oracle EID includes the standard charting & data visualizations, but also offers a more distinctive set of features for robust searching, language support, and even some text enrichment options. Furthermore, multiple sets of data from different systems can be explored and analyzed together.

Data Collaboration

The last challenge we face is how we share our research and conclusions with others. Generally we send people copies of spreadsheets with a sequence of steps or explanations of what they’re looking at. Sharing a spreadsheet collaboratively requires another tool like GoToMeeting or Lync. EID Studio is a centralized web service, so the audience is always working with the same version of data. The Studio features are intuitive enough for users of all technical levels to explore a question directly on their own. With the convenience of bookmarks, portlets that can contain free-form text (think discussions!), and isolated applications, users can easily work in the same sandbox.

 

There is a point where I would expect a user generated sandbox to be migrated to a production state or to be retired. A “sandbox” is just an area to play in. And that production state could very well be better served by tools outside of Oracle EID. The question is not what the best BI tool for analytics is. The question is how the IT architecture can safely support the demand for “high performance sandboxes”.

To that question I would suggest Provisioning Services can give your users a powerful tool that supports not only IT’s responsibilities but also the state of data as it evolves today. The nature of data and BI is evolving, and so too must our set of tools.
