Predictive Analytics with Endeca

The opportunities and challenges in delivering real-world Predictive Analytics are exciting.  These are not trivial efforts, and they require a level of collaboration between Business and IT that can be rare.  However, when the stars align, the forecasts they produce can be game changers.  And a BI solution that doesn’t change the game for its business is arguably a waste of time.

Oracle Endeca Information Discovery (OEID) is not really a Predictive Analytics tool.  The Text Mining through Lexalytics provides one powerful data mining model, but that’s the only one.  Plus, it’s part of the data ingest and upstream from Studio.  We’ve interfaced with R from Integrator enough to know that, during the ETL stage, just about any external data mining model is effectively available.  Some level of classification and association might be suggested through Studio’s data exploration, but I’d argue this produces business questions and is a far cry from what the complex algorithms behind proper cluster analysis and classification trees produce.  OEID is first and foremost a tool for Data Discovery.

So what good is OEID when your company wants to add Predictive Analytics?

There are three reasons, starting with “Big Data”.  If you crunch through some of the facts gathered by Marcia Conner, you can see that data volumes are absolutely enormous and continuing to grow.  Data mining structured and massaged data isn’t hard, but there is a growing gold mine of unstructured and social media data that is ripe for analysis.  Collecting all this data together requires robust and capable ETL processes that can handle data sets that are text rich and constantly changing.  OEID provides the Integrator Acquisition System (IAS), the text enrichment components, and an inherent flexibility in the engine to support multi-assign attributes and ragged-width records without a long refactoring effort.  Combined, these deliver an ideal toolset to bring together data from multiple sources and formats.
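To make “multi-assign attributes and ragged-width records” a little more concrete, here is a minimal Python sketch of the idea.  It is not OEID’s API or storage format; the function names, attribute names, and sample values are all made up for illustration.  The point is only that records need not share one fixed set of columns, and a single attribute can carry several values.

```python
from collections import defaultdict


def to_record(pairs):
    """Build one record from (attribute, value) pairs.

    A repeated attribute keeps every distinct value (multi-assign).
    Hypothetical helper for illustration, not an OEID call.
    """
    record = defaultdict(list)
    for attribute, value in pairs:
        if value not in record[attribute]:
            record[attribute].append(value)
    return dict(record)


def attributes_in_use(records):
    """Return the sorted union of attribute names across all records."""
    return sorted({attribute for record in records for attribute in record})


# Two records from different sources with different "widths" (made-up data).
records = [
    to_record([("id", "order-1"), ("region", "East"), ("amount", 120.0)]),
    to_record([("id", "tweet-7"), ("text", "Love the new release"),
               ("theme", "release"), ("theme", "praise")]),
]

for record in records:
    print(record)
print(attributes_in_use(records))
```

Adding a new source with new attributes here means adding more pairs, not redesigning a table, which is the flexibility the paragraph above is describing.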

The second consideration is the sheer effort involved in bridging the gap between business and IT.  Data Scientists are expected to know all the data mining models, but, like the rest of IT, they have no secret insight that lets them automatically understand all the business nuances.  And the data mining tools they use (e.g. R) do not always offer intuitive interfaces for business users to pick up.  OEID Studio is a visual tool.  The intuitive and friendly interface makes it ideal to search, explore, and even extract data of interest.  That exploration can be used to feed data mining models, create training sample sets, and of course examine the output the models generate.  As a tool for communicating and collaborating, Studio can be ideal for reducing the gap between IT and business and for helping keep any data mining effort relevant and focused.

The final major consideration is the growing demand for tools that deliver interactive and intuitive visualizations and simulations.  At heart I absolutely believe the rule that good data coupled with simple visualizations is best, and out of the box OEID offers the essential histograms, line graphs, scatter plots, and of course the ever popular pie chart.  With features like guided navigation, breadcrumbs, and the search interface, OEID can provide a tool to explore your data and any generated data mining models.  For those analysts who trust the pivot table and ad hoc queries, the alerts, results table, and metrics components are easily accessible.  For anything more, the framework Studio is built upon (Liferay) is open to customizations, improved visualizations, and even methods that can support simulations and model output comparisons.  OEID is an ideal and flexible tool for business users to interact with the output of data mining models.

Business users aren’t going to be able to use OEID to generate predictions.  They aren’t going to write EQL queries that perform regression analysis.  They won’t find a magic button to perform algorithmic analysis and produce coefficients, probabilities, margins of error, and all the other statistical outputs proper data mining models produce.

They will, however, have a tool that can handle all the varieties and volumes of real-time data to feed into the data mining exercise.  They’ll have an architecture that can interface with data mining engines to ingest the results of a model.  They’ll have a powerful text mining engine in a data world that is increasingly unstructured text.  And they’ll have an interactive and visual tool that lets them actively participate in preparing, tuning, and applying the fruits of Predictive Analytics.
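As a rough illustration of what “ingest the results of a model” can look like, here is a small Python sketch under stated assumptions.  The least-squares line stands in for whatever an external engine such as R would actually produce, and the records, attribute names, and figures are invented.  Nothing here is Integrator’s or Studio’s API; it only shows a model being fitted outside the discovery tool and its predictions being attached to records as one more attribute before they are loaded.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score_records(records, slope, intercept, feature, target):
    """Return copies of the records with the prediction added under `target`."""
    scored = []
    for record in records:
        enriched = dict(record)  # leave the source record untouched
        enriched[target] = slope * record[feature] + intercept
        scored.append(enriched)
    return scored


# Made-up history: (month, sales).  A real model would come from the data scientist.
history = [(1, 12.0), (2, 14.0), (3, 16.0), (4, 18.0)]
slope, intercept = fit_line([m for m, _ in history], [s for _, s in history])

# Records about to be loaded; "predicted_sales" is a hypothetical attribute name.
upcoming = [{"id": "m5", "month": 5}, {"id": "m6", "month": 6}]
scored = score_records(upcoming, slope, intercept, "month", "predicted_sales")

for record in scored:
    print(record)
```

Once the prediction is just another attribute on the record, business users can filter, chart, and compare it like any other field, which is where a discovery tool earns its keep.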

The only thing missing is the question:

What do you want to predict for your business?
