Attribute Groups with Oracle Endeca Information Discovery

Exploring unstructured data is a very powerful option, but nine times out of ten the casual business user soon wants some structure added.  Ordering our world is just human nature.  As you work with your client, you may find yourselves asking two questions:

  1. What is a friendlier way to display the names of all those attributes?
  2. How can I highlight my most important attributes?

Oracle Endeca Information Discovery Studio offers an easy interface to answer both of those questions, specifically the “Attribute Settings” in the Control Panel.  Here you can directly adjust the display name, create groups to sort the attributes into, and control how values are sorted (lexically or by record count) and how they can be selected and applied through the Guided Nav component (single, multiple AND, or multiple OR).  These are the properties you can edit:

  • mdex-property_DisplayName
  • system-navigation_Sorting (options:  lexical, record-count)
  • system-navigation_Select (options:  single, multi-or, multi-and)

There are, however, a few shortcomings with this interface.  First and most notably, the ability to update those properties through Studio disappears with OEID 3.0.

Second, you don’t have access to all properties.  Specifically, these additional properties for (standard) attributes are not exposed:

  • mdex-property_Type (default: string)
  • mdex-property_IsPropertyValueSearchable  (default: true)
  • mdex-property_IsSingleAssign (default: false)
  • mdex-property_IsTextSearchable (default: false)
  • mdex-property_IsUnique (default: false)
  • mdex-property_TextSearchAllowsWildcards (default: false)
  • system-navigation_ShowRecordCounts (default: true)

Third, the changes are not permanent.  They update the MDEX engine, but the next time you reset that data store they’re gone.  To make sure that doesn’t become an issue for you, here are the steps to make those same changes through the ingestion process.

1.   First work with your client to determine how you want each attribute to be exposed and to behave.  This is a pretty lightweight process, and business-friendly semantics provide a high-value abstraction over the underlying record structure.

2.   Create a CSV file that contains, for each attribute, the desired display name and attribute grouping.  For this article I’ll also be adjusting how they are sorted and selected, and whether they’re text searchable.  My file is named “attribute_pdr.csv” and looks like this:

[Image: contents of attribute_pdr.csv]

a.      The header row can name the values any way you prefer.  I used these names so they would be slightly more intuitive in case my client maintains this file themselves.
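As a rough illustration of the file layout, here is a small Python sketch that holds a sample file in memory and checks each row against the option values listed earlier.  The column names match the fields the CTL2 code in this article reads (AttrName, DisplayName, AttrGroupName, SortOrder, Selection, TextSearch, DataType).  The sample rows, the DataType values and the validate_rows helper are my own assumptions for illustration, not part of OEID.

```python
import csv
import io

# Hypothetical sample of attribute_pdr.csv. Header names follow the fields
# read by the CTL2 code in this article; the rows themselves are made up.
# The DataType values are an assumption: check them against your OEID version.
SAMPLE_CSV = """\
AttrName,DisplayName,AttrGroupName,SortOrder,Selection,TextSearch,DataType
cust_name,Customer Name,Customer Details,lexical,single,true,mdex:string
cust_region,Region,Customer Details,record-count,multi-or,false,mdex:string
load_id,Load ID,,lexical,single,false,mdex:string
"""

# Option values as listed for the editable properties earlier in the article.
ALLOWED = {
    "SortOrder": {"lexical", "record-count"},
    "Selection": {"single", "multi-or", "multi-and"},
    "TextSearch": {"true", "false"},
}


def validate_rows(text):
    """Return (line_number, message) pairs for rows with unexpected values."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row.get("AttrName") or "").strip():
            problems.append((line_no, "AttrName is empty"))
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                problems.append(
                    (line_no, "%s has unexpected value %r" % (column, value))
                )
    return problems


if __name__ == "__main__":
    for line_no, message in validate_rows(SAMPLE_CSV):
        print("line %d: %s" % (line_no, message))
```

Running a check like this before the graph is optional, but it catches a mistyped option before it reaches the web service.  Note the third sample row leaves AttrGroupName empty, which is how I mark an attribute that should stay in the default “Other” group.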

3.   Create a new graph, add a “Universal Data Reader” and set your CSV file as the “File URL”.

4.   Extract the metadata from that file.  Reparse it to extract the names from the first row.  We’ll be using this same file to create the Attribute Groups as well as define the attribute properties.

[Image: ag_2]

5.   Some of our input attributes are going to stay in the default “Other” group.  We’ll add an “ExtFilter” component to remove them before we pass that record set any further.  The filter could trap other conditions too, but we’re keeping it simple.

[Image: ag_3]

6.   Add an “ExtSort” component to sort the data by AttrGroupName first, AttrName second (optional).  Edit your “Sort Key” as follows:

[Image: ag_4]
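For anyone who wants to see the effect of steps 5 and 6 outside the graph, this Python sketch does the same two things to a list of rows.  It assumes that attributes meant to stay in “Other” simply have an empty AttrGroupName; the rows_for_grouping helper and the sample rows are hypothetical.

```python
def rows_for_grouping(rows):
    """Drop rows without a group name, then sort by group and attribute name."""
    # Equivalent of the ExtFilter step: keep only rows assigned to a group.
    kept = [row for row in rows if row["AttrGroupName"].strip()]
    # Equivalent of the ExtSort step: AttrGroupName first, AttrName second.
    return sorted(kept, key=lambda row: (row["AttrGroupName"], row["AttrName"]))


if __name__ == "__main__":
    sample = [
        {"AttrName": "order_total", "AttrGroupName": "Order Details"},
        {"AttrName": "load_id", "AttrGroupName": ""},
        {"AttrName": "cust_region", "AttrGroupName": "Customer Details"},
        {"AttrName": "cust_name", "AttrGroupName": "Customer Details"},
    ]
    for row in rows_for_grouping(sample):
        print(row["AttrGroupName"], row["AttrName"])
```

The sort matters because the next component builds one output record per run of identical group names, so rows of the same group have to arrive together.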

7.   Add a “Denormalizer” component, and set the “Key” property to the AttrGroupName attribute.  In the “Denormalize” property, paste the following CTL2 code over the default append() and transform() functions.  This will create the XML strings you need to create and populate each of your attribute groups.

//#CTL2
// This transformation defines the way in which multiple input 
// records (with the same key) are denormalized into one output 
// record.

// global variables
integer n = 0;
boolean newGroup    = true;
string  xmlString   = "";

// This function is called for each input record from a group 
// of records with the same key.
function integer append() {

  // increment our record iterator
  n++;

  // check if we have a new group
  if (newGroup) {

    // add the header for a new xml output record
    xmlString = "<mdex:group key='" + replace($in.0.AttrGroupName, " ", "_") + "' displayName='" + $in.0.AttrGroupName + "'>";

    // set the flag to false
    newGroup = false;
  }

  // add the attribute to the current group we're iterating
  xmlString = xmlString + "<mdex-property_Key>" + $in.0.AttrName + "</mdex-property_Key>";

  return n;

}

// This function is called once after the append() function 
// was called for all records of a group of input records 
// defined by the key.
// It creates a single output record for the whole group.
function integer transform() {

  // update our current group with the xml ending tag
  $out.0.xmlString = xmlString + "</mdex:group>";

  // reset our global variables
  xmlString = "";
  newGroup = true;

  return OK;
}
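To make the output of the Denormalizer easier to picture, here is a Python sketch of the same grouping logic.  It is not what runs in the graph; the build_group_xml helper and the sample rows are hypothetical.  Unlike the CTL2 above, it also escapes XML special characters in the names, which I’d suggest considering if your group or attribute names could contain characters such as & or an apostrophe.  It emits double-quoted XML attributes where the CTL2 uses single quotes, which is equivalent XML.

```python
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr


def build_group_xml(rows):
    """Build one <mdex:group> string per group.

    rows: iterable of (group_name, attr_name) tuples, already sorted by
    group_name, just as the Denormalizer expects sorted input.
    """
    result = []
    for group_name, members in groupby(rows, key=lambda row: row[0]):
        # Same key rule as the CTL2 code: spaces become underscores.
        key = group_name.replace(" ", "_")
        parts = [
            "<mdex:group key=%s displayName=%s>"
            % (quoteattr(key), quoteattr(group_name))
        ]
        for _, attr_name in members:
            parts.append(
                "<mdex-property_Key>%s</mdex-property_Key>" % escape(attr_name)
            )
        parts.append("</mdex:group>")
        result.append("".join(parts))
    return result


if __name__ == "__main__":
    sample = [
        ("Customer Details", "cust_name"),
        ("Customer Details", "cust_region"),
        ("Order Details", "order_total"),
    ]
    for xml_string in build_group_xml(sample):
        print(xml_string)
```

Each printed line corresponds to one output record of the Denormalizer, with the group element opened once and one mdex-property_Key element per member attribute.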

8.   The last step in creating the groups is to add a “WebServiceClient” component.  The recommended practice is to define workspace parameters to track the values for each graph in the project.  Underscores separating the words would be better; I removed them here simply so the underlined URLs don’t make them appear to have spaces.  These are the ones we’ll use:

  • MDEXHOST – localhost
  • MDEXPORT – 7770
  • DATASTORE – RY_DEMO   (Replace “RY_DEMO” with your Data Store Name)

9.   First verify the WSDL service is running by requesting this URL:

http://${MDEXHOST}:${MDEXPORT}/ws/config/${DATASTORE}?wsdl

Then set that URL as the “WSDL URL” property.

a.      Identifying your server host, port and target data store name as parameters, as in the URL above, is optional; literal values work as well.

OEID 2.2.2:

 If you’re running OEID 2.2.2, the port in the above URL should be the port for your target data store, and the data store name is not part of the URL:

http://${MDEXHOST}:${DATASTOREPORT}/ws/config?wsdl

OEID 3.0:

 If you’re running OEID 3.0, the URL also needs to reference the “Endeca Server Context”, which in a default scenario will be the folder “endeca-server”.  I’ve also renamed the data store parameter to reflect its new title, “Data Domain”.  The final URL would look like this:

http://${MDEXHOST}:${MDEXPORT}/${ENDECASERVERCONTEXT}/ws/config/${DATADOMAIN}?wsdl
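The three URL shapes can be summarized in a small Python helper.  This is only a sketch: the wsdl_url function and its version labels are hypothetical, and the port numbers in the usage lines are examples rather than product defaults.

```python
def wsdl_url(host, port, data_store=None, version="default",
             context="endeca-server"):
    """Return the config service WSDL URL for the given OEID version.

    version: "2.2.2", "3.0", or "default" for the form shown in step 9.
    For 2.2.2 the port is the data store's own port and no name is needed.
    For 3.0 data_store is the data domain name.
    """
    if version == "2.2.2":
        return "http://%s:%s/ws/config?wsdl" % (host, port)
    if data_store is None:
        raise ValueError("data_store is required for this version")
    if version == "3.0":
        return "http://%s:%s/%s/ws/config/%s?wsdl" % (
            host, port, context, data_store)
    return "http://%s:%s/ws/config/%s?wsdl" % (host, port, data_store)


if __name__ == "__main__":
    print(wsdl_url("localhost", 7770, data_store="RY_DEMO"))
    print(wsdl_url("localhost", 5555, version="2.2.2"))
    print(wsdl_url("localhost", 7770, data_store="RY_DEMO", version="3.0"))
```

Pasting the printed URL into a browser is a quick way to confirm the service answers before you wire it into the “WebServiceClient”.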

10.   Set the “Operation name” property to DoConfigTransaction.

[Image: ag_5]

11.   Edit your “Request structure” as follows, and be sure the config-service operation is identified as putGroups.  This will update or add groups based on your input file; it won’t remove any groups created separately.

[Image: ag_6]

OEID 2.2.2:

 If you’re running OEID 2.2.2 the xmlns:config-service address will be:

http://www.endeca.com/MDEX/config/services/types

OEID 3.0:

If you’re running OEID 3.0 the xmlns:config-service address will be:

http://www.endeca.com/MDEX/config/services/2/0

12.   Check the OEID Control Panel.  By default you should see something like this:

[Image: ag_7]

13.   Attribute Groups can only be defined when the attributes already exist in the Data Store.  To avoid this being an issue, set the sequence on the components that create the attribute groups to 1, so they run after the attributes have been defined.

14.   The next thing we want to do is set the individual attribute properties.  Specifically, we’re going to change the display name, the sorting, the selection, and which attributes will be text searchable.  Instead of creating a duplicate of the first “UniversalDataReader”, add a “SimpleCopy” component.  This doesn’t need any configuration; just drag and link the output ports around so the “UniversalDataReader” outputs to the “SimpleCopy”, and the “SimpleCopy” outputs back to the original “ExtFilter” component.

15.   Next add a “Reformat” component, and drag a second output from the “SimpleCopy” component to it.  Your graph should now look something like this:

[Image: ag_8]

16.   The steps for the “Transform” property (“Reformat” component) are well documented (IntegratorComponentsGuide.pdf).  I’ve modified them only slightly as follows (source tab):

//#CTL2
integer n = 1;
integer aggrKey = 0;

// Transforms input record into output record.
function integer transform() {   
  string attrRecord = "<mdex:record xmlns=\"\">";

  attrRecord = attrRecord + "<mdex-property_Key>" + $in.0.AttrName + "</mdex-property_Key>";
  attrRecord = attrRecord + "<mdex-property_DisplayName>" + $in.0.DisplayName + "</mdex-property_DisplayName>";
  attrRecord = attrRecord + "<mdex-property_IsTextSearchable>" + $in.0.TextSearch + "</mdex-property_IsTextSearchable>";
  attrRecord = attrRecord + "<system-navigation_Sorting>" + $in.0.SortOrder + "</system-navigation_Sorting>";
  attrRecord = attrRecord + "<system-navigation_Select>" + $in.0.Selection + "</system-navigation_Select>";
  attrRecord = attrRecord + "<mdex-property_Type>" + $in.0.DataType + "</mdex-property_Type>";

  $out.0.xmlString = attrRecord + "</mdex:record>";

  // Batch up the web service requests.
  $out.0.singleAggregationKey = aggrKey;
  n++;
  if (n % 15 == 0) {
    aggrKey++;
  }

  return ALL;
}
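The batching counter is the one part of this code that is easy to misread, so here is a Python sketch that replays it.  The assign_batch_keys helper is hypothetical and only mirrors the n and aggrKey arithmetic above; I’m assuming records that share a singleAggregationKey value are sent in one web service request.

```python
from collections import Counter


def assign_batch_keys(record_count, batch_size=15):
    """Replay the n / aggrKey logic of the Reformat transform().

    Returns the aggregation key assigned to each record, in input order.
    """
    n = 1
    aggr_key = 0
    keys = []
    for _ in range(record_count):
        keys.append(aggr_key)      # $out.0.singleAggregationKey = aggrKey
        n += 1                     # n++
        if n % batch_size == 0:    # if (n % 15 == 0) aggrKey++
            aggr_key += 1
    return keys


if __name__ == "__main__":
    sizes = Counter(assign_batch_keys(40))
    for key in sorted(sizes):
        print("batch %d: %d records" % (key, sizes[key]))
```

Because n starts at 1 and is incremented before the modulo check, the first batch holds 14 records and every later full batch holds 15.  That is harmless here, but worth knowing if you change the batch size and count requests.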

17.   As before, we’ll add a “WebServiceClient” to make the push to the MDEX engine.  Add that component, link the output from the “Reformat” component to it, and specify user-defined metadata as follows:

[Image: ag_9]

18.   Configure the “WebServiceClient” URL and operation name as before.  This time the “Request structure” should specify updateProperties instead of putGroups.

[Image: ag_10]

19.   As identified above, the address of xmlns:config-service will need to reflect your version of OEID.

20.   Run the graph, and you should now see your attribute properties appearing as follows:

[Image: ag_11]

21.   The Attribute Groups should now display through the OEID Control Panel.  Note how the default “Other” group remains with just one attribute.

And there you have it.  These can be added to your solution to ensure any customizations on attribute display and behavior are never lost.

There are many alternative ways in which those groups and properties could have been set that may be simpler or more robust.  I welcome any comments or suggestions for improvement.  Meanwhile I hope this will help you customize your implementation of OEID to make the exploration of data a friendlier activity for your business.
