Multi-Assign Attributes – What they are and how they’re loaded

Oracle Endeca Server can support “Multi-Assign Attributes”. This means records can have one or more values assigned to the same attribute. All data domain records are stored as name value pairs, so in effect you end up with a record structure like this:

Movie_Title = True Romance
Actor_Name = Christian Slater
Actor_Name = Dennis Hopper
Actor_Name = Val Kilmer
Actor_Name = Brad Pitt

Just to name a few, that movie is well cast.

The power is that not only are all those values distinct and searchable, but since Endeca does not require the data schema to be modeled and formalized in advance one movie record to the next can dynamically have any number of additional values.

If we tried to represent this in the perspective of a traditional table it might look like so:

Movie_Title
Actor_Name_1
ActorName_2
Actor_Name_3
Actor_Name_4
True Romance
Christian Slater
Dennis Hopper
Val Kilmer
Brad Pitt
Open Water
Blanchard Ryan
Daniel Travis
NULL
NULL

Technically we’ve broken Codd’s first rule of Normalization, but that’s alright, Oracle Endeca Server is designed to support these sorts of ragged width records. The actual data domain record won’t even store NULL values for the actors that don’t exist for the Open Water film. For the record Open Water did have more than just those two their cast.

The records are actually stored in the data domain like this, note they vary in the number of columns and the attribute keys share the same name:

Movie_Title
Actor_Name
ActorName
Actor_Name
Actor_Name
True Romance
Christian Slater
Dennis Hopper
Val Kilmer
Brad Pitt
Open Water
Blanchard Ryan
Daniel Travis

The only bit of advance modeling we needed to do was updating the PDR record so the isSingleAssign property (<mdex-property_IsSingleAssign>) for Actor_Name is set to false. The default is true so we can leave it as is for Movie_Title. No other data modeling work was required.

The convenience of records being dynamically extended for multiple attribute values is awesome. This is a critical differentiator for Oracle Endeca Information Discovery, one I can’t really highlight the value enough.

 

So how do these records get ingested?

Oracle’s IntegratorETL is based on an open source ETL tool called CloverETL to which custom (Discovery) components have been added to interface with an Oracle Endeca Server. Since IntegratorETL is otherwise a “standard” ETL tool, it however still expects data to be in a traditional format. This means rows & columns since IntegratorETL doesn’t have the same native understanding of “multi assign” that OEID has.

Your data may arrive in a format that concatenates together the multi assign values to the single attribute. This is definitely the simplest structure to ingest. The value is already in an OES-friendly format.

Movie_Title, Actor_Name
True Romance, “Christian Slater, Dennis Hopper, Val Kilmer, Brad Pitt”
Open Water, “Blanchard Ryan, Daniel Travis”

In a case like this you wouldn’t need to do anything more than replace that comma with the character being used as the delimiter. For simplicity I’m just using the pipe character as my delimiter.

There are a few components that offer a place to add your transformation, the Reformat component works fine.  Simply update the mapping with the CTL2 replace function.

$out.0.Actor_Name = replace($in.0.Actor_Name, ",", "|");

Your output will look like this:

ma_03

The Discovery components such as Bulk Add/Replace Records include a property called “multi assign delimiter”, which specifies how a single attribute value will hold multiple values. Make sure this value matches your delimiter character, pipe or whatever.

ma_01

 

More commonly you’ll find yourself with a normalized data source with multiple separate records. Either in your initial source file or possibly as the result of a join during your ETL process. Like so:

Movie_Title, Actor_Name
True Romance, Christian Slater
True Romance, Dennis Hopper
True Romance, Val Kilmer
True Romance, Brad Pitt

You could pass those records directly through the Add Key Value Pair component (Add KVPs).

ma_02

The downside is you may dramatically increase the number of records passing through the data stream. The same single assign values may add some pointless overhead. You may have other reasons to want to track your record counts so that 10 records passing along the edge actually mean 10 individual records.

Whatever your reason here’s another option to transform those records into a multi-assign delimited format.

The first step is to add the FastSort or ExtSort components. Identify your key column as the sort column, in this example that would be Movie_Title.

Next add a Denormalizer component. You’ll use this component to create a single delimited value and a smaller output record set.

Once again you’ll identify your key column for Key property. Select the transform property and paste the following code:

//#CTL2

//global variables
integer n = 0;
string tmpMovie_Title = "";
string tmpActor_Name = "";

function integer append() {

//increment our record structure
n++;

tmpMovie_Title = $in.0.Movie_Title;

// assume we'll be adding multiple values
// therefore always add a delimiter
tmpActor_Name = tmpActor_Name + $in.0.Actor_Name + "|";

return n;
}

function integer transform() {

$out.0.Movie_Title = tmpMovie_Title;

// remove our trailing delimiter
$out.0.Actor_Name = left(tmpActor_Name, length(tmpActor_Name) - 1);

// reset our global variable
tmpActor_Name = "";

return OK;
}

Since this example sorted the records our output record format will look like so this time:

ma_04

These records can now be passed directly through the Bulk Add/Replace Record component. As before simply make sure the multi-assign delimiter property contains the matching character.

Your basic graph would look like this:

ma_05

And that’s all you need to do to reformat values for multi-assign attributes. Good luck!

And be sure to check out both those movies!

Advertisements
This entry was posted in Endeca and tagged , , , , , , , , , , , , . Bookmark the permalink.

One Response to Multi-Assign Attributes – What they are and how they’re loaded

  1. Anshul Sahu says:

    Great explanation ! Thanks !

Comments are closed.