Expedia Customer Support is garbage.

If you find a good deal through Expedia that’s cool. But be warned, if you have any issue that requires their support you may find yourself very disappointed.

I made the mistake of buying a couple of plane tickets that I realized needed to be changed. I contacted the airline, and they had a policy that allowed tickets to be fully refunded within 24 hours; they were also happy to make the requested changes.

All good. However, they could not make the changes, because the tickets had been purchased through Expedia, which was acting as my travel agent, so they directed me to speak with Expedia instead.

And for the record, Expedia states that their policy is to defer to the policy of the airline:

“We understand that sometimes plans change. We do not charge a cancel or change fee. When the airline charges such fees in accordance with its own policies, the cost will be passed on to you.”

I called Expedia's support line, explained the issue, and was put on hold while they tried to call the airline. After waiting a while I was advised that the airline (which I'd just spoken with) wasn't answering the phone.

Okay.. A little odd..

I was advised that “offline support” would take over resolving my case and I should expect to hear back from them within 24 hours.

I never heard back from them.

I sent Expedia two messages requesting an update and support for my case.

I submitted the form to escalate the issue, providing all the info and trying to get some attention.

Their response: "We'll review it and contact you within 72 hours."

I never heard back from anyone at Expedia. I have no idea if they tried anything but they certainly never tried to assist me.

We bought a second pair of plane tickets from another airline directly and had to abandon our other flight. Complete waste of money.

Do not waste your time with Expedia. You’ll be far better off going directly to the airlines and hotels. It may be a bit more work to research, but the cost is the same if not better. And most importantly you’ll undoubtedly have customer service that actually tries to help you.

Expedia didn't just fail us with poor customer service; they abandoned us altogether. We were their customer: we paid them for a product that could not be used, and they completely disregarded their own policies.

There is no excuse for that.


Endeca Information Discovery version 3.2.0

Oracle has quietly made available version 3.2.0 of Endeca Information Discovery. It arrived on their Software Delivery Cloud in mid-February, although their website doesn't make much reference to it, so perhaps an "official" release date is still planned.

Either way, the question we all have is: what's new and improved in this version compared to 3.1?

I've reviewed the technical documentation that was available, so a quick summary of the changes is below:
Oracle Endeca Server 7.7.0

  • Only supports Windows 2012 and Linux 6.
  • Requires WebLogic 12; support for 11gR1 (10.3.6) is dropped.
  • Requires Java 7 or 8; Java 6 is no longer supported.
  • Application Developer Framework (ADF) Runtime no longer required.
  • Reduced remote administration, which could be a security improvement but I think translates to "endeca-cmd --host remoteServerName" no longer working.
  • New EQL function: CountDistinctMembers(attrName) has been added to support multi-assign attributes. It returns the same result as the more convoluted Cardinality(Set_Unions(attrName)) but is apparently faster (see the EQL sketch after this list).
  • Sorting of records by geocode distance from a reference point. (supported by the Conversation Web Service, not Studio’s UI)
  • Default idle timeout increased from 10 to 15 minutes. You can increase it with a custom data domain profile, but 15 minutes is also the minimum.
  • Improvements apparently to how hostname is handled. Possibly good since there were some issues with localhost references before, though I usually blamed them more on Windows 7.
  • Improved handling of invalid records by the Bulk Load Interface (Data Ingest Web Service): individual records detected as invalid are now identified, where previously the whole batch would simply be rejected.
  • Much faster restart if WebLogic restarts.
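
As a quick illustration of that new EQL function, here is a minimal sketch only, using hypothetical Movie_Title and Actor_Name attributes (where Actor_Name is multi-assign); the first expression uses the new function, the second the older equivalent:

RETURN CastCounts AS
SELECT
  CountDistinctMembers(Actor_Name) AS DistinctActors,
  Cardinality(Set_Unions(Actor_Name)) AS DistinctActorsOld
GROUP BY Movie_Title

Both expressions should return the same counts per movie; the new function is simply reported to evaluate faster.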

Oracle Integrator ETL 3.2.0

  • Requires OES 7.7 and Java 7 or 8.
  • Still includes the bundled Salience Engine, which is better news than you might think: with the closely related Oracle Big Data Discovery, Oracle went with an in-house text engine, and Salience is more robust, at least as far as sentiment scoring goes.
  • Graph formats have changed and would need to be converted from 3.1. What exactly has changed isn't clear, but there was a patch for Studio 3.1 that required changes to how attributes were identified, so perhaps there's a bit of that.
  • The trans folder must be present. That wasn't a strict requirement before, as far as I can recall, though I never did anything with it other than ignore it.
  • Secure Graph Parameters feature lets you create a master password that encrypts graph parameters.
  • The Social Media .zip library is no longer separate. This language file (TE_SocialData.zip) is now part of Salience.

EID Studio (3.2.0)

  • Requires WebLogic 12, Java 7 or 8, and either Linux 6 or Windows 2012. If you read that Windows 7 is supported, just not for production environments, don't believe it; you're on your own there.
  • Numerous bug fixes, though most seem to be available as patches to clients through Oracle support. No new core functionality or components are included, as far as I can see from the documentation.

I didn't look at IAS (Integrator Acquisition System), Provisioning Services, Integrator ETL Server, or the Web Acquisition Toolkit. However, I will note that the first three at least all require the same versions of Java, WebLogic, and O/S. 3.2.0 is a Platform Upgrade.

 

Is it worth it for you to upgrade to 3.2?

Full disclosure: I've only looked at the technical documents. I've seen nothing released identifying functionality changes other than release notes listing all the fixed bugs. Time allowing, if I uncover any new usability functions I'll report back.

Upgrading WebLogic might be a small nuisance. A newer version of Java isn't a shock; I already get prompted to update Java more often than Acrobat! Upgrading server operating systems could be a hassle though.

If you're not already running Linux 6 or Windows 2012, I'd say talk to Oracle support: most of those bug fixes for Studio are available as patches for 3.1. Your business users will appreciate the bug fixes, but from what I've looked at so far they may not gain much from you upgrading the underlying platform.

 

 


Setting up OEID 3.x as a Windows Service

Sorry for the long absence. I've been working less with EID lately and found myself too busy to post. However, I recently had a task at a client site to set up their installation of Oracle EID 3.1 as Windows services. This included OES, Studio, Provisioning Services and IAS, and the steps are virtually identical for all of them. Oracle has all this documented, but I found some gaps, so here are all the details to do this yourself.

  1. If you are going to run the service under a network account, make sure that account has full control of the WebLogic folder (e.g. C:\Oracle\Middleware\); the icacls command in the sketch at the end of this list can help with this.
  2. Copy these two files:
    1. C:\Oracle\Middleware\wlserver_10.3\common\bin\commEnv.cmd
    2. C:\Oracle\Middleware\wlserver_10.3\server\bin\installSvc.cmd
  3. Rename the copies as follows:
    1. C:\Oracle\Middleware\wlserver_10.3\common\bin\commEnv_oes.cmd
    2. C:\Oracle\Middleware\wlserver_10.3\server\bin\installSvc_oes.cmd
  4. Open the file startWebLogic.cmd (this is the file that executes when you manually start OES), located in a folder like this:
    1. C:\Oracle\Middleware\user_projects\domains\oes_domain\bin\
    2. Comment out the two lines starting with %JAVA_HOME% (around lines 175 and 178) by prefixing them with "@rem ". One of these two lines executes to start up the domain application.
    3. Below (or above) each add the following lines:
    @echo MEM_ARGS: %MEM_ARGS%
    @echo JAVA_OPTIONS: %JAVA_OPTIONS%
    @echo CLASSPATH : %CLASSPATH%

    Note: you can optionally echo out the entire line by using @echo instead of @rem, but I found it helpful to focus in on those specific variables.

  5. From a command prompt execute startWebLogic.cmd and make note of the values output for those three variables. These reflect YOUR environment and you'll need them later on.
  6. Edit startWebLogic.cmd to undo your changes and remove any lines you added.
  7. Edit commEnv_oes.cmd as follows:
    1. Update the following line (around line 144):
      set JAVA_VM=-server

      Note: "-client" doesn't seem to be recognized by beasvc; perhaps the assumption is that you wouldn't set up a Windows service if you were running in Development mode.

    2. Add the following line in the “:continue” block (around line 155):
      set MEM_ARGS=-Xms128m -Xmx3072m -XX:CompileThreshold=8000 -XX:PermSize=128m -XX:MaxPermSize=512M

      Note: this should reflect whatever values were output when you executed startWebLogic earlier; those happen to be the values for my environment and may not be the same for yours.

  8. Edit the file installSvc_oes.cmd as follows:
    1. Update the following line (around line 58)
      call "%WL_HOME%\common\bin\commEnv_oes.cmd"
    2. Go to the section “:runAdmin” (around line 110)
    3. Add a line to append the CLASSPATH variable with additional references needed for the WebLogic support classes:
      set CLASSPATH=%CLASSPATH%;C:\Oracle\MIDDLE~1\ORACLE~1\modules\oracle.jrf_11.1.1\jrf.jar;C:\Oracle\MIDDLE~1\WLSERV~1.3\common\derby\lib\derbyclient.jar;C:\Oracle\MIDDLE~1\WLSERV~1.3\server\lib\xqrl.jar

      Note: this amendment reflects the entries that startWebLogic.cmd output but that the CLASSPATH variable was otherwise missing. It may differ for your site.

    4. Update the cmdline assignment to remove some escaped quotations around CLASSPATH and MEM_ARGS
      set CMDLINE="%JAVA_VM% %MEM_ARGS% %JAVA_OPTIONS% -classpath %CLASSPATH% -Dweblogic.Name=%SERVER_NAME% -Dweblogic.management.username=%WLS_USER% -Dweblogic.management.server=\"%ADMIN_URL%\" -Dweblogic.ProductionModeEnabled=%PRODUCTION_MODE% -Djava.security.policy=%WL_HOME%\server\lib\weblogic.policy weblogic.Server"

      Note: full disclosure, I don't think this was actually necessary, but when I was checking the final cmdline statement there were some odd quotation marks showing up, and since I had no spaces to worry about I opted to remove them altogether just in case.

  9. Create a file called OES_Service.cmd as follows:
echo off
SETLOCAL
set DOMAIN_NAME=oes_domain
set USERDOMAIN_HOME=C:\Oracle\MIDDLE~1\USER_P~1\domains\OES_DO~1
set SERVER_NAME=AdminServer
set WL_HOME=C:\Oracle\MIDDLE~1\WLSERV~1.3
set WLS_USER=weblogic
set WLS_PW=yourPassword
set PRODUCTION_MODE=false
set JAVA_VENDOR=Sun
set JAVA_HOME=C:\Java\JDK
set JAVA_OPTIONS= -Xverify:none -da -Dplatform.home=C:\Oracle\MIDDLE~1\WLSERV~1.3 … -Dweblogic.ext.dirs=C:\Oracle\MIDDLE~1\patch_wls1036\profiles\default\sysext_manifest_classpath

Note: this line goes on much longer, so for the sake of brevity I omitted most of it; as before, set it to the same JAVA_OPTIONS that were output by startWebLogic.cmd.

call “C:\Oracle\Middleware\wlserver_10.3\server\bin\installSvc_oes.cmd”
ENDLOCAL

Note: I am assuming your user domain is oes_domain, adjust it to suit your domain name & folder location.

  1. From a command line (Run As Administrator), execute the file OES_Service.cmd.
    1. This will create a Windows service with the name "beasvc oes_domain_AdminServer". You can change that name; it's defined after "-svcname:" at the very end of the installSvc_oes.cmd file.
  2. Execute the following additional command to adjust the display name
    sc config "beasvc oes_domain_AdminServer" DisplayName= "Weblogic OES Domain"

    Note: the space after the equals sign (and no space before it) is intentional, but you can use any display name you like.

  3. Open up the Services console (through Administrative Tools)
  4. Find the service labelled “Weblogic OES Domain” and edit its properties as follows:
    1. Set the Startup Type to "Automatic (Delayed Start)"
    2. Change the “Log On As” to another account (optional)
    3. Note: if you are using LocalSystem then you may be able to avoid the Delayed Start; in the environment I was working on, delayed start addressed an issue we had authenticating the log-on account. An alternative to delayed start could be to define dependencies on TCP/IP & AFD (Ancillary Function Driver for Winsock); see the command-prompt sketch at the end of this list.
  5. Apply the changes.
  6. Try to start the service.
    1. This may time out. By default Windows services time out after 30 seconds, so a service that takes a long time to start can trick Windows into thinking it has failed. Optionally adjust that timeout value if you want (see the command-prompt sketch at the end of this list):
      1. Open up RegEdit and find the key: HKLM\SYSTEM\CurrentControlSet\Control
    2. Create a new DWORD value in this section of the hive:
      1. Name: ServicesPipeTimeout
      2. Value: the timeout in milliseconds; use something larger than the 30000 default, e.g. 60000
      3. Reboot for the new timeout to take effect
  7. If you encounter other issues starting the services, be aware that the following can help in troubleshooting:
    1. Editing the batch files installSvc_oes and its manual counterpart startWebLogic with @echo and @rem can help extract variables for comparison.
    2. The service control utility from a command prompt provides several useful commands:
      sc start "beasvc oes_domain_AdminServer"
      sc stop "beasvc oes_domain_AdminServer"
      sc queryex "beasvc oes_domain_AdminServer"
      sc delete "beasvc oes_domain_AdminServer"
    3. You can start the beasvc utility in debug mode using this command:
      1. C:\Oracle\Middleware\wlserver_10.3\server\bin\beasvc -debug "beasvc oes_domain_AdminServer"
      2. This will start up the service and let you see the messages.
  8. You can also log in to the WebLogic console to see if everything is working correctly:
    1. http://localhost:7001/console (this assumes OES is installed on the default port 7001)
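
For reference, here is a rough command-prompt sketch covering a few of the optional steps above: granting the log-on account rights on the WebLogic folder, defining service dependencies instead of delayed start, and raising the service start timeout. The account name, folder, service name, dependency list, and timeout value are only examples; adjust them to your environment.

rem Grant the log-on account full control of the WebLogic folder (step 1)
icacls "C:\Oracle\Middleware" /grant YOURDOMAIN\svc_weblogic:(OI)(CI)F /T

rem Make the service depend on TCP/IP and AFD as an alternative to Delayed Start
sc config "beasvc oes_domain_AdminServer" depend= Tcpip/AFD

rem Raise the service start timeout (milliseconds); a reboot is required for this to take effect
reg add "HKLM\SYSTEM\CurrentControlSet\Control" /v ServicesPipeTimeout /t REG_DWORD /d 60000 /f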

 

 


Safe High Performance Sandboxes

Recently I attended a TDWI presentation to hear about some trends being seen within one of the Canadian government ministries. Largely I was hearing a success story for a traditional BI warehouse. Data was being well governed, actively analyzed, and predictive analytics were producing results.

What was also emerging was a demand from business users for raw access to data and the freedom to perform their own research. And as in many organizations, the IT department tries to satisfy this demand by granting "power users" the tools for ad hoc query access to the corporate data marts.

The presenter defined this area of BI as "high performance sandboxes" and asked the question: "Is their architecture really optimized to support that need?"

Though every BI team would love to have completely satisfied customers, they are mandated to respect three critical responsibilities:

#1 Data Security & Privacy. The vault needs to have locks on it.

#2 Server Stability & Performance. The infrastructure needs to operate reliably for all users.

#3 Data Accuracy. The various reports, queries, and data models need to be correct.

Enforcing these three responsibilities demands time. And data is arriving on user desktops in more formats and from a wider range of sources than ever before. So we see increasingly frustrated business users, slow-moving analytics, and a popular focus on "self-serve" and "empowered users".

The typical result is of course to use Excel. The tool is already on our desktops and lets us load and analyze data without having to wait for permission. The proverbial bird in the hand. This is what “self-serve” and “empowering” are supposed to deliver.

What Excel and other desktop analytical tools don't do so well is uphold those three critical responsibilities; they largely skip them. Spreadsheets are rogue outputs, with no visibility into how they're shared and no assurance of accuracy. And our mailboxes and network folders are bursting at the seams with spreadsheets. I love Excel, but here's an excellent article about it (https://davidmichaelross.com/blog/microsoft-excel-is-everywhere).

So let’s add some business challenges to this demand for “high performance sandboxes”.

#4 Faster Arriving Data. Whether it's extracted from systems, generated by users, shared by colleagues, or acquired independently, data is arriving faster than IT shops can respond.

#5 Data of All Formats. Data is increasingly unstructured, yet it still needs to be analyzed together with structured data.

#6 Data Collaboration. Whatever conclusions are reached, they need to engage a wider audience.

 

Oracle EID's optional add-on "Provisioning Services" provides an alternative that can deliver that "high performance sandbox" while helping to meet both IT's responsibilities and the business challenges above.

 

Data Security & Privacy

When you load data through Provisioning Services a data domain is automatically created on a centrally managed and secured Oracle Endeca Server. Access is only through EID Studio, where robust security features like SSL and LDAP can be applied. Rather than emailing copies of spreadsheets or saving them to crowded network folders, users can email a simple hyperlink to a Studio web page. This data is a corporate asset and critically needs to stay where IT can ensure only authorized users can access it.

Server Stability & Performance

That same centrally managed OES server is a machine that IT can manage and control. They can monitor and optimize performance to ensure all IT services perform reliably and optimally with each other. They can also automate backups to ensure business continuity of anything business users create. One user can’t run an ad hoc query that takes down a server, or produce a valuable report that gets accidentally deleted from their desktop.

Data Accuracy

With the data and analytics hosted in a centrally managed environment, IT staff have the option to support users with what they've created. Business decisions can be made from these analytics, and IT may need to be aware of them without necessarily being a bottleneck in their creation. And if or when a correction needs to be made, it's done once, and we avoid having multiple versions to maintain or deal with.

Faster Arriving Data

Provisioning Services puts the ETL process directly in the hands of the users. It doesn't offer the range of complexity available in formal ETL tools, but it still provides very easy-to-use options for cleansing, filtering & structuring the data being loaded. That's certainly more robust than loading data into Excel, while sharing the same advantage of avoiding any delay waiting on IT processes.

Data of All Formats

Data is changing from controlled, structured data to exploding volumes of unstructured data. Tools like Excel are very powerful for analytics, but mostly for crunching numbers. Oracle EID includes the standard charting & data visualizations, but also offers a distinctive set of features for robust searching, language handling, and even some text enrichment options. Furthermore, its data sets allow multiple sets of data from different systems to be explored and analyzed together.

Data Collaboration

The last challenge we face is how we share our research and conclusions with others. Generally we send people copies of spreadsheets with a sequence of steps or explanations of what they're looking at, and sharing a spreadsheet collaboratively requires another tool like GoToMeeting or Lync. EID Studio is a centralized web service, so the audience is always working with the same version of the data. The Studio features are intuitive enough to let users of all technical levels explore a question directly on their own. With the convenience of bookmarks, portlets that can contain free-form text (think discussions!), and isolated applications, users can easily work in the same sandbox.

 

There is a point where I would expect a user-generated sandbox to be migrated to a production state or retired. A "sandbox" is just an area to play in, and that production state could very well be better served by tools outside of Oracle EID. The question is not what the best BI tool for analytics is. The question is how the IT architecture can safely support the demand for "high performance sandboxes".

To that question I would suggest Provisioning Services can give your users a powerful tool that supports not only IT's responsibilities but also the state of data as it exists today. The nature of data and BI is evolving; so too must our set of tools.


Tips & Tricks for Optimizing Oracle Endeca Data Ingestion

More than once I've been on a client site to deal with a data build that was either taking too long or no longer completing successfully. The handraulic analysis to figure out what was causing the issues can take a long time, but the rewards are tremendous: not simply fixing a build that was failing, but in some cases cutting the time demand in half, which meant a job could be run overnight rather than scheduled for weekends. In some cases simply verifying with the business users what attributes are loaded, and how they are interacted with, can make their lives easier.

Below is a collection of tips I would suggest for anyone trying to speed up a data build in Integrator. I can't swear these will work for you, since every site is different; they are simply based on some experiences I've had, so if they help you then great.

– Reduce the number of rows and columns. As a general rule less will always be faster, so if you can filter out garbage records or redundant columns, do so. Depending on your version of OES the indexer will work in batch sizes of up to 150 or 180 MB. My assumption is that the more records that can fit into each batch, the faster the overall ingestion process will be. Calculating average record size may not be easy, particularly with multi-assign attributes and ragged-width records. However you can monitor the Rec/s and KB/s figures that Integrator reports, which can at least help you measure when you're processing more records.

– Defrag the data drives. Although the physical mechanics of how the data store file is managed are not obvious, the capability of the engine to read and write to/from disk contiguously appears to be significant.

– Specify a higher number of threads. Although in Windows the dgraph process may show any number, officially the default will be 2. The standard recommendation is to identify as many threads as you have CPU cores; my suggestion is to experiment with different numbers until you find an optimal point. When you create or attach the data store, specify the "--vars --threads X" parameter (X = the number you want). Full disclosure: this is a stock recommendation and I've not tested whether it impacts data ingestion and OES queries equally.

– Check if a newer version of OES could be installed instead. On one client site running 2.3 we installed OES 7.4; it was both compatible and the performance improvements were notable. As mentioned above, how batch sizes are defined has changed: with early versions of OES the batch size was always 150 MB, while with later versions it is dynamically defined and can scale up to 180 MB. This should benefit not only small data ingestions but large ones.

– Refine the data types. Strings will almost always consume the most disk space, and the larger the footprint of a record, the longer it tends to take to read or write it. For analytical purposes strings are mostly qualitative fields; you need numbers for quantitative analysis and dates for trending.

– Check that the 64-bit version of OES was installed. This is obviously dependent on your hardware, but 32-bit servers are becoming hard to find and the 32-bit version of the software might have been installed by accident.

– Verify the bottleneck in the ingestion actually is the indexer. You can generally see this in integrator when the console log only shows time ticking and all the components are done except for the Bulk Add/Replace component. There is very little other feedback on what the indexer is doing, but you can monitor the files in the generations folder to see the activity the indexer is producing. If the bottleneck isn’t the indexer then don’t waste time putting out the wrong fire!

– Check Resource Monitor (Windows) and filter to the dgraph and javaw processes to track the amount of CPU/RAM/etc. being consumed. RAM will probably be your highest consistent resource and generally the easiest area to upgrade; in particular note the "Hard Faults/sec". However, if you see steady demand on Disk, CPU or Network instead, you may have something else worth upgrading.

– Terminate or disable competing processes and services. Running multiple processes will chip away at the available resources, and the more you can make available to your dgraph process the better. On almost every server I've looked at I've found services that weren't necessary just sitting idle and locking resources away.

– Use RAID 10 or RAID 0 (the best balance for reads/writes) or a SAN. If you can align your ingestion process to read from one drive and write to another, you may greatly speed things up by avoiding the heads spinning back and forth. Optimizing disk is a bigger challenge to apply and, more importantly, to accurately measure, but don't neglect it since the impact is significant.

– If your data volume is particularly high, moving your system paging file to a separate drive could also provide some benefits. Whenever your data store is larger than the available RAM there tends to be quite a bit of disk I/O.

– Run your indexes during off hours. During business hours the server will usually be dealing with user queries and juggling resources. Try to run your large data volume processes when users aren’t around.

– Run build steps in parallel. Sequential processing almost always means there will be latency while RAM, CPU or disk sit idle. Though processes running in parallel will individually take longer to complete, you're still likely to complete more of them in a shorter time frame. Generally they'll queue up waiting for resources and you'll maximize the utilization of RAM, CPU and disk.

– Review your attribute properties and the defaults. Ingesting data without any data modeling is easy, but if all your fields are Text Searchable your index creation will need to support that. Minimizing the number of searchable attributes can very significantly reduce the size of the data store; I've seen this in practice translate to half the disk footprint and the indexing time reduced by more than half.

– Review the cardinality of data values. In some cases attributes may be enabled for search, but the actual range of distinct values is so low that a search is almost meaningless. A search should provide an effective record filter; if the results are still going to be many millions of records, maybe that value isn't useful for searching on at all. Don't forget you can always use the Available Refinements to apply those kinds of filtering.

– Look for duplicate or combo fields. More than once I've found data sets where the same value was identical across more than one attribute, particularly in cases where records were being merged from disparate data sources. That's a great way to ensure consistency in the source systems, but if consistency hasn't been a problem then duplicating values may be offering zero return. The same goes for fields that repeat the same information: think of First_Name, Last_Name and Full_Name. There may be some business reason to format them for display purposes, but generally I'd keep the granular values and drop the combo version. You can always concatenate values through a view (or a Reformat; see the sketch after this list) if you need to, and meanwhile you've cut in half the memory those three fields consumed.

– Avoid updates by sticking to batch inserts. Multi-Assign attributes may require special attention, and the Bulk Add/Replace will certainly replace records. In theory if you can avoid updates you can avoid the overhead of the index having to deal with noncontiguous inserts. Note I haven’t definitively verified this would make a significant difference, so this is just a suggestion.

– Use the Bulk Add/Replace Records component. This one seems obvious, but know that the other components interact through the exposed (and slower) OES web services, while the Bulk Add/Replace Records component is definitely faster.

– Verify your RecordSpec is appropriately unique. This is also a question of data integrity, but in some cases I've seen RecordSpecs defined on a number of concatenated fields that included metric values. Not only was the field much larger than it needed to be, it also wasn't reliably unique. If you aren't going to be updating records, the RecordSpec may be better defined on a smaller field guaranteed to be unique, e.g. NEWID().

– Another suggestion is to presort your ingestion data by your record spec and submit as a batch. I tested this and have to admit I did not find any improvement, but with traditional databases presorted records tend to be processed faster. Try it out and leave a comment if you found this did or didn’t help.

– Review your managed attributes and dimensional hierarchies. If they're not in use or don't serve a purpose, remove them. I've seen some that weren't being used but hadn't been removed simply because it had been so complex to define and add them in the first place; they weren't being used, but they were complicating the indexing process.
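
To illustrate the combo-field tip above, here's a small CTL2 sketch of a Reformat mapping (the field names are hypothetical): keep the granular fields, deliberately skip the stored combo field, and derive it on the fly only if a display version is ever needed.

// keep the granular values
$out.0.First_Name = $in.0.First_Name;
$out.0.Last_Name = $in.0.Last_Name;

// the stored Full_Name combo field is intentionally not mapped;
// if a display version is needed, derive it instead:
// $out.0.Full_Name = $in.0.First_Name + " " + $in.0.Last_Name;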

Hopefully these tips can help you out. Rest assured you can use Oracle Endeca to ingest, index, and search many millions of records with scores of attributes in OEID (I try to draw the line somewhere between 200 and 300 attributes). But some discretion, planning and review is always advisable.


Paradox of Structure

With possibly more optimism than time, I routinely register for MOOC courses through coursera.org. I don't bother too much with the assignments; I just soak in some of the lectures. Not long ago I watched one on Creativity, Innovation & Change, and a concept they presented has since come up more than once.

The anecdote they shared was that if you studied how children played around a playground in an open space, you'd find they would tend to stay close to the center, like electrons in a tight orbit around the nucleus of an atom. Though there was no fence to restrict them, they would stick to the area that felt safest and avoid roaming off into the open areas.

[Image pos_1: children clustering near the center of an open, unfenced playground]

When they were instead playing in a playground that was fenced in, they technically couldn't roam with as much freedom. What the researchers found, however, is that with clear boundaries present the children were much less constrained to the center; they ranged around, using more of the available space with more comfort.

[Image pos_2: children ranging across more of a fenced playground]

The paradox of structure is that, whatever its nature, any structure is both enabling and limiting at the same time.

This is a truism I’ve seen in practice many times. Introducing standards and procedures for project development may see initial adoption pains, but most developers are rapidly productive with better quality code when they have those structures presented to them in advance.

We’ve all attended meetings that lacked any agenda and ended up with one or two people talking a lot and little group productivity. A well run meeting with a clear focus is generally enabling to all the attendees. People can see where, when and how they can contribute and will take advantage of that comfort.

This isn’t to suggest that building walls and constraining everyone is the extreme we should rush to. Simply consider how the structures you introduce will operate when you’re defining the environment in which people will operate. Are the limits set far enough to still enable your team the freedom to be productive? Do the rules have the flexibility to adapt if those constraints are blocking? What is the ideal balance of structure to enable and align your team?

When I’m starting with a new client one of the first things I’ll ask about are their policies or procedures. I’m not excited to march in formation, I want to know where the lines are drawn so I know just how much freedom and latitude I have.

Look at the structures around you. Do they constrain your creativity or do they help to organize and focus it? Who says you can’t be creative inside the proverbial box?

Like that fence around the playground it’s just a structure.

Recognize its shape and size and you’ll see how much you can fill it.


Multi-Assign Attributes – What they are and how they’re loaded

Oracle Endeca Server supports "Multi-Assign Attributes". This means records can have one or more values assigned to the same attribute. All data domain records are stored as name/value pairs, so in effect you end up with a record structure like this:

Movie_Title = True Romance
Actor_Name = Christian Slater
Actor_Name = Dennis Hopper
Actor_Name = Val Kilmer
Actor_Name = Brad Pitt

Just to name a few, that movie is well cast.

The power is that not only are all of those values distinct and searchable, but since Endeca does not require the data schema to be modeled and formalized in advance, any given movie record can dynamically carry any number of additional values.

If we tried to represent this in the perspective of a traditional table it might look like so:

Movie_Title    Actor_Name_1       Actor_Name_2    Actor_Name_3   Actor_Name_4
True Romance   Christian Slater   Dennis Hopper   Val Kilmer     Brad Pitt
Open Water     Blanchard Ryan     Daniel Travis   NULL           NULL

Technically we've broken Codd's first normal form, but that's alright: Oracle Endeca Server is designed to support these sorts of ragged-width records. The actual data domain record won't even store NULL values for the actors that don't exist for the Open Water film. (For the record, Open Water did have more than just those two in its cast.)

The records are actually stored in the data domain like this; note they vary in the number of columns and the attribute keys share the same name:

Movie_Title    Actor_Name         Actor_Name      Actor_Name     Actor_Name
True Romance   Christian Slater   Dennis Hopper   Val Kilmer     Brad Pitt
Open Water     Blanchard Ryan     Daniel Travis

The only bit of advance modeling we needed to do was updating the PDR record so the isSingleAssign property (<mdex-property_IsSingleAssign>) for Actor_Name is set to false. The default is true so we can leave it as is for Movie_Title. No other data modeling work was required.
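
Conceptually, the PDR update for Actor_Name carries name/value pairs along these lines (a sketch only; mdex-property_Key is the usual PDR identifier, and how you apply the update, through Integrator or the Configuration Web Service, will depend on your setup):

mdex-property_Key = Actor_Name
mdex-property_IsSingleAssign = false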

The convenience of records being dynamically extended with multiple attribute values is awesome. This is a critical differentiator for Oracle Endeca Information Discovery, and one whose value I can't highlight enough.

 

So how do these records get ingested?

Oracle's IntegratorETL is based on an open source ETL tool called CloverETL, to which custom (Discovery) components have been added to interface with an Oracle Endeca Server. IntegratorETL is otherwise a "standard" ETL tool, so it still expects data to be in a traditional format, meaning rows & columns; it doesn't have the same native understanding of "multi assign" that OEID has.

Your data may arrive in a format that already concatenates the multi-assign values together into a single attribute. This is definitely the simplest structure to ingest; the value is already in an OES-friendly format.

Movie_Title, Actor_Name
True Romance, “Christian Slater, Dennis Hopper, Val Kilmer, Brad Pitt”
Open Water, “Blanchard Ryan, Daniel Travis”

In a case like this you wouldn’t need to do anything more than replace that comma with the character being used as the delimiter. For simplicity I’m just using the pipe character as my delimiter.

There are a few components that offer a place to add your transformation; the Reformat component works fine. Simply update the mapping with the CTL2 replace function.

$out.0.Actor_Name = replace($in.0.Actor_Name, ",", "|");

Your output will look like this:

[Image ma_03: Reformat output with pipe-delimited Actor_Name values]

The Discovery components such as Bulk Add/Replace Records include a property called “multi assign delimiter”, which specifies how a single attribute value will hold multiple values. Make sure this value matches your delimiter character, pipe or whatever.

[Image ma_01: the multi assign delimiter property on the Bulk Add/Replace Records component]

 

More commonly you'll find yourself with a normalized data source containing multiple separate records, either in your initial source file or possibly as the result of a join during your ETL process. Like so:

Movie_Title, Actor_Name
True Romance, Christian Slater
True Romance, Dennis Hopper
True Romance, Val Kilmer
True Romance, Brad Pitt

You could pass those records directly through the Add Key Value Pair component (Add KVPs).

[Image ma_02: graph passing records through the Add KVPs component]

The downside is that you may dramatically increase the number of records passing through the data stream, and the same single-assign values may add some pointless overhead. You may also have other reasons to want your record counts to stay meaningful, so that 10 records passing along the edge actually mean 10 individual records.

Whatever your reason here’s another option to transform those records into a multi-assign delimited format.

The first step is to add a FastSort or ExtSort component. Identify your key column as the sort column; in this example that would be Movie_Title.

Next add a Denormalizer component. You’ll use this component to create a single delimited value and a smaller output record set.

Once again you'll identify your key column for the Key property. Then select the transform property and paste the following code:

//#CTL2

// global variables shared across the group
integer n = 0;
string tmpMovie_Title = "";
string tmpActor_Name = "";

// called once for each input record in the current group
function integer append() {

	// count the records appended for this group
	n++;

	tmpMovie_Title = $in.0.Movie_Title;

	// assume we'll be adding multiple values,
	// therefore always add a delimiter
	tmpActor_Name = tmpActor_Name + $in.0.Actor_Name + "|";

	return n;
}

// called once per group to emit the single denormalized record
function integer transform() {

	$out.0.Movie_Title = tmpMovie_Title;

	// remove our trailing delimiter
	$out.0.Actor_Name = left(tmpActor_Name, length(tmpActor_Name) - 1);

	// reset our global variable for the next group
	tmpActor_Name = "";

	return OK;
}

Since this example sorted the records, our output record format will this time look like so:

[Image ma_04: denormalized output records with pipe-delimited Actor_Name values]

These records can now be passed directly through the Bulk Add/Replace Records component. As before, simply make sure the multi-assign delimiter property contains the matching character.

Your basic graph would look like this:

[Image ma_05: the basic graph, from reader through sorter and Denormalizer to Bulk Add/Replace Records]

And that’s all you need to do to reformat values for multi-assign attributes. Good luck!

And be sure to check out both those movies!
