An Approach to Data Obfuscation

originally published August 2012, http://www.EndecaCommunity.com

There are four critical questions to explore when delivering an obfuscated MDEX data graph: what, how, when, and where. RealDecoy's depth of experience in the traditional ETL space and in developing OEID solutions has produced a set of guidelines and a tried-and-true process for answering these questions in the way that best serves each client.

Deciding what data actually needs to be obfuscated is the first step in the process. Not every field needs to be changed, and attempting to change them all rarely justifies the cost in effort and processing time. Identifying the specific fields, private data, or business-sensitive values requires close collaboration with the client. In some cases, obfuscating a single attribute is enough to secure the rest of the record. The first guideline for obfuscating data is to be surgical about which data elements you target.
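Pinning down the target fields is ultimately a conversation with the client, but a quick profiling pass over sample records can help seed that conversation. The sketch below is a minimal illustration, not a description of RealDecoy's process: it assumes records arrive as Python dictionaries, and the `PATTERNS` table and `candidate_fields` helper are hypothetical. It only flags values that look like e-mail addresses or US Social Security numbers; names and business-sensitive values such as pricing will not match any pattern and still have to be identified with the client.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical patterns for values that commonly indicate private data.
# Extend or replace these after reviewing the data with the client.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def candidate_fields(records: Iterable[Dict[str, object]]) -> Dict[str, Set[str]]:
    """Map each field name to the set of pattern labels its values matched."""
    hits: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # this sketch only inspects string values
            for label, pattern in PATTERNS.items():
                if pattern.match(value):
                    hits.setdefault(field, set()).add(label)
    return hits


if __name__ == "__main__":
    sample = [
        {"order_id": 1001, "contact": "bill@example.com",
         "tax_id": "123-45-6789", "region": "East"},
    ]
    print(candidate_fields(sample))
```

For the sample record this reports `contact` as an e-mail field and `tax_id` as an SSN field, and leaves `order_id` and `region` alone, which is the kind of short list the surgical approach calls for.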

How you obfuscate the target data is the next question to evaluate with the client. Values can be randomized, replaced, purged, or encrypted, and each strategy has its own trade-offs. These choices can directly affect performance, as well as whether the original values can later be decrypted or reconstructed. More often, though, the deciding issue is how semantically realistic the obfuscated data needs to be. Replacing Bill Smith with Pedro Martinez ensures that downstream systems still look and feel correct. Replacing Bill Smith with adf9879Udsdf798&A is certainly obfuscated, but it raises questions downstream about whether the output has been corrupted, and it can invalidate development and testing efforts.
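To make these trade-offs concrete, here is a minimal sketch of three of the strategies plus a keyed-hash stand-in for the fourth. It is illustrative only: the substitute-name pool, `SECRET_KEY`, and all function names are hypothetical. True encryption, which would let authorised users recover the original values, needs a cipher library and is not shown; `pseudonymize` instead uses an HMAC, which is one-way and produces exactly the kind of visibly synthetic value discussed above. `replace_name` uses the same keyed hash to choose a realistic name, so Bill Smith maps to the same substitute on every record and joins across tables still line up.

```python
import hashlib
import hmac
import random

# Hypothetical pool of realistic substitute names.
FAKE_NAMES = ["Pedro Martinez", "Ana Silva", "Chen Wei", "Fatima Khan", "John Doe"]
# Assumption: in practice the key would be supplied securely, never hard-coded.
SECRET_KEY = b"example-key-do-not-use"


def randomize_digits(value: str, rng: random.Random) -> str:
    """Randomize: swap every digit for a random one, keeping punctuation and length."""
    return "".join(str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in value)


def replace_name(value: str) -> str:
    """Replace: deterministically map a real name to a realistic fake one."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    return FAKE_NAMES[int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)]


def purge(value: str) -> None:
    """Purge: drop the value entirely."""
    return None


def pseudonymize(value: str) -> str:
    """Keyed one-way hash: stable and unreadable, but NOT reversible like encryption."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    print(randomize_digits("123-45-6789", rng))  # same layout, random digits
    print(replace_name("Bill Smith"))            # a realistic name from FAKE_NAMES
    print(purge("Bill Smith"))                   # None
    print(pseudonymize("Bill Smith"))            # 16 hex characters
```

Randomizing and purging are cheap and irreversible; replacement keeps the data looking real; the hashed value is stable but obviously synthetic, which is the downstream confusion the example of adf9879Udsdf798&A warns about.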

When the obfuscation takes place involves two considerations. The first is whether the data can sit on a server in its original, unobfuscated state or must be secured right away. If the data is being made available for testing or development, it often must be transformed as soon as it arrives. The second consideration is whether the data must be available by a certain deadline. Depending on which data is transformed, and how, the processing can be demanding. Randomizing or purging values takes very little time, whereas encrypting many fields across millions of records can take a significant amount of time.

Where the obfuscation is applied is the final question and the final step in our process. Technical architecture aside, RealDecoy has found that obfuscation can be applied either against static repositories or dynamically as a component of the ETL process itself. Working from a static repository means the data resides in its original, unobfuscated state for some period of time, although this is often the approach clients are more comfortable with. Applying obfuscation dynamically as part of the original ETL can be more complex to develop, but it generally performs faster and provides a higher degree of security.
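The difference between the two placements can be sketched in a few lines. This is an illustration under stated assumptions rather than OEID code: records are plain Python dictionaries, a policy maps field names to transform functions, and `obfuscate`, `etl_stream`, and `obfuscate_repository` are hypothetical names. In the dynamic case obfuscation is a stage inside the pipeline, so raw values never reach the target; in the static case a separate pass rewrites records that were already stored in their original form.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Policy = Dict[str, Callable[[object], object]]


def obfuscate(record: Record, policy: Policy) -> Record:
    """Return a copy of the record with only the policy's fields transformed."""
    return {k: policy[k](v) if k in policy else v for k, v in record.items()}


def etl_stream(source: Iterable[Record], policy: Policy) -> Iterator[Record]:
    """Dynamic: obfuscate each record as it flows through the ETL."""
    for record in source:
        # Only the obfuscated copy is handed on to the load step.
        yield obfuscate(record, policy)


def obfuscate_repository(repository: List[Record], policy: Policy) -> None:
    """Static: rewrite, in place, records that were stored in raw form."""
    for i, record in enumerate(repository):
        repository[i] = obfuscate(record, policy)


if __name__ == "__main__":
    policy: Policy = {"customer_name": lambda _: "Pedro Martinez"}

    # Dynamic: the loaded rows never contain the real name.
    extracted = [{"order_id": 1001, "customer_name": "Bill Smith"}]
    loaded = list(etl_stream(extracted, policy))
    print(loaded)

    # Static: the repository holds the real name until this pass runs.
    repository = [{"order_id": 1002, "customer_name": "Jane Roe"}]
    obfuscate_repository(repository, policy)
    print(repository)
```

Because the generator handles one record at a time on its way to the target, the dynamic form also avoids a second full read and write over stored data, which is one plausible reason it tends to perform better.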

Sensitivity around data security in the industry is not decreasing, and as information and organizations cross borders, the requirements for managing information responsibly are greater than ever. OEID presents data in a unique manner, and the tools for creating an MDEX graph introduce their own challenges in meeting obfuscation needs. The challenge is finding a development partner with experience not only in designing effective obfuscation, but also in delivering it with the OEID toolset.
