The Ultimate Manual For Data Obfuscation

Data privacy has become one of the most significant concerns of the modern world. According to Privacy Rights Clearinghouse’s Chronology of Data Breaches, more than 10 billion data records have been exposed in over 9,000 publicly disclosed data breaches since 2005.

To say the frequency and magnitude of these breaches are growing at a maddening pace is putting it lightly. Luckily, we can use data obfuscation to prevent the disclosure of many of these records—even when breaches are successful.

Organizations can protect their assets by obfuscating critical data. That way, even if a breach succeeds, the stolen data is useless to the attacker. Read on to learn how to use data obfuscation to safeguard your data’s confidentiality.

What Is Data Obfuscation Anyway?

Data obfuscation is the process of replacing sensitive data or personally identifiable information (PII) with realistic-looking data to protect confidential information in non-production databases.

When obfuscation is done correctly, the data keeps its referential integrity and original characteristics, so applications can still be developed, tested, and installed against it. This is why data obfuscation is primarily used in test or development environments.

While developers and testers need realistic data to build and test software, they don’t necessarily need the actual data.

How Data Obfuscation Works

Organizations must protect data from unauthorized access, particularly if they store personal information or business-critical data. This is mandatory for data protection compliance or data security reasons.

If you have sensitive data that isn’t required for processing, you can simply remove it or null it. But if you want to retain a complete data set, it’s best to obfuscate it to preserve privacy.
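
The remove-or-null option can be sketched in a few lines. This is a minimal illustration, assuming each record is a plain dictionary; the field names and sample values are hypothetical.

```python
from typing import Any


def drop_fields(record: dict[str, Any], fields: set[str]) -> dict[str, Any]:
    """Return a copy of the record without the named fields."""
    return {k: v for k, v in record.items() if k not in fields}


def null_fields(record: dict[str, Any], fields: set[str]) -> dict[str, Any]:
    """Return a copy of the record with the named fields set to None."""
    return {k: (None if k in fields else v) for k, v in record.items()}


# Hypothetical record; "ssn" is the field that processing does not need.
customer = {"id": 17, "name": "Ana Lopez", "ssn": "078-05-1120", "plan": "pro"}
print(drop_fields(customer, {"ssn"}))
print(null_fields(customer, {"ssn"}))
```

Dropping changes the shape of the record, while nulling keeps the column in place, which matters if downstream code expects the field to exist.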

If you deploy a Security Information and Event Management (SIEM) system, you should understand how data obfuscation affects new and existing event data, assets, offenses, rules, and log source extensions before you configure it.

Existing Event Data

When a data obfuscation profile is enabled, the system masks the data for each event as it is received by the SIEM system. Events that are received by the appliance before data obfuscation is configured remain in the original unobfuscated state. The older event data is not masked, and users can see the information.

Assets

When data obfuscation is configured, the asset model accumulates masked data while the pre-existing asset model data remains unmasked.

To prevent someone from using unmasked data to trace the obfuscated information, purge the asset model data to remove the unmasked data. Your SIEM system will repopulate the asset database with obfuscated values.

Offenses

To ensure that offenses do not display previously unmasked data, you have to close all existing offenses by resetting the SIM model.

Rules

You must update rules that depend on previously unmasked data. For example, rules based on a specific user name do not fire when the user name is obfuscated.

Log Source Extensions

Log source extensions that change the format of the event payload can cause issues with data obfuscation.

Now that we’ve covered how it works, let’s look at a few situations where obfuscating data is useful:

  • Testing. Accurate testing requires data that behaves like production data. Data obfuscation can produce a database similar to the actual data but without any sensitive information, allowing you to test software without worrying about data security.
  • Data Exports. The contents of a data file can become vulnerable when moved from one system to another through a manual export-import process. Obfuscation can help hide critical data, making it unreadable in case the file gets intercepted.
  • Secure Transactions. Two systems, such as an e-commerce server and a secure payment system, sometimes need to complete a transaction without exposing the underlying data. Data obfuscation lets the transaction take place safely without revealing sensitive data like credit card numbers.

Understanding the Different Data Obfuscation Methods

If you ask ten people the definition of data obfuscation, you’ll get ten different answers. This is because there are various ways to obfuscate data, each designed for specific goals and purposes.

To keep this guide brief, we’ll discuss three of the most common data obfuscation techniques: encryption, tokenization, and data masking (also known as data anonymization).

Method #1: Data Encryption

Data encryption transforms data with an encryption algorithm so that only someone who has the decryption key can read it. Encrypted data is unreadable while in transit and will often appear as a string of alphanumeric characters that won’t make any sense to unauthorized eyes.

This data obfuscation process allows sensitive data to travel alongside other data with a lower security risk. So even if a data export contains encrypted tables, no one can read them in transit; they stay protected until they reach the destination.

Once it arrives, the recipient can use the decryption key to restore the original data values.
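
The encrypt, transmit, decrypt round trip can be sketched as follows. Python’s standard library does not ship a general-purpose symmetric cipher, so this sketch swaps in a one-time pad (XOR with a random key as long as the message, used only once) purely to show the flow. That choice is an assumption made for illustration, not a recommendation; a real deployment would normally use a vetted cipher such as AES-GCM from a maintained library.

```python
import secrets


def encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """One-time pad: XOR the message with a fresh random key of equal length."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key restores the original bytes."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must be the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


card = b"4111 1111 1111 1111"  # sample value for illustration only
key, blob = encrypt(card)
print(blob.hex())          # unreadable in transit, different on every run
print(decrypt(key, blob))  # recoverable only by whoever holds the key
```

The key must reach the recipient over a separate, trusted channel; anyone who intercepts only the ciphertext learns nothing beyond its length.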

Method #2: Data Tokenization

In data tokenization, each data value is linked to a random code or token. This token or code doesn’t have any value in itself, but it can be used to carry out a lookup when it’s passed back to the original system.

Suppose a database contains a list of credit card numbers. Each credit card number is linked to a random token in a lookup table. A secure payment API can then use the token when interacting with other systems.

This way, the credit card number itself is never exposed to those other systems, which protects it from being compromised.
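
A minimal sketch of such a lookup table, assuming an in-memory store; the TokenVault class, its method names, and the tok_ prefix are hypothetical, and a real vault would be a hardened, access-controlled service.

```python
import secrets


class TokenVault:
    """In-memory lookup table mapping random tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one card number always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # collisions are unlikely; retry if one occurs
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only the system that owns the vault can perform this lookup.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")  # sample value for illustration only
print(token)                    # random token, safe to pass to other systems
print(vault.detokenize(token))  # original value, available only inside the vault
```

Because the token is random rather than derived from the card number, it cannot be reversed without access to the vault.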

Method #3: Data Anonymization

This technique is used for producing secure, usable test data. Data anonymization can be broken down into three methods:

  • Substitution. Real data values are replaced with dummy values that are generated randomly or taken from a lookup table. For example, you can substitute an actual credit card number with a fake credit card number obtained from a list of non-active credit cards.
  • Randomization. Data values are shuffled before being shared. You can do this by either anagramming data or randomly shuffling columns so that every row contains inaccurate data values.
  • Ranged Substitutions. This method also uses dummy values, but they fall within the range of the actual data values. For example, take the lowest and highest values in a list of numbers: the dummy values are generated randomly, but they stay within those limits.
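
The three methods above can be sketched as follows. This is an illustrative example, assuming each column is a simple Python list and that ranged values are integers; the function names and the pool of fake card numbers are made up for the example.

```python
import random

# Hypothetical pool of dummy card numbers (not real, for illustration only).
FAKE_CARDS = ["4000-0000-0000-0002", "4000-0000-0000-0010", "4000-0000-0000-0028"]


def substitute(values, pool, rng):
    """Substitution: replace each real value with a dummy drawn from a lookup list."""
    return [rng.choice(pool) for _ in values]


def shuffle_column(values, rng):
    """Randomization: keep the same values but reorder them across rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def ranged_substitute(values, rng):
    """Ranged substitution: random integers between the column's min and max."""
    low, high = min(values), max(values)
    return [rng.randint(low, high) for _ in values]


rng = random.Random()  # pass random.Random(seed) for reproducible test data
cards = ["4111-1111-1111-1111", "5500-0000-0000-0004"]
salaries = [42000, 58000, 61000, 75000]

print(substitute(cards, FAKE_CARDS, rng))
print(shuffle_column(salaries, rng))
print(ranged_substitute(salaries, rng))
```

Shuffling preserves the column’s exact distribution, while ranged substitution preserves only its bounds; which trade-off is acceptable depends on what the tests need.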

Anonymized, or masked, data looks just like real data, which makes it the most appropriate option for testing software. But despite the similarities, it contains no identifiable information.

Ideally, you shouldn’t be able to reverse the anonymization process to get the original data.

Other Data Obfuscation Methods

There are several other techniques you can use to obfuscate data in non-production environments. Let’s take a look at a few of them.

  • Nulling. You replace original values with a placeholder symbol that stands in for a null value. For example, a credit card number could be shown as #########-####-1996.
  • Non-deterministic Randomization. The real data value is replaced with a random value that still satisfies the constraints of a valid value. For instance, a credit card expiration date is replaced with a valid month in the upcoming five years.
  • Blurring. Modifying a number while remaining in the general vicinity of the original number. For instance, changing the balance of a bank account to a random value within 10% of the original amount.
  • Repeatable Masking. Replacing a value with another random value, but in a way that maintains referential integrity. You must make sure the original values are always mapped to the same replacement values.
  • Shuffling. Modifying the order of digits in a number or code that lacks semantic meaning. For example, you can change a phone number from 815-7713 to 371-7185.
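
Each technique in this list is small enough to sketch in one function. The function names, parameters, and sample values are hypothetical. For repeatable masking, the sketch uses a keyed hash (HMAC) as one way to get stable replacements; a stored lookup table would be another.

```python
import hashlib
import hmac
import random


def null_out(value: str, keep_last: int = 4, symbol: str = "#") -> str:
    """Nulling: swap letters and digits for a placeholder, keeping the last few characters."""
    cutoff = len(value) - keep_last
    return "".join(
        symbol if ch.isalnum() and i < cutoff else ch
        for i, ch in enumerate(value)
    )


def random_expiry(today_year: int, today_month: int, rng: random.Random) -> str:
    """Non-deterministic randomization: a valid MM/YY within the next five years."""
    total = (today_month - 1) + rng.randint(1, 60)  # 1 to 60 months ahead
    return f"{total % 12 + 1:02d}/{(today_year + total // 12) % 100:02d}"


def blur(amount: float, rng: random.Random, spread: float = 0.10) -> float:
    """Blurring: move a number by a random factor within +/- spread of the original."""
    return round(amount * (1 + rng.uniform(-spread, spread)), 2)


def repeatable_mask(value: str, secret: bytes, prefix: str = "user_") -> str:
    """Repeatable masking: the same input always yields the same replacement."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:12]


def shuffle_digits(value: str, rng: random.Random) -> str:
    """Shuffling: permute the digits while leaving separators where they are."""
    digits = [ch for ch in value if ch.isdigit()]
    rng.shuffle(digits)
    it = iter(digits)
    return "".join(next(it) if ch.isdigit() else ch for ch in value)


rng = random.Random()
print(null_out("4111-1111-1111-1996"))  # ####-####-####-1996
print(random_expiry(2024, 12, rng))
print(blur(2500.00, rng))
print(repeatable_mask("alice", b"demo-only-secret"))
print(shuffle_digits("815-7713", rng))
```

Only the repeatable mask is deterministic, which is what lets the same customer ID line up across tables after masking; the other functions deliberately produce a different result on each run.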

How to Implement an Effective Data Obfuscation Plan

You need a holistic approach to planning, data management, and execution to implement a successful data obfuscation strategy. Here’s a breakdown of the fundamental aspects to set you up for success.

Step 1: Identify Sensitive or Critical Data

The first step of data obfuscation is to determine what data must be protected. Every company has its own security requirements, internal policies, data complexity, and compliance obligations, so every organization will have a unique set of rules that apply to it.

Your job here is to identify the different data classes, identify the risk of data breaches for each class, and determine the extent to which data obfuscation can help reduce the risk.
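
A first pass at identifying data classes can be partly automated by scanning sample rows for values that look sensitive. The sketch below is only a starting point: the regular expressions are deliberately simple assumptions, the 80% threshold is arbitrary, and any match still needs human review.

```python
import re

# Simplified, hypothetical patterns; real detection needs broader rules and validation.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "card_number": re.compile(r"^(?:\d[ -]?){13,19}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_columns(rows: list[dict], threshold: float = 0.8) -> dict[str, str]:
    """Label a column with a data class when most of its values match a pattern."""
    labels: dict[str, str] = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r.get(col) is not None]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                labels[col] = label
                break
    return labels


sample = [
    {"id": "17", "contact": "ana@example.com", "pay": "4111 1111 1111 1111"},
    {"id": "18", "contact": "li@example.org", "pay": "5500-0000-0000-0004"},
]
print(classify_columns(sample))
```

Columns that come back unlabeled are not necessarily safe; free-text fields in particular can hide PII that no pattern will catch.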

Step 2: Test How the Different Obfuscation Types Impact the Application

In step one, you may classify data based on functional classes, business classes, or classes mandated by a compliance standard. A typical scheme, however, sorts data into public, sensitive, and classified.

For classes that need to be protected by obfuscation, you should carefully test how the different types of obfuscation techniques will impact the application. Remember, your business should be able to function normally despite the continuous obfuscation of the data.

Step 3: Perform Data Obfuscation in Practice

In this step, you’ll build a solution to perform data obfuscation in practice and then configure it according to the previously defined data classes and architecture. Your course of action should include the following:

  1. Integrate the data obfuscation component with existing data stores and applications.
  2. Prepare data sets and storage infrastructure to store obfuscated versions of the data safely.
  3. Kickstart the change management process.
  4. Define obfuscation rules for the different types of data.
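
Defining obfuscation rules per data type can be as simple as a mapping from data class to a masking function. This sketch assumes the public, sensitive, and classified classes mentioned earlier; the field names, rule functions, and the choice to treat unknown fields as classified are all illustrative assumptions.

```python
from typing import Callable


def keep(value: str) -> str:
    """Public data passes through unchanged."""
    return value


def mask_all_but_last4(value: str) -> str:
    """Sensitive data keeps only its last four characters."""
    return "#" * max(len(value) - 4, 0) + value[-4:]


def redact(value: str) -> str:
    """Classified data is replaced entirely."""
    return "[REDACTED]"


# One rule per data class.
RULES: dict[str, Callable[[str], str]] = {
    "public": keep,
    "sensitive": mask_all_but_last4,
    "classified": redact,
}

# Hypothetical mapping from field name to data class.
FIELD_CLASSES = {"plan": "public", "phone": "sensitive", "ssn": "classified"}


def obfuscate_record(record: dict[str, str]) -> dict[str, str]:
    # Fields without a known class fall back to the strictest rule.
    return {
        field: RULES[FIELD_CLASSES.get(field, "classified")](value)
        for field, value in record.items()
    }


print(obfuscate_record({"plan": "pro", "phone": "555-0142", "ssn": "078-05-1120"}))
```

Keeping the rules in one table makes them easy to review during change management and easy to extend when a new data class appears.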

Step 4: Test and Deploy Data and Applications

After building the system, the next step on your agenda should be to carefully perform tests on all relevant data and applications. This is a precautionary step to ensure obfuscation is secure and doesn’t hinder business operations.

Testing involves creating one or more test datastores and trying to obfuscate at least one part of the production dataset. Once you’re nearing the deployment stage, you should perform user acceptance testing (UAT) and clearly define the organizational roles that will be responsible for the entire obfuscation process. It’s also crucial to create scripts that automate obfuscation so it becomes part of routine business processes.
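
Automated checks of this kind might look like the sketch below, which verifies two things on a tiny test datastore: that no original sensitive value survives in the masked rows, and that foreign keys still line up after masking. The table layout, helper names, and the HMAC-based pseudonymize function are assumptions for illustration.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # hypothetical key; a real one would come from a secrets store


def pseudonymize(value: str) -> str:
    """Deterministic replacement so the same ID maps to the same masked ID everywhere."""
    return "id_" + hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()[:10]


def leaked_values(original_rows, masked_rows, fields):
    """Return original sensitive values that still appear anywhere in the masked rows."""
    masked_seen = {str(v) for row in masked_rows for v in row.values()}
    return {str(row[f]) for row in original_rows for f in fields if str(row[f]) in masked_seen}


def orphaned_keys(parent_rows, child_rows, key):
    """Return child keys that no longer match any parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return {row[key] for row in child_rows} - parent_keys


# Tiny stand-in for a slice of the production dataset.
customers = [{"customer_id": "C-1001", "email": "ana@example.com"}]
orders = [{"order_id": "O-1", "customer_id": "C-1001"}]

masked_customers = [
    {"customer_id": pseudonymize(c["customer_id"]), "email": pseudonymize(c["email"])}
    for c in customers
]
masked_orders = [{**o, "customer_id": pseudonymize(o["customer_id"])} for o in orders]

assert not leaked_values(customers, masked_customers, ["customer_id", "email"])
assert not orphaned_keys(masked_customers, masked_orders, "customer_id")
print("obfuscation checks passed")
```

Running checks like these as part of the scheduled obfuscation script turns the security and integrity requirements into something that fails loudly when a rule is misconfigured.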
