Unmasking the masked data

Posted By: Emil Kotrc Technical Content,

At IDUG NA 2024 in Charlotte I attended a great session “Securing Db2 for z/OS Data: Encryption - and Much More” by Robert Catterall from IBM. One of the topics that Robert discussed was Column Masks. This feature has been available in Db2 for z/OS since version 10. I asked myself again, are column masks really secure as they seem to be? A few years ago, I published a blog on how to unmask the masked data. I believe this is the right time to re-publish the simplified article on that topic.

I also asked Robert and Gaya from IBM if they could kindly write a follow up to my article, so that users are aware of the risks and know how to use column masks in a secure way. Stay tuned for their article that will follow mine.

Column masks

Column masks were introduced together with row permissions in Db2 10 for z/OS (respectively in Db2 10.1 for LUW). IBM documentation has the following definition:

“A column mask is a database object that describes a specific column access control rule for a column. In the form of an SQL CASE expression, the rule specifies the condition under which a user, group, or role can receive the masked values that are returned for a column.”

What is particularly important for this blog is the following highlighted note on the same documentation page:

“The application of column masks affects the final output only; it does not impact the operations, such as predicates and ordering, in an SQL statement.”

Read it carefully. Column masks affect the final output only! You need to keep this in mind when using the masks. Let’s start with an example of a column mask followed by an SQL that will show how the aforementioned quote can help to unmask certain types of data.

Example of using column masks

Let’s use the EMP sample table for the illustration.

When you run a SELECT * FROM EMP you'll get the list of all employees with their salaries and bonuses. Now, if you want to mask a bonus (from the column BONUS) that is higher than a certain amount (let’s say higher than 500), you can create a column mask as follows:

create mask bonus_mask on emp

  for column bonus

  return

    case

      when (bonus > 500.00) then null

      else bonus

    end

  enable; 

The case expression returns the bonus if it is equal or less than 500, or returns NULL otherwise, effectively hiding the high bonuses from the users. However, the column masks are not activated for your table, unless you do so explicitly via ALTER TABLE:

alter table emp activate column access control;

From now on, when you do the SELECT LASTNAME,BONUS FROM EMP ORDER BY BONUS, LASTNAME, you get the following output - the rows where the column mask is applied have the bonus of NULL:

LASTNAME   BONUS

---------- ------

...

WONG       500.00

YAMAMOTO   500.00

YOSHIMURA  500.00

BROWN        NULL

HENDERSON    NULL

JOHN         NULL

LUTZ         NULL

MARINO       NULL

...

LUCCHESI     NULL

HAAS         NULL

HEMMINGER    NULL

 

Please note that you can order the output by the BONUS column, which means that even though you cannot see the value of the highest bonus, you see who has the highest bonus (although it can be more than one employee). However, we can learn even more!

Unmasking the data

As we learned above, the column masks affect the output only, not the evaluation of the data. It means that we can still use the data (bonus) and with a smart technique we can reveal the secrets. What if we pair the bonus with known values, which can be then displayed instead of the bonus? Does it sound like a join? Correct. We will use an inner join. To do so, we need a second table to be joined with the employee table, what should that be? Bonus is a number and we can easily generate a table of numbers 1 to N using a recursive table expression. If we then join such a table expression with the EMP table on the bonus column, we will clearly see the data.

The unmasking SQL is:

 

with hack(number) as (                          

  select 1 from sysibm.sysdummy1         

  union all                                 

  select number+1 from hack where number <= 30000)

select lastname, bonus, number as unmasked_bonus from emp, hack where bonus = number order by bonus, lastname;

select lastname, bonus, number as unmasked_bonus from emp, hack where bonus = number order by bonus, lastname;     

This query contains a recursive common table expression that builds a table HACK having one column NUMBER with rows that range from 1 to 30000. The HACK table is then joined with the EMP table. What's important is the join condition, which compares the BONUS column we want to reveal with a known value. The selected column then references the HACK table to get the unmasked value. The output is as follows:

LASTNAME   BONUS  UNMASKED_BONUS

---------- ------ --------------

...

WONG       500.00            500

YAMAMOTO   500.00            500

YOSHIMURA  500.00            500

BROWN        NULL            600

HENDERSON    NULL            600

JOHN         NULL            600

LUTZ         NULL            600

MARINO       NULL            600

...

LUCCHESI     NULL            900

HAAS         NULL           1000

HEMMINGER    NULL           1000

We can clearly see the data despite the fact the original BONUS column has been masked.

Conclusion

In real life, creating a HACK expression for a certain column type (especially large numbers or characters) would be way more challenging, but do not underestimate the power of brute force. Sometimes you don’t even need to know the exact value, but a range or a close guess is sufficient.

What should be a key takeaway from this blog is to remember that column masks affect the output only. Hence, be careful when using column masks and also make sure to read a follow up article with other hints and tips on how to use column masks securely.