Unmasking the masked data
At IDUG NA 2024 in Charlotte I attended a great session “Securing Db2 for z/OS Data: Encryption - and Much More” by Robert Catterall from IBM. One of the topics that Robert discussed was Column Masks. This feature has been available in Db2 for z/OS since version 10. I asked myself again, are column masks really secure as they seem to be? A few years ago, I published a blog on how to unmask the masked data. I believe this is the right time to re-publish the simplified article on that topic.
I also asked Robert and Gaya from IBM if they could kindly write a follow up to my article, so that users are aware of the risks and know how to use column masks in a secure way. Stay tuned for their article that will follow mine.
Column masks
Column masks were introduced together with row permissions in Db2 10 for z/OS (respectively in Db2 10.1 for LUW). IBM documentation has the following definition:
“A column mask is a database object that describes a specific column access control rule for a column. In the form of an SQL CASE expression, the rule specifies the condition under which a user, group, or role can receive the masked values that are returned for a column.”
What is particularly important for this blog is the following highlighted note on the same documentation page:
“The application of column masks affects the final output only; it does not impact the operations, such as predicates and ordering, in an SQL statement.”
Read it carefully. Column masks affect the final output only! You need to keep this in mind when using the masks. Let’s start with an example of a column mask followed by an SQL that will show how the aforementioned quote can help to unmask certain types of data.
Example of using column masks
Let’s use the EMP sample table for the illustration.
When you run a SELECT * FROM EMP you'll get the list of all employees with their salaries and bonuses. Now, if you want to mask a bonus (from the column BONUS) that is higher than a certain amount (let’s say higher than 500), you can create a column mask as follows:
create mask bonus_mask on emp
for column bonus
return
case
when (bonus > 500.00) then null
else bonus
end
enable;
The case expression returns the bonus if it is equal or less than 500, or returns NULL otherwise, effectively hiding the high bonuses from the users. However, the column masks are not activated for your table, unless you do so explicitly via ALTER TABLE:
alter table emp activate column access control;
From now on, when you do the SELECT LASTNAME,BONUS FROM EMP ORDER BY BONUS, LASTNAME, you get the following output - the rows where the column mask is applied have the bonus of NULL:
LASTNAME BONUS
---------- ------
...
WONG 500.00
YAMAMOTO 500.00
YOSHIMURA 500.00
BROWN NULL
HENDERSON NULL
JOHN NULL
LUTZ NULL
MARINO NULL
...
LUCCHESI NULL
HAAS NULL
HEMMINGER NULL
Please note that you can order the output by the BONUS column, which means that even though you cannot see the value of the highest bonus, you see who has the highest bonus (although it can be more than one employee). However, we can learn even more!
Unmasking the data
As we learned above, the column masks affect the output only, not the evaluation of the data. It means that we can still use the data (bonus) and with a smart technique we can reveal the secrets. What if we pair the bonus with known values, which can be then displayed instead of the bonus? Does it sound like a join? Correct. We will use an inner join. To do so, we need a second table to be joined with the employee table, what should that be? Bonus is a number and we can easily generate a table of numbers 1 to N using a recursive table expression. If we then join such a table expression with the EMP table on the bonus column, we will clearly see the data.
The unmasking SQL is:
with hack(number) as (
select 1 from sysibm.sysdummy1
union all
select number+1 from hack where number <= 30000)
select lastname, bonus, number as unmasked_bonus from emp, hack where bonus = number order by bonus, lastname;
select lastname, bonus, number as unmasked_bonus from emp, hack where bonus = number order by bonus, lastname;
This query contains a recursive common table expression that builds a table HACK having one column NUMBER with rows that range from 1 to 30000. The HACK table is then joined with the EMP table. What's important is the join condition, which compares the BONUS column we want to reveal with a known value. The selected column then references the HACK table to get the unmasked value. The output is as follows:
LASTNAME BONUS UNMASKED_BONUS
---------- ------ --------------
...
WONG 500.00 500
YAMAMOTO 500.00 500
YOSHIMURA 500.00 500
BROWN NULL 600
HENDERSON NULL 600
JOHN NULL 600
LUTZ NULL 600
MARINO NULL 600
...
LUCCHESI NULL 900
HAAS NULL 1000
HEMMINGER NULL 1000
We can clearly see the data despite the fact the original BONUS column has been masked.
Conclusion
In real life, creating a HACK expression for a certain column type (especially large numbers or characters) would be way more challenging, but do not underestimate the power of brute force. Sometimes you don’t even need to know the exact value, but a range or a close guess is sufficient.
What should be a key takeaway from this blog is to remember that column masks affect the output only. Hence, be careful when using column masks and also make sure to read a follow up article with other hints and tips on how to use column masks securely.