Questioning Partitioned Table Space Access

Posted By: Craig Mullins Technical Content,

Partitioned table spaces have been with us in the world of Db2 forever, but even as sites work to move all of their table spaces to Universal table spaces, there still exists some confusion "out there" in terms of the behavior of partitioning... 

As a Db2 Gold Consultant I get questions from folks all the time, and I welcome them, even if I can’t always reply to everybody. That said, sometimes a question is asked repeatedly or sparks my desire to share the answer. This is such an occasion.

I recently received the following question (edited slightly for grammar)

If a table is in a partitioned table space, say four partitions, then can we access the table to process each partition separately. For example, can I run four instances of a batch program in parallel, one against each partition?

This type of question exhibits some confusion about how partitioned table spaces actually work. First of all, let's assume that we are talking about classic partitioned table spaces and/or Universal range-partitioned table spaces. And if you are designing this now you should be thinking Universal, as older table space types have been deprecated.

Introduced in Db2 9 for z/OS, Universal table spaces (UTS) combine the best attributes of partitioned and segmented table spaces. IBM is phasing out all of the older types of table spaces so soon all of your table spaces should be UTS. There are 2 types of Db2 Universal Table Spaces:

  1. Partition-by-growth (PBG): The PBG UTS creates new partitions as the amount of data grows without the need to specify key ranges. This type of UTS is beneficial for tables that grow over time and need the additional limits afforded by partitioning but can benefit from the performance of segmented.

  2. Partition-by-range (PBR): The range-partitioned, or PBR UTS requires a key range for partitioning like classic partitioned table spaces. A PBR UTS basically adds segmentation to the existing partitioned table space.

For the purposes of this question, only PBR UTS make sense as there is no delineation between partitions other than size for PBG UTS. OK, so now that we know the type of partitioning that we are discussing, let’s examine the further considerations.

The question asks about running four instances of a batch program in parallel, and yes, you can do this. Of course, that is not necessary to get Db2 to use parallel tasks to read the data. The best approach is to BIND the program (package) specifying the DEGREE(ANY) parameter. Then Db2 will choose the degree of parallelism for the batch program. This is simpler than any alternative approach you might choose to code up because DB2 handles it all for you.

But let’s assume you still wish to run four instances of the batch program. In that case you would probably want to BIND using DEGREE(1). In order for this to work the way I think you intend it to work, however, you will likely have to modify the program somewhat. I assume that you wish each job to process only against one of the four partitions. If you just run 4 instances of the program in parallel how does it know which partition you wish each task to process?

To accomplish this, you must provide some way for the program to identify and process only the data from one of the four partitions based on the partitioning key range specified in the partitioning index. For example, you might choose to input parameters to the batch run specifying the key range for that program to process. If it is partitioned by date, then you need to specify the date range of each partition, for each parallel execution of the program.

As long as the program adheres to that key range then it should only process data from the one partition that holds that data. If the partitioning scheme changes, then you have to change your programs (or at least the input parameters). Alternately, you can have a pseudo key for each partition (say, the partition number, 1 through 4), stored in the data that never changes. Using that for the input parameter would mean you do not have to change anything.

Of course, that means that your programmers must be diligent in applying the input parameter to every SQL statement in the program. Perhaps you could set up 4 views that apply the key range for each partition and then use only those views in each program. But that means you will have multiple versions of the same program.

This means that even though the short answer to this question is “yes you can,” the longer and more accurate answer to the question is “it depends!” Hopefully, this short article does justice to describing what it depends upon!

The Bottom Line

Generally speaking, I think I am like most IT people as I try to avoid extra work when I can. For that reason, if parallelism is important to you, just let Db2 handle it using DEGREE(ANY) when you BIND. That way Db2 will split up the processing into parallel tasks when it makes sense to do so...