IBM's Power10 Private Cloud Rack for Db2 – Chapter 2: Db2 12.1 Performance Deep Dive

Posted By: Peter Kokosielis Technical Content,

By Peter Kokosielis and Jana Wong

Db2 12.1 Data Warehouse on IBM’s Power 10 Private Cloud Rack (P10 PCR) is a new reference architecture solution to meet the most challenging heavy analytics and operational analytics workloads. This solution is an IBM pre-validated reference architecture tested with some of the heaviest data volumes we could create so that we could bring it to market with confidence.

Let’s first touch a bit about its lineage and purpose. Much of the reference architecture came inspired from the successful IIAS product (IBM Integrated Analytics Solution), an appliance based on IBM Power 8 two socket servers, IBM Flashsystem 900 technology and IBM Db2 11.5. As such, this reference architecture is aimed as our premier landing spot for customers looking to upgrade their IIAS based data warehouses.  The new reference architecture now makes use of IBM Power 10 two socket servers, and IBM Storage FlashSystem 7300 storage and of course the latest generation of Db2 12.1.

This solution is also ideal for any new on-premise data warehouses and not just for upgrades of IIAS workloads.  Not only do the newer generation IBM Power and storage provide a big improvement, but so do many new Db2 features found in 12.1 such as compact block storage and the AI Optimizer that help individual queries run at their best.

In this blog we will share our performance comparison findings and results using various test operations and concurrent workloads to compare between IIAS and Db2 on P10 PCR. 

Important Disclaimer: All measurements and observations in this blog were done in an isolated environment with skilled experts. Your experience on similar performance testing may vary.

 

 

Tale of the Tape

The comparison will be between a full rack IIAS and the Db2 Warehouse on IBM’s P10 PCR BRL (full rack).

A table with data and informationDescription automatically generated with medium confidence

Table 1: System & Hardware Configuration of an IIAS/Sailfish Full Rack and IBM's P10 PCR BRL

 

Performance Test Set #1:  Serial Query Performance

After creating a 100TB database similar to TPC-DS we will run a query set of 100 custom queries back to back.  The set of queries have about 5 that are considered heavy analytics (fact to fact joins and aggregations), 25 that would be considered Intermediate complexity (fac tables joined with dimension tables and aggregation), and 70 that would be considered simple complexity (fact table lookup, or dimension table join). 

The first time it is considered a ‘warmup’ run as the Db2 bufferpool is cold and more IO is expected.  The next 3 loops of the 100 are done immediately after and the total time is taken to complete the 3 loops (and 300 total queries).  As this measures query times, the lower the time, the better!

Figure 1: BDI 100TB Warmup & Serial Run Performance


From the serial performance perspective, we observed 2 to 2.54x improvement on Db2 12.1 on IBM’s P10 PCR vs the baseline IIAS.  This is great news as the serial query test won’t make the most utilization of the increased capacity of the P10 PCR architecture. We will explore that next in the concurrent tests.

 

Performance Test Set #2: Query Concurrency

As mentioned earlier, the test suite comprises a custom set of 100 queries, categorized as follows: 5 complex, 25 intermediate, and 70 simple. These queries are used to define different user concurrency models:

  • Heavy User: Simulates concurrent execution using both complex (5) and intermediate (25) queries.
  • Intermediate User: Focuses solely on the 25 intermediate queries.
  • Simple User: Utilizes only the 70 simple queries.
  • Complex User: Executes only the 5 complex queries.

These user models are incorporated into various test scenarios to evaluate different aspects of system performance and component behavior:

  • 16 Multi-User Run: Assesses the system's ability to handle moderate concurrency using a mix of complex and intermediate queries.
  • 32 Multi-User Run: Doubles the concurrency level to test system scalability and performance under higher load.
  • 100 Simple-User Run: A CPU-intensive test using only simple queries, where most data fits into the buffer pool—ideal for evaluating CPU throughput.
  • 20 Intermediate-User Run: Designed to stress disk I/O using only intermediate queries.
  • 5 Complex-User Run: A disk- and network heavy test focusing exclusively on complex queries to measure disk I/O performance.

 

Here we construct these scenarios based on the same 100TB database.

For purposes of fairness in evaluating the Queries Per Hour (QpH) Throughput score, we ensure all concurrent users are of the same type in a run.  Though this isn’t as realistic as most end user workloads, it is done out of fairness in the evaluation to not give unfair weight to simpler queries and starve heavier ones (or vice-versa) and to make comparison easier.  In each scenario, all users must complete the same amount of work in a shuffled order.

 

Figure 2: BDI 100TB 16- & 32 Heavy user Run Performance


In this first chart, we run the Heavy concurrent user workload at both 16 concurrent users and 32 concurrent users.  Each user is running the top 30 heaviest queries of the whole set. Db2 12.1 on IBM’s P10 PCR scored 3.72 to 4.06x improvement factor over the previous generation IIAS in this test. 

Figure 3: BDI 100TB 100-Simple user Run Performance

        

This second concurrent scenario has 100 users each running a set of 70 simple queries concurrently. Db2 on IBM’s P10 PCR shows a factor 3x performance improvement over the previous generation IIAS on this test.

Figure 4: BDI 100TB 20-intermediate user Run Performance


This next concurrency scenario has 20 clients running the 25 intermediate queries each and in this case Db2 on IBM’s P10 PCR achieves a 7.78x performance improvement over the previous generation IIAS! The improvement margin is much higher than the simple user run and is attributed to not only the increased core count but also the bigger available Db2 memory working set which allows more operations to be run in memory and avoiding use of partial result temp tables at this high scale factor which impacts the IIAS much more.

The last query concurrency test features 5 clients each running the 5 most complex queries in the workload.  In this case Db2 on IBM’s P10 PCR shows a performance improvement of factor 3x over the previous generation IIAS! Temp spilling can’t be avoided at this scale factor, but it is partially reduced, and the storage is faster on the P10 PCR which aids Db2 in faring much better than the predecessor solution.

 

Figure 5: BDI 100TB: 5-complex user Run Performance


The NMON system utilization graphs displayed in the image below illustrate how the five BDI concurrency user tests impose varying levels of stress on different system resources. These graphs provide insights into CPU, disk I/O, and network usage patterns, illustrating which components are most impacted under a specific concurrent workload test scenario.

Figure 6: P10 PCR System Utilization during BDI 100TB Concurrency Runs

 

Performance Test Set #3: Insert, Update, Delete, and Miscellaneous

 

Figure 7: BDI 100TB IUD Performance


This next set of tests target write-heavy operations like insert, update, and delete operations that require write IOs to storage. Insert sub-select (2 variations, one with predicate, one without) of approximately 10.7B rows is improved by ~58% to ~76%.   The update operation is improved by over 44% and the delete operation is improved by over 13%. Finally, the truncate table operation is improved by over 90%.  Various factors at play here including the superior storage and the new default Db2 partition layout (more logical nodes per server) help these operations over the predecessor solution.

One other notable test result is the “Unload Table Partition ALL” test.  This is where every partition dumps out its data into an external table flat file simultanously. Db2 on IBM’s P10 PCR finished this operation on 100TB in about 25 minutes, while on IIAS it took 1hr 37m. This represents an almost 4x speedup!



Sizing and Configurations

The Db2 Warehouse on IBM’s P10 PCR solution is offered in various sizes including multiple rack configurations to fit your IIAS replacement needs or your new modern data warehouse needs.

The performance comparisons earlier in this paper showed the characteristics between a full rack IIAS and the full rack (BRL) Db2 Warehouse on IBM’s P10 PCR. As noted in the charts the performance improvements for the Db2 Warehouse on IBM's P10 PCR solution seen were anywhere from 13% (slightly better) to almost 8x (a lot better) than the IIAS with most operations observed being between 2 to 3x factor. 

Certainly the performance observed in our lab can help in sizing considerations, but performance characteristics are not linear when it comes to data warehouse processing.  Many factors such as relative storage performance, amount of memory for data buffer, optimizer access plans and engine working set can play outsized roles in determining the performance.  Great care should be taken in sizing, especially if a smaller form factor Db2 Warehouse on IBM’s P10 PCR is considered from a bigger form factor IIAS.

The basic form factor equivalency:

IIAS Form Factors

Db2 Warehouse on P10CR Form Factors

M4002-003

Base Rack Small (BRS)

M4002-006

Base Rack Medium (BRM)

M4002-010

Base Rack Large (BRL)

M4002-020

Base Rack Large + Expansion Rack Large (BRL+ERL)

 

Keeping with the form factor equivalency will give you the confidence that your workload will improve.  Going with a smaller form factor than before may also see performance being maintained or even improved, but this should be validated with your own validation testing and in consultation with your IBM representative.


Scaling of Bulk Ingest Operations

The first comparison will be a build of a ~100TB database from raw csv files split and pre-partitioned by the number of expected Db2 logical nodes in the configuration. 

 

Operation

Base Rack Large (BRL)

Base Rack Large +

Expansion Rack Large (BRL+ERL)

Improvement

Factor

External Table Load Partition ALL 100TB
(from raw CSV data files)

1hr, 32m, 21s
(65 TB/hr)

0hr, 34m, 43s
(173 TB/hr)

2.66x

Runstats

(statistic collection)

1hr, 13m, 46s

0hr, 29m, 12s

2.53x


For both the pre-partitioned load and subsequent statistics collection improved by a factor over 2.5x going from a base rack large (BRL) configuration to a base rack large plus expansion rack large (BRL+ERL).  These are linear operations and are in line with the compute resource increase and demonstrate almost perfect scaling.


Once the 100TB database was built, a similar set of test operations were run just like in the IIAS vs Db2 Warehouse on IBM’s P10 PCR (Base Rack Large).

Figure 8: BDI 100TB Scalability Performance BRL to BRL+ERL

All operations observed improved performance with most seeing between 2 to 3x improvement. With these set of tests at these volumes we are confident that the scaling capabilities of the Db2 Warehouse on IBM’s P10 PCR solution will meet the needs of the most demanding workloads.

          

Conclusion and Final Thoughts

The IBM Db2 12.1 Warehouse on IBM’s P10 PCR solution is the latest offering available to meet the most demanding data warehouse workloads. It is a reference architecture built upon the trusted and foundational pieces of IBM Db2 12.1 and the IBM Power 10 processor and Flash System storage that is prescribed and validated by IBM.  This solution reference architecture represents a significant upgrade over the previous IBM Integrated Analytics System (IIAS) and PDOA appliances and will have a familiar feel. Designed for heavy analytics and operational analytics workloads, it also has built-in high availability, and variety of monitoring and disaster recovery (DR) options.   

It has been tested extensively for performance and integration both as a comparison to IIAS and in scaling from BRL to BRL+ERL with improvements in operations measured from 13% to almost 8x with excellent scaling.

Welcome to the next generation of Data Warehousing!


 

About the Authors

Peter Kokosielis is the manager of Db2 Performance Quality Assurance, Db2 Warehouse on P10CR QA and Big Data and Data Virtualization QA. He has extensive experience at IBM in Db2 LUW database performance both in OLTP and Data Warehouse settings along with deep experience in platform exploitation on Power and Intel based processing architectures, hardware accelerators, virtualization and operating systems.

Jana Wong is the principal performance focal for Data Warehouse on-premise solutions at the IBM Silicon Valley Lab, with over 15 years of experience in Databases, SQL, QA, and Project Management. She holds a Master’s in Computer Science from the University of Rostock. Recently, she led the development and automation of a benchmark kit for validating IBM's Power10 Private Cloud Rack and played a key role in evaluating the performance of reference architectures such as IIAS/Sailfish and P10 PCR. Jana can be reached at jfitzge@us.ibm.com.