The Db2 Instrumentation Facility
Earlier this year, I attended the GSE BeLux Conference in Brussels. Jan Cannaerts of Solidaris gave a really enjoyable presentation on his use of the Db2 instrumentation facility. Here is his presentation in blog form. So put the kettle on, make a cuppa, sit back and have a read...
Marcus Davage
IDUG Content Committee
_______________________________________________
Disclaimer
This project is the result of someone trying to solve a problem the fun way. Often, the fun way is time-consuming in both its creation and its maintenance. Sadly, this makes it not economically viable. The fun way was adopted because I’ve not set up our vendor products correctly, and am using them wrong in general. You could solve the problem described below using off-the-shelf functionality of your vendor products. But that wouldn’t be as fun.
Problem
Our shop runs a number of non-mainframe applications that connect to Db2 through DDF. One of these applications is IBM Datapower Gateway (“Datapower” from here on out). This product exposes an arbitrary number of programmable webservices that allow other applications to consume data from heterogeneous sources without being dependent on executables to communicate with those sources. In the background, Datapower connects to several other sources to fetch information relevant to the webservices it provides. One of those sources is Db2 for z/OS.
Datapower will connect to Db2 for z/OS through DDF using a number of pooled connections. Datapower will round-robin these connections to fetch information whenever one of its webservices is called.
Conventionally, you would distinguish threads based on CorrelationID, ApplicationName, or, in the worst case, the location of the thread. Unfortunately, the Datapower webservices modules aren’t programmed to set the CorrelationID to something that would allow us to differentiate between the different services. In essence, this anonymizes the connections, which has posed problems when trying to determine the source of degradation or unavailability of individual Datapower services. The following image shows a number of threads originating from Datapower, with no way of distinguishing which thread correlates to which service.
But we are in luck. An architectural decision that might spark some controversy exists at our shop: DDF applications may only call stored procedures. How much business logic lives in the application versus in the stored procedure varies from application to application. This decision was made to ensure all SQL is statically bound, but the rationale is outside the scope of this article. Each application, and thus also each Datapower webservice, will have one “entry-point” stored procedure, which may call other stored procedures to perform certain functions.
This now allows us to filter our threads by the package that was executed, in this case the “entry-point” package of a certain service, effectively giving us the ability to isolate individual Datapower services that require data from Db2. The only remaining problem is that the textual, table-based view of this data is very hard for humans to interpret. Is a service being called more or fewer times per second than usual? Is the average CPU time consumed higher or lower? What about the time the thread was active, wait times, getpages? A visual representation would make this easier, and while vendor products allow this, I’m using ours wrong.
A solution
A solution would consist of the following items:
- A way to detect, preferably in real time, when a thread runs that executes a specific package.
- A way to gather information about that execution, such as SQL counts, CPU time, elapsed time, wait times, etc.
- A way to persist that information.
- A way to display that information graphically.
The last two items are relatively simple. We can just use Db2 to persist this information, and use a (publicly available) plugin for Grafana that I made a few years ago to fetch that information from Db2 and display it as nicely colored graphs.
The first two items are not as simple, but certainly doable. The good people developing Db2 for z/OS have incorporated into Db2 what is called the Instrumentation Facility Component, more commonly known as the tracing facility, and with it the Instrumentation Facility Interface. Functionality that is often hidden behind vendor interfaces in other products is available to mere Db2 database administrators.
Instrumentation Facility Component / Tracing facility
The Instrumentation Facility Component is a well-known facility in Db2, used by many shops in many different ways. Commonly, we execute “-START TRACE…” commands to get Db2 to generate trace records, or IFCIDs, when certain events occur, some of these events corresponding to the problem explained in the previous paragraphs. Using this command, we can specify the destination of these traces. Most commonly, we will let the tracing facility write its records to SMF, and use software to post-process that information for sizing exercises, reporting, or charge-back.
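For illustration, a conventional accounting trace to SMF might be started like this (the classes shown are a common choice, not a recommendation):

```
-START TRACE(ACCTG) CLASS(1,2,3) DEST(SMF)
```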
You can also specify that the trace records should be written to a GTF trace, or passed to a routine with destination SRV. This last option is rarely used, and most often only at the direction of IBM when researching a problem.
One of the requirements set above was that we would gather information in real time. Traditionally, with SMF as a destination, this is not possible. With the advent of SMF logstreams and in-memory SMF structures, it has become possible, but there is also a destination that was created for this specific use case.
You can specify OPX or OPn (with n being a number from 1 to 8) as a destination. This instructs the trace facility to write the records relevant to the trace to an in-memory buffer that resides in the Db2 MSTR address space. But how do we get it out?
Instrumentation Facility Interface
With the Instrumentation Facility Component comes the Instrumentation Facility Interface (henceforth IFI), which, as the name implies, allows us to interface with the former. Incidentally, it is also the interface used by the various vendor monitoring solutions.
USING THE INSTRUMENTATION FACILITY INTERFACE
In order to read from the OP buffers, the trace must be started through IFI. Starting a trace with destination OPX or OPn from a console will leave the trace buffer unreachable, with the trace still costing resources. In order to use the interface, your program must meet the following requirements:
- Be attached to Db2 (through CAF, TSO, IMS, CICS, or RRSAF; DDF is not an attach facility).
- Run under an authorization ID that holds MONITOR1 or MONITOR2 authority, depending on which traces you intend to start.
After that, “all” you have to do is call DSNWLI (or DSNWLI2 if you’re using CAF and loading DSNALI dynamically instead of link-editing it statically).
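As a hedged illustration of that attach step, a CAF connect from z/OS C might look roughly like this; the subsystem ID and plan name are hypothetical, and the parameter lists follow the CAF documentation:

```c
/* Hedged sketch of a CAF attach; "DSN1" and "IFIMON" are hypothetical.
   DSNALI can be link-edited or loaded dynamically; OS linkage is required. */
#pragma linkage(DSNALI, OS)
void DSNALI();                       /* no prototype: parameters vary by function */

static char ssid[4] = "DSN1";        /* hypothetical Db2 subsystem ID             */
static char plan[8] = "IFIMON  ";    /* hypothetical plan for the monitor program */
static unsigned int termecb, startecb;
static void *ribptr;                 /* release information block pointer         */

void attach_to_db2(void) {
    DSNALI("CONNECT     ", ssid, &termecb, &startecb, &ribptr);
    DSNALI("OPEN        ", ssid, plan);
    /* the task is now connected to Db2; IFI calls can follow */
}
```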
When calling DSNWLI, you must specify which function of the IFI you want to use. There are four:
- COMMAND: used to execute any Db2 command (except -START DB2). We’re generally interested in the -START TRACE and -STOP TRACE commands, though you could use this in an extremely convoluted way to pass other commands to Db2 as well.
- READS: used to read trace data synchronously. Not all trace data can be read synchronously. To use this function you don’t need to start an additional trace as long as the IFCID you’re interested in is activated through another trace. The IFCIDs that you can read using READS are generally snapshot or interval statistics IFCIDs.
- READA: used to read data asynchronously. When we pass a -START TRACE command to IFI with a destination of OPX or OPn, we can use READA to read the data from the buffer that’s allocated by Db2.
- WRITE: used to insert our own information into the Db2 trace facility. I don’t have any examples of use cases for this function, but the fact that it exists must mean that some installations have some wild things happening.
When calling DSNWLI, there are a couple of recurring data structures that have to be passed:
- IFCA (Instrumentation Facility Communications Area): this is the general data structure that your application uses to communicate with the IFI. It is used both to pass information about your request and to receive information about how IFI processed it. This structure is defined by the DSNDIFCA macro in Db2’s macro library (SDSNMACS).
- Return area: this simpler structure is used by Db2 to pass back the resulting trace buffer contents on a READ request, or the response to a COMMAND request.
- Output area: this misleadingly named structure is used to pass the command you wish to execute on a COMMAND request (despite its name, it is an input to IFI).
COMMAND REQUESTS
For a COMMAND request, the call would look something like this:
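Below is a minimal, hedged sketch in z/OS C; the structure layouts are deliberately left opaque (a real program maps them with the DSNDIFCA and DSNDWBUF macros or their language equivalents), and all names are illustrative:

```c
/* Minimal sketch of an IFI COMMAND request. DSNWLI is resolved from the
   attach facility's language interface module (e.g. DSNALI for CAF). */
#include <string.h>

#pragma linkage(DSNWLI, OS)
void DSNWLI();                           /* no prototype: parameters vary    */

static char          function[8];        /* blank-padded function name       */
static unsigned char ifca[512];          /* map with DSNDIFCA                */
static unsigned char return_area[4096];  /* length word + response area      */
static unsigned char output_area[132];   /* length field + command text      */
static unsigned char buffer_info[64];    /* ECB address + trigger (DSNDWBUF) */
static unsigned int  ecb;                /* POSTed by Db2 when data is ready */

void start_trace(void) {
    memcpy(function, "COMMAND ", 8);
    /* ...set the length and eyecatcher fields in the IFCA, place the
       -START TRACE text in the output area, and store the ECB address and
       posting trigger in the buffer information area... */
    DSNWLI(function, ifca, return_area, output_area, buffer_info);
    /* ...then inspect the return codes in the IFCA (see below) */
}
```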
DSNWLI or DSNWLI2 takes five parameters here: the function to be performed, the IFCA, the return area, the output area, and, in this case, the buffer information area (“buffer-info”).
The function can simply be a literal set to one of the previously discussed functions. In this case “COMMAND”.
Your command text gets passed in the output area. The command text is the same as if you had started the trace manually with another destination. You can select the class and IFCIDs, and filter which data should be written to the buffer by Db2. In our case, we’re filtering to only get records generated by DDF threads (through the PLAN() option, though there are other ways) that are executed with the authorization ID used by Datapower. Lastly, we also get to indicate how big an OP buffer IFI should allocate to contain our trace data. I generally don’t see an interesting reason to go too big with this buffer: your application has to process it eventually, and if trace data is generated faster than your application can process it, no amount of buffering can prevent the inevitable loss of trace data.
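As an illustration, the command text placed in the output area might look like this; the class, authorization ID, and buffer size are hypothetical values, and DISTSERV is the plan name under which DDF threads run:

```
-START TRACE(ACCTG) CLASS(1) DEST(OPX) PLAN(DISTSERV) AUTHID(DPWRUSR) BUFSIZE(1024)
```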
The IFCA will in this case be empty on the initial COMMAND request, save for the mandatory length and eyecatcher fields.
Furthermore, because we’re sending a -START TRACE command with destination OPx, we have to tell IFI about our buffer using the buffer-info structure. The main thing is that we pass the address of an Event Control Block (ECB) that Db2 will (cross-memory) POST when the OPx buffer reaches the number of bytes specified in the trigger field after a READA request has been performed. Our program can then WAIT on this ECB after that READA request, and inspect the IFCA and the application’s buffer for trace information.
The return area will contain the response given by IFI when executing the command, just as Db2 would issue a console message if the command had been entered at a console.
After performing the request, the program can inspect the IFCA to make sure the request was processed successfully. IFCA_RC1 and IFCA_RC2 can be inspected; all return codes are, of course, properly documented. In our case of a -START TRACE command with a destination of OPX, IFI will also inform our program which of the eight buffers it allocated for our trace, in the array_opn_names field. Generally, our programs will only allocate a single buffer, so we need only check the first item in the array. We need to know which OP buffer we were allocated because we must specify it on subsequent READA requests. But since we’re using the same IFCA over and over again, we can simply leave it as-is and not think about it anymore.
In conclusion, after a COMMAND request where we start a trace with destination OPX we will have:
- Two reason codes letting us know if the request succeeded, and if not, how it failed.
- The number of the OP buffer that IFI allocated for us.
- An ECB that IFI will POST whenever sufficient trace data has been gathered in its OP buffer after a subsequent READA request is made.
When you query Db2 for which traces are started using the -DISPLAY TRACE command, you should see something like this:
READA REQUESTS
In order to let IFI know that our monitoring application is ready to process trace data, we should issue a READA request. A READA request is again a DSNWLI or DSNWLI2 call, with three parameters:
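Reusing the declarations from the COMMAND sketch above, a READA request could look roughly like this; wait_on_ecb is a hypothetical helper wrapping the MVS WAIT macro (in practice a small assembler stub or similar):

```c
/* Sketch of a READA request: hand IFI an application buffer to fill
   asynchronously, then wait for Db2 to POST our ECB. */
static unsigned char app_buffer[256 * 1024];

extern void wait_on_ecb(unsigned int *ecb);  /* hypothetical WAIT wrapper   */

void read_trace_data(void) {
    memcpy(function, "READA   ", 8);
    DSNWLI(function, ifca, app_buffer);      /* returns immediately         */
    /* check the return codes in the IFCA, then block until the POST        */
    wait_on_ecb(&ecb);
    /* the bytes-moved and lost-records fields in the IFCA are now valid    */
}
```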
We pass the function to be performed, the IFCA, and the buffer in our application’s address space into which IFI should copy its OP buffer contents.
Again, the function is a literal, in this case “READA”.
The IFCA can be the reused IFCA from the COMMAND call. This way, it will already contain the OP buffer numbers in the array_opn_names array as they were set by IFI on the COMMAND request, as well as the mandatory length and eyecatcher.
Lastly, there’s the application buffer to which IFI should copy the trace data from its own OP buffer.
Because READA is asynchronous, there will be no data in your application buffer immediately after you perform this call. In fact, IFI returns control to your application right away, with ifca_rc1 and ifca_rc2 set to indicate whether the call was successful. After this call, your program should WAIT on the ECB whose address you passed to IFI on the initial COMMAND call that started the trace.
When IFI POSTs the ECB, your program should take action. The buffer that you passed on the READA request now contains the trace data that IFI copied into it. Only now does the IFCA contain relevant values in the bytes_moved and lost_records fields, which tell your application, respectively, how many bytes of trace data were copied to your application buffer and how many records IFI has lost.
At this point, your application can process the buffer and unwind the IFCIDs it contains, saving the information relevant to your use case. If you need help unravelling buffers and IFCID records, I can provide a presentation I gave in 2020 at our local GSE Db2 Belux user group, where I go into the ADMIN_INFO_IFCID stored procedure and how to unwrap the contents of its return buffers.
You can now start looping this process, issuing READA requests until some external action tells your application to stop. Before your application ends, it’s best practice to first perform a COMMAND request with the appropriate -STOP TRACE command.
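Again reusing the earlier declarations, that closing COMMAND request could be sketched like this; the trace number is hypothetical and would come from -DISPLAY TRACE or from the IFCA returned on the -START TRACE request:

```c
/* Stop the trace before exiting. Buffer-info is only needed when starting
   a trace to an OPn destination, so it is omitted on this call. */
void stop_trace(void) {
    memcpy(function, "COMMAND ", 8);
    /* ...place "-STOP TRACE(ACCTG) TNO(05)" in the output area... */
    DSNWLI(function, ifca, return_area, output_area);
}
```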
Here is a visual representation of the process flow:
The keen-eyed among you will have already spotted an issue: what if IFI’s OP buffer is full before your application is done processing the previous application buffer? Well, IFI will wrap around on its OP buffer, and you will lose data. On the next READA request, IFI will indicate the number of records that were overwritten, or lost, in the lost_records field. To prevent this, what you should really do is separate the READA requests and the processing of the application buffer into different threads, and have multiple application buffers that you can rotate through. This way, the processing thread can work on an application buffer containing records copied by IFI, while the other thread issues a new READA request to IFI with a different buffer, as in the sketch below. While this certainly complicates things, it’s pretty much a necessity; its full implications, however, are outside the scope of this article.
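A hedged sketch of that buffer rotation, assuming POSIX threads and two hypothetical helpers: issue_reada_and_wait (the READA call plus the ECB WAIT) and unwind_ifcids (the record parser):

```c
/* One thread keeps IFI supplied with empty buffers; the other processes
   filled ones. A simple ring of buffers connects the two. */
#include <pthread.h>
#include <stddef.h>

#define NBUF  4
#define BUFSZ (256 * 1024)

extern size_t issue_reada_and_wait(unsigned char *buf, size_t len);
extern void   unwind_ifcids(unsigned char *buf, size_t len);

static unsigned char bufs[NBUF][BUFSZ];
static size_t lens[NBUF];
static int head = 0, tail = 0, count = 0;   /* ring of filled buffers       */
static pthread_mutex_t mtx       = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void *reader(void *arg) {                   /* issues READA requests        */
    for (;;) {
        pthread_mutex_lock(&mtx);
        while (count == NBUF) pthread_cond_wait(&not_full, &mtx);
        int i = head;
        pthread_mutex_unlock(&mtx);

        lens[i] = issue_reada_and_wait(bufs[i], BUFSZ);

        pthread_mutex_lock(&mtx);
        head = (head + 1) % NBUF; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&mtx);
    }
}

void *processor(void *arg) {                /* unwinds filled buffers       */
    for (;;) {
        pthread_mutex_lock(&mtx);
        while (count == 0) pthread_cond_wait(&not_empty, &mtx);
        int i = tail;
        pthread_mutex_unlock(&mtx);

        unwind_ifcids(bufs[i], lens[i]);

        pthread_mutex_lock(&mtx);
        tail = (tail + 1) % NBUF; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&mtx);
    }
}
```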
Back to our problem
Now that we know that the Instrumentation Facility Component can write IFCIDs to trace buffers when certain events occur, and that we can use the Instrumentation Facility Interface to extract those IFCIDs for processing, we need to know which IFCIDs we can use for our particular problem. Lots of material already exists on this topic, not least some memorable presentations at IDUG EMEA events I have attended over the years.
We are lucky yet again! Our Datapower services are also subject to another architectural decision: they may only commit once, after the entry-point stored procedure returns to the caller. Usually this is done by closing the connection. This means that each and every Datapower service call that connects to Db2 will correlate to a single transaction. To trace transactions, we can activate accounting trace class 1, which instructs IFI to write an IFCID 3 trace record every time a transaction ends. Note, however, that in the case of DDF threads, in order to get a single IFCID 3 record for each transaction, you must not roll up DDF accounting records: set the ACCUMACC subsystem parameter to NO.
Sadly, we are not completely in luck: the record is written for every transaction, so we would still need a way to distinguish between IFCID 3 records written by different Datapower services. The -START TRACE command allows us to filter on PKGPROG(), which at first glance would let us just specify the “entry-point” package of the stored procedure that is called first. Alas, the PKGPROG() filter makes IFI look at the package that is active at the time of the commit that ends the transaction. In our case, that is the NULLID package through which Datapower issues the “call procedure” statement. This doesn’t match the entry-point package, so we cannot use PKGPROG() here.
IFCID 3 – Transaction accounting
Nevertheless, IFCID 3 contains an absolute wealth of information that corresponds with our requirements: information about CPU usage, locking, logging, SQL counts, wait times, accelerator usage, buffer pool usage, and much more. All of these are immensely useful when trying to diagnose service degradation, especially if you can view this information temporally. If only there was an equivalent on the package level…
IFCID 239 – Package level accounting
And there is: IFCID 239, which can be activated by starting accounting trace class 7. It even has the same layout as IFCID 3, meaning that it provides the same wealth of information. When activated, IFI will externalize an IFCID 239 record for each package that ran in a transaction, along with the IFCID 3 record for that transaction. The problem, however, is that each IFCID 239 record contains data relevant only to that package, not to the entire transaction.
This is a problem because our entry-point stored procedure usually only delegates; it represents only a fraction of the CPU usage, SQL counts, and other metrics we’re interested in. If we were to look only at the entry-point stored procedure’s IFCID 239 record, we could still monitor call rates, but not much more.
Combining IFCID 3 and IFCID 239
By using both records, we should be able to receive an IFCID 3 record from IFI, and by correlating the IFCID 239 records that were written along with that IFCID 3 record, see if that transaction has executed a certain entry-point stored procedure.
When I was first researching, it seemed that IFI would always externalize all the relevant IFCID 239 records for a transaction, followed by the IFCID 3 record for that transaction:
In the image above, if we’re interested in transactions that ran package A, we could cruise through all IFCID 239 records, and whenever we saw one corresponding to a package of interest, we’d wait for the first subsequent IFCID 3 record and know that it belongs to a transaction that ran that package. This way, we’d be able to filter the trace data in our application, and use the IFCID 3 record to get the metrics for the sum of all packages that ran in that transaction.
However, this is not actually how it works. Sometimes, the IFCID 239 and IFCID 3 records are out of order:
An exaggeration is shown in the image above. But as a result, we cannot use the order in which the IFCIDs are externalized to correlate IFCID 239 to IFCID 3 records.
A lot of researching eventually turned up nothing, so I had to reach out to our own GSE Db2 Belux IBM liaison Bart Steegmans, who managed to provide me with a (more than) complete answer in less than 12 hours, 8 of which he was supposed to spend sleeping.
In short: both IFCID 3 and IFCID 239 records have a correlation header in their product sections. This correlation header contains two fields of interest: the STCK timestamp of when IFI externalized the record to its OP buffer (QWHSSTCK), and the address of the Agent Control Element (QWHSACE) that serviced the transaction. An Agent Control Element will service many transactions throughout its lifetime, but never more than one at the same time. This makes the combination of the STCK and the ACE address unique for a single transaction, and both are present in the aforementioned correlation headers, allowing us to correlate an IFCID 3 record to its IFCID 239 records.
To apply this information, what we now need is a queue structure onto which we can push the STCK and ACE address combination of each IFCID 239 record written for a package we’re interested in. For each IFCID 3 record that we process, we then search that queue for its combination of STCK and ACE address. When we find the combination, we know that we previously processed an interesting IFCID 239 record, and that we should therefore be interested in this IFCID 3 record as well; afterwards, we pop the combination from the queue. If we do not find the combination on the queue, there was no IFCID 239 record for a package we were interested in, and we can discard the IFCID 3 record.
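A minimal sketch of that queue in C; the field sizes follow the correlation header (an 8-byte STCK and a 4-byte ACE address), and a real program would bound the list and age out stale entries:

```c
/* Correlating IFCID 239 and IFCID 3 records on (QWHSSTCK, QWHSACE). */
#include <stdint.h>
#include <stdlib.h>

typedef struct key_node {
    uint64_t stck;               /* QWHSSTCK from the correlation header   */
    uint32_t ace;                /* QWHSACE: agent control element address */
    struct key_node *next;
} key_node;

static key_node *head = NULL;

/* called for each IFCID 239 record whose package we care about */
void remember(uint64_t stck, uint32_t ace) {
    key_node *n = malloc(sizeof *n);
    if (n == NULL) return;       /* drop the entry on allocation failure   */
    n->stck = stck; n->ace = ace; n->next = head;
    head = n;
}

/* called for each IFCID 3 record; pops the entry and returns 1 on a match */
int match_and_pop(uint64_t stck, uint32_t ace) {
    key_node **pp = &head;
    for (; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->stck == stck && (*pp)->ace == ace) {
            key_node *hit = *pp;
            *pp = hit->next;
            free(hit);
            return 1;            /* transaction ran a package of interest  */
        }
    }
    return 0;                    /* no match: discard this IFCID 3 record  */
}
```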
And that’s basically all we needed to implement a solution for our problem that only existed because I didn’t properly use our vendor software!
Additional implementation details
I’m going to close off this article with some more information about my implementation.
We don’t need to persist every IFCID 3 record to Db2. That would be overly expensive in CPU consumed and storage used. Furthermore, it would require additional processing before we could display the information graphically. My implementation sums the different counters over a configurable period, generally 10 seconds. Before the data is persisted, it is averaged, while the low and high watermarks are saved in the same record. This reduces the amount of data to an acceptable level without losing too much granularity.
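A sketch of that interval aggregation for a single counter; in reality the sum/min/max trio is repeated for every metric of interest, and the flush is whatever INSERT your persistence layer performs:

```c
/* Fold each matched IFCID 3 into the current interval's bucket; every
   interval (e.g. 10 seconds) the bucket is written to Db2 and reset. */
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t count;
    uint64_t cpu_sum, cpu_min, cpu_max;  /* repeated per metric in reality */
} interval_bucket;

void fold(interval_bucket *b, uint64_t cpu) {
    if (b->count == 0) {
        b->cpu_min = b->cpu_max = cpu;
    } else {
        if (cpu < b->cpu_min) b->cpu_min = cpu;
        if (cpu > b->cpu_max) b->cpu_max = cpu;
    }
    b->cpu_sum += cpu;
    b->count++;
}

void flush(interval_bucket *b) {
    /* hypothetical: INSERT count, cpu_sum / count (the average), cpu_min
       and cpu_max into the monitoring table, then start a fresh interval */
    memset(b, 0, sizeof *b);
}
```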
For our specific problem, you don’t have to start a trace for every individual package you want to trace: each trace would contain the same records, since we can’t filter on package anyway, and receiving the same data multiple times is wasteful. Lastly, there is only a finite number of OP buffers that IFI can allocate, namely 8, some of which will undoubtedly already be in use for tracing accounting and audit records to SMF, or by your vendor monitoring product tracing any number of IFCIDs to an OP buffer.
You can go even further than transaction- or package-level tracing, and trace individual statements. To do this, you enable a monitor trace that activates IFCID 58. The trace records contain the information needed to fetch the statement text from SYSPACKSTMT. If the statement is dynamic, you can perform an IFI READS request to fetch the statement text from the dynamic statement cache. The resulting IFCID gives you much of the information you find at the transaction or package level, but for a single statement. This allows you to profile individual statements in a package to find a potential culprit, while providing information about why it might be underperforming.
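For completeness, a hedged sketch of the READS call shape, reusing the earlier declarations; the qualification area is mapped by the DSNDWQAL macro, and the IFCID-area layout is abbreviated here:

```c
/* READS: a synchronous read; the records arrive in the return area as soon
   as the call returns, so no ECB or WAIT is involved. */
static unsigned char ifcid_area[8];   /* length field + list of IFCIDs      */
static unsigned char qual_area[300];  /* filters; map with DSNDWQAL         */

void read_statement_cache(void) {
    memcpy(function, "READS   ", 8);
    /* ...list e.g. IFCID 316/317 (dynamic statement cache) in ifcid_area
       and set any filters in qual_area... */
    DSNWLI(function, ifca, return_area, ifcid_area, qual_area);
}
```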
To visualize the data that is gathered by the monitoring application, I use a Grafana plugin that I developed some years ago to allow Grafana to use Db2 as a datasource. This plugin is freely available (but perhaps hard to find), though I’m always happy to provide it to those who are interested. Since I created this plugin, the developers at Grafana Labs have released their own Db2 datasource plugin, though it’s a paid feature. Using Grafana, I can turn our monitoring data into nice graphs, though you can use any data visualization solution you may have at your disposal (Microsoft Excel with a Db2 datasource works just fine!).
Examples
Here is an example of the trace data displayed for the use case outlined above. The numbers are for entire transactions, but only for transactions in which a certain package (in this case SGX8834) has run:
For the next example, the use case was a single stored procedure that provides some GDPR logging. Because it’s a single procedure being called by a set of heterogeneous applications, we’re not interested in the transaction statistics, but solely in the stored procedure that performs the GDPR logging. To be more specific, we were interested in the overhead it created for the other applications. To measure the impact, we first redeployed the applications while an empty version of the stored procedure was active, and later swapped in the live stored procedure with an ALTER PROCEDURE ACTIVATE VERSION command, while monitoring the resources that the new stored procedure consumed across all transactions. If you look closely, you can see the average elapsed time and the average CPU time per call go up in the two bottom graphs as the live procedure is activated.
Readers can find the buildable source at https://github.com/jcnnrts/db-2-datasource-v-8 (still compatible with the current version of Grafana). I can provide a built version (equally free of charge of course), but on individual request, since I prefer people knowing what they expose their Db2 for z/OS instances to.
Jan Cannaerts
Jan.Cannaerts@solidaris.be
Systems Analyst – Mainframe Systems – IT Directorate
Solidaris