Modern SQL tips and why to use them
The following is a collection of SQL tips and reminders about modern SQL that some people may have forgotten. These SQL tips are reminders that are not magic but some people may have forgotten or not considered the implication or minor benefit. Hence this article!
If you have additional modern SQL or tips then please add a “comment” below or email me!
-
Limit SELECT result set size with LIMIT
-
SQL Pagination - avoids programmatically dealing with result sets
-
GROUP BY GROUPING SETS to produce result sets that look like a simple report
-
COALESCE can turn ugly NULL into a pretty displayed result
-
Match on COLUMN sets -> prettier? Easier to read? (subjective but I say yes)
-
Simplify WHERE clause with COLUMN SETS -> start in the middle of the result set
-
When we have multiple tables, we can now easily select all COLS from ONE table
-
COMMON TABLE EXPRESSION (CTE) can change big ugly SQL to be easier to read
-
Set operators of EXCEPT and INTERSECT are forgotten but can be useful
-
MERGE - powerful DML!
-
SELECT from final table
Full disclosure: This article is based upon my 2023 presentation that you can see at IDUG NA and IDUG EMEA. Come to Prague! The presentation is “DB2 SQL go beyond the usual My current TOP 40 SQL tips tricks n opinions”
(1) Limit SELECT result set size with LIMIT
Sometimes we explicitly limit our SELECT result set with the final SQL clause of “FETCH FIRST 999 ROWS ONLY”.
> This clearly a good idea if there is a risk of an overly large result set being sent to the program. Especially if the program expecting the result is on your laptop and you don’t accidentaly need a gazillion rows sent there!
But typing “FETCH FIRST 999 ROWS ONLY” is so many tedious characters to type.
We can now use “LIMIT 999”. So much easier. Same result.
-- LIMIT RESULT SET TO 10 ROWS
-- USING FETCH FIRST NN ROWS ONLY
-- > SO MANY ANNOYING CHARACTERS TO TYPE
SELECT *
FROM EMPLOYEE
ORDER BY LASTNAME, FIRSTNME
FETCH FIRST 10 ROWS ONLY
;
-- LIMIT RESULT USING LIMIT KEYWORD
-- > SO EASY TO TYPE. SO EASY TO UNDERSTAND
SELECT *
FROM EMPLOYEE
ORDER BY LASTNAME, FIRSTNME
LIMIT 10
;
If you read the IBM reference documentation (you should always double-check the proper IBM reference documentation and not trust me), the documentation shows all the acceptable variations of syntax.
(2) SQL Pagination - avoids programmaticly dealing with result sets
If you have the requirement to look at the middle of a result set then SQL can easily place you in the middle. This is so much easier than having your program do it.
I like to use LIMIT with OFFSET. I find it more obvious when reading and reviewing. But it can be done with LIMIT alone and a comma (for those who really like to minimize typing!)
-- USING LIMIT KEYWORD ALONE TO SKIP
-- FIRST 10 ROWS IN RESULT, THEN LIMIT TO
-- THE NEXT 10 ROWS
SELECT *
FROM EMPLOYEE
ORDER BY LASTNAME, FIRSTNME
LIMIT 10, 10
;
-- USING LIMIT WITH OFFSET TO CLEARLY
-- SKIP 10 ROWS AND ONLY FETCH THE NEXT 10 ROWS
SELECT *
FROM EMPLOYEE
ORDER BY LASTNAME, FIRSTNME
LIMIT 10 OFFSET 10
;
Db2 LUW
https://www.ibm.com/docs/en/db2/11.5?topic=subselect-fetch-clause
https://www.ibm.com/docs/en/db2/11.5?topic=subselect-offset-clause
Db2 Z
https://www.ibm.com/docs/en/db2-for-zos/12?topic=subselect-fetch-clause
https://www.ibm.com/docs/en/db2-for-zos/12?topic=subselect-offset-clause
(3) GROUP BY GROUPING SETS to produce result sets that look like a simple report
We probably all use GROUP BY clause to have the SQL summarize table data for us. For example, to sum values or to find a max or min.
But we can also use GROUP BY GROUPING SETS to summarize the same raw data into other levels. The GROUP BY finds the values that are of interest to you but with GROUPING SETS you can also find the overall value for all interesting combinations of columns. It all shows up in the result set. This can save the hassle of writing a separate query to find the overall total or (even worse) to import into EXCEL to find the overall total.
And yes, I know advanced users might use GROUP BY ROLLUP or GROUP BY CUBE.
But to be honest, I prefer to use GROUP BY GROUPING SETS and just specify the column combination that I really really want.
-- PRODUCE SIMPLE SQL RESULT SET REPORT
-- TO SHOW TOTAL SALARY BY DEPT
-- AND USING GROUP SETS () TO SHOW FOR WHOLE COMPANY
SELECT WORKDEPT, SUM(SALARY) AS TOT_SALARY
FROM EMPLOYEE
GROUP BY GROUPING SETS ((WORKDEPT),())
ORDER BY WORKDEPT
;
Db2 LUW
https://www.ibm.com/docs/en/db2/11.5?topic=subselect-group-by-clause
Db2 Z
https://www.ibm.com/docs/en/db2-for-zos/12?topic=subselect-group-by-clause
Db2 12 - Db2 SQL - Examples of grouping sets, rollup, and cube queries
(4) COALESCE can turn ugly NULL into a pretty displayed result
I often write SQL whose results I want to quickly email or share with something. But depending upon the tool to display the result, the “null” can be displayed in a non-intuitive manner. So depending upon the needs of how I want the result to look, I often use COALESCE to turn NULL into a pretty textual output display
For example, in point (3) above, the following SQL produces the following result
SELECT WORKDEPT, SUM(SALARY) AS TOT_SALARY
FROM EMPLOYEE
GROUP BY GROUPING SETS ((WORKDEPT),())
ORDER BY WORKDEPT
;
But the COALESCE can make the null pretty:
-- PRODUCE SIMPLE SQL RESULT SET REPORT
-- TO SHOW TOTAL SALARY BY DEPT
-- AND USING GROUP SETS () TO SHOW FOR WHOLE COMPANY
-- > USE COALESCE TO MAKE OUTPUT TEXT PRETTY TO READ
SELECT COALESCE(WORKDEPT,'-all-') AS WRKDETP
, SUM(SALARY) AS TOT_SALARY
FROM EMPLOYEE
GROUP BY GROUPING SETS ((WORKDEPT),())
ORDER BY WORKDEPT
;
Minor digression.
The following SQL works in Db2 Z and puts the ‘-all-’ row at the top. I like that. But Db2 LUW fails with SQLCODE -206
Odd.
Must be something about the ORDER BY. I will have to think about this later.
-- SAME AS THE PREVIOUS QUERY, BUT ORDER BY DERIVED COL NAME IN RESULT
SELECT COALESCE(WORKDEPT,'-all-') AS WRKDETP
, SUM(SALARY) AS TOT_SALARY
FROM EMPLOYEE
GROUP BY GROUPING SETS ((WORKDEPT),())
ORDER BY WRKDEPT
;
WRKDETP|TOT_SALARY|
-------+----------+
-all- |1152525.00|
A00 | 204250.00|
B01 | 41250.00|
C01 | 118890.00|
D11 | 276620.00|
D21 | 179680.00|
E01 | 40175.00|
E11 | 147140.00|
E21 | 144520.00|
Db2 LUW
https://www.ibm.com/docs/en/db2/11.5?topic=functions-coalesce
Db2 Z
Another useful example of GROUPING SETS and COALESCE in Db2 Z.
In Db2 Z, I often query the super-useful real-time-statistics tables to find quick and easy meta-info about my Db2 and databases.
The following query shows the total “SPACE” (allocated space in KB) of my tablespaces in DBNAME, plus the overall total for the whole Db2. Very interesting.
SELECT COALESCE(DBNAME,'-ALL') AS DBNAME
, SUM(SPACE) AS SPACE
FROM SYSIBM.SYSTABLESPACESTATS
WHERE 1=1
GROUP BY GROUPING SETS ((DBNAME),())
ORDER BY SPACE DESC
;
DBNAME SPACE
-------- ----------
-ALL 2147037664
DIL01P 535818992
DYR01P 527364512
DIL04P 177562720
DSNDB07 166626432
DCL01P 131756736
DYB01P 122052960
I must comment more about real-time-statistics (do not confuse with static statistics).
The RTS is super useful and super accurate.
As always, read the IBM reference documentation to confirm the catalog table column definitions (as one should always re-read documentation)
https://www.ibm.com/docs/en/db2-for-zos/12?topic=tables-systablespacestats
SPACE is allocated space in KB. It is perfectly accurate (almost all the time - the exceptions are when you do dataset movement outside of Db2 - like DSN1COPY).
And of course, we know Db2 allocates 48K per track. So divide SPACE by 48 to see allocated tracks. The “track” amount will match what you see in ISPF DSLIST panel. (I always look at my Db2 allocated datasets in DSLIST - just to confirm stuff). I am happy when DSLIST and the RTS line-up exactly.
DATASIZE is the size of data inside the allocated space. But in bytes! Not in KB
> Obviously, between SPACE and DATASIZE you can figure how full is your tablespace or dbname and if it is over-allocated
SELECT COALESCE(DBNAME,'-ALL') AS DBNAME
, SUM(SPACE)/1024 AS SPACE_MB
, SUM(SPACE)/48 AS ALLOCATED_TRACKS
, SUM(DATASIZE)/1024/1024 AS DATASIZE_MB
, SUM(TOTALROWS) AS TOTALROWS
FROM SYSIBM.SYSTABLESPACESTATS
WHERE 1=1
GROUP BY GROUPING SETS ((DBNAME),())
ORDER BY SPACE DESC
;
(5) Match on COLUMN sets -> prettier? Easier to read? (subjective but I say yes)
When matching rows between tables, there might be many columns that need to line up. Traditionally, we compare each column directly with the opposite column in the other table. We join all the columns with AND. This is fine and functional. But it gets tedious when there are many columns that need to be lined up.
Modern SQL allows you to join on SETS of columns grouped within brackets.
I find this easier to read and logical.
The benefit becomes even more useful the more columns you need to join up.
Below is a simple Db2 LUW example against some SYSCAT tables.
But imagine all the times you write SQL that does a join. COLUMN sets will often make the SQL easier to read.
-- TRADITIONAL WAY TO JOIN COLUMNS BETWEEN 2 TABLES
SELECT I.TABSCHEMA , I.TABNAME, I.INDSCHEMA
, I.INDNAME, I.INDEXTYPE
, c.COLSEQ , c.COLORDER , c.COLNAME
FROM SYSCAT.INDEXES I
INNER JOIN SYSCAT.INDEXCOLUSE C
ON I.INDSCHEMA = C.INDSCHEMA
AND I.INDNAME = C.INDNAME
WHERE 1=1
-- AND I.TABSCHEMA = CURRENT SCHEMA
-- AND I.TABSCHEMA = 'DPB74213'
;
-- MODERN SQL - JOIN 2 TABLES USING COLUMN SETS
SELECT I.TABSCHEMA , I.TABNAME, I.INDSCHEMA
, I.INDNAME, I.INDEXTYPE
, c.COLSEQ , c.COLORDER , c.COLNAME
FROM SYSCAT.INDEXES I
INNER JOIN SYSCAT.INDEXCOLUSE C
ON (I.INDSCHEMA, I.INDNAME) = (C.INDSCHEMA, C.INDNAME)
WHERE 1=1
-- AND I.TABSCHEMA = CURRENT SCHEMA
-- AND I.TABSCHEMA = 'DPB74213'
;
(6) Simplify WHERE clause with COLUMN SETS -> start in the middle of the result set
If you want your WHERE to be clever and begin after a complex WHERE of many columns whose values are relative to each other then you classically wrote some wonky and slightly hard-to-read SQL.
For example, you know a person LASTNAME and FIRSTNME and MIDINIT. And you want the result to begin after that known row.
So let us say our classic IBM-supplied sample table of EMP, it has a row for employee DANEIL S SMITH. (below is a partial view of the contents of EMP)
LASTNAME |FIRSTNME |MIDINIT|WORKDEPT|SALARY |
----------+---------+-------+--------+--------+
ADAMSON |BRUCE | |D11 |25280.00|
ALONZO |ROY |R |E21 |23840.00|
...
SETRIGHT |MAUDE |F |E11 |15900.00|
SMITH |DANIEL |S |D21 |19180.00|
SMITH |PHILIP |X |E11 |17750.00|
SPENSER |THEODORE |Q |E21 |26150.00|
...
-- classic/traditioanl/convulated SQL
-- to get next EMPLOYEE after DANIEL S SMITH
SELECT LASTNAME, FIRSTNME, MIDINIT, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE (LASTNAME = 'SMITH' AND FIRSTNME = 'DANIEL' AND MIDINIT > 'S')
OR (LASTNAME = 'SMITH' AND FIRSTNME > 'DANIEL')
OR LASTNAME > 'SMITH'
ORDER BY LASTNAME, FIRSTNME, MIDINIT
;
-- MODERN SQL TO MATCH ON SETS OF COLUMNS
-- let the datbase engine figure it out and do the work
SELECT LASTNAME, FIRSTNME, MIDINIT, WORKDEPT, SALARY
FROM EMPLOYEE
WHERE (LASTNAME, FIRSTNME, MIDINIT) > ('SMITH','DANIEL','S')
ORDER BY LASTNAME, FIRSTNME, MIDINIT
;
Both of the above SQL produce the same result! But one is easier to read than another.
But the handy thing I like about the matching on column sets is that it is easy to read and logical. We let the database engine figure it out and it does it right. The convoluted traditional old-fashioned SQL had a risk of error (syntax or logical).
LASTNAME |FIRSTNME |MIDINIT|WORKDEPT|SALARY |
---------+---------+-------+--------+--------+
SMITH |PHILIP |X |E11 |17750.00|
SPENSER |THEODORE |Q |E21 |26150.00|
SPRINGER |MICHELLE |F |E11 |15900.00|
STERN |IRVING |F |D11 |32250.00|
THOMPSON |MICHAEL |L |B01 |41250.00|
WALKER |JAMES |H |D11 |20450.00|
WONG |HELENA | |E21 |25370.00|
YAMAMOTO |KIYOSHI | |D11 |24680.00|
YOSHIMURA|MASATOSHI|J |D11 |24680.00|
(7) When we have multiple tables, we can now easily select all COLS from ONE table
I did not realize this convenience until recently. Hence I am sharing with you today.
When developing SQL, I often begin with the simple SELECT * and just get all the columns from all the tables. Later, I begin to code specific columns in the SELECT list.
However, we can select specific columns from one table and easily select all columns from another table.
I find this helps especially when I add tables to my join. I can select all the columns from the newly added table and then quickly look at the result and check what columns I actually want to include.
In this Db2 LUW example, I select specific columns from SYSCAT.INDEXES and all columns from SYSCAT.INDEXCOLUSE
-- MODERN SQL - JOIN 2 TABLES USING COLUMN SETS
-- AND WITH SPECIFC COLS FROM TABLE 1 AND ALL COLS FROM TABLE 2
SELECT I.TABSCHEMA , I.TABNAME, I.INDSCHEMA , C.*
FROM SYSCAT.INDEXES I
INNER JOIN SYSCAT.INDEXCOLUSE C
ON (I.INDSCHEMA, I.INDNAME) = (C.INDSCHEMA, C.INDNAME)
WHERE 1=1
-- AND I.TABSCHEMA = CURRENT SCHEMA
-- AND I.TABSCHEMA = 'DPB74213'
;
(8) COMMON TABLE EXPRESSION (CTE) can change big ugly SQL to be easier to read
I use common table expressions (CTE) all the time. I find it is a way to write complex code or pass through the data multiple times. And then the end result is perhaps a SQL with CTE which is long and many lines. But the CTE is very easy to read.
Try it. Basically, I think CTE are great.
For example, below is a subjectively “complex” SQL that basically queries the two Db2 Z real-time-statistics tables to produce a summary of space usage by database.
Overall, the SQL might be initially scary to look at. But upon review, the CTE breaks it up into individual easy-to-understand chunks.
WITH CTE_DB AS (
SELECT NAME AS DBNAME
FROM SYSIBM.SYSDATABASE
WHERE 1=1
AND NAME NOT LIKE 'DSN%' -- IGNORE
AND NAME NOT LIKE 'PTDB%' -- IGNORE
)
, CTE_RTS_TS AS (
SELECT DBNAME, 'TS' AS OBJ_TYPE
, SUM(SPACE) AS SPACE
FROM SYSIBM.SYSTABLESPACESTATS
WHERE 1=1
AND DBNAME IN (SELECT DBNAME FROM CTE_DB)
GROUP BY DBNAME
)
, CTE_RTS_IX AS (
SELECT DBNAME, 'IX' AS OBJ_TYPE
, SUM(SPACE) AS SPACE
FROM SYSIBM.SYSINDEXSPACESTATS
WHERE 1=1
AND DBNAME IN (SELECT DBNAME FROM CTE_DB)
GROUP BY DBNAME
)
, CTE_F AS (
SELECT COALESCE(Z.DBNAME,X.DBNAME) AS DBNAME
, COALESCE(Z.SPACE,0) AS TS_SPACE
, COALESCE(X.SPACE,0) AS IX_SPACE
, COALESCE(Z.SPACE,0)
+COALESCE(X.SPACE,0) AS ALL_SPACE
FROM CTE_RTS_TS Z
LEFT OUTER JOIN CTE_RTS_IX X
ON (Z.DBNAME = X.DBNAME)
)
--SELECT * FROM CTE_F; -- UNCOMMENT TO SEE CTE_F
SELECT COALESCE(DBNAME,'-ALL') AS DBNAME
, SUM(TS_SPACE)/1024 AS TS_SPACE_MB
, SUM(IX_SPACE)/1024 AS IX_SPACE_MB
, SUM(ALL_SPACE)/1024 AS ALL_SPACE_MB
FROM CTE_F
GROUP BY GROUPING SETS ((DBNAME),())
ORDER BY ALL_SPACE_MB DESC
;
DBNAME TS_SPACE_MB IX_SPACE_MB ALL_SPACE_MB
-------- ----------- ----------- ------------
-ALL 1905292 1321737 3227029
DYR01P 515004 559251 1074255
DIL01P 523263 189371 712635
DYB01P 119192 163476 282668
DIL04P 173401 64985 238386
DJC01P 98064 122092 220157
DCL01P 128668 54434 183103
DYD01P 90056 505 90562
DDB01P 59104 26476 85581
DCL04P 53501 26396 79897
DYC01P 34623 16507 51130
. . .
(9) Set operators of EXCEPT and INTERSECT are forgotten but can be useful
We are all probably familiar with SQL SET operators of UNION and UNION ALL (and don’t get me started on why it is important to know the difference between UNION and UNION ALL… that is for another article).
But beyond UNION and UNION ALL we have the often forgotten set operators of EXCEPT and INTERSECT (and EXCEPT ALL and INTERSECT ALL).
Basically, you use them in the same place that you might use UNION (or UNION ALL) but
-
EXCEPT returns all rows in the top result set (the top SELECT) that do not have an identical row in the second or bottom result set (the bottom SELECT)
-
INTERSECT returns the rows that exist in the top and the bottom.
Db2 LUW
https://www.ibm.com/docs/en/db2/11.5?topic=queries-fullselect
Db2 Z
https://www.ibm.com/docs/en/db2-for-zos/12?topic=queries-fullselect
The use cases for EXCEPT and INTERSECT are probably not super common in real life.
But they can be useful every once in a while.
For example, the other day I wrote some SQL to compare table columns between two different schemas. The table was supposed to be the same. Just different schema. Of course, all the useful meta-data about the tables are in the Db2 catalog and we can write SQL to compare the table structures.
> Of course, there are other methods to compare table structures. And lots of tools that help fulfill that requirement. But don’t ask, I wanted to do the comparison with SQL myself.
I wrote a longish SQL with CTE_A containing all the important information from schema 1 and then CTE_B with the same information from schema 2. Then I used EXCEPT between the two CTE to compare. I expected everything to be identical and the empty result (after the EXCEPT) confirmed what I expected. Excellent and easy.
>I now run this SQL every day in batch and send an email if the result is ever not empty. Basically, I am now paranoid that the tables get out of sync (again, don’t ask)
previous IDUG Content Committee articles on EXCEPT AND INTERSECT:
Db2 Z V8 to V12 new features by Tony Andrews (2022)
(10) MERGE - powerful DML!
The SQL statement MERGE is relatively new to Db2 Z. It also exists in Db2 LUW (I don’t know since when).
There was some disappointment when MERGE was first introduced to Db2 Z because it did not work quite as one would have logically expected. It had some shortcomings. But IBM has enhanced MERGE and it now works as it should and as you would expect. So if you gave up on MERGE, then revisit it!
The original and simple case for MERGE is to have one SQL Data Manipulation Language (DML) statement that does INSERT or UPDATE all in one statement. This is to avoid the classic old-school application problem where you do not know if the target row already exists in the target table. One classic solution for application is to first do a SELECT to determine if the row does exist and then it does UPDATE or INSERT as required. The second classic application approach was to first do the INSERT or UPDATE and if it worked then great otherwise, it would then do the other DML operation.
But now the MERGE takes care of both cases in one SQL DML statement. And as we know one SQL statement is better than two!
The MERGE that is available today in Db2 can do INSERT or UPDATE or DELETE of the target table. The source table which is the cross-reference table can itself have many rows that are matched. The end result is that many rows can be changed in the target. Super powerful.
As always, read the IBM documentation for the official answer on how to use MERGE
Db2 LUW
Db2 Z
previous IDUG Content Committee articles on MERGE:
MERGE Db2Z V12 enhancements by Chris Crone (2017)
MERGE Db2 V12 usage example from BANKDATA (2017)
(11) SELECT from final table
Db2 Z allows SELECT from final table. This is a super way to know all the automatically created values when changing data via DML (INSERT, DELETE, UPDATE, MERGE)
This can be very powerful in knowing and confirming the data you are changing.
The concept is properly described in the IBM documentation.
Also, there have been great blogs and presentations on the topic.
Just don’t forget it exists as an option in your application and queries!
previous IDUG Content Committee articles on SELECT FROM FINAL TABLE:
Prove your changes; NEW, FINAL and OLD tables by Mark Gillis (2022)
SELECT from data changes by Emil Kotrc (2015)
A minor caveat, the Db2 trace will show it as a SELECT statement. So if you are reviewing your accounting trace and counting INSERT and DELETE and UPDATE to know how many DML are occurring, you might miss the SELECT from final table. In that case, you might have to check things like ROWS_CHANGED counter to know the SELECT changed something.