25
Thu, Apr
0 New Articles

Practical SQL: Listing Based on Group Conditions

SQL
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

SQL has many features to group and summarize, but you need a little know-how to find the detail under the summaries.

 

If you know me, you know I love Common Table Expressions (or CTEs). In today's article, we're going to take advantage of CTEs and aggregate functions to learn how to find data that matches group conditions. Group conditions come up in many situations, whether it's looking for duplicate items, finding completed orders, or identifying mismatches on denormalized data. I'll show you how to use those group conditions to retrieve the underlying data.

 

Why Is This an Issue?

Think of it this way: in order to find group conditions, you group your data rows by key values. You then use aggregate functions such as COUNT and SUM to find the group conditions. In so doing, you by definition lose any other values in those rows. Let's give a simple example and the traditional solution:

 

select CUSTNO from CUSTMAST

group by CUSTNO having COUNT(*) > 1

order by CUSTNO

 

Not much to it, but it really underscores the issue. If I give the results of this query to users in order to help ascertain the problem, they're going to immediately want to at the very least know the name of the customer that's duplicated. (And yes, I know that you can avoid this sort of thing with unique keys, but not all systems have their data models set up that completely, and not all group conditions lend themselves to database constraints.) In the past, we've been able to easily enough give them at least one of the names this way:

 

select CUSTNO, max(CUSTNAME) from CUSTMAST

group by CUSTNO having COUNT(*) > 1

order by CUSTNO

 

Ah, very clever! We use another aggregate function, MAX, in order to at least select one of the customer names. Quick quiz question: what happens if you don't use MAX and instead just try to include the CUSTNAME field in the list? Answer: you get a "Column or expression in select list not valid" error! But you knew that already. If you think about it, though, the technique above is pretty limited. It only gets you one value, and if you want to add more fields, you have to do MAX on all of them, which means you might be getting values from different rows. All in all, not an optimal solution. Enter the CTE:

 

with CustDups as

(select CUSTNO from CUSTMAST

   group by CUSTNO having COUNT(*) > 1)

select * from CUSTMAST

where CUSTNO in (select CUSTNO from CustDups)

order by CUSTNO, CUSTNAME

 

There we go! Now I'll show all columns of every row that has a duplicate. I can, of course, limit the rows for the user by specifying a list of fields rather than the asterisk in my SELECT list.

 

Other Syntaxes for Other Uses

I could have done this with a JOIN as well:

 

with CustDups as

(select CUSTNO from CUSTMAST

   group by CUSTNO having COUNT(*) > 1)

select * from CUSTMAST join CustDups using (CUSTNO)

order by CUSTNO, CUSTNAME

 

It's a little more concise, but has its pros and cons. The biggest benefit of the JOIN approach makes itself felt when you're doing more analytical types of inquiries:

 

with OrderSum as

(select ORDNO, SUM(QTYORD) Ordered, SUM(QTYRCV) Received

   from ORDDTL group by ORDNO)

select * from OrderSum join ORDDTL using (ORDNO)

where Received >= Ordered

order by ORDNO, ORDLIN

 

This is a pretty simple case: I'm showing the detail lines for every order where the total received quantity is greater than or equal to the total ordered quantity. Note that this may or may not be a particularly useful query, since an over-received line might cancel out an under-received one. Rather than check an aggregate condition, you might want to test each line individually. For example, you might want to show all the lines for every order that has at least one over-received line. That would be cool!

 

You could do that with this query:

 

with OverOrders as

(select ORDNO from ORDTL where QTYRCV > QTYORD

   group by ORDNO)

select * from ORDDTL

where (ORDNO) in (select ORDNO from OverOrders)

order by ORDNO, ORDLIN

 

Slick, right? Although I know some of you might be saying we could replace the SELECT..GROUP BY with a simple SELECT DISTINCT, like this (I'm only going to show the CTE definition, which is the part that defines the temporary table OverOrders):

 

with OverOrders as

(select distinct ORDNO from ORDTL where QTYRCV > QTYORD)

 

You'd be right. Just using the SELECT DISTINCT would allow me to reduce a little of the complexity. I love DISTINCT; it allows me to do quite a few things, one of which I'll show you in a second. But the first syntax allows you a little finer-grained control. Let's imagine, for example, you needed to know situations where more than one line was over-received. You could do that with just a simple extension of our first query:

 

with OverOrders as

(select ORDNO from ORDTL where QTYRCV > QTYORD

   group by ORDNO having COUNT(*) > 1)

 

Like before, the clause above is just the definition of the CTE that identifies the orders that meet the criteria, not the subsequent SELECT that brings in the detail data. You can see that using the GROUP BY syntax allows you to use the COUNT aggregate function and would, by extension, allow you to use any aggregate function, such as SUM or MAX. It's a very powerful technique.

 

A Last Challenging Problem

OK, maybe it's not all that challenging, but it took me quite a while to figure it out. Now that I know the secret, though, I find myself using it all the time. The generic situation is simple: give me a list of detail for groups of records where some field in the row has more than one value within that group. The generic description is a little vague, so let me give you a more concrete example: give me a list of all the PO lines for items that we buy from multiple vendors. Yes, there may be a good reason to buy from multiple vendors, but there could also be an opportunity for savings by using a single source. So how do we do that? Simple:

 

with ItemVendor as

(select distinct POITEM, POVEND from PODTL),

MultiVend as
(select POITEM from ItemVendor

   group by POITEM having COUNT(*) > 1)

select * from PODTL

where POITEM in (select POITEM from MultiVendor)

 

This is a multi-step process. First, I use DISTINCT to create a list of item/vendor combinations. Then, I use GROUP BY to narrow that initial list to only those items purchased from more than one vendor. Why do I have to do this? Because I can't just get a list of all items with more than one line; the lines could all be from the same vendor. I first have to coalesce all the lines for each item/vendor and then use those to identify the items that truly are purchased from multiple vendors. Finally, I take that list and use it to select all the detail lines from the original detail table. At the end, we have a list of all the PO lines for every item that we purchase from different vendors. And because we're going back to the underlying detail, the query includes things like date and price. That's the sort of analytics that can really help you make business decisionsand without having to go to an expensive BI solution!

 

This should give you some ideas on how using aggregation can get you the data you need for your business. Happy data-diving!

 

Joe Pluta

Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. He has been extending the IBM midrange since the days of the IBM System/3. Joe uses WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. He has written several books, including Developing Web 2.0 Applications with EGL for IBM i, E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. Joe performs onsite mentoring and speaks at user groups around the country. You can reach him at This email address is being protected from spambots. You need JavaScript enabled to view it..


MC Press books written by Joe Pluta available now on the MC Press Bookstore.

Developing Web 2.0 Applications with EGL for IBM i Developing Web 2.0 Applications with EGL for IBM i
Joe Pluta introduces you to EGL Rich UI and IBM’s Rational Developer for the IBM i platform.
List Price $39.95

Now On Sale

WDSC: Step by Step WDSC: Step by Step
Discover incredibly powerful WDSC with this easy-to-understand yet thorough introduction.
List Price $74.95

Now On Sale

Eclipse: Step by Step Eclipse: Step by Step
Quickly get up to speed and productivity using Eclipse.
List Price $59.00

Now On Sale

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$

Book Reviews

Resource Center

  • SB Profound WC 5536 Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application. You can find Part 1 here. In Part 2 of our free Node.js Webinar Series, Brian May teaches you the different tooling options available for writing code, debugging, and using Git for version control. Brian will briefly discuss the different tools available, and demonstrate his preferred setup for Node development on IBM i or any platform. Attend this webinar to learn:

  • SB Profound WP 5539More than ever, there is a demand for IT to deliver innovation. Your IBM i has been an essential part of your business operations for years. However, your organization may struggle to maintain the current system and implement new projects. The thousands of customers we've worked with and surveyed state that expectations regarding the digital footprint and vision of the company are not aligned with the current IT environment.

  • SB HelpSystems ROBOT Generic IBM announced the E1080 servers using the latest Power10 processor in September 2021. The most powerful processor from IBM to date, Power10 is designed to handle the demands of doing business in today’s high-tech atmosphere, including running cloud applications, supporting big data, and managing AI workloads. But what does Power10 mean for your data center? In this recorded webinar, IBMers Dan Sundt and Dylan Boday join IBM Power Champion Tom Huntington for a discussion on why Power10 technology is the right strategic investment if you run IBM i, AIX, or Linux. In this action-packed hour, Tom will share trends from the IBM i and AIX user communities while Dan and Dylan dive into the tech specs for key hardware, including:

  • Magic MarkTRY the one package that solves all your document design and printing challenges on all your platforms. Produce bar code labels, electronic forms, ad hoc reports, and RFID tags – without programming! MarkMagic is the only document design and print solution that combines report writing, WYSIWYG label and forms design, and conditional printing in one integrated product. Make sure your data survives when catastrophe hits. Request your trial now!  Request Now.

  • SB HelpSystems ROBOT GenericForms of ransomware has been around for over 30 years, and with more and more organizations suffering attacks each year, it continues to endure. What has made ransomware such a durable threat and what is the best way to combat it? In order to prevent ransomware, organizations must first understand how it works.

  • SB HelpSystems ROBOT GenericIT security is a top priority for businesses around the world, but most IBM i pros don’t know where to begin—and most cybersecurity experts don’t know IBM i. In this session, Robin Tatam explores the business impact of lax IBM i security, the top vulnerabilities putting IBM i at risk, and the steps you can take to protect your organization. If you’re looking to avoid unexpected downtime or corrupted data, you don’t want to miss this session.

  • SB HelpSystems ROBOT GenericCan you trust all of your users all of the time? A typical end user receives 16 malicious emails each month, but only 17 percent of these phishing campaigns are reported to IT. Once an attack is underway, most organizations won’t discover the breach until six months later. A staggering amount of damage can occur in that time. Despite these risks, 93 percent of organizations are leaving their IBM i systems vulnerable to cybercrime. In this on-demand webinar, IBM i security experts Robin Tatam and Sandi Moore will reveal:

  • FORTRA Disaster protection is vital to every business. Yet, it often consists of patched together procedures that are prone to error. From automatic backups to data encryption to media management, Robot automates the routine (yet often complex) tasks of iSeries backup and recovery, saving you time and money and making the process safer and more reliable. Automate your backups with the Robot Backup and Recovery Solution. Key features include:

  • FORTRAManaging messages on your IBM i can be more than a full-time job if you have to do it manually. Messages need a response and resources must be monitored—often over multiple systems and across platforms. How can you be sure you won’t miss important system events? Automate your message center with the Robot Message Management Solution. Key features include:

  • FORTRAThe thought of printing, distributing, and storing iSeries reports manually may reduce you to tears. Paper and labor costs associated with report generation can spiral out of control. Mountains of paper threaten to swamp your files. Robot automates report bursting, distribution, bundling, and archiving, and offers secure, selective online report viewing. Manage your reports with the Robot Report Management Solution. Key features include:

  • FORTRAFor over 30 years, Robot has been a leader in systems management for IBM i. With batch job creation and scheduling at its core, the Robot Job Scheduling Solution reduces the opportunity for human error and helps you maintain service levels, automating even the biggest, most complex runbooks. Manage your job schedule with the Robot Job Scheduling Solution. Key features include:

  • LANSA Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.

  • LANSAWhen it comes to creating your business applications, there are hundreds of coding platforms and programming languages to choose from. These options range from very complex traditional programming languages to Low-Code platforms where sometimes no traditional coding experience is needed. Download our whitepaper, The Power of Writing Code in a Low-Code Solution, and:

  • LANSASupply Chain is becoming increasingly complex and unpredictable. From raw materials for manufacturing to food supply chains, the journey from source to production to delivery to consumers is marred with inefficiencies, manual processes, shortages, recalls, counterfeits, and scandals. In this webinar, we discuss how:

  • The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit

  • Profound Logic Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application.

  • SB Profound WC 5536Join us for this hour-long webcast that will explore:

  • Fortra IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn: