I love bugs. I live to find bugs in other people's code, especially stupid bugs, as they give me the opportunity to make fun of the unseen coders. I also love finding really dumb design decisions, as they give me another reason to laugh. Lately, I've been so busy that I really needed reasons to laugh, and IBM and Microsoft got together to send me rolling on the floor.
The first funny is by Microsoft and relates to ActiveX Data Objects (ADO). This one could be a real showstopper, as it can cause ADO to return incomplete recordsets. The essence is this: if you use a recordset to return data and a divide-by-zero error occurs in any field of that recordset, you may not see the complete set of results from your SQL statement. I will walk you through how I found this bug and give you a possible work-around.
I Wanna Be Sedated
I built a Web page for a company so they could see store sales for one month compared to sales for the same month of the previous year. The page also displayed the gross profit percentage and the percentage change between this year and last year. Everything worked fine for a number of months, and then the unthinkable happened: a store had zero sales. Because the SQL statement that builds the page does all of the gross profit and percent change calculations on the server, this caused a divide-by-zero error. Hey, zero sales and zero cost for a month equals zero gross profit percent, right? Wrong! It yields a divide-by-zero error, because gross profit percent is (Sales - Cost) / Sales, and if Sales is zero, you are dividing by zero.
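To make the arithmetic concrete, here is a minimal Python sketch of the two calculations the page performs. The function names are hypothetical, and I am assuming percent change is computed as (This Year - Last Year) / Last Year, which hits the same wall whenever last year's figure is zero. Returning None plays the role of an SQL null.

```python
from typing import Optional


def gross_profit_pct(sales: float, cost: float) -> Optional[float]:
    """Gross profit percent = (Sales - Cost) / Sales * 100.

    Returns None (the stand-in for an SQL null) when sales is zero,
    because the ratio is undefined there.
    """
    if sales == 0:
        return None
    return (sales - cost) * 100 / sales


def pct_change(this_year: float, last_year: float) -> Optional[float]:
    """Percent change from last year to this year (assumed formula).

    Undefined, and therefore None, when last year's value is zero.
    """
    if last_year == 0:
        return None
    return (this_year - last_year) * 100 / last_year


print(gross_profit_pct(1000.0, 750.0))  # 25.0
print(gross_profit_pct(0.0, 0.0))       # None: zero sales, no percentage
print(pct_change(1100.0, 1000.0))       # 10.0
```

The zero check has to come before the division. Once the division is attempted, the damage (an error, a null, or a truncated result set) is up to whoever evaluates it.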
Now, a divide by zero is not necessarily a disaster on the server, as the server could simply return a null. If the server is more sophisticated, it might return a NaN, or Not a Number. In the worst case, the server could stop processing the statement at the point of the divide-by-zero error and return an error. Fortunately, the AS/400 takes the high road and returns a null when a divide-by-zero error occurs. Or does it? Well, here is the bug, and I am not sure whose bug it is. If you are reading one record at a time (your ADO cachesize is 1, or you are using SQLFetch from ODBC), the AS/400 will return a null for a divide-by-zero error on four out of the five machines I tested. If your cachesize is greater than 1, the AS/400 will return all rows up to the row that has the divide-by-zero error and then signal end of file. End of file leads you to believe that you have retrieved all of the records, but you have not. The solution: Either use client-side cursors (set the recordset's CursorLocation property to adUseClient, which causes all records to be retrieved) or make sure the recordset's CacheSize property is set to 1 when you open it. Another solution is to make sure divide-by-zero errors cannot happen in your SQL in the first place; a sample of this protection using a CASE expression is provided in the downloadable code.
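The downloadable code isn't reproduced here, but the shape of the guard can be sketched with Python's built-in sqlite3 module standing in for DB2 on the AS/400. The table and column names are hypothetical. The first computed column uses a CASE expression that tests the divisor before dividing. The second uses NULLIF, a standard SQL function that turns a zero divisor into a null so the division itself yields null. Check that your OS/400 release supports NULLIF before relying on it. Note that SQLite happens to return a null for a bare division by zero anyway, so this demonstrates the query pattern, not the AS/400 behavior.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the AS/400 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, sales REAL, cost REAL)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?, ?)",
    [("001", 1000.0, 750.0), ("002", 0.0, 0.0), ("003", 400.0, 300.0)],
)

# Both columns compute gross profit percent, guarded two different ways:
#   gp_case:   CASE checks the divisor and skips the division when it is zero.
#   gp_nullif: NULLIF(sales, 0) becomes NULL when sales is zero, and any
#              arithmetic on NULL yields NULL instead of an error.
query = """
SELECT store,
       CASE WHEN sales = 0 THEN NULL
            ELSE (sales - cost) * 100 / sales
       END AS gp_case,
       (sales - cost) * 100 / NULLIF(sales, 0) AS gp_nullif
FROM store_sales
ORDER BY store
"""
rows = conn.execute(query).fetchall()
for store, gp_case, gp_nullif in rows:
    print(store, gp_case, gp_nullif)
conn.close()
```

Either form keeps the zero-sales store in the result set with a null percentage. That is exactly what you want, because the row itself is what the cachesize bug was silently dropping.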
I'm a Teenage Lobotomy
One machine I tested, MC's V4R4 Model 170, did not return a null but instead issued an SQL system error that caused the QZDASOINIT job to wait for a reply on the job message queue. This is quite funny, because who exactly is going to reply to the message? I have a server-based program that is inquiring about sales from an AS/400 via the QZDASOINIT job in the QSERVER subsystem. It's not like there is a user sitting there running a terminal emulator who can switch over to the messages and enter a reply. There is nothing here but programs talking to programs. The message wait on the QZDASOINIT job causes the Web server to wait for someone to reply to the message, so both machines are stuck until someone comes along and either aborts the job or replies to the message. I had to laugh myself silly over this one; what a funny design decision it must have been to have a server-based job send an inquiry message that only a middleware program could ever see.

Think about it. To reply to the message, I would have to initiate my connection to the AS/400 and then capture the job number of its server job. Next, I would need to spawn a task to monitor that job and detect that it was waiting for a reply. My separate task could then retrieve the stuck job's message and send a reply. Barring that set of gymnastics, I could play Carnac the Magnificent, predict that the job might throw message X, and install an exit program on all QZDASOINIT jobs that monitors for message X and automatically replies to any of these process killers. But should I have to do this?
Gabba Gabba Gabba Hey!
Anyway, all of these situations are problems that you need to be aware of. With ADO, a cachesize greater than 1 can keep you from retrieving all of a SELECT statement's results if the statement causes a divide-by-zero error. A message wait in a server-based job can be caused by a divide by zero or by things like level checks in stored procedures, and it can cause your server processes to hang. IBM offers a registry change for ODBC connections that makes a divide by zero return a null every time, but according to the FAQ, you must call the support line to get the instructions for making it. Download the sample code from the MC Web site to see these features in action, determine how you can avoid these pitfalls, and get a good laugh.