A number is a number is a number, right? Of course not. There are whole numbers, real numbers, imaginary numbers, complex numbers, and irrational numbers, among others. Students of mathematics learn about these number types and more.
We students of the AS/400 also have to understand the various number types available to us. We need to understand the traditional numeric data types (zoned decimal, packed decimal, and binary) as well as the newer integer and floating point types recently added to RPG IV. Otherwise, how do we know when each one is appropriate? How do we understand what values they can have?
In the paragraphs that follow, I explain the various numeric data types available in RPG IV. First, I explain the decimal types, because they are the ones most used in databases. Then, I explain the binary number types. If you are new to the AS/400 but have programmed in languages like C, Pascal, or BASIC, you have used the binary types, but you may not be familiar with decimal numbers. Midrange "old timers" will probably want to skip the section about decimal numbers and proceed directly to the discussion of binary data types.
I've prepared a summary chart of the numeric data types, which you can see in Figure 1. You may want to refer to it as you are reading.
Teaching computers to do arithmetic in decimal, rather than binary, was a milestone most of us fail to recognize and appreciate. Many early data processing applications were written in FORTRAN on computers that could do arithmetic with integers and real numbers, and precision errors were common. For instance, the total under a column of dollar amounts might be inaccurate by a couple of cents. Decimal arithmetic put an end to that.
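The column-total problem is easy to reproduce. What follows is a minimal sketch in Python rather than RPG. It uses Python's float (an 8-byte binary floating point number) and its decimal module as stand-ins for binary and decimal arithmetic, and the ten-cent amounts are made up for illustration.

```python
from decimal import Decimal

# Ten hypothetical line items of 10 cents each.
float_amounts = [0.10] * 10
decimal_amounts = [Decimal("0.10")] * 10

# Binary floating point cannot represent 0.10 exactly, so a small
# error accumulates as the column is added up.
float_total = sum(float_amounts)

# Decimal arithmetic keeps every cent exact.
decimal_total = sum(decimal_amounts)

print(float_total == 1.0)                # False
print(decimal_total == Decimal("1.00"))  # True
```

The error in the floating point total is tiny, but it is enough to make an equality test fail or to tip a rounded total by a cent.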
The AS/400 supports two forms of decimal numbers: zoned decimal and packed decimal.
In a zoned decimal number, each decimal digit occupies one character. That is, to store a seven-digit decimal number requires 7 bytes of memory or disk space. Decimal points are never stored, but assumed. That is, if you define a zoned decimal field to be nine digits long, with two positions after the decimal point, the system will allocate 9 bytes of storage. A program that uses the field will treat the number as having seven digits left of the decimal point and two digits right of the decimal point.
All bytes except the last can contain only the characters 0 through 9. If the number is positive or zero, the last byte also contains a digit. But if the number is negative, the last byte will contain a right brace or a letter from J to R instead. Let's make sure we understand why.
Each character in the EBCDIC collating sequence can be written as a pair of hexadecimal digits, 0-9 or A-F. The first of these digits is referred to as the zone, and the second one is called the digit. This terminology dates back to the days when data entry was done with punched cards. The top rows of a card were the zone rows. The bottom rows were the digits.
To indicate a negative number, keypunch machines punched an extra hole in the low-order (last) position of a number. This extra hole, called an overpunch, caused the character to look like a letter, instead of a number.
Figure 2 shows how three seven-digit numbers are stored in zoned decimal format.
Be aware that other systems, especially IBM mainframes, sometimes use other overpunches. The most common one is a zone of C to indicate signed positive numbers. This makes the low-order byte of a zoned decimal number look like a letter from A to I.
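To make the overpunch concrete, here is a small sketch in Python (not RPG) that builds the bytes of a zoned decimal field. The function name to_zoned and the sample values are my own, not the ones in Figure 2. I assume a zone of F for unsigned digits and a zone of D for the negative sign, and I use Python's cp037 codec (EBCDIC code page 37) to show how the bytes look as characters; your system's code page may differ.

```python
def to_zoned(value: int, digits: int) -> bytes:
    """Encode an integer as EBCDIC zoned decimal, one digit per byte."""
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    # Zone F in the high half of each byte, the digit in the low half.
    raw = bytearray(0xF0 | int(ch) for ch in text)
    if value < 0:
        # Overpunch: replace the zone of the last byte with D.
        raw[-1] = 0xD0 | (raw[-1] & 0x0F)
    return bytes(raw)


positive = to_zoned(1234567, 7)
negative = to_zoned(-1234567, 7)
ends_in_zero = to_zoned(-1234560, 7)

print(positive.hex().upper())        # F1F2F3F4F5F6F7
print(negative.hex().upper())        # F1F2F3F4F5F6D7
print(negative.decode("cp037"))      # 123456P
print(ends_in_zero.decode("cp037"))  # 123456}
```

Hex D0 is the right brace and D1 through D9 are the letters J through R, which is why a negative zoned number appears to end in one of those characters.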
Along the way, someone realized that the zone portion of every byte except the last one is useless. They got the idea of removing the useless zones and packing two digits into each byte. This is called packed decimal.
In packed decimal, all digits and the zone portion of the low-order byte are stored in pairs to conserve space. Figure 3 shows the same numbers as Figure 2, except that they're stored in packed format. Seven digits plus the sign make eight hexadecimal digits, and since each pair of hexadecimal digits stands for 1 byte, a seven-digit number occupies only 4 bytes of storage. As with zoned decimal numbers, decimal points are assumed.
Packed decimal is the preferred method of storing numbers in DB2/400 applications. It conserves disk storage and does not suffer from precision errors as binary representations do.
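The packing step can be sketched the same way. This Python illustration (again, not RPG) uses a hypothetical helper named to_packed and made-up values. It assumes the sign half-byte is F for positive numbers and D for negative numbers, carried over from the zones described above, and it pads with a leading zero when the number of digits is even.

```python
def to_packed(value: int, digits: int) -> bytes:
    """Encode an integer as packed decimal: two digits per byte, sign last."""
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    sign = 0xD if value < 0 else 0xF
    # One half-byte per digit, followed by the sign half-byte.
    nibbles = [int(ch) for ch in text] + [sign]
    if len(nibbles) % 2:
        # An even digit count needs a leading zero to fill the first byte.
        nibbles.insert(0, 0)
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((high << 4) | low for high, low in pairs)


print(to_packed(1234567, 7).hex().upper())   # 1234567F
print(to_packed(-1234567, 7).hex().upper())  # 1234567D
print(len(to_packed(1234567, 7)))            # 4
print(to_packed(123456, 6).hex().upper())    # 0123456F
```

Note that a six-digit field takes the same 4 bytes as a seven-digit field, so an odd number of digits wastes no space.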
The binary data types are widely used in math and scientific programming languages, but they are not as useful for database applications as the decimal types. Nevertheless, much of the software being ported to or written for the AS/400 uses these types, so we need to understand them.
IBM midrange systems, even back to the S/3, have always supported one form of binary data. If you've ever used a B in column 43 of an RPG II or RPG III input spec, or in column 44 of an RPG II or RPG III output spec, you're familiar with this type.
In a binary number, each bit represents a power of 2. In a four-bit number, the bits (from left to right) indicate values of 8, 4, 2, and 1. The binary value 0111 means 4 + 2 + 1, or 7.
If the field length is defined as four digits or fewer, the binary field occupies 2 bytes of storage. If the field length is defined as five to nine digits, the field occupies 4 bytes of storage. The system will not convert more decimal digits than the length you specify. That is, the highest number you may store in a three-digit binary field is 999, even though the 2 bytes allocated for the number will hold higher numbers. Decimal points are not stored; they are assumed.
Binary numbers are stored in twos complement format. This means the high-order bit is reserved as a sign bit. A value of 0 means the number is zero or positive; a value of 1 means the number is negative. To find the twos complement of a number, reverse the bits and add a binary 1. Figure 4 illustrates this process. This works for converting positive numbers to negative or for converting negative numbers to positive.
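The reverse-and-add-one rule is short enough to try out. This is a Python sketch, assuming a 2-byte (16-bit) field; the function name twos_complement is mine.

```python
def twos_complement(value: int, bits: int = 16) -> int:
    """Reverse every bit, add 1, and keep only the low-order bits."""
    mask = (1 << bits) - 1
    inverted = value ^ mask          # reverse the bits
    return (inverted + 1) & mask     # add a binary 1


seven = 0b0000000000000111
minus_seven = twos_complement(seven)

print(format(minus_seven, "016b"))            # 1111111111111001
print(twos_complement(minus_seven) == seven)  # True

# Reading the same 2 bytes as a signed number gives -7.
as_signed = int.from_bytes(minus_seven.to_bytes(2, "big"), "big", signed=True)
print(as_signed)                              # -7
```

Applying the function twice returns the original value, which is why the same procedure converts in both directions.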
Binary has never been heavily used in midrange applications. Its main attraction is that it permits large numbers to be stored in small areas. If you've used certain subfields of a file feedback data structure, or certain APIs, you have had to use binary numbers.
Signed integer, usually referred to simply as integer, stores whole numbers in binary format. Like binary, the integer type stores numbers in twos complement format.
Integer differs from binary in three ways:
o The field length must be 5 or 10. A length of 5 allocates 2 bytes, a length of 10 allocates 4 bytes.
o The number of decimal positions must be zero. The number must be a whole number, and the system will not assume decimal positions.
o The integer type accepts a greater range of values. The largest value you can store in a 4-byte binary number is 999,999,999, but a 4-byte integer will let you store values as large as 2,147,483,647.
An unsigned integer differs from a signed integer in only one way: The high-order bit is not reserved for the sign of the number. Instead, the high-order bit represents another power of two. This means that (1) unsigned integers cannot be negative, and (2) unsigned integers can store larger positive values (up to 4,294,967,295).
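The limits quoted above follow directly from the field definitions. Here is a short Python sketch of the arithmetic; the three helper names are hypothetical.

```python
def binary_field_max(digits: int) -> int:
    """Largest value of a binary field, which is capped by its digit count."""
    return 10 ** digits - 1


def signed_max(num_bytes: int) -> int:
    """Largest signed integer: one bit is reserved for the sign."""
    return 2 ** (8 * num_bytes - 1) - 1


def unsigned_max(num_bytes: int) -> int:
    """Largest unsigned integer: every bit is a power of two."""
    return 2 ** (8 * num_bytes) - 1


print(f"{binary_field_max(9):,}")  # 999,999,999
print(f"{signed_max(4):,}")        # 2,147,483,647
print(f"{unsigned_max(4):,}")      # 4,294,967,295

# A three-digit binary field stops at 999 although its 2 bytes could hold more.
print(f"{binary_field_max(3):,} vs {signed_max(2):,}")  # 999 vs 32,767
```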
Floating point is a format that stores real numbers (numbers with a fractional portion) in a binary format. A floating point number has two parts: a mantissa and an exponent. The mantissa holds the significant digits of the number, and the exponent is the power of 10 by which the mantissa is multiplied.
We express floating point numbers in the format sm.mEse, where s can be a plus or minus sign, m.m represents the mantissa, and e is the exponent. (This is commonly called scientific notation.) The floating point constant +2.15E+4 means 2.15 x 10^4, or 21,500. A positive exponent moves the decimal point to the right. A negative exponent moves the decimal point to the left, so +2.15E-2 is 0.0215.
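If you want to check the notation yourself, here is a small Python sketch. The helper name expand is mine; it uses Python's decimal module so the arithmetic is exact.

```python
from decimal import Decimal


def expand(mantissa: str, exponent: int) -> Decimal:
    """Multiply the mantissa by 10 raised to the exponent."""
    return Decimal(mantissa) * Decimal(10) ** exponent


print(expand("+2.15", 4))   # 21500.00
print(expand("+2.15", -2))  # 0.0215

# Python reads the same notation directly.
print(float("+2.15E+4"))    # 21500.0
```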
Floating point numbers come in different sizes. On the AS/400, your choices are 4-byte and 8-byte numbers. Four-byte numbers (commonly called single precision floating point) hold approximately eight decimal digits, and 8-byte numbers (double precision floating point) hold approximately 16 decimal digits.
The strength of floating point numbers is that they can hold very large or very small numbers. For instance, suppose you wanted to store the number 7 followed by 45 zeros (+7.0E+45). You could not do that with the decimal types, since they can hold only 30 digits. The integer type would also not be large enough. But floating point would have no trouble containing an approximate equivalent. In the same way, you could use floating point to store a number consisting of a decimal point followed by 70 zeros and the digit 4 (+4.0E-71).
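The following Python sketch checks those two examples. It assumes IEEE encoding, where Python's float is the 8-byte size and the struct module's "f" format is the 4-byte size; the helper name fits_in_single is hypothetical. Under that assumption the 4-byte size tops out near 3.4E+38, so both examples need the 8-byte size.

```python
import struct


def fits_in_single(value: float) -> bool:
    """True if value survives a trip through a 4-byte float without
    overflowing or collapsing to zero."""
    try:
        packed = struct.pack(">f", value)
    except OverflowError:
        return False
    return struct.unpack(">f", packed)[0] != 0.0 or value == 0.0


big = float("+7.0E+45")
tiny = float("+4.0E-71")

print(len(str(7 * 10 ** 45)))     # 46 digits, more than a 30-digit decimal field
print(big > 10 ** 30, tiny > 0)   # True True: the 8-byte size holds both
print(fits_in_single(big), fits_in_single(tiny))  # False False
```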
I won't get into the way floating point numbers are encoded. It's complicated, and it probably wouldn't be of practical use to you. If you're interested in knowing about that, use a search engine on the Internet to look for IEEE and floating point. Look for information about the normalized method. One site that explains it in a simple and straightforward manner is http://www.cc.gatech.edu/people/home/joshuam/essays/ieee.html.
Nevertheless, you should at least understand that the mantissa and exponent are not stored as decimal values, but as binary values. This means that you may not always get the precision you want when you use floating point numbers.
The short RPG IV program in Figure 5 illustrates what I mean. After the first eval, variable floatvar has a value of 1.099999994040E-1, because 11/100 cannot be converted to an exact binary number. After the second eval, variable decvar has a value of 0.109, not 0.11, as it should have.
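You can reproduce the effect outside RPG. This Python sketch assumes IEEE single precision for the 4-byte float and imitates the assignment to a packed field by truncating to three decimal positions. The names mirror the variables in Figure 5, but the code is only an approximation of what the RPG program does.

```python
import struct
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_single(value: float) -> float:
    """Round a value to the nearest 4-byte float and read it back."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


floatvar = to_single(0.11)
exact = Decimal(floatvar)  # the exact value the 4 bytes hold

# Dropping the extra digits, as the second eval does.
decvar = exact.quantize(Decimal("0.001"), rounding=ROUND_DOWN)

# Rounding instead of truncating would recover the intended value.
rounded = exact.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)

print(f"{floatvar:.12E}")  # 1.099999994040E-01
print(decvar)              # 0.109
print(rounded)             # 0.110
```

The stored value falls just short of 0.11, so truncation loses a full thousandth while rounding does not.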
Don't get the impression that floating point numbers are always inaccurate. If you don't give them too many significant digits, they'll usually come out OK. Keep in mind that the strength of floating point is its ability to store very large and very small numbers, not accuracy.
It is ironic to me that floating point, with its tendency toward inaccuracy of decimal values, can improve accuracy in division. If you specify FLTDIV(*YES) in the H-spec of an RPG IV program, the system will use floating point, not packed decimal, values to do division within expressions. Believe it or not, the results will be more accurate.
Deciding when to use which format is not that difficult. In many cases, the decision is made for you. For example, if you define a data structure to be passed as a parameter to an API, you must define all subfields as the API defines them.
But for work variables in programs, you can often choose from several types. What type you should use in a certain situation is subjective, but I've developed a list of guidelines:
o If speed of calculation is critical, consider using integer or unsigned variables. The AS/400 does its fastest math with integer and unsigned values.
o If you're working with decimal values, use packed variables for speed. Packed math is not as fast as integer math, but it's faster than zoned decimal and floating point.
o Consider using integer or unsigned variables for working variables that cannot have decimal positions, such as control variables of DO loops, subscripts of arrays, and operands of the %SUBSTR (substring) function.
o If decimal accuracy is important (e.g., fractional portion of a currency), use packed decimal.
o If a variable must be able to hold a wide range of values (e.g., from billionths to billions), use floating point.
o And last, my rule of thumb: When in doubt, use packed decimal.
Figure 1: Numeric data types permitted in RPG IV
Figure 3: Packed decimal representation
Figure 4: Finding the twos complement of a binary number
Figure 5: Floating point numbers cannot store some decimal values exactly
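     D floatvar        s              4f
     D decvar          s              7p 3
      * floatvar receives the nearest 4-byte float to 0.11
     C                   eval      floatvar = 0.11
      * The extra digits are dropped, so decvar receives 0.109
     C                   eval      decvar = floatvar
     C                   eval      *inlr = *on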