Learn how to migrate data from a non-Unicode database to a Unicode database.
Editor's note: This article is an excerpt from the book DB2 10.1/10.5 for Linux, UNIX, and Windows Database Administration (Exams 611 and 311): Certification Study Guide.
Both BLU and pureScale require databases to be Unicode, so you should create all new databases as Unicode. Chances are good that the requirement to convert data to a Unicode code set will happen. Because a database cannot be altered to a different code set, this will mean you must unload and load the data.
In the past, string units of character data have been stored in a relational database in bytes. So for a single-byte character set (SBCS), each character represents one byte. The string unit of the column data type is a byte, sometimes referred to as an octet.
The byte approach can be problematic when moving from an SBCS to a multibyte character set (MBCS), as the representation of a character can be more than one byte, depending on the encoding. This can result in truncation of string data because the length of the target Unicode column is too small for the encoded MBCS data.
You can set the encoding method at the session level, the database level, or the column level. OCTETS encoding can be set for any database code set. CODEUNITS16 and CODEUNITS32 are allowed only in Unicode databases.
As previously stated, OCTETS string units are byte-length units. CODEUNITS16 string units are Unicode UTF-16 code sets and are double-byte-length units, and CODEUNITS32 string units are Unicode UTF-32 and are character-length units.
At the session level, execute the following command:
SET NLS_STRING_UNITS = Encoding
At the database level, use this UPDATE command:
UPDATE DB CFG FOR SAMPLE USING STRING_UNITS Encoding
Encoding is OCTETS, CODEUNITS16, or CODEUNITS32.
At the column level, execute the CREATE TABLE or ALTER TABLE statement:
CREATE TABLE TEST_ENCODING
(STR_COL VARCHAR(2000 CODEUNITS32),
STR_CLOB CLOB(1M OCTETS),
STR_GRAPH GRAPHIC(63 CODEUNITS32)
ALTER TABLE TEST_ENCODING
ALTER COLUMN STR_COL SET DATA TYPE VARCHAR(3000 OCTETS)
13 Quick Steps to Migrate Data
When moving data between databases with different code sets, you can use the EXPORT command with MODIFIED BY CODEPAGE=CodePage, where CodePage is the code page value of the target database (for example, a valid code page for UTF-8 is 1208).
To migrate data from a non-Unicode database to a Unicode database, follow these steps:
Step 1: Remove access to the source database.
Step 2: Export the data by using the Export utility. For all export statements, use a modifier code page and set it to the target code page used.
Step 3: Dump the DDL by using db2look.
Step 4: Dump the configurations that include the registry, DBM, and database.
Step 5: Drop the source database.
Step 6: Re-create the source database by using the CREATE DATABASE command as Unicode and by using ASM, which is now the default.
Step 7: Execute DDL from db2look output to create the required objects. If using the column-level method, make the encoding changes in the file before executing.
Step 8: Run the configurations for the registry, DBM, and database.
Step 9: Optional: Set SYSTEM_UNITS to the required encoding.
Step 10: Stop and start the DB2 instance to make all changes active.
Step 11: Import and load the data.
Step 12: Optional: Set NLS_STRING_UNITS to the desired encoding for each import or load session.
Step 13: Optional: Collect statistics and rebind as needed.