Oracle9i string declarations

Beginning with Oracle9i, you have the option when declaring string variables of specifying whether your length is in terms of bytes or in terms of characters. For example, to declare a CHAR variable capable of holding 50 bytes, you can do the following:

feature_name CHAR(50 BYTE)

However, if you want to be certain that your variable can hold 50 characters from the underlying character set, you would declare as follows:

feature_name CHAR(50 CHAR)

And what of the following simple declaration?

feature_name CHAR(50)

This is ambiguous unless you know a critical piece of information about how the database has been configured. You need to know the setting of the NLS_LENGTH_SEMANTICS parameter, which you can find using the following query:

SQL> SELECT * 2 FROM nls_session_parameters 3 WHERE parameter = 'NLS_LENGTH_SEMANTICS'; PARAMETER VALUE------------------------------ ------------------NLS_LENGTH_SEMANTICS BYTE

A value of BYTE is the default NLS_LENGTH_SEMANTICS setting, indicating that byte semantics are the default and that CHAR(50) is the same as CHAR(50 BYTE). A value of CHAR indicates character semantics, in which case CHAR(50) would be interpreted as CHAR(50 CHAR). Be aware that your DBA can change the NLS_LENGTH_SEMANTICS setting from its default using an ALTER SYSTEM command; you yourself can change it at the session level via the ALTER SESSION command. If you have any doubts about NLS_LENGTH_SEMANTICS, query the NLS_SESSION_PARAMETERS view for the current setting.

Your use of the BYTE and CHAR qualifiers has an effect that outlasts a variable's declaration. The qualifiers also affect the way in which certain string functions treat the variables and whether CHAR variables are padded (or not padded) with spaces to their maximum length. Consider the following two tables, created on a system using UTF-8 as the database character set:

SQL> DESCRIBE utest Name Null? Type ----------- -------- -------------- UCHAR CHAR(1 CHAR) SQL> DESCRIBE utest2 Name Null? Type ----------- -------- -------------- UCHAR2 CHAR(3)

One column has been declared using character semantics, while the other has been declared using byte semantics. Yet both columns consume three bytes, as the following data dictionary query demonstrates:

SQL> SELECT table_name, column_name, data_length, char_length, char_used 2 FROM user_tab_columns 3 WHERE column_name IN ('UCHAR','UCHAR2'); TABLE_NAME COLUMN_NAM DATA_LENGTH CHAR_LENGTH CHAR_USED---------- ---------- ----------- ----------- ---------UTEST UCHAR 3 1 CUTEST2 UCHAR2 3 3 B

You can see that both columns use three bytes (DATA_LENGTH), and that Oracle has kept track of whether character or byte semantics were used to declare the columns (CHAR_USED).

The following PL/SQL code snippet shows the effect of the differing semantics on the operation of the LENGTH function:

DECLARE uchar utest.uchar%TYPE; uchar2 utest2.uchar2%TYPE;BEGIN uchar:= 'ã'; uchar2:= 'ã'; DBMS_OUTPUT.PUT_LINE(LENGTH(uchar)); DBMS_OUTPUT.PUT_LINE(LENGTH(uchar2)); DBMS_OUTPUT.PUT_LINE(LENGTHB(uchar)); DBMS_OUTPUT.PUT_LINE(LENGTHB(uchar2));END;

The output is:

1223

The key to understanding this output lies in knowing that the LENGTH function (which you'll learn more about later in this chapter) always returns the length of a string in terms of characters. Likewise, LENGTHB counts the number of bytes occupied by the characters in a string, not the number of bytes used to declare the string. Here are some things to note about this code example:

· The uchar and uchar2 variable declarations are based on the underlying database columns. Therefore, each variable occupies three bytes.

· Both variables are set to the single character ã, which is represented in Unicode UTF-8 using two bytes.

· Because uchar was declared using character semantics, its length is reported in terms of characters. ã is one character. The length of uchar will always be one character.

· Because uchar2 was declared using byte semantics, there is one byte left over after the two-byte character ã. This extra byte is filled with a space character (as is always the case with extra bytes in CHAR strings). The resulting length, however, is still reported in terms of characters.

· The fact that both variables consume three bytes is not reflected in the values returned by LENGTH because that function is counting characters, not bytes.

· LENGTHB returns the length of each variable in terms of bytes, but the semantics used to declare each variable still affect the results. LENGTHB counts not the number of physical bytes occupied by the variable, but rather the number of bytes occupied by each variable's value. The value ã in uchar is two bytes long. The value in uchar2, however, is ã followed by a space character, for a total of three bytes.

Because of the confusion that can arise when character and byte semantics are mixed, Oracle recommends that you not use the CHAR and BYTE qualifiers and instead rely on the database default, which you can set using NLS_LENGTH_SEMANTICS. There isn't yet a large enough body of knowledge on this issue to enable us to agree or disagree with Oracle's recommendation. However, we strongly believe that if you are using multibyte character sets, you must familiarize yourself with the issues we've described in this section, and understand how the CHAR and BYTE qualifiers affect the operation of string variables.