Specifying a Character Set


When you define the data type for a column, you can specify a character set for the column with the CHARACTER SET argument. This setting overrides the database default character set that is assigned when the database is created.

You can also change the default character set, either with SET NAMES in command-line isql, or in IBConsole by choosing Edit | Options to open the SQL Options window and specifying a character set on the Options tab. For details about using interactive SQL in either environment, see the Operations Guide.

The character set determines:

  • What characters can be used in CHAR, VARCHAR, and BLOB text columns.
  • The collation order to be used in sorting the column.

For example, the following statement creates a column that uses the ISO8859_1 character set, which is typically used to support European languages:

CREATE TABLE EMPLOYEE
(FIRST_NAME VARCHAR(10) CHARACTER SET ISO8859_1,
 . . .);

Note: Collation order does not apply to BLOB data.

For a list of the international character sets and collation orders that are supported by InterBase, see Character Sets and Collation Orders.
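
As a rough illustration of the first point in the list above (the character set limits which characters a column can hold), the following Python sketch checks whether a string can be represented in a given encoding. It uses Python's own iso8859_1 codec as a stand-in for the ISO8859_1 character set. The helper name representable is hypothetical, and the sketch does not call InterBase or reproduce its error handling.

```python
def representable(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        # At least one character has no mapping in this character set.
        return False
    return True


if __name__ == "__main__":
    # Accented Latin letters are part of ISO 8859-1; Greek letters are not.
    for name in ("Éloïse", "Ελένη"):
        print(name, representable(name, "iso8859_1"))
```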

Characters vs. Bytes

InterBase limits a character column definition to 32,767 bytes. VARCHAR columns are restricted to 32,765 bytes. In a single-byte character set, each character is stored in one byte, so you can define up to 32,767 characters (32,765 for VARCHAR) per column without encountering an error.

For multi-byte character sets, to determine the maximum number of characters allowed in a column definition, divide the internal byte storage limit for the data type by the number of bytes for each character and discard any remainder. Thus, two-byte character sets have a character limit of 16,383 per field, and three-byte character sets have a limit of 10,922 characters per field. For VARCHAR columns, the limits are 16,382 and 10,921 respectively.

The following examples specify a CHAR data type using the UNICODE_FSS character set, which has a maximum size of three bytes for a single character:

CHAR (10922) CHARACTER SET UNICODE_FSS; /* succeeds */
CHAR (10923) CHARACTER SET UNICODE_FSS; /* fails */
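
The division described above can be sketched in a few lines of Python. This is only an arithmetic aid built from the byte limits quoted in this section. The function names max_characters and fits are hypothetical and are not part of InterBase.

```python
# Byte limits quoted in the text above.
CHAR_LIMIT_BYTES = 32767
VARCHAR_LIMIT_BYTES = 32765


def max_characters(limit_bytes: int, bytes_per_char: int) -> int:
    """Return the largest character count that fits within limit_bytes."""
    if bytes_per_char < 1:
        raise ValueError("bytes_per_char must be at least 1")
    # Integer division: a character that only partly fits does not count.
    return limit_bytes // bytes_per_char


def fits(declared_chars: int, limit_bytes: int, bytes_per_char: int) -> bool:
    """Check whether a declared column length stays within the byte limit."""
    return declared_chars * bytes_per_char <= limit_bytes


if __name__ == "__main__":
    # Print the CHAR and VARCHAR character limits for 1-, 2-, and 3-byte sets.
    for width in (1, 2, 3):
        print(width,
              max_characters(CHAR_LIMIT_BYTES, width),
              max_characters(VARCHAR_LIMIT_BYTES, width))
```

With a three-byte character set, fits(10922, CHAR_LIMIT_BYTES, 3) is true and fits(10923, CHAR_LIMIT_BYTES, 3) is false, which matches the two CHAR definitions shown above.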

Using CHARACTER SET NONE

If a default character set was not specified when the database was created, the character set defaults to NONE. Using CHARACTER SET NONE means that there is no character set assumption for columns; data is stored and retrieved just as you originally entered it. You can load data from any character set into a column defined with NONE, but you cannot load that same data into another column that has been defined with a different character set. No transliteration is performed between the source and destination character sets, so in most cases, errors occur during the attempted assignment.

For example:

CREATE TABLE MYDATA (PART_NUMBER CHARACTER(30) CHARACTER SET NONE);
SET NAMES LATIN1;
INSERT INTO MYDATA (PART_NUMBER) VALUES('à');
SET NAMES DOS437;
SELECT * FROM MYDATA;

The data (“à”) is returned just as it was entered, without the à being transliterated from the input character set (LATIN1) to the output character set (DOS437). If the column had been set to anything other than NONE, the transliteration would have occurred.
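
To make the difference concrete at the byte level, the following Python sketch compares handing a stored byte back unchanged with transliterating it to another character set. It is an analogy only: Python's latin-1 and cp437 codecs stand in for LATIN1 and DOS437, the variable names are hypothetical, and nothing here calls InterBase.

```python
# 'à' as a LATIN1 (ISO 8859-1) client would send it: a single byte.
stored = "à".encode("latin-1")

# CHARACTER SET NONE analogy: the byte is handed back unchanged.
returned_without_transliteration = stored

# What transliteration to DOS437 would produce instead: the byte that
# code page 437 uses for the same character.
returned_with_transliteration = stored.decode("latin-1").encode("cp437")

# A program that reads the untouched byte as code page 437 decodes it
# as a different character.
read_as_cp437 = returned_without_transliteration.decode("cp437")

if __name__ == "__main__":
    print(stored.hex(), returned_with_transliteration.hex(), read_as_cp437)
```

The two byte values differ because LATIN1 and DOS437 assign à to different positions, which is why the untouched byte and the transliterated byte are not interchangeable.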

About Collation Order

Each character set has its own subset of possible collation orders. The character set that you choose when you define the data type limits your choice of collation orders. The collation order for a column is specified when you create the table.

For a list of the international character sets and collation orders that InterBase supports, see Character Sets and Collation Orders.
