Character Sets for DOS

The following character sets correspond to MS-DOS code pages. Use them to specify the character set of InterBase databases that are accessed by Paradox for DOS and dBASE for DOS:

Character sets corresponding to DOS code pages

Character set    DOS code page
DOS437           437
DOS850           850
DOS852           852
DOS857           857
DOS860           860
DOS861           861
DOS863           863
DOS865           865

The names of the Paradox-specific collation orders for these character sets begin with “PDOX”. For example, the DOS865 character set for DOS code page 865 supports a Paradox collation order for Norwegian and Danish called “PDOX_NORDAN4”.

The names of the dBASE-specific collation orders for these character sets begin with “DB”. For example, the DOS437 character set for DOS code page 437 supports a dBASE collation order for Spanish called “DB_ESP437”.
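
As an illustration of how these names are used, the sketch below declares columns with the character sets and collation orders mentioned above. The table and column names are hypothetical, and the example assumes the standard InterBase CHARACTER SET and COLLATE column clauses; check the collation names against the server you are using.

```sql
/* Hypothetical table: all table and column names are illustrative only. */
CREATE TABLE CONTACTS (
  CONTACT_ID  INTEGER NOT NULL,
  /* Norwegian/Danish text, Paradox for DOS collation on code page 865 */
  NAME_NORDIC VARCHAR(40) CHARACTER SET DOS865 COLLATE PDOX_NORDAN4,
  /* Spanish text, dBASE for DOS collation on code page 437 */
  NAME_ES     VARCHAR(40) CHARACTER SET DOS437 COLLATE DB_ESP437
);
```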

For more information about DOS code pages, and Paradox and dBASE collation orders, see the appropriate Paradox and dBASE documentation and driver books.


Character Sets for Microsoft Windows

There are five character sets that support Windows client applications, such as Paradox for Windows. These character sets are: WIN1250, WIN1251, WIN1252, WIN1253, and WIN1254.

The names of the Paradox for Windows collation orders for these character sets begin with “PXW”. For example, the WIN1252 character set supports a Paradox for Windows collation order for Norwegian and Danish called “PXW_NORDAN4”.

For more information about Windows character sets and Paradox for Windows collation orders, see the appropriate Paradox for Windows documentation and driver books.


UNICODE_BE and UNICODE_LE Character Sets

InterBase supports 16-bit UNICODE_BE and UNICODE_LE as server character sets. These character sets cannot be used as client character sets. If your client needs full UNICODE character support, use UTF8 instead of UNICODE_LE or UNICODE_BE as the client character set (also known as LC_CSET). A client can use UTF8, or another native client character set, to connect to a UNICODE database.
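
For example, an isql session might select UTF8 as the client character set before connecting to a UNICODE database. This is a minimal sketch: the database path and credentials are placeholders, and it assumes the isql SET NAMES command is used to choose the client character set for the connection that follows.

```sql
/* Choose the client character set for subsequent connections. */
SET NAMES UTF8;

/* Placeholder path and credentials: substitute your own. */
CONNECT 'C:\data\unicode_db.ib' USER 'SYSDBA' PASSWORD 'masterkey';
```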

A database schema is declared to use the new character set in the CREATE DATABASE statement, as follows:

CREATE DATABASE <filespec> <...> DEFAULT CHARACTER SET UNICODE;

Note that InterBase uses “big endian” ordering by default.

The attributes for the UNICODE_BE and UNICODE_LE character sets are shown in InterBase Character Sets.

Note: InterBase 2008 does not support UNICODE collations. The default collation for UNICODE is binary sort order.

Support for the UTF-8 Character Set

The UTF-8 character set is an alternative coded representation form for all of the characters of the ISO/IEC 10646 standard. To use UTF-8, declare it as the default character set of the database schema in the CREATE DATABASE statement, as shown below:

CREATE DATABASE <filespec> <...> DEFAULT CHARACTER SET UTF8;

Additionally, you may use the alias UTF_8.

The attributes for the UTF-8 character set are shown in InterBase Character Sets.

Additional Character Sets and Collations

Support for additional character sets and collation orders is constantly being added to InterBase. To see which character sets and collations are available in a newly created database, connect to the database with isql, then run the following queries:

SELECT RDB$CHARACTER_SET_NAME, RDB$CHARACTER_SET_ID
 FROM RDB$CHARACTER_SETS
 ORDER BY RDB$CHARACTER_SET_NAME;
SELECT RDB$COLLATION_NAME, RDB$CHARACTER_SET_ID
 FROM RDB$COLLATIONS
 ORDER BY RDB$COLLATION_NAME;
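
The two result sets above are related through RDB$CHARACTER_SET_ID, so they can also be combined into one list that shows each collation next to the character set it belongs to. The following is a sketch using only the system table columns named above; the aliases cs and co are introduced here for readability.

```sql
/* List every collation together with the name of its character set. */
SELECT cs.RDB$CHARACTER_SET_NAME, co.RDB$COLLATION_NAME
 FROM RDB$CHARACTER_SETS cs
 JOIN RDB$COLLATIONS co
   ON co.RDB$CHARACTER_SET_ID = cs.RDB$CHARACTER_SET_ID
 ORDER BY cs.RDB$CHARACTER_SET_NAME, co.RDB$COLLATION_NAME;
```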
