2.1 New data types
In previous releases, character objects (type character) were encoded as single-byte characters that could have bits and font attributes. In this release, Liquid Common Lisp adds new character types and subtypes to the Common Lisp type hierarchy to support both single-byte and double-byte characters.
The following new character type specifiers have been added to the Common Lisp type hierarchy:
- (character repertoire-name), a subtype of character
- base-character, a subtype of character
- standard-character, a subtype of base-character
- extended-character, a subtype of character
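To illustrate how these subtypes nest, the sketch below classifies a character with etypecase, testing the most specific type first. It uses the ANSI Common Lisp names base-char and extended-char, which later replaced the CLtL2 names used in this release, so that it runs on current implementations. In Liquid Common Lisp you would write base-character and extended-character instead. The function name char-category is hypothetical.

```lisp
;; Portable sketch using ANSI Common Lisp type names. In Liquid Common Lisp,
;; write BASE-CHARACTER and EXTENDED-CHARACTER instead of BASE-CHAR and
;; EXTENDED-CHAR.
(defun char-category (c)
  "Return :STANDARD, :BASE, or :EXTENDED for character C, testing the most
specific type first. Signals an error if C is not a character."
  (etypecase c
    ;; STANDARD-CHAR is a subtype of BASE-CHAR, so it must be tested first.
    (standard-char :standard)
    (base-char :base)
    ;; BASE-CHAR and EXTENDED-CHAR partition CHARACTER, so any character
    ;; that reaches this clause is an extended character.
    (extended-char :extended)))
```

On an implementation such as SBCL, where base-char covers only codes below 128, (char-category #\a) returns :STANDARD, (char-category (code-char 9)) returns :BASE because a tab is not a standard character, and (char-category (code-char #x3B3)) returns :EXTENDED.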
(character repertoire-name) specifies a character in a particular character repertoire. A character repertoire is an unordered set of abstract characters; it does not specify the encoding of the characters. Liquid Common Lisp supports the following character repertoires:
(character :base) and (character :ascii)
Type specifiers denoting the base character repertoire.
For the HP and RS6000 platforms, the base character repertoire contains the standard ASCII characters.
For the SunOS and Solaris platforms, the base character repertoire contains those ASCII characters that are in Extended UNIX Code (EUC) codeset 0.
Base characters are represented as single-byte characters. The base character repertoire can overlap with characters from other repertoires, such as :english.
The type base-character is synonymous with (character :base).
(character :standard)
Type specifier denoting the repertoire of characters that consists of the 96 standard characters as defined by CLtL2. This type is a subtype of (character :base).
The type standard-char is synonymous with (character :standard) and is a subtype of base-character.
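The size of the standard repertoire can be checked by counting the characters that satisfy the standard predicate standard-char-p. The portable sketch below assumes an ASCII-based implementation, in which every standard character has a code below 128; the function name count-standard-chars is hypothetical.

```lisp
(defun count-standard-chars (&optional (limit 128))
  "Count the characters with codes below LIMIT that satisfy STANDARD-CHAR-P.
Assumes an ASCII-based implementation, where all standard characters have
codes below 128."
  (loop for code below (min limit char-code-limit)
        for c = (code-char code)
        ;; CODE-CHAR may return NIL for a code that has no character.
        count (and c (standard-char-p c))))
```

On such an implementation, (count-standard-chars) returns 96: the 94 non-blank printing characters, plus space and newline.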
(character :english)
Type specifier denoting the English alphabet.
(character :kanji)
Type specifier denoting Japanese Kanji characters.
(character :katakana)
Type specifier denoting Japanese Katakana characters.
(character :hiragana)
Type specifier denoting Japanese Hiragana characters.
(character :sbcs)
Type specifier denoting any single-byte character.
(character :dbcs)
Type specifier denoting any double-byte character.
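Because a repertoire is only a set of characters, independent of how they are encoded, it behaves like any other Common Lisp type. In Liquid Common Lisp you would test membership directly, for example with (typep c '(character :ascii)). The portable sketch below models a comparable ASCII repertoire with deftype and satisfies; the names ascii-char-p, ascii-character, and ascii-string-p are hypothetical and do not reflect how Liquid Common Lisp defines its repertoires.

```lisp
(defun ascii-char-p (c)
  "True if C is a character whose code lies in the 7-bit ASCII range."
  (and (characterp c) (< (char-code c) 128)))

;; A repertoire modeled as a type: the set of characters satisfying
;; ASCII-CHAR-P. This is an illustration, not Liquid's (CHARACTER :ASCII).
(deftype ascii-character ()
  '(and character (satisfies ascii-char-p)))

(defun ascii-string-p (s)
  "True if every character of string S belongs to the ASCII-CHARACTER type."
  (every (lambda (c) (typep c 'ascii-character)) s))
```

A string that mixes repertoires, such as one containing a Greek letter, fails ascii-string-p even though each of its characters is still of type character.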
The type extended-character denotes all characters that are not base characters. Extended characters are full-fledged Common Lisp characters; that is, you can use extended characters wherever you would use any Common Lisp character. In particular, you can construct symbol names and package names with extended characters, base characters, or a combination of extended and base characters. For example, if you could type Greek characters as well as ASCII characters from your keyboard, you could have the following interaction with Lisp:

    > (setq f '(1 2 3))
    (1 2 3)
    > (car f)
    1
    > (defun g (c) (car c))
    g
    > (g f)
    1
    > (defvar *c* '(9 8 7))
    *c*
    > (g *c*)
    9
    > (setq mixed-string "c-987")
    "c-987"
    > mixed-string
    "c-987"

Extended characters are represented as double-byte characters.
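The claim that symbol and package names can mix extended and base characters can also be exercised programmatically. The portable sketch below builds names at run time with code-char so that it does not depend on the source file's encoding. It assumes a Unicode-based implementation, where #x3B3 is the Greek small letter gamma; in Liquid Common Lisp, extended character codes come instead from the platform's coded character set. The helper names are hypothetical.

```lisp
;; #x3B3 is GREEK SMALL LETTER GAMMA in Unicode-based implementations;
;; it is used here only as an example of a non-ASCII character code.
(defun make-mixed-name (suffix)
  "Return a string made of one Greek letter followed by SUFFIX."
  (format nil "~C~A" (code-char #x3B3) suffix))

(defun demo-extended-symbol ()
  "Intern a symbol with a mixed-character name in a package whose name also
contains a non-ASCII character, creating the package if needed, and return
the symbol."
  (let* ((pkg-name (make-mixed-name "-PKG"))
         (pkg (or (find-package pkg-name)
                  (make-package pkg-name :use '()))))
    (intern (make-mixed-name "-987") pkg)))
```

Calling demo-extended-symbol more than once is safe: the package is reused, and intern returns the same symbol each time.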
Table 2.1 shows which coded character set represents the extended characters for each supported platform.
Platform | Coded Character Set |
---|---|
HP | JIS |
RS6000 | PC932 (Shift JIS) |
SunOS/Solaris | EUC codeset 1 |
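For the double-byte JIS X 0208 characters, the coded character sets in Table 2.1 are related by fixed arithmetic. EUC codeset 1 sets the high bit of each JIS byte, and Shift JIS folds each pair of JIS rows into a single lead byte. The sketch below implements these standard conversions in portable Common Lisp, assuming that a JIS code is the pair of 7-bit bytes (each #x21 to #x7E) packed into one integer. The function names are hypothetical, the functions are not part of Liquid Common Lisp, and the vendor extensions that PC932 adds beyond JIS X 0208 are not handled.

```lisp
(defun jis-to-euc (code)
  "Convert a two-byte JIS X 0208 code to EUC codeset 1 by setting the high
bit of each byte."
  (logior code #x8080))

(defun euc-to-jis (code)
  "Inverse of JIS-TO-EUC: clear the high bit of each byte."
  (logand code #x7F7F))

(defun jis-to-sjis (code)
  "Convert a two-byte JIS X 0208 code to Shift JIS. Covers the JIS X 0208
range only; PC932 vendor extensions are not handled."
  (let* ((j1 (ldb (byte 8 8) code))   ; JIS first byte (row + #x20)
         (j2 (ldb (byte 8 0) code))   ; JIS second byte (cell + #x20)
         ;; Two consecutive JIS rows share one Shift JIS lead byte.
         (s1 (+ (ash (1+ j1) -1) (if (<= j1 #x5E) #x70 #xB0)))
         ;; Odd rows use trail bytes #x40-#x9E (skipping #x7F);
         ;; even rows use trail bytes #x9F-#xFC.
         (s2 (if (oddp j1)
                 (let ((b (+ j2 #x1F)))
                   (if (>= b #x7F) (1+ b) b))
                 (+ j2 #x7E))))
    (logior (ash s1 8) s2)))
```

For example, the JIS code #x2422 (hiragana a) becomes #xA4A2 in EUC and #x82A0 in Shift JIS. If #xC7AD, the code used in the hex-notation examples in this section, is read as an EUC codeset 1 code, (euc-to-jis #xC7AD) gives #x472D, and (jis-to-sjis #x472D) gives #x944C.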
In Common Lisp, you can represent character objects by writing #\ followed by the character. For example, #\a represents a lowercase a. In Liquid Common Lisp, you can also represent characters as a hexadecimal character code; for example, the letter a can be represented as #\c61, which is the same as (int-char #x61).
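The hex notation is simply another way of writing a character code. ANSI Common Lisp has no int-char, since it dropped bits and font attributes, so a portable equivalent of the round trip uses code-char and char-code. The sketch below also formats a character in the #\c notation, with uppercase hex digits as in the printed form #\cC7AD. The function name hex-char-notation is hypothetical and the function is not part of Liquid Common Lisp.

```lisp
;; Portable sketch: CODE-CHAR and CHAR-CODE stand in for Liquid's INT-CHAR
;; and the #\c reader syntax.
(defun hex-char-notation (c)
  "Return the #\\c<hex> spelling of character C, with the hex digits forced
to uppercase by the ~:@( ... ~) case-conversion directive."
  (format nil "#\\c~:@(~X~)" (char-code c)))
```

For example, (code-char #x61) returns #\a, (char-code #\a) returns 97 (#x61), and (hex-char-notation #\a) returns the string #\c61.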
Extended characters can be represented using the same hex notation; for example, you can enter a character of the Kanji character set by using any of the following notations:
- (int-char #xc7ad)
- #\cc7ad
- #\ followed by the Kanji ideogram itself

On streams of type character, strings of extended characters and character objects are printed out directly. On streams of type base-character, extended characters are printed in the hex notation. For example, if you write the extended character #\cc7ad, it is printed as #\cC7AD. Extended characters in strings and symbol names are printed in a similar syntax. For example, a string consisting of abc followed by the character #\cc7ad is printed as #<"abc[C7AD]">, and a symbol whose name consists of the same characters is printed as #<Symbol |abc[C7AD]|>.

The type specifier augmented-character is equivalent to character except when used for character I/O operations. Specifically, when augmented-character is specified as the value of the :element-type keyword argument to open or make-lisp-stream, bits and font attributes are preserved; for all other element types, bits and font attributes are ignored. See Chapter 3, "Using Characters and Strings," for a complete description of the :element-type keyword option.
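The bracketed form used for extended characters in strings and symbol names on base-character streams can be approximated with a small portable helper. The sketch below treats any character with a code at or above a limit of 128 as extended, which assumes that the base repertoire is ASCII, as on the HP and RS6000 platforms. It reproduces only the abc[C7AD] part of the output, not the surrounding #<...> wrapper. The function name escape-extended and its base-limit parameter are hypothetical; this is not Liquid's printer.

```lisp
(defun escape-extended (string &key (base-limit 128))
  "Return a copy of STRING in which every character whose code is at or
above BASE-LIMIT is replaced by its code in uppercase hex inside brackets."
  (with-output-to-string (out)
    (loop for c across string
          do (if (< (char-code c) base-limit)
                 (write-char c out)
                 ;; ~:@( ... ~) forces the hex digits to uppercase.
                 (format out "[~:@(~X~)]" (char-code c))))))
```

Applied to a string consisting of abc followed by (code-char #xC7AD), escape-extended returns "abc[C7AD]". Strings containing only base characters are returned unchanged.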
Figure 2.1 shows the additions to the character type hierarchy and their relationship to the type character, which is the union of the types base-character and extended-character:
Figure 2.1 Character type hierarchy
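The edges of this hierarchy can be queried with subtypep, and the claim that base and extended characters together make up character can be spot-checked with typep. The portable sketch below uses the ANSI names base-char and extended-char, which correspond to base-character and extended-character in this release. The names *hierarchy-edges*, hierarchy-report, and partition-holds-p are hypothetical.

```lisp
;; Portable sketch using ANSI names; in Liquid Common Lisp the corresponding
;; types are BASE-CHARACTER and EXTENDED-CHARACTER.
(defparameter *hierarchy-edges*
  '((standard-char base-char)
    (base-char character)
    (extended-char character))
  "Pairs (SUBTYPE SUPERTYPE) taken from the character type hierarchy.")

(defun hierarchy-report ()
  "Return a list of (SUBTYPE SUPERTYPE RESULT), where RESULT is the primary
value of SUBTYPEP for that pair."
  (loop for (sub super) in *hierarchy-edges*
        collect (list sub super (subtypep sub super))))

(defun partition-holds-p (&optional (limit 256))
  "True if every character with a code below LIMIT is of exactly one of the
types BASE-CHAR and EXTENDED-CHAR."
  (loop for code below (min limit char-code-limit)
        for c = (code-char code)
        always (or (null c)   ; CODE-CHAR may return NIL for unused codes
                   (if (typep c 'base-char)
                       (not (typep c 'extended-char))
                       (typep c 'extended-char)))))
```

Every entry returned by hierarchy-report should have a true third element. partition-holds-p samples only codes below its limit, so it checks the union property on those codes rather than proving it.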