LispWorks supports all the characters in the Unicode range [0, #x10ffff], excluding the surrogate range [#xd800, #xdfff]. Note that character objects corresponding to surrogate code points may be produced by some APIs in LispWorks, but not by the interfaces that you should normally use to generate characters and strings in Common Lisp (that is, cl:code-char, reading from a stream, converting from a foreign string, and loading from or storing to strings).
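The supported range described above can be expressed as a simple predicate. The following is a minimal sketch in portable Common Lisp; the names surrogate-code-p, supported-character-code-p and safe-code-char are hypothetical helpers for illustration and are not part of LispWorks or ANSI Common Lisp.

```lisp
;; Hypothetical helpers, not part of LispWorks or ANSI Common Lisp.

(defun surrogate-code-p (code)
  "Return true if CODE lies in the surrogate range [#xD800, #xDFFF]."
  (<= #xD800 code #xDFFF))

(defun supported-character-code-p (code)
  "Return true if CODE is an integer in [0, #x10FFFF] and is not a surrogate."
  (and (integerp code)
       (<= 0 code #x10FFFF)
       (not (surrogate-code-p code))))

(defun safe-code-char (code)
  "Return the character for CODE, or NIL if CODE is outside the supported range."
  (and (supported-character-code-p code)
       (code-char code)))
```

Checking the code first lets a program reject surrogate code points explicitly instead of relying on what cl:code-char does with them.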
The following subtypes of character are defined:
bmp-char: Characters with cl:char-code less than #x10000 (BMP stands for Basic Multilingual Plane in Unicode).
cl:character: All characters.
In LispWorks 6.1 and earlier versions, only characters with codes less than #x10000 are supported, and surrogate code points are allowed.
bmp-char was new in LispWorks 7.0, and matches the range of characters in LispWorks 6.1 and earlier versions, except that surrogate code points are no longer valid.
LispWorks 6.1 and earlier versions have the type simple-char, which is now a synonym for cl:character. Using cl:character is preferable and portable.
In LispWorks 6.1 and earlier versions character bits attributes are supported, and also some characters represent keyboard gestures. These are no longer supported.
All simple characters have names that consist of U+ followed by the code of the character in hexadecimal, for example #\U+764F is (code-char #x764F).
The hexadecimal number must be 4 to 6 digits long, for example #\U+a0 is illegal. Use #\U+00a0 instead.
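The naming rule above can be sketched as a small formatting function. This is an illustration in portable Common Lisp only; u+-character-name is a hypothetical helper, not a LispWorks function, and it assumes its argument is a valid non-surrogate code in [0, #x10ffff].

```lisp
;; Hypothetical helper: builds the U+ form of a character name from its code.
;; The format directive ~4,'0X prints CODE in hexadecimal, padded on the left
;; with zeros to at least 4 digits; larger codes print with 5 or 6 digits.
(defun u+-character-name (code)
  "Return a string of the form U+XXXX (4 to 6 hex digits) for CODE."
  (check-type code (integer 0 #x10FFFF))
  (format nil "U+~4,'0X" code))
```

With this sketch, the code #xa0 yields the four-digit form U+00A0 rather than the illegal U+A0.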
Additionally, Latin-1 characters have names derived from the ISO10646 name, for example:
(char-name (code-char 190)) => "Vulgar-Fraction-Three-Quarters"
Names are also provided for space characters:
(name-char "Ideographic-Space") => #\Ideographic-Space
Note that surrogate characters, that is, those in the inclusive range [#xd800, #xdfff], are not acceptable, and trying to read such a character, for example #\U+d835, produces an error.
In LispWorks 6.1 and earlier versions you can specify bits in character names. This is illegal in LispWorks 7.0 and later.
In LispWorks 6.1 and earlier versions character codes are limited to less than #x10000, and surrogate code points are allowed.
String types are supplied which are capable of holding each of the character types mentioned above. The following string types are defined:
base-string: holds any base-char.
bmp-string: holds any bmp-char.
text-string: holds any cl:character (see 26.3.1 Character types).
Compatibility note: bmp-string was new in LispWorks 7.0. LispWorks 6.1 and earlier versions have the type augmented-string, which is now a deprecated synonym for text-string.
In LispWorks 6.1 and earlier versions, text-string could hold characters with codes less than #x10000.
The types above include non-simple strings - those which are displaced, adjustable or with a fill-pointer.
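As an illustration of a non-simple string, the following sketch in portable Common Lisp builds an adjustable string with a fill pointer. The names make-growable-string and append-chars are hypothetical helpers. ANSI Common Lisp leaves it implementation-dependent whether such an array also counts as simple, so the sketch only relies on the result being of type string.

```lisp
;; Hypothetical helpers for illustration only.

(defun make-growable-string (&optional (capacity 16))
  "Return an empty, adjustable string with a fill pointer."
  (make-array capacity
              :element-type 'character
              :adjustable t
              :fill-pointer 0))

(defun append-chars (target chars)
  "Push each character in the list CHARS onto TARGET, growing it as needed."
  (dolist (c chars target)
    (vector-push-extend c target)))

;; Example use: the result satisfies (typep s 'string) and has a fill pointer.
(defparameter *growable-example*
  (append-chars (make-growable-string 2) (list #\a #\b #\c)))
```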
The Common Lisp type string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in 26.5 String Construction. For example:
CL-USER 1 > (set-default-character-element-type 'base-char)
BASE-CHAR

CL-USER 2 > (coerce (list #\Ideographic-Space) 'string)

Error: #\Ideographic-Space is not of type BASE-CHAR.
  1 (abort) Return to level 0.
  2 Return to top loop level 0.

Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.

CL-USER 3 : 1 > :a

CL-USER 4 > (set-default-character-element-type 'character)
CHARACTER

CL-USER 5 > (coerce (list #\Ideographic-Space) 'string)
" "
The following types are subtypes of cl:simple-string. Note that in the names of the string types, 'simple' refers to the string object and does not mean that the string's elements are of type simple-char.
simple-base-string: holds any base-char.
simple-bmp-string: holds any bmp-char.
simple-text-string: holds any cl:character.
The Common Lisp type simple-string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in 26.5 String Construction.
The type string (and hence simple-string) is defined by ANSI Common Lisp to be a union of all the character array types. This makes a call like:
(coerce s 'simple-string)
ambiguous because it needs to select a concrete type (such as simple-base-string or simple-text-string).
When LispWorks is running with *default-character-element-type* set to base-char, it expects that you will want strings with element type base-char, so functions like coerce treat references to simple-string as if they were (simple-array base-char (*)).
If you call set-default-character-element-type with a larger character type, then simple-string is treated as an array of that character type.
In other functions such as typep and subtypep, the types string and simple-string always represent a union of all the character array types as specified by ANSI Common Lisp.
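The union behavior of typep and subtypep can be checked with a short sketch in portable Common Lisp. The function string-type-report is a hypothetical helper; it uses only standard type names, so it does not depend on the LispWorks-specific string types.

```lisp
;; Hypothetical helper: reports how a string relates to the standard
;; string types. A simple string of base-chars is both a SIMPLE-STRING
;; and a SIMPLE-BASE-STRING, because SIMPLE-STRING is a union type here.
(defun string-type-report (s)
  "Return a list of booleans: (string-p simple-string-p simple-base-string-p)."
  (list (typep s 'string)
        (typep s 'simple-string)
        (typep s 'simple-base-string)))

(defparameter *base-example*
  (make-string 3 :element-type 'base-char :initial-element #\a))
```

Calling (string-type-report *base-example*) is expected to return true for all three entries, and (subtypep 'simple-base-string 'simple-string) is expected to return true, whatever the value of *default-character-element-type*.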
The compiler always does type inference for simple-string as if *default-character-element-type* were set to character.
For example, when you declare something to be of type simple-string, the compiler will never treat it as simple-base-string. Therefore calls like:
(schar (the simple-string x) 0)
will work whether x is a simple-base-string, simple-bmp-string or simple-text-string.
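The same point can be made with a declaration in a function. The following is a minimal sketch in portable Common Lisp, where first-char is a hypothetical function name; it is called with both a simple-base-string and a simple string whose element type is character.

```lisp
;; Hypothetical example function. The declaration says only that X is a
;; SIMPLE-STRING, so the compiled code must accept any simple string,
;; whatever its element type.
(defun first-char (x)
  "Return the first character of the simple string X."
  (declare (type simple-string x))
  (schar x 0))

;; One argument of each kind: a base string and a general character string.
(defparameter *base-arg*
  (make-string 2 :element-type 'base-char :initial-element #\b))
(defparameter *character-arg*
  (make-string 2 :element-type 'character :initial-element #\c))
```

Both (first-char *base-arg*) and (first-char *character-arg*) are valid calls under this declaration.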
LispWorks® User Guide and Reference Manual - 01 Dec 2021 19:30:24