LispWorks supports all the characters in the Unicode range [0, #x10ffff], excluding the surrogate range [#xd800, #xdfff]. Note that character objects corresponding to surrogate code points may be produced by some APIs in LispWorks, but not by the interfaces that you should normally use to generate characters and strings in Common Lisp (that is, cl:code-char, reading from a stream, converting from a foreign string, and loading and storing from or to strings).
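The supported range can be expressed as a small predicate in portable Common Lisp. This is only a sketch: the name valid-character-code-p is hypothetical and is not part of LispWorks.

```lisp
;; Hypothetical helper (not a LispWorks API): true when CODE lies in
;; the Unicode range [0, #x10ffff] and outside the surrogate range
;; [#xd800, #xdfff].
(defun valid-character-code-p (code)
  (and (integerp code)
       (<= 0 code #x10ffff)
       (not (<= #xd800 code #xdfff))))
```

A check like this can be useful before building characters from externally supplied code points.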
The following subtypes of character are defined:
base-char: Characters with cl:char-code less than base-char-code-limit (256).
bmp-char: Characters with cl:char-code less than #x10000 (BMP stands for Basic Multilingual Plane in Unicode).
cl:character: All characters.
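As an illustration of how the three subtypes partition the code range, the following portable sketch names the narrowest subtype for a given code. The function name narrowest-character-type is hypothetical, the result is a keyword rather than a type specifier, and the limit of 256 is taken from the description of base-char-code-limit above.

```lisp
;; Hypothetical helper: return a keyword naming the narrowest character
;; subtype that can hold a character with the given CODE.
;; Assumes CODE is a valid, non-surrogate code point and that
;; base-char-code-limit is 256, as stated in the text.
(defun narrowest-character-type (code)
  (cond ((< code 256) :base-char)
        ((< code #x10000) :bmp-char)
        (t :character)))
```

In LispWorks itself you would more likely test a character object directly with cl:typep against the relevant type.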
In LispWorks 6.1 and earlier versions, characters with codes less than #x10000 are supported, and surrogate code points are allowed.
bmp-char was new in LispWorks 7.0, and matches the range of characters in LispWorks 6.1 and earlier versions, except that surrogate code points are no longer valid.
In LispWorks 6.1 and earlier versions there is simple-char, which is now a synonym for cl:character. Using cl:character is preferable and portable.
In LispWorks 6.1 and earlier versions character bits attributes are supported, and also some characters represent keyboard gestures. These are no longer supported.
All simple characters have names that consist of U+ followed by the code of the character in hexadecimal, for example #\U+764F is (code-char #x764F).
The hexadecimal number must be 4 to 6 digits long. For example, #\U+a0 is illegal; use #\U+00a0 instead.
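The padding rule can be sketched with cl:format, which can zero-pad a hexadecimal number to a minimum width of four digits. The helper name u+-name is hypothetical, and the uppercase digits it produces are a property of the ~X directive, not a statement about how LispWorks itself prints character names.

```lisp
;; Hypothetical helper: build a U+ style name for CODE, zero-padded
;; to at least four hexadecimal digits (longer codes are not truncated).
(defun u+-name (code)
  (format nil "U+~4,'0X" code))
```

For instance, (u+-name #xa0) yields a name with four digits rather than the illegal two-digit form.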
Additionally, Latin-1 characters have names derived from the ISO10646 name, for example:
(char-name (code-char 190))
=>
"Vulgar-Fraction-Three-Quarters"
Names are also provided for space characters:
(name-char "Ideographic-Space")
=>
#\Ideographic-Space
Note that surrogate characters, that is, those in the inclusive range [#xd800, #xdfff], are not acceptable, and trying to read such a character, for example #\U+d835, produces an error.
In LispWorks 6.1 and earlier versions you can specify bits in character names. This is illegal in LispWorks 7.0 and later.
In LispWorks 6.1 and earlier versions character codes are limited to less than #x10000, and surrogate code points are allowed.
String types are supplied which are capable of holding each of the character types mentioned above. The following string types are defined:
base-string: holds any base-char.
bmp-string: holds any bmp-char.
text-string: holds any cl:character (see Character types).
Compatibility note: bmp-string was new in 7.0. In LispWorks 6.1 and earlier versions there is augmented-string; this is now a synonym for text-string and is deprecated.
In LispWorks 6.1 and earlier versions, text-string could hold characters with codes less than #x10000.
The types above include non-simple strings - those which are displaced, adjustable or with a fill-pointer.
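The difference between simple and non-simple strings of the same element type can be illustrated with standard Common Lisp only. This is a minimal sketch; the variable names *simple* and *growable* are hypothetical.

```lisp
;; A simple base string: no fill pointer, not adjustable, not displaced.
(defparameter *simple*
  (make-string 3 :element-type 'base-char :initial-element #\a))

;; A non-simple base string: it has a fill pointer and is adjustable.
(defparameter *growable*
  (make-array 8 :element-type 'base-char :fill-pointer 0 :adjustable t))

;; Append one character; the fill pointer advances to 1.
(vector-push-extend #\x *growable*)

;; Compare the two objects against the string types.
(list (typep *simple* 'simple-base-string)
      (typep *growable* 'base-string)
      (typep *growable* 'simple-string))
```

The second object still satisfies base-string, but because of its fill pointer it is not a simple-string.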
The Common Lisp type string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in String Construction. For example:
CL-USER 1 > (set-default-character-element-type 'base-char)
BASE-CHAR
CL-USER 2 > (coerce (list #\Ideographic-Space) 'string)
Error: #\Ideographic-Space is not of type BASE-CHAR.
1 (abort) Return to level 0.
2 Return to top loop level 0.
Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.
CL-USER 3 : 1 > :a
CL-USER 4 > (set-default-character-element-type 'character)
CHARACTER
CL-USER 5 > (coerce (list #\Ideographic-Space) 'string)
" "
The following types are subtypes of cl:simple-string. Note that in the names of the string types, 'simple' refers to the string object and does not mean that the string's elements are simple-chars.
simple-bmp-string: holds any bmp-char.
The Common Lisp type simple-string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in String Construction.
The type string (and hence simple-string) is defined by ANSI Common Lisp to be a union of all the character array types. This makes a call like
(coerce s 'simple-string)
ambiguous because it needs to select a concrete type (such as simple-base-string or simple-text-string).
When LispWorks is running with *default-character-element-type* set to base-char, it expects that you will want strings with element type base-char, so functions like coerce treat references to simple-string as if they were (simple-array base-char (*)).
If you call set-default-character-element-type with a larger character type, then simple-string becomes a union of the array types that are subtypes of that character type.
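One way to avoid depending on the current default is to name a concrete string type in the call to coerce. The following is a sketch in standard Common Lisp; the function name to-simple-base-string is hypothetical.

```lisp
;; Hypothetical helper: always produce a simple string whose element
;; type is base-char, whatever the default character element type is.
;; An error is signaled if S contains a character that is not a base-char.
(defun to-simple-base-string (s)
  (coerce s 'simple-base-string))
```

Because the result type is concrete, there is no union for coerce to resolve, at the cost of rejecting input that does not fit in base-char.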
The compiler always does type inferencing for simple-string as if *default-character-element-type* were set to character.
For example, when you declare something to be of type simple-string, the compiler will never treat it as simple-base-string. Therefore calls like
(schar (the simple-string x) 0)
will work whether x is a simple-base-string, simple-bmp-string or simple-text-string.
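The same point can be shown with a declaration inside a function. This is a minimal sketch using only standard Common Lisp; the function name first-char is hypothetical.

```lisp
;; Hypothetical helper: S is declared only as simple-string, so the
;; call to schar is valid for any simple string representation.
(defun first-char (s)
  (declare (type simple-string s))
  (schar s 0))
```

The function can be called with a simple string of any element type, for example one created with (make-string 2 :element-type 'base-char :initial-element #\a).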
LispWorks User Guide and Reference Manual - 20 Sep 2017