26.3 Character and String types

26.3.1 Character types

LispWorks supports all the characters in the Unicode range [0, #x10ffff], excluding the surrogate range [#xd800, #xdfff]. Note that character objects corresponding to surrogate code points may be produced by some APIs in LispWorks, but not by the interfaces that you should normally use to generate characters and strings in Common Lisp (that is cl:code-char, reading from a stream, converting from a foreign string, loading and storing from or to strings).

The following subtypes of character are defined:

base-char
Characters with cl:char-code less than base-char-code-limit (256).

bmp-char
Characters with cl:char-code less than #x10000 (BMP stands for Basic Multilingual Plane in Unicode).

character
All characters.
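
As a rough illustration of how these ranges nest, the following portable sketch classifies a character code using the limits given above. The function classify-char-code is hypothetical and is not part of LispWorks; the keywords it returns are only labels for the three types. In LispWorks itself you would normally test a character object directly with cl:typep.

```lisp
;; Hypothetical helper, not a LispWorks API: return a keyword naming the
;; smallest character type described above whose range contains CODE,
;; or NIL if CODE is not a supported character code.
(defun classify-char-code (code)
  (cond ((or (< code 0) (> code #x10ffff)) nil)  ; outside the Unicode range
        ((<= #xd800 code #xdfff) nil)            ; surrogate code point
        ((< code 256) :base-char)                ; below base-char-code-limit
        ((< code #x10000) :bmp-char)             ; Basic Multilingual Plane
        (t :character)))                         ; supplementary planes

(classify-char-code 190)      ; => :BASE-CHAR
(classify-char-code #x764f)   ; => :BMP-CHAR
(classify-char-code #x1f600)  ; => :CHARACTER
(classify-char-code #xd835)   ; => NIL
```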

26.3.2 Compatibility notes

In LispWorks 6.1 and earlier versions, only characters with codes less than #x10000 are supported, and surrogate code points are allowed.

bmp-char was new in LispWorks 7.0, and matches the range of characters in LispWorks 6.1 and earlier versions, except that surrogate code points are no longer valid.

In LispWorks 6.1 and earlier versions there is a type simple-char, which is now a synonym for cl:character. Using cl:character is preferable and portable.

In LispWorks 6.1 and earlier versions character bits attributes are supported, and also some characters represent keyboard gestures. These are no longer supported.

26.3.3 Character Syntax

All simple characters have names that consist of U+ followed by the code of the character in hexadecimal, for example #\U+764F is (code-char #x764F).

The hexadecimal number must have between 4 and 6 digits. For example, #\U+a0 is illegal; use #\U+00a0 instead.
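
When such names are generated programmatically, the four-digit minimum can be met by zero-padding. The following is a minimal sketch using the standard format directive ~X with a minimum width of 4 and a pad character of 0. The helper u-plus-name is hypothetical and is not a LispWorks function.

```lisp
;; Hypothetical helper: build the "U+" name for CODE, zero-padded to at
;; least four hexadecimal digits. Codes up to #x10ffff need at most six.
(defun u-plus-name (code)
  (format nil "U+~4,'0X" code))

(u-plus-name #xa0)     ; => "U+00A0"
(u-plus-name #x764f)   ; => "U+764F"
(u-plus-name #x1f600)  ; => "U+1F600"
```

The helper does not check that the code is in range or outside the surrogate range, so callers should validate the code first.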

Additionally, Latin-1 characters have names derived from the ISO10646 name, for example:

(char-name (code-char 190))
=>
"Vulgar-Fraction-Three-Quarters"

Names are also provided for space characters:

(name-char "Ideographic-Space")
=>
#\Ideographic-Space

Note that surrogate characters, that is, those with codes in the inclusive range [#xd800, #xdfff], are not acceptable. Trying to read such a character, for example #\U+d835, signals an error.

26.3.4 Compatibility notes

In LispWorks 6.1 and earlier versions you can specify bits in character names. This is illegal in LispWorks 7.0 and later.

In LispWorks 6.1 and earlier versions character codes are limited to less than #x10000, and surrogate code points are allowed.

26.3.5 String types

String types are supplied which are capable of holding each of the character types mentioned above. The following string types are defined:

base-string
holds any base-char.

bmp-string
holds any bmp-char.

text-string
holds any cl:character.

Compatibility note: bmp-string was new in LispWorks 7.0. In LispWorks 6.1 and earlier versions there is augmented-string, which is now a deprecated synonym for text-string.

In LispWorks 6.1 and earlier versions, text-string could hold characters with codes less than #x10000.

The types above include non-simple strings - those which are displaced, adjustable or with a fill-pointer.
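
The distinction between simple and non-simple strings can be checked with cl:typep. The sketch below uses only standard Common Lisp; the variable name *buffer* is illustrative. Strictly, ANSI Common Lisp leaves it implementation-dependent whether a vector with a fill pointer is simple, so the last result relies on the statement above.

```lisp
;; A base-char string with a fill pointer: a string, but not a simple one.
(defparameter *buffer*
  (make-array 8 :element-type 'base-char :fill-pointer 0 :adjustable t))

(vector-push-extend #\a *buffer*)   ; grow the active part of the string

(typep *buffer* 'string)            ; => T
(typep *buffer* 'base-string)       ; => T
(typep *buffer* 'simple-string)     ; => NIL
```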

The Common Lisp type string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in 26.5 String Construction. For example:

CL-USER 1 > (set-default-character-element-type 'base-char)
BASE-CHAR
 
CL-USER 2 > (coerce (list #\Ideographic-Space) 'string)
 
Error: #\Ideographic-Space is not of type BASE-CHAR.
  1 (abort) Return to level 0.
  2 Return to top loop level 0.
 
Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.
 
CL-USER 3 : 1 > :a
 
CL-USER 4 > (set-default-character-element-type 'character)
CHARACTER
 
CL-USER 5 > (coerce (list #\Ideographic-Space) 'string)
" "

The following types are subtypes of cl:simple-string. Note that in the names of the string types, 'simple' refers to the string object and does not mean that the string's elements are of type simple-char.

simple-base-string
holds any base-char.

simple-bmp-string
holds any bmp-char.

simple-text-string
holds any cl:character.

The Common Lisp type simple-string itself is dependent on the value of *default-character-element-type* according to the rules for string construction described in 26.5 String Construction.

26.3.5.1 String types at run time

The type string (and hence simple-string) is defined by ANSI Common Lisp to be a union of all the character array types. This makes a call like:

(coerce s 'simple-string)

ambiguous because it needs to select a concrete type (such as simple-base-string or simple-text-string).

When LispWorks is running with *default-character-element-type* set to base-char, it expects that you will want strings with element type base-char, so functions like coerce treat references to simple-string as if they were (simple-array base-char (*)).

If you call set-default-character-element-type with a larger character type, then simple-string is treated as an array of that character type.

In other functions such as typep and subtypep, the types string and simple-string always represent a union of all the character array types as specified by ANSI Common Lisp.
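
The union behavior of typep and subtypep can be seen with a short check. This sketch uses only standard Common Lisp; the variable names are illustrative.

```lisp
;; Two simple strings with different element types.
(defparameter *narrow*
  (make-string 3 :element-type 'base-char :initial-element #\a))
(defparameter *wide*
  (make-string 3 :element-type 'character :initial-element #\a))

;; Both satisfy simple-string, whatever their element type.
(typep *narrow* 'simple-string)                ; => T
(typep *wide* 'simple-string)                  ; => T

;; The concrete type is a subtype of the union.
(subtypep 'simple-base-string 'simple-string)  ; => T, T
```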

26.3.5.2 String types at compile time

The compiler always performs type inference for simple-string as if *default-character-element-type* were set to character.

For example, when you declare something to be of type simple-string, the compiler will never treat it as simple-base-string. Therefore calls like:

(schar (the simple-string x) 0)

will work whether x is a simple-base-string, simple-bmp-string or simple-text-string.
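
The same holds for a declaration in a function body. A minimal sketch, where first-char is a hypothetical function name:

```lisp
;; Hypothetical example function: X is declared simple-string, so it may
;; be called with a simple string of any element type.
(defun first-char (x)
  (declare (type simple-string x))
  (schar x 0))

(first-char (make-string 1 :element-type 'base-char :initial-element #\a))
;; => #\a
(first-char (make-string 1 :element-type 'character :initial-element #\a))
;; => #\a
```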


LispWorks® User Guide and Reference Manual - 01 Dec 2021 19:30:24