Looks for a Unicode Byte Order Mark (BOM) and, if one is found, assumes that it indicates the matching Unicode encoding.
Package: system
detect-unicode-bom pathname ef-spec buffer length => new-ef-spec
detect-utf32-bom pathname ef-spec buffer length => new-ef-spec
detect-utf8-bom pathname ef-spec buffer length => new-ef-spec
pathname: A pathname identifying the location of buffer.
ef-spec: An external format spec.
buffer: A buffer whose contents are examined.
length: An integer giving the length up to which buffer should be examined.
new-ef-spec: A new external format spec, created by merging ef-spec with the encoding that was found.
These functions are called as part of the encoding detection routine of open. They try to detect the encoding only if ef-spec does not already supply one (that is, if the encoding in ef-spec is :default).
detect-unicode-bom tries to detect UTF-16 encoding.
detect-utf32-bom tries to detect UTF-32 encoding.
detect-utf8-bom tries to detect UTF-8 encoding.
These functions check whether the bytes in buffer (examined up to length) start with the Unicode character #xFEFF (the BOM) encoded in the relevant encoding. If they do, the function assumes that the file is in that encoding. detect-unicode-bom and detect-utf32-bom also deduce the byte order (little-endian or big-endian) from the BOM if ef-spec does not specify it.
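The byte tests described above can be sketched in portable Common Lisp. This is a minimal illustration and not the LispWorks implementation. The helper names (starts-with-bytes-p, sketch-utf8-bom, sketch-utf16-bom, sketch-utf32-bom) and the returned keywords are hypothetical. Each function returns only an encoding keyword, not a merged external format spec.

```lisp
;;; Hypothetical sketch of BOM tests modelled on DETECT-UTF8-BOM,
;;; DETECT-UNICODE-BOM and DETECT-UTF32-BOM.  Not the LispWorks code;
;;; the result keywords are illustrative only.

(defun starts-with-bytes-p (buffer length prefix)
  "True if BUFFER, examined only up to LENGTH, begins with the bytes in PREFIX."
  (and (<= (length prefix) (min length (length buffer)))
       (loop for octet in prefix
             for i from 0
             always (= (aref buffer i) octet))))

(defun sketch-utf8-bom (buffer length)
  ;; U+FEFF encoded in UTF-8 is EF BB BF (no byte order to deduce).
  (when (starts-with-bytes-p buffer length '(#xEF #xBB #xBF))
    :utf-8))

(defun sketch-utf16-bom (buffer length)
  ;; U+FEFF in UTF-16 is FE FF (big-endian) or FF FE (little-endian),
  ;; so the order of the two bytes gives the endianness.
  (cond ((starts-with-bytes-p buffer length '(#xFE #xFF)) :utf-16be)
        ((starts-with-bytes-p buffer length '(#xFF #xFE)) :utf-16le)))

(defun sketch-utf32-bom (buffer length)
  ;; U+FEFF in UTF-32 is 00 00 FE FF (big-endian) or FF FE 00 00 (little-endian).
  (cond ((starts-with-bytes-p buffer length '(#x00 #x00 #xFE #xFF)) :utf-32be)
        ((starts-with-bytes-p buffer length '(#xFF #xFE #x00 #x00)) :utf-32le)))
```

Both sketch-utf16-bom and sketch-utf32-bom accept the bytes #xFF #xFE #x00 #x00. This is the ambiguity that the order of detection has to resolve.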
Note that a file starting with the bytes 0xff 0xfe 0x00 0x00 matches both the UTF-16 and the UTF-32 little-endian BOMs. In UTF-16 these bytes would be a BOM followed by the character U+0000. By default detect-utf32-bom is applied first, because it precedes detect-unicode-bom in *file-encoding-detection-algorithm*, so such a file is treated as UTF-32. You can change this behavior by altering the order of the functions in *file-encoding-detection-algorithm*.
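The effect of ordering can be sketched as follows. The sketch makes an assumption that is not stated above: the detection functions are called in list order, each receives the ef-spec returned by the previous one, and a function leaves ef-spec unchanged once its encoding is no longer :default. All names are hypothetical, and ef-spec is simplified to a bare encoding keyword.

```lisp
;;; Hypothetical sketch of order-dependent BOM detection.
;;; ASSUMPTION: detectors run in list order, each receiving the ef-spec
;;; returned by the previous one, and a detector does nothing once the
;;; ef-spec is no longer :DEFAULT.  EF-SPEC is simplified to a keyword.

(defun bytes-match-p (buffer length prefix)
  "True if BUFFER, examined only up to LENGTH, begins with the bytes in PREFIX."
  (and (<= (length prefix) (min length (length buffer)))
       (loop for octet in prefix
             for i from 0
             always (= (aref buffer i) octet))))

(defun make-bom-detector (prefix encoding)
  "Return a detector taking (pathname ef-spec buffer length)."
  (lambda (pathname ef-spec buffer length)
    (declare (ignore pathname))
    (if (and (eq ef-spec :default)              ; only detect if nothing supplied
             (bytes-match-p buffer length prefix))
        encoding                                ; BOM found: adopt its encoding
        ef-spec)))                              ; otherwise pass EF-SPEC through

(defparameter *utf32le-detector*
  (make-bom-detector '(#xFF #xFE #x00 #x00) :utf-32le))

(defparameter *utf16le-detector*
  (make-bom-detector '(#xFF #xFE) :utf-16le))

(defun run-detectors (detectors pathname buffer length)
  "Thread an initial :DEFAULT ef-spec through DETECTORS, in order."
  (let ((ef-spec :default))
    (dolist (detector detectors ef-spec)
      (setf ef-spec (funcall detector pathname ef-spec buffer length)))))

;; The same four bytes give different results depending on the order:
(run-detectors (list *utf32le-detector* *utf16le-detector*)
               nil #(#xFF #xFE #x00 #x00) 4)   ; => :UTF-32LE
(run-detectors (list *utf16le-detector* *utf32le-detector*)
               nil #(#xFF #xFE #x00 #x00) 4)   ; => :UTF-16LE
```

Under this assumption, whichever matching detector runs first fixes the encoding, and later detectors pass it through unchanged.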
LispWorks® User Guide and Reference Manual - 01 Dec 2021 19:31:02