Look for a Unicode Byte Order Mark (BOM) at the start of a buffer. If one is found, it is assumed to indicate the matching Unicode encoding.
detect-unicode-bom pathname ef-spec buffer length => new-ef-spec
detect-utf32-bom pathname ef-spec buffer length => new-ef-spec
detect-utf8-bom pathname ef-spec buffer length => new-ef-spec
pathname: Pathname identifying the location of buffer.
ef-spec: An external format spec.
buffer: A buffer whose contents are examined.
length: Length (an integer) up to which buffer should be examined.
These functions are called as part of open's encoding detection routine, and try to detect the encoding if it is not already supplied in the external-format argument.
detect-unicode-bom tries to detect UTF-16 encoding.
detect-utf32-bom tries to detect UTF-32 encoding.
detect-utf8-bom tries to detect UTF-8 encoding.
These functions work by checking whether the file starts with the Unicode character #xFEFF (the BOM) encoded in the relevant encoding. If it does, they assume the file is encoded in that encoding. detect-unicode-bom and detect-utf32-bom also deduce the byte order (little-endian or big-endian).
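The byte-level check described above can be sketched in portable Common Lisp. This is only an illustration of the technique, not the LispWorks implementation: the function names prefixed with sketch-, the helper buffer-starts-with-p, and the keyword lists returned are all hypothetical, and buffer is assumed to be a vector of octets.

```lisp
;; Illustrative sketch only; not LispWorks' code. BUFFER is assumed to be a
;; vector of octets, and the returned keyword lists are hypothetical
;; stand-ins for real external format specs.

(defun buffer-starts-with-p (buffer length octets)
  "True if the first LENGTH octets of BUFFER begin with the list OCTETS."
  ;; LENGTH (the variable) is the examination limit; (length octets) calls
  ;; the standard function, which lives in a separate namespace.
  (and (>= length (length octets))
       (loop for expected in octets
             for index from 0
             always (= (aref buffer index) expected))))

(defun sketch-utf8-bom (buffer length)
  ;; #xFEFF encoded in UTF-8 is the octet sequence EF BB BF.
  (when (buffer-starts-with-p buffer length '(#xEF #xBB #xBF))
    '(:utf-8)))

(defun sketch-utf16-bom (buffer length)
  ;; #xFEFF in UTF-16 is FE FF (big-endian) or FF FE (little-endian).
  (cond ((buffer-starts-with-p buffer length '(#xFE #xFF))
         '(:utf-16 :big-endian))
        ((buffer-starts-with-p buffer length '(#xFF #xFE))
         '(:utf-16 :little-endian))))

(defun sketch-utf32-bom (buffer length)
  ;; #xFEFF in UTF-32 is 00 00 FE FF (big-endian) or FF FE 00 00
  ;; (little-endian).
  (cond ((buffer-starts-with-p buffer length '(#x00 #x00 #xFE #xFF))
         '(:utf-32 :big-endian))
        ((buffer-starts-with-p buffer length '(#xFF #xFE #x00 #x00))
         '(:utf-32 :little-endian))))
```

For example, (sketch-utf16-bom (vector #xFF #xFE #x41 #x00) 4) finds the little-endian UTF-16 BOM, while the same call on a buffer with no BOM returns nil.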
Note that a file starting with the bytes 0xff 0xfe 0x00 0x00 matches both the UTF-32 little-endian BOM and the UTF-16 little-endian BOM (followed by a null character). By default detect-utf32-bom is applied first, because it precedes detect-unicode-bom in *file-encoding-detection-algorithm*, so such a file is treated as UTF-32. You can change this behavior by altering the order of functions in *file-encoding-detection-algorithm*.
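The effect of ordering on the ambiguous prefix can be illustrated with a small, portable Common Lisp sketch. It does not touch *file-encoding-detection-algorithm* itself: the candidate lists, the function first-bom-match, and the keyword names below are hypothetical stand-ins that only model "the first matching detector wins".

```lisp
;; Illustrative sketch only; models first-match-wins ordering with
;; hypothetical names. It does not call or modify anything in LispWorks.

(defun bom-prefix-p (buffer length bom)
  "True if the first LENGTH octets of BUFFER begin with the list BOM."
  (and (>= length (length bom))
       (loop for expected in bom
             for index from 0
             always (= (aref buffer index) expected))))

(defun first-bom-match (candidates buffer length)
  "Return the name of the first entry in CANDIDATES whose BOM prefixes BUFFER."
  (loop for (name . bom) in candidates
        when (bom-prefix-p buffer length bom)
          return name))

;; Each entry is (name . bom-octets). Order matters.
(defparameter *utf32-first*
  '((:utf-32-le #xFF #xFE #x00 #x00)
    (:utf-16-le #xFF #xFE)))

(defparameter *utf16-first* (reverse *utf32-first*))

;; The ambiguous prefix discussed above.
(defparameter *ambiguous-prefix* (vector #xFF #xFE #x00 #x00))
```

With these definitions, (first-bom-match *utf32-first* *ambiguous-prefix* 4) selects :utf-32-le, whereas (first-bom-match *utf16-first* *ambiguous-prefix* 4) selects :utf-16-le. Moving detect-unicode-bom ahead of detect-utf32-bom in *file-encoding-detection-algorithm* would presumably shift the result in the same way.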
LispWorks User Guide and Reference Manual - 13 Feb 2015