The :external-format argument of open and related functions should be an ef-spec, where the name can be :default. The symbol :default is the default value.
If you know the format of the data when doing file I/O, you should definitely specify external-format explicitly, in the ef-spec syntax described in this section.
An ef-spec is "complete" if and only if the name is not :default and the parameters include :eol-style.
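The completeness rule can be restated as a small predicate. This is a minimal sketch in portable Common Lisp, not a LispWorks API: the names ef-spec-name, ef-spec-parameters and ef-spec-complete-p are hypothetical, and the sketch assumes an ef-spec is either a bare symbol or a list whose first element is the name and whose remaining elements are keyword/value parameters.

```lisp
;; Hypothetical helpers, not part of LispWorks: they only restate the
;; definition of a "complete" ef-spec given above.
(defun ef-spec-name (ef-spec)
  "Return the name of EF-SPEC, assumed to be a symbol or a list (name . params)."
  (if (consp ef-spec) (first ef-spec) ef-spec))

(defun ef-spec-parameters (ef-spec)
  "Return the parameter plist of EF-SPEC, or NIL for a bare symbol."
  (if (consp ef-spec) (rest ef-spec) nil))

(defun ef-spec-complete-p (ef-spec)
  "True if EF-SPEC's name is not :DEFAULT and its parameters include :EOL-STYLE."
  (and (not (eq (ef-spec-name ef-spec) :default))
       ;; The third value of GET-PROPERTIES is non-NIL only when the
       ;; indicator is actually present in the plist.
       (nth-value 2 (get-properties (ef-spec-parameters ef-spec)
                                    '(:eol-style)))))
```

Under these assumptions, '(:ascii :eol-style :lf) is complete, while :ascii (no :eol-style) and '(:default :eol-style :lf) (name is :default) are not.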
All external formats have an :eol-style parameter. If eol-style is not explicit in an ef-spec, a default is used. The allowed values are:
:lf
    This is the default on non-Windows systems, meaning that lines are terminated by Linefeed.
:crlf
    This is the default on Windows, meaning that lines are terminated by Carriage-Return followed by Linefeed.
:cr
    Lines are terminated by Carriage-Return.
If open or with-open-file gets a complete :external-format argument, then it is used as is. For example, this form opens an ASCII linefeed-terminated stream:
(with-open-file (ss "C:/temp/ascii-lf"
                    :direction :output
                    :external-format
                    '(:ascii :eol-style :lf))
  (stream-external-format ss))
=>
(:ASCII :EOL-STYLE :LF)
If you know the encoding of a file you are opening, then you should pass the appropriate :external-format argument.
If open or with-open-file gets a non-complete :external-format argument ef-spec, then the system decides which external format to use by calling the function guess-external-format.
The default behavior of guess-external-format is as follows:
1. If ef-spec's name is :default, this finds a match based on the filename; or (if that fails) looks in the Emacs-style (-*-) attribute line for an option called ENCODING or EXTERNAL-FORMAT or CODING; or (if that fails) chooses from amongst likely encodings by analysing the bytes near the start of the file; or (if that fails) uses a default encoding. Otherwise ef-spec's name is assumed to name an encoding and this encoding is used.
2. If ef-spec does not specify the :eol-style parameter, it then also analyzes the start of the file for byte patterns indicating the end-of-line style, and uses a default end-of-line style if no such pattern is found.
The file in this example was written by a Windows program which writes the Byte Order Mark at the start of the file, indicating that it is Unicode encoded. The routine in step 1 above detects this:
(set-default-character-element-type 'character)
=>
CHARACTER
(with-open-file (ss "C:/temp/unicode-notepad.txt")
  (stream-external-format ss))
=>
(:UNICODE :LITTLE-ENDIAN T :EOL-STYLE :CRLF)
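To make the two kinds of byte analysis described above more concrete, here is a rough sketch in portable Common Lisp. It is not the LispWorks implementation: sniff-bom and sniff-eol-style are hypothetical names, the keywords returned by sniff-bom are labels local to this sketch rather than LispWorks external format names, and sniff-eol-style assumes the bytes come from an ASCII-compatible single-byte encoding (it would misread 16-bit data). The sketch also ignores UTF-32 Byte Order Marks.

```lisp
;; Illustrative only: hypothetical helpers, not the LispWorks algorithm.
(defun sniff-bom (bytes)
  "Return :UTF-8, :UTF-16-LE or :UTF-16-BE if BYTES starts with that BOM, else NIL."
  (flet ((starts-with (&rest prefix)
           ;; EVERY stops at the end of the shorter sequence, so check the
           ;; length first to reject inputs shorter than the prefix.
           (and (>= (length bytes) (length prefix))
                (every #'= prefix bytes))))
    (cond ((starts-with #xEF #xBB #xBF) :utf-8)
          ((starts-with #xFF #xFE) :utf-16-le)
          ((starts-with #xFE #xFF) :utf-16-be)
          (t nil))))

(defun sniff-eol-style (bytes &optional (default :lf))
  "Guess :CRLF, :LF or :CR from the first line terminator in the vector BYTES.
Return DEFAULT if BYTES contains neither Carriage-Return nor Linefeed."
  (let ((pos (position-if (lambda (b) (or (= b 13) (= b 10))) bytes)))
    (cond ((null pos) default)
          ;; A Linefeed seen first means plain LF termination.
          ((= (aref bytes pos) 10) :lf)
          ;; A Carriage-Return immediately followed by Linefeed is CRLF.
          ((and (< (1+ pos) (length bytes))
                (= (aref bytes (1+ pos)) 10))
           :crlf)
          ;; A lone Carriage-Return.
          (t :cr))))
```

For instance, a buffer beginning with the bytes #xFF #xFE is reported as little-endian UTF-16 by sniff-bom, which matches the kind of result shown in the example above; the DEFAULT argument of sniff-eol-style plays the role of the default end-of-line style used when no pattern is found.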
The behavior of guess-external-format is configurable via the variables *file-encoding-detection-algorithm* and *file-eol-style-detection-algorithm*. See the manual pages for details.
To change the default for all file access via open, compile-file and so on, you can modify the value of *file-encoding-detection-algorithm*.
For example given the following definition:
(defun utf-8-file-encoding (pathname ef-spec buffer length)
  (declare (ignore pathname buffer length))
  (system:merge-ef-specs ef-spec :utf-8))
then this makes it use UTF-8 as a fallback:
(setq system:*file-encoding-detection-algorithm*
      (substitute 'utf-8-file-encoding
                  'system:locale-file-encoding
                  system:*file-encoding-detection-algorithm*))
and this forces it to always use UTF-8:
(setq system:*file-encoding-detection-algorithm*
      '(utf-8-file-encoding))
The example in "Example of using UTF-8 by default" above will use UTF-8 even if the file is known to contain bytes that cannot occur in this encoding. As an alternative way to use UTF-8 when possible, you can modify the value of *specific-valid-file-encodings*:
(pushnew :utf-8 system:*specific-valid-file-encodings*)
The :element-type argument in open and with-open-file defaults to the value of *default-character-element-type*.
If element-type is not :default, checks are made to ensure that the resulting stream's stream-element-type is compatible with its external format:
1. If direction is :input or :io, the element-type argument must be a supertype of the type of characters produced by the external format.
2. If direction is :output or :io, the element-type argument must be a subtype of the type of characters accepted by the external format.
If the element-type argument does not satisfy these requirements, an error is signaled.
If element-type is :default, the system chooses the stream-element-type on the basis of the external format.
The LispWorks Editor uses open with :element-type :default to read and write files. On reading a file, the external format is remembered and used when saving the file. On writing a Unicode (UTF-16) file, the Byte Order Mark is written.
It is possible to insert characters in the Editor (for example by pasting clipboard text) which are not supported by the chosen external format. This will lead to an error when you attempt to save the buffer. You can handle this by setting the external format appropriately.
The Unicode Byte Order Mark (BOM) is treated as whitespace in the default readtable. This allows the Lisp reader to read a 16-bit (UTF-16 or BMP encoded) file regardless of whether the BOM is present. See 16-bit External formats guide for more information.
Some editors including Microsoft Notepad and the LispWorks editor write the BOM when writing a file with 16-bit (UTF-16 or BMP) encoding.
LispWorks User Guide and Reference Manual - 20 Sep 2017