International Allegro CL

$Revision: 5.0.2.8 $

The document introduction.htm provides an overview of the Allegro CL documentation with links to all major documents. The document index.htm is an index with pointers to every documented object (operators, variables, etc.). The revision number of this document is below the title. These documents may be revised from time to time between releases.

1.0 Introduction
2.0 Quick guide to changes
    2.1 Differences between International and standard Allegro CL
    2.2 New in release 5.0.1
3.0 Encoding for extended character sets
    3.1 Extended UNIX Code or EUC
    3.2 JIS
    3.3 Shift-JIS
    3.4 Process code or Fat
4.0 Changes to standard Lisp caused by fat character representation
    4.1 Reading and writing files in International Allegro CL
        4.1.2 Stream external formats and the open function
    4.2 Character functions applied to extended characters
5.0 Foreign functions
    5.1 Example 1: Foreign function expects EUC string
    5.2 Example 2: Foreign function expects Process-Code string
    5.3 Functions that support passing strings to and from foreign code
6.0 Miscellaneous and known problems
    6.1.1 Known problems
7.0 Installation
Index

This document describes the International version of Allegro CL (abbreviated IACL).

Note that most functions and variables described in this document only exist in IACL or only behave the way described in IACL. The ones that exist only in IACL do not have description pages. The description pages of the ones that exist outside IACL typically do not mention the IACL behavior. This document is the complete description of specific IACL functionality.

1.0 Introduction

International Allegro CL, or IACL, the international release of Allegro CL, is a complete implementation of Common Lisp supporting most of the same additional products. In addition, International Allegro CL supports an extended, international character set.

This document discusses additional functionality in International Allegro CL and describes what things must be done differently when you use International Allegro CL (compared with standard Allegro CL).

This document contains the following main sections:

  1. Introduction. The section you are now reading.
  2. Quick guide to changes. A set of bulleted paragraphs describes the main differences between standard Allegro CL and International Allegro CL. Other paragraphs mention things to watch out for when using International Allegro CL. Each issue is described briefly with references to further discussion elsewhere in this or another document.
  3. Encoding for extended character sets. This section describes different encodings of extended character sets. Those acceptable to International Allegro CL are discussed in detail.
  4. Changes to standard Lisp caused by fat character representation. Two subsections describe reading and writing files and changes to character functions and types.
  5. Foreign functions. Passing of strings between Lisp and C is quite different in International Allegro CL compared to standard Allegro CL. This section describes the changes affecting even users not exercising the extended character set capability.
  6. Miscellaneous. This section contains miscellaneous information about the release.
  7. Installation. This section describes installing International Allegro CL.

Following section 7, an appendix lists all added or changed functions, macros, variables, etc. in International Allegro CL. They are listed with brief descriptions and a reference to their definition in the main part of this document. There is an index at the end.

Procedures for reporting bugs and getting help for International Allegro CL are the same as for standard Allegro CL. See introduction.htm.

2.0 Quick guide to changes

These bulleted paragraphs briefly describe the differences between version 4.3 and version 4.2 International Allegro CL, and between International Allegro CL and standard Allegro CL. Each paragraph comes with pointers to additional discussion on the issue raised. Please read this additional discussion if the item may affect you.

2.1 Differences between International and standard Allegro CL

2.2 New in release 5.0.1

On Windows, the external format is MultiByte (shift-jis for Japanese). On Unix, the external format is EUC. In 5.0.1, EUC files cannot be read into a Windows IACL and MultiByte files cannot be read into a Unix IACL.

The following functions are new in release 5.0.1:

excl:string-to-native
excl:native-to-string
excl:string-to-mb
excl:mb-to-string
excl:mb-to-native
excl:native-to-mb
excl:native-character-sizeof
excl:native-string-sizeof
excl:with-native-string

These new functions unify and expand existing Allegro CL string to foreign function conversion routines. In these routines, the term native is used. For Version 5.0.1 of Allegro CL, native representations are always 8-bit representations. In the non-International releases of Allegro CL 5.0.1, there are no character data alterations when converting from Lisp strings to native strings.

In the International releases of Allegro CL 5.0.1, the character data is converted based on the external-format keyword argument of the conversion routines. Under Unix, the default external-format is :euc (for Extended Unix Code). Under Windows, the default external-format is :mb (for Windows MultiByte).

Note that while the :external-format keyword argument is available in all the conversion routines described below, Allegro CL does not currently support using an external-format other than the default.

The following diagram illustrates the purpose of string-to-native and native-to-string:

               string-to-native
             >-------->--------->--
            /                      \
           ^                        v
    lisp-string        native (char* (or lpcstr or lptstr))
           ^                        v
            \                      / 
             --<-------<----------<
                native-to-string

The term "mb-vector" is also used by new string conversion routines. An mb-vector is a lisp vector of type (simple-array (unsigned-byte 8) (*)) holding the native representation of a string as 8-bit bytes. It is often useful to manipulate multi-byte strings as vectors within lisp.

The following diagram illustrates how the mb-vector conversion functions relate:

             string-to-mb                    mb-to-native
          >------->------->--             >------->------->--
         /                   \           /                   \
        ^                     v         ^                     v
   lisp-string                 mb-vector                native (e.g., char *)
        ^                     v         ^                     v
         \                   /           \                   /
          --<------<--------<             --<------<--------<
             mb-to-string                    native-to-mb

3.0 Encoding for extended character sets

The standard USA encoding of characters uses 7-bit ASCII encoding. 7-bit encoding allows for 128 different characters, ample for the 52 lower and upper case letters and 76 other standard (*, +, -, @, etc.) and control and special (backspace, linefeed, etc.) characters. 7 bits is inadequate, however, for additional Indo-European alphabets (Greek, Cyrillic), let alone the requirements of Asian languages (such as Japanese, which has several alphabets and thousands of special symbols (kanji) derived from Chinese characters).

Several standards for extended character sets have been proposed and/or are in use. All are concerned with Japanese extensions; some also deal with additional Indo-European alphabets. This concentration on Japanese is not unexpected. First, the problem with Japanese is much more severe than, say, with Russian or Greek (where straightforward transliteration algorithms make translating between ASCII and the native character set bothersome but not impossible). Second, Japan has historically been and remains very important to the computer industry. In this document, we will concentrate on Japanese more than any other non-English language.

The four encodings we will describe are Extended UNIX Code or EUC; JIS; Shift-JIS; and Process code or Fat encoding. International Allegro CL supports fat encoding and a subset of EUC. These encodings support all the important elements of JIS and Shift-JIS although the exact mappings of characters to bits are usually not the same as used by JIS and Shift-JIS.

We describe the four encodings next.

3.1 Extended UNIX Code or EUC

EUC uses the fact that 7-bit ASCII characters are stored in 8-bit bytes with the high bit 0. EUC is a variable length encoding which is a superset of ASCII. All of ASCII is included (with high bit 0). Additional characters (with high bit 1) are also supported. A high bit of 1 indicates to the system that it is reading a non-ASCII EUC character. EUC is divided into four codesets. Here are the representations:

Codeset    EUC Representation                 In IACL

Codeset 0  0xxxxxxx                           yes
Codeset 1  1xxxxxxx                           no
           1xxxxxxx 1xxxxxxx                  yes
           1xxxxxxx 1xxxxxxx 1xxxxxxx         no
Codeset 2  SS2 1xxxxxxx                       yes
           SS2 1xxxxxxx 1xxxxxxx              no
           SS2 1xxxxxxx 1xxxxxxx 1xxxxxxx     no
Codeset 3  SS3 1xxxxxxx                       no
           SS3 1xxxxxxx 1xxxxxxx              yes
           SS3 1xxxxxxx 1xxxxxxx 1xxxxxxx     no

Table 1: EUC encoding

In the remainder of this document, we often abbreviate the codesets as cs0, cs1, cs2, and cs3 (for codeset 0, codeset 1, codeset 2, and codeset 3, respectively). In any variable length representation, the value of the first byte must tell the system if additional bytes need to be read. The values SS2 and SS3 (in hex 0x8E and 0x8F respectively) are such markers, telling the system to read at least one more byte. More information is contained in the second and third (if present) bytes to tell the system whether to read a third or fourth. Those markers are not important to our purpose.

Note that standard 7-bit ASCII is cs0.

The final column of the chart tells whether a particular subset of a codeset can be represented in International Allegro CL. Extended characters are stored in International Allegro CL in fat representation, described below. We will discuss why some codings are included and some left out.
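The first-byte rule described above can be sketched in C. This is only an illustration: the function names euc_codeset and euc_iacl_length are hypothetical and not part of Allegro CL, and the lengths returned are those of the forms marked "yes" in Table 1.

```c
#include <stddef.h>

#define SS2 0x8E  /* single-shift 2: introduces a codeset 2 character */
#define SS3 0x8F  /* single-shift 3: introduces a codeset 3 character */

/* Return the EUC codeset (0-3) implied by the first byte of a character.
   Hypothetical helper, not an Allegro CL function. */
int euc_codeset(unsigned char first)
{
    if ((first & 0x80) == 0) return 0;  /* high bit 0: ASCII (cs0) */
    if (first == SS2) return 2;
    if (first == SS3) return 3;
    return 1;                           /* any other high-bit byte: cs1 */
}

/* Return the total byte length of the EUC form that IACL accepts
   for the codeset selected by the first byte. */
size_t euc_iacl_length(unsigned char first)
{
    switch (euc_codeset(first)) {
    case 0:  return 1;  /* 0xxxxxxx */
    case 1:  return 2;  /* 1xxxxxxx 1xxxxxxx */
    case 2:  return 2;  /* SS2 1xxxxxxx */
    default: return 3;  /* SS3 1xxxxxxx 1xxxxxxx */
    }
}
```

For example, euc_codeset(0x41) is 0 and euc_iacl_length(0x8F) is 3.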

3.2 JIS

JIS stands for "Japanese Industrial Standard". It is the standard encoding of Japanese characters, more or less the Japanese equivalent of ASCII. "JIS" is actually a collective term for more specific standards. The two supported (in somewhat different format) in International Allegro CL are JIS-X0201 and JIS-X0208. JIS-X0201 is a one-byte representation using all 8 bits, encoding half-size Katakana characters. JIS-X0208 is a two-byte representation encoding 6349 Kanji characters, 453 non-Kanji characters, control codes, punctuation, Roman, Greek, and Cyrillic alphabets. In the JIS two-byte representation, the high bit of each byte is set to 0 (i.e. 14 bits of information). In fat representation (described below) JIS-X0208 characters are stored in two bytes with the high bit of each byte set to 1. The remaining bits are the same.
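The relationship between a two-byte JIS-X0208 code and its fat form amounts to setting or clearing the high bit of each byte. A minimal C sketch follows. The function names are hypothetical, and holding a fat character in a 16-bit integer with the first byte as the high-order byte is an assumption made for illustration only.

```c
#include <stdint.h>

/* Convert a two-byte JIS-X0208 code (high bit of each byte 0) to fat
   by setting the high bit of each byte. Hypothetical helper. */
uint16_t jis0208_to_fat(uint8_t b1, uint8_t b2)
{
    return (uint16_t)(((b1 | 0x80) << 8) | (b2 | 0x80));
}

/* Recover the 14 bits of JIS information from a fat JIS-X0208 character
   by clearing the high bit of each byte. */
uint16_t fat_to_jis0208(uint16_t fat)
{
    return (uint16_t)(fat & 0x7F7F);
}
```

For example, the JIS code 0x2422 maps to the fat value 0xA4A2, and back again.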

3.3 Shift-JIS

Shift-JIS was created to work with MS-DOS. The use of the seven low bits of each byte by JIS conflicted with some MS-DOS control codes, so Shift-JIS remaps the codes using the high bit. Use of the high bit conflicts with some Unix control codes. In any case, Shift-JIS is a distinct, one-to-one coding of JIS.

3.4 Process code or Fat

The problem with variable length encodings for use within Lisp is that traversing a string is computationally expensive since each character must be examined in order to know where the next character starts. In fixed length encoding, you always know the nth character starts at the (* (- n 1) len) byte (where len is the number of bytes per character). Therefore, the coding of characters internal to Lisp must have fixed width. We have chosen a two-byte representation which permits encoding a subset of the EUC encoding. The subset chosen includes all important character sets including Roman, Greek, and Cyrillic alphabets, JIS-X0208 characters (6349 Kanji and 453 non-Kanji Japanese characters, including Hiragana and full-size Katakana), JIS-X0201 characters (half-size Katakana), as well as control codes and punctuation.
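The cost difference between the two kinds of encoding can be sketched in C. Both function names are hypothetical. The EUC walker handles only the forms IACL accepts and assumes the buffer holds at least n well-formed characters.

```c
#include <stddef.h>

/* Fixed width: byte offset of the nth character (counting from 1),
   computed directly as (n - 1) * len with no scanning. */
size_t fixed_offset(size_t n, size_t len)
{
    return (n - 1) * len;
}

/* Variable width (EUC): the offset of the nth character can only be
   found by examining the first byte of every preceding character.
   Assumes s contains at least n well-formed characters. */
size_t euc_offset(const unsigned char *s, size_t n)
{
    size_t off = 0;
    for (size_t i = 1; i < n; i++) {
        unsigned char b = s[off];
        if ((b & 0x80) == 0)
            off += 1;          /* cs0: one byte */
        else if (b == 0x8F)
            off += 3;          /* cs3: SS3 plus two bytes */
        else
            off += 2;          /* cs1 (two bytes) or cs2 (SS2 plus one) */
    }
    return off;
}
```

The fixed-width lookup is a single multiplication, while the EUC lookup takes time proportional to n.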

This encoding is used by the JLE version of SunOS, where it is called Process Code. That name does not seem to be standard, however. We have adopted a shorter name, fat (since the representation takes two bytes rather than one), which we use in the remainder of this document. The encoding is as follows. The codesets correspond to the EUC codesets given above.

Codeset  EUC Representation     Fat Representation   Implementation

CS0      0xxxxxxx               00000000 0xxxxxxx    ASCII
CS1      1xxxxxxx 1xxxxxxx      1xxxxxxx 1xxxxxxx    JIS-X0208
CS2      SS2 1xxxxxxx           00000000 1xxxxxxx    JIS-X0201
CS3      SS3 1xxxxxxx 1xxxxxxx  1xxxxxxx 0xxxxxxx    User-defined

Table 2: EUC and Fat equivalents
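Read from right to left, Table 2 gives the rule for writing a fat character as EUC. The following C sketch illustrates that rule. The name fat_to_euc is hypothetical (it is not the Lisp-level conversion routine), a fat character is assumed to be held in a 16-bit integer with the first byte of the table as the high-order byte, and no validity checking is done.

```c
#include <stddef.h>
#include <stdint.h>

#define SS2 0x8E
#define SS3 0x8F

/* Encode one fat character as EUC following Table 2.
   out must have room for 3 bytes; returns the number of bytes written.
   Hypothetical helper for illustration only. */
size_t fat_to_euc(uint16_t fat, unsigned char *out)
{
    unsigned char hi = (unsigned char)(fat >> 8);
    unsigned char lo = (unsigned char)(fat & 0xFF);

    if (hi == 0 && (lo & 0x80) == 0) {   /* CS0: 00000000 0xxxxxxx */
        out[0] = lo;
        return 1;
    }
    if (hi == 0) {                       /* CS2: 00000000 1xxxxxxx */
        out[0] = SS2;
        out[1] = lo;
        return 2;
    }
    if (lo & 0x80) {                     /* CS1: 1xxxxxxx 1xxxxxxx */
        out[0] = hi;
        out[1] = lo;
        return 2;
    }
    out[0] = SS3;                        /* CS3: 1xxxxxxx 0xxxxxxx */
    out[1] = hi;
    out[2] = (unsigned char)(lo | 0x80); /* restore the high bit for EUC */
    return 3;
}
```

The four branches correspond one-to-one to the four rows of the table.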

In release 5.0 of International Allegro CL, only standard ASCII files or files coded in EUC or fat representation can be read and written. Further, EUC files can only contain characters which have a fat representation. If you have a file in JIS or Shift-JIS representation, you must convert it to EUC before trying to read it into International Allegro CL. The operating system may provide tools for such conversion.

Nemacs includes a standalone program called kconv which can be used to convert from JIS and Shift-JIS to EUC. We will not discuss encodings other than fat and EUC further in this document. We assume that your files are in EUC or fat representation.

There are other encodings proposed or existing at this time. One is the Unicode representation (two bytes, all bits used). These representations are not supported in International Allegro CL at this time.

4.0 Changes to standard Lisp caused by fat character representation

In this section, we describe the changes to standard Common Lisp functions required to support an extended character set. Changes for foreign functions are described in the next section. Translating strings already in Allegro CL is described in the section after that.

The main visible differences between International Allegro CL and standard Allegro CL occur when reading and writing files. Secondarily, we define the action of standard character functions (e.g. char-upcase) on characters in the extended set.

4.1 Reading and writing files in International Allegro CL

For the most part, fasl files are compatible between standard and International Allegro CL. It is reasonable for the two kinds of Lisp images to share common fasl files because in practice few functions will actually need to be compiled differently. These are functions that do explicit inlined manipulation of character objects (e.g. char-code and friends) or which use inlined functions like schar to extract characters from strings. These functions are statistically rare. When compile-file encounters one, it compiles the function twice, once for each kind of image. When the resulting fasl file is loaded, the correct one is automatically selected. When compile encounters such a function, it compiles only the version appropriate to the running image.

Note that it is not possible to pass data between ICS and non-ICS images using the fasl-write/fasl-read functions (links are to separate description pages). fasl-write only understands the mode of the current executing Lisp and always writes in that format.

The file compiler can be controlled. compile-file takes a new keyword argument ics-modes whose value should be a list of one or two of the keywords :+ics and :-ics which enable compilation for that mode. The default value is taken from the variable comp:*ics-modes* whose initial value is (:-ics :+ics). In our opinion there is little reason to limit to one or another mode unless you are very sure no one will ever want to use your code in the other mode. The savings in fasl file size are small for typical Lisp code.

Sometimes it is necessary to conditionalize your code depending on whether it will be running on an ICS-capable system. A new special form excl:ics-target-case (link is to separate description page) provides this conditionalization, causing the compiler to compile the containing form in both modes. It can be used both at top level of a file compilation and inside a function body. Simple examples:

(excl:ics-target-case
    (:-ics (defvar banner "Regular Allegro"))
    (:+ics (defvar banner "ICS-Capable Allegro")))
 
(defun string-size-in-bytes (string)
  (* 8
     (ceiling (+ 5
                 (excl:ics-target-case
                   (:-ics (length string))
                   (:+ics (* 2 (length string)))))
              8)))

Two alternate versions of both the defvar and the function will be file-compiled, and the correct version selected at fasl load time. In each of the examples, the ics-target-case form could have been wrapped around a different boundary in the code with equivalent effect, for example:

(defvar banner (excl:ics-target-case
   (:-ics "Regular Allegro")
   (:+ics "ICS-Capable Allegro")))

Each of the clauses has the syntax of a progn body, returning the value(s) of the last body form. It is allowed to omit one or the other clause of an excl:ics-target-case, for example if a function needs to be defined in only one kind of Lisp.

(excl:ics-target-case
   (:+ics (defun kanji-radical (char) ...)))

The multiple-compilation behavior of excl:ics-target-case occurs only when it is processed during a file compilation. When encountered by compile or by the interpreter, the excl:ics-target-case special form simply selects for the mode of the currently running Lisp. The indented paragraph below describes an obscure issue of language semantics relating to excl:ics-target-case, particularly concerning code walkers. If you don't understand it, you can probably safely ignore it.

The multiple-compilation behavior of excl:ics-target-case involves complex interaction with the file compiler. excl:ics-target-case is therefore implemented as a special form. A macro definition exists for the benefit of macroexpand and other code walkers, but that macro only provides the non-file-compilation behavior that simply selects for the current Lisp type. This means that if you have your own code walkers that operate at file compilation time, they probably won't process excl:ics-target-case correctly. In the event this should affect you, the simplest thing would be to separate any excl:ics-target-case forms from forms that you need to walk. If this isn't practical, contact Franz technical support and it may be possible to extend your walker to handle the special form.

An ICS Lisp has :ics on its *features* list and reader conditionalization using #+ and #- does work. However, it is the intention that most application code will be conditionalized using excl:ics-target-case. This indeed is the way Allegro CL itself is now written. The advantage to Franz Inc. is that only a single set of fasl files need be maintained, and a single version of a patch file suffices for both ICS and non-ICS Lisps. Those who write applications on top of Allegro obtain the same benefits automatically. Since Allegro CL fasl files contain native machine code, fasl files must be differentiated for each of the several families of processors. The compatible ICS fasl file design avoids the need to double this number.

4.1.2 Stream external formats and the open function

In previous releases of International Allegro CL, character file streams supported only EUC and 16-bit process-code external formats. As a result, 8-bit extensions to standard 7-bit ASCII (used in many European language character sets) could not be supported since they collided with EUC multi-byte escape sequences.

In International Allegro CL character streams will support several external formats, controlled by the external-format keyword argument to open and friends. This argument existed in earlier releases, but starting in International Allegro CL 4.3, more values are supported giving more control over how characters are read from files.

[Function]
open

Arguments: filename &key external-format [other keyword args]

Package: common-lisp

The ANSI standard for Common Lisp added the external-format keyword argument for open. It is implemented in standard Allegro CL, but that version only accepts the value :default. The argument can have more values in International Allegro CL, as we describe here. Other features of open are the same as in standard Allegro CL and will not be discussed in this document.

In International Allegro CL, the external-format argument can take any of the following four values:

:default: Use the value of excl:*default-external-format*. This is the default if no value is specified for this argument.

:euc: This says that the file connected to the stream is in EUC format. Note that 7-bit ASCII is a subset of EUC so all standard ASCII files will be read exactly as they are in standard Allegro CL. The only requirement on non-ASCII characters is that they have a fat representation. Note that the reader acts blindly, keying off the first byte and the high bit of the first byte. If the high bit is 0, the reader assumes an ASCII (cs0) character and reads the remainder of the byte only. If the high bit is 1, the reader checks if the first byte is SS2 or SS3 (hex values, recall, of 0x8E and 0x8F). If the first byte is SS2, one more byte is read. If it is SS3, two more bytes are read. If the first byte is neither SS2 nor SS3, the character is assumed to be in cs1 and one more byte is read. The reader converts the characters to fat format as they are read. No checking is done to ensure that the fat format is valid. When writing a file, fat characters are converted to EUC format. This is the initial value of excl:*default-external-format*, so it is the format used when external-format is unspecified or :default and that variable has not been changed.

:fat-string:
:process-code: These two equivalent values assume the file is in fat format. The reader reads two bytes at a time, creating characters without decoding or translation. When writing, two bytes are written for each character.

:ascii:
:8-bit: These two equivalent values assume the file uses one byte per character. This is equivalent to character streams in non-International Allegro CL. Note that 8-bit characters with the high bit 1 (used by many European language character sets to represent characters with accents and diacritical marks and for other purposes) will not be interpreted as fat (multi-byte) characters when read in this mode.
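The reading rule given under :euc above can be sketched in C as a loop over a byte buffer. This is not the Allegro CL reader: the name euc_read is hypothetical, fat characters are assumed to be held in 16-bit integers with the first byte as the high-order byte, and dropping an incomplete character at the end of the input is a choice made for this sketch only.

```c
#include <stddef.h>
#include <stdint.h>

#define SS2 0x8E
#define SS3 0x8F

/* Decode n bytes of EUC into fat characters, keying off the first byte
   of each character as described for the :euc external format.
   out must have room for n entries (the worst case is all ASCII).
   Returns the number of characters produced. No validity checking. */
size_t euc_read(const unsigned char *in, size_t n, uint16_t *out)
{
    size_t i = 0, count = 0;

    while (i < n) {
        unsigned char b = in[i];
        size_t need = ((b & 0x80) == 0) ? 1 : (b == SS3 ? 3 : 2);

        if (i + need > n)
            break;                       /* incomplete trailing character */

        if (need == 1)
            out[count] = b;                                   /* cs0 */
        else if (b == SS2)
            out[count] = in[i + 1];                           /* cs2 */
        else if (b == SS3)
            out[count] = (uint16_t)((in[i + 1] << 8)
                                    | (in[i + 2] & 0x7F));    /* cs3 */
        else
            out[count] = (uint16_t)((b << 8) | in[i + 1]);    /* cs1 */

        count++;
        i += need;
    }
    return count;
}
```

An 8-byte input holding one character from each codeset yields four fat characters.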

You can get the International Allegro CL 5.0 behavior by setting excl:*default-external-format* to :euc, which is the default value established by build-lisp-image. A site that wants to build an ICS system that is capable of running, say, in Japan, but which wants to develop in Europe, might override this to :ascii. All that is necessary to run the delivered system under EUC (which would allow EUC text to be read and printed) is to execute something like

(setf excl:*default-external-format*
      (setf (stream-external-format *terminal-io*) :euc))

at startup time, perhaps in sys:siteinit.cl or .clinit.cl, or else before executing excl:dumplisp to create the application.

[Function]
load

Arguments: filename &key external-format [other keyword args]

Package: common-lisp

This function also takes an external-format keyword argument. Possible values are the same as for open, described above. The default is :default, which means use the value of excl:*default-external-format*.

The function file-string-length can be used to assist in determining how many bytes are required for a string or character in a file. The formal definition of this function is:

[Function]
file-string-length

Arguments: file-stream object

Package: common-lisp

Returns the number of bytes needed to output a string or a character which is the value of object to the stream file-stream. The external format of file-stream is used to make the calculation. (If the external format is :process-code or :fat-string, the result is twice the number of characters in object.) object must be a character or string; file-stream must be a stream connected to a file.
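The calculation can be illustrated with a C sketch for the EUC and process-code formats. This is not the implementation of file-string-length: the enum, the function name, and the 16-bit representation of fat characters (first byte as the high-order byte) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the :euc and :process-code external formats. */
enum ext_format { FMT_EUC, FMT_PROCESS_CODE };

/* Number of bytes needed to write nchars fat characters in the given format. */
size_t fat_file_length(const uint16_t *s, size_t nchars, enum ext_format fmt)
{
    size_t bytes = 0;

    if (fmt == FMT_PROCESS_CODE)
        return 2 * nchars;               /* always two bytes per character */

    for (size_t i = 0; i < nchars; i++) {
        switch (s[i] & 0x8080) {         /* high bits of the two fat bytes */
        case 0x0000: bytes += 1; break;  /* cs0: one ASCII byte */
        case 0x8000: bytes += 3; break;  /* cs3: SS3 plus two bytes */
        default:     bytes += 2; break;  /* cs1, or cs2 as SS2 plus one byte */
        }
    }
    return bytes;
}
```

A string with one character from each codeset needs 8 bytes in either format, while a pure ASCII string needs half as many bytes in EUC as in process code.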

4.2 Character functions applied to extended characters

All character functions behave on ASCII characters just as they do in standard Allegro CL, so characters in Codeset 0 have exactly the same behavior. When applied to characters in codesets other than 0, the following standard Lisp functions behave as follows:

upper-case-p returns nil.
lower-case-p returns nil.
both-case-p returns nil.
alphanumericp returns nil.
alpha-char-p returns nil.
char-upcase returns a non-Codeset-0 character unchanged.
char-downcase returns a non-Codeset-0 character unchanged.
char-int returns a unique integer code for every character. When applied to an ASCII (cs0) character, the same value is returned as in standard Allegro CL.
char-name returns nil when applied to any non-ASCII character.
int-char is the inverse of char-int (see just above).
graphic-char-p returns true on all non-ASCII characters.
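The case rules in the list above can be mirrored in a small C sketch. The function names are hypothetical analogues of char-upcase and upper-case-p, and fat characters are assumed to be held in 16-bit integers with cs0 characters below 0x80.

```c
#include <stdint.h>

/* Analogue of char-upcase: only cs0 (ASCII) letters have case;
   every character outside cs0 is returned unchanged. */
uint16_t fat_char_upcase(uint16_t c)
{
    if (c >= 'a' && c <= 'z')
        return (uint16_t)(c - 'a' + 'A');
    return c;
}

/* Analogue of upper-case-p: false for every character outside cs0. */
int fat_upper_case_p(uint16_t c)
{
    return c >= 'A' && c <= 'Z';
}
```

For example, fat_char_upcase applied to the cs1 code 0xA4A2 returns 0xA4A2 unchanged, and fat_upper_case_p returns 0 for it.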

Pre-ANSI Common Lisp specified that characters may have font and bits information associated with them. While these have been removed from the standard, they are still supported in Allegro CL (symbols naming associated functions are in the cltl1 package). These attributes are supported for non-ASCII (non-cs0) characters in both International and non-International Allegro CL.

The type hierarchy has also been changed. The following types have been added (the list of types following each bullet are equivalent). All are in the excl package.

The character hierarchy looks as follows:

[Figure: ics-classes.jpg, the character type hierarchy]

Everything below string-char has 0 bits and font attributes. Standard Common Lisp follows the left fork, with the base-character and ascii types added between string-char and standard-char.

5.0 Foreign functions

The foreign function interface in International Allegro CL works just like that in standard Allegro CL except when you want to pass strings between Lisp and foreign code. The foreign function may expect string arguments to be in fat format or in EUC format. You (presumably) know which is required by the foreign code you load into Lisp. We have provided a number of functions which can convert between the different representations.

Before we formally define the functions, let us consider some examples.

5.1 Example 1: Foreign function expects EUC string

Here is a simple C function which expects an EUC string and prints it out.

#include <stdio.h>

/* Print a null-terminated EUC string passed in from Lisp. */
void pcharst(char *st)
{
    printf("st=%s\n", st);
}

If we are to call this function from within Lisp, we must convert from fat format (used internally by Lisp) to EUC format. Here are two different solutions using similar def-foreign-call forms but different functions (excl:string-to-mb and excl:mb-to-native) for extracting the string.

Method 1: We can convert the string we want to pass to C into EUC format and store the result in an (unsigned-byte 8) array. That is what the function excl:string-to-mb does.

USER(12): (ff:def-foreign-call pcharst ((st (* :char)))
  :returning :void)
t
USER(13): (pcharst (excl:string-to-mb "Nihongo Allegro CL"))
st=Nihongo Allegro CL
0
USER(14):

Method 2: We can instead pass a pointer to a (char *) created by Lisp. The (char *) array is not a Lisp array and is stored in space created by malloc for storing non-Lisp objects. The pointer to that array is an integer so Lisp passes that integer to C.

USER(15): (ff:def-foreign-call pcharst ((st (* :char))) :returning :void)
t
USER(16): (pcharst (excl:mb-to-native
            (excl:string-to-mb "Nihongo Allegro CL")))
st=Nihongo Allegro CL
0
USER(17):

5.2 Example 2: Foreign function expects Process-Code string

In this example, the C function expects a string in fat format. It prints the string out to the operating system standard output.

#include <stdio.h>

typedef unsigned short wchar;

/* Print a null-terminated fat (two bytes per character) string. */
void pwcharst(wchar *wst)
{
    while (*wst) {
        printf("%c", *wst++);
    }
    printf("\n");
}

Again, we propose two different solutions.

Method 1: Since strings are stored internally in Lisp in fat format, we can declare the argument to the C function as a string in the call to def-foreign-call.

USER(20): (ff:def-foreign-call pwcharst ((wst (* :short))) :returning :void)
t 
USER(21): (pwcharst "Charley Cox")
Charley Cox
nil
USER(22):

Method 2: Alternatively, we can pass an integer which is a pointer to the first 2-byte character of the string by using the function string-to-wchar*.

USER(23): (ff:def-foreign-call pwcharst ((wst (* :short))) :returning :void)
t 
USER(24): (pwcharst (ff:string-to-wchar* "Charley Cox"))
Charley Cox
nil 
USER(25):

5.3 Functions that support passing strings to and from foreign code

Given these examples, we now describe the new conversion functions for International Allegro CL. Most functions discussed in this section are in the foreign-functions package (nicknamed ff). Others are in the excl package.

[Function]
euc-to-char*

Arguments: eucvector &optional address

Package: foreign-functions

ff:euc-to-char* should be considered obsolete. New code should use excl:mb-to-native. See also excl:with-native-string. Old description: Converts a Lisp (simple-array (unsigned-byte 8) 1) of EUC characters to a C string by copying. If address is specified, then that address is used. Otherwise, the system malloc (memory allocator) is used to make space for the target (char *) string.

[Function]
char*-to-euc

Arguments: address

Package: foreign-functions

ff:char*-to-euc should be considered obsolete. New code should use excl:native-to-mb. Old description: Converts a C-style (char *) string (which must be null-terminated) to a Lisp (simple-array (unsigned-byte 8) (*)) array. The resulting array consists of the characters from the source string in EUC code.

[Function]
wchar*-to-string-length

Arguments: address

Package: foreign-functions

Computes and returns the length of the C style (null-terminated) fat (2-bytes per character) string by looking for the null terminator.
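On the C side, the same length computation is a scan for a 16-bit zero. A minimal sketch follows, using the wchar typedef from Example 2. The function name wchar_string_length is hypothetical.

```c
#include <stddef.h>

typedef unsigned short wchar;

/* Count the 2-byte characters that precede the null terminator
   of a fat (process-code) string. Hypothetical helper. */
size_t wchar_string_length(const wchar *wst)
{
    size_t n = 0;
    while (wst[n] != 0)
        n++;
    return n;
}
```

Note that the terminator is a full 16-bit zero, so a single zero byte inside a character (as in every cs0 character) does not end the string.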

[Function]
wchar*-to-string

Arguments: address

Package: foreign-functions

Converts a C Process-Code string to a Lisp string by copying the characters starting at address, which must be an integer.

[Function]
string-to-wchar*

Arguments: string &optional address

Package: foreign-functions

ff:string-to-wchar* has been changed to return the following:

(excl:string-to-native string :address address :external-format :16-bit)

Users are encouraged to use excl:string-to-native instead of ff:string-to-wchar* for new code. See also excl:with-native-string. Old description: Converts a Lisp string to a C style fat format string by copying. If address is specified, then that address is used. Otherwise, the system malloc (memory allocator) is used to make space for the target character array.

[Function]
string-to-euc

Arguments: string &key null-terminate

Package: excl

excl:string-to-euc has been changed to return the following:

(string-to-mb string :null-terminate null-terminate :external-format :euc)

Users are encouraged to use excl:string-to-mb instead of excl:string-to-euc for new code. See also excl:with-native-string. Old description: Creates a Lisp (simple-array (unsigned-byte 8) (*)) array containing the EUC character translations for the characters in string. If the value of the :null-terminate keyword argument is t (the default), then a null character is placed at the end of the result array. (When passed to C, of course, the Lisp array looks like a C string containing EUC characters.)

[Function]
euc-to-string

Arguments: eucvector &key drop-last-null

Package: excl

excl:euc-to-string has been changed to return the following:

(mb-to-string eucvector :end (if* drop-last-null
                then (position 0 eucvector)
                else (length eucvector))
        :external-format :euc)

Users are encouraged to use excl:mb-to-string instead of excl:euc-to-string for new code. Old description: Converts an EUC vector (which must have type (unsigned-byte 8 (*))) to a Lisp string. If the value of the :drop-last-null keyword argument is t (the default), then if the last character in eucvector is a null character, it is not included in the Lisp string.
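The rule used to compute the :end argument in the form above can be restated as a small C function. The name euc_end_index is hypothetical. The sketch also assumes that when no zero byte is present the whole vector is converted; the form above would pass nil in that case, and treating nil as "end of vector" is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the :end expression: with
   drop_last_null set, conversion stops at the first zero byte;
   otherwise the whole vector is converted. */
size_t euc_end_index(const unsigned char *v, size_t len, int drop_last_null)
{
    assert(v != NULL || len == 0);
    if (drop_last_null) {
        for (size_t i = 0; i < len; i++) {
            if (v[i] == 0) return i;
        }
    }
    return len;  /* no zero byte found, or nulls are kept */
}
```

Because the search finds the first zero byte rather than only a trailing one, any data after an embedded null is not converted when the option is on.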

To use any of the functions in the foreign-functions package, you may have to load foreign.fasl into Lisp. You do this by evaluating the form (require :foreign). You need not load that module to use excl:string-to-euc or excl:euc-to-string. Foreign functions are described in foreign_functions.htm, but all the information on their interaction with non-standard character sets is in this document.

Because strings have a different representation in International Allegro CL than in standard Allegro CL, some examples in foreign_functions.htm will not work in International Allegro CL. Hence the following warning:

Warning: The mechanisms described in 6.0 Passing strings between Lisp and C in foreign_functions.htm for passing arrays of strings from Lisp to C and for passing a string from C to Lisp do not work with International Allegro CL.

6.0 Miscellaneous and known problems

Many terminal drivers strip off (force to be 0) the high bit of every byte of input before passing it on to the application expecting the input. Since most characters on the keyboard are ASCII characters anyway, this does not affect input typed directly to Lisp. It can affect you, however, if you are using an input tool that supports EUC (e.g., kterm, mule, nemacs). In that case, your input (from, e.g., a Japanese keyboard or an input tool that passes 8 bits) will be incorrect if the high bit of every byte is stripped.

When International Allegro CL is invoked, it checks to see if the input terminal driver is allowing all 8 bits of bytes to be passed as input. If the terminal driver is stripping the high bit and only passing 7 bits, you will see the following at International Allegro CL startup:

% cli
Allegro CL 5.0.1 [SPARC; R1] with EUC/Japanese (8/12/98 3:19)
Copyright (C) 1985-1999, Franz Inc., Berkeley, CA, USA. All Rights Reserved.
Warning: Terminal driver is stripping 8th bit from input. To accept 8-bit
input, run the lisp function (set-8-bit-input). This is equivalent
to executing `stty -istrip' from the shell.
;; Optimization settings: safety 1, space 1, speed 1, debug 2
;; For a complete description of all compiler switches given the current
;; optimization settings evaluate (explain-compiler-settings).
user(1):

If you see this message, you can choose to ignore it, in which case this Allegro CL session will not be able to properly read EUC characters typed or piped directly to it; otherwise, Lisp will behave normally. If this is a problem for you, you can call the following function, which changes your terminal environment so that the high bit is no longer stripped. The function is in the excl package.

[Function]
set-8-bit-input

Arguments:

Package: excl

Performs the UNIX ioctl that tells the terminal driver not to strip off the 8th bit from data input. This is equivalent to executing stty -istrip at the shell. Note that this function has a side effect on the terminal driver that remains in effect after the Lisp process has exited. To re-enable 8th-bit stripping after Lisp has exited, execute stty istrip at the shell.

6.1.1 Known problems

7.0 Installation

The excl:build-lisp-image function (used to build Allegro CL images) builds an IACL image if the running image (the one from which excl:build-lisp-image was called) is an IACL image. There is no way to build a non-IACL image from an IACL image and no way to build an IACL image from a non-IACL image.

Appendix A: List of changed or added functions, etc.

This appendix contains a listing of functions, macros, variables, etc. in International Allegro CL which are either not in standard Allegro CL or which are changed in some way from standard Allegro CL.

We give only brief descriptions of the functions here. The complete descriptions are in the main part of this document. We provide a reference to the documentation in the main part.

We list symbols in alphabetical order ignoring the package qualifier. A package qualifier is supplied with each symbol. The packages are excl, ff (foreign-functions), and cl (common-lisp).

If the name has a link, it is to the fuller description in this document above unless otherwise indicated.

Table 3: Information on changed or added functions, etc.
Name Arguments (if applicable) Notes
cl:alphanumericp [Function] char Accepts an extended character argument and returns nil. See 4.2 Character functions applied to extended characters for more information.
cl:alpha-char-p [Function] char Accepts an extended character argument and returns nil. See 4.2 Character functions applied to extended characters for more information.
excl:ascii [Character Type] Character type. See 4.2 Character functions applied to extended characters for more information.
cl:both-case-p [Function] char Accepts an extended character argument and returns nil. See 4.2 Character functions applied to extended characters for more information.
cl:char-code-limit [Constant] nil The character code size supported by IACL is 16 bits, giving a value for this constant of 2^16 or 65536. In a non-International ACL image, the char code size is 8 bits, meaning that the largest value char-code can ever return is 255. However, since both kinds of images can compile code files loadable by the other, and since the language requires that the value of a constant not change between compile time and load time, starting with release 4.3 the value of the char-code-limit constant is 65536 in each kind of image. The ANSI language standard specifically allows char-code-limit to be larger than the maximum actual value the implementation may support. See the description page or the entry below in this table for excl:real-char-code-limit, which is a variable whose value is the actual limit in the running image.
cl:char-downcase [Function] char Accepts and returns unchanged extended character arguments. See 4.2 Character functions applied to extended characters for more information.
cl:char-int [Function] char Returns a unique integer code for each character. The codes for ASCII (cs0) characters are the same in both an ICS and a non-ICS Lisp. See 4.2 Character functions applied to extended characters for more information.
cl:char-name [Function] char Accepts an extended character argument and returns nil. See 4.2 Character functions applied to extended characters for more information.
cl:char-upcase [Function] char Accepts and returns unchanged extended character arguments. See 4.2 Character functions applied to extended characters for more information.
ff:char*-to-euc [Function] address This function is obsolete. New code should use excl:native-to-mb (link to a separate document). The old description: Converts a C-style (char *) string to a Lisp (unsigned-byte 8 (*)) array. See 5.0 Foreign functions for more information.
excl:codeset-0 [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:codeset-1 [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:codeset-2 [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:codeset-3 [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:*default-external-format* (link is to separate description page) [Variable] This variable supplies the default value for the external-format argument to open, load, and related functions. The allowable values are :default, :euc, :fat-string and :process-code, :ascii and :8-bit. See the discussion of open in 4.1 Reading and writing files in International Allegro CL for the meaning of these values.
cl:digit-char-p [Function] char Accepts an extended character argument and returns nil. See 4.2 Character functions applied to extended characters for more information.
ff:euc-to-char* [Function] eucvector &optional address This function is obsolete. New code should use excl:mb-to-native (separate document). See also excl:with-native-string. Old description: Converts a Lisp vector of EUC characters to a C string by copying. See 5.0 Foreign functions for more information.
excl:euc-to-string [Function] eucvector &key drop-last-null excl:euc-to-string has been changed to return the following:

(mb-to-string eucvector :end (if* drop-last-null
                then (position 0 eucvector)
                else (length eucvector))
        :external-format :euc)

Users are encouraged to use excl:mb-to-string instead of excl:euc-to-string for new code. Old description: Converts an EUC vector to a Lisp string. The default value of the :drop-last-null keyword argument is t. See 5.0 Foreign functions for more information.

excl:native-to-string [Function] address &key string make-string? length (external-format :default) This function converts (according to the :external-format argument) and copies the string data from the memory location specified by address into a lisp string. The string is returned. The number of characters copied to the string is returned as the second value.
excl:mb-to-string [Function] mb-vector &key string make-string? (start 0) (end (or (position 0 mb-vector :start start) (length mb-vector))) (external-format :default) This function converts (according to the :external-format argument) and copies the string data from the subsequence of mb-vector denoted by the :start and :end arguments into a lisp string. The string is returned. The number of characters copied to the string is returned as the second value.
excl:file-string-length [Function] file-stream object Returns the number of bytes needed to output a string or a character which is the value of object to the stream file-stream. See  4.1 Reading and writing files in International Allegro CL for more information.
excl:gaiji [Character Type] See 4.2 Character functions applied to extended characters for more information.
cl:graphic-char-p [Function] char See 4.2 Character functions applied to extended characters for more information.
excl:half-sized-kana [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:half-size-kana [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:half-sized-katakana [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:half-size-katakana [Character Type] See 4.2 Character functions applied to extended characters for more information.
excl:ics-target-case (link is to separate description page) [Special operator] &rest clauses See 4.1 Reading and writing files in International Allegro CL for more information on this special operator. Each clause is a list that starts with either :+ics or :-ics followed by one or more forms (which are evaluated as if in a progn if the keyword at the start of the clause applies to the running Lisp).
cl:int-char [Function] int The inverse of cl:char-int. See 4.2 Character functions applied to extended characters for more information.
cl:load [Function] filename &key external-format [and other keyword args] As with open, the external-format argument can be used to allow loading of source files containing characters of various extended formats. See 4.1 Reading and writing files in International Allegro CL for more information.
cl:lower-case-p   [Function] char Accepts and returns nil when passed extended character arguments. See 4.2 Character functions applied to extended characters for more information.
cl:open  [Function] filename &key external-format [and other keyword args] This function accepts the :external-format keyword argument. The value of that argument can be :default, :ascii, :8-bit, :euc, :fat or :process-code. The changes to open are described in 4.1 Reading and writing files in International Allegro CL.
excl:real-char-code-limit (link is to separate description page) [Variable] The value of this variable is the actual char code limit of the running Lisp image, one more than the largest char code supported. The value is 256 in a non-IACL image and 65536 in an IACL image. The value is never changed by the implementation, but the semantics of the language do not permit this to be a constant. See char-code-limit above in this table for further information.
excl:set-8-bit-input [Function] nil Performs the UNIX ioctl that tells the terminal driver not to strip off the 8th bit from data input. See 6.0 Miscellaneous and known problems for more information.
excl:string-to-euc  [Function] string &key null-terminate excl:string-to-euc has been changed to return the following:

(string-to-mb string :null-terminate null-terminate :external-format :euc)

Users are encouraged to use excl:string-to-mb instead of excl:string-to-euc for new code. See also excl:with-native-string. Old description: Creates a Lisp (unsigned-byte 8 (*)) array containing the EUC character translations for the characters in string. The default value of the :null-terminate keyword argument is t. See 5.0 Foreign functions for more information.

excl:string-to-native [Function] string &key (start 0) (end (length string)) address (external-format :default) This function converts (according to the :external-format argument) and copies the string data from indices specified by :start to :end out of the Lisp string into static (i.e., non-Lisp heap) memory and returns an address to the first character of that data.
excl:string-to-mb [Function] string &key (null-terminate t)  (start 0) (end (length string)) mb-vector make-mb-vector? (external-format :default) This function converts (according to the :external-format argument) and copies the string data from indices specified by :start to :end out of the lisp string into a lisp array of type (simple-array (unsigned-byte 8) (*)). This array is returned.
excl:mb-to-native [Function] vector8 &key address (length (position 0 vector8)) This function copies the 8-bit bytes from vector8, a (simple-array (unsigned-byte 8) (*)) array, into static (i.e., non-Lisp heap) memory and returns an address to the first character of that data.
excl:native-to-mb [Function] address &key vector length This function copies 8-bit byte data from the memory location specified by address into a lisp vector of type (simple-array (unsigned-byte 8) (*)). This vector is returned.
ff:string-to-wchar*  [Function] string &optional address ff:string-to-wchar* has been changed to return the following:

(excl:string-to-native string :address address :external-format :16-bit)

Users are encouraged to use excl:string-to-native instead of ff:string-to-wchar* for new code. See also excl:with-native-string. Old description: Converts a Lisp string to a C style fat format string by copying. See 5.0 Foreign functions for more information.

cl:upper-case-p [Function] char Accepts and returns nil when passed extended character arguments. See 4.2 Character functions applied to extended characters for more information.
ff:wchar*-to-string [Function] address Converts a C Process-Code string to a Lisp string by copying the characters starting at address. See 5.0 Foreign functions for more information.
ff:wchar*-to-string-length [Function] address Computes and returns the length of the C style (null-terminated) fat (2-bytes per character) string by looking for the null terminator. See 5.0 Foreign functions for more information.
excl:with-native-string [Macro] (string-var string-exp &key (start 0) end native-length-var (external-format :default)) &body body This macro provides an efficient, portable, and non-garbage (from the lisp garbage collector's point of view) way of converting lisp-strings to addresses acceptable for foreign functions expecting native string arguments.

Index

A

Allegro Common Windows (not supported with IACL 5.0) Section 2.1
Allegro Composer (not supported with IACL 5.0) Section 2.1
alpha-char-p (function, common-lisp package) Section 4.2, Table 3
alphanumericp (function, common-lisp package) Section 4.2, Table 3
ascii (character type, excl package) Section 4.2, Table 3

B

both-case-p (function, common-lisp package) Section 4.2, Table 3

C

char*-to-euc (function, foreign-functions package) Table 3
character sets Section 3
character type hierarchy in IACL Section 4.2
char-code-limit (constant, common-lisp package) Table 3
char-downcase (function, common-lisp package) Section 4.2, Table 3
char-int (function, common-lisp package) Section 4.2, Table 3
char-name (function, common-lisp package) Section 4.2, Table 3
char-upcase (function, common-lisp package) Section 4.2, Table 3
codeset-0 (character type, excl package) Section 4.2, Table 3
codeset-1 (character type, excl package) Section 4.2, Table 3
codeset-2 (character type, excl package) Section 4.2, Table 3
codeset-3 (character type, excl package) Section 4.2, Table 3
codesets 1, 2, and 3 Table 1
cs0 (codeset 0) Table 2
cs1 (codeset 1) Table 2
cs2 (codeset 2) Table 2
cs3 (codeset 3) Table 2

D

*default-external-format* (variable, excl package) Table 3
digit-char-p (function, common-lisp package) Section 4.2, Table 3

E

EUC (Extended UNIX Code character set) Section 3.1
euc-to-char* (function, foreign-functions package) Table 3
euc-to-string (function, excl package) Table 3
Extended character support (in IACL) Section 3
Extended character type hierarchy (in IACL) Section 4.2
Extended UNIX Code (character set) Section 3.1

F

fasl files, in IACL Section 2.1
fasl files, sharing between International and standard images Section 2.1
fasl-read (function, excl package) Section 4.1
fasl-write (function, excl package) Section 4.1
Fat (character set) Section 3.4
files (source and text) Section 2.1
file-string-length (function, excl package) Section 4.2, Table 3
Foreign functions (in IACL) Section 5

G

gaiji (character type, excl package) Section 4.2, Table 3
graphic-char-p (function, common-lisp package) Section 4.2, Table 3

H

half-sized-kana (character type, excl package) Section 4.2, Table 3
half-sized-katakana (character type, excl package) Section 4.2, Table 3
half-size-kana (character type, excl package) Section 4.2, Table 3
half-size-katakana (character type, excl package) Section 4.2, Table 3

I

*ics-modes* (variable, compiler package) Section 4.1
ics-target-case (special form, excl package) [link is to separate description page]
       Table 3
int-char (function, common-lisp package) Section 4.2, Table 3

J

Japanese Industry Standard (character sets) Section 3.2
JIS (Japanese Industry Standard character sets) Section 3.2

K

kanji (character type, excl package) Section 4.2, Table 3

L

load (function, common-lisp package) Table 3
Loading files (in IACL) Section 2.1
lower-case-p (function, common-lisp package) Section 4.2, Table 3

M

mb-to-native (function, excl package) Table 3
mb-to-string (function, excl package) Table 3
Mule (Japanese language version of Emacs) Section 2.1

N

native-character-sizeof (function, excl package)
native-string-sizeof (function, excl package)
native-to-mb (function, excl package) Table 3
native-to-string (function, excl package) Table 3

O

open (function, common-lisp package) Table 3

P

passing strings to foreign code (in IACL) Section 5.0
Process code (character set) Section 3.4

R

real-char-code-limit [link to separate description page]
      (variable, excl package) Table 3

S

set-8-bit-input (function, excl package) Table 3
Shift-JIS (character set) Section 3.3
source files (in IACL) Section 2.1
string-to-euc (function, excl package) Table 3
string-to-mb (function, excl package) Table 3
string-to-native (function, excl package) Table 3
string-to-wchar* (function, foreign-functions package) Table 3

T

Terminal driver (and IACL) Section 2.1
text files (in IACL) Section 4.2

U

upper-case-p (function, common-lisp package) Section 4.2, Table 3

W

wchar*-to-string (function, foreign-functions package) Table 3
wchar*-to-string-length (function, foreign-functions package) Table 3

Copyright (C) 1998-1999, Franz Inc., Berkeley, CA. All Rights Reserved.