From ee07b7b5a4d626cc45d941aa34edd63933af6a1d Mon Sep 17 00:00:00 2001 From: =?utf8?q?Rapha=C3=ABl=20Van=20Dyck?= Date: Thu, 1 Jan 2026 14:27:49 +0100 Subject: [PATCH] add reference manual --- src/ide.jsx | 2 +- system-files/BIBLIOGRAPHY | 5 +- system-files/REFERENCE-MANUAL | 151 ++++++++++++++++++++++++++++++++++ system-files/all-caps.css | 67 ++++++++++++++- system-files/all-caps.js | 3 +- 5 files changed, 222 insertions(+), 6 deletions(-) diff --git a/src/ide.jsx b/src/ide.jsx index b963eda..affa9c0 100644 --- a/src/ide.jsx +++ b/src/ide.jsx @@ -1230,7 +1230,7 @@ function init(systemFiles) { init([ 'USER-MANUAL', 'TUTORIAL', - //'REFERENCE-MANUAL', + 'REFERENCE-MANUAL', //'IMPLEMENTATION-NOTES', 'BIBLIOGRAPHY', 'LICENSE', diff --git a/system-files/BIBLIOGRAPHY b/system-files/BIBLIOGRAPHY index 1a988ce..cdeec8d 100644 --- a/system-files/BIBLIOGRAPHY +++ b/system-files/BIBLIOGRAPHY @@ -11,15 +11,18 @@

Bibliography

Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs, second edition, 1996, MIT Press

Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau (Editors), Extensible Markup Language (XML) 1.0, fifth edition, 2008, https://www.w3.org/TR/2008/REC-xml-20081126/

Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, and John Cowan (Editors), Extensible Markup Language (XML) 1.1, second edition, 2006, https://www.w3.org/TR/2006/REC-xml11-20060816/

R. Kent Dybvig, The Scheme Programming Language, fourth edition, 2009, MIT Press

Daniel P. Friedman and Mitchell Wand, Essentials of Programming Languages, third edition, 2008, MIT Press

Paul Graham, On Lisp: Advanced Techniques for Common Lisp, 1994, Prentice Hall

Paul Graham, ANSI Common Lisp, 1996, Prentice Hall

Richard Kelsey, William Clinger, and Jonathan Rees (Editors), Revised⁵ Report on the Algorithmic Language Scheme, 1998, search for “R5RS” on the internet to get a link

Peter Norvig, Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp, 1992, Morgan Kaufmann Publishers

Kent Pitman, Common Lisp HyperSpec, 2005, https://www.lispworks.com/documentation/HyperSpec/Front/index.htm

Christian Queinnec, Lisp in Small Pieces, 2003, Cambridge University Press

Michael Sperber, R. Kent Dybvig, Matthew Flatt, and Anton van Straaten (Editors), Revised⁶ Report on the Algorithmic Language Scheme, 2007, https://www.r6rs.org/

Guy L. Steele Jr., Common Lisp: The Language, second edition, 1990, Digital Press

The Unicode Consortium, The Unicode Standard, Version 17.0.0, 2025, https://www.unicode.org/versions/Unicode17.0.0/

diff --git a/system-files/REFERENCE-MANUAL b/system-files/REFERENCE-MANUAL index 785c083..a31c944 100644 --- a/system-files/REFERENCE-MANUAL +++ b/system-files/REFERENCE-MANUAL @@ -9,5 +9,156 @@ +

Reference Manual

The reference manual provides a detailed account of the programming language. It supplements the user manual (particularly the sections “Programming Language” and “Listener Buffers”) and the tutorial.

Syntax

Introduction

Listener Buffers

Let's examine what happens when the form (+ 123 456) is evaluated in a listener buffer:

> (+ 123 456)
579

The process can be broken down into the following five steps:

Step 1 The sequence of characters (+ 123 456) is read from the listener buffer.

Step 2 The reader converts the sequence of characters (+ 123 456) into the list (+ 123 456). (The sequence of characters is a readable representation of the list.) This step can be broken down into the following two substeps:

Step 2.1 A component of the reader called the tokenizer converts the sequence of characters (+ 123 456) into a sequence of tokens. A token consists of the following pieces of information bundled together: a category and, if required by the category, a value. Each token category has an associated pattern and each token has an associated lexeme. A lexeme is a sequence of contiguous characters extracted from the input sequence of characters. The lexeme associated with a token must match the pattern associated with the token's category and the sequence of characters resulting from the concatenation of the lexemes associated with the tokens must match the input sequence of characters. The sequence of characters (+ 123 456) is converted into the following sequence of tokens:

A token of category opening-parenthesis whose associated lexeme is (. This token has no value.
A token of category variable whose associated lexeme is +. The value of this token is the object of type variable whose name is +.
A token of category whitespace whose associated lexeme is a single space character. This token has no value.
A token of category number whose associated lexeme is 123. The value of this token is an object of type number representing the mathematical number $123$.
A token of category whitespace whose associated lexeme is a single space character. This token has no value.
A token of category number whose associated lexeme is 456. The value of this token is an object of type number representing the mathematical number $456$.
A token of category closing-parenthesis whose associated lexeme is ). This token has no value.

Note that some whitespace is needed to separate the lexeme + from the lexeme 123 and the lexeme 123 from the lexeme 456 but no whitespace is needed to separate the lexeme ( from the lexeme + or the lexeme 456 from the lexeme ).
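
The tokenization described in step 2.1 can be sketched in Python as follows. This is only an illustration, not the actual tokenizer: the names Token, Variable, and tokenize are hypothetical, only the four token categories that appear in the example are handled, a number is assumed to be a run of ASCII digits, and emitting one whitespace token per whitespace character is an assumption based on the listing above.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


@dataclass(frozen=True)
class Variable:
    """Stand-in for an object of type variable (hypothetical representation)."""
    name: str


class Token(NamedTuple):
    category: str
    lexeme: str
    value: Optional[object]  # None when the category carries no value


def is_delimiter(ch: str) -> bool:
    """Parentheses and whitespace end a number or variable lexeme."""
    return ch in "()" or ch.isspace()


def tokenize(chars: str) -> list:
    tokens = []
    pos = 0
    while pos < len(chars):
        ch = chars[pos]
        if ch == "(":
            tokens.append(Token("opening-parenthesis", ch, None))
            pos += 1
        elif ch == ")":
            tokens.append(Token("closing-parenthesis", ch, None))
            pos += 1
        elif ch.isspace():
            # Assumption: one whitespace token per whitespace character.
            tokens.append(Token("whitespace", ch, None))
            pos += 1
        else:
            # Extend the lexeme up to the next delimiter.
            end = pos
            while end < len(chars) and not is_delimiter(chars[end]):
                end += 1
            lexeme = chars[pos:end]
            if lexeme.isascii() and lexeme.isdigit():
                tokens.append(Token("number", lexeme, int(lexeme)))
            else:
                tokens.append(Token("variable", lexeme, Variable(lexeme)))
            pos = end
    return tokens


tokens = tokenize("(+ 123 456)")
```

Concatenating the lexemes of the resulting tokens gives back the input sequence of characters, which is the property required of the tokenizer above.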

Step 2.2 A component of the reader called the parser converts the sequence of tokens from step 2.1 (minus the tokens of category whitespace, which are ignored by the parser) into a cons whose car is the variable from step 2.1 and whose cdr is a cons whose car is the first number from step 2.1 and whose cdr is a cons whose car is the second number from step 2.1 and whose cdr is the empty list. Together, those three conses represent the list (+ 123 456).
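
The construction of the three conses in step 2.2 can be sketched in Python as follows. The sketch makes several assumptions: tokens are given as (category, value) pairs, the variable is represented by the string "+", the empty list is represented by None, and the names Cons, parse, parse_object, and parse_list are hypothetical. It handles only parenthesized lists, numbers, and variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cons:
    car: object
    cdr: object


EMPTY_LIST = None  # stand-in for the empty list


def parse_list(tokens, pos):
    """Parse list elements up to the matching closing parenthesis."""
    elements = []
    while True:
        if pos >= len(tokens):
            raise ValueError("missing closing parenthesis")
        if tokens[pos][0] == "closing-parenthesis":
            pos += 1
            break
        obj, pos = parse_object(tokens, pos)
        elements.append(obj)
    # Build the chain of conses from the last element to the first.
    result = EMPTY_LIST
    for element in reversed(elements):
        result = Cons(element, result)
    return result, pos


def parse_object(tokens, pos):
    """Parse one object starting at position pos; return it with the next position."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    category, value = tokens[pos]
    if category == "opening-parenthesis":
        return parse_list(tokens, pos + 1)
    if category == "closing-parenthesis":
        raise ValueError("unexpected closing parenthesis")
    return value, pos + 1


def parse(tokens):
    """Parse exactly one object, ignoring the tokens of category whitespace."""
    significant = [t for t in tokens if t[0] != "whitespace"]
    obj, pos = parse_object(significant, 0)
    if pos != len(significant):
        raise ValueError("unexpected tokens after the object")
    return obj


tokens = [
    ("opening-parenthesis", None),
    ("variable", "+"),
    ("whitespace", None),
    ("number", 123),
    ("whitespace", None),
    ("number", 456),
    ("closing-parenthesis", None),
]
form = parse(tokens)
```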

Step 3 The evaluator evaluates the list (+ 123 456) to the number 579. The evaluation of the top-level form (+ 123 456) entails the evaluation of other non-top-level forms. Each form must be classified in order to determine how it should be evaluated. The form (+ 123 456) is classified as a plain function call. The variable + is treated as an abbreviation for the form (fref +). The form (fref +) is classified as an fref-form. The forms 123 and 456 are classified as self-evaluating objects. Because the global function + is a closure, its invocation entails the evaluation (and thus the classification) of other forms. The component of the evaluator responsible for classifying forms is called the syntax analyzer.
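
The classifications mentioned in step 3 can be illustrated with the following Python sketch. It is not the actual syntax analyzer: it covers only the three classes named above, represents lists as Python lists, numbers as Python integers, and variables as Python strings, and the names classify and classified_forms are hypothetical.

```python
def classify(form):
    """Classify a form; only the classes that occur in the example are covered."""
    if isinstance(form, int):
        return "self-evaluating object"
    if isinstance(form, list) and form and form[0] == "fref":
        return "fref-form"
    if isinstance(form, list) and form:
        return "plain function call"
    raise ValueError(f"form not covered by this sketch: {form!r}")


def classified_forms(form):
    """Return the form and the forms it entails, each paired with its class."""
    kind = classify(form)
    result = [(form, kind)]
    if kind == "plain function call":
        operator, *operands = form
        if isinstance(operator, str):
            # A variable in operator position abbreviates (fref variable).
            operator = ["fref", operator]
        result.extend(classified_forms(operator))
        for operand in operands:
            result.extend(classified_forms(operand))
    return result


trace = classified_forms(["+", 123, 456])
```

The sketch stops at the fref-form and does not follow the invocation of the closure, so the forms evaluated inside the global function + are not listed.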

Step 4 The printer converts the number 579 into the sequence of characters 579. (The sequence of characters is the printable representation of the number.)

Step 5 The sequence of characters 579 is written into the listener buffer.

EVLambda has three levels of syntax:

The token level contains the rules used by the tokenizer to convert a sequence of characters into a sequence of tokens.
The object level contains the rules used by the parser to convert a sequence of tokens into a sequence of objects.
The form level contains the rules used by the syntax analyzer to classify forms.

Each level of syntax is described later in its own section.

EVLambda Source Files

The reader is used not only to convert the characters typed into a listener buffer into an object but also to convert the characters contained inside an EVLambda source file into a sequence of objects.

EVLambda source files come in two varieties: the plain EVLambda source files, which contain only EVLambda source code, and the documented EVLambda source files, which contain a mix of EVLambda source code and documentation in XML format.

Here is an example of a plain EVLambda source file:

(fdef fact (n)
  (if (= n 0)
      1
    (* n (fact (- n 1)))))

(test 1 (fact 0))
(test 120 (fact 5))
(test 3628800 (fact 10))

(fdef fib (n)
  (if (= n 0)
      0
    (if (= n 1)
        1
      (+ (fib (- n 1)) (fib (- n 2))))))

(test 0 (fib 0))
(test 5 (fib 5))
(test 55 (fib 10))

Here is an example of a documented EVLambda source file:

<chapter>
<title>Recursive Functions</title>
<para>...para...</para>
<para>...para...</para>
<section>
<title>Factorial Function</title>
<para>...para...</para>
<para>...para...</para>
(fdef fact (n)
  <para>...block...</para>
  <para>...block...</para>
  (if (= n 0)
      1 <comment>...eol...</comment>
    (* n (fact (- n 1))))) <comment>...eoll...</comment>

(test 1 (fact 0))
(test 120 (fact 5))
(test 3628800 (fact 10))
</section>
<section>
<title>Fibonacci Sequence</title>
<para>...para...</para>
<para>...para...</para>
(fdef fib (n)
  <para>...block...</para>
  <para>...block...</para>
  (if (= n 0)
      0 <comment>...eol...</comment>
    (if (= n 1)
        1 <comment>...eol...</comment>
      (+ (fib (- n 1)) (fib (- n 2)))))) <comment>...eoll...</comment>

(test 0 (fib 0))
(test 5 (fib 5))
(test 55 (fib 10))
</section>
</chapter>

Documented EVLambda source files can be converted to HTML by a component of the programming language called the documentation generator.

Extensible Markup Language (XML)

An XML document is an annotated text document. An XML document is divided into two intermingled parts: the character data (the content) and the markup (the annotations). Markup can take many forms. Documented EVLambda source files use the following forms of markup:

start tags (without attributes): <chapter>, <section>, <title>, <para>, <comment>, …
end tags: </chapter>, </section>, </title>, </para>, </comment>, …
empty-element tags (without attributes): <br/>, …
comments: <!-- … -->, …
entity references: &lt; (refers to the character <), &gt; (refers to the character >), &amp; (refers to the character &), …
character references (decimal representation): &#9166; (refers to the character ⏎), …
character references (hexadecimal representation): &#x23CE; (refers to the character ⏎), …

Many constraints must be satisfied for an XML document to be well-formed. The main well-formedness constraints are noted below.

Well-formedness constraint: Start tags and end tags must appear in pairs. In each pair, the start tag must precede the end tag and both tags must have the same name.

An element is a sequence of characters delimited by a pair of start and end tags or by an empty-element tag. The characters of the delimiting tags or tag belong to the element. The characters that belong to an element but not to its delimiting tags or tag constitute the content of the element. The content of an element delimited by an empty-element tag is empty. The name of an element is the name of its delimiting tags or tag.

Well-formedness constraint: Elements must not overlap. Let $X$ and $Y$ be two distinct elements. One of the following conditions must be true: $X$ precedes $Y$, $Y$ precedes $X$, $X$ is inside the content of $Y$, or $Y$ is inside the content of $X$.

Let $X$ and $Y$ be two elements. If $X$ is inside the content of $Y$ and there does not exist a third element $Z$ such that $X$ is inside the content of $Z$ and $Z$ is inside the content of $Y$, then $X$ is called a child of $Y$.

Let $X$, $Y$, and $Z$ be three elements. If $X$ is a child of both $Y$ and $Z$, then $Y$ and $Z$ are the same element. That element is called the parent of $X$.

Not all elements have a parent. An element that has no parent is called a root element.

Well-formedness constraint: There must exist exactly one root element.

Well-formedness constraint: The characters < and & must not appear literally inside character data. They must be escaped using an entity reference or a character reference.

Documented EVLambda source files are structured as follows:

The root element is a chapter element.
The chapter element must contain a child title element and may contain any number of child section and paragraph-level elements. The chapter element may also directly contain character data, which will be interpreted as EVLambda source code.
The section elements must contain a child title element and may contain any number of child section and paragraph-level elements. The section elements may also directly contain character data, which will be interpreted as EVLambda source code.
The only character data to be interpreted as EVLambda source code is the character data directly contained inside a chapter or section element.
EVLambda source code can contain comments in XML format: block comments (in the form of paragraph-level elements), end-of-line comments (in the form of comment elements), and end-of-last-line comments (also in the form of comment elements). End-of-last-line comments are special because they are located outside the piece of code they are logically connected to. (They are located after the last closing parenthesis.)
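
The rule that only the character data directly contained inside a chapter or section element is interpreted as EVLambda source code can be sketched in Python with the standard xml.etree.ElementTree module. The function name source_code and the sample document are hypothetical, and the sample avoids the characters < and & so that it is a well-formed XML document.

```python
import xml.etree.ElementTree as ET


def source_code(element):
    """Collect, in document order, the character data directly contained
    inside a chapter or section element and inside its nested sections."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "section":
            parts.append(source_code(child))
        # The tail of a child is character data of the enclosing element.
        # The content of title, para, and comment elements is skipped.
        parts.append(child.tail or "")
    return "".join(parts)


document = """<chapter>
<title>Recursive Functions</title>
<section>
<title>Factorial Function</title>
(fdef fact (n)
  <para>block comment</para>
  (if (= n 0)
      1 <comment>end-of-line comment</comment>
    (* n (fact (- n 1)))))
</section>
</chapter>"""

code = source_code(ET.fromstring(document))
```

In ElementTree, the text that follows a child element is stored in the tail attribute of that child, which is why the code after a para or comment element is recovered from the tail.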

Because the characters < and & can appear inside EVLambda source code, a documented EVLambda source file is not always a well-formed XML document.

Documentation Generator

The documentation generator converts a documented EVLambda source file to HTML in two steps:

Step 1 The characters < and & appearing inside EVLambda source code are escaped and some tags are added to better delimit the EVLambda source code and the comments. The resulting file is a well-formed XML document.

Step 2 The resulting file from step 1 is converted to HTML by an XSLT stylesheet.
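
The need for step 1 can be illustrated with the following Python sketch, which uses only the standard library. It is not the documentation generator: the function is_well_formed is hypothetical, the escaping is applied to a code fragment that is already known to be code, and the tags added by step 1 and the XSLT conversion of step 2 are not shown.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the text can be parsed as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# EVLambda source code containing the character <.
code = "(if (< n 2) n (+ (fib (- n 1)) (fib (- n 2))))"

raw = "<chapter><title>Fibonacci</title>" + code + "</chapter>"
escaped = "<chapter><title>Fibonacci</title>" + escape(code) + "</chapter>"

raw_ok = is_well_formed(raw)          # the literal < breaks well-formedness
escaped_ok = is_well_formed(escaped)  # &lt; is accepted by the parser

# After parsing, the entity reference is resolved back to the character <.
recovered = ET.fromstring(escaped)[0].tail
```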

Unicode

The characters contained inside listener buffers and EVLambda source files are Unicode characters. Unicode is a character set containing, as of version 17.0, $159801$ characters. Each Unicode character is uniquely identified by a nonnegative integer called its code point. Code points range from $0$ to $1114111$ in decimal and from 0 to 10FFFF in hexadecimal. (Not all code points are assigned to a character.) The Unicode character whose code point is $\hex$ in hexadecimal is denoted by U+$\hex$. The order on integers directly translates into an order on Unicode characters. With respect to that order, the Unicode character $c_1$ precedes the Unicode character $c_2$ if and only if the code point of $c_1$ is strictly less than the code point of $c_2$. That order can be used to define ranges of Unicode characters.
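
The relationship between characters, code points, and the U+ notation can be sketched in Python, whose built-in functions ord and chr convert between a character and its code point. The names u_plus and char_range are hypothetical helpers introduced for this illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # 1114111 in decimal


def u_plus(ch):
    """Return the U+ notation of a character (at least four hexadecimal digits)."""
    return f"U+{ord(ch):04X}"


def char_range(first, last):
    """Return the characters from first to last (inclusive) in code point order."""
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


letter_a = u_plus("A")            # LATIN CAPITAL LETTER A
return_symbol = u_plus("\u23CE")  # the character ⏎
a_to_e = char_range("a", "e")
```

Python compares strings by code point, so "A" < "a" holds because the code point of A (65) is strictly less than the code point of a (97).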

The range from $0$ to $1114111$ is divided into $17$ planes each containing $65536$ code points. The first plane is called the basic multilingual plane (BMP) and the other planes are called the supplementary planes. Most of the characters in common use in the world are located in the BMP.
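
Because each plane contains $65536$ code points, the plane of a code point is the quotient of its division by $65536$. A minimal Python sketch, with the hypothetical names plane and in_bmp:

```python
PLANE_SIZE = 65536
PLANE_COUNT = 17


def plane(code_point):
    """Return the number of the plane (0 to 16) containing the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point // PLANE_SIZE


def in_bmp(code_point):
    """Return True if the code point belongs to the basic multilingual plane."""
    return plane(code_point) == 0


plane_of_a = plane(0x0041)       # LATIN CAPITAL LETTER A
plane_of_last = plane(0x10FFFF)  # last code point
```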

An encoding form is a mapping that maps a character to a sequence of $n$-bit words called code units. The following encoding forms are in common use:

UTF-$32$: Code units are $32$-bit words and each character is mapped to a sequence of one code unit. Each character is mapped to the $32$-bit unsigned integer representing the character's code point.
UTF-$16$: Code units are $16$-bit words and each character is mapped to a sequence of one or two code units. Each character belonging to the BMP is mapped to the $16$-bit unsigned integer representing the character's code point. Each character belonging to a supplementary plane is mapped to a sequence of two code units called a surrogate pair. The first element is called the high-surrogate code unit and the second element is called the low-surrogate code unit.
UTF-$8$: Code units are $8$-bit words and each character is mapped to a sequence of one, two, three, or four code units. Each character belonging to the BMP is mapped to a sequence of one, two, or three code units. In particular, each of the $128$ ASCII characters is mapped to the $8$-bit unsigned integer representing the character's code point. Each character belonging to a supplementary plane is mapped to a sequence of four code units.
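
The three encoding forms can be compared with the following Python sketch. The surrogate pair computation follows the standard definition of UTF-16, which the text above does not spell out: $65536$ is subtracted from the code point, the upper ten bits of the result are added to D800 and the lower ten bits to DC00 (both in hexadecimal). The function names are hypothetical, and the UTF-8 code units are obtained from Python's built-in encoder rather than computed by hand.

```python
def utf32_code_units(code_point):
    """UTF-32: always exactly one 32-bit code unit."""
    return [code_point]


def utf16_code_units(code_point):
    """UTF-16: one code unit in the BMP, a surrogate pair otherwise."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # high-surrogate code unit
    low = 0xDC00 + (offset & 0x3FF)  # low-surrogate code unit
    return [high, low]


def utf8_code_units(code_point):
    """UTF-8: one to four 8-bit code units (using Python's encoder)."""
    return list(chr(code_point).encode("utf-8"))


# U+0041 (A), U+00E9 (é), U+23CE (⏎), and U+1F600 (a supplementary-plane character).
samples = [0x0041, 0x00E9, 0x23CE, 0x1F600]
utf16_lengths = [len(utf16_code_units(cp)) for cp in samples]
utf8_lengths = [len(utf8_code_units(cp)) for cp in samples]
pair = utf16_code_units(0x1F600)
```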

An encoding scheme is a mapping that maps a character to a sequence of bytes. (A byte is an $8$-bit word.) The following encoding schemes are in common use:

UTF-$32$BE: UTF-$32$ encoding form where each $32$-bit code unit is represented by four bytes ordered from most significant to least significant (big-endian order).
UTF-$32$LE: UTF-$32$ encoding form where each $32$-bit code unit is represented by four bytes ordered from least significant to most significant (little-endian order).
UTF-$16$BE: UTF-$16$ encoding form where each $16$-bit code unit is represented by two bytes ordered from most significant to least significant (big-endian order).
UTF-$16$LE: UTF-$16$ encoding form where each $16$-bit code unit is represented by two bytes ordered from least significant to most significant (little-endian order).
UTF-$8$: Because UTF-$8$ code units are bytes, UTF-$8$ is both an encoding form and an encoding scheme.
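
The difference between the encoding schemes is only the order of the bytes within each code unit. The following Python sketch uses the codec names of Python's standard library ("utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le", and "utf-8") to encode the character ⏎ (U+23CE); the variable names are chosen for this illustration.

```python
ch = "\u23CE"  # the character ⏎

utf32be = ch.encode("utf-32-be").hex()
utf32le = ch.encode("utf-32-le").hex()
utf16be = ch.encode("utf-16-be").hex()
utf16le = ch.encode("utf-16-le").hex()
utf8 = ch.encode("utf-8").hex()
```

The big-endian results contain the hexadecimal digits of the code point in reading order (23 followed by CE), whereas the little-endian results contain the same bytes in reverse order.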

Contrary to what was said in the user manual, an object of type character represents a UTF-$16$ code unit (instead of a Unicode character) and an object of type string represents an indexed sequence of UTF-$16$ code units (instead of an indexed sequence of Unicode characters).
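
The practical consequence is that a character belonging to a supplementary plane occupies two positions in a string. Python strings are indexed by code point rather than by UTF-$16$ code unit, so the following sketch emulates the behavior described above with a hypothetical helper utf16_units that returns the sequence of UTF-$16$ code units of a string.

```python
def utf16_units(text):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


text = "a\U0001F600"  # the letter a followed by the character U+1F600

characters = len(text)     # number of Unicode characters
units = utf16_units(text)  # indexed sequence of UTF-16 code units
```

Here the string contains two Unicode characters but three UTF-$16$ code units: one for the letter a and a surrogate pair for U+1F600.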

Each Unicode character has a name (“LATIN CAPITAL LETTER A” for instance) and a set of properties. An important property is the general category. The general category property can take the following values:

Lu	Uppercase_Letter	uppercase letters
Ll	Lowercase_Letter	lowercase letters
Lt	Titlecase_Letter	digraphs whose first constituent is an uppercase letter
LC	Cased_Letter	Lu, Ll, or Lt
Lm	Modifier_Letter	noncombining modifier letters
Lo	Other_Letter	letters from unicase alphabets and ideographs
L	Letter	Lu, Ll, Lt, Lm, or Lo
Mn	Nonspacing_Mark	nonspacing combining marks (accents, …)
Mc	Spacing_Mark	spacing combining marks
Me	Enclosing_Mark	enclosing combining marks
M	Mark	Mn, Mc, or Me
Nd	Decimal_Number	decimal digits
Nl	Letter_Number	letterlike numeric characters (Roman numerals, …)
No	Other_Number	other numeric characters (fractions, …)
N	Number	Nd, Nl, or No
Pc	Connector_Punctuation	connecting punctuation marks (underscore, …)
Pd	Dash_Punctuation	dashlike punctuation marks (hyphen, dashes, …)
Ps	Open_Punctuation	opening punctuation marks (opening parenthesis, …)
Pe	Close_Punctuation	closing punctuation marks (closing parenthesis, …)
Pi	Initial_Punctuation	initial quotation marks
Pf	Final_Punctuation	final quotation marks
Po	Other_Punctuation	other punctuation marks (period, comma, colon, semicolon, …)
P	Punctuation	Pc, Pd, Ps, Pe, Pi, Pf, or Po
Sm	Math_Symbol	mathematical symbols
Sc	Currency_Symbol	currency symbols
Sk	Modifier_Symbol	noncombining modifier symbols
So	Other_Symbol	other symbols (emoji, …)
S	Symbol	Sm, Sc, Sk, or So
Zs	Space_Separator	space characters
Zl	Line_Separator	line separator character
Zp	Paragraph_Separator	paragraph separator character
Z	Separator	Zs, Zl, or Zp
Cc	Control	C0 and C1 control characters (horizontal tab, line feed, carriage return, …)
Cf	Format	format control characters (left-to-right and right-to-left marks, …)
Cs	Surrogate	surrogate code points
Co	Private_Use	private-use characters
Cn	Unassigned	noncharacters and unassigned code points
C	Other	Cc, Cf, Cs, Co, or Cn
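
The general category of a character can be looked up in Python with the function category of the standard unicodedata module, which returns the two-letter abbreviations of the table above. The data comes from the version of Unicode bundled with the Python interpreter, which is not necessarily version 17.0. The helper major_class is hypothetical and simply returns the first letter of the abbreviation.

```python
import unicodedata


def major_class(ch):
    """Return the one-letter grouping (L, M, N, P, S, Z, or C) of a character."""
    return unicodedata.category(ch)[0]


categories = {
    ch: unicodedata.category(ch)
    for ch in ["A", "a", "5", "(", ")", "_", "+", " ", "\n"]
}
```

For instance, the opening and closing parentheses that delimit lists belong to the categories Ps and Pe, and the character + belongs to the category Sm.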

Most Unicode characters have associated visual representations called glyphs. For instance, the Unicode character “LATIN CAPITAL LETTER A” (U+0041) has the following associated glyphs (and infinitely more considering all possible variations in font, size, weight, style, etc.):

A (serif)
A (serif bold italic)
A (sans-serif)
A (sans-serif bold italic)

In general, the association is not between Unicode characters and glyphs but between sequences of Unicode characters and glyphs and it is possible for different sequences of Unicode characters to have the same associated glyphs. For example, the sequence of one Unicode character “LATIN CAPITAL LETTER A WITH DIAERESIS” (U+00C4) and the sequence of two Unicode characters “LATIN CAPITAL LETTER A” (U+0041) “COMBINING DIAERESIS” (U+0308) have the same associated glyphs:

Ä, Ä, Ä, Ä, … (U+00C4)
Ä, Ä, Ä, Ä, … (U+0041 U+0308)

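Although the two sequences have the same associated glyphs, they are different sequences of Unicode characters. The following Python sketch makes the difference visible. It also uses the function normalize of the standard unicodedata module, which implements the Unicode normalization forms; normalization is not discussed in this manual and is mentioned here only as the standard way of relating the two sequences.

```python
import unicodedata

precomposed = "\u00C4"       # LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "\u0041\u0308"  # LATIN CAPITAL LETTER A, COMBINING DIAERESIS

same_sequence = precomposed == decomposed
names = [unicodedata.name(ch) for ch in decomposed]

# Normalization form C composes the sequence; form D decomposes it.
composed = unicodedata.normalize("NFC", decomposed)
split = unicodedata.normalize("NFD", precomposed)
```

A program that compares the two sequences character by character therefore treats them as different unless it normalizes them first.
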
diff --git a/system-files/all-caps.css b/system-files/all-caps.css index 6b51d3a..c25246b 100644 --- a/system-files/all-caps.css +++ b/system-files/all-caps.css @@ -11,14 +11,62 @@ div.preamble { display: none; } +h1 { + counter-reset: h2counter h3counter h4counter h5counter h6counter; +} + +h2 { + counter-increment: h2counter; + counter-set: h3counter h4counter h5counter h6counter; +} + +h2:before { + content: counter(h2counter) ". "; +} + +h3 { + counter-increment: h3counter; + counter-set: h4counter h5counter h6counter; +} + +h3:before { + content: counter(h2counter) "." counter(h3counter) ". "; +} + +h4 { + counter-increment: h4counter; + counter-set: h5counter h6counter; +} + +h4:before { + content: counter(h2counter) "." counter(h3counter) "." counter(h4counter) ". "; +} + +h5 { + counter-increment: h5counter; + counter-set: h6counter; +} + +h5:before { + content: counter(h2counter) "." counter(h3counter) "." counter(h4counter) "." counter(h5counter) ". "; +} + +h6 { + counter-increment: h6counter; +} + +h6:before { + content: counter(h2counter) "." counter(h3counter) "." counter(h4counter) "." counter(h5counter) "." counter(h6counter) ". 
"; +} + .bg { background-color: lightgray; } pre.repl { margin-left: 2em; - padding: 10px; border-radius: 10px; + padding: 10px; width: 800px; background-color: lightgray; } @@ -35,14 +83,26 @@ div.trace li { list-style-type: none; } -table { +span.charseq { + font-family: monospace; + line-height: 2em; +} + +span.char { + margin: 1px; + border: 1px solid; + padding: 1px; +} + +table.plain { border-collapse: collapse; border: 1px solid; } -table th, table td { +table.plain th, table.plain td { border: 1px solid; padding: 5px; + text-align: left; } table.ks { @@ -57,6 +117,7 @@ table.ks th, table.ks td { table.bnf { font-family: monospace; + margin-left: 1em; } td.lhs, td.def, td.rhs { diff --git a/system-files/all-caps.js b/system-files/all-caps.js index b2f6e8c..f5a5eb1 100644 --- a/system-files/all-caps.js +++ b/system-files/all-caps.js @@ -69,7 +69,8 @@ window.MathJax = { vector: '\\mlvar{vector}', function: '\\mlvar{function}', primitivefunction: '\\mlvar{primitive-function}', - closure: '\\mlvar{closure}' + closure: '\\mlvar{closure}', + hex: '\\mlvar{hex}' } }, output: { -- 2.39.5