From ee07b7b5a4d626cc45d941aa34edd63933af6a1d Mon Sep 17 00:00:00 2001
From: =?utf8?q?Rapha=C3=ABl=20Van=20Dyck?=
Date: Thu, 1 Jan 2026 14:27:49 +0100
Subject: [PATCH] add reference manual

---
 src/ide.jsx                   |   2 +-
 system-files/BIBLIOGRAPHY     |   5 +-
 system-files/REFERENCE-MANUAL | 151 ++++++++++++++++++++++++++++++++++
 system-files/all-caps.css     |  67 ++++++++++++++-
 system-files/all-caps.js      |   3 +-
 5 files changed, 222 insertions(+), 6 deletions(-)

diff --git a/src/ide.jsx b/src/ide.jsx
index b963eda..affa9c0 100644
--- a/src/ide.jsx
+++ b/src/ide.jsx
@@ -1230,7 +1230,7 @@ function init(systemFiles) {
 init([
   'USER-MANUAL',
   'TUTORIAL',
-  //'REFERENCE-MANUAL',
+  'REFERENCE-MANUAL',
   //'IMPLEMENTATION-NOTES',
   'BIBLIOGRAPHY',
   'LICENSE',
diff --git a/system-files/BIBLIOGRAPHY b/system-files/BIBLIOGRAPHY
index 1a988ce..cdeec8d 100644
--- a/system-files/BIBLIOGRAPHY
+++ b/system-files/BIBLIOGRAPHY
@@ -11,15 +11,18 @@

Bibliography

Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs, second edition, 1996, MIT Press

-

Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau (Editors), Extensible Markup Language (XML) 1.0, fifth edition, 2008, https://www.w3.org/TR/2008/REC-xml-20081126/

+

Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau (Editors), Extensible Markup Language (XML) 1.0, fifth edition, 2008, https://www.w3.org/TR/2008/REC-xml-20081126/

+

Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, and John Cowan (Editors), Extensible Markup Language (XML) 1.1, second edition, 2006, https://www.w3.org/TR/2006/REC-xml11-20060816/

R. Kent Dybvig, The Scheme Programming Language, fourth edition, 2009, MIT Press

Daniel P. Friedman and Mitchell Wand, Essentials of Programming Languages, third edition, 2008, MIT Press

Paul Graham, On Lisp: Advanced Techniques for Common Lisp, 1994, Prentice Hall

Paul Graham, ANSI Common Lisp, 1996, Prentice Hall

+

Richard Kelsey, William Clinger, and Jonathan Rees (Editors), Revised⁵ Report on the Algorithmic Language Scheme, 1998, search for “R5RS” on the internet to get a link

Peter Norvig, Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp, 1992, Morgan Kaufmann Publishers

Kent Pitman, Common Lisp HyperSpec, 2005, https://www.lispworks.com/documentation/HyperSpec/Front/index.htm

Christian Queinnec, Lisp in Small Pieces, 2003, Cambridge University Press

Michael Sperber, R. Kent Dybvig, Matthew Flatt, and Anton van Straaten (Editors), Revised⁶ Report on the Algorithmic Language Scheme, 2007, https://www.r6rs.org/

Guy L. Steele Jr., Common Lisp: The Language, second edition, 1990, Digital Press

+

The Unicode Consortium, The Unicode Standard, Version 17.0.0, 2025, https://www.unicode.org/versions/Unicode17.0.0/

diff --git a/system-files/REFERENCE-MANUAL b/system-files/REFERENCE-MANUAL
index 785c083..a31c944 100644
--- a/system-files/REFERENCE-MANUAL
+++ b/system-files/REFERENCE-MANUAL
@@ -9,5 +9,156 @@
+

Reference Manual

+

The reference manual provides a detailed account of the programming language. It supplements the user manual (particularly the sections “Programming Language” and “Listener Buffers”) and the tutorial.

+

Syntax

+

Introduction

+

Listener Buffers

+

Let's examine what happens when the form (+ 123 456) is evaluated in a listener buffer:

+
> (+ 123 456)
579
+

The process can be broken down into the following five steps:

+

Step 1 The sequence of characters (+ 123 456) is read from the listener buffer.

+

Step 2 The reader converts the sequence of characters (+ 123 456) into the list (+ 123 456). (The sequence of characters is a readable representation of the list.) This step can be broken down into the following two substeps:

+

Step 2.1 A component of the reader called the tokenizer converts the sequence of characters (+ 123 456) into a sequence of tokens. A token consists of the following pieces of information bundled together: a category and, if required by the category, a value. Each token category has an associated pattern and each token has an associated lexeme. A lexeme is a sequence of contiguous characters extracted from the input sequence of characters. The lexeme associated with a token must match the pattern associated with the token's category and the sequence of characters resulting from the concatenation of the lexemes associated with the tokens must match the input sequence of characters. The sequence of characters (+ 123 456) is converted into the following sequence of tokens:

+ +

Note that some whitespace is needed to separate the lexeme + from the lexeme 123 and the lexeme 123 from the lexeme 456 but no whitespace is needed to separate the lexeme ( from the lexeme + or the lexeme 456 from the lexeme ).
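As a rough illustration of step 2.1, here is a minimal tokenizer sketch in JavaScript. The category names, the patterns, and the token representation are all assumptions made for this example; they are not EVLambda's actual token categories.

```javascript
// Hypothetical token categories, tried in order. Each pattern is anchored at
// the start of the remaining input. These names and patterns are assumptions.
const TOKEN_PATTERNS = [
  ['whitespace', /^\s+/],
  ['open-paren', /^\(/],
  ['close-paren', /^\)/],
  // Digits that are followed by a delimiter (whitespace, parenthesis, or end).
  ['number', /^[0-9]+(?![^\s()])/],
  // Any other run of non-delimiter characters.
  ['variable', /^[^\s()]+/],
];

function tokenize(input) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let token = null;
    for (const [category, pattern] of TOKEN_PATTERNS) {
      const match = pattern.exec(rest);
      if (match !== null) {
        token = { category, lexeme: match[0] };
        break;
      }
    }
    if (token === null) {
      throw new Error('no token category matches: ' + rest);
    }
    // Only some categories carry a value.
    if (token.category === 'number') token.value = Number(token.lexeme);
    if (token.category === 'variable') token.value = token.lexeme;
    tokens.push(token);
    rest = rest.slice(token.lexeme.length);
  }
  return tokens;
}

const tokens = tokenize('(+ 123 456)');
// Concatenating the lexemes gives back the input sequence of characters.
const roundTrip = tokens.map((t) => t.lexeme).join('');
```

In this sketch the input yields seven tokens, two of which are whitespace, and `roundTrip` equals the input, which mirrors the concatenation requirement stated in step 2.1.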

+

Step 2.2 A component of the reader called the parser converts the sequence of tokens from step 2.1 (minus the tokens of category whitespace, which the parser ignores) into a chain of three conses. The car of the first cons is the variable from step 2.1 and its cdr is the second cons. The car of the second cons is the first number from step 2.1 and its cdr is the third cons. The car of the third cons is the second number from step 2.1 and its cdr is the empty list. Together, those three conses represent the list (+ 123 456).
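The structure built in step 2.2 can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: a cons is modeled as a plain object with `car` and `cdr` fields, the empty list as `null`, a variable as a string, and the token shape (`category`, `value`) is hypothetical.

```javascript
// Hypothetical representations (assumptions for this sketch only).
const EMPTY_LIST = null;
const cons = (car, cdr) => ({ car, cdr });

// Parses one object starting at tokens[pos]; returns [object, nextPos].
function parseObject(tokens, pos) {
  const token = tokens[pos];
  if (token === undefined) throw new Error('unexpected end of input');
  switch (token.category) {
    case 'open-paren':
      return parseListTail(tokens, pos + 1);
    case 'close-paren':
      throw new Error('unexpected closing parenthesis');
    default:
      return [token.value, pos + 1];
  }
}

// Parses the remaining elements of a list up to the closing parenthesis.
function parseListTail(tokens, pos) {
  if (tokens[pos] === undefined) throw new Error('unexpected end of input');
  if (tokens[pos].category === 'close-paren') return [EMPTY_LIST, pos + 1];
  const [car, afterCar] = parseObject(tokens, pos);
  const [cdr, afterCdr] = parseListTail(tokens, afterCar);
  return [cons(car, cdr), afterCdr];
}

function parse(tokens) {
  // Tokens of category whitespace are ignored by the parser.
  const significant = tokens.filter((t) => t.category !== 'whitespace');
  const [object] = parseObject(significant, 0);
  return object;
}

// Token sequence for the characters (+ 123 456), written out by hand.
const tokens = [
  { category: 'open-paren' },
  { category: 'variable', value: '+' },
  { category: 'whitespace' },
  { category: 'number', value: 123 },
  { category: 'whitespace' },
  { category: 'number', value: 456 },
  { category: 'close-paren' },
];

const list = parse(tokens);
// list.car is '+', list.cdr.car is 123, list.cdr.cdr.car is 456,
// and list.cdr.cdr.cdr is the empty list.
```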

+

Step 3 The evaluator evaluates the list (+ 123 456) to the number 579. The evaluation of the top-level form (+ 123 456) entails the evaluation of other non-top-level forms. Each form must be classified in order to determine how it should be evaluated. The form (+ 123 456) is classified as a plain function call. The variable + is treated as an abbreviation for the form (fref +). The form (fref +) is classified as an fref-form. The forms 123 and 456 are classified as self-evaluating objects. Because the global function + is a closure, its invocation entails the evaluation (and thus the classification) of other forms. The component of the evaluator responsible for classifying forms is called the syntax analyzer.

+

Step 4 The printer converts the number 579 into the sequence of characters 579. (The sequence of characters is the printable representation of the number.)

+

Step 5 The sequence of characters 579 is written into the listener buffer.

+

EVLambda has three levels of syntax:

+ +

Each level of syntax is described later in its own section.

+

EVLambda Source Files

+

The reader is used not only to convert the characters typed into a listener buffer into an object but also to convert the characters contained inside an EVLambda source file into a sequence of objects.

+

EVLambda source files come in two varieties: the plain EVLambda source files, which contain only EVLambda source code, and the documented EVLambda source files, which contain a mix of EVLambda source code and documentation in XML format.

+

Here is an example of a plain EVLambda source file:

+
(fdef fact (n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))

(test 1 (fact 0))
(test 120 (fact 5))
(test 3628800 (fact 10))

(fdef fib (n)
  (if (= n 0)
      0
      (if (= n 1)
          1
          (+ (fib (- n 1)) (fib (- n 2))))))

(test 0 (fib 0))
(test 5 (fib 5))
(test 55 (fib 10))
+

Here is an example of a documented EVLambda source file:

+
<chapter>
<title>Recursive Functions</title>
<para>...para...</para>
<para>...para...</para>
<section>
<title>Factorial Function</title>
<para>...para...</para>
<para>...para...</para>
(fdef fact (n)
<para>...block...</para>
<para>...block...</para>
(if (= n 0)
1 <comment>...eol...</comment>
(* n (fact (- n 1))))) <comment>...eoll...</comment>

(test 1 (fact 0))
(test 120 (fact 5))
(test 3628800 (fact 10))
</section>
<section>
<title>Fibonacci Sequence</title>
<para>...para...</para>
<para>...para...</para>
(fdef fib (n)
<para>...block...</para>
<para>...block...</para>
(if (= n 0)
0 <comment>...eol...</comment>
(if (= n 1)
1 <comment>...eol...</comment>
(+ (fib (- n 1)) (fib (- n 2)))))) <comment>...eoll...</comment>

(test 0 (fib 0))
(test 5 (fib 5))
(test 55 (fib 10))
</section>
</chapter>
+

Documented EVLambda source files can be converted to HTML by a component of the programming language called the documentation generator.

+

Extensible Markup Language (XML)

+

An XML document is an annotated text document. An XML document is divided into two intermingled parts: the character data (the content) and the markup (the annotations). Markup can take many forms. Documented EVLambda source files use the following forms of markup:

+
+
start tags (without attributes)
+
<chapter>, <section>, <title>, <para>, <comment>, …
+
end tags
+
</chapter>, </section>, </title>, </para>, </comment>, …
+
empty-element tags (without attributes)
+
<br/>, …
+
comments
+
<!-- FIXME -->, …
+
entity references
+
&lt; (refers to the character <), &gt; (refers to the character >), &amp; (refers to the character &), …
+
character references (decimal representation)
+
&#9166; (refers to the character ⏎), …
+
character references (hexadecimal representation)
+
&#x23CE; (refers to the character ⏎), …
+
+

Many constraints must be satisfied for an XML document to be well-formed. The main well-formedness constraints are noted below.

+

Well-formedness constraint: Start tags and end tags must appear in pairs. In each pair, the start tag must precede the end tag and both tags must have the same name.

+

An element is a sequence of characters delimited by a pair of start and end tags or by an empty-element tag. The characters of the delimiting tags or tag belong to the element. The characters that belong to an element but not to its delimiting tags or tag constitute the content of the element. The content of an element delimited by an empty-element tag is empty. The name of an element is the name of its delimiting tags or tag.

+

Well-formedness constraint: Elements must not overlap. Let $X$ and $Y$ be two distinct elements. One of the following conditions must be true: $X$ precedes $Y$, $Y$ precedes $X$, $X$ is inside the content of $Y$, or $Y$ is inside the content of $X$.
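The pairing and non-overlap constraints can be checked with a stack, as in the following JavaScript sketch. It is a deliberately simplified illustration, not an XML parser: it assumes tags without attributes, ignores comments and references, and does not check the other well-formedness constraints. The function name is hypothetical.

```javascript
// Returns true if start tags and end tags are properly paired and nested.
// Simplifying assumptions: no attributes, and anything that does not look
// like a start tag, an end tag, or an empty-element tag is skipped.
function tagsAreProperlyNested(text) {
  const tagPattern = /<(\/?)([A-Za-z][\w.-]*)\s*(\/?)>/g;
  const openNames = [];
  for (const [, endSlash, name, emptySlash] of text.matchAll(tagPattern)) {
    if (emptySlash) continue; // empty-element tag: opens and closes at once
    if (!endSlash) {
      openNames.push(name); // start tag
    } else if (openNames.pop() !== name) {
      return false; // end tag does not match the innermost open start tag
    }
  }
  return openNames.length === 0; // every start tag must have been closed
}

tagsAreProperlyNested('<section><title>T</title><br/></section>'); // true
tagsAreProperlyNested('<para><comment></para></comment>'); // false (overlap)
```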

+

Let $X$ and $Y$ be two elements. If $X$ is inside the content of $Y$ and there does not exist a third element $Z$ such that $X$ is inside the content of $Z$ and $Z$ is inside the content of $Y$, then $X$ is called a child of $Y$.

+

Let $X$, $Y$, and $Z$ be three elements. If $X$ is a child of both $Y$ and $Z$, then $Y$ and $Z$ are the same element. That element is called the parent of $X$.

+

Not all elements have a parent. An element that has no parent is called a root element.

+

Well-formedness constraint: There must exist exactly one root element.

+

Well-formedness constraint: The characters < and & must not appear literally inside character data. They must be escaped using an entity reference or a character reference.

+

Documented EVLambda source files are structured as follows:

+ +

Because the characters < and & can appear inside EVLambda source code, a documented EVLambda source file is not always a well-formed XML document.

+

Documentation Generator

+

The documentation generator converts a documented EVLambda source file to HTML in two steps:

+

Step 1 The characters < and & appearing inside EVLambda source code are escaped and some tags are added to better delimit the EVLambda source code and the comments. The resulting file is a well-formed XML document.

+

Step 2 The resulting file from step 1 is converted to HTML by an XSLT stylesheet.
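The escaping part of step 1 can be pictured with the following JavaScript sketch. It assumes that the EVLambda source code has already been separated from the markup, which is the hard part and is not shown; the function name is hypothetical and this is not the documentation generator's actual code.

```javascript
// Escapes the two characters that must not appear literally in character data.
// The ampersand is replaced first so that the ampersand introduced by "&lt;"
// is not escaped a second time.
function escapeSourceCode(sourceCode) {
  return sourceCode.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

const escaped = escapeSourceCode('(if (< a b) a b)');
// escaped is '(if (&lt; a b) a b)'
```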

+

Unicode

+

The characters contained inside listener buffers and EVLambda source files are Unicode characters. Unicode is a character set containing, as of version 17.0, $159801$ characters. Each Unicode character is uniquely identified by a nonnegative integer called its code point. Code points range from $0$ to $1114111$ in decimal and from 0 to 10FFFF in hexadecimal. (Not all code points are assigned to a character.) The Unicode character whose code point is $\hex$ in hexadecimal is denoted by U+$\hex$. The order on integers directly translates into an order on Unicode characters. With respect to that order, the Unicode character $c_1$ precedes the Unicode character $c_2$ if and only if the code point of $c_1$ is strictly less than the code point of $c_2$. That order can be used to define ranges of Unicode characters.
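The notions of code point, U+ notation, and order can be tried out in JavaScript, whose standard `String.prototype.codePointAt` method returns the code point of the character at a given position. The helper names below are hypothetical. Padding to at least four hexadecimal digits follows the usual Unicode convention.

```javascript
// Returns the U+ notation of the first character of the string.
function toUPlusNotation(character) {
  const codePoint = character.codePointAt(0);
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

// True if c1 precedes c2 in the order induced by code points.
function precedes(c1, c2) {
  return c1.codePointAt(0) < c2.codePointAt(0);
}

toUPlusNotation('A'); // 'U+0041'
toUPlusNotation('\u23CE'); // 'U+23CE' (the character ⏎)
precedes('A', 'a'); // true, because 0x41 < 0x61

// The largest code point, in hexadecimal and in decimal.
const maxCodePoint = 0x10ffff; // 1114111
```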

+

The range from $0$ to $1114111$ is divided into $17$ planes each containing $65536$ code points. The first plane is called the basic multilingual plane (BMP) and the other planes are called the supplementary planes. Most of the characters in common use in the world are located in the BMP.
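Since each plane contains $65536$ code points, the plane of a code point is obtained by integer division, as in this small JavaScript sketch (the function name is hypothetical):

```javascript
const CODE_POINTS_PER_PLANE = 65536; // 0x10000

// Returns the plane number (0 for the BMP, 1 to 16 for supplementary planes).
function planeOf(codePoint) {
  return Math.floor(codePoint / CODE_POINTS_PER_PLANE);
}

planeOf(0x0041); // 0: LATIN CAPITAL LETTER A is in the BMP
planeOf(0x1f600); // 1: a supplementary plane
planeOf(0x10ffff); // 16: the last of the 17 planes
```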

+

An encoding form is a mapping that maps a character to a sequence of $n$-bit words called code units. The following encoding forms are in common use:

+ +

An encoding scheme is a mapping that maps a character to a sequence of bytes. (A byte is an $8$-bit word.) The following encoding schemes are in common use:

+ +

Contrary to what was said in the user manual, an object of type character represents a UTF-$16$ code unit (instead of a Unicode character) and an object of type string represents an indexed sequence of UTF-$16$ code units (instead of an indexed sequence of Unicode characters).
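JavaScript strings are also indexed sequences of UTF-$16$ code units, so the distinction can be observed directly in JavaScript. The sketch below is an illustration of the general phenomenon, not a statement about how EVLambda implements its character and string types. The character U+1F600 lies outside the BMP and is therefore encoded as two code units (a surrogate pair).

```javascript
const bmpString = 'A'; // U+0041, inside the BMP
const supplementaryString = '\u{1F600}'; // U+1F600, outside the BMP

// The length of a string counts code units, not Unicode characters.
const bmpLength = bmpString.length; // 1
const supplementaryLength = supplementaryString.length; // 2

// The two code units of the surrogate pair.
const highSurrogate = supplementaryString.charCodeAt(0); // 0xD83D
const lowSurrogate = supplementaryString.charCodeAt(1); // 0xDE00

// Iterating a string with the spread syntax yields whole code points.
const characterCount = [...supplementaryString].length; // 1
```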

+

Each Unicode character has a name (“LATIN CAPITAL LETTER A” for instance) and a set of properties. An important property is the general category. The general category property can take the following values:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Lu  Uppercase_Letter       uppercase letters
Ll  Lowercase_Letter       lowercase letters
Lt  Titlecase_Letter       digraphs whose first constituent is an uppercase letter
LC  Cased_Letter           Lu, Ll, or Lt
Lm  Modifier_Letter        noncombining modifier letters
Lo  Other_Letter           letters from unicase alphabets and ideographs
L   Letter                 Lu, Ll, Lt, Lm, or Lo
Mn  Nonspacing_Mark        nonspacing combining marks (accents, …)
Mc  Spacing_Mark           spacing combining marks
Me  Enclosing_Mark         enclosing combining marks
M   Mark                   Mn, Mc, or Me
Nd  Decimal_Number         decimal digits
Nl  Letter_Number          letterlike numeric characters (Roman numerals, …)
No  Other_Number           other numeric characters (fractions, …)
N   Number                 Nd, Nl, or No
Pc  Connector_Punctuation  connecting punctuation marks (underscore, …)
Pd  Dash_Punctuation       dashlike punctuation marks (hyphen, dashes, …)
Ps  Open_Punctuation       opening punctuation marks (opening parenthesis, …)
Pe  Close_Punctuation      closing punctuation marks (closing parenthesis, …)
Pi  Initial_Punctuation    initial quotation marks
Pf  Final_Punctuation      final quotation marks
Po  Other_Punctuation      other punctuation marks (period, comma, colon, semicolon, …)
P   Punctuation            Pc, Pd, Ps, Pe, Pi, Pf, or Po
Sm  Math_Symbol            mathematical symbols
Sc  Currency_Symbol        currency symbols
Sk  Modifier_Symbol        noncombining modifier symbols
So  Other_Symbol           other symbols (Emojis, …)
S   Symbol                 Sm, Sc, Sk, or So
Zs  Space_Separator        space characters
Zl  Line_Separator         line separator character
Zp  Paragraph_Separator    paragraph separator character
Z   Separator              Zs, Zl, or Zp
Cc  Control                C0 and C1 control characters (horizontal tab, line feed, carriage return, …)
Cf  Format                 format control characters (left-to-right and right-to-left marks, …)
Cs  Surrogate              surrogate code points
Co  Private_Use            private-use characters
Cn  Unassigned             noncharacters and unassigned code points
C   Other                  Cc, Cf, Cs, Co, or Cn
+
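The general category of a character can be looked up in JavaScript with Unicode property escapes, which are part of the regular expression syntax when the `u` flag is set. The sketch below tries each two-letter value in turn; the function name is hypothetical, and the result for recently assigned characters depends on the Unicode version supported by the JavaScript engine.

```javascript
// The two-letter general category values (the grouping values such as
// L, M, N, P, S, Z, C, and LC are left out).
const GENERAL_CATEGORIES = [
  'Lu', 'Ll', 'Lt', 'Lm', 'Lo',
  'Mn', 'Mc', 'Me',
  'Nd', 'Nl', 'No',
  'Pc', 'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Po',
  'Sm', 'Sc', 'Sk', 'So',
  'Zs', 'Zl', 'Zp',
  'Cc', 'Cf', 'Cs', 'Co', 'Cn',
];

// Returns the general category of a string made of one Unicode character.
function generalCategoryOf(character) {
  for (const category of GENERAL_CATEGORIES) {
    // \p{Lu} is shorthand for \p{General_Category=Lu}.
    if (new RegExp(`^\\p{${category}}$`, 'u').test(character)) {
      return category;
    }
  }
  return undefined; // not a single Unicode character
}

generalCategoryOf('A'); // 'Lu'
generalCategoryOf('('); // 'Ps'
generalCategoryOf('+'); // 'Sm'
```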

Most Unicode characters have associated visual representations called glyphs. For instance, the Unicode character “LATIN CAPITAL LETTER A” (U+0041) has the following associated glyphs (and infinitely many more, considering all possible variations in font, size, weight, style, etc.):

+ +

In general, the association is not between Unicode characters and glyphs but between sequences of Unicode characters and glyphs and it is possible for different sequences of Unicode characters to have the same associated glyphs. For example, the sequence of one Unicode character “LATIN CAPITAL LETTER A WITH DIAERESIS” (U+00C4) and the sequence of two Unicode characters “LATIN CAPITAL LETTER A” (U+0041) “COMBINING DIAERESIS” (U+0308) have the same associated glyphs:

+
diff --git a/system-files/all-caps.css b/system-files/all-caps.css
index 6b51d3a..c25246b 100644
--- a/system-files/all-caps.css
+++ b/system-files/all-caps.css
@@ -11,14 +11,62 @@ div.preamble {
   display: none;
 }
 
+h1 {
+  counter-reset: h2counter h3counter h4counter h5counter h6counter;
+}
+
+h2 {
+  counter-increment: h2counter;
+  counter-set: h3counter h4counter h5counter h6counter;
+}
+
+h2:before {
+  content: counter(h2counter) ". ";
+}
+
+h3 {
+  counter-increment: h3counter;
+  counter-set: h4counter h5counter h6counter;
+}
+
+h3:before {
+  content: counter(h2counter) "." counter(h3counter) ". ";
+}
+
+h4 {
+  counter-increment: h4counter;
+  counter-set: h5counter h6counter;
+}
+
+h4:before {
+  content: counter(h2counter) "." counter(h3counter) "." counter(h4counter) ". ";
+}
+
+h5 {
+  counter-increment: h5counter;
+  counter-set: h6counter;
+}
+
+h5:before {
+  content: counter(h2counter) "." counter(h3counter) "." counter(h4counter) "." counter(h5counter) ". ";
+}
+
+h6 {
+  counter-increment: h6counter;
+}
+
+h6:before {
+  content: counter(h2counter) "." counter(h3counter) "." counter(h4counter) "." counter(h5counter) "." counter(h6counter) ". ";
+}
+
 .bg {
   background-color: lightgray;
 }
 
 pre.repl {
   margin-left: 2em;
-  padding: 10px;
   border-radius: 10px;
+  padding: 10px;
   width: 800px;
   background-color: lightgray;
 }
@@ -35,14 +83,26 @@ div.trace li {
   list-style-type: none;
 }
 
-table {
+span.charseq {
+  font-family: monospace;
+  line-height: 2em;
+}
+
+span.char {
+  margin: 1px;
+  border: 1px solid;
+  padding: 1px;
+}
+
+table.plain {
   border-collapse: collapse;
   border: 1px solid;
 }
 
-table th, table td {
+table.plain th, table.plain td {
   border: 1px solid;
   padding: 5px;
+  text-align: left;
 }
 
 table.ks {
@@ -57,6 +117,7 @@ table.ks th, table.ks td {
 
 table.bnf {
   font-family: monospace;
+  margin-left: 1em;
 }
 
 td.lhs, td.def, td.rhs {
diff --git a/system-files/all-caps.js b/system-files/all-caps.js
index b2f6e8c..f5a5eb1 100644
--- a/system-files/all-caps.js
+++ b/system-files/all-caps.js
@@ -69,7 +69,8 @@ window.MathJax = {
       vector: '\\mlvar{vector}',
       function: '\\mlvar{function}',
       primitivefunction: '\\mlvar{primitive-function}',
-      closure: '\\mlvar{closure}'
+      closure: '\\mlvar{closure}',
+      hex: '\\mlvar{hex}'
     }
   },
   output: {
--
2.39.5