Modula-2 Reloaded

A Modern Typesafe & Literate Programming Notation

Lexical Entities

1.1 Character Sets

By default only the printable characters of the 7-bit ASCII character set, whitespace, tabulator and newline are legal within Modula-2 source text. Unicode characters may be permitted within quoted literals and comments, subject to recognition and verification of the encoding scheme used.

1.2 Reserved Words

Reserved words are symbols that consist of a sequence of all-uppercase letters, are visible in any scope, have special meaning in the language and may not be redefined. There are 49 reserved words:

 ALIAS           DEFINITION      IF              OF              RETURN
 AND             DIV             IMPLEMENTATION  OPAQUE          SET
 ARGLIST         DO              IMPORT          OR              THEN
 ARRAY           ELSE            IN              POINTER         TO
 BEGIN           ELSIF           LOOP            PROCEDURE       TYPE
 BLUEPRINT       END             MOD             RECORD          UNTIL
 BY              EXIT            MODULE          REFERENTIAL     VAR
 CASE            FOR             NEW             RELEASE         WHILE
 CONST           FROM            NONE            REPEAT          YIELD
 COPY            GENLIB          NOT             RETAIN          

1.3 Schrödinger's Tokens

Schrödinger's tokens are symbols that may either be used as reserved words or as identifiers, depending on context. There are 32 Schrödinger's tokens:

 ABS             INSERT          STORE           TMAX            VAL
 ADDRESS         LENGTH          SUBSET          TMIN            VALUE
 APPEND          OCTET           SXF             TORDERED        WRITE
 CAST            READ            TDYN            TREFC           WRITEF
 COUNT           READNEW         TFLAGS          TSCALAR
 COROUTINE       REMOVE          TLIMIT          TSORTED
 EXISTS          SEEK            TLITERAL        UNSAFE

1.4 Special Symbols

Special symbols are symbols that consist of one, two or three non-alphanumeric quotable characters, are visible in any scope, have special meaning in the language and may not be redefined. They fall into six categories:

1.4.1 Operators
 +   -   *   /   \   =   #   <   <=   >   >=   ==   ::   &   ^

1.4.2 Punctuation
 .   ,   :   ;   |   ~   +   *   <   >   ..   :=   ++   --   ->   <>   ><   +/-

1.4.3 Grouping Delimiters
 (  )   [  ]   {  }

1.4.4 Quoted Text Delimiters
 '   "   <<  >>

1.4.5 Comment Delimiters
 !   (*  *)

1.4.6 Pragma Punctuation and Delimiters
 ?   <*  *>

1.5 Identifiers

Identifiers are names for syntactic entities in a program. They start with a letter, low-line or dollar sign, followed by any number and combination of letters, low-lines, dollar signs and digits.

The use of the low-line and dollar sign within identifiers is permitted in support of environments and platforms where they are an integral part of the naming convention, for instance when writing components for or mapping to operating system APIs that use them. However, such an identifier must also contain at least one letter or digit. A non-conformant identifier shall cause a compile time error. The definition of an identifier in a foreign API style shall cause a soft compile time warning. However, the warning may be automatically silenced when FFI of module UNSAFE is imported into the scope of the compiling module.

Examples:

 (* Modula-2 style *)  Foo, setBar, getBaz, Str80, Matrix8x4, FOOBAR
 (* Foreign API styles *)  _foo, __bar, __baz__, foo_bar_123, $foo, sys$foo, SYS$BAR
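
The identifier rules above can be sketched as a small recogniser. This is only an illustration in Python, not part of the specification: the names IDENT_SHAPE, is_identifier and is_foreign_api_style are hypothetical, and the check against reserved words and reserved identifiers is deliberately left out.

```python
import re

# Shape rule: a letter, low-line or dollar sign, followed by any number of
# letters, digits, low-lines and dollar signs (7-bit ASCII only).
IDENT_SHAPE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")


def is_identifier(text):
    """True if text has identifier shape and contains a letter or digit."""
    if IDENT_SHAPE.match(text) is None:
        return False
    # The shape rule alone would accept "_" or "$$"; the specification
    # additionally requires at least one letter or digit.
    return any(ch.isalnum() for ch in text)


def is_foreign_api_style(text):
    """True for a valid identifier that uses a low-line or dollar sign.

    Such a definition would trigger the soft compile time warning.
    """
    return is_identifier(text) and ("_" in text or "$" in text)
```

With these definitions, is_identifier accepts Foo and sys$foo but rejects a lone low-line, and is_foreign_api_style separates foo_bar_123 from setBar.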

1.5.1 Reserved Identifiers

Reserved identifiers are language defined identifiers that may not be redefined. Reserved are:

  • predefined identifiers
  • Schrödinger's tokens
  • all-uppercase identifiers of standard pseudo-modules including their module identifiers

1.5.2 User-Definable Identifiers

Identifiers that do not coincide with reserved identifiers may be defined or redefined in any scope of a program or library module.

1.6 Literals

There are three types of literals:

1.6.1 Numeric literals

Numeric literals represent a numeric compile time value. There are four types:

1.6.1.1 Decimal Number Literals

Decimal number literals represent decimal whole and real numbers. They are comprised of a mandatory integral part followed by an optional fractional part followed by an optional exponent. Integral and fractional part are separated by a decimal point. Fractional part and exponent are separated by the exponent prefix e followed by an optional sign. Integral part, fractional part and exponent are comprised of a non-empty sequence of decimal digits. Digits may be grouped using the single quote as a digit separator. A digit separator may only appear in between two digits.

Examples:

 0, 42, 12300, 32767 (* whole numbers *)
 0.0, 3.1415, 7.531e+12 (* real numbers *)
 1'234'500'000, 0.987'654'321e+99 (* with digit separators *)
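
A minimal recogniser for decimal number literals, written in Python purely as an illustration. It assumes that an exponent may only follow a fractional part, which is one reading of "fractional part and exponent are separated by the exponent prefix"; the names DIGITS, DECIMAL, is_decimal_literal and decimal_value are hypothetical.

```python
import re

# A non-empty digit sequence; a separator may only sit between two digits.
DIGITS = r"[0-9]+(?:'[0-9]+)*"

# Integral part, optional fractional part, optional exponent.
# Assumption: the exponent is only permitted after a fractional part.
DECIMAL = re.compile(rf"{DIGITS}(?:\.{DIGITS}(?:e[+-]?{DIGITS})?)?\Z")


def is_decimal_literal(text):
    """True if text is a well-formed decimal number literal."""
    return DECIMAL.match(text) is not None


def decimal_value(text):
    """Return the literal's value: int for whole numbers, float for reals."""
    if not is_decimal_literal(text):
        raise ValueError(f"malformed decimal literal: {text!r}")
    plain = text.replace("'", "")  # separators carry no meaning
    return float(plain) if "." in plain else int(plain)
```

Leading, trailing and doubled separators such as '1, 1' and 1''2 are rejected because every separator in the pattern is required to have digits on both sides.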

1.6.1.2 Base-2 Number Literals

Base-2 number literals represent whole numbers in base-2 notation. They are comprised of base-2 number prefix 0b followed by a non-empty sequence of base-2 digits. Digits may be grouped using the single quote as a digit separator. A digit separator may only appear in between two digits.

Examples:

 0b0110 (* without digit separator *)
 0b1111'0000'0101'0011 (* with digit separators *)

1.6.1.3 Base-16 Number Literals

Base-16 number literals represent whole numbers in base-16 notation. They are comprised of base-16 number prefix 0x followed by a non-empty sequence of base-16 digits. Digits may be grouped using the single quote as a digit separator. A digit separator may only appear in between two digits.

Examples:

 0x80, 0xFF, 0xCAFED00D (* without digit separator *)
 0x00'00'FF'FF, 0xDEAD'BEEF (* with digit separators *)

1.6.1.4 Character Code Literals

Character code literals represent Unicode code points in base-16 notation. They are comprised of Unicode prefix 0u followed by a non-empty sequence of base-16 digits. Digits may be grouped using the single quote as a digit separator. A digit separator may only appear in between two digits.

Examples:

 0u7F (* DEL *)
 0uA9 (* copyright *)
 0u20'AC (* Euro currency sign *)
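
The three prefixed literal forms share one shape: a prefix followed by grouped digits. The Python sketch below illustrates this. It assumes that base-16 digits are written in uppercase, as in the examples above; the text itself does not state this. The names BASE2, BASE16, CHAR_CODE and prefixed_value are hypothetical.

```python
import re


def _grouped(digit_class):
    """Pattern for a non-empty digit sequence with separators between digits."""
    return rf"{digit_class}+(?:'{digit_class}+)*"


BASE2 = re.compile(rf"0b({_grouped('[01]')})\Z")
# Assumption: only uppercase A-F count as base-16 digits.
BASE16 = re.compile(rf"0x({_grouped('[0-9A-F]')})\Z")
CHAR_CODE = re.compile(rf"0u({_grouped('[0-9A-F]')})\Z")

_KINDS = (("base-2", BASE2, 2), ("base-16", BASE16, 16), ("char-code", CHAR_CODE, 16))


def prefixed_value(text):
    """Return (kind, value) for a base-2, base-16 or character code literal."""
    for kind, pattern, base in _KINDS:
        match = pattern.match(text)
        if match:
            return kind, int(match.group(1).replace("'", ""), base)
    raise ValueError(f"not a prefixed number literal: {text!r}")
```

For a character code literal the value is the code point, so chr(prefixed_value("0u20'AC")[1]) gives the Euro currency sign.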

1.6.2 String Literals

String literals are sequences of quotable characters and optional escape sequences, enclosed in single quotes or double quotes. String literals may not contain any control code characters.

Examples:

 "it's nine o'clock"
 'he said "Modula-2" and smiled'
 "this is the end of the line\n"

1.6.3 Structured Literals

Structured literals are compound values consisting of zero or more terminal symbols, enclosed in braces. Structured literals may be nested.

Examples:

 { 1, 2, 3 }
 { "a", "b", "c" }
 { 1 .. 5 }   (* equivalent to: *)   { 1, 2, 3, 4, 5 }
 { 0 BY 5 }   (* equivalent to: *)   { 0, 0, 0, 0, 0 }

1.7 Non-Semantic Symbols

Non-semantic symbols are symbols that do not impact the meaning of a program. They may occur anywhere in a program before or after semantic symbols but not within them. There are three types:

1.7.1 Comments

Comments are ignored by a compiler; they serve annotation and documentation. There are two kinds:

1.7.1.1 Line Comments

Line comments start with a ! symbol at the first column of a line and terminate at the end of the same line. They are intended for in-source documentation, for example in combination with documentation generators.

Examples:

 ! Special documentation tags for Doxygen:
 !! @brief Modula-2 Standard Library
 !! @authors B.Kowarsch & R.Sutcliffe

1.7.1.2 Block Comments

Block comments are delimited by opening (* and closing *) comment delimiters. They are intended for annotating source code. They may span multiple lines and may be nested. To ensure portability of source code, the language imposes a nesting limit of ten levels, including the outermost comment. A compile time error shall occur if this limit is exceeded.

Examples:

 IF (* no match found *) this^.next = NIL THEN (* comment (* nested comment *) *) ...
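
A lexer can enforce the nesting limit with a simple depth counter. The following Python sketch shows one possible way to do this. The names MAX_NESTING and skip_block_comment are hypothetical, and Python's SyntaxError merely stands in for a compile time error.

```python
MAX_NESTING = 10  # language defined limit, counting the outermost comment


def skip_block_comment(source, start):
    """Return the index just past the block comment opening at source[start]."""
    if not source.startswith("(*", start):
        raise ValueError("no block comment at this position")
    depth = 0
    i = start
    while i < len(source):
        if source.startswith("(*", i):
            depth += 1
            if depth > MAX_NESTING:
                raise SyntaxError("block comment nesting limit exceeded")
            i += 2
        elif source.startswith("*)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i  # the outermost comment has just been closed
        else:
            i += 1
    raise SyntaxError("unterminated block comment")
```

Applied to the nested comment in the example above, the function returns the position directly after the final closing delimiter, so the lexer resumes at the following token.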

1.7.2 Pragmas

Pragmas are in-source compiler directives to control or influence the compilation process but they do not change the meaning of the program. They consist of a pragma body enclosed in opening <* and closing *> pragma delimiters.

A pragma body consists of a non-empty token sequence whose syntax is defined by the pragma grammar. Whitespace, tabulator and line breaks may occur between tokens within a pragma, but comments are not permitted. A comment delimiter within a pragma shall cause a compile time error. There are language defined and optional implementation defined pragmas.

Examples:

 <*ALIGN=TSIZE(LONGCARD)*> (* language defined pragma *)
 <*GM2.UnrollLoops=FALSE|WARN*> (* implementation defined pragma *)

1.7.3 Lexical Separators

Lexical separators terminate a numeric literal, identifier, reserved word or a pragma symbol. There are two kinds:

  • Whitespace
  • Control Codes

1.7.3.1 Control Codes

The following control codes may appear within Modula-2 source text but not within string literals:

  • TAB denoting horizontal tabulator code 0u9
  • LF denoting line feed code 0uA
  • CR denoting carriage return code 0uD
  • UTF8-BOM denoting code sequence { 0uEF, 0uBB, 0uBF }, permitted only at the very beginning of a file.

Any other control codes within a source file shall cause a compile time error. An unrecognised BOM shall cause a fatal compile time error. Encoding support other than ASCII and UTF8 is implementation defined.
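
The control code rules lend themselves to a pre-scan of the raw source bytes. The Python sketch below is only an illustration under stated assumptions: DEL (0u7F) is treated as a control code, the names illegal_control_codes and has_misplaced_bom are hypothetical, and neither the detection of unrecognised BOMs nor the ban on control codes inside string literals is covered.

```python
UTF8_BOM = b"\xEF\xBB\xBF"
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # TAB, LF, CR


def _is_control(code):
    # Assumption: DEL (0x7F) counts as a control code alongside 0x00-0x1F.
    return code < 0x20 or code == 0x7F


def illegal_control_codes(data):
    """Return (offset, code) pairs for control codes that are not permitted."""
    start = len(UTF8_BOM) if data.startswith(UTF8_BOM) else 0
    return [
        (offset, data[offset])
        for offset in range(start, len(data))
        if _is_control(data[offset]) and data[offset] not in ALLOWED_CONTROLS
    ]


def has_misplaced_bom(data):
    """True if a UTF-8 BOM occurs anywhere other than the start of the file."""
    return data.find(UTF8_BOM, 1) != -1
```

A compiler front end could run both checks before tokenising and report each returned offset as a compile time error.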

1.8 Reserved Symbols

Certain symbols are reserved for use by optional language facilities, language extensions and external source code processing utilities. Some are specifically reserved for future use.

1.8.1 Symbols Reserved for Optional and Future Use
 Facility                                                          Reserved Symbols
 Symbolic Inline Assembler (Optional)                              ASSEMBLER ASM REG
 Actor Based Concurrency (Phase II Deliverables)                   ACTOR PRIORITY
 Exponentiation and Dot Product Operators (Possible Future Use)    ** *.

1.8.2 Symbols Reserved for Coordinated Superset Use

A coordinated language superset is a compliant language superset for whose exclusive use certain symbols are reserved. The reserved symbols of coordinated language supersets are listed below:

 Superset              Reserved Symbols
 Objective Modula-2    Symbols: ` BYCOPY BYREF CLASS CONTINUE CRITICAL INOUT METHOD ON OPTIONAL OUT PRIVATE PROTECTED PROTOCOL PUBLIC TRY NO OBJECT YES
                       Pragmas: ACTION FRAMEWORK OUTLET QUALIFIED
 Parallel Modula-2     Symbols: ALL PARALLEL SYNC
                       Pragmas: LOCAL SPREAD CYCLE SBLOCK CBLOCK

1.8.3 Symbols Reserved for Uncoordinated Superset Use

An uncoordinated language superset is a compliant language superset for which no reserved words, identifiers or pragmas are reserved. Such a superset may define additional reserved words and predefined identifiers as long as they start with a single @ character. Implementations that target the OpenVMS operating system may define platform specific reserved words and predefined identifiers as long as they contain at least one % character.

Examples:

 @TRY @CATCH (* possible reserved words of a language superset *)
 %DESCR %IMMED (* possible reserved words of an OpenVMS specific superset *)

1.8.4 Symbols Reserved for External Source Code Processors

To assist source code processing prior to compilation, certain symbols are reserved for exclusive use by external source code processing utilities.

 Utility                          Reserved Symbols
 Modula-2 Template Engine         ## @@ <# #> // /* */
 Character Set Transliterators    /= +> (. .) (: :) (= =) ?/ ?< ?! ?- ?. ?= ?* ?: ?? ??? ?, ?,, ?> ?+ ?;

1.8.5 Other Symbols

Any special symbol not specifically reserved shall be considered either reserved for possible future use or taboo.

1.9 Lexical Parameters

1.9.1 Length of Literals

The minimum lengths of literals a conforming implementation shall support are:

  • for string literals, 160 characters
  • for character code literals, 6 digits
  • for whole number literals, 24 digits
  • for real number literals, 64 digits

The fractional part of a real number literal may be truncated. If it is truncated, a soft compile time warning shall be emitted.

If a string literal, a character code literal, a whole number literal or the significand or exponent of a real number literal is longer than an implementation is able to process, a compile time error shall occur.

1.9.2 Length of Identifiers and Pragma Symbols

The minimum lengths of identifiers and pragma symbols a conforming implementation shall support are:

  • for identifiers, 32 characters
  • for pragma symbols, 32 characters

If an identifier or a pragma symbol exceeds the maximum length supported by the implementation, it may be truncated to the maximum supported length. If it is truncated, a soft compile time warning shall occur.
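
As a rough illustration of the truncation rule, the Python sketch below returns the possibly shortened symbol together with a warning message. The name truncate_symbol and the default limit of 32, which is the specified minimum, are assumptions made for the example.

```python
def truncate_symbol(symbol, max_length=32):
    """Return (symbol, warning); warning is None if nothing was cut off."""
    if len(symbol) <= max_length:
        return symbol, None
    warning = f"'{symbol}' truncated to {max_length} characters"
    return symbol[:max_length], warning
```

An implementation that truncates would presumably also have to consider that two distinct long identifiers can become identical after truncation. The text above does not address that case.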

1.9.3 Length of Comments

An implementation that generates source code of another language may choose to preserve comments by copying them into the output. In this case, the implementation may limit the length of comments copied into the output. The minimum lengths of comments to be fully preserved that such an implementation shall support are:

  • for line comments, 250 characters
  • for block comments, 2000 characters

If a comment to be preserved exceeds the maximum length supported by the implementation, it may be truncated to the maximum supported length. If it is truncated, a soft compile time warning shall occur. If a nested block comment is truncated, an implementation shall insert all closing comment delimiters that would have been lost as a result of truncation.
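
Truncating a nested block comment without losing closing delimiters can be sketched as follows. The Python function below is hypothetical (truncate_block_comment is not a name from the specification) and assumes that its input is a complete, well-formed block comment. It picks the longest prefix that still fits once the missing closing delimiters are appended.

```python
def truncate_block_comment(comment, max_length):
    """Shorten a well-formed block comment to at most max_length characters.

    Closing delimiters lost by the cut are re-inserted so that the result
    is still a balanced block comment.
    """
    if len(comment) <= max_length:
        return comment
    depth = 0
    best = None  # (cut position, number of comments still open there)
    i = 0
    while i < len(comment):
        if comment.startswith("(*", i):
            depth += 1
            i += 2
        elif comment.startswith("*)", i):
            depth -= 1
            i += 2
        else:
            i += 1
        # i now lies between tokens, so a cut here splits no delimiter.
        # A prefix ending in "(" is skipped, because the appended "*)"
        # would otherwise fuse with it into a new opening delimiter.
        if comment[i - 1] != "(" and i + 2 * depth <= max_length:
            best = (i, depth)
    if best is None:
        raise ValueError("max_length is too small for any balanced comment")
    cut, still_open = best
    return comment[:cut] + "*)" * still_open
```

The soft compile time warning that accompanies truncation is omitted here; a caller can detect truncation by comparing the result with the input.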

1.9.4 Line and Column Counters

An implementation may limit the capacity of its internal line and column counters. The minimum values a conforming implementation shall support are:

  • for the line counter, 65000
  • for the column counter, 250

In the event that a source file being processed exceeds the supported counter limits, an implementation may either continue or abort compilation. A soft compile time warning shall occur if the implementation continues. A fatal compile time error shall be emitted if the implementation aborts.

1.9.5 Lexical Parameter Constants

Actual lexical parameters shall be provided as constants in standard library module LexParams.