Modula-2 Reloaded

A Modern Typesafe & Literate Programming Notation

Project FAQ

Why revive a long-forgotten 30-year-old programming language?

In short, it seemed more promising to modernise a thirty-year-old notation than to try to fix a contemporary one. The goal was a notation that adheres to engineering principles, which contemporary notations mostly do not.

Over the last thirty years, the world of information technology has become a very dangerous place. By exploiting vulnerabilities, determined attackers can walk right into our computers and devices to sabotage them or steal data at will and with impunity. Over the same period software has become overly complex, heavy and unreliable. All this is a result of a change in the mindset of software practitioners and their development tools. The prevailing mindset, programming languages and associated tools have been proliferating security and reliability issues for many years.

Thirty years ago, the prevailing mindset was firmly rooted in engineering principles and best practices. The motto was "An engineer measures twice and cuts once". This is in stark contrast to today's mottos "because we can, it's cool" and "ship early, fix later". Back then not everything that was possible to do was also considered worth doing. Safeguards took centre stage. Ada and Modula-2 represent this mindset like no others. Unlike Modula-2 though, Ada does not follow Einsteinian simplicity principles and it has not managed to influence the prevailing mindset. Modula-2 once educated an entire generation in best practices.

Somebody had to bring it back. We adopted this as our mission.

Who gave you permission to use the Modula-2 name?

Professor Wirth has given us permission to use the Modula-2 name for a modernisation of the language.

Is the revision backwards compatible with the classic language?

No. The language has been out of favour for so long that backwards compatibility was not a design goal. There are a number of changes that render legacy sources in classic Modula-2 incompatible with the revised language. For the migration of legacy code we intend to provide a source-to-source translator.

Wasn't there an ISO Modula-2 after Wirth's classic version? Why didn't you start out with that?

We participated in and learned from the ISO standardisation but preferred to start out with the simpler design defined by Wirth. We carefully considered each feature in view of our own design principles and only adopted those that passed our rigid scrutiny. From ISO Modula-2 we adopted the CAST function, structured values and pragma delimiters. In comparison to ISO Modula-2, our design and our standard library are simpler, but more robust and extensible.

If safety and readability are of concern, why not use Ada?

Ada is a very large language. It has a very steep learning curve and developers are hard to find. Building an Ada compiler and verifying its correctness is an immense undertaking. The only available free compiler is the GNU Ada Translator (GNAT). When software is compiled with GNAT it becomes GPL encumbered. Commercial Ada compilers are rare and very expensive. Pricing is not publicly available.

Within the context of our project's goals and design principles, the top five shortcomings of Ada are (1) user defined ADTs are not first-class, (2) overall too large and too complex, (3) too many features with rare use cases, (4) too heavy on synonym syntax, that is, different ways to express what is essentially the same construct, (5) the language report is too heavy on Ada specific terminology, which makes it very difficult to comprehend.

If you need to do systems programming, why not use C/C++?

C and C++ are inherently unsafe and they are not literate. Their widespread use for system software is a major root cause for most of the world's information security problems. These languages do not solve but proliferate security issues. To reach a future in which information systems are safe, their use must eventually be abandoned altogether.

Within the context of our project's goals and design principles, the top five shortcomings of C are (1) user defined ADTs are not first-class, (2) very poor type safety, (3) very poor readability, (4) no separation of casting and conversion, (5) type promotion.

Within the context of our project's goals and design principles, the top five shortcomings of C++ are (1) overall too large and too complex, (2) too many features with rare use cases, (3) poor type safety, (4) very poor readability, (5) type promotion.

If you like simplicity in a language, why not use Oberon?

The approach Professor Wirth has taken with Oberon is best described as custom dialect development on a per project basis, leaving out everything that is not strictly required by the project for which the dialect is custom made. This approach leads to very lean language designs but it also leads to language balkanisation and is not suitable for large industrial projects where software reuse and interfacing to existing technologies is important.

Within the context of our project's goals and design principles, the top five shortcomings of Oberon are (1) user defined ADTs are not first-class, (2) no separation of interface and implementation, (3) absence of enumeration types, (4) absence of structured literals, (5) poor readability of number literals.

What's up with those UPPERCASE words? Why didn't you change that?

Modula-2 is a descendant of Algol. When Algol was specified there was no standard yet for character encoding. All the designers could do was to define how Algol source code would be presented in print while leaving the encoding implementation dependent. The Algol-60 report presented reserved words in lowercase boldface underlined and predefined names in lowercase boldface italic. This established a de-facto standard for the publication of algorithms that is still in use today, albeit minus the underlining.

Since there was no standard way to represent boldface and italic on computer hardware at the time, Algol implementations used a technique called stropping to tag reserved words and predefined names. Some implementations used leading or trailing apostrophes, some used national currency symbols. When both uppercase and lowercase character sets became widely available, Wirth used uppercase in his Algol-W language as a form of stropping. Modula-2 follows this approach.

The benefit of uppercase stropping is two-fold. It makes reserved words and predefined names stand out even in the absence of any formatting and colouring and thereby contributes to readability of source code. Further, it allows the addition of new reserved words and predefined names when the language undergoes a revision without concern for name collision with existing legacy code. By contrast, in C, language revisions have added reserved words and predefined names with leading and trailing lowline characters.

However, the recommended presentation for Modula-2 remains that of the Algol-60 report, minus the underlining. To that end we have contributed a multi-dialect rendering plug-in for Modula-2 to the Pygments source code rendering framework. Our plug-in supports an Algol rendering mode by which reserved words are rendered in lowercase boldface and predefined names in lowercase boldface italic. We intend to provide similar plug-ins to other frameworks and for various source code editors.

In Oberon, Wirth removed type CARDINAL. Why didn't you follow that?

We have taken Professor Wirth's dislike of unsigned types into consideration but found that their removal would come at the expense of both readability and utility. Consequently we could not justify their removal. There are two important use cases which we believe deserve language support.

Many operating system and library APIs pass unsigned types back and forth. When interfacing with these APIs from Modula-2 it is necessary to pass and receive unsigned types. Furthermore, when implementing filesystems and databases, an unsigned type is required to represent the index of addressable content cells. To express all addresses in a linear fashion an unsigned type or a larger signed type is required. Alternatively, if the negative values of a signed type are to be used, the address space is no longer linear.

If Modula-2 had no unsigned type, and used a signed type of the same size instead, the replacement type would then need to be interpreted with unsigned semantics which would diminish clarity and provide opportunity for error. To read a file of a file system that uses an address space of the bit-width of the largest integer, the file's index would need to be incremented for the first half of the address space and then counter-intuitively decremented for the second half. This is ugly, confusing and not worthy of a literate programming language.
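The loss of clarity can be sketched in C, chosen here only because its fixed-width integer types make the effect easy to show. This is a minimal illustration, not code from the project; both function names are hypothetical. It shows that once a signed 32-bit index stands in for an unsigned address, the natural signed ordering no longer agrees with the ordering of the addresses.

```c
#include <stdint.h>

/* Hypothetical helper: the unsigned address that a signed 32-bit cell
   index stands for when no unsigned type is available.
   The conversion is well defined in C (reduction modulo 2^32). */
uint32_t address_of_signed_index(int32_t index) {
    return (uint32_t)index;
}

/* Returns 1 if going from index a to index b moves forward through
   the address space, 0 otherwise. */
int moves_forward(int32_t a, int32_t b) {
    return address_of_signed_index(a) < address_of_signed_index(b);
}
```

With these definitions, the most negative index denotes the first address of the upper half of the address space, so `moves_forward(INT32_MAX, INT32_MIN)` is true even though `INT32_MAX < INT32_MIN` is false. Every reader of such code has to keep that reinterpretation in mind.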

In Oberon, Wirth removed enumerations. Why didn't you follow that?

Enumerations provide type safety. A language with true enumeration types guarantees that no invalid values can be used for a variable or parameter of an enumeration type. In a language without enumeration types, or with fake enumeration types such as those of C, programmers must manually check that values are not out of range, but very few actually do so. This is a very important area of safety that should never be compromised.
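A minimal sketch in C of what a fake enumeration type permits. The type and function names are hypothetical and serve only as illustration.

```c
/* A C enumeration is only a set of named integer constants. */
enum colour { RED, GREEN, BLUE };

/* Hypothetical range check that the programmer must write by hand,
   because the compiler accepts any integer value for the type. */
int colour_is_valid(enum colour c) {
    return c == RED || c == GREEN || c == BLUE;
}

/* Accepted by a C compiler although 42 names no colour. */
enum colour make_bogus_colour(void) {
    enum colour c = 42;
    return c;
}
```

Nothing stops the value returned by `make_bogus_colour` from reaching code that assumes one of the three named values, unless every such place calls a check like `colour_is_valid`.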

Professor Wirth stated the potential of name collisions when importing an enumeration type as a reason for removal. In our revision we require that enumerated values are qualified with the type identifier, thereby solving the problem of potential name collisions on import in a different way.

In Oberon, Wirth introduced type promotion. Why didn't you follow that?

When we asked Professor Wirth about his experience with type promotion, or type inclusion as it is called in Oberon terminology, he responded: "Type inclusion turned out to have been an incredibly bad idea. I have removed it in my latest revision of Oberon."

We have been using C for too long not to know just how bad this feature really is. In general terms, implicit magic is almost always a bad thing because it can lead to misinterpretation and subsequent introduction of errors. Type promotion is one such implicit magic. In our revision we require explicit type conversion. This is so central to our design that we added a dedicated type conversion operator.
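A well-known C example of such implicit magic is a comparison between a signed and an unsigned operand, which the C standard handles through its usual arithmetic conversions. The sketch below is illustrative only and the function names are hypothetical.

```c
/* Returns 1 if a is less than b according to C's implicit rules.
   The int operand is silently converted to unsigned int before the
   comparison, so a negative a becomes a very large value. */
int implicit_less(int a, unsigned int b) {
    return a < b;
}

/* The same comparison with every conversion decision spelled out. */
int explicit_less(int a, unsigned int b) {
    if (a < 0) {
        return 1;   /* a negative value is below any unsigned value */
    }
    return (unsigned int)a < b;
}
```

`implicit_less(-1, 1u)` yields 0 because -1 is converted to the largest unsigned value, whereas `explicit_less(-1, 1u)` yields 1. Requiring the conversion to be written out forces the author to decide which meaning is intended.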

You are using C-like number literals. Why the change?

As a general rule, prefixed literals are both easier to read and less effort to parse than suffixed literals. In addition, the B and C suffixes used by classic Modula-2 were also legal base-16 digits, making such literals particularly confusing and very hard to parse with a one-character lookahead. The prefix based literal notation popularised by C is far more readable, especially when using a lowercase prefix, and its base is known to the parser before the first digit is read.
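The parsing difference can be sketched with two small classifier functions in C. This is an illustration under stated assumptions, not the project's lexer: the function names are hypothetical, the prefixes `0x` and `0b` are assumed for the prefixed form, and the suffixed form follows classic Modula-2, where `H` marks base 16 and `B` and `C` mark base 8.

```c
#include <ctype.h>
#include <stddef.h>

/* Base of a prefixed literal such as "0x1B" or "0b101": decided after
   looking at the first two characters only. Returns 16, 2 or 10. */
int base_of_prefixed(const char *s) {
    if (s[0] == '0' && s[1] == 'x') return 16;
    if (s[0] == '0' && s[1] == 'b') return 2;
    return 10;
}

/* Base of a classic suffixed literal such as "1BH" or "17B". The whole
   literal must be scanned first, because 'B' and 'C' may be either
   base-16 digits or suffixes. Returns 16, 8 or 10 and reports how many
   characters had to be examined. */
int base_of_suffixed(const char *s, size_t *chars_examined) {
    size_t n = 0;
    while (isxdigit((unsigned char)s[n]) || s[n] == 'H') {
        n++;
    }
    *chars_examined = n;
    if (n == 0) return 10;
    char last = s[n - 1];
    if (last == 'H') return 16;
    if (last == 'B' || last == 'C') return 8;
    return 10;
}
```

For `"1B"` and `"1BH"` the first two characters are identical, yet the first is an octal literal and the second a base-16 literal. The suffixed classifier therefore cannot commit to a base until it has reached the end of the literal.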

You replaced variant records with extensible records. Why the change?

First and foremost, variant record types are fragile. It is not possible to anticipate all future variants. If and when a new variant needs to be added later, the change will break all client libraries using the type, even if the client libraries are not making any use of the newly added variant. In the world of OOP this is known as the fragile base class problem.

Secondly, most implementations of variant records superimpose variant fields to save memory. Without a safeguard that only allows retrieval of the variant that has actually been stored, such an implementation is not type safe. Of all the languages that provide variant record types, Ada is probably the only one that employs such a safeguard and it is quite complex.
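The hazard and the kind of safeguard meant here can be sketched in C, where a union superimposes its members. All names below are hypothetical. In C the tag check is a convention the programmer must follow by hand; nothing in the language enforces it.

```c
/* A variant record in the C manner: both variants share storage. */
union number {
    float as_real;
    unsigned int as_cardinal;
};

/* Tag recording which variant was stored. */
enum tag { HOLDS_REAL, HOLDS_CARDINAL };

struct checked_number {
    enum tag tag;
    union number value;
};

struct checked_number make_real(float r) {
    struct checked_number n;
    n.tag = HOLDS_REAL;
    n.value.as_real = r;
    return n;
}

struct checked_number make_cardinal(unsigned int c) {
    struct checked_number n;
    n.tag = HOLDS_CARDINAL;
    n.value.as_cardinal = c;
    return n;
}

/* Checked accessor: succeeds (returns 1) only when the requested
   variant is the one that was actually stored. */
int get_cardinal(const struct checked_number *n, unsigned int *out) {
    if (n->tag != HOLDS_CARDINAL) return 0;
    *out = n->value.as_cardinal;
    return 1;
}
```

Code that reads `value.as_cardinal` directly after a real number was stored would obtain the bit pattern of the real number reinterpreted as an integer, which is exactly the unchecked retrieval described above.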

In Oberon, Professor Wirth introduced extensible record types as an alternative to variant record types. Wirth's solution solves both issues in an elegant fashion and with simplicity. Naturally, we adopted this superior approach.

You replaced the FOR TO BY with a FOR IN loop. Why the change?

The FOR TO BY loop is sufficient for iterating over an ordinal type or an indexed collection such as an array. In this day and age, data structures such as sets, trees and dictionaries take a far more prominent role than back in the day when classic Modula-2 was defined. To iterate over those data structures the FOR TO BY loop is insufficient and a FOR IN loop is required instead.

We did not want to maintain two separate FOR loops, one for ordinal types and another for collections. Instead we designed a FOR IN loop that also accommodates ordinal types and thus replaced the FOR TO BY loop altogether.

You renamed pseudo-module SYSTEM to UNSAFE. Why the change?

Professor Wirth introduced pseudo-module SYSTEM to mark the use of non-portable and unsafe language features through import from SYSTEM. However, this is a classic case of a misnomer because not every system level feature is non-portable or unsafe, and not every non-portable or unsafe feature is a system level facility. In classic Modula-2 unsafe features remained in the language without requiring import from SYSTEM. If the primary purpose of the module was to house unsafe features, then it should be named accordingly.

Using UNSAFE instead of SYSTEM also serves a psychological purpose. C practitioners are not usually aware when a type transfer is safe or unsafe because C terminology does not distinguish the two cases, referring to both as casts. This has led to a culture of living dangerously without reflection. By explicitly spelling out a dangerous practice such as UNSAFE.CAST, awareness is raised and its use is stigmatised, leading to a culture of caution.
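The two cases that C conflates can be told apart in a short C sketch. The function names are hypothetical, and the example assumes that `float` is a 32-bit IEEE 754 type. The transfer is written with `memcpy` so that the sketch has defined behaviour; in everyday C code both operations are commonly written with cast syntax.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the result is the value of x truncated toward
   zero. The bit pattern changes, the meaning is preserved. */
int32_t convert_to_int(float x) {
    return (int32_t)x;
}

/* Type transfer: the bits of x are reinterpreted unchanged.
   The meaning of the result depends on the representation of float. */
uint32_t transfer_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under the stated assumption, `convert_to_int(1.0f)` yields 1 while `transfer_bits(1.0f)` yields `0x3F800000`. Only the second operation is the kind that the revised language places behind the UNSAFE qualifier.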

You added a new compilation unit BLUEPRINT? What for and why?

Our revision supports first class user definable abstract data types. That is to say, user defined data types may bind to built-in syntax such as the NEW and RELEASE statements, the FOR IN loop, operators and predefined functions and procedures. In practice user defined data types thereby become indistinguishable from built-in types. This eliminates one of the primary causes of feature growth in the core language.

However, when library defined data types are permitted to bind to built-in syntax there is an enormous potential for violating the consistency and integrity that is generally expected when using built-in syntax. A library defined integer type that looks as if it is built-in should be required to bind to all the syntax that a built-in integer type supports, no more and no less.

We introduced blueprints as a means to enforce the integrity and consistency of bindings to built-in syntax. A user defined data type may only bind to built-in syntax if it declares conformity to a blueprint. In turn, a blueprint imposes constraints and requirements on what bindings must be provided. Our standard library includes a rich set of predefined blueprints for numeric and collection ADTs.

You permit variadic parameters, known from C. Is that type safe?

The reason why variadic functions in C are unsafe is that no formal parameters are specified for the variadic arguments, so they are effectively untyped. By contrast, in Modula-2 formal parameters are always typed and actual parameters must match formal types. We used the same principle in our design of variadic procedures in Modula-2. The facility is entirely type safe.
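The contrast can be sketched in C itself. The first function uses C's untyped variadic mechanism; the second is only an analogy for a typed argument list, not the syntax of the revised language. Both function names are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

/* C variadic function: the arguments after count have no declared
   type. The callee assumes int; passing a double instead would be
   undefined behaviour that the compiler need not diagnose. */
int sum_untyped(int count, ...) {
    va_list args;
    int total = 0;
    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

/* Typed analogue: every element is declared int, so the compiler
   checks each argument against the formal type. */
int sum_typed(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Both functions return 6 for the values 1, 2 and 3, but only for `sum_typed` can the compiler verify that the caller actually supplied integers.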

You permit an indeterminate field in records, known from C. Is that type safe?

The reason why flexible array members, as indeterminate fields are called in C, are unsafe is that C does not consider array bounds part of the type and consequently it does not do any bounds checking, neither on ordinary arrays nor on flexible array members. By contrast, in Modula-2 array bounds are always part of the type and the compiler guarantees bounds checking. We used the same principle in our design of indeterminate records. The size of the array that makes up the indeterminate field must be linked to another field in the record, called the discriminant field. When the record is allocated at runtime, the runtime system automatically stores the allocated size in the discriminant field, after which the field becomes immutable and the array is not resizable. Runtime bounds checking is then performed using the discriminant value. The facility is entirely type safe.
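The discriminant scheme described above can be imitated in C as a minimal sketch. All names are hypothetical. In C both the immutability of the discriminant and the bounds check are conventions that the programmer must uphold, whereas the text above describes them as guaranteed by the compiler and runtime system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Record with an indeterminate trailing array. The capacity field
   plays the role of the discriminant. */
struct buffer {
    size_t capacity;   /* set once at allocation, never changed after */
    int items[];       /* C99 flexible array member */
};

/* Allocates a zero-filled buffer and stores the allocated size in
   the discriminant. Returns NULL if allocation fails. */
struct buffer *buffer_new(size_t capacity) {
    struct buffer *b = calloc(1, sizeof *b + capacity * sizeof b->items[0]);
    if (b != NULL) {
        b->capacity = capacity;
    }
    return b;
}

/* Bounds-checked store: returns 1 on success and 0 if index lies
   outside the range given by the discriminant. */
int buffer_put(struct buffer *b, size_t index, int value) {
    if (index >= b->capacity) return 0;
    b->items[index] = value;
    return 1;
}
```

For a buffer created with `buffer_new(4)`, a store at index 3 succeeds and a store at index 4 is rejected. A direct write to `b->items[4]` would still compile in C, which is the gap that a compiler-enforced check closes.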

So your repo is on Bitbucket. Will you migrate to my favourite alternative?

No, thank you. We cannot afford to engage in activities that would only distract us from our core mission while contributing little if anything towards our goals. We need to stay focused on the mission. Moreover, we have contributed a multi-dialect Modula-2 plug-in to the Pygments source code rendering framework which Bitbucket is using. If we migrated away from Bitbucket, there is a chance that the new repository host would use a different rendering framework and we would have to rewrite our plug-in, which would be entirely unnecessary duplicate work.