An executable lexer of PFCS.
This is a simple lexer for the PFCS lexical grammar. Efficiency is not the primary focus for now; correctness and simplicity are. In the future, we may optimize this lexer, or we may use APT to do so.
Note that the lexical grammar for PFCS uses only a subset of the ASCII characters, all of which are Unicode characters that UTF-8 encodes in a single byte. However, some of this lexer code is written to support the possibility of future concrete syntax that includes characters that UTF-8 encodes as multiple bytes; hence the references to "Unicode characters".
The lexer consists of a collection of lexing functions. Each function takes as input a list of natural numbers, which represent the Unicode code points of the characters that remain to be lexed in the PFCS definition being lexed. Each function returns two results: the first is either an error or an ABNF tree (or list of trees) for the recognized lexeme(s); the second is the list of natural numbers that remains after the lexeme(s). Conceptually, it would be cleaner for each lexing function to return a single result that is either an error or a pair consisting of an ABNF tree (or tree list) and the remaining input. Returning two results instead makes execution more efficient, because it avoids constructing and deconstructing the pair.
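The calling convention described above can be illustrated with a minimal sketch in Python, which is not the language of the actual lexer. All names here (LexError, Tree, lex_digit, lex_digits) are hypothetical, and the decimal-digit rule is chosen only for illustration; it is not taken from the PFCS grammar. The sketch also assumes that, on error, the input is returned unchanged as the second result. Python delivers the two results as a tuple, so the efficiency argument does not carry over; only the shape of the interface is shown.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class LexError:
    """Hypothetical error value returned as the first result on failure."""
    message: str


@dataclass
class Tree:
    """Hypothetical stand-in for an ABNF tree: a rule name plus its code points."""
    rule: str
    codepoints: List[int]


def lex_digit(inp: List[int]) -> Tuple[Union[LexError, Tree], List[int]]:
    """Lex one decimal digit from a list of Unicode code points.

    Returns two results: a tree or an error, and the remaining input.
    Assumption: on error, the input is returned unchanged.
    """
    if inp and ord("0") <= inp[0] <= ord("9"):
        return Tree("digit", [inp[0]]), inp[1:]
    return LexError("expected a digit"), inp


def lex_digits(inp: List[int]) -> Tuple[List[Tree], List[int]]:
    """Lex zero or more digits, returning a list of trees and the remaining input."""
    trees: List[Tree] = []
    rest = inp
    while True:
        result, after = lex_digit(rest)
        if isinstance(result, LexError):
            # A repetition never fails: stop at the first non-digit.
            return trees, rest
        trees.append(result)
        rest = after


# Usage: lex the digits at the start of "42+x".
codepoints = [ord(c) for c in "42+x"]
trees, rest = lex_digits(codepoints)
```

After the call, `trees` holds two digit trees and `rest` holds the code points of "+x", ready to be passed to the next lexing function.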
Some of the code of this lexer is generated via the parser generation tools in the ABNF library (where "parser" in that context refers to the general idea of recognizing and structuring strings in a formal language, which also describes what PFCS lexing does). Other code is written by hand, due to limitations in the aforementioned parser generation tools, such as the efficiency of the generated code.