Lex-strings

Handling of string literals.

String literals are sequences of ASCII characters that are enclosed in "double quotes."

Verilog-2005 and SystemVerilog-2012 have some differences here, and Verilog implementations like Verilog-XL, NCVerilog, and VCS generally don't seem to follow the standard. We discuss some of the nuances here.

Line Continuations

The Verilog-2005 standard says that strings are contained on a single line, but SystemVerilog-2012 adds a line continuation sequence, \<newline>, which doesn't become part of the string. That is,

$display("Hello \
World");

is invalid in Verilog-2005, but prints "Hello World" in SystemVerilog-2012. Verilog-XL doesn't appear to implement this new syntax, but NCVerilog and VCS both do.

What counts as a newline? Let NL denote the newline character and CR denote carriage return. Both NCVerilog and VCS appear to accept only exactly \ NL:

  • They report syntax errors complaining about multi-line strings when given string literals that include \ CR NL; presumably they are translating \ CR into a plain CR and then hitting the NL, thinking it is not escaped.
  • They accept \ CR, but leave a CR character in the output, so it seems they are treating this as just an escaped CR character instead of a line continuation.
  • They accept \ NL CR, but leave a CR character in the output. So, it seems they are just matching the \ NL part as the line continuation, and then treating the CR as an ordinary character.

We will allow or prohibit line continuations based on the vl-edition-p being used. When they are allowed, we will accept only exactly \ NL, like VCS and NCVerilog.
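The rule above can be sketched in a few lines. This is a Python illustration only, not VL's ACL2 code; the function name skip_line_continuation and the boolean allow_continuation (standing in for a check on the vl-edition-p) are assumptions made for the example.

```python
NL = "\n"
CR = "\r"

def skip_line_continuation(text, pos, allow_continuation):
    """Return the index just past a line continuation at text[pos], or None.

    Only a backslash immediately followed by NL counts. Backslash CR NL is
    not recognized, and in backslash NL CR the trailing CR is left in place
    as an ordinary character.
    """
    if not allow_continuation:
        # Stand-in for an edition without line continuations (Verilog-2005).
        return None
    if text.startswith("\\" + NL, pos):
        return pos + 2
    return None
```

For instance, skip_line_continuation("Hello \\\nWorld", 6, True) returns 8, the index of the W, while the same call with a CR between the backslash and the NL returns None.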

Basic Escapes

Verilog-2005 (Section 3.6.3) could be interpreted as prohibiting raw tab characters, but experimentation with tools like Verilog-XL, NCVerilog, and VCS suggests that tab characters should be accepted in strings, so we allow them.

Strings in both Verilog-2005 and SystemVerilog-2012 can make use of the following basic escape sequences:

\n Newline
\t Tab
\\ \ character
\" " character

These sequences seem to work on Verilog-XL, NCVerilog, and VCS without any issues.
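As a minimal sketch of how these four escapes might be decoded (Python for illustration; the table BASIC_ESCAPES and the function decode_basic_escape are hypothetical names, not part of VL):

```python
# Maps the character after the backslash to the character it denotes.
BASIC_ESCAPES = {
    "n": "\n",   # newline
    "t": "\t",   # tab
    "\\": "\\",  # backslash
    '"': '"',    # double quote
}

def decode_basic_escape(text, pos):
    """Decode a basic escape whose backslash is at text[pos].

    Returns (character, index after the escape), or None if text[pos:]
    does not start with one of the four basic escapes.
    """
    if pos + 1 >= len(text) or text[pos] != "\\":
        return None
    ch = BASIC_ESCAPES.get(text[pos + 1])
    if ch is None:
        return None
    return ch, pos + 2
```

A caller that gets None back would go on to try the other escape forms described below.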

Octal Escapes

Verilog-2005 also allows for the encoding of arbitrary ASCII characters using octal escape sequences.

\ddd Character by octal code of 1-3 digits (0 <= d <= 7)

Note that 377 in octal is 255 in decimal, so a sequence such as \400 is not really a valid character code. The Verilog standard says that implementations may issue an error in such cases. In practice, none of Verilog-XL, NCVerilog, or VCS complains about \400. Even so, it seems reasonable for us to notice and fail with errors in this case.
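The range check can be illustrated with a short Python sketch. It assumes the backslash has already been consumed; read_octal_escape is a hypothetical name, and raising ValueError merely stands in for however a real lexer reports the error.

```python
OCTAL_DIGITS = "01234567"

def read_octal_escape(text, pos):
    """Read one to three octal digits starting at text[pos].

    Returns (character code, index after the digits). Raises ValueError
    if there is no octal digit or if the code exceeds 255 (octal 377).
    """
    end = pos
    while end < len(text) and end - pos < 3 and text[end] in OCTAL_DIGITS:
        end += 1
    if end == pos:
        raise ValueError("expected an octal digit")
    code = int(text[pos:end], 8)
    if code > 255:
        raise ValueError("octal escape out of range: \\" + text[pos:end])
    return code, end
```

Note that a digit such as 8 simply ends the escape: given the input 378, this sketch reads 37 (decimal 31) and leaves the 8 as an ordinary character.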

The Verilog-2005 standard explains the handling of \ddd nicely. Unfortunately, SystemVerilog-2012 has made quite a muddle of it.

In the SystemVerilog standard, they have replaced the informal description of octal digits with the more precise octal_digit production. This leads to a mess because octal_digit, used in numbers, can include X and Z digits. To work around this stupid new problem they've just caused, the standard goes on to restrict these octal_digits not to be x_digits or z_digits. They further say that an x_digit or z_digit cannot follow a \ddd sequence with fewer than 3 characters.

This means that certain sequences like \40x or \40?, which were perfectly valid in Verilog-2005, are no longer valid in SystemVerilog-2012.

In practice, none of Verilog-XL, NCVerilog, or VCS implements this restriction. However, since such sequences are probably very rare and esoteric in the first place, it seems reasonable for VL to prohibit them.
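The SystemVerilog-2012 restriction described above might be checked as follows. This is a hedged Python sketch rather than VL's implementation; sv_octal_escape_ok is a hypothetical name, and the set XZ_DIGITS assumes that x_digit covers x and X and that z_digit covers z, Z, and ?.

```python
OCTAL_DIGITS = "01234567"
XZ_DIGITS = "xXzZ?"  # assumed expansion of x_digit and z_digit

def sv_octal_escape_ok(text, pos):
    """Check a \\ddd escape whose digits start at text[pos].

    Returns False if there are no octal digits, or if fewer than three
    digits are immediately followed by an x or z digit.
    """
    end = pos
    while end < len(text) and end - pos < 3 and text[end] in OCTAL_DIGITS:
        end += 1
    ndigits = end - pos
    if ndigits == 0:
        return False
    if ndigits < 3 and end < len(text) and text[end] in XZ_DIGITS:
        return False
    return True
```

Under this sketch the digits 40 followed by x or ? are rejected, while 040 followed by x is accepted because all three digits are present.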

Additional SystemVerilog-2012 Escapes

The SystemVerilog-2012 standard introduces some new, simple escape sequences:

\v Vertical Tab
\f Form Feed
\a Bell

None of these sequences seems to be implemented on Verilog-XL, NCVerilog, or VCS. Instead, when these tools encounter sequences like \v, they seem to simply produce v, and for \x00, they simply produce x00.

We nevertheless try to follow the standard, and properly implement these escape sequences for suitable vl-edition-ps.
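A small Python sketch of edition-dependent decoding for these three escapes follows. The names SV_ESCAPES and decode_sv_escape are hypothetical, and the boolean systemverilog stands in for a check on the vl-edition-p; what should happen when the function returns None is left to the caller and is not specified here.

```python
# Character after the backslash -> character it denotes (SystemVerilog-2012).
SV_ESCAPES = {
    "v": "\x0b",  # vertical tab
    "f": "\x0c",  # form feed
    "a": "\x07",  # bell
}

def decode_sv_escape(letter, systemverilog):
    """Return the character for \\v, \\f, or \\a, or None if not applicable.

    The escapes are only recognized when systemverilog is True.
    """
    if not systemverilog:
        return None
    return SV_ESCAPES.get(letter)
```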

SystemVerilog-2012 also adds some ambiguous language (Section 5.9) stating that "Nonprinting and other special characters are preceded with a backslash." It's not clear whether this is just an informal description of what the escape tables mean, or if we're supposed to allow any non-printable character to be included in a string literal by preceding it with a backslash. But it appears (cosims/str) that other tools allow most characters to be preceded by a backslash, in which case they expand to themselves. We try to be compatible where we think this seems safe.

Hex Escapes

SystemVerilog-2012 also adds a way to specify characters by hexadecimal character codes:

\xdd Character by one or two hex digits

As with octal digits, the definition is muddled by the use of hex_digit, which leads to the possibility of x and z characters that then have to be ruled out separately.

None of Verilog-XL, NCVerilog, or VCS seems to implement hex escapes yet. Instead, sequences like \x0 simply get displayed as x0, as if the \x is being converted into an x. VL implements the standard.
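For illustration, reading a \xdd escape could look like the Python sketch below. It assumes the \x prefix has already been consumed and treats only 0-9, a-f, and A-F as digits, so x and z characters never count as hex digits; read_hex_escape is a hypothetical name and ValueError stands in for a real lexer's error reporting.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def read_hex_escape(text, pos):
    """Read one or two hex digits starting at text[pos].

    Returns (character code, index after the digits). Raises ValueError
    if text[pos] is not a hex digit.
    """
    end = pos
    while end < len(text) and end - pos < 2 and text[end] in HEX_DIGITS:
        end += 1
    if end == pos:
        raise ValueError("expected a hex digit after \\x")
    return int(text[pos:end], 16), end
```

Two hex digits can encode at most 255, so no separate range check is needed here, unlike the octal case.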

Subtopics

Vl-read-string-aux
Main loop for reading string literals.
Vl-read-octal-string-escape
Try to read a \ddd string escape.
Vl-read-string-escape-sequence
Try to read a string escape sequence.
Vl-read-hex-string-escape
Try to read a \xdd string escape.
Vl-read-string
Vl-lex-string
Lexing of string literals.