Search-engine friendly clone of the ACL2 documentation.

Ubyte16-to-utf8

UTF-8 encoding of a 16-bit Unicode code point.

Signature
(ubyte16-to-utf8 codepoint) → bytes
Arguments: codepoint — Guard (ubyte16p codepoint).
Returns: bytes — Type (ubyte8-listp bytes).

Evaluating plain string literals in Yul involves turning Unicode escapes into their UTF-8 encodings. This function performs that encoding for a single 16-bit code point.

The encoding is as follows (see, e.g., the Wikipedia page on UTF-8):

A code point between 0 and 7Fh, which consists of 7 bits abcdefg, is encoded as one byte 0abcdefg.
A code point between 80h and 7FFh, which consists of 8 to 11 bits abcdefghijk, is encoded as two bytes 110abcde 10fghijk.
A code point between 800h and FFFFh, which consists of 12 to 16 bits abcdefghijklmnop, is encoded as three bytes 1110abcd 10efghij 10klmnop.
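
The three bit layouts above can be made concrete with a small sketch. This is Python rather than ACL2, and the name utf8_bit_pattern is hypothetical (it is not part of the Yul formalization). It pads the code point to 7, 11, or 16 bits and splits the bit string into the groups named in the description.

```python
def utf8_bit_pattern(codepoint: int) -> str:
    """Return the UTF-8 bytes of a 16-bit code point as space-separated bit strings."""
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("expected a 16-bit code point")
    if codepoint <= 0x7F:
        bits = format(codepoint, "07b")    # abcdefg
        groups = ["0" + bits]
    elif codepoint <= 0x7FF:
        bits = format(codepoint, "011b")   # abcdefghijk
        groups = ["110" + bits[:5], "10" + bits[5:]]
    else:
        bits = format(codepoint, "016b")   # abcdefghijklmnop
        groups = ["1110" + bits[:4], "10" + bits[4:10], "10" + bits[10:]]
    return " ".join(groups)


print(utf8_bit_pattern(0x41))    # 01000001
print(utf8_bit_pattern(0xE9))    # 11000011 10101001
print(utf8_bit_pattern(0x20AC))  # 11100010 10000010 10101100
```

The last line corresponds to the bytes E2 82 AC, the usual UTF-8 encoding of U+20AC.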

Definitions and Theorems

Function: ubyte16-to-utf8

(defun ubyte16-to-utf8 (codepoint)
  (declare (xargs :guard (ubyte16p codepoint)))
  (let ((__function__ 'ubyte16-to-utf8))
    (declare (ignorable __function__))
    (b* ((codepoint (ubyte16-fix codepoint)))
      (cond ((<= codepoint 127) (list codepoint))
            ((<= codepoint 2047)
             (list (logior 192 (ash codepoint -6))
                   (logior 128 (logand codepoint 63))))
            ((<= codepoint 65535)
             (list (logior 224 (ash codepoint -12))
                   (logior 128 (logand (ash codepoint -6) 63))
                   (logior 128 (logand codepoint 63))))
            (t (impossible))))))
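
As a cross-check of the arithmetic in the definition above, here is a minimal Python transliteration. It is a sketch, not part of the ACL2 sources: the Python name ubyte16_to_utf8 is an illustrative stand-in, and out-of-range inputs raise an error here instead of being fixed with ubyte16-fix. The constants 127, 2047, 192, 224, 128, and 63 are taken directly from the defun; logior, logand, and ash with a negative count correspond to |, &, and >>.

```python
from typing import List


def ubyte16_to_utf8(codepoint: int) -> List[int]:
    """Python mirror of the ACL2 definition; returns a list of byte values."""
    if not 0 <= codepoint <= 65535:
        # Assumption: reject bad input rather than fixing it as ACL2 does.
        raise ValueError("expected a 16-bit code point")
    if codepoint <= 127:
        return [codepoint]
    if codepoint <= 2047:
        return [192 | (codepoint >> 6),
                128 | (codepoint & 63)]
    return [224 | (codepoint >> 12),
            128 | ((codepoint >> 6) & 63),
            128 | (codepoint & 63)]


# Compare against Python's built-in encoder for every 16-bit value.
# "surrogatepass" lets the codec encode D800h..DFFFh, which this
# function, like the ACL2 one, treats as ordinary 16-bit values.
# bytes(...) also fails if any element is outside 0..255.
for cp in range(0x10000):
    assert bytes(ubyte16_to_utf8(cp)) == chr(cp).encode("utf-8", "surrogatepass")

print([hex(b) for b in ubyte16_to_utf8(0x20AC)])  # ['0xe2', '0x82', '0xac']
```

The exhaustive loop is feasible because the domain has only 65536 elements. It is an informal test in Python and does not replace the ACL2 theorems that follow.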

Theorem: ubyte8-listp-of-ubyte16-to-utf8

(defthm ubyte8-listp-of-ubyte16-to-utf8
  (b* ((bytes (ubyte16-to-utf8 codepoint)))
    (ubyte8-listp bytes))
  :rule-classes :rewrite)

Theorem: ubyte16-to-utf8-of-ubyte16-fix-codepoint

(defthm ubyte16-to-utf8-of-ubyte16-fix-codepoint
  (equal (ubyte16-to-utf8 (ubyte16-fix codepoint))
         (ubyte16-to-utf8 codepoint)))

Theorem: ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint

(defthm ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint
  (implies (acl2::ubyte16-equiv codepoint codepoint-equiv)
           (equal (ubyte16-to-utf8 codepoint)
                  (ubyte16-to-utf8 codepoint-equiv)))
  :rule-classes :congruence)