UTF-8 encoding of a 16-bit Unicode code point.
(ubyte16-to-utf8 codepoint) → bytes
The evaluation of plain string literals in Yul involves turning Unicode escapes into their UTF-8 encodings. This function does that.
The encoding is as follows (e.g. see the Wikipedia page on UTF-8):
Function:
(defun ubyte16-to-utf8 (codepoint) (declare (xargs :guard (acl2::ubyte16p codepoint))) (let ((__function__ 'ubyte16-to-utf8)) (declare (ignorable __function__)) (b* ((codepoint (acl2::ubyte16-fix codepoint))) (cond ((<= codepoint 127) (list codepoint)) ((<= codepoint 2047) (list (logior 192 (ash codepoint -6)) (logior 128 (logand codepoint 63)))) ((<= codepoint 65535) (list (logior 224 (ash codepoint -12)) (logior 128 (logand (ash codepoint -6) 63)) (logior 128 (logand codepoint 63)))) (t (acl2::impossible))))))
Theorem:
(defthm ubyte8-listp-of-ubyte16-to-utf8 (b* ((bytes (ubyte16-to-utf8 codepoint))) (acl2::ubyte8-listp bytes)) :rule-classes :rewrite)
Theorem:
(defthm ubyte16-to-utf8-of-ubyte16-fix-codepoint (equal (ubyte16-to-utf8 (acl2::ubyte16-fix codepoint)) (ubyte16-to-utf8 codepoint)))
Theorem:
(defthm ubyte16-to-utf8-ubyte16-equiv-congruence-on-codepoint (implies (acl2::ubyte16-equiv codepoint codepoint-equiv) (equal (ubyte16-to-utf8 codepoint) (ubyte16-to-utf8 codepoint-equiv))) :rule-classes :congruence)