Parse the RLP encoding of a tree, returning any extra bytes for further parsing.
(rlp-parse-tree encoding) → (mv error? tree rest)
This function returns an error result (
This parser is fairly straightforward,
but there is a slight subtlety missed by some implementations,
including the reference decoding code in [Wiki:RLP]:
in order to recognize only valid encodings
(as formalized by rlp-tree-encoding-p),
a decoder must reject encodings that use ``non-optimal'' lengths.
For instance, the singleton byte array
So our decoding code checks that
(i) the form
Neither [YP:B] nor [Wiki:RLP] explicitly talk about this, but the fact that [YP:B] prescribes the encodings in all cases can be reasonably taken to exclude the encodings with non-optimal lengths. Furthermore, various GitHub issues for Ethereum implementations regard the acceptance of encodings with non-optimal lengths as bugs. Thus, we take the strict interpretation and regard such encodings as invalid. (Otherwise, formally, encoding would have to be a relation instead of a function.)
The function to parse single trees is mutually recursive with a function to parse lists of trees. The latter does not return the remaining bytes: it is always called with the exact byte sequence that is supposed to encode the trees of the list, one tree after the other.
The termination of these parsing function is proved via a lexicographic measure. The first component is the length of the input; the second component is an ordering on the two parsing functions where the one for trees is smaller than the one for lists of trees. This second component is needed because the function for lists of trees calls the one for trees on the same input (to parse the first tree in the list), and so the ordering on the function makes the overall measure smaller. When the function for trees calls the one for lists of trees, the input has been reduced in length by at least the first byte, and so it is immediate to prove that the overall measure decreases. When the function for lists of trees calls itself recursively, the input has also decreased, but this is a property of the function for trees, which cannot be proved until the function is admitted, i.e. shown to terminate. In other words, the termination here depends on functional properties. Thus, we use mbt to ensure that the length has in fact decreased, which lets us prove termination. Then guard verification ensures that that check is always satisfied. Under the negated mbt condition, we must return some error, but it does not matter which error, since this case never happens.
Before verifying guards, we need to show that rlp-parse-tree returns fewer remaining bytes than the input bytes when there is no error. This is done by the linear rules.
If a sequence of bytes is successfully parsed,
a sequence obtained by adding extra bytes at the end
is also successfully parsed.
The decoded tree is the same.
The remaining bytes are extended with the extra bytes.
This is all expressed by the theorem
rlp-parse-tree is left inverse of rlp-encode-tree,
over the encodable trees.
This implies that rlp-parse-tree accepts
all valid encodings of trees.
This is expressed by the theorem
rlp-parse-tree is right inverse of rlp-encode-tree,
over the valid tree encodings.
This implies that rlp-parse-tree accepts
only valid encodings of tree:
if it accepted an invalid encoding,
this right inverse theorem would imply that
rlp-encode-tree would map the result
back to that invalid encoding,
which would therefore be a valid encoding,
contradicting the initial assumption.
This right inverse theorem is
Without the extra checks for optimal lengths in the parser, the left inverse theorem would still be provable, but the right inverse theorem would not.
Function:
(defun rlp-parse-tree (encoding) (declare (xargs :guard (byte-listp encoding))) (b* ((encoding (byte-list-fix encoding)) (irrelevant (rlp-tree-leaf nil)) ((when (endp encoding)) (mv (rlp-error-no-bytes) irrelevant nil)) ((cons first encoding) encoding) ((when (< first 128)) (mv nil (rlp-tree-leaf (list first)) encoding)) ((when (<= first 183)) (b* ((len (- first 128)) ((when (< (len encoding) len)) (mv (rlp-error-fewer-bytes-than-short-length (list first) len (len encoding)) irrelevant nil)) (bytes (take len encoding)) ((when (and (= len 1) (< (car bytes) 128))) (mv (rlp-error-non-optimal-short-length (list first (car bytes))) irrelevant nil)) (encoding (nthcdr len encoding))) (mv nil (rlp-tree-leaf bytes) encoding))) ((when (< first 192)) (b* ((lenlen (- first 183)) ((when (< (len encoding) lenlen)) (mv (rlp-error-fewer-bytes-than-length-of-length (list first) lenlen (len encoding)) irrelevant nil)) (len-bytes (take lenlen encoding)) ((unless (equal (trim-bendian* len-bytes) len-bytes)) (mv (rlp-error-leading-zeros-in-long-length (cons first len-bytes)) irrelevant nil)) (encoding (nthcdr lenlen encoding)) (len (bebytes=>nat len-bytes)) ((when (<= len 55)) (mv (rlp-error-non-optimal-long-length (cons first len-bytes)) irrelevant nil)) ((when (< (len encoding) len)) (mv (rlp-error-fewer-bytes-than-long-length (cons first len-bytes) len (len encoding)) irrelevant nil)) (bytes (take len encoding)) (encoding (nthcdr len encoding))) (mv nil (rlp-tree-leaf bytes) encoding))) ((when (<= first 247)) (b* ((len (- first 192)) ((when (< (len encoding) len)) (mv (rlp-error-fewer-bytes-than-short-length (list first) len (len encoding)) irrelevant nil)) (subencoding (take len encoding)) (encoding (nthcdr len encoding)) ((mv error? subtrees) (rlp-parse-tree-list subencoding)) ((when error?) (mv (rlp-error-subtree error?) irrelevant nil))) (mv nil (rlp-tree-branch subtrees) encoding))) (lenlen (- first 247)) ((when (< (len encoding) lenlen)) (mv (rlp-error-fewer-bytes-than-length-of-length (list first) lenlen (len encoding)) irrelevant nil)) (len-bytes (take lenlen encoding)) ((unless (equal (trim-bendian* len-bytes) len-bytes)) (mv (rlp-error-leading-zeros-in-long-length (cons first len-bytes)) irrelevant nil)) (encoding (nthcdr lenlen encoding)) (len (bebytes=>nat len-bytes)) ((when (<= len 55)) (mv (rlp-error-non-optimal-long-length (cons first len-bytes)) irrelevant nil)) ((when (< (len encoding) len)) (mv (rlp-error-fewer-bytes-than-long-length (cons first len-bytes) len (len encoding)) irrelevant nil)) (subencoding (take len encoding)) (encoding (nthcdr len encoding)) ((mv error? subtrees) (rlp-parse-tree-list subencoding)) ((when error?) (mv (rlp-error-subtree error?) irrelevant nil))) (mv nil (rlp-tree-branch subtrees) encoding)))
Function:
(defun rlp-parse-tree-list (encoding) (declare (xargs :guard (byte-listp encoding))) (b* (((when (endp encoding)) (mv nil nil)) ((mv error? tree encoding1) (rlp-parse-tree encoding)) ((when error?) (mv error? nil)) ((unless (mbt (< (len encoding1) (len encoding)))) (mv (rlp-error-no-bytes) nil)) ((mv error? trees) (rlp-parse-tree-list encoding1)) ((when error?) (mv error? nil))) (mv nil (cons tree trees))))
Theorem:
(defthm return-type-of-rlp-parse-tree.error? (b* (((mv ?error? ?tree common-lisp::?rest) (rlp-parse-tree encoding))) (maybe-rlp-error-p error?)) :rule-classes :rewrite)
Theorem:
(defthm return-type-of-rlp-parse-tree.tree (b* (((mv ?error? ?tree common-lisp::?rest) (rlp-parse-tree encoding))) (rlp-treep tree)) :rule-classes :rewrite)
Theorem:
(defthm return-type-of-rlp-parse-tree.rest (b* (((mv ?error? ?tree common-lisp::?rest) (rlp-parse-tree encoding))) (byte-listp rest)) :rule-classes :rewrite)
Theorem:
(defthm return-type-of-rlp-parse-tree-list.error? (b* (((mv ?error? ?trees) (rlp-parse-tree-list encoding))) (maybe-rlp-error-p error?)) :rule-classes :rewrite)
Theorem:
(defthm return-type-of-rlp-parse-tree-list.trees (b* (((mv ?error? ?trees) (rlp-parse-tree-list encoding))) (rlp-tree-listp trees)) :rule-classes :rewrite)
Theorem:
(defthm len-of-rlp-parse-tree (<= (len (mv-nth 2 (rlp-parse-tree encoding))) (len encoding)) :rule-classes :linear)
Theorem:
(defthm len-of-rlp-parse-tree-when-no-error (implies (not (mv-nth 0 (rlp-parse-tree encoding))) (< (len (mv-nth 2 (rlp-parse-tree encoding))) (len encoding))))
Theorem:
(defthm rlp-parse-tree-of-byte-list-fix-encoding (equal (rlp-parse-tree (byte-list-fix encoding)) (rlp-parse-tree encoding)))
Theorem:
(defthm rlp-parse-tree-byte-list-equiv-congruence-on-encoding (implies (byte-list-equiv encoding encoding-equiv) (equal (rlp-parse-tree encoding) (rlp-parse-tree encoding-equiv))) :rule-classes :congruence)
Theorem:
(defthm rlp-parse-tree-extend (implies (not (mv-nth 0 (rlp-parse-tree encoding))) (and (not (mv-nth 0 (rlp-parse-tree (append encoding more)))) (equal (mv-nth 1 (rlp-parse-tree (append encoding more))) (mv-nth 1 (rlp-parse-tree encoding))) (equal (mv-nth 2 (rlp-parse-tree (append encoding more))) (append (mv-nth 2 (rlp-parse-tree encoding)) (byte-list-fix more))))))
Theorem:
(defthm rlp-parse-tree-of-rlp-encode-tree (b* (((mv e-error? encoding) (rlp-encode-tree tree)) ((mv d-error? tree1 rest) (rlp-parse-tree encoding))) (implies (not e-error?) (and (not d-error?) (not (consp rest)) (equal tree1 (rlp-tree-fix tree))))))
Theorem:
(defthm rlp-parse-tree-list-of-rlp-encode-tree-list (b* (((mv e-error? encoding) (rlp-encode-tree-list trees)) ((mv d-error? trees1) (rlp-parse-tree-list encoding))) (implies (not e-error?) (and (not d-error?) (equal trees1 (rlp-tree-list-fix trees))))))
Theorem:
(defthm rlp-encode-tree-of-rlp-parse-tree (b* (((mv d-error? tree rest) (rlp-parse-tree encoding)) ((mv e-error? encoding1) (rlp-encode-tree tree))) (implies (not d-error?) (and (not e-error?) (equal (append encoding1 rest) (byte-list-fix encoding))))))
Theorem:
(defthm rlp-encode-tree-list-of-rlp-parse-tree-list (b* (((mv d-error? trees) (rlp-parse-tree-list encoding)) ((mv e-error? encoding1) (rlp-encode-tree-list trees))) (implies (not d-error?) (and (not e-error?) (equal encoding1 (byte-list-fix encoding))))))