Representation of a single extended character.
Historically, a vl-echar was laid out in memory as a chain of conses:
vl-echar ::= (char . (:vl-location . (filename . (line . col))))
Assume we need no extra overhead to represent the filename, line, or column. This is fair: typically, huge sets of echars all point to the same filename string, so the filename costs no extra memory per echar. Furthermore, the line and column numbers are always fixnums in practice, i.e., immediates that take no extra space. Then the memory required for each echar is:
4 conses * 128 bits per cons = 512 bits = 64 bytes
But since echars generally go in a list, we usually also need one extra cons per character to link it to the rest of the list, so the total cost of the echars alone is really more like 80 bytes per character. In short, reading a file with N bytes requires about 80N bytes of memory, so processing a 100 MB Verilog file needs 8 GB of space! (It's actually worse than this, because that is just the cost of reading the characters in the first place. After that we have to preprocess them, which is basically an echarlist-to-echarlist transformation. Preprocessing doesn't need to deeply copy the echars themselves, but it still copies the list itself, which costs another 16 bytes per character. So we're up to 9.6 GB for a 100 MB file before we reach a point where we can garbage collect.)
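To make the arithmetic above concrete, here is a small Common Lisp sketch that builds the historical layout, counts its conses, and multiplies by an assumed 16 bytes per cons (two 64-bit words, as in 64-bit CCL). It is not part of VL; the helper names make-old-echar, count-conses, old-echar-bytes, and old-total-bytes are hypothetical, and like the text it treats filenames, fixnums, and characters as free.

```lisp
;; Sketch only: hypothetical helpers reproducing the memory estimate above.
;; ASSUMPTION: one cons costs 16 bytes, as in 64-bit CCL; filenames,
;; line/column fixnums, keywords, and characters are treated as free.
(defconstant +cons-bytes+ 16)

(defun make-old-echar (char filename line col)
  "Build the historical layout (char . (:vl-location . (filename . (line . col))))."
  (cons char (cons :vl-location (cons filename (cons line col)))))

(defun count-conses (x)
  "Count the cons cells in the tree X; atoms count as 0."
  (if (consp x)
      (+ 1 (count-conses (car x)) (count-conses (cdr x)))
      0))

(defun old-echar-bytes (&key (in-list t) (copied nil))
  "Bytes per character: the echar's own conses, plus one list-spine cons
when IN-LIST, plus one more when preprocessing has COPIED the list."
  (* +cons-bytes+
     (+ (count-conses (make-old-echar #\a "foo.v" 1 0)) ; 4 conses
        (if in-list 1 0)
        (if copied 1 0))))

(defun old-total-bytes (n-chars &key (copied nil))
  "Estimated bytes needed to hold N-CHARS characters as an echar list."
  (* n-chars (old-echar-bytes :copied copied)))
```

For a 100 MB file, (old-total-bytes 100000000) gives 8000000000 bytes, and passing :copied t gives 9600000000, matching the 8 GB and 9.6 GB figures.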
To reduce this overhead, we now use a more efficient encoding scheme.
We will use a simple encoding that allows us to represent almost any practical echar as a single cons of an immediate onto a filename pointer. We will assume we can represent any unsigned 60-bit number as a fixnum, which is true in 64-bit CCL. This seems like plenty of space. We divide it up, rather arbitrarily, into fixed-width fields for the character code, the line number, and the column number.
It is hard to imagine hitting these limits in practice, but as a fallback, any character whose location falls outside them is simply represented as a cons structure with line, column, and character code components. This is no worse than our former representation, and it means the interface for constructing echars can stay simple and free of bounds checks.
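The following Common Lisp sketch illustrates the packed encoding and its fallback. It is not VL's implementation: the field widths (8 bits of character code, 20 bits of column, 32 bits of line, 60 bits in all) are assumptions chosen only for illustration, the (line col code) list used for the fallback is a hypothetical shape, and make-echar-sketch, echar-code, echar-line, and echar-col are invented names. Because Common Lisp character codes may exceed 255, the sketch also falls back when the code does not fit.

```lisp
;; Sketch of packing (char-code, line, col) into one non-negative fixnum.
;; ASSUMPTION: these field widths are illustrative, not VL's actual split.
(defconstant +code-bits+ 8)
(defconstant +col-bits+ 20)
(defconstant +line-bits+ 32)          ; 8 + 20 + 32 = 60 bits total

(defun fits-p (code line col)
  "True when each (non-negative) field fits in its assumed width."
  (and (< code (expt 2 +code-bits+))
       (< col (expt 2 +col-bits+))
       (< line (expt 2 +line-bits+))))

(defun make-echar-sketch (filename char line col)
  "Hypothetical constructor: cons FILENAME onto a packed integer when the
fields fit, otherwise onto a (line col code) list as a fallback."
  (let ((code (char-code char)))
    (cons filename
          (if (fits-p code line col)
              (logior code
                      (ash col +code-bits+)
                      (ash line (+ +code-bits+ +col-bits+)))
              (list line col code)))))

(defun echar-code (x)
  (let ((pack (cdr x)))
    (if (integerp pack) (ldb (byte +code-bits+ 0) pack) (third pack))))

(defun echar-col (x)
  (let ((pack (cdr x)))
    (if (integerp pack) (ldb (byte +col-bits+ +code-bits+) pack) (second pack))))

(defun echar-line (x)
  (let ((pack (cdr x)))
    (if (integerp pack)
        (ldb (byte +line-bits+ (+ +code-bits+ +col-bits+)) pack)
        (first pack))))
```

In the common case the filename string is shared, so a packed echar costs a single cons rather than the four conses of the historical layout, and callers never need to know which representation they got.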
Function:
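(defun vl-echar-p (x)
  (declare (xargs :guard t))
  (let ((__function__ 'vl-echar-p))
    (declare (ignorable __function__))
    ;; An echar is a single cons: a filename string in the car and an
    ;; encoded "pack" (recognized by vl-echarpack-p) in the cdr.
    (and (std::prod-consp x)
         (b* ((filename (std::prod-car x))
              (pack (std::prod-cdr x)))
           (and (stringp filename)
                (vl-echarpack-p pack))))))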
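For readers less familiar with ACL2's std::defprod machinery, here is a plain Common Lisp analogue of the recognizer above. It is a hypothetical sketch: std::prod-consp, std::prod-car, and std::prod-cdr are modeled simply as consp, car, and cdr, the __function__ binding is dropped, and pack-ok-p is an invented stand-in for vl-echarpack-p that accepts either a natural number below 2^60 or a proper three-element list of naturals. It does not reproduce VL's actual vl-echarpack-p check.

```lisp
;; Hypothetical stand-in for vl-echarpack-p (ASSUMPTION, not VL's definition):
;; accept a 60-bit natural, or a proper 3-element list of naturals.
(defun pack-ok-p (pack)
  (cond ((integerp pack)
         (and (>= pack 0) (< pack (expt 2 60))))
        ;; Check for a proper 3-element list before walking it.
        ((and (consp pack) (consp (cdr pack)) (consp (cddr pack))
              (null (cdddr pack)))
         (every (lambda (n) (and (integerp n) (>= n 0))) pack))
        (t nil)))

;; Same shape as vl-echar-p: a cons of a filename string onto a valid pack.
(defun echar-sketch-p (x)
  (and (consp x)
       (stringp (car x))
       (pack-ok-p (cdr x))))
```

As in the ACL2 version, the recognizer only checks the outer cons, that the car is a string, and that the cdr passes the pack check.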