Search-engine friendly clone of the ACL2 documentation.

X86isa

Tlb

A translation lookaside buffer for the x86isa model.

Why?

Walking the page tables to translate virtual addresses to physical addresses is a relatively expensive operation, both in hardware and software. Real processors use various address translation caches to avoid page table walks and speed up memory access. These caches are not completely transparent to the software running on the processor, requiring software to explicitly invalidate cache entries, so the semantics of such caches is codified in the ISA.

The x86 ISA supports a number of address translation caches including TLBs. The ISA leaves many details of the operation of such caches underspecified to allow for flexibility in implementation. For example, in hardware, all caches must be fixed size. A processor with a small cache will likely invalidate cache entries to make room for new ones more often than a processor with a large cache. Thus, the ISA says any address translation cache entry may be invalidated at any time.

How closely does this model the caching behavior allowed by the ISA?

Modeling the complete semantics of address translation caches is difficult due to how much behavior is underspecified. Doing so in a manner that allows us to use these caches to improve performance in execution, which was our primary goal when implementing the TLB, is even more difficult. Thus, we choose to instead model a particular implementation of an address translation caching system that is consistent with the ISA, but is not necessarily consistent with all valid address translation caching systems.

Some weirdness with our design

While we call our address translation cache system a TLB, it is important to note that, as far as the ISA is concerned, it is actually many independent TLBs. Why? A TLB as described in Intel's SDM (Volume 3 Chapter 4.10.2), assuming all pages are 4k in size (for this discussion; the general problem holds even for large pages, but with some added complexities) maps virtual page numbers to a summary of the information in the paging structures that govern that virtual page's translation. Thus, a given virtual page may only have one translation in cache.

However, what we describe as our TLB maps virtual page numbers, various processor control bits, whether it was an implicit supervisor access, and the type of access (read/write/execute), to a physical page number. This allows for a single virtual page number to have multiple cached translations.

Is our weird TLB design allowed by the ISA?

It turns out, yes. Volume 3 Chapter 4.10.2.2 of Intel's SDM includes the following parenthetical "(TLB entries may contain other information as well. A processor may implement multiple TLBs, and some of these may be for special purposes, e.g., only for instruction fetches. Such special-purpose TLBs may not contain some of this information if it is not necessary. For example, a TLB used only for instruction fetches need not contain information about the R/W and dirty flags.)" This allows us to explain the behavior of our TLB by claiming that it is in fact multiple TLBs, one for however many entries a single virtual page can have in our TLB.

Why not a more "normal" TLB?

One may ask why the TLB was implemented this odd way. The address translation system (see ia32e-la-to-pa and friends) was initially implemented without a TLB. The way it worked was by walking the page tables and at each level determine whether the current level allowed or disallowed the access. If it was allowed it would continue to the next level, or, if it was at the last level, allow the translation. Otherwise, it would stop the page table walk early and set the fault field of the model to indicate a page-fault.

Changing the address translation system to instead walk the page tables and return a summary of the information in the page tables about the page an address is in and then determine whether the access is allowed after the page table walk is compelete, which would allow us to cache page table information in a TLB, would require making many updates to the lemma library. To avoid having to make significant changes to many of these proofs, we chose instead to use this odd implementation of a TLB which was easier to implement by modifying the old code and required less substantial changes to the proofs in the lemma library.

Looking up values quickly

The cache is pretty useless if we can't lookup entries quickly. We use a ACL2::fast-alist, which allows for fast lookups while reasoning about the TLB as if it is an alist. Additionally, to allow for fast lookup in a hash table, for the purposes of address translation caching we always treat pages as being 4k wide, which is allowed by the ISA as long as we invalidate all entries associated with a large page when the software requests to invalidate any one of them. To maintain this, we always invalidate the entire TLB when we wish to invalidate any part of it, even if that is not strictly necessary, like when the software issues an INVLPG instruction, which is allowed by the ISA since we can invalidate any entry at any time.

Subtopics

Tlb-key: A key for our TLB implementation.
Tlbp: Recongizer for the tlb field of the x86 state.
Tlb-entryp: Recognizer for TLB entries.