A TLB (Translation Lookaside Buffer) is a cache that a CPU uses to speed up the translation of logical addresses (sometimes called virtual addresses) into physical addresses.

Real TLBs

Most MMUs (memory management units) divide the logical address space of a CPU into pages. For example, Intel x86 processors have a 32-bit logical address space in which each page is 4096 (2^12) bytes long. This means that the low 12 bits of any address are the offset within a page, while the upper 20 bits specify which page the address refers to. Most modern CPUs work the same way, though the page size differs (and is sometimes variable).
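The split can be written as a shift and a mask. Below is a minimal C++ sketch that assumes 4096-byte pages. The constant and function names are made up for the example.

```cpp
#include <cstdint>

// Sketch: split a 32-bit logical address into page index and page offset,
// assuming 4096-byte (2^12) pages as on 32-bit x86. Names are illustrative.
constexpr unsigned kPageShift = 12;                          // log2(4096)
constexpr std::uint32_t kPageMask = (1u << kPageShift) - 1;  // 0x00000FFF

constexpr std::uint32_t page_index(std::uint32_t logical) {
    return logical >> kPageShift;   // upper 20 bits select the page
}

constexpr std::uint32_t page_offset(std::uint32_t logical) {
    return logical & kPageMask;     // lower 12 bits select the byte within it
}

// Example: 0x12345678 is byte 0x678 of page 0x12345.
static_assert(page_index(0x12345678u) == 0x12345u, "page index");
static_assert(page_offset(0x12345678u) == 0x678u, "page offset");
```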

When the CPU references an address, the MMU must translate it from logical to physical. It does this by extracting the page index from the upper bits of the address, looking that index up in a table, and replacing those upper bits with bits from the table entry. The problem is that these tables aren't small. In the Intel x86 case above there are just over 1 million (2^20) pages, and each table entry on the 386 requires 4 bytes, so a flat table would take 4 MB just to describe a 32-bit address space. That may not seem like much now, but keep in mind that Intel's MMU was introduced back in the days of the 386, when you'd be lucky if your entire system had 8 MB of RAM.
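The table-size arithmetic can be checked at compile time. In this C++ sketch the entry size is a parameter, because it depends on the entry format. The 386's original format uses 4-byte entries, and later 64-bit formats use 8-byte entries. The names are illustrative only.

```cpp
#include <cstdint>

// Back-of-the-envelope size of a flat (single-level) page table covering a
// 32-bit address space with 4 KB pages. Names are illustrative.
constexpr std::uint64_t kAddressSpace = 1ull << 32;                 // 4 GiB
constexpr std::uint64_t kPageSize     = 1ull << 12;                 // 4 KiB
constexpr std::uint64_t kNumPages     = kAddressSpace / kPageSize;  // 2^20

constexpr std::uint64_t flat_table_bytes(std::uint64_t entry_size) {
    return kNumPages * entry_size;
}

static_assert(kNumPages == 1048576ull, "just over one million pages");
static_assert(flat_table_bytes(4) == (4ull << 20), "4 MiB with 4-byte entries");
static_assert(flat_table_bytes(8) == (8ull << 20), "8 MiB with 8-byte entries");
```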

To alleviate this problem, the tables are often arranged hierarchically in RAM. The details aren't necessary here, except to understand that looking up a single logical-to-physical translation may require several costly accesses to RAM. This is why the CPU maintains a TLB. It caches the most recently used table entries on the chip itself, so that most translations can be performed without walking the tables in RAM.
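Here is a hedged C++ sketch of a two-level lookup in the style of 32-bit x86, with a tiny TLB in front of it. Addresses use a 10-bit directory index, a 10-bit table index, and a 12-bit offset. RAM is modelled as a map, and permissions and accessed/dirty bits are ignored. All names are hypothetical. The `ram_reads` counter shows that a TLB hit skips the table accesses entirely.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Sketch: two-level page walk (x86-style 10/10/12 split) fronted by a tiny
// TLB caching page number -> frame base. Bit 0 of an entry means "present".
struct Machine {
    std::unordered_map<std::uint32_t, std::uint32_t> ram;  // physical address -> 32-bit word
    std::uint32_t page_dir_base = 0;                       // plays the role of CR3
    std::unordered_map<std::uint32_t, std::uint32_t> tlb;  // page number -> frame base
    int ram_reads = 0;                                     // counts page-table accesses

    std::uint32_t read_ram(std::uint32_t addr) {
        ++ram_reads;
        auto it = ram.find(addr);
        return it == ram.end() ? 0 : it->second;
    }

    std::optional<std::uint32_t> translate(std::uint32_t logical) {
        std::uint32_t page = logical >> 12;
        std::uint32_t offset = logical & 0xFFF;
        if (auto hit = tlb.find(page); hit != tlb.end())
            return hit->second | offset;                   // TLB hit: no RAM access

        // TLB miss: read the directory entry, then the table entry.
        std::uint32_t pde = read_ram(page_dir_base + (logical >> 22) * 4);
        if (!(pde & 1)) return std::nullopt;               // directory entry not present
        std::uint32_t pte = read_ram((pde & ~0xFFFu) + ((logical >> 12) & 0x3FF) * 4);
        if (!(pte & 1)) return std::nullopt;               // table entry not present

        std::uint32_t frame = pte & ~0xFFFu;
        tlb[page] = frame;                                 // cache for next time
        return frame | offset;
    }
};
```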

As a side note, some CPU architectures implement a "software" TLB. This means that when a logical page is not found in the TLB, the chip simply raises an exception and expects the operating system to perform the table lookup and populate the TLB itself. The MIPS architecture, in typical RISC fashion, uses a software TLB, while the Intel x86 spends the extra transistors needed to do the lookups in hardware. The PowerPC architecture has examples of both, depending on the exact CPU model.
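The division of labour in a software-managed TLB can be sketched as follows. This C++ model is only loosely inspired by MIPS. The four-entry size, round-robin replacement, flat page table, and all names are simplifications chosen for illustration, not any real CPU's behaviour.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Sketch of a software-managed TLB. The "hardware" only searches its entries;
// on a miss it raises an exception and an OS handler refills an entry.
struct TlbEntry { bool valid = false; std::uint32_t page = 0, frame = 0; };

struct SoftTlbCpu {
    std::array<TlbEntry, 4> tlb{};                                // tiny TLB
    unsigned next_victim = 0;                                     // round-robin slot
    std::unordered_map<std::uint32_t, std::uint32_t> page_table;  // OS-owned: page -> frame number
    int refill_exceptions = 0;

    // "Hardware": search only, never walk tables.
    std::optional<std::uint32_t> hw_lookup(std::uint32_t page) const {
        for (const auto& e : tlb)
            if (e.valid && e.page == page) return e.frame;
        return std::nullopt;
    }

    // "OS" refill handler: consult the page table and write a TLB entry.
    bool os_tlb_refill(std::uint32_t page) {
        ++refill_exceptions;
        auto it = page_table.find(page);
        if (it == page_table.end()) return false;                 // genuine page fault
        tlb[next_victim] = {true, page, it->second};
        next_victim = (next_victim + 1) % static_cast<unsigned>(tlb.size());
        return true;
    }

    std::optional<std::uint32_t> translate(std::uint32_t logical) {
        std::uint32_t page = logical >> 12;
        auto frame = hw_lookup(page);
        if (!frame) {                                             // TLB miss exception
            if (!os_tlb_refill(page)) return std::nullopt;
            frame = hw_lookup(page);                              // retry the access
        }
        return (*frame << 12) | (logical & 0xFFF);
    }
};
```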

As with any caching mechanism, the TLB must be kept in sync with the data it caches. To this end, all CPUs that implement TLBs provide some mechanism to flush one or more entries. The operating system managing the TLB is responsible for knowing when it changes the paging structures and for performing the flushes needed to keep the TLB consistent with them.
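A small C++ sketch shows what goes wrong without a flush. After the page table is edited, the cached translation is still returned until the entry is invalidated. The structure and names are hypothetical.

```cpp
#include <cstdint>
#include <unordered_map>

// Sketch: a TLB (map from page number to frame number) in front of a flat
// page table, with single-entry and full flushes.
struct CachedMmu {
    std::unordered_map<std::uint32_t, std::uint32_t> page_table;  // authoritative
    std::unordered_map<std::uint32_t, std::uint32_t> tlb;         // cached copies

    std::uint32_t frame_for(std::uint32_t page) {
        auto hit = tlb.find(page);
        if (hit != tlb.end()) return hit->second;   // may be stale!
        std::uint32_t frame = page_table.at(page);  // throws if unmapped (no fault handling here)
        tlb[page] = frame;
        return frame;
    }

    void flush_page(std::uint32_t page) { tlb.erase(page); }  // one entry
    void flush_all() { tlb.clear(); }                          // everything
};
```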

Besides the logical-to-physical mapping, entries in the table (and the TLB) also hold status and permission information, such as whether access to a given page is allowed, whether the page has been accessed, and whether it has been modified.
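On 32-bit x86 (non-PAE), these fields live in the low bits of each page table entry, next to the frame address. This C++ sketch decodes the ones mentioned above. The bit positions follow the standard 386 layout, but the struct and function names are made up.

```cpp
#include <cstdint>

// Sketch: decode the status/permission bits of a 32-bit x86 page table entry.
struct PteInfo {
    std::uint32_t frame;  // bits 31..12: physical base address of the page
    bool present;         // bit 0: entry is valid
    bool writable;        // bit 1: R/W, 1 = read/write, 0 = read-only
    bool user;            // bit 2: U/S, user-mode access allowed
    bool accessed;        // bit 5: set by the CPU when the page is accessed
    bool dirty;           // bit 6: set by the CPU when the page is written
};

constexpr PteInfo decode_pte(std::uint32_t pte) {
    return PteInfo{
        pte & 0xFFFFF000u,
        (pte & (1u << 0)) != 0,
        (pte & (1u << 1)) != 0,
        (pte & (1u << 2)) != 0,
        (pte & (1u << 5)) != 0,
        (pte & (1u << 6)) != 0,
    };
}
```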

Virtual TLBs

When emulating a CPU that has an MMU, it is vital that logical-to-physical address translation be fast. To achieve this, we follow the hardware's example and emulate a TLB in software. One option is to mimic the behavior of the particular CPU architecture's TLB exactly. Unfortunately, the details of TLB operation are often not fully documented, and even when they are, they are not especially efficient to emulate in software.

The approach taken in MAME is to allocate a complete linear lookup table (yes, all 1 million entries of it in the Intel x86 case) and use that as a virtual TLB. Here's how it works.

Each entry in the virtual TLB holds a 32-bit value. The top bits of that value are exactly the bits that will replace the page-index bits of the logical address. For example, in the Intel x86 case, the top 20 bits of the logical address specify the page index, so the top 20 bits of the virtual TLB entry contain the corresponding top 20 bits of the physical address for that page.
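Here is a minimal C++ sketch of such a linear table, assuming 4 KB pages and a 32-bit logical address space. A lookup is a single array index, and translation is a mask-and-merge. The class and method names are hypothetical and are not MAME's actual interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a linear virtual TLB: one 32-bit entry per 4 KB page, with the
// physical page base in the top 20 bits and flags in the low bits.
class LinearVtlb {
public:
    static constexpr unsigned kPageShift = 12;
    static constexpr std::uint32_t kPageMask = (1u << kPageShift) - 1;

    // 2^20 entries, all initially empty (0).
    LinearVtlb() : table_(std::size_t{1} << (32 - kPageShift), std::uint32_t{0}) {}

    void set(std::uint32_t logical, std::uint32_t phys_addr, std::uint8_t flags) {
        table_[logical >> kPageShift] = (phys_addr & ~kPageMask) | flags;
    }

    std::uint32_t entry(std::uint32_t logical) const {
        return table_[logical >> kPageShift];  // one array index, no table walk
    }

    // Replace the page-index bits with the entry's top bits; keep the offset.
    static std::uint32_t to_physical(std::uint32_t entry, std::uint32_t logical) {
        return (entry & ~kPageMask) | (logical & kPageMask);
    }

private:
    std::vector<std::uint32_t> table_;
};
```

The table costs 4 MB of host memory (2^20 entries of 4 bytes each), which is the price of never having to search on a lookup.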

The low 8 bits of each TLB entry contain flags:

  • VTLB_FLAG_VALID -- indicates that this VTLB entry is valid; an invalid entry should have all other flags cleared
  • VTLB_FLAG_FIXED -- indicates that this entry was manually specified and is fixed
  • VTLB_READ_ALLOWED -- indicates that a supervisor-level read access is permitted
  • VTLB_WRITE_ALLOWED -- indicates that a supervisor-level write access is permitted
  • VTLB_FETCH_ALLOWED -- indicates that a supervisor-level instruction fetch is permitted
  • VTLB_USER_READ_ALLOWED -- indicates that a user-level read access is permitted
  • VTLB_USER_WRITE_ALLOWED -- indicates that a user-level write access is permitted
  • VTLB_USER_FETCH_ALLOWED -- indicates that a user-level instruction fetch is permitted
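These flags can be expressed as bit constants, with a permission check that picks the bit matching the access type and privilege level. The flag names come from the list above. The bit values, the `Access` enum, and the helper functions are assumptions made for this C++ sketch; MAME's own headers are authoritative.

```cpp
#include <cstdint>

// Flag names from the list above; the bit values are assumed for illustration.
enum : std::uint32_t {
    VTLB_READ_ALLOWED       = 0x01,
    VTLB_WRITE_ALLOWED      = 0x02,
    VTLB_FETCH_ALLOWED      = 0x04,
    VTLB_FLAG_VALID         = 0x08,
    VTLB_USER_READ_ALLOWED  = 0x10,
    VTLB_USER_WRITE_ALLOWED = 0x20,
    VTLB_USER_FETCH_ALLOWED = 0x40,
    VTLB_FLAG_FIXED         = 0x80,
};

// Hypothetical access descriptor.
enum class Access { Read, Write, Fetch };

// Select the _ALLOWED bit for this access type and privilege level.
constexpr std::uint32_t allowed_bit(Access a, bool user) {
    switch (a) {
    case Access::Read:  return user ? VTLB_USER_READ_ALLOWED  : VTLB_READ_ALLOWED;
    case Access::Write: return user ? VTLB_USER_WRITE_ALLOWED : VTLB_WRITE_ALLOWED;
    case Access::Fetch: return user ? VTLB_USER_FETCH_ALLOWED : VTLB_FETCH_ALLOWED;
    }
    return 0;
}

// An access may proceed only if the entry carries the matching _ALLOWED bit.
constexpr bool access_permitted(std::uint32_t entry, Access a, bool user) {
    return (entry & allowed_bit(a, user)) != 0;
}
```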

To understand how the virtual TLB can emulate the MMU, let's start with the Intel x86 case, where the hardware implements the TLB. Initially the virtual TLB is completely empty; all entries are set to 0. This means that on the very first memory access (once the MMU is enabled), the CPU core will perform its lookup and find that nothing appears to be mapped at that address.

In response, the emulator calls vtlb_fill, which attempts to populate the virtual TLB entry for that address from the page tables that the emulated operating system has set up. To do this, vtlb_fill calls the CPU core's translate callback on the address, specifying whether the access was a read, a write, or an instruction fetch.

If the translate callback reports that there is no table entry for that address, vtlb_fill returns failure to the memory handler, which will then typically generate a page fault on the emulated CPU.

If the translate callback returns an actual mapping, vtlb_fill strips off the low bits, sets the VTLB_FLAG_VALID bit, and sets the _ALLOWED bit that matches the type of access that occurred. So if a user-mode read happened, vtlb_fill sets the VTLB_USER_READ_ALLOWED bit in the entry. The new entry is then stored in the table.
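The fill step might look roughly like the C++ sketch below. The `TranslateFn` signature, the `Access` enum, the flag values, and the name `vtlb_fill_sketch` are all assumptions for illustration. One design choice is also an assumption. If the entry already maps the same physical page, the new _ALLOWED bit is ORed into it instead of replacing earlier permissions, so a read followed by a write does not lose the read bit.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Assumed flag values (same layout for supervisor and user, user shifted by 4).
enum : std::uint32_t {
    VTLB_READ_ALLOWED = 0x01, VTLB_WRITE_ALLOWED = 0x02, VTLB_FETCH_ALLOWED = 0x04,
    VTLB_FLAG_VALID = 0x08,
    VTLB_USER_READ_ALLOWED = 0x10, VTLB_USER_WRITE_ALLOWED = 0x20, VTLB_USER_FETCH_ALLOWED = 0x40,
};
enum class Access { Read = 0, Write = 1, Fetch = 2 };

constexpr std::uint32_t allowed_bit(Access a, bool user) {
    return (1u << static_cast<unsigned>(a)) << (user ? 4 : 0);
}

// Hypothetical translate callback from the CPU core: walks the guest page
// tables, rewrites `address` to physical, returns false if unmapped/denied.
using TranslateFn = std::function<bool(Access, bool user, std::uint32_t& address)>;

constexpr unsigned kPageShift = 12;
constexpr std::uint32_t kPageMask = (1u << kPageShift) - 1;

bool vtlb_fill_sketch(std::vector<std::uint32_t>& table, const TranslateFn& translate,
                      std::uint32_t address, Access access, bool user) {
    std::uint32_t physical = address;
    if (!translate(access, user, physical))
        return false;                                 // caller raises the guest page fault

    std::uint32_t& entry = table[address >> kPageShift];
    std::uint32_t page_base = physical & ~kPageMask;  // strip the low bits
    if (!(entry & VTLB_FLAG_VALID) || (entry & ~kPageMask) != page_base)
        entry = page_base | VTLB_FLAG_VALID;          // start a fresh entry
    entry |= allowed_bit(access, user);               // grant only this access type
    return true;
}
```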

Upon return to the CPU core, the memory accessor must look up the entry in the table again. If the appropriate _ALLOWED bit is now set, the access can proceed as normal. If not, the emulator must generate a fault within the CPU core.
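The accessor's side of the protocol can be sketched in C++ as below: look up, fill on a miss, look up again, and fault if the access is still not allowed. The fill step is passed in as a callback so the example stands alone. The name `translate_access`, its parameters, and the use of `std::optional` to signal a fault are hypothetical choices.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

constexpr unsigned kPageShift = 12;
constexpr std::uint32_t kPageMask = (1u << kPageShift) - 1;

// `required_bit` is whichever _ALLOWED flag matches this access; `fill`
// stands in for vtlb_fill. An empty optional means "raise a fault".
std::optional<std::uint32_t> translate_access(
        const std::vector<std::uint32_t>& table,
        const std::function<bool(std::uint32_t address)>& fill,
        std::uint32_t address, std::uint32_t required_bit) {
    std::uint32_t entry = table[address >> kPageShift];
    if (!(entry & required_bit)) {             // empty, or this access type not yet granted
        if (!fill(address))
            return std::nullopt;               // no mapping: page fault
        entry = table[address >> kPageShift];  // re-lookup after the fill
        if (!(entry & required_bit))
            return std::nullopt;               // mapped, but not allowed: fault
    }
    return (entry & ~kPageMask) | (address & kPageMask);
}
```

With a fill callback that grants only user reads, a user read succeeds after one fill. A later user write triggers a second fill, but it still fails the permission check.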

It is important to note that each call to vtlb_fill sets only the _ALLOWED bit for the access type that triggered it. This means that a read from an address can result in a call to vtlb_fill to check read permission, and a subsequent write to the same address may require a separate call to vtlb_fill to confirm that writes are allowed.

During execution, certain operations on the Intel x86 (such as reloading the CR3 page directory register) implicitly flush the CPU's TLB. In response to these operations, the virtual TLB must be flushed as well. This is done by calling vtlb_flush_dynamic to flush all dynamically populated entries in the virtual TLB, or vtlb_flush_address to flush only the entry that covers the given logical address.
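The two flush operations map naturally onto the linear table. In this C++ sketch, entries marked VTLB_FLAG_FIXED survive the dynamic flush; that is an assumption based on the flag's description. The flag values and function names are also illustrative.

```cpp
#include <cstdint>
#include <vector>

// Assumed flag values for illustration.
enum : std::uint32_t { VTLB_FLAG_VALID = 0x08, VTLB_FLAG_FIXED = 0x80 };
constexpr unsigned kPageShift = 12;

// Drop every dynamically filled entry; fixed entries are left alone.
void flush_dynamic_sketch(std::vector<std::uint32_t>& table) {
    for (auto& entry : table)
        if (!(entry & VTLB_FLAG_FIXED))
            entry = 0;
}

// Drop only the entry covering one logical address (e.g. after an x86 INVLPG).
void flush_address_sketch(std::vector<std::uint32_t>& table, std::uint32_t address) {
    table[address >> kPageShift] = 0;
}
```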

Problems and Solutions

Supporting software TLBs

Limited entries

Accessed and dirty bits