CPUs and Address Spaces

From MAMEDEV Wiki
Revision as of 00:56, 25 November 2014 by Stiletto

MAME supports many different types of CPUs. This article aims to explain how MAME manages the memory and address spaces of these CPUs.

This article is WIP.

Address Spaces

Currently, MAME supports CPUs with up to three distinct address spaces:

  1. Program space (ADDRESS_SPACE_PROGRAM) is by definition the address space where all code lives. On Von Neumann architecture CPUs, it is also where data is stored. Most CPUs are Von Neumann architecture systems, and thus commingle code and data in a single address space.
  2. Data space (ADDRESS_SPACE_DATA) is a separate address space where data is stored for Harvard architecture CPUs. An example of a Harvard architecture CPU in MAME is the ADSP2100.
  3. I/O space (ADDRESS_SPACE_IO) is a third address space for CPUs that have separate I/O operations. For example, the Intel x86 architecture has IN and OUT instructions which are effectively reads and writes to a separate address space. In MAME, these reads and writes are directed to the I/O space.

Note that a number of CPU architectures also have internal memory that is in a separate space or domain from the other three. Memory referenced in this way is expected to be maintained internally in the CPU core and is not exposed through the memory system in MAME.

CPUs and Bus Width

Everyone has probably heard about the "8-bit" Z80 CPU or the "32-bit" 80386 CPU. But where does this notion of "8-bit" and "32-bit" come from? When referring to CPUs, there are three metrics worth considering, all of which might be used to describe a CPU, depending on which sounds better to the marketing department (seriously).

The first possible metric is the size of the internal arithmetic units in the CPU. For example, the Motorola 68000 CPU can do arithmetic operations on 32-bit numbers internally. Does this make it a 32-bit CPU? Depends on who you ask, though most people would probably say "no", because it is at odds with the other two metrics.

The second possible metric is the width of the address bus. When a CPU goes to fetch memory, it has to tell the outside world what address it wishes to access. To do this, it drives some of the pins on the chip to specify the address in binary, and then signals a read or a write to actually cause the memory access to occur. The number of pins available on the chip for sending these addresses is referred to as the address bus width, and ultimately controls how much memory the CPU can access. For example, the original Intel 8086 has 20 address pins and can access 2^20 bytes (1 MB) of memory; thus, we say it has a 20-bit address bus. When Intel created the 80286, it increased the address bus width to 24 bits (16 MB), and then to 32 bits (4 GB) with the introduction of the 80386. This is one reason why the 80386 is called a "32-bit" CPU.
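The arithmetic behind these figures is simply two raised to the number of address pins, assuming (as on the 8086, 80286 and 80386) that each address selects one byte. A minimal C sketch; the function name is hypothetical:

```c
#include <stdint.h>

/* Number of distinct byte addresses reachable with an address bus of the
   given width: one address per combination of the address pins.
   Assumes a byte-addressable bus (one byte per address). */
uint64_t addressable_bytes(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;   /* 2^address_bits */
}
```

For example, addressable_bytes(20) is 1048576 (1 MB), addressable_bytes(24) is 16777216 (16 MB), and addressable_bytes(32) is 4294967296 (4 GB).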

The third possible metric is the width of the data bus. This describes how many bits of data the CPU can fetch at one time. Again, this is related to pins on the chip, so a CPU that has an 8-bit data bus can fetch 8 bits (one byte) at a time, and has 8 pins which either send or receive the data that goes out to memory. Almost all CPUs access memory in 8-bit, 16-bit, 32-bit, or 64-bit chunks (though there are a few oddballs that don't follow these rules). For example, the original Motorola 68000 accessed memory in 16-bit chunks, meaning it had 16 pins which sent and received data, and thus we say it had a 16-bit data bus. When Motorola introduced the 68020, it doubled the data bus width to 32 bits, meaning that it could fetch twice the amount of data in a single memory access. This is why the 68020 is called a "32-bit" CPU.

So why do you need to know all of this to work with address maps? The first metric is irrelevant because it doesn't apply to memory accesses, but the other two describe how the CPU deals with memory, and some of those details leak into the address maps.

MAME today supports any address bus width from 1 to 32 bits, and data bus widths of 8, 16, 32, and 64 bits. For CPUs with oddball data bus widths, we generally round up to the next supported width and clean up the details in the CPU core.
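As a rough illustration of these limits, the sketch below computes the address mask implied by an address bus width and rounds an oddball data bus width up to the next supported size. Both helpers are hypothetical and do not reproduce MAME's internal code; the 32-bit case is special-cased because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask of the address bits a bus of the given width (1..32) can drive. */
uint32_t address_mask(unsigned address_bits)
{
    if (address_bits >= 32)
        return 0xFFFFFFFFu;                 /* avoid an undefined shift by 32 */
    return ((uint32_t)1 << address_bits) - 1u;
}

/* Round an oddball data bus width up to the next supported width
   (8, 16, 32 or 64 bits). Returns 0 for widths above 64 bits. */
unsigned round_data_bus_width(unsigned data_bits)
{
    static const unsigned supported[] = { 8, 16, 32, 64 };
    for (size_t i = 0; i < sizeof supported / sizeof supported[0]; i++)
        if (data_bits <= supported[i])
            return supported[i];
    return 0;
}
```

A 12-bit data bus, for instance, would be rounded up to 16 bits, leaving the CPU core to deal with the unused upper bits.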

Also note that each address space can have different properties, even within the same CPU. For example, the Intel 80386 has a program address space with a 32-bit address bus width, but it has an I/O address space with a 16-bit address bus width.

Memory Layout in MAME

Hoo boy, this is a controversial topic. MAME was originally written with only 8-bit CPUs in mind. The nice thing about 8-bit CPUs is that from the CPU's perspective, all memory accesses are a single byte wide, so it doesn't matter whether the system running the emulator is little endian or big endian. Eventually, however, MAME expanded and an emulator for the Motorola 68000 was added. This raised the tough question: how do you model memory accesses for a CPU with a 16-bit data bus?

There are a few things to keep in mind here. First, because the concept of an 8-bit byte is so firmly grounded in microprocessors, almost all CPUs that have larger data busses also support accessing individual bytes. So while a 16-bit CPU can access 16 bits at a time, it also has the ability to individually access either the upper 8 bits or the lower 8 bits, effectively permitting byte-level access to memory and peripherals. This is generally taken one step further with CPUs that have a 32-bit data bus; for the most part, they can access any of the four 8-bit bytes independently, or either of the two 16-bit words independently.

The second thing to understand is that when a 16-bit CPU performs a 16-bit memory access, at the hardware level it sends out a single read or write request. This matters because, when you write an emulator, each memory access generally translates into a function call which simulates the read/write behavior of the appropriate block of memory or I/O device. A naive approach to supporting 16-bit CPUs might be to keep all these function calls operating at the byte level only, and just perform two function calls to read two neighboring bytes if a 16-bit request is issued. The problem with this is that it makes it difficult for the functions that simulate the hardware to know whether the original CPU request was for 8 or 16 bits, and in some cases that makes a difference.

The third thing to know is that at the hardware level, a CPU with a 16-bit data bus always performs "aligned" memory accesses. That is, when communicating with outside memory or peripherals, it will only ask for even addresses. In fact, 16-bit CPUs don't even have a pin for the low bit of the address bus! But what about accessing individual bytes, you ask? Well, if you read a byte from an even address (say $3000), the CPU will request a read from address $3000, but will also signal that only the 8 bits corresponding to the first byte should be returned. If you read from an odd address (say $3001), it will also issue a read from address $3000, but will signal that only the 8 bits corresponding to the second byte should be returned.
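The byte-access scheme just described can be modelled in a few lines. This is a sketch assuming a big-endian 16-bit CPU such as the 68000, where the byte at the even address travels on the upper 8 data lines; the struct and function names are hypothetical. A full 16-bit access would use the same word address with both lanes (0xFFFF) enabled.

```c
#include <stdint.h>

/* One bus cycle on a 16-bit data bus: the word-aligned address driven
   on the pins, plus which byte lanes of the data bus are active. */
typedef struct {
    uint32_t word_address;  /* always even: bit 0 has no pin */
    uint16_t lane_mask;     /* 0xFF00 = upper byte, 0x00FF = lower byte */
} bus_cycle;

/* Translate a byte access into a bus cycle, assuming a big-endian CPU
   where the even address is the upper half of the word. */
bus_cycle byte_access(uint32_t byte_address)
{
    bus_cycle c;
    c.word_address = byte_address & ~(uint32_t)1;   /* drop the low bit */
    c.lane_mask = (byte_address & 1) ? 0x00FF : 0xFF00;
    return c;
}
```

Both $3000 and $3001 produce a cycle at word address $3000; only the lane mask differs.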

With that in mind, the way that MAME decided to support 16-bit CPUs was to define a new set of functions which were dedicated to reads and writes from 16-bit CPUs. These functions know that accesses may be either 8-bit or 16-bit, and know how to call out to special 16-bit memory handlers, which are functions that simulate the behavior of peripherals and other devices mapped into the CPU's address space. Seems straightforward enough, so where's the controversy?

The first controversial bit has to do with the 'offset' parameter to the read/write handlers. As mentioned earlier, 16-bit CPUs don't actually have a low bit on their address bus, they just specify which bytes within the word they actually want to access. To enforce this, the memory system could either have masked off the low bit, or else shifted the address one bit to the right to discard the bit. In MAME, the latter approach was taken, which can be a bit confusing. For example, an access to address $30 on a 16-bit CPU would mean that your read/write handler actually gets passed an address of $18 (and on a 32-bit CPU that would translate as an address of $0C: shifted right 2 bits). But this actually makes sense once the second controversy is explained.
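A minimal sketch of the shift, assuming the range starts at address 0 (the subtraction of the range's start address is covered under Read/Write Handlers). The helper name is hypothetical; it derives the shift from the data bus width: 0 bits for an 8-bit bus, 1 for 16-bit, 2 for 32-bit and 3 for 64-bit.

```c
#include <stdint.h>

typedef uint32_t offs_t;   /* stand-in for MAME's offs_t (32 bits today) */

/* Convert a CPU byte address into the offset seen by a handler by
   discarding the low address bits that a bus of this width does not drive. */
offs_t handler_offset(offs_t byte_address, unsigned data_bus_bits)
{
    unsigned shift = 0;
    while ((8u << shift) < data_bus_bits)   /* 8->0, 16->1, 32->2, 64->3 */
        shift++;
    return byte_address >> shift;
}
```

With this helper, address $30 becomes offset $18 on a 16-bit bus and $0C on a 32-bit bus, matching the example above.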

The second controversial bit is how RAM and ROM memory is laid out. Because all accesses to memory are effectively 16 bits wide (potentially with some masking to extract either one byte or the other), the decision was made to store RAM and ROM natively in 16-bit chunks. What does this mean? Well, say you have a block of ROM on a Motorola M68000 (big-endian) with the following values:

$1234 $5678 $9ABC $DEF0

And let's say you have a pointer to it in your code:

UINT16 *memptr;

Then the intention is that, regardless of the endianness of the CPU that is running the emulator, if you read memptr[2], you will get the value $9ABC. That's because for a CPU with a 16-bit data bus, all memory is organized and accessed by the core in 16-bit chunks. As long as you access memory strictly through 16-bit pointers, everything will work fine. (Note also that because of the use of 16-bit pointers, having the 'offset' parameter shifted by 1 lets you use the offset directly as an array index to memory.)
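The following sketch shows why the shifted offset is convenient, assuming the example ROM above is mapped at address 0 on a 16-bit bus. The array is stored as native 16-bit values, so memptr[2] yields 0x9ABC on any host; the handler name is hypothetical and does no bounds checking.

```c
#include <stdint.h>

typedef uint16_t UINT16;   /* stand-in for MAME's UINT16 */
typedef uint32_t offs_t;   /* stand-in for MAME's offs_t */

/* The example ROM, stored natively as 16-bit words: each element holds
   the word as the emulated CPU sees it, whatever the host byte order. */
static const UINT16 rom[] = { 0x1234, 0x5678, 0x9ABC, 0xDEF0 };

/* Hypothetical 16-bit read handler. The offset has already been shifted
   right by one, so it indexes the UINT16 array directly. */
UINT16 rom_read16(offs_t offset)
{
    const UINT16 *memptr = rom;
    return memptr[offset];   /* no bounds check in this sketch */
}
```

For example, a read of byte address $0004 arrives as offset 2 and returns 0x9ABC.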

To illustrate the mind-bending a bit more graphically, consider running the M68000 on an Intel x86-based CPU. The M68000 is big endian while the x86 is little endian. When accessing data as words, it goes like this:

M68000: $1234 $5678 $9ABC $DEF0
   x86: $1234 $5678 $9ABC $DEF0

But if you look at the byte level, you see it is different:

M68000: $12 $34 $56 $78 $9A $BC $DE $F0
   x86: $34 $12 $78 $56 $BC $9A $F0 $DE

For this reason, with 16-bit CPUs, you generally cannot take a UINT16 * pointer to memory, cast it to a UINT8 *, and access the individual bytes without getting incorrect results when the endianness of the native CPU is different from that of the emulated CPU.
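If a driver does need individual bytes from word-organized memory, one portable approach is to fetch the 16-bit word and pick the correct half in code rather than casting the pointer. This is a sketch assuming a big-endian emulated CPU (like the 68000), where the even address holds the upper half of the word; the function and array names are hypothetical.

```c
#include <stdint.h>

/* The example ROM from above, stored as native 16-bit words. */
static const uint16_t example_words[] = { 0x1234, 0x5678, 0x9ABC, 0xDEF0 };

/* Read the byte at an emulated byte address from word-organized memory,
   for a big-endian emulated CPU: even address = upper 8 bits. The result
   does not depend on the host's byte order. */
uint8_t read_byte_be16(const uint16_t *words, uint32_t byte_address)
{
    uint16_t word = words[byte_address >> 1];
    return (byte_address & 1) ? (uint8_t)(word & 0xFF)
                              : (uint8_t)(word >> 8);
}
```

On both big-endian and little-endian hosts, read_byte_be16(example_words, 1) returns 0x34, whereas reading byte 1 through a UINT8 * cast would return 0x12 on an x86 host.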

Special Memory Types

In general, when a CPU wishes to access memory, it calls into MAME's memory system, which looks up the address provided, determines who is registered as the owner of that address, and then calls the associated memory read or write handler. However, not all memory is treated equally. There are five "special" types of memory which can be registered: RAM, ROM, banks, no-ops, and unmapped space. To register for any of these types, you simply specify SMH_RAM, SMH_ROM, SMH_BANK(banknum), SMH_NOP, or SMH_UNMAP in place of the read or write handlers in your address map. (SMH == Static Memory Handler)

SMH_UNMAP and SMH_NOP

Starting with the simple special handlers, the unmapped handler (SMH_UNMAP) is used to indicate that the given address range is effectively unmapped. In MAME this means that writes are ignored, and reads return either 0 (the default) or ~0. In addition, each unmapped memory access is automatically logged to error.log; this helps to find regions of the address map that have not been identified yet.

The no-op handler (SMH_NOP) is identical to the unmapped handler, except that it does no logging. Use this when you have an address range which generates a lot of unwanted junk in the error.log file.
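To make the difference between the two handlers concrete, here is a toy model for an 8-bit bus. It is not MAME's implementation: the counter stands in for lines written to error.log, the log text is made up, and unmap_value models the choice between 0 and ~0 (which truncates to 0xFF on an 8-bit bus).

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t offs_t;

static uint8_t unmap_value = 0;   /* 0 by default; 0xFF models ~0 */
static unsigned log_entries = 0;  /* stands in for lines in error.log */

/* Unmapped: return the fill value, and log the access. */
uint8_t unmap_read8(offs_t address)
{
    log_entries++;
    fprintf(stderr, "unmapped read from %08X\n", (unsigned)address);
    return unmap_value;
}

void unmap_write8(offs_t address, uint8_t data)
{
    log_entries++;
    fprintf(stderr, "unmapped write %02X to %08X\n",
            (unsigned)data, (unsigned)address);
    /* the data itself is discarded */
}

/* No-op: identical results, but nothing is logged. */
uint8_t nop_read8(offs_t address)
{
    (void)address;
    return unmap_value;
}

void nop_write8(offs_t address, uint8_t data)
{
    (void)address;
    (void)data;   /* discarded silently */
}
```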

SMH_BANK() and Memory Banks

The remaining three special handlers are all interrelated, but to understand them, you first have to understand the concept of memory banks. Memory banks were originally introduced in MAME as a way to quickly implement bank switching, a common technique that allowed CPUs with a limited address space to access more ROM or RAM than they otherwise could. Usually this took the form of a control register on the system which allowed the software to select one of a number of banks. Depending on which bank was selected, different ROM or RAM chips would respond to accesses to a certain memory range.

In an emulator, implementing bank switching like this seems relatively straightforward. You simply keep a list of pointers to the base of each bank, and an "active" pointer to the currently selected bank. Whenever the control register is written, you switch the active pointer to point to the appropriate bank base. And whenever the banked memory range is accessed, you perform the memory access using this pointer. Essentially this is what MAME does.
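The pointer-switching scheme described above might look like this in C. It is a simplified, hypothetical model: bank_configure() and bank_select() only loosely correspond to MAME's memory_configure_bank() and memory_set_bank(), and their signatures here are not MAME's.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BANK_ENTRIES 8

/* Hypothetical model of one bank: the base pointer of every selectable
   entry, plus the pointer currently used for accesses. */
typedef struct {
    uint8_t *entries[MAX_BANK_ENTRIES];
    uint8_t *active;
} mem_bank;

/* Record 'count' (1..MAX_BANK_ENTRIES) entries spaced 'stride' bytes
   apart starting at 'base', and select the first one. */
void bank_configure(mem_bank *bank, uint8_t *base, int count, size_t stride)
{
    for (int i = 0; i < count && i < MAX_BANK_ENTRIES; i++)
        bank->entries[i] = base + (size_t)i * stride;
    bank->active = bank->entries[0];
}

/* Called when the emulated program writes the bank-select register;
   'entry' must be an index that was configured above. */
void bank_select(mem_bank *bank, int entry)
{
    bank->active = bank->entries[entry];
}

/* Accesses to the banked range simply go through the active pointer;
   'offset' is relative to the start of the banked range. */
uint8_t bank_read8(const mem_bank *bank, uint32_t offset)
{
    return bank->active[offset];
}
```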

MAME's memory system defines a total of 32 banks, which you can reference via SMH_BANK(1) .. SMH_BANK(32). You can use memory_configure_bank() to configure the base pointers to each bank, and memory_set_bank() to select one of the pointers you previously set. (You can also use the function memory_set_bankptr() to specify the current active pointer directly, though this is discouraged because it doesn't play well with save states.)

By default, the base pointer for each bank points to the equivalent offset within the memory region that corresponds to the CPU attached to the address map. So, for example, if you specify a memory range of 0x2000-0x3fff and assign that to SMH_BANK(1), and if you assign that memory range to CPU #2, then the default pointer for that bank will point to memory_region(REGION_CPU2) + 0x2000.

It is also very important to recognize that banks are global to the machine. So if you have two CPUs and they each reference BANK1, they will be sharing the same memory. Thus, it is important to ensure that you don't accidentally use the same bank in multiple locations in a machine.

SMH_ROM and SMH_RAM

This brings us to the final two special handlers: RAM and ROM. Although these seem straightforward, they are actually more complex than you might think. Believe it or not, RAM and ROM are built on top of the memory banking system described in the previous section. In addition to the 32 explicitly-controlled banks, MAME also internally supports 35 additional "internal" banks. These banks are dynamically assigned to each independent RAM and ROM address range at initialization time. This is mainly a convenience so that you can simply say, "I want RAM here", without worrying about assigning a bank number. But under the covers, that's what is happening.
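Conceptually, the initialization-time assignment works something like the sketch below: every RAM or ROM range in the map gets the next free internal bank. The entry layout and the choice to number internal banks from 33 upward are assumptions for illustration, not MAME's actual data structures.

```c
#include <stddef.h>

#define EXPLICIT_BANKS 32   /* SMH_BANK(1) .. SMH_BANK(32) */
#define INTERNAL_BANKS 35   /* extra banks reserved for RAM/ROM ranges */

typedef enum { KIND_HANDLER, KIND_RAM, KIND_ROM, KIND_BANK } range_kind;

/* Hypothetical, simplified address map entry. */
typedef struct {
    unsigned start, end;
    range_kind kind;
    int bank;   /* explicit bank for KIND_BANK; filled in for RAM/ROM */
} map_entry;

/* Give each RAM or ROM range its own internal bank. Returns 0 on
   success, or -1 if the internal banks run out. */
int assign_internal_banks(map_entry *map, size_t count)
{
    int next = EXPLICIT_BANKS + 1;   /* assumed numbering: 33 upward */
    for (size_t i = 0; i < count; i++) {
        if (map[i].kind != KIND_RAM && map[i].kind != KIND_ROM)
            continue;
        if (next > EXPLICIT_BANKS + INTERNAL_BANKS)
            return -1;
        map[i].bank = next++;
    }
    return 0;
}
```

Because every RAM or ROM range receives a distinct bank, several such ranges can coexist in one address map.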

There are a few important side-effects of this implementation detail. First off, note that because the banks are dynamically allocated, you are safe to have multiple ROM and RAM regions in a single address map, as each one will be assigned internally to a different bank.

Second, note that this dynamic bank assignment happens only at initialization time. This matters if you install a new memory handler at runtime: you are not allowed to specify SMH_RAM or SMH_ROM for the handlers. If you want to dynamically add RAM or ROM to an address map, your only option is to pick a bank number and specify SMH_BANK(n) when you install the handler; then you must explicitly call memory_set_bankptr() to set the base pointer of that bank (the default pointer does not apply here).

Finally, recognize that even though ROM and RAM are implemented as banks, you are not permitted to change the base pointer. They are intended to be treated truly as proper ROM and RAM. If you want a bank, you have to explicitly create one.

In the case of ROM, the default bank pointer is the same as it is for explicit banks: the equivalent offset within the memory region that corresponds to the CPU attached to the address map. In the case of RAM, however, the memory is allocated dynamically at initialization time. (There's that phrase again; this is another reason why you cannot dynamically install a RAM handler after initialization time.)

For all this complexity, however, you gain several benefits. First, RAM, ROM and BANK regions are handled internally by the memory system and don't require an extra function callback, which makes them a bit faster than handler-based address ranges. Second, the contents of RAM regions are automatically registered and saved by the memory system, so you don't have to register them yourself.

Probably the most important use of RAM/ROM/BANK handlers, however, is that they are treated specially when it comes to executing code from a CPU. Because the memory system manages the memory which backs these handlers, it can tell the CPU cores how to directly fetch the opcodes from that memory without incurring the overhead of calling through multiple function pointers. This greatly speeds up emulation of CPUs.
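The speed difference comes down to whether an opcode can be fetched by indexing memory directly or has to go through a function pointer. Here is a hypothetical sketch of that decision; the range structure and example_handler are made up for illustration and are not modeled on MAME's actual opcode-fetch code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t offs_t;
typedef uint8_t (*read8_handler)(offs_t offset);

/* Hypothetical range: either memory-backed (RAM/ROM/bank, with a base
   pointer) or serviced by a handler callback. */
typedef struct {
    offs_t start, end;
    uint8_t *base;          /* non-NULL for RAM/ROM/bank ranges */
    read8_handler handler;  /* used when base is NULL */
} mem_range;

/* Example handler for a range with nothing behind it. */
uint8_t example_handler(offs_t offset)
{
    (void)offset;
    return 0xFF;
}

/* Opcode fetch: index backed memory directly when possible, otherwise
   fall back to the (slower) handler call. */
uint8_t fetch_opcode(const mem_range *r, offs_t address)
{
    offs_t offset = address - r->start;
    if (r->base != NULL)
        return r->base[offset];     /* fast path: plain memory read */
    return r->handler(offset);      /* slow path: function call */
}
```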

Read/Write Handlers

Before diving into the details of the address map macros, let's talk about read/write handlers. The purpose of an address map is to describe what MAME should do when memory within a certain range is accessed. In its simplest sense, it specifies a set of functions which should be called in response to memory accesses. These are the read/write handlers.

A read handler is a function which accepts an address and perhaps a mask, and returns the value obtained by "reading" memory at that address. Here is a prototype for an 8-bit read handler:

UINT8 my_read_handler(offs_t offset);

Notice a couple of things about this definition. First, I specifically said an "8-bit" read handler, and you can see that the function returns a UINT8. This means that yes, there are 4 different handler function types, one each for 8, 16, 32, and 64-bit memory accesses. Regardless of the size of data returned, however, all the functions take an offset of type "offs_t", which today is 32 bits (though in the future we may expand it to 64). Also note that it is called an "offset", not an "address". This is because the memory system in MAME always subtracts the beginning address of a memory range from the raw address before passing it into the read/write handlers. This means that the offset parameter is always the offset relative to the starting address of the range.
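A sketch of the four handler shapes and of the offset subtraction the memory system performs. The typedefs are stand-ins for MAME's own types, the function-pointer names and dispatch_read8() are hypothetical, the optional mask mentioned above is omitted, and an 8-bit bus is assumed so no shift is applied.

```c
#include <stdint.h>

/* Stand-ins for MAME's types; offs_t is 32 bits today. */
typedef uint8_t  UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;
typedef uint64_t UINT64;
typedef uint32_t offs_t;

/* One read handler type per access size (masks omitted). */
typedef UINT8  (*read8_func)(offs_t offset);
typedef UINT16 (*read16_func)(offs_t offset);
typedef UINT32 (*read32_func)(offs_t offset);
typedef UINT64 (*read64_func)(offs_t offset);

/* Hypothetical 8-bit handler for four registers. It only ever sees
   offsets 0..3, wherever its range sits in the address map. */
UINT8 my_read_handler(offs_t offset)
{
    static const UINT8 regs[4] = { 0x10, 0x20, 0x30, 0x40 };
    return regs[offset & 3];
}

/* What the memory system does before calling the handler: subtract the
   start address of the range from the raw address. */
UINT8 dispatch_read8(read8_func handler, offs_t range_start, offs_t address)
{
    return handler(address - range_start);
}
```

With this, a read of address $4002 from a range starting at $4000 reaches the handler as offset 2 and returns 0x30.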

Similarly, a write handler is a function which accepts an address, a value, and perhaps a mask, and "writes" memory at that address.
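A matching sketch for writes. The article gives no write prototype, so the parameter order (offset, then data, then mask) is an assumption, as is the mask convention in the 16-bit variant, where set bits mark the bits being written; MAME's own conventions may differ.

```c
#include <stdint.h>

typedef uint8_t  UINT8;    /* stand-ins for MAME's types */
typedef uint16_t UINT16;
typedef uint32_t offs_t;

static UINT8 latch[4];     /* hypothetical device state */
static UINT16 word_reg;    /* hypothetical 16-bit register */

/* 8-bit write handler: the offset is relative to the start of its
   range; the value is stored into a latch. */
void my_write_handler(offs_t offset, UINT8 data)
{
    latch[offset & 3] = data;
}

/* 16-bit variant: only the bits set in 'mask' are updated, so a byte
   write leaves the other half of the register untouched. */
void my_write16_handler(offs_t offset, UINT16 data, UINT16 mask)
{
    (void)offset;
    word_reg = (UINT16)((word_reg & ~mask) | (data & mask));
}
```

For example, writing 0x1234 with mask 0xFF00 and then 0xABCD with mask 0x00FF leaves word_reg at 0x12CD.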