Dynamic Recompiler Author's Guide

From MAMEDEV Wiki

This is intended to be an introduction to how to write a dynamic recompiler using the universal architecture in MAME. I suspect it will eventually need to be broken into multiple separate pieces, but we'll keep it all together to start with.

Logic Flow

The logic flow of an interpreter is pretty easy to understand, because it works like a CPU does:

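mycpu_execute_interpreted(numcycles)
{
   do
   {
      opcode = fetch_opcode();           /* read the opcode at the current PC */
      execute_opcode(opcode);            /* apply its effects to the CPU state */
      numcycles -= opcode_num_cycles;    /* charge the cycles this opcode used */
   }
   while (numcycles > 0);                /* stop once the cycle budget is spent */
}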

Essentially, the core of the interpreter sits in a loop, fetching a single opcode at a time, executing it, and deducting its cycles from the budget until it is time to move on to something else in the emulation. The bulk of the code in an interpreter consists of the implementations of the individual opcodes, which must be relatively well-tuned because they are executed so frequently.
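To make the loop concrete, here is a minimal runnable sketch in C for an invented two-opcode machine. Everything in it (the toy_cpu structure, the opcodes, their cycle costs, and the toy_execute_interpreted name) is hypothetical and chosen only for illustration; it is not taken from any real MAME core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: none of these names come from MAME. */
enum { OP_INC = 0, OP_JMP0 = 1 };   /* increment acc; jump to address 0 */

typedef struct {
    uint32_t pc;      /* index of the next opcode in program[] */
    uint32_t acc;     /* single accumulator register */
    int      icount;  /* cycles remaining in this timeslice */
} toy_cpu;

/* Assumed cycle cost of each opcode, indexed by opcode value. */
static const int toy_cycles[] = { 1, 2 };

/* Run until the cycle budget is spent; returns the instructions executed. */
static int toy_execute_interpreted(toy_cpu *cpu, const uint8_t *program,
                                   uint32_t length, int numcycles)
{
    int executed = 0;
    cpu->icount = numcycles;
    do {
        assert(cpu->pc < length);
        uint8_t opcode = program[cpu->pc++];    /* fetch */
        assert(opcode <= OP_JMP0);
        switch (opcode) {                       /* execute */
        case OP_INC:  cpu->acc++;  break;
        case OP_JMP0: cpu->pc = 0; break;
        }
        cpu->icount -= toy_cycles[opcode];      /* charge cycles */
        executed++;
    } while (cpu->icount > 0);
    return executed;
}
```

Because the budget is only checked after a whole opcode has run, the count can end up slightly negative when the last opcode costs more cycles than remain. In this sketch the overshoot is simply left in icount for the caller to inspect.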

In contrast, the logic flow of a dynamic recompiler is considerably less analogous to a CPU:

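mycpu_execute_recompiled(numcycles)
{
   cpustate.numcycles = numcycles;                  /* make the budget visible to generated code */
   do
   {
      return_code = (*recompiler_entry_point)();    /* run translated code until it exits */
      if (return_code == EXIT_MISSING_CODE)
         recompile_code_at_current_pc();            /* translate the code we just missed */
      else if (return_code == EXIT_FLUSH_CACHE)
         code_cache_flush();                        /* discard all existing translations */
   }
   while (return_code != EXIT_OUT_OF_CYCLES);       /* any other exit: go back in */
}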

This may seem kind of odd at first, but let's step through the high-level flow to understand what is happening.

The first thing we do is stash the number of cycles to execute somewhere accessible to the generated code. In this case, we store it in the CPU state. Then we begin our execution loop.

Now, in an ideal world, all the code we would ever want to execute has already been translated to the target architecture, and so to execute it, we simply jump into the translated code and begin execution. If this is in fact the case, then the generated code will execute for the number of cycles we stashed in the CPU state. When it is finished, it will return to the caller with a return code of EXIT_OUT_OF_CYCLES, indicating that all the cycles have been exhausted.

Of course, this is a dynamic recompiler. The word dynamic here refers to the fact that we do not translate all the code up front, but rather perform translation on the fly, as new code is encountered. This means that we will sometimes attempt to execute code that hasn't been translated yet. When this occurs, the translated code will return to the caller with a special return code of EXIT_MISSING_CODE, meaning that there is no valid translated code for the current PC. In response to receiving this return code, the recompiler must translate the new code into the code cache, and then call back again to the translated code to allow it to continue processing.

It is also possible that during execution, a substantial change is made to the CPU's state, such that all previously translated code is immediately invalidated. In this situation, the translated code returns EXIT_FLUSH_CACHE, in response to which the recompiler will flush the cache. When finished, we loop back around and attempt to execute the translated code again. Of course, since we just flushed the cache, the expectation is that we will immediately return with an EXIT_MISSING_CODE so that we are forced to translate from scratch the code that was previously executing.
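The three exit codes can be exercised end to end with a small simulation. The following is a sketch only, under heavy assumptions: a real recompiler emits native code into the cache, whereas here ordinary C functions stand in for translated blocks and "translating" a block just installs a function pointer. The names cpu_state, block0, block1, entry_point, and the mode_changed flag are all hypothetical; only the EXIT_* codes and the shape of the outer loop follow the pseudocode above.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not MAME's real API. */
enum { EXIT_OUT_OF_CYCLES, EXIT_MISSING_CODE, EXIT_FLUSH_CACHE };
enum { NUM_BLOCKS = 2 };

typedef struct {
    int pc;               /* index of the current guest block */
    int numcycles;        /* cycles left, decremented by "generated" code */
    int mode_changed;     /* pretend CPU mode bit, switched once by block1 */
    int flush_requested;  /* set when existing translations became invalid */
    int recompiles;       /* counters kept only to observe the behavior */
    int flushes;
} cpu_state;

typedef void (*block_fn)(cpu_state *);

/* Stand-ins for generated code: each costs 3 cycles and sets the next PC. */
static void block0(cpu_state *s) { s->numcycles -= 3; s->pc = 1; }
static void block1(cpu_state *s)
{
    s->numcycles -= 3;
    s->pc = 0;
    if (!s->mode_changed) {           /* first run only: invalidate everything */
        s->mode_changed = 1;
        s->flush_requested = 1;
    }
}

static const block_fn translations[NUM_BLOCKS] = { block0, block1 };
static block_fn code_cache[NUM_BLOCKS];   /* NULL means "not translated yet" */

/* Plays the role of recompiler_entry_point: run cached blocks until an exit. */
static int entry_point(cpu_state *s)
{
    for (;;) {
        if (s->flush_requested) { s->flush_requested = 0; return EXIT_FLUSH_CACHE; }
        if (s->numcycles <= 0) return EXIT_OUT_OF_CYCLES;
        assert(s->pc >= 0 && s->pc < NUM_BLOCKS);
        if (code_cache[s->pc] == NULL) return EXIT_MISSING_CODE;
        code_cache[s->pc](s);
    }
}

static void recompile_code_at_current_pc(cpu_state *s)
{
    code_cache[s->pc] = translations[s->pc];   /* "translate" one block */
    s->recompiles++;
}

static void code_cache_flush(cpu_state *s)
{
    for (int i = 0; i < NUM_BLOCKS; i++)
        code_cache[i] = NULL;
    s->flushes++;
}

static void execute_recompiled(cpu_state *s, int numcycles)
{
    int return_code;
    s->numcycles = numcycles;
    do {
        return_code = entry_point(s);
        if (return_code == EXIT_MISSING_CODE)
            recompile_code_at_current_pc(s);
        else if (return_code == EXIT_FLUSH_CACHE)
            code_cache_flush(s);
    } while (return_code != EXIT_OUT_OF_CYCLES);
}
```

Starting from a zeroed cpu_state and an empty cache, calling execute_recompiled with a budget of 12 cycles walks through every path: each block is translated on first use, block1 then requests a flush, and both blocks are translated a second time before the cycles run out. The counters end at four recompiles and one flush, which matches the expectation that a flush is followed immediately by EXIT_MISSING_CODE.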

Setup and Initialization

Compile-time versus run-time

Front-end Analysis

Static Subroutines

Common Generation Functions

General Opcode Behaviors

Specific Opcode Behaviors