CPU Scheduling in MAME

From MAMEDEV Wiki

Multi-CPU games in MAME are scheduled in a round-robin fashion. The order of the round robin execution is strictly defined by the order of the CPUs in the machine driver. There is no way to alter this order; however, you can affect the scheduling by suspending CPUs or adjusting the granularity of the scheduling.

The scheduler relies on the timer system, which knows when the next timer is scheduled. All scheduling happens between firings of timers; similarly, timers are never fired while a CPU is executing. This is important to keep in mind.

The scheduler queries the timer system to find out when the next timer is set to fire. It then loops over each CPU, computes how many cycles that CPU needs to execute to reach that time, and runs the CPU for that many cycles. When the CPU is finished executing, the CPU core returns how many cycles it actually executed. This information is accumulated and converted back into a "local CPU time", in order to account for overshooting or early exiting from the CPU core.
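The loop described above can be sketched in a few lines of Python. This is only an illustrative model, not MAME's actual code: the ToyCpu class, its fixed overshoot parameter, and the run_timeslice function are hypothetical names invented for this sketch, and skipping a CPU that is already past the target time is an assumption.

```python
from fractions import Fraction
from math import ceil

class ToyCpu:
    """Hypothetical stand-in for a CPU core; not a MAME class."""
    def __init__(self, name, clock_hz, overshoot=0):
        self.name = name
        self.clock_hz = clock_hz
        self.overshoot = overshoot   # extra cycles every execute() call runs
        self.total_cycles = 0        # accumulated cycles actually executed

    @property
    def local_time(self):
        # Local CPU time is derived from the accumulated cycle count.
        return Fraction(self.total_cycles, self.clock_hz)

    def execute(self, cycles):
        # A real core may overshoot or exit early; this toy always overshoots
        # by a fixed amount and reports what it actually ran.
        return cycles + self.overshoot

def run_timeslice(cpus, target_time):
    """Run every CPU, in list order, until its local time reaches target_time."""
    for cpu in cpus:
        remaining = target_time - cpu.local_time
        if remaining <= 0:
            continue                 # assumption: nothing to do if already past
        requested = ceil(remaining * cpu.clock_hz)
        ran = cpu.execute(requested)
        cpu.total_cycles += ran      # accumulate what was actually executed

cpus = [ToyCpu("cpu0", 14_000_000, overshoot=12), ToyCpu("cpu1", 2_000_000)]
run_timeslice(cpus, Fraction(150, 1_000_000))
for cpu in cpus:
    print(cpu.name, cpu.total_cycles, float(cpu.local_time))
```

Exact fractions are used for time here only to keep the arithmetic free of floating-point rounding; the list order of cpus plays the role of the CPU order in the machine driver.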

For example:

Let's say that CPU #0 is running at 14MHz, and CPU #1 is running at 2MHz. Let's also say that we're starting at time 0 (local CPU time for both CPUs is 0), and a timer is scheduled to go off in 150 microseconds (time = 0.000150).

The round robin logic will start with CPU #0 and compute how many cycles it needs to execute to reach 0.000150. Since we're starting from time 0, we need to execute for at least 150usec. 0.000150 * 14,000,000 = 2100 cycles. It then calls that CPU's execute function with 2100 cycles; when the execute function returns, it specifies how many cycles it actually ran. Let's say it returns saying that it ran 2112 cycles. (CPU cores generally overshoot because many instructions take more than 1 cycle each to execute.) 2112 cycles puts the local CPU time for CPU #0 at 0.000150857 (2112 / 14,000,000).

Now it's time for CPU #1 to execute. 0.000150 * 2,000,000 = 300 cycles. So we call execute(300), and get back 300 cycles. CPU #1 local time is now 0.000150.

At this point, both CPUs have executed, and both their local times are greater than or equal to the target time 0.000150. So the scheduler calls the timer system to let it process the timers. When finished, it again asks when the next timer will fire. Let's say it's set to fire exactly 150usec later at time = 0.000300.

Back to the scheduler, we start the round robin over again. CPU #0 needs to execute (0.000300 - 0.000150857) * 14,000,000 = 2088 cycles to reach a local time of 0.000300. Note that we took into account the extra cycles that we executed last time. So we call execute(2088), and we get back, say, 2091. That puts our local time at 0.000150857 + 0.000149357 = 0.000300214.

Now it's CPU #1's turn. (0.000300 - 0.000150) * 2,000,000 = 300 cycles again. Calling execute(300), we get back 302 cycles. This puts CPU #1's local time at 0.000150 + 0.000151 = 0.000301.

Again, both CPUs have executed, both their local times are greater than or equal to 0.000300, so we contact the timer system to let it run its timers. This procedure continues throughout the execution of the system.
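The arithmetic in the two timeslices above can be replayed with a short Python sketch. The helper names cycles_to_reach and advance are made up for illustration, and the cycle counts returned by each execute call (2112, 300, 2091, 302) are simply the values assumed in the example.

```python
from fractions import Fraction

USEC = Fraction(1, 1_000_000)
CPU0_HZ = 14_000_000
CPU1_HZ = 2_000_000

def cycles_to_reach(target, local_time, clock_hz):
    """Cycles needed to move a CPU from local_time up to target."""
    return (target - local_time) * clock_hz

def advance(local_time, cycles_ran, clock_hz):
    """New local time after the core reports that it ran cycles_ran cycles."""
    return local_time + Fraction(cycles_ran, clock_hz)

t0 = t1 = Fraction(0)

# First timeslice: timer due at 150 usec.
req0_a = cycles_to_reach(150 * USEC, t0, CPU0_HZ)   # 2100
t0 = advance(t0, 2112, CPU0_HZ)                     # core overshot by 12
req1_a = cycles_to_reach(150 * USEC, t1, CPU1_HZ)   # 300
t1 = advance(t1, 300, CPU1_HZ)

# Second timeslice: timer due at 300 usec.
req0_b = cycles_to_reach(300 * USEC, t0, CPU0_HZ)   # 2088, overshoot deducted
t0 = advance(t0, 2091, CPU0_HZ)
req1_b = cycles_to_reach(300 * USEC, t1, CPU1_HZ)   # 300
t1 = advance(t1, 302, CPU1_HZ)

print(f"{float(t0):.9f}")  # 0.000300214
print(f"{float(t1):.9f}")  # 0.000301000
```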

Some things to note here. First, after the first round robin, CPU #0's local time was slightly ahead of CPU #1's local time. After the second pass, the opposite was true. Thus, you can't guarantee at any time that any given CPU is ahead of or behind the others.

Also, be sure to keep in mind that each CPU has its own local time. The timer system also has a "global time". The global time is generally the minimum of all the CPU local times. Which time is used when calling the timer system depends entirely upon whether a CPU context is currently active and, if so, which one. If a CPU context is active (generally true only while a CPU is executing; for example, in read/write callbacks), then all timer operations treat the "current time" as that CPU's local time, accounting for all cycles that have been executed on that CPU so far in the current timeslice. If a CPU context is not active (all other times; for example, in timer callbacks), then the "current time" is the global time.
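A minimal Python sketch of that rule follows. The names ToyCpu, slice_start, cycles_this_slice, and current_time are hypothetical and chosen only for this illustration; they are not MAME identifiers.

```python
from fractions import Fraction

class ToyCpu:
    """Hypothetical CPU record; field names are illustrative only."""
    def __init__(self, clock_hz):
        self.clock_hz = clock_hz
        self.slice_start = Fraction(0)   # local time when this timeslice began
        self.cycles_this_slice = 0       # cycles executed so far in the slice

    def local_time(self):
        # Includes the cycles already run in the current timeslice.
        return self.slice_start + Fraction(self.cycles_this_slice, self.clock_hz)

def current_time(active_cpu, global_time):
    """Pick the time base the way the paragraph above describes."""
    if active_cpu is not None:
        return active_cpu.local_time()   # e.g. inside a read/write callback
    return global_time                   # e.g. inside a timer callback

cpu = ToyCpu(2_000_000)
cpu.cycles_this_slice = 100              # 100 cycles at 2MHz = 50 usec

in_cpu_context = current_time(cpu, Fraction(0))
in_timer_context = current_time(None, Fraction(0))
print(float(in_cpu_context), float(in_timer_context))
```

In this sketch a timer set from a read/write callback would be measured from 50 usec into the timeslice, while one set from a timer callback would be measured from the global time.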

The global time is updated at the end of each timeslice before the scheduler calls into the timer system to let it run its timers, and that global time is used to dispatch the timers.
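The end-of-timeslice step might be modeled like this in Python. It is a sketch under assumptions: the global time is taken to be exactly the minimum of the local times (the text above says only "generally"), and the timer queue is a plain heap of (expiry, name) pairs invented for this example.

```python
import heapq
from fractions import Fraction

USEC = Fraction(1, 1_000_000)

def update_global_time(local_times):
    # Assumption: global time is exactly the minimum CPU local time.
    return min(local_times)

def dispatch_timers(timer_heap, global_time):
    """Pop and return the names of all timers due at or before global_time."""
    fired = []
    while timer_heap and timer_heap[0][0] <= global_time:
        _expiry, name = heapq.heappop(timer_heap)
        fired.append(name)
    return fired

# Local times after the first timeslice of the example above.
local_times = [Fraction(2112, 14_000_000), Fraction(300, 2_000_000)]

timers = [(150 * USEC, "timer_a"), (300 * USEC, "timer_b")]
heapq.heapify(timers)

global_time = update_global_time(local_times)
fired = dispatch_timers(timers, global_time)
print(float(global_time), fired)
```

Only timer_a is dispatched here, because the global time sits at 150 usec even though CPU #0 has already run slightly past it.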