UML Architecture: Difference between revisions

From MAMEDEV Wiki
Code labels can be branched to via the '''JMP''' instruction, either unconditionally or via one of 16 conditions.


== Opcode Conventions ==


Before diving into the details of the opcodes, it is important to understand some general principles and conventions:


'''Integer Registers'''. There are 10 integer registers, each 64 bits wide. The same registers are used for both 32-bit and 64-bit opcodes; however, unlike many real computer architectures, the upper 32 bits are fully undefined after a 32-bit operation is performed. This means you cannot load a 64-bit value, perform a 32-bit operation, and expect the upper 32 bits to be anything in particular when you are finished.

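
The rule about the upper 32 bits can be illustrated with a small stand-alone model. This is only a sketch: <code>model_add32</code>, <code>low32</code>, and the <code>junk</code> argument are hypothetical names invented here, and <code>junk</code> merely stands in for whatever a back-end happens to leave in the upper half.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit ADD applied to a 64-bit register.
   The "junk" argument stands in for whatever a back-end happens to
   leave in the upper half; UML makes no promise about it. */
static uint64_t model_add32(uint64_t reg, uint32_t value, uint32_t junk)
{
    uint32_t low = (uint32_t)reg + value;   /* only the low half is defined */
    return ((uint64_t)junk << 32) | low;
}

/* Portable consumers look only at the low 32 bits of the result. */
static uint32_t low32(uint64_t reg)
{
    return (uint32_t)reg;
}
```

Code that reads the result through <code>low32</code> behaves the same regardless of the junk value, which is the only safe assumption after a 32-bit operation.
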
'''Conditions'''. Control flow instructions and simple data move instructions support an optional condition, which makes their execution depend on the state of the flags. Each flag can be tested independently for being set or clear. In addition, the usual collection of G/GE/L/LE/A/AE/B/BE conditions is available.
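
As a rough illustration, the compound conditions can be written in terms of the C, V, Z, and S flags. The mapping below follows the common x86-style convention (carry doubles as borrow after a compare); that convention is an assumption here, not something this page states, and all <code>model_</code> names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag snapshot: carry, overflow, zero, sign. */
typedef struct { bool c, v, z, s; } model_flags;

typedef enum
{
    MODEL_COND_A, MODEL_COND_AE, MODEL_COND_B, MODEL_COND_BE,
    MODEL_COND_G, MODEL_COND_GE, MODEL_COND_L, MODEL_COND_LE
} model_cond;

/* Evaluate a compound condition, assuming x86-style flag semantics. */
static bool model_cond_holds(model_cond cond, model_flags f)
{
    switch (cond)
    {
        case MODEL_COND_A:  return !f.c && !f.z;          /* unsigned >  */
        case MODEL_COND_AE: return !f.c;                  /* unsigned >= */
        case MODEL_COND_B:  return f.c;                   /* unsigned <  */
        case MODEL_COND_BE: return f.c || f.z;            /* unsigned <= */
        case MODEL_COND_G:  return !f.z && (f.s == f.v);  /* signed >    */
        case MODEL_COND_GE: return f.s == f.v;            /* signed >=   */
        case MODEL_COND_L:  return f.s != f.v;            /* signed <    */
        case MODEL_COND_LE: return f.z || (f.s != f.v);   /* signed <=   */
    }
    return false;
}
```

For example, after comparing 1 against 2 under this convention (C and S set, Z and V clear), both the B and L conditions hold while A and G do not.
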


== Opcodes ==
 
=== CALLC ===
 
'''Usage:'''
CALLC  ''function'',''parameter''[,''condition'']
 
'''Codegen Shorthand:'''
UML_CALLC(''block'', ''function'', ''parameter'');
UML_CALLCc(''block'', ''condition'', ''function'', ''parameter'');
 
'''Parameters:'''
* ''function'' &mdash; a memory pointer to a C function of the form <code>void cfunction(void *parameter)</code>
* ''parameter'' &mdash; a memory pointer that is passed as the argument to the C function
* ''condition'' &mdash; an optional condition which is used to determine whether or not to execute the call
 
'''Flags:''' undefined
 
'''Description:''' The '''CALLC''' opcode is used to execute a C ''function'' from within the generated code. A single ''parameter'' (evaluated at compile time only) may be passed to the function. More data can be effectively passed by using the parameter to point to a communication block. Upon return from the C function, all registers retain their original values; however, all flags are left in an undefined state.
 
Like most control flow opcodes, '''CALLC''' can be used either unconditionally or flagged with a condition that controls whether or not the target function is called.
 
'''Example:'''
void cfunc_printf_string(void *parameter)
{
    printf("%s", (const char *)parameter);
}
void generate_call_to_print_string(drcuml_block *block)
{
    UML_CALLC(block, cfunc_printf_string, "My test string");
}
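
The "communication block" idea mentioned in the description can be sketched in plain C. The structure layout and the names <code>comm_block</code>, <code>unknown_opcode_comm</code>, and <code>cfunc_record_unknown_opcode</code> are hypothetical; the only part taken from this page is the <code>void cfunction(void *parameter)</code> signature.

```c
#include <stdint.h>

/* Hypothetical communication block shared between generated code and C. */
typedef struct
{
    uint32_t pc;          /* written by generated code before the call */
    uint32_t opcode;      /* written by generated code before the call */
    uint32_t call_count;  /* updated by the C function */
    uint32_t last_pc;     /* updated by the C function */
} comm_block;

/* A static instance whose address is known at code generation time. */
static comm_block unknown_opcode_comm;

/* Matches the CALLC callback form: void cfunction(void *parameter). */
static void cfunc_record_unknown_opcode(void *parameter)
{
    comm_block *comm = (comm_block *)parameter;
    comm->call_count++;
    comm->last_pc = comm->pc;
}
```

In this sketch the generated code would store the live values into <code>unknown_opcode_comm</code> and then issue a '''CALLC''' with <code>&unknown_opcode_comm</code> as the parameter, since the parameter itself is fixed at compile time.
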
 
----
 
=== CALLH ===
 
'''Usage:'''
CALLH  ''handle''[,''condition'']
 
'''Codegen Shorthand:'''
UML_CALLH(''block'', ''handle'');
UML_CALLHc(''block'', ''condition'', ''handle'');
 
'''Parameters:'''
* ''handle'' &mdash; a memory pointer to a previously-allocated code handle
* ''condition'' &mdash; an optional condition which is used to determine whether or not to execute the call
 
'''Flags:''' undefined
 
'''Description:''' The '''CALLH''' opcode makes a subroutine call to the code referenced by the given ''handle''. Note that although the handle must have already been allocated, it is permitted to pass a handle which has not yet been populated with code. The back-end code generator must support both cases, though it will generally output less efficient code if the handle is not yet populated (since it must fetch the address at runtime from the handle).
 
Like most control flow opcodes, '''CALLH''' can be used either unconditionally or flagged with a condition that controls whether or not the target function is called.
 
'''Example:'''
drcuml_codehandle *mysubroutine;
void generate_subroutine_code(drcuml_state *drcuml)
{
    drcuml_block *block;
    jmp_buf errorbuf;
    /* allocate a handle */
    mysubroutine = drcuml_handle_alloc(drcuml, "my_subroutine_name");
    /* handle a fatal error during codegen */
    if (setjmp(errorbuf) != 0)
        fatalerror("Ran out of cache space generating subroutine!");
 
    /* generate a simple subroutine */
    block = drcuml_block_begin(drcuml, 10, &errorbuf);
    UML_HANDLE(block, mysubroutine);
    UML_RET(block);
    /* complete codegen */
    drcuml_block_end(block);
}
void generate_call_to_subroutine(drcuml_block *block)
{
    UML_CALLH(block, mysubroutine);
}
 
----
 
=== COMMENT ===
 
'''Usage:'''
COMMENT  ''string''
 
'''Codegen Shorthand:'''
UML_COMMENT(''block'', ''string'');
 
'''Parameters:'''
* ''string'' &mdash; a memory pointer to a character string
 
'''Flags:''' unaffected
 
'''Description:''' The '''COMMENT''' opcode exists to provide a means of documenting the UML code. Comments are generally ignored completely by the back-end and typically generate no code. However, they do show up in the disassembly of UML code, and may optionally propagate to the disassembly of generated code as well.
 
'''Example:'''
void generate_comment(drcuml_block *block)
{
    UML_COMMENT(block, "This is a comment for disassembly");
}
 
----
 
=== DEBUG ===
 
'''Usage:'''
DEBUG  ''pc''
 
'''Codegen Shorthand:'''
UML_DEBUG(''block'', PTYPE(''pc''));
 
'''Parameters:'''
* ''pc'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
 
'''Flags:''' undefined
 
'''Description:''' The '''DEBUG''' opcode indicates that, if the debugger is present, it should be updated with the provided ''pc'' as the current instruction. To properly support the built-in debugger, this opcode should be present before the execution of each instruction. Before issuing a '''DEBUG''' opcode, be sure to flush any cached state (including the PC) to its target memory location so that the debugger can properly examine it or modify it. Upon completion of this opcode, all registers retain their original values; however, all flags are left in an undefined state.
 
'''Example:'''
offs_t currentpc;
void generate_call_to_debugger(drcuml_block *block)
{
    UML_DEBUG(block, MEM(&currentpc));
}
 
----
 
=== EXH ===
 
'''Usage:'''
EXH    ''handle'',''parameter''[,''condition'']
 
'''Codegen Shorthand:'''
UML_EXH(''block'', ''handle'', PTYPE(''parameter''));
UML_EXHc(''block'', ''condition'', ''handle'', PTYPE(''parameter''));
 
'''Parameters:'''
* ''handle'' &mdash; a memory pointer to a previously-allocated code handle
* ''parameter'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''condition'' &mdash; an optional condition which is used to determine whether or not to generate the exception
 
'''Flags:''' undefined
 
'''Description:''' The '''EXH''' opcode generates an exception, which is effectively a subroutine call with a special parameter. The code to call is referenced by the given ''handle''. As with the '''CALLH''' opcode, the handle must have already been allocated, though again it is permitted to pass a handle which has not yet been populated with code. The back-end code generator must support both cases, though it will generally output less efficient code if the handle is not yet populated (since it must fetch the address at runtime from the handle). The ''parameter'' is stored in a special internal register called EXP and can be retrieved via the '''GETEXP''' opcode.
 
Like most control flow opcodes, '''EXH''' can be used either unconditionally or flagged with a condition that controls whether or not the exception is generated.
 
'''Example:'''
drcuml_codehandle *myexceptionhandler;
void generate_exception_handler(drcuml_state *drcuml)
{
    drcuml_block *block;
    jmp_buf errorbuf;
    /* allocate a handle */
    myexceptionhandler = drcuml_handle_alloc(drcuml, "handle_exception");
    /* handle a fatal error during codegen */
    if (setjmp(errorbuf) != 0)
        fatalerror("Ran out of cache space generating exception handler!");
 
    /* generate a simple exception handler */
    block = drcuml_block_begin(drcuml, 10, &errorbuf);
    UML_HANDLE(block, myexceptionhandler);
    UML_RET(block);
    /* complete codegen */
    drcuml_block_end(block);
}
void generate_exception_if_non_zero(drcuml_block *block, int parameter)
{
    UML_EXHc(block, IF_NZ, myexceptionhandler, IMM(parameter));
}
 
----
 
=== EXIT ===
 
'''Usage:'''
EXIT    ''parameter''[,''condition'']
 
'''Codegen Shorthand:'''
UML_EXIT(''block'', PTYPE(''parameter''));
UML_EXITc(''block'', ''condition'', PTYPE(''parameter''));
 
'''Parameters:'''
* ''parameter'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''condition'' &mdash; an optional condition which is used to determine whether or not to exit
 
'''Flags:''' undefined
 
'''Description:''' The '''EXIT''' opcode immediately exits from the generated code and returns control back to the back-end, which ultimately returns control to the dynamic recompiler. The exit can be performed even from within a subroutine or exception handler. The ''parameter'' is surfaced as the return value of the execute function.
 
Like most control flow opcodes, '''EXIT''' can be used either unconditionally or flagged with a condition that controls whether or not the exit occurs.
 
'''Example:'''
void exit_with_return_code_in_i0(drcuml_block *block)
{
    UML_EXIT(block, IREG(0));
}
 
----
 
=== HANDLE ===
 
'''Usage:'''
HANDLE  ''handle''
 
'''Codegen Shorthand:'''
UML_HANDLE(''block'', ''handle'');
 
'''Parameters:'''
* ''handle'' &mdash; a memory pointer to a previously-allocated code handle
 
'''Flags:''' unaffected
 
'''Description:''' The '''HANDLE''' opcode connects the current code position to the specified ''handle''. The ''handle'' must have been previously explicitly allocated. By definition only one piece of code can connect itself to a handle; attempts to attach a new piece of code to an existing handle are considered errors and will assert in debug builds.
 
'''Example:'''
void attach_code_to_handle(drcuml_block *block, drcuml_codehandle *handle)
{
    UML_HANDLE(block, handle);
}
 
----
 
=== HASH ===
 
'''Usage:'''
HASH    ''mode'', ''pc''
 
'''Codegen Shorthand:'''
UML_HASH(''block'', ''mode'', ''pc'');
 
'''Parameters:'''
* ''mode'' &mdash; a 32-bit immediate value
* ''pc'' &mdash; a 32-bit immediate value
 
'''Flags:''' unaffected
 
'''Description:''' The '''HASH''' opcode connects the current code position to the back-end's hash table. The back-end is free to implement whatever code lookup mechanism it desires, though a standard one is provided by the back-end utilities module. The available number of modes is specified at dynamic recompiler initialization time; it is an error to specify a ''mode'' value beyond the number of modes requested at that time.
 
Note that unlike a handle, a hash for a given ''mode''/''pc'' combination can be replaced at any time.
 
'''Example:'''
void attach_code_to_mode_and_pc(drcuml_block *block, UINT32 mode, UINT32 pc)
{
    UML_HASH(block, mode, pc);
}
 
----
 
=== HASHJMP ===
 
'''Usage:'''
HASHJMP ''mode'',''pc'',''handle''
 
'''Codegen Shorthand:'''
UML_HASHJMP(''block'', PTYPE(''mode''), PTYPE(''pc''), ''handle'');
 
'''Parameters:'''
* ''mode'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''pc'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''handle'' &mdash; a memory pointer to a previously-allocated code handle
 
'''Flags:''' undefined
 
'''Description:''' The '''HASHJMP''' opcode transfers control to a block of code that has previously been connected to the provided ''mode''/''pc'' pair. If no code has been registered, an exception is generated by setting the exception parameter EXP to ''pc'' and calling the given ''handle''. The expected response to this is to return to the dynamic recompiler and request that it generate new code for the provided ''mode''/''pc'' pair.
 
'''Example:'''
drcuml_codehandle *nocode_exception_handler;
UINT32 current_mode;
void jump_to_code_at_pc_in_current_mode(drcuml_block *block, UINT32 pc)
{
    UML_HASHJMP(block, MEM(&current_mode), IMM(pc), nocode_exception_handler);
}
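
The interplay between '''HASH''' and '''HASHJMP''' can be modeled with a toy lookup table. This is a sketch under stated assumptions: the direct-mapped table, the integer "code ids", and every <code>model_</code> name are hypothetical, and a real back-end is free to use a different lookup mechanism.

```c
#include <stdint.h>

#define MODEL_MODES 2    /* number of modes requested at initialization */
#define MODEL_SLOTS 64   /* direct-mapped slots per mode (toy size) */

typedef struct
{
    uint32_t pc[MODEL_MODES][MODEL_SLOTS];
    int      code[MODEL_MODES][MODEL_SLOTS];  /* 0 means "no code registered" */
    uint32_t exp;                             /* models the EXP register */
} hash_model;

static hash_model model_hash_state;           /* zero-initialized */

/* Models HASH: attach code to a mode/pc pair, replacing any previous entry.
   The caller must keep mode below MODEL_MODES. */
static void model_hash(hash_model *h, uint32_t mode, uint32_t pc, int code)
{
    uint32_t slot = pc % MODEL_SLOTS;
    h->pc[mode][slot] = pc;
    h->code[mode][slot] = code;
}

/* Models HASHJMP: return the registered code, or set EXP to pc and
   return the "no code" exception handler instead. */
static int model_hashjmp(hash_model *h, uint32_t mode, uint32_t pc, int nocode_handler)
{
    uint32_t slot = pc % MODEL_SLOTS;
    if (h->code[mode][slot] != 0 && h->pc[mode][slot] == pc)
        return h->code[mode][slot];
    h->exp = pc;
    return nocode_handler;
}
```

A miss leaves the requested ''pc'' in the modeled EXP register, which is what lets the handler ask the dynamic recompiler for new code at that address.
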
 
----
 
=== JMP ===
 
'''Usage:'''
JMP    ''label''[,''condition'']
 
'''Codegen Shorthand:'''
UML_JMP(''block'', ''label'');
UML_JMPc(''block'', ''condition'', ''label'');
 
'''Parameters:'''
* ''label'' &mdash; a 32-bit immediate value
* ''condition'' &mdash; an optional condition which is used to determine whether or not to jump
 
'''Flags:''' unaffected
 
'''Description:''' The '''JMP''' opcode transfers control to another location within the current block. The ''label'' parameter specifies a 32-bit integer label that serves as the jump target. Within the block, a '''LABEL''' opcode must be present to signify what code the ''label'' refers to. In debug builds, attempts to jump to a non-existent label will assert at back-end code generation time (when all the labels are resolved).
 
Like most control flow opcodes, '''JMP''' can be used either unconditionally or flagged with a condition that controls whether or not the jump occurs.
 
'''Example:'''
void conditional_jump_around_code(drcuml_block *block)
{
    drcuml_codelabel curlabel = 1;
    drcuml_codelabel skip;
    UML_JMPc(block, IF_NZ, skip = curlabel++);
    ''<other code here>''
    UML_LABEL(block, skip);
}
 
----
 
=== LABEL ===
 
'''Usage:'''
LABEL  ''label''
 
'''Codegen Shorthand:'''
UML_LABEL(''block'', ''label'');
 
'''Parameters:'''
* ''label'' &mdash; a 32-bit immediate value
 
'''Flags:''' unaffected
 
'''Description:''' The '''LABEL''' opcode connects the current code position to a block-local label with the value of ''label''. Only one label per block of UML can claim the same ''label'' value, so the dynamic recompiler must set up some conventions to ensure that each label has a unique 32-bit identifier.
 
'''Example:'''
void attach_code_to_label(drcuml_block *block, drcuml_codelabel label)
{
    UML_LABEL(block, label);
}
 
----
 
=== MAPVAR ===
 
'''Usage:'''
MAPVAR  ''mapvar'',''value''
 
'''Codegen Shorthand:'''
UML_MAPVAR(''block'', MVAR(''mapvar''), ''value'');
 
'''Parameters:'''
* ''mapvar'' &mdash; a map variable
* ''value'' &mdash; a 32-bit immediate value
 
'''Flags:''' unaffected
 
'''Description:''' The '''MAPVAR''' opcode sets the specified map variable ''mapvar'' to a new value ''value'' starting at the current position within the code. Map variables are encoded into the instruction stream and can be retrieved from within a subroutine or exception handler via the '''RECOVER''' opcode. At the start of a block, all map variables default to 0, and hold that value until a '''MAPVAR''' opcode overrides it, at which point all subsequent code will take on the new ''value''.
 
Note that because the '''RECOVER''' opcode always operates on the outermost return address, it does not make sense to use '''MAPVAR''' within a subroutine.
 
'''Example:'''
void generate_some_code(drcuml_block *block)
{
    UML_MAPVAR(block, MVAR(0), 1);          /* we are in section 1 of the code */
    ''<section 1 code here>''
    UML_MAPVAR(block, MVAR(0), 2);          /* now we are in section 2 of the code */
    ''<section 2 code here>''
}
 
----
 
=== RECOVER ===
 
'''Usage:'''
RECOVER ''dest'',''mapvar''
 
'''Codegen Shorthand:'''
UML_RECOVER(''block'', PTYPE(''dest''), MVAR(''mapvar''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32-bit integer register or memory location
* ''mapvar'' &mdash; a map variable
 
'''Flags:''' undefined
 
'''Description:''' The '''RECOVER''' opcode retrieves the value of the given ''mapvar'' and stores it in ''dest''. It does this by fetching the outermost return address from the stack and using that value to find the value specified by the most recent '''MAPVAR''' opcode that preceded the point where the subroutine call or exception generation happened. Because of this behavior, it only makes sense to use '''RECOVER''' from within a subroutine or exception handler; using it outside of this context will produce undefined results.
 
'''Example:'''
void generate_recover_code_section_to_i1(drcuml_block *block)
{
    UML_RECOVER(block, IREG(1), MVAR(0));
}
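
The lookup that '''RECOVER''' performs can be sketched as a search through a list of '''MAPVAR''' records for one map variable. The record layout and the names <code>mapvar_entry</code> and <code>model_recover</code> are hypothetical; how a real back-end encodes map variables into the instruction stream is not described here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a MAPVAR took effect at this code offset. */
typedef struct
{
    uint32_t code_offset;
    uint32_t value;
} mapvar_entry;

/* Return the value set by the most recent MAPVAR that preceded the call
   whose return address is return_offset. Entries must be sorted by
   ascending code_offset. With no earlier MAPVAR, the default is 0. */
static uint32_t model_recover(const mapvar_entry *entries, size_t count,
                              uint32_t return_offset)
{
    uint32_t value = 0;
    for (size_t i = 0; i < count; i++)
    {
        /* A MAPVAR located exactly at the return address came after the call. */
        if (entries[i].code_offset >= return_offset)
            break;
        value = entries[i].value;
    }
    return value;
}
```

The strict comparison reflects an assumption of this model: a '''MAPVAR''' emits no code of its own, so one placed right after a call shares the call's return address and must not be picked up.
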
 
----
 
=== RET ===
 
'''Usage:'''
RET    [''condition'']
 
'''Codegen Shorthand:'''
UML_RET(''block'');
UML_RETc(''block'', ''condition'');
 
'''Parameters:'''
* ''condition'' &mdash; an optional condition which is used to determine whether or not to return
 
'''Flags:''' undefined
 
'''Description:''' The '''RET''' opcode returns control from a subroutine call or exception handler to the instruction following the '''CALLH''' or '''EXH''' opcode that invoked it.
 
Like most control flow opcodes, '''RET''' can be used either unconditionally or flagged with a condition that controls whether or not the return occurs.
 
'''Example:'''
void generate_return_from_function_if_carry(drcuml_block *block)
{
    UML_RETc(block, IF_C);
}
 
== Internal Register Opcodes ==
 
=== GETEXP ===
 
'''Usage:'''
GETEXP  ''dest''
 
'''Codegen Shorthand:'''
UML_GETEXP(block, PTYPE(''dest''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32-bit integer register or memory location
 
'''Flags:''' undefined
 
'''Description:''' The '''GETEXP''' opcode fetches the current value of the internal EXP (exception parameter) register. The EXP is set equal to the parameter value that was specified by the most recently executed '''EXH''' opcode, or is set equal to the ''pc'' parameter from a '''HASHJMP''' opcode that failed to find any associated code.
 
'''Example:'''
void generate_get_exception_parameter_in_i7(drcuml_block *block)
{
    UML_GETEXP(block, IREG(7));
}
 
----
 
=== GETFLGS ===
 
'''Usage:'''
GETFLGS ''dest'',''mask''
 
'''Codegen Shorthand:'''
UML_GETFLGS(block, PTYPE(''dest''), ''mask'');
 
'''Parameters:'''
* ''dest'' &mdash; a 32-bit integer register or memory location
* ''mask'' &mdash; an immediate mask of the flags to be retrieved
 
'''Flags:''' undefined
 
'''Description:''' The '''GETFLGS''' opcode retrieves the current value of the UML flags and stores them in the target destination. Although all of the flags are available, it is rare that all flags are required, and often more efficient if the back-end only needs to fetch a subset of the flags. The ''mask'' parameter makes it possible to specify exactly which flags are needed. Bits in ''dest'' representing unrequested flags will be set to zero.
 
'''Example:'''
void generate_get_sign_and_zero_flags_in_i0(drcuml_block *block)
{
    UML_GETFLGS(block, IREG(0), DRCUML_FLAG_S | DRCUML_FLAG_Z);
}
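
The effect of the ''mask'' parameter amounts to a bitwise AND, as this small model shows. The bit positions assigned to the flags below are an assumption made for illustration, and the <code>MODEL_FLAG_</code> names are hypothetical stand-ins for the real <code>DRCUML_FLAG_</code> constants.

```c
#include <stdint.h>

/* Hypothetical bit assignments for the five UML flags. */
enum
{
    MODEL_FLAG_C = 0x01,
    MODEL_FLAG_V = 0x02,
    MODEL_FLAG_Z = 0x04,
    MODEL_FLAG_S = 0x08,
    MODEL_FLAG_U = 0x10
};

/* Models GETFLGS: bits for unrequested flags come back as zero. */
static uint32_t model_getflgs(uint32_t live_flags, uint32_t mask)
{
    return live_flags & mask;
}
```

Requesting only S and Z from a state where C, Z, and S are all set yields just the Z and S bits.
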
 
----
 
=== GETFMOD ===
 
'''Usage:'''
GETFMOD ''dest''
 
'''Codegen Shorthand:'''
UML_GETFMOD(block, PTYPE(''dest''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32-bit integer register or memory location
 
'''Flags:''' undefined
 
'''Description:''' The '''GETFMOD''' opcode retrieves the current floating point rounding mode. It will produce one of the following values:
* DRCUML_FMOD_TRUNC (0) means truncate, or round toward zero
* DRCUML_FMOD_ROUND (1) means round to nearest
* DRCUML_FMOD_CEIL (2) means round toward positive infinity
* DRCUML_FMOD_FLOOR (3) means round toward negative infinity
 
'''Example:'''
UINT32 saved_mode;
void generate_save_rounding_mode(drcuml_block *block)
{
    UML_GETFMOD(block, MEM(&saved_mode));
}
 
----
 
=== RESTORE ===
 
'''Usage:'''
RESTORE ''source''
 
'''Codegen Shorthand:'''
UML_RESTORE(block, ''source'');
 
'''Parameters:'''
* ''source'' &mdash; a memory pointer to a <code>drcuml_machine_state</code> structure
 
'''Flags:'''
* C &mdash; set to the value provided in ''source''
* V &mdash; set to the value provided in ''source''
* Z &mdash; set to the value provided in ''source''
* S &mdash; set to the value provided in ''source''
* U &mdash; set to the value provided in ''source''
 
'''Description:''' The '''RESTORE''' opcode copies the provided <code>drcuml_machine_state</code> structure into the live UML machine state.
 
'''Example:'''
void generate_restore_machine_state(drcuml_block *block, drcuml_machine_state *state)
{
    UML_RESTORE(block, state);
}
 
----
 
=== SAVE ===
 
'''Usage:'''
SAVE    ''dest''
 
'''Codegen Shorthand:'''
UML_SAVE(block, ''dest'');
 
'''Parameters:'''
* ''dest'' &mdash; a memory pointer to a <code>drcuml_machine_state</code> structure
 
'''Flags:''' undefined
 
'''Description:''' The '''SAVE''' opcode dumps the current UML machine state to the provided <code>drcuml_machine_state</code> structure. This state may be used for debugging or compliance analysis.
 
'''Example:'''
static drcuml_machine_state state;
void generate_save_machine_state(drcuml_block *block)
{
    UML_SAVE(block, &state);
}
 
----
 
=== SETFMOD ===
 
'''Usage:'''
SETFMOD ''mode''
 
'''Codegen Shorthand:'''
UML_SETFMOD(block, PTYPE(''mode''));
 
'''Parameters:'''
* ''mode'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
 
'''Flags:''' undefined
 
'''Description:''' The '''SETFMOD''' opcode sets the currently active floating point rounding mode, which is implicitly used when performing most floating point operations (apart from those which explicitly specify a mode). The mode can be one of four values:
* DRCUML_FMOD_TRUNC (0) means truncate, or round toward zero
* DRCUML_FMOD_ROUND (1) means round to nearest
* DRCUML_FMOD_CEIL (2) means round toward positive infinity
* DRCUML_FMOD_FLOOR (3) means round toward negative infinity
Only the two least significant bits of the ''mode'' parameter are considered; all other bits are ignored. Note that the floating point mode is forgotten once an '''EXIT''' opcode is executed. A dynamic recompiler that relies on the rounding mode must reset it in its entry point via this opcode.
 
'''Example:'''
void generate_set_fixed_rounding_mode(drcuml_block *block, int mode)
{
    UML_SETFMOD(block, IMM(mode));
}
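
The rule that only the two least significant bits of ''mode'' matter can be expressed directly. The numeric values below come from the list above; the <code>MODEL_FMOD_</code> names and <code>model_setfmod</code> are hypothetical.

```c
#include <stdint.h>

/* Values mirror the four rounding modes listed for SETFMOD. */
typedef enum
{
    MODEL_FMOD_TRUNC = 0,   /* round toward zero */
    MODEL_FMOD_ROUND = 1,   /* round to nearest */
    MODEL_FMOD_CEIL  = 2,   /* round toward positive infinity */
    MODEL_FMOD_FLOOR = 3    /* round toward negative infinity */
} model_fmod;

/* Models the decoding step of SETFMOD: all bits above bit 1 are ignored. */
static model_fmod model_setfmod(uint32_t mode)
{
    return (model_fmod)(mode & 3u);
}
```

Under this model a ''mode'' value of 7 selects the same rounding mode as 3.
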
 
== Integer Operations ==
 
=== LOAD ===
 
'''Usage:'''
LOAD    ''dest'',''base'',''index'',''size''
DLOAD  ''dest'',''base'',''index'',''size''
 
'''Codegen Shorthand:'''
UML_LOAD(block, PTYPE(''dest''), ''base'', PTYPE(''index''), BYTE | WORD | DWORD);
UML_DLOAD(block, PTYPE(''dest''), ''base'', PTYPE(''index''), BYTE | WORD | DWORD | QWORD);
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''base'' &mdash; a memory pointer to the base of the table to read from
* ''index'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''size'' &mdash; the size of the memory to access; can be 1, 2, or 4 (or 8 for the 64-bit form)
 
'''Flags:''' undefined
 
'''Description:''' The '''LOAD''' opcode performs a table-lookup memory read to a 32-bit destination; the '''DLOAD''' opcode does the same to a 64-bit destination. If the ''size'' specified is smaller than the destination, the result is zero-extended before being stored.
 
Unlike a standard memory location parameter (which must reside in the near cache), the ''base'' parameter may point anywhere in memory. Furthermore, the ''index'' parameter is truly an index and not a byte offset; thus the final address read will be ''base'' + (''size'' x ''index'').
 
'''Example:'''
static const UINT16 lookup_table[] = { 0, 1, 4, 5, 9, 10 };
void generate_lookup_index_i2_to_i0(drcuml_block *block)
{
    UML_LOAD(block, IREG(0), lookup_table, IREG(2), WORD);
}
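
The address computation and zero-extension can be checked with a stand-alone model. <code>model_load</code> and <code>model_lookup_table</code> are hypothetical names; the function simply reads ''size'' bytes at ''base'' + (''size'' x ''index'') in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Models LOAD/DLOAD: read "size" bytes at base + size * index and
   zero-extend the result. Unsupported sizes yield 0 in this sketch. */
static uint64_t model_load(const void *base, uint64_t index, unsigned size)
{
    const unsigned char *address = (const unsigned char *)base + (size_t)size * (size_t)index;
    uint8_t b;
    uint16_t w;
    uint32_t d;
    uint64_t q;

    switch (size)
    {
        case 1: memcpy(&b, address, 1); return b;
        case 2: memcpy(&w, address, 2); return w;
        case 4: memcpy(&d, address, 4); return d;
        case 8: memcpy(&q, address, 8); return q;
    }
    return 0;
}

/* Same table as in the example above. */
static const uint16_t model_lookup_table[] = { 0, 1, 4, 5, 9, 10 };
static const uint8_t model_high_bytes[] = { 0x80, 0xff };
```

With a ''size'' of 2, index 4 selects the fifth table entry (the value 9), not the byte at offset 4.
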
 
----
 
=== LOADS ===
 
'''Usage:'''
LOADS  ''dest'',''base'',''index'',''size''
DLOADS  ''dest'',''base'',''index'',''size''
 
'''Codegen Shorthand:'''
UML_LOADS(block, PTYPE(''dest''), ''base'', PTYPE(''index''), BYTE | WORD | DWORD);
UML_DLOADS(block, PTYPE(''dest''), ''base'', PTYPE(''index''), BYTE | WORD | DWORD | QWORD);
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''base'' &mdash; a memory pointer to the base of the table to read from
* ''index'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''size'' &mdash; the size of the memory to access; can be 1, 2, or 4 (or 8 for the 64-bit form)
 
'''Flags:''' undefined
 
'''Description:''' The '''LOADS''' opcode performs a table-lookup memory read to a 32-bit destination; the '''DLOADS''' opcode does the same to a 64-bit destination. If the ''size'' specified is smaller than the destination, the result is sign-extended before being stored.
 
Unlike a standard memory location parameter (which must reside in the near cache), the ''base'' parameter may point anywhere in memory. Furthermore, the ''index'' parameter is truly an index and not a byte offset; thus the final address read will be ''base'' + (''size'' x ''index'').
 
'''Example:'''
static const INT8 my_signed_byte_array[100];
void generate_read_byte_from_array_index_i2(drcuml_block *block)
{
    UML_LOADS(block, IREG(0), my_signed_byte_array, IREG(2), BYTE);
}
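
The difference from '''LOAD''' is only in how narrow values are widened, which the following model makes visible. <code>model_loads</code> and <code>model_signed_bytes</code> are hypothetical names; the function returns a sign-extended 64-bit value, of which the 32-bit form would keep the low half.

```c
#include <stdint.h>
#include <string.h>

/* Models LOADS/DLOADS: read "size" bytes at base + size * index and
   sign-extend the result. Unsupported sizes yield 0 in this sketch. */
static int64_t model_loads(const void *base, uint64_t index, unsigned size)
{
    const unsigned char *address = (const unsigned char *)base + (size_t)size * (size_t)index;
    int8_t b;
    int16_t w;
    int32_t d;
    int64_t q;

    switch (size)
    {
        case 1: memcpy(&b, address, 1); return b;
        case 2: memcpy(&w, address, 2); return w;
        case 4: memcpy(&d, address, 4); return d;
        case 8: memcpy(&q, address, 8); return q;
    }
    return 0;
}

static const int8_t model_signed_bytes[] = { -2, 100 };
```

A byte holding -2 becomes 0xfffffffe in a 32-bit destination and 0xfffffffffffffffe in a 64-bit one.
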
 
----
 
=== STORE ===
 
'''Usage:'''
STORE  ''base'',''index'',''source'',''size''
DSTORE  ''base'',''index'',''source'',''size''
 
'''Codegen Shorthand:'''
UML_STORE(block, ''base'', PTYPE(''index''), PTYPE(''source''), BYTE | WORD | DWORD);
UML_DSTORE(block, ''base'', PTYPE(''index''), PTYPE(''source''), BYTE | WORD | DWORD | QWORD);
 
'''Parameters:'''
* ''base'' &mdash; a memory pointer to the base of the table to store to
* ''index'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''size'' &mdash; the size of the memory to access; can be 1, 2, or 4 (or 8 for the 64-bit form)
 
'''Flags:''' undefined
 
'''Description:''' The '''STORE''' opcode performs a table-lookup memory write from a 32-bit source; the '''DSTORE''' opcode does the same from a 64-bit source. Note that because the low 32 bits are the same regardless of whether the source is 32-bit or 64-bit, STORE and DSTORE are identical when ''size'' is 1, 2, or 4.
 
Unlike a standard memory location parameter (which must reside in the near cache), the ''base'' parameter may point anywhere in memory. Furthermore, the ''index'' parameter is truly an index and not a byte offset; thus the final address written will be ''base'' + (''size'' x ''index'').
 
'''Example:'''
static INT32 register_array[32];
void generate_store_i0_to_register_array_index_i9(drcuml_block *block)
{
    UML_STORE(block, register_array, IREG(9), IREG(0), DWORD);
}
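
The store side can be modeled the same way. <code>model_store</code> and <code>model_register_array</code> are hypothetical names; the function writes the low ''size'' bytes of the source at ''base'' + (''size'' x ''index'').

```c
#include <stdint.h>
#include <string.h>

/* Models STORE/DSTORE: write the low "size" bytes of source to
   base + size * index. Unsupported sizes are ignored in this sketch. */
static void model_store(void *base, uint64_t index, uint64_t source, unsigned size)
{
    unsigned char *address = (unsigned char *)base + (size_t)size * (size_t)index;
    uint8_t b = (uint8_t)source;
    uint16_t w = (uint16_t)source;
    uint32_t d = (uint32_t)source;

    switch (size)
    {
        case 1: memcpy(address, &b, 1); break;
        case 2: memcpy(address, &w, 2); break;
        case 4: memcpy(address, &d, 4); break;
        case 8: memcpy(address, &source, 8); break;
    }
}

/* The destination table must be writable, so it is not const. */
static int32_t model_register_array[32];
```

Storing a 64-bit source with a ''size'' of 4 writes only its low 32 bits, which is why the 32-bit and 64-bit forms agree for sizes 1, 2, and 4.
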
 
----
 
=== READ ===
 
'''Usage:'''
READ    ''dest'',''address'',''space-size''
DREAD  ''dest'',''address'',''space-size''
 
'''Codegen Shorthand:'''
UML_READ(block, PTYPE(''dest''), PTYPE(''address''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD));
UML_DREAD(block, PTYPE(''dest''), PTYPE(''address''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD | QWORD));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''address'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''space-size'' &mdash; the address space and size of the memory to access; can be one of these values:
** PROGRAM_BYTE/DATA_BYTE/IO_BYTE &mdash; a 1-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_WORD/DATA_WORD/IO_WORD &mdash; a 2-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_DWORD/DATA_DWORD/IO_DWORD &mdash; a 4-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_QWORD/DATA_QWORD/IO_QWORD &mdash; an 8-byte value from the PROGRAM/DATA/IO address space
 
'''Flags:''' undefined
 
'''Description:''' The '''READ''' opcode performs a read from the emulated CPU's memory system to a 32-bit destination; the '''DREAD''' opcode does the same to a 64-bit destination. If the number of bytes specified by the ''space-size'' parameter is smaller than the destination, the result is zero-extended before being stored.
 
Note that even in its 64-bit form, the '''DREAD''' opcode still takes a fixed 32-bit ''address'' parameter.
 
'''Example:'''
void generate_load_qword_from_program_space_address_i0(drcuml_block *block)
{
    UML_DREAD(block, IREG(0), IREG(0), PROGRAM_QWORD);
}
 
----
 
=== READS ===
 
'''Usage:'''
READS  ''dest'',''address'',''space-size''
DREADS  ''dest'',''address'',''space-size''
 
'''Codegen Shorthand:'''
UML_READS(block, PTYPE(''dest''), PTYPE(''address''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD));
UML_DREADS(block, PTYPE(''dest''), PTYPE(''address''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD | QWORD));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''address'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''space-size'' &mdash; the address space and size of the memory to access; can be one of these values:
** PROGRAM_BYTE/DATA_BYTE/IO_BYTE &mdash; a 1-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_WORD/DATA_WORD/IO_WORD &mdash; a 2-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_DWORD/DATA_DWORD/IO_DWORD &mdash; a 4-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_QWORD/DATA_QWORD/IO_QWORD &mdash; an 8-byte value from the PROGRAM/DATA/IO address space
 
'''Flags:''' undefined
 
'''Description:''' The '''READS''' opcode performs a read from the emulated CPU's memory system to a 32-bit destination; the '''DREADS''' opcode does the same to a 64-bit destination. If the number of bytes specified by the ''space-size'' parameter is smaller than the destination, the result is sign-extended before being stored.
 
Note that even in its 64-bit form, the '''DREADS''' opcode still takes a fixed 32-bit ''address'' parameter.
 
'''Example:'''
void generate_load_signed_word_from_data_space_address_i0(drcuml_block *block)
{
    UML_READS(block, IREG(0), IREG(0), DATA_WORD);
}
 
----
 
=== READM ===
 
'''Usage:'''
READM  ''dest'',''address'',''mask'',''space-size''
DREADM  ''dest'',''address'',''mask'',''space-size''
 
'''Codegen Shorthand:'''
UML_READM(block, PTYPE(''dest''), PTYPE(''address''), PTYPE(''mask''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD));
UML_DREADM(block, PTYPE(''dest''), PTYPE(''address''), PTYPE(''mask''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD | QWORD));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''address'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''mask'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''space-size'' &mdash; the address space and size of the memory to access; can be one of these values:
** PROGRAM_BYTE/DATA_BYTE/IO_BYTE &mdash; a 1-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_WORD/DATA_WORD/IO_WORD &mdash; a 2-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_DWORD/DATA_DWORD/IO_DWORD &mdash; a 4-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_QWORD/DATA_QWORD/IO_QWORD &mdash; an 8-byte value from the PROGRAM/DATA/IO address space
 
'''Flags:''' undefined
 
'''Description:''' The '''READM''' opcode performs a masked read from the emulated CPU's memory system to a 32-bit destination; the '''DREADM''' opcode does the same to a 64-bit destination. These opcodes are similar to the '''READ''' and '''DREAD''' opcodes described above, with the exception that the additional parameter ''mask'' specifies which bytes within the larger access should be referenced. As with '''READ''' and '''DREAD''', these opcodes zero-extend the result if it is smaller than the destination size.
 
'''Example:'''
void generate_load_upper_or_lower_word_from_i0(drcuml_block *block, int upper)
{
    /* big-endian: the upper word occupies the high 16 bits of the dword */
    if (upper)
        UML_READM(block, IREG(0), IREG(0), IMM(0xffff0000), PROGRAM_DWORD);
    else
        UML_READM(block, IREG(0), IREG(0), IMM(0x0000ffff), PROGRAM_DWORD);
}
 
----
 
=== WRITE ===
 
'''Usage:'''
WRITE  ''address'',''source'',''space-size''
DWRITE  ''address'',''source'',''space-size''
 
'''Codegen Shorthand:'''
UML_WRITE(block, PTYPE(''address''), PTYPE(''source''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD));
UML_DWRITE(block, PTYPE(''address''), PTYPE(''source''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD | QWORD));
 
'''Parameters:'''
* ''address'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''space-size'' &mdash; the address space and size of the memory to access; can be one of these values:
** PROGRAM_BYTE/DATA_BYTE/IO_BYTE &mdash; a 1-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_WORD/DATA_WORD/IO_WORD &mdash; a 2-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_DWORD/DATA_DWORD/IO_DWORD &mdash; a 4-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_QWORD/DATA_QWORD/IO_QWORD &mdash; an 8-byte value from the PROGRAM/DATA/IO address space
 
'''Flags:''' undefined
 
'''Description:''' The '''WRITE''' opcode performs a write to the emulated CPU's memory system from a 32-bit source; the '''DWRITE''' opcode does the same from a 64-bit source. Note that because the low 32 bits are the same regardless of whether the source is 32-bit or 64-bit, '''WRITE''' and '''DWRITE''' are identical when the access size given by ''space-size'' is 1, 2, or 4 bytes.
 
Note that even in its 64-bit form, the '''DWRITE''' opcode still takes a 32-bit ''address'' parameter.
 
'''Example:'''
void generate_write_memory_to_byte(drcuml_block *block, UINT32 *memory)
{
    UML_WRITE(block, IREG(0), MEM(memory), PROGRAM_BYTE);
}
 
----
 
=== WRITEM ===
 
'''Usage:'''
WRITEM  ''address'',''source'',''mask'',''space-size''
DWRITEM ''address'',''source'',''mask'',''space-size''
 
'''Codegen Shorthand:'''
UML_WRITEM(block, PTYPE(''address''), PTYPE(''source''), PTYPE(''mask''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD));
UML_DWRITEM(block, PTYPE(''address''), PTYPE(''source''), PTYPE(''mask''), (PROGRAM_ | DATA_ | IO_) ## (BYTE | WORD | DWORD | QWORD));
 
'''Parameters:'''
* ''address'' &mdash; a 32-bit integer register, memory location, map variable, or immediate
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''mask'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''space-size'' &mdash; the address space and size of the memory to access; can be one of these values:
** PROGRAM_BYTE/DATA_BYTE/IO_BYTE &mdash; a 1-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_WORD/DATA_WORD/IO_WORD &mdash; a 2-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_DWORD/DATA_DWORD/IO_DWORD &mdash; a 4-byte value from the PROGRAM/DATA/IO address space
** PROGRAM_QWORD/DATA_QWORD/IO_QWORD &mdash; an 8-byte value from the PROGRAM/DATA/IO address space
 
'''Flags:''' undefined
 
'''Description:''' The '''WRITEM''' opcode performs a masked write to the emulated CPU's memory system from a 32-bit source; the '''DWRITEM''' opcode does the same from a 64-bit source. These opcodes are similar to the '''WRITE''' and '''DWRITE''' opcodes described above, with the exception that the additional parameter ''mask'' specifies which bytes within the larger access should be written.
 
'''Example:'''
void generate_write_store_masked_byte(drcuml_block *block, UINT8 byte, UINT32 mask)
{
    UML_WRITEM(block, IREG(0), IMM(byte), IMM(mask), PROGRAM_DWORD);
}
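
The effect of the ''mask'' can be pictured with a plain C model. This is only a sketch under the assumption that the memory system merges the masked bits of the source into the existing contents; <code>model_masked_write32</code> is a hypothetical helper, not part of the DRC, and the real behavior depends on the memory handler that receives the mask.

```c
#include <stdint.h>

/* Hypothetical model of a masked 32-bit write: bits set in mask are taken
   from source, all other bits keep the old memory contents. */
static uint32_t model_masked_write32(uint32_t old, uint32_t source, uint32_t mask)
{
    return (old & ~mask) | (source & mask);
}
```

For instance, writing 0xAABBCCDD over 0x11223344 with a mask of 0x0000ff00 changes only the second-lowest byte in this model.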
 
----
 
=== CARRY ===
 
'''Usage:'''
CARRY  ''source'',''bitnum''
DCARRY  ''source'',''bitnum''
 
'''Codegen Shorthand:'''
UML_CARRY(block, PTYPE(''source''), PTYPE(''bitnum''));
UML_DCARRY(block, PTYPE(''source''), PTYPE(''bitnum''));
 
'''Parameters:'''
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''bitnum'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set to the value of the specified ''bitnum'' of the ''source'' operand
* V &mdash; undefined
* Z &mdash; undefined
* S &mdash; undefined
* U &mdash; undefined
 
'''Description:''' The '''CARRY''' opcode is used to seed the live carry flag prior to performing an '''ADDC''' or '''SUBC''' operation. The value of the carry flag is set based on the value of the specified ''bitnum'' of the ''source'' operand; all other flags are undefined. For the 32-bit form, only the low 5 bits of ''bitnum'' are considered; for the 64-bit form, only the low 6 bits of ''bitnum'' are considered.
 
'''Example:'''
#define CARRYBIT 10
#define CARRYMASK (1 << CARRYBIT)
void generate_add_with_carry(drcuml_block *block)
{
    UML_CARRY(block, MEM(&flagsregister), IMM(CARRYBIT));
    UML_ADDC(block, IREG(0), IREG(1), IREG(2));
}
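
The bit selection described above can be sketched in plain C. The helpers <code>model_carry32</code> and <code>model_carry64</code> are hypothetical names used only for illustration; they return the value the carry flag would take.

```c
#include <stdint.h>

/* Model of CARRY: only the low 5 bits of bitnum select the bit. */
static int model_carry32(uint32_t source, uint32_t bitnum)
{
    return (int)((source >> (bitnum & 31)) & 1);
}

/* Model of DCARRY: only the low 6 bits of bitnum select the bit. */
static int model_carry64(uint64_t source, uint64_t bitnum)
{
    return (int)((source >> (bitnum & 63)) & 1);
}
```

Note that in the 32-bit form a ''bitnum'' of 42 selects the same bit as a ''bitnum'' of 10, because only the low 5 bits are considered.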
 
----
 
=== SET ===
 
'''Usage:'''
SET    ''dest'',''condition''
DSET    ''dest'',''condition''
 
'''Codegen Shorthand:'''
UML_SET(block, ''condition'', PTYPE(''dest''));
UML_DSET(block, ''condition'', PTYPE(''dest''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''condition'' &mdash; the condition whose value determines the result of the SET operation
 
'''Flags:''' undefined
 
'''Description:''' The '''SET''' opcode tests the provided ''condition'' and sets the ''dest'' operand to 1 if the condition is true or 0 if it is false; the '''DSET''' opcode does the same with a 64-bit operand.
 
'''Example:'''
void generate_set_i0_if_i1_equals_0(drcuml_block *block)
{
    UML_CMP(block, IREG(1), IMM(0));
    UML_SET(block, IF_E, IREG(0));
}
 
----
 
=== MOV ===
 
'''Usage:'''
MOV    ''dest'',''source''[,''condition'']
DMOV    ''dest'',''source''[,''condition'']
 
'''Codegen Shorthand:'''
UML_MOV(block, PTYPE(''dest''), PTYPE(''source''));
UML_MOVc(block, ''condition'', PTYPE(''dest''), PTYPE(''source''));
UML_DMOV(block, PTYPE(''dest''), PTYPE(''source''));
UML_DMOVc(block, ''condition'', PTYPE(''dest''), PTYPE(''source''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''condition'' &mdash; an optional condition which is used to determine whether or not to perform the move
 
'''Flags:''' unaffected
 
'''Description:''' The '''MOV''' opcode transfers a 32-bit value from the ''source'' operand to the ''dest'' operand; the '''DMOV''' opcode does the same with 64-bit values. An optional ''condition'' can be provided which makes the move operation dependent on the condition being true. Note that the flags are defined to remain unaffected here; '''MOV''' is one of the very few opcodes that can be reliably used between a flag-changing opcode and a flag-consuming opcode.
 
'''Example:'''
void generate_swap_two_values(drcuml_block *block, UINT64 *val1, UINT64 *val2)
{
    UML_DMOV(block, IREG(0), MEM(val1));
    UML_DMOV(block, MEM(val1), MEM(val2));
    UML_DMOV(block, MEM(val2), IREG(0));
}
 
----
 
=== SEXT ===
 
'''Usage:'''
SEXT    ''dest'',''source'',''size''
DSEXT  ''dest'',''source'',''size''
 
'''Codegen Shorthand:'''
UML_SEXT(block, PTYPE(''dest''), PTYPE(''source''), BYTE | WORD);
UML_DSEXT(block, PTYPE(''dest''), PTYPE(''source''), BYTE | WORD | DWORD);
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''size'' &mdash; the size of the source operand; can be 1 or 2 (or 4 for the 64-bit form)
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''SEXT''' opcode sign-extends an 8-bit or 16-bit ''source'' operand to 32 bits and stores it in the ''dest'' operand; the '''DSEXT''' opcode sign-extends 8-bit, 16-bit, or 32-bit ''source'' operands into a 64-bit ''dest''.
 
'''Example:'''
void generate_add_16bit_i0_to_32bit_i1(drcuml_block *block)
{
    UML_SEXT(block, IREG(2), IREG(0), WORD);
    UML_ADD(block, IREG(1), IREG(1), IREG(2));
}
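
As a rough illustration of the extension and the Z and S flags, here is a plain C model of the 32-bit form. The type <code>sext_result</code> and the function <code>model_sext32</code> are hypothetical, and the ''size'' argument is taken as a byte count (1 or 2) as in the parameter list above.

```c
#include <stdint.h>

typedef struct
{
    uint32_t value;   /* sign-extended result */
    int z;            /* set if value is 0 */
    int s;            /* set if the high bit of value is set */
} sext_result;

/* Model of SEXT: sign-extend the low 1 or 2 bytes of source to 32 bits. */
static sext_result model_sext32(uint32_t source, int size)
{
    sext_result r;
    if (size == 1)
        r.value = ((source & 0xffu) ^ 0x80u) - 0x80u;
    else
        r.value = ((source & 0xffffu) ^ 0x8000u) - 0x8000u;
    r.z = (r.value == 0);
    r.s = (int)(r.value >> 31);
    return r;
}
```

Bits of ''source'' above the selected size are ignored, so a source of 0x1234ff80 with a size of 1 yields 0xffffff80 in this model.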
 
----
 
=== ROLAND ===
 
'''Usage:'''
ROLAND  ''dest'',''source'',''rotate'',''mask''
DROLAND ''dest'',''source'',''rotate'',''mask''
 
'''Codegen Shorthand:'''
UML_ROLAND(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''rotate''), PTYPE(''mask''));
UML_DROLAND(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''rotate''), PTYPE(''mask''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''rotate'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''mask'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ROLAND''' opcode rotates the 32-bit ''source'' operand left by the number of bits specified in ''rotate'' and then ANDs the result with ''mask''; the '''DROLAND''' opcode performs the same operation on 64-bit operands. For the 32-bit form, only the low 5 bits of ''rotate'' are considered; for the 64-bit form, only the low 6 bits of ''rotate'' are considered.
 
'''Example:'''
void generate_extract_bytes(drcuml_block *block)
{
    /* extract from i0; high byte goes to i1, low byte goes to i4 */
    UML_ROLAND(block, IREG(1), IREG(0), IMM(8), IMM(0xff));
    UML_ROLAND(block, IREG(2), IREG(0), IMM(16), IMM(0xff));
    UML_ROLAND(block, IREG(3), IREG(0), IMM(24), IMM(0xff));
    UML_ROLAND(block, IREG(4), IREG(0), IMM(0), IMM(0xff));
}
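
The rotate-then-mask behavior can be checked with a small C model. The helpers <code>rol32</code> and <code>model_roland32</code> are hypothetical names for this sketch and are not part of the DRC.

```c
#include <stdint.h>

/* Rotate left; only the low 5 bits of n are considered. */
static uint32_t rol32(uint32_t v, uint32_t n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Model of ROLAND: rotate source left, then AND with mask. */
static uint32_t model_roland32(uint32_t source, uint32_t rotate, uint32_t mask)
{
    return rol32(source, rotate) & mask;
}
```

With a source of 0x12345678 and a mask of 0xff, rotate amounts of 8, 16, 24, and 0 produce 0x12, 0x34, 0x56, and 0x78, which mirrors the byte extraction in the example above.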
 
----
 
=== ROLINS ===
 
'''Usage:'''
ROLINS  ''dest'',''source'',''rotate'',''mask''
DROLINS ''dest'',''source'',''rotate'',''mask''
 
'''Codegen Shorthand:'''
UML_ROLINS(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''rotate''), PTYPE(''mask''));
UML_DROLINS(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''rotate''), PTYPE(''mask''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''rotate'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''mask'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ROLINS''' opcode rotates the 32-bit ''source'' operand left by the number of bits specified in ''rotate'' and then inserts a subset of bits into ''dest'' under control of the ''mask''; the '''DROLINS''' opcode performs the same operation on 64-bit operands. For the 32-bit form, only the low 5 bits of ''rotate'' are considered; for the 64-bit form, only the low 6 bits of ''rotate'' are considered.
 
'''Example:'''
void generate_copy_bit(drcuml_block *block, int srcbitnum, int dstbitnum)
{
    /* copy a bit from srcbitnum to dstbitnum within the i0 register */
    UML_DROLINS(block, IREG(0), IREG(0), IMM((dstbitnum - srcbitnum) & 63), IMM(U64(1) << dstbitnum));
}
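
The insertion step can be modeled in plain C as follows. This sketch assumes that "under control of the ''mask''" means bits set in ''mask'' are replaced by the rotated source while the remaining bits of ''dest'' are preserved; <code>rol64</code> and <code>model_rolins64</code> are hypothetical helpers.

```c
#include <stdint.h>

/* Rotate left; only the low 6 bits of n are considered. */
static uint64_t rol64(uint64_t v, uint64_t n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Model of DROLINS: rotate source, then merge it into dest under mask. */
static uint64_t model_rolins64(uint64_t dest, uint64_t source, uint64_t rotate, uint64_t mask)
{
    return (dest & ~mask) | (rol64(source, rotate) & mask);
}
```

Copying bit 2 of a value to bit 5, as the example above does, corresponds to a rotate of 3 and a mask of 1 << 5; copying in the other direction uses a rotate of 61.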
 
----
 
=== ADD ===
 
'''Usage:'''
ADD    ''dest'',''source1'',''source2''
DADD    ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_ADD(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DADD(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the addition results in an unsigned overflow
* V &mdash; set if the addition results in a signed overflow
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ADD''' opcode performs addition between the 32-bit ''source1'' and ''source2'' operands and stores the result in ''dest''; the '''DADD''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_get_cycles_plus_10_in_i0(drcuml_block *block, UINT32 *cycles)
{
    UML_ADD(block, IREG(0), MEM(cycles), IMM(10));
}
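
The flag rules listed above can be made concrete with a plain C model of the 32-bit form. The type <code>add_result</code> and the function <code>model_add32</code> are hypothetical names for this sketch; the U flag is left out because it is undefined.

```c
#include <stdint.h>

typedef struct
{
    uint32_t value;   /* low 32 bits of the sum */
    int c, v, z, s;   /* resulting flags */
} add_result;

/* Model of ADD and the flags it defines. */
static add_result model_add32(uint32_t a, uint32_t b)
{
    add_result r;
    r.value = a + b;
    r.c = (r.value < a);                                  /* unsigned overflow */
    r.v = (int)(((a ^ r.value) & (b ^ r.value)) >> 31);   /* signed overflow */
    r.z = (r.value == 0);
    r.s = (int)(r.value >> 31);
    return r;
}
```

Adding 1 to 0xffffffff sets C and Z but not V, while adding 1 to 0x7fffffff sets V and S but not C.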
 
----
 
=== ADDC ===
 
'''Usage:'''
ADDC    ''dest'',''source1'',''source2''
DADDC  ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_ADDC(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DADDC(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the addition results in an unsigned overflow
* V &mdash; set if the addition results in a signed overflow
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ADDC''' opcode performs a three-way addition between the 32-bit ''source1'' and ''source2'' operands and the carry flag (0 or 1) and stores the result in ''dest''; the '''DADDC''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_add_with_carry(drcuml_block *block)
{
    UML_CARRY(block, MEM(flags), IMM(0));
    UML_ADDC(block, IREG(0), IREG(1), IREG(2));
}
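
One common use of a three-way addition is chaining narrow adds into a wider one. The following plain C sketch models that idea; <code>addc_result</code>, <code>model_addc32</code>, and <code>add64_via_halves</code> are hypothetical names, and the code only imitates the arithmetic rather than emitting UML.

```c
#include <stdint.h>

typedef struct
{
    uint32_t value;   /* low 32 bits of the sum */
    int c;            /* carry out */
} addc_result;

/* Model of ADDC: source1 + source2 + carry flag. */
static addc_result model_addc32(uint32_t a, uint32_t b, int carry_in)
{
    addc_result r;
    uint64_t wide = (uint64_t)a + (uint64_t)b + (uint64_t)(carry_in & 1);
    r.value = (uint32_t)wide;
    r.c = (int)(wide >> 32);
    return r;
}

/* Build a 64-bit add from two 32-bit steps linked by the carry. */
static uint64_t add64_via_halves(uint64_t x, uint64_t y)
{
    addc_result lo = model_addc32((uint32_t)x, (uint32_t)y, 0);
    addc_result hi = model_addc32((uint32_t)(x >> 32), (uint32_t)(y >> 32), lo.c);
    return ((uint64_t)hi.value << 32) | lo.value;
}
```

The carry produced by the low half feeds the high half, so the two-step result matches an ordinary 64-bit addition modulo 2^64.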
 
----
 
=== SUB ===
 
'''Usage:'''
SUB    ''dest'',''source1'',''source2''
DSUB    ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_SUB(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DSUB(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the subtraction results in an unsigned overflow
* V &mdash; set if the subtraction results in a signed overflow
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''SUB''' opcode subtracts the 32-bit ''source2'' operand from the ''source1'' operand and stores the result in ''dest''; the '''DSUB''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_subtract_accumulated_cycles(drcuml_block *block, UINT32 *cycles)
{
    UML_SUB(block, MEM(cycles), MEM(cycles), MVAR(10));
}
 
----
 
=== SUBC ===
 
'''Usage:'''
SUBC    ''dest'',''source1'',''source2''
DSUBC  ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_SUBC(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DSUBC(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the subtraction results in an unsigned overflow
* V &mdash; set if the subtraction results in a signed overflow
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''SUBC''' opcode subtracts the 32-bit ''source2'' operand and the carry flag (0 or 1) from the ''source1'' operand and stores the result in ''dest''; the '''DSUBC''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_sub_with_carry(drcuml_block *block)
{
    UML_CARRY(block, MEM(flags), IMM(0));
    UML_SUBC(block, IREG(0), IREG(1), IREG(2));
}
 
----
 
=== CMP ===
 
'''Usage:'''
CMP    ''source1'',''source2''
DCMP    ''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_CMP(block, PTYPE(''source1''), PTYPE(''source2''));
UML_DCMP(block, PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the compare results in an unsigned overflow
* V &mdash; set if the compare results in a signed overflow
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''CMP''' opcode subtracts the 32-bit ''source2'' operand from the ''source1'' operand and throws away the result, keeping only the flags; the '''DCMP''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_branch_if_under_16_unsigned(drcuml_block *block, UINT32 *param, drcuml_codelabel target)
{
    UML_CMP(block, MEM(param), IMM(16));
    UML_JMPc(block, IF_B, target);
}
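
To see how the flags produced by a compare feed a later condition, consider this plain C model. It assumes that the B condition tests the C flag and that the L condition tests S != V, following the usual x86-style convention; this page does not spell those mappings out, so treat them as assumptions. The names <code>cmp_flags</code>, <code>model_cmp32</code>, <code>cond_b</code>, and <code>cond_l</code> are hypothetical.

```c
#include <stdint.h>

typedef struct
{
    int c, v, z, s;
} cmp_flags;

/* Model of CMP: compute source1 - source2 and keep only the flags. */
static cmp_flags model_cmp32(uint32_t a, uint32_t b)
{
    cmp_flags f;
    uint32_t diff = a - b;
    f.c = (a < b);                                  /* unsigned borrow */
    f.v = (int)(((a ^ b) & (a ^ diff)) >> 31);      /* signed overflow */
    f.z = (diff == 0);
    f.s = (int)(diff >> 31);
    return f;
}

/* Assumed mapping: unsigned "below". */
static int cond_b(cmp_flags f) { return f.c; }

/* Assumed mapping: signed "less than". */
static int cond_l(cmp_flags f) { return f.s != f.v; }
```

Under these assumptions 0xffffffff is not below 16 as an unsigned value, but it is less than 16 when both are treated as signed.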
 
----
 
=== MULU ===
 
'''Usage:'''
MULU    ''dest1'',''dest2'',''source1'',''source2''
DMULU  ''dest1'',''dest2'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_MULU(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
UML_DMULU(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest1'' &mdash; a 32/64-bit integer register or memory location
* ''dest2'' &mdash; a 32/64-bit integer register or memory location; can be the same as ''dest1''
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; set if the upper half of the resulting value is not 0
* Z &mdash; set if the resulting value is 0 (only lower half considered if ''dest1'' == ''dest2'')
* S &mdash; set if the high bit of resulting value is set (only lower half considered if ''dest1'' == ''dest2'')
* U &mdash; undefined
 
'''Description:''' The '''MULU''' opcode performs a 32-bit unsigned multiply between the ''source1'' and ''source2'' parameters and stores the lower half of the result in ''dest1'' and the upper half in ''dest2''; the '''DMULU''' opcode performs the same operation using 64-bit operands.
 
Note that ''dest1'' can be set equal to ''dest2''; in this case, only the lower half of the final result is stored to the operand specified. The S and Z flags are also set differently, being based only on the lower half value (normally they are set based on the full double-width result).
 
'''Example:'''
void generate_64x64_wide_multiply(drcuml_block *block, UINT64 *src1, UINT64 *src2)
{
    /* lower result in i0, upper result in i1 */
    UML_DMULU(block, IREG(0), IREG(1), MEM(src1), MEM(src2));
}
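
The difference between the two-destination and single-destination forms can be modeled in plain C. The type <code>mulu_result</code> and the function <code>model_mulu32</code> are hypothetical; the <code>same_dest</code> argument stands in for the case where ''dest1'' equals ''dest2''.

```c
#include <stdint.h>

typedef struct
{
    uint32_t lo, hi;  /* lower and upper halves of the product */
    int v, z, s;      /* resulting flags */
} mulu_result;

/* Model of MULU: 32x32 -> 64-bit unsigned multiply. */
static mulu_result model_mulu32(uint32_t a, uint32_t b, int same_dest)
{
    mulu_result r;
    uint64_t full = (uint64_t)a * (uint64_t)b;
    r.lo = (uint32_t)full;
    r.hi = (uint32_t)(full >> 32);
    r.v = (r.hi != 0);
    if (same_dest)
    {
        /* only the lower half is stored and considered for Z and S */
        r.z = (r.lo == 0);
        r.s = (int)(r.lo >> 31);
    }
    else
    {
        /* Z and S reflect the full double-width result */
        r.z = (full == 0);
        r.s = (int)(full >> 63);
    }
    return r;
}
```

Multiplying 0x10000 by 0x10000 gives a lower half of 0 and an upper half of 1, so Z is set only in the single-destination case, while V is set in both.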
 
----
 
=== MULS ===
 
'''Usage:'''
MULS    ''dest1'',''dest2'',''source1'',''source2''
DMULS  ''dest1'',''dest2'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_MULS(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
UML_DMULS(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest1'' &mdash; a 32/64-bit integer register or memory location
* ''dest2'' &mdash; a 32/64-bit integer register or memory location; can be the same as ''dest1''
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; set if the upper half of the resulting value is not the sign extension of the lower half
* Z &mdash; set if the resulting value is 0 (only lower half considered if ''dest1'' == ''dest2'')
* S &mdash; set if the high bit of resulting value is set (only lower half considered if ''dest1'' == ''dest2'')
* U &mdash; undefined
 
'''Description:''' The '''MULS''' opcode performs a 32-bit signed multiply between the ''source1'' and ''source2'' parameters and stores the lower half of the result in ''dest1'' and the upper half in ''dest2''; the '''DMULS''' opcode performs the same operation using 64-bit operands.
 
As with '''MULU''', ''dest1'' can be set equal to ''dest2''; in this case, only the lower half of the final result is stored to the operand specified. The S and Z flags are also set differently, being based only on the lower half value (normally they are set based on the full double-width result).
 
'''Example:'''
void generate_multiply_i0_by_20_and_assume_no_overflow(drcuml_block *block)
{
    /* dest1 == dest2, so only the lower half of the result is stored, in i0 */
    UML_MULS(block, IREG(0), IREG(0), IREG(0), IMM(20));
}
 
----
 
=== DIVU ===
 
'''Usage:'''
DIVU    ''dest1'',''dest2'',''source1'',''source2''
DDIVU  ''dest1'',''dest2'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_DIVU(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
UML_DDIVU(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest1'' &mdash; a 32/64-bit integer register or memory location
* ''dest2'' &mdash; a 32/64-bit integer register or memory location; can be the same as ''dest1''
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; set if the ''source2'' parameter was 0
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''DIVU''' opcode performs a 32-bit unsigned divide of ''source1'' by ''source2'' and stores the quotient in ''dest1'' and the remainder in ''dest2''; the '''DDIVU''' opcode performs the same operation using 64-bit operands.
 
Note that ''dest1'' can be set equal to ''dest2''; in this case, only the quotient is stored to the operand specified, and the remainder does not need to be computed.
 
'''Example:'''
void generate_64x32_unsigned_divide(drcuml_block *block, UINT64 *dividend, UINT32 *divisor)
{
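    /* zero-extend the divisor: a 32-bit MOV leaves the upper 32 bits undefined, so mask them off */
    UML_MOV(block, IREG(0), MEM(divisor));
    UML_DAND(block, IREG(0), IREG(0), IMM(0xffffffff));
    UML_DDIVU(block, IREG(0), IREG(0), MEM(dividend), IREG(0));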
}
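
The quotient, remainder, and V flag can be modeled in plain C. The type <code>divu_result</code> and the function <code>model_divu32</code> are hypothetical. The description above does not say what is stored when ''source2'' is 0, so this sketch simply leaves both outputs at 0 in that case; that choice is an assumption of the model.

```c
#include <stdint.h>

typedef struct
{
    uint32_t quotient;
    uint32_t remainder;
    int v;            /* set if the divisor was 0 */
} divu_result;

/* Model of DIVU: unsigned divide of source1 by source2. */
static divu_result model_divu32(uint32_t source1, uint32_t source2)
{
    divu_result r = { 0, 0, 0 };
    if (source2 == 0)
    {
        r.v = 1;      /* assumption: outputs stay 0 in this model */
        return r;
    }
    r.quotient = source1 / source2;
    r.remainder = source1 % source2;
    return r;
}
```

Checking V after the divide is how generated code would detect a zero divisor in this model, since no exception is raised.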
 
 
----
 
=== DIVS ===
 
'''Usage:'''
DIVS    ''dest1'',''dest2'',''source1'',''source2''
DDIVS  ''dest1'',''dest2'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_DIVS(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
UML_DDIVS(block, PTYPE(''dest1''), PTYPE(''dest2''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest1'' &mdash; a 32/64-bit integer register or memory location
* ''dest2'' &mdash; a 32/64-bit integer register or memory location; can be the same as ''dest1''
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; set if the ''source2'' parameter was 0
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''DIVS''' opcode performs a 32-bit signed divide of ''source1'' by ''source2'' and stores the quotient in ''dest1'' and the remainder in ''dest2''; the '''DDIVS''' opcode performs the same operation using 64-bit operands.
 
Note that ''dest1'' can be set equal to ''dest2''; in this case, only the quotient is stored to the operand specified, and the remainder does not need to be computed.
 
'''Example:'''
void generate_compute_remainder_of_i0_over_i1(drcuml_block *block)
{
    /* result in i0; we use i2 for scratch space since we cannot avoid computing the quotient */
    UML_DIVS(block, IREG(2), IREG(0), IREG(0), IREG(1));
}
 
----
 
=== AND ===
 
'''Usage:'''
AND    ''dest'',''source1'',''source2''
DAND    ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_AND(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DAND(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''AND''' opcode performs a logical AND between the 32-bit ''source1'' and ''source2'' operands and stores the result in ''dest''; the '''DAND''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_keep_low_16_bits(drcuml_block *block, UINT32 *param)
{
    UML_AND(block, MEM(param), MEM(param), IMM(0xffff));
}
 
----
 
=== TEST ===
 
'''Usage:'''
TEST    ''source1'',''source2''
DTEST  ''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_TEST(block, PTYPE(''source1''), PTYPE(''source2''));
UML_DTEST(block, PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''TEST''' opcode performs a logical AND between the 32-bit ''source1'' and ''source2'' operands and throws away the result, keeping only the flags; the '''DTEST''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_branch_if_set(drcuml_block *block, int bitnum, drcuml_codelabel target)
{
    UML_TEST(block, IREG(0), IMM(1 << bitnum));
    UML_JMPc(block, IF_NZ, target);
}
 
----
 
=== OR ===
 
'''Usage:'''
OR      ''dest'',''source1'',''source2''
DOR    ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_OR(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DOR(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''OR''' opcode performs a logical OR between the 32-bit ''source1'' and ''source2'' operands and stores the result in ''dest''; the '''DOR''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_set_bit_in_i1(drcuml_block *block, int whichbit)
{
    UML_OR(block, IREG(1), IREG(1), IMM(1 << whichbit));
}
 
----
 
=== XOR ===
 
'''Usage:'''
XOR    ''dest'',''source1'',''source2''
DXOR    ''dest'',''source1'',''source2''
 
'''Codegen Shorthand:'''
UML_XOR(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
UML_DXOR(block, PTYPE(''dest''), PTYPE(''source1''), PTYPE(''source2''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source1'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''source2'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''XOR''' opcode performs a logical XOR between the 32-bit ''source1'' and ''source2'' operands and stores the result in ''dest''; the '''DXOR''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_invert_all_bits_in_i0(drcuml_block *block)
{
    UML_DXOR(block, IREG(0), IREG(0), IMM(~U64(0)));
}
 
----
 
=== LZCNT ===
 
'''Usage:'''
LZCNT  ''dest'',''source''
DLZCNT  ''dest'',''source''
 
'''Codegen Shorthand:'''
UML_LZCNT(block, PTYPE(''dest''), PTYPE(''source''));
UML_DLZCNT(block, PTYPE(''dest''), PTYPE(''source''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; undefined
* U &mdash; undefined
 
'''Description:''' The '''LZCNT''' opcode counts the number of consecutive high order 0 bits in a 32-bit ''source'' operand and stores the result to ''dest''; the '''DLZCNT''' opcode performs the same operation using 64-bit operands.
 
If the ''source'' value is 0, the result is either 32 ('''LZCNT''') or 64 ('''DLZCNT''').
 
'''Example:'''
void generate_left_justify_bits_in_i0(drcuml_block *block)
{
    UML_LZCNT(block, IREG(1), IREG(0));
    UML_SHL(block, IREG(0), IREG(0), IREG(1));
}
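
A plain C model makes the zero-input case and the left-justify trick easy to check. The functions <code>model_lzcnt32</code> and <code>left_justify32</code> are hypothetical names used only in this sketch.

```c
#include <stdint.h>

/* Model of LZCNT: count consecutive high-order 0 bits; 0 yields 32. */
static uint32_t model_lzcnt32(uint32_t source)
{
    uint32_t count = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(source & bit); bit >>= 1)
        count++;
    return count;
}

/* Mirror of the example above: shift left by the leading-zero count.
   As with SHL, only the low 5 bits of the shift amount are used. */
static uint32_t left_justify32(uint32_t value)
{
    return value << (model_lzcnt32(value) & 31);
}
```

For a source of 0 the count is 32, whose low 5 bits are 0, so the shift leaves the value unchanged at 0.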
 
----
 
=== BSWAP ===
 
'''Usage:'''
BSWAP  ''dest'',''source''
DBSWAP  ''dest'',''source''
 
'''Codegen Shorthand:'''
UML_BSWAP(block, PTYPE(''dest''), PTYPE(''source''));
UML_DBSWAP(block, PTYPE(''dest''), PTYPE(''source''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; undefined
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''BSWAP''' opcode reverses the byte order of the 32-bit ''source'' operand and stores the result in ''dest''; the '''DBSWAP''' opcode performs the same operation using 64-bit operands.
 
'''Example:'''
void generate_byte_swap_16bit_value_in_i0(drcuml_block *block)
{
    UML_BSWAP(block, IREG(0), IREG(0));
    UML_SHR(block, IREG(0), IREG(0), IMM(16));
}
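
The byte reversal, and why the example above follows it with a right shift by 16, can be seen in a plain C model. The functions <code>model_bswap32</code> and <code>swap16_in_low_half</code> are hypothetical helpers for this sketch.

```c
#include <stdint.h>

/* Model of BSWAP: reverse the order of the four bytes. */
static uint32_t model_bswap32(uint32_t v)
{
    return (v << 24) |
           ((v & 0x0000ff00u) << 8) |
           ((v >> 8) & 0x0000ff00u) |
           (v >> 24);
}

/* Mirror of the example above: byte-swap the low 16 bits of a value. */
static uint32_t swap16_in_low_half(uint32_t v)
{
    return model_bswap32(v) >> 16;
}
```

After the full 32-bit swap the two low bytes sit in the upper half, so shifting right by 16 brings them back down in swapped order and discards whatever was in the original upper half.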
 
----
 
=== SHL ===
 
'''Usage:'''
SHL    ''dest'',''source'',''shift''
DSHL    ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_SHL(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DSHL(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit shifted out of the ''source'' is a 1
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''SHL''' opcode shifts the ''source'' operand left by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DSHL''' opcode performs the same operation using 64-bit operands. For each bit shifted, a 0 bit is inserted into the least significant bit.
 
Note that only the low bits of the ''shift'' operand are considered. For '''SHL''', only the low 5 bits are used for the shift amount; for '''DSHL''', only the low 6 bits are used.
 
'''Example:'''
void generate_multiply_i0_by_16(drcuml_block *block)
{
    UML_SHL(block, IREG(0), IREG(0), IMM(4));
}
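
The masking of the ''shift'' operand is easy to overlook. The following C sketch models only the resulting value (not the flags); the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of SHL: only the low 5 bits of the shift count are used. */
uint32_t model_shl32(uint32_t source, uint32_t shift)
{
    return source << (shift & 31);
}

/* Hypothetical model of DSHL: only the low 6 bits of the shift count are used. */
uint64_t model_dshl64(uint64_t source, uint64_t shift)
{
    return source << (shift & 63);
}
```

Under this model a shift count of 36 acts like 4 for '''SHL''' but is a genuine 36-bit shift for '''DSHL'''.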
 
----
 
=== SHR ===
 
'''Usage:'''
SHR    ''dest'',''source'',''shift''
DSHR    ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_SHR(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DSHR(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit shifted out of the ''source'' is a 1
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''SHR''' opcode shifts the ''source'' operand right by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DSHR''' opcode performs the same operation using 64-bit operands. For each bit shifted, a 0 bit is inserted into the most significant bit.
 
Note that only the low bits of the ''shift'' operand are considered. For '''SHR''', only the low 5 bits are used for the shift amount; for '''DSHR''', only the low 6 bits are used.
 
'''Example:'''
void generate_unsigned_divide_i0_by_128(drcuml_block *block)
{
    UML_SHR(block, IREG(0), IREG(0), IMM(7));
}
 
----
 
=== SAR ===
 
'''Usage:'''
SAR    ''dest'',''source'',''shift''
DSAR    ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_SAR(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DSAR(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit shifted out of the ''source'' is a 1
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''SAR''' opcode arithmetically shifts the ''source'' operand right by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DSAR''' opcode performs the same operation using 64-bit operands. For each bit shifted, a copy of the previous most significant bit is inserted as the new most significant bit.

Note that only the low bits of the ''shift'' operand are considered. For '''SAR''', only the low 5 bits are used for the shift amount; for '''DSAR''', only the low 6 bits are used.
 
'''Example:'''
void generate_get_sign_extension_of_i0_in_i1(drcuml_block *block)
{
    UML_SAR(block, IREG(1), IREG(0), IMM(31));
}
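
Right-shifting a negative signed value is implementation-defined in C, so a portable model of the arithmetic shift has to fill in the sign bits by hand. The sketch below is one way to do that; model_sar32 is a hypothetical name, and only the result value is modeled, not the flags.

```c
#include <stdint.h>

/* Hypothetical model of SAR on a 32-bit value held in an unsigned type. */
uint32_t model_sar32(uint32_t source, uint32_t shift)
{
    uint32_t count = shift & 31;          /* only the low 5 bits are used */
    uint32_t result = source >> count;    /* logical shift inserts zeros */

    /* If the original sign bit was set, replace the inserted zeros with ones. */
    if ((source & 0x80000000u) != 0 && count != 0)
        result |= (uint32_t)~(0xFFFFFFFFu >> count);
    return result;
}
```

A shift by 31, as in the example, yields either all zeros or all ones depending on the sign of the input.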
 
----
 
=== ROL ===
 
'''Usage:'''
ROL    ''dest'',''source'',''shift''
DROL    ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_ROL(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DROL(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit rotated out of the ''source'' is a 1; equal to the new LSB
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ROL''' opcode rotates the ''source'' operand left by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DROL''' opcode performs the same operation using 64-bit operands. Each bit shifted out of the most-significant location is inserted as the new least-significant bit.
 
Note that only the low bits of the ''shift'' operand are considered. For '''ROL''', only the low 5 bits are used for the shift amount; for '''DROL''', only the low 6 bits are used.
 
'''Example:'''
void generate_swap_words_in_i0(drcuml_block *block)
{
    UML_ROL(block, IREG(0), IREG(0), IMM(16));
}
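
For comparison, a rotate can be modeled in C with two shifts. This is a sketch with a hypothetical function name; the zero-count case is handled separately because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Hypothetical model of ROL: bits leaving the top re-enter at the bottom. */
uint32_t model_rol32(uint32_t source, uint32_t shift)
{
    uint32_t count = shift & 31;   /* only the low 5 bits are used */
    if (count == 0)
        return source;
    return (source << count) | (source >> (32 - count));
}
```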
 
----
 
=== ROR ===
 
'''Usage:'''
ROR    ''dest'',''source'',''shift''
DROR    ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_ROR(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DROR(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit rotated out of the ''source'' is a 1; equal to the new MSB
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ROR''' opcode rotates the ''source'' operand right by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DROR''' opcode performs the same operation using 64-bit operands. Each bit shifted out of the least-significant location is inserted as the new most-significant bit.
 
Note that only the low bits of the ''shift'' operand are considered. For '''ROR''', only the low 5 bits are used for the shift amount; for '''DROR''', only the low 6 bits are used.
 
'''Example:'''
void generate_swap_dwords_in_i1(drcuml_block *block)
{
    UML_DROR(block, IREG(1), IREG(1), IMM(32));
}
 
----
 
=== ROLC ===
 
'''Usage:'''
ROLC    ''dest'',''source'',''shift''
DROLC  ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_ROLC(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DROLC(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit rotated out of the ''source'' is a 1
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''ROLC''' opcode rotates the 33-bit concatenation of the ''source'' operand and the carry flag left by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DROLC''' opcode performs the same operation using 64-bit operands (with an effective 65-bit source). Each bit shifted out of the most-significant location is stored in the carry; the previous value of the carry is inserted as the new least-significant bit.

Note that only the low bits of the ''shift'' operand are considered. For '''ROLC''', only the low 5 bits are used for the shift amount; for '''DROLC''', only the low 6 bits are used.
 
'''Example:'''
void generate_rotate_128bit_value_in_i0_i1(drcuml_block *block)
{
    UML_CARRY(block, IREG(0), IMM(63));
    UML_DROLC(block, IREG(1), IREG(1), IMM(1));
    UML_DROLC(block, IREG(0), IREG(0), IMM(1));
}
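
A rotate through carry is easiest to reason about one bit at a time. The C sketch below models the 32-bit form under that view; the type and function names are hypothetical, and leaving the carry unchanged for a zero shift count is an assumption of this model rather than documented behavior.

```c
#include <stdint.h>

/* Result of the hypothetical model: the rotated value plus the new carry. */
typedef struct {
    uint32_t value;
    unsigned carry;
} model_rolc_result;

/* Hypothetical model of ROLC: rotate the 33-bit quantity {carry, source} left. */
model_rolc_result model_rolc32(uint32_t source, uint32_t shift, unsigned carry_in)
{
    model_rolc_result r = { source, carry_in & 1u };

    for (uint32_t count = shift & 31; count != 0; count--) {
        unsigned out = r.value >> 31;          /* bit leaving the top */
        r.value = (r.value << 1) | r.carry;    /* old carry enters at the bottom */
        r.carry = out;                         /* departing bit becomes the carry */
    }
    return r;
}
```

Chaining two such steps, low half first, gives the multi-word rotate shown in the example.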
 
----
 
=== RORC ===
 
'''Usage:'''
RORC    ''dest'',''source'',''shift''
DRORC  ''dest'',''source'',''shift''
 
'''Codegen Shorthand:'''
UML_RORC(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
UML_DRORC(block, PTYPE(''dest''), PTYPE(''source''), PTYPE(''shift''));
 
'''Parameters:'''
* ''dest'' &mdash; a 32/64-bit integer register or memory location
* ''source'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
* ''shift'' &mdash; a 32/64-bit integer register, memory location, map variable, or immediate
 
'''Flags:'''
* C &mdash; set if the final bit rotated out of the ''source'' is a 1
* V &mdash; undefined
* Z &mdash; set if the resulting value is 0
* S &mdash; set if the high bit of the resulting value is set
* U &mdash; undefined
 
'''Description:''' The '''RORC''' opcode rotates the 33-bit concatenation of the ''source'' operand and the carry flag right by the number of bits specified by ''shift'' and stores the result to ''dest''; the '''DRORC''' opcode performs the same operation using 64-bit operands (with an effective 65-bit source). Each bit shifted out of the least-significant location is stored in the carry; the previous value of the carry is inserted as the new most-significant bit.

Note that only the low bits of the ''shift'' operand are considered. For '''RORC''', only the low 5 bits are used for the shift amount; for '''DRORC''', only the low 6 bits are used.

'''Example:'''
void generate_rotate_64bit_value_in_i0_i1(drcuml_block *block)
{
    UML_CARRY(block, IREG(1), IMM(0));
    UML_RORC(block, IREG(0), IREG(0), IMM(1));
    UML_RORC(block, IREG(1), IREG(1), IMM(1));
}

----

For information about particular opcodes, see one of these sections:

* [[UML Control Flow Opcodes]]
* [[UML Internal Register Opcodes]]
* [[UML Integer Opcodes]]
* [[UML Floating Point Opcodes]]

Latest revision as of 20:00, 9 June 2008

This article describes the Universal Machine Language runtime architecture.

Machine Architecture

At its heart, the Universal Machine Language describes an abstract, primarily 32-bit computer architecture. It has been designed with several goals in mind:

  • dynamic recompilers should be able to express common operations simply
  • 64-bit integer operations should be supported, even if they are not preferred
  • creating x86 and PowerPC back-ends (both 32-bit and 64-bit) should be relatively straightforward
  • a back-end written in a high-level language such as C should have reasonable performance

In addition to a collection of opcodes, described below, the Universal Machine Language also describes an abstract runtime architecture with several basic requirements:

  • 10 64-bit integer registers (i0-i9)
  • 10 64-bit floating point registers (f0-f9)
  • 10 32-bit "map variables" (m0-m9) which map values onto sections of code
  • 5 flag bits that can be optionally set on most instructions
  • 1 internal exception parameter register
  • a 16-entry call stack for subroutine and exception handling

Because each back-end targets a different final CPU architecture, these abstract requirements may not map perfectly; however, it is the job of the back-end code generator to provide an implementation that fully supports all of these requirements. For example, there may not be enough free actual system registers to hold 10 64-bit values, so some of those registers may be implicitly converted by the back-end into memory references. More details on how to provide these abstractions will be available in the Back-End Author's Guide.

Code Cache

One of the primary features of a dynamic recompiler is its ability to cache and quickly recall already-translated code. Because of this, the concept of a code cache is central to the UML. The code cache not only contains all the generated code, along with the necessary hash tables to find it, but it also serves as a general heap for any data referenced by the generated code. Memory can be allocated from the cache and thus kept in the vicinity of the code that is likely to reference it. On many architectures, memory that is close to the code can be more efficiently accessed, so it is important to make good use of the memory management provided by the cache.

The cache is created by the dynamic recompiler at initialization time. The size of the cache is fixed once it is created, so it is important to create a cache that is large enough to hold a typical translated working set. If the cache is too small, then code will be flushed from it relatively quickly, and your CPU usage will increase because you are spending extra time to re-translate code that could have been executed from the cache.

The cache is divided into three sections. The topmost section is known as the near cache and is a fixed size (64k). The near cache is where frequently-accessed data should be stored. Generally this includes the current architectural state of the CPU that is being emulated, along with tables or other data that is frequently accessed by the UML code. It is also important to realize that many UML opcodes support using memory locations as parameters, but only if those memory locations are within the near cache.

The bottommost section of the cache is where permanent memory allocations are taken from. Data structures that are used and re-used throughout the lifetime of the dynamic recompiler are allocated here. When memory is allocated from this section, the cache end is moved downward, reducing the amount of free space in the cache. Although memory that has been allocated from this section can be freed, it does not affect the position of the cache end. Rather, that data is kept in a free list and re-used for the next memory allocation of a similar size.

The middle section of the cache is where the most action is. This is where all temporary memory allocations and code generation take place. It starts at the cache base, which is simply fixed at the end of the near cache, and can expand as far as the cache end, which is where the permanent memory allocations lie. The cache top represents the position within this region where the next code will be generated or the next block of memory allocated. As code is generated and added to the cache, the cache top moves forward until it reaches the cache end. When that happens, the cache is flushed. A flush simply resets the cache top back to the cache base, effectively throwing away everything that has accumulated in this middle section and starting over.
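
The interplay of cache base, cache top and cache end can be sketched as a pair of bump allocators working toward each other. The C below is a simplified model, not the actual cache code: the sketch_cache type and its functions are hypothetical, and alignment, the near cache and the free list for permanent allocations are all left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the middle and bottom sections of the cache. */
typedef struct {
    uint8_t *base;  /* fixed start of the middle section */
    uint8_t *top;   /* next temporary allocation / generated code goes here */
    uint8_t *end;   /* permanent allocations grow downward from here */
} sketch_cache;

/* Temporary allocation: advance the top; NULL means a flush is needed. */
void *sketch_cache_alloc_temp(sketch_cache *cache, size_t bytes)
{
    if (bytes > (size_t)(cache->end - cache->top))
        return NULL;
    void *result = cache->top;
    cache->top += bytes;
    return result;
}

/* Permanent allocation: move the end downward, shrinking the free space. */
void *sketch_cache_alloc_perm(sketch_cache *cache, size_t bytes)
{
    if (bytes > (size_t)(cache->end - cache->top))
        return NULL;
    cache->end -= bytes;
    return cache->end;
}

/* Flush: discard everything in the middle section; the end is unaffected. */
void sketch_cache_flush(sketch_cache *cache)
{
    cache->top = cache->base;
}
```

Note that in this model a flush recovers only the temporary space; whatever the permanent allocations have claimed stays claimed.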

Although it could be argued that there might be value in keeping some frequently-used cached code around when running out of space, in practice it is not worth the extra bookkeeping necessary to make that determination. The dynamic recompiler and back-end should operate relatively quickly, making the performance hit of regenerating the code minimal.

Code Generation

UML code is generated in blocks. A block of UML code is defined to be self-contained. That is, all local jumps within the code are resolved, and all calls or jumps to code outside of the block are performed via either code handles or code hashes, which are described below. In general, a code block is either a subroutine or a translated sequence of code. The dynamic recompiler generates the block one instruction at a time using helper functions and macros provided by the UML system, which in turn encodes the instruction opcodes and parameters into a sequence of structures. Once a block is complete, the dynamic recompiler notifies the UML system, which takes the list of structures and hands it off to the back-end to perform final translation.

One potential problem is that during code generation, the back-end may run out of space in the cache. When this happens, the cache needs to be flushed, and whatever was being generated needs to be regenerated from scratch. To accomplish this, the UML makes use of setjmp/longjmp. Before a block is started, the dynamic recompiler performs a setjmp and passes the jump buffer to the UML. If at any time the cache runs out of space, the UML performs a longjmp back to the starting point, which is responsible for flushing the cache and starting the codegen over again.
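
The flush-and-retry pattern can be sketched in standard C as follows. This is an illustration of the setjmp/longjmp control flow only; the sketch_codegen_state type and both functions are hypothetical stand-ins for the real cache and block-generation code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the cache: bytes used, total size, flush count. */
typedef struct {
    size_t   used;
    size_t   capacity;
    unsigned flushes;
} sketch_codegen_state;

/* Emit one instruction; bail out to the recovery point if space runs out. */
static void sketch_emit(sketch_codegen_state *state, jmp_buf *errorbuf, size_t bytes)
{
    if (state->used + bytes > state->capacity)
        longjmp(*errorbuf, 1);
    state->used += bytes;
}

/* Generate a whole block, restarting from scratch after a flush. */
unsigned sketch_generate_block(sketch_codegen_state *state, size_t instructions, size_t bytes_each)
{
    jmp_buf errorbuf;

    /* A block larger than the whole cache could never succeed. */
    assert(instructions * bytes_each <= state->capacity);

    if (setjmp(errorbuf) != 0) {
        /* Arrived here via longjmp: flush, then fall through and start over. */
        state->used = 0;
        state->flushes++;
    }
    for (size_t i = 0; i < instructions; i++)
        sketch_emit(state, &errorbuf, bytes_each);
    return state->flushes;
}
```

The loop counter is reinitialized after every return from setjmp, so no local variable needs to survive the longjmp.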

Code Flow

Because the details of back-end code generation are abstracted, code flow becomes a little tricky, since the addresses of the code you wish to jump to are not known until the back-end translates to the final code. To remedy this, the UML introduces three concepts: code handles, code hashes, and code labels.

A code handle is a globally accessible reference to a block of code. In practice, a code handle is allocated from the near cache by the dynamic recompiler and contains a pointer to the generated code provided by the back-end. When first allocated, a code handle is empty, since the back-end hasn't had a chance to generate the final code yet. Similarly, when the cache is flushed, all code handles are automatically reset to their empty state, since any code they referenced has been jettisoned. During back-end code generation, when a HANDLE opcode is encountered, the back-end will fill in the code handle's code pointer with the current cache top, which is where subsequent code will be generated.

To execute code referenced by a handle, the dynamic recompiler calls the UML from C code, passing in the handle where it should begin execution. A pointer to the generated code for this handle is extracted from the handle and then the back-end is called to begin execution. UML code can also make subroutine calls to handle-based code via the CALLH opcode, or it can invoke handle-based code to handle an exception via the EXH opcode.

A code hash is a more indirect way to create a global reference to a block of code, more typically used for hopping between blocks of translated code. As its name implies, a code hash is filed away in a hash table or some other structure that is maintained by the back-end code. A code hash is represented by two values: a PC, which is typically the linear address of a block of code, and a mode, which allows further differentiation between code which may live at the same PC but execute in different contexts. During back-end code generation, when a HASH opcode is encountered, the back-end will take the PC and mode specified in the opcode and create an entry in its hash table pointing to the current cache top.

The only way to execute code referenced in the hash table is via the UML opcode HASHJMP, which accepts a mode and PC, performs the lookup, and either continues execution at the target code entry, or generates an exception if no code exists for that hash entry.
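
To make the (mode, PC) lookup concrete, here is a minimal sketch of a direct-mapped table in C. The real back-ends are free to use any structure; the table layout, slot count, hash function and all names below are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_SLOTS 256u

/* One hypothetical table entry: the key (mode, pc) and the code it maps to. */
typedef struct {
    int         valid;
    uint32_t    mode;
    uint32_t    pc;
    const void *code;
} sketch_hash_entry;

typedef struct {
    sketch_hash_entry slots[SKETCH_HASH_SLOTS];
} sketch_hash_table;

/* Arbitrary mixing function chosen for the sketch. */
static uint32_t sketch_hash_slot(uint32_t mode, uint32_t pc)
{
    return (pc ^ (mode * 0x9E3779B1u)) % SKETCH_HASH_SLOTS;
}

/* What a HASH opcode would do: record where the code for (mode, pc) starts.
   A colliding older entry is simply overwritten in this sketch. */
void sketch_hash_set(sketch_hash_table *table, uint32_t mode, uint32_t pc, const void *code)
{
    sketch_hash_entry *entry = &table->slots[sketch_hash_slot(mode, pc)];
    entry->valid = 1;
    entry->mode = mode;
    entry->pc = pc;
    entry->code = code;
}

/* What HASHJMP would consult: NULL stands for "no code exists", the case
   in which the real opcode generates an exception. */
const void *sketch_hash_lookup(const sketch_hash_table *table, uint32_t mode, uint32_t pc)
{
    const sketch_hash_entry *entry = &table->slots[sketch_hash_slot(mode, pc)];
    if (entry->valid && entry->mode == mode && entry->pc == pc)
        return entry->code;
    return NULL;
}
```

The mode is part of the key, so the same PC can map to different code in different execution contexts.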

A code label is a mechanism to handle branches within a block. Code labels can be seen as analogous to labels in a typical assembly language. The primary difference is that the label itself is the UML opcode LABEL with a 32-bit integer identifier that must be unique within the block. Because the label itself is an opcode, it is easy for the back-end to determine whether a given instruction can be blended with neighboring instructions, since branches can only occur to the label opcodes. Labels are not resolved until back-end code generation, so if you make a mistake and reference an invalid label or forget to define a label, it won't be caught until the block is ended.

Code labels can be branched to via the JMP instruction, either unconditionally or via one of 16 conditions.

Opcode Conventions

Before diving into the details of the opcodes, it is important to understand some general principles and conventions:

Integer Registers. There are 10 integer registers, each 64 bits wide. The same registers are used for both 32-bit and 64-bit opcodes; however, unlike many real computer architectures, the upper 32 bits are fully undefined when a 32-bit operation is performed. This means you cannot load a 64-bit value, perform a 32-bit operation, and expect the upper 32 bits to be anything in particular when you are finished.

Most back-ends are expected to assign as many integer registers to native integer registers as possible; however, some architectures do not support mapping all 10 registers in this way. Because of this, dynamic recompilers should try to use the first few registers aggressively, only resorting to the later registers where necessary.

The contents of the integer registers are lost whenever an EXIT opcode is encountered.

Floating Point Registers. As with the integer registers, there are 10 floating point registers, each 64 bits wide. The same registers are again used for both 32-bit and 64-bit opcodes, and the upper 32 bits are fully undefined when a 32-bit operation is performed.

Floating point registers must not perform any conversions when loaded/stored/moved. This means that the floating point register set must support holding arbitrary values without performing any implicit conversion. Back-end architectures that cannot meet this requirement (e.g., the Intel x86 FPU) must keep the 10 floating point registers in memory and only convert data when performing arithmetic operations.

Back-end support for floating point registers is often even more limited than support for integer registers, so dynamic recompilers should focus on using the first few registers as much as possible.

The contents of the floating point registers are lost whenever an EXIT opcode is encountered.

Immediates. Immediate values can be up to 64 bits wide. Of course, it only makes sense to use immediate values that fit in the size of the opcode.

Memory Parameters. Memory parameters can be used in most of the places where register parameters are permitted. The memory parameter size is implicitly determined by the opcode. Most importantly, for the majority of instructions, any memory parameters must reside in the near cache. This is to ensure that they can be efficiently accessed on all architectures. The lone exceptions to this rule are the LOAD and STORE opcodes, which can be used to access memory anywhere.

Map Variables. Map variables are constant values that are encoded into the instruction stream. They are used to recover values when a subroutine or exception occurs, based on the caller's address. Map variables are only 32 bits wide and when used in code always translate into immediate values.

Flags. There are 5 flags defined by the architecture:

  • C (bit 0) is the carry flag, and indicates an unsigned carry in arithmetic operations or the shift-out value in rotate/shift operations
  • V (bit 1) is the overflow flag, and indicates a signed overflow in arithmetic operations
  • Z (bit 2) is the zero flag, and indicates a zero result
  • S (bit 3) is the sign flag, and indicates a negative result
  • U (bit 4) is the unordered flag, and indicates that a floating point compare had at least one NaN parameter
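
The bit assignments above translate directly into C constants. The sketch below uses hypothetical SKETCH_ names (not the identifiers from the UML headers) and shows how the Z and S flags would be derived from a 32-bit result.

```c
#include <stdint.h>

/* Flag bit positions as listed above; the names are placeholders. */
enum {
    SKETCH_FLAG_C = 1 << 0,  /* carry */
    SKETCH_FLAG_V = 1 << 1,  /* signed overflow */
    SKETCH_FLAG_Z = 1 << 2,  /* zero result */
    SKETCH_FLAG_S = 1 << 3,  /* negative result */
    SKETCH_FLAG_U = 1 << 4   /* unordered floating point compare */
};

/* Compute only the Z and S flags for a 32-bit result. */
uint8_t sketch_flags_zs32(uint32_t result)
{
    uint8_t flags = 0;
    if (result == 0)
        flags |= SKETCH_FLAG_Z;
    if ((result & 0x80000000u) != 0)
        flags |= SKETCH_FLAG_S;
    return flags;
}
```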

Opcodes which can affect the flags must specify which flags they care about. Flags that are not explicitly requested are undefined.

Conditions. Control flow instructions and simple data move instructions support an optional condition, which allows behavior to occur based on the state of the flags. Each flag can be checked for on/off independently. In addition, the usual collection of G/GE/L/LE/A/AE/B/BE conditions is available.

Opcodes

For information about particular opcodes, see one of these sections: