Description
Add support for packed vector instructions for floating point and integer operations.
- Design and implement a generic signature that supports various explicit operations (e.g., mul, add) on, for instance, 64-bit floating-point values (e.g., in 256-bit packed vector registers).
- Design and implement various structures that match the above signature (e.g., for packed 64-bit floats and for packed 64-bit integers). Make use of the MLKit `prim` feature for intrinsics.
- Implement support for the intrinsics in the `Compiler/Lambda/LambdaExp` MLKit intermediate language, to be targeted by the operations in the structures. Implement support for the operations all the way down to the `Compiler/Backend/X64/CodeGenX64` / `Compiler/Backend/X64/CodeGenUtilX64` modules (e.g., extend the operations in `Compiler/Backend/PrimName.sml`).
- Implement operations for loading from and storing to memory. We can use the `BlockF64` values for representing and allocating memory.
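To make the first two items concrete, here is a minimal sketch of what such a signature and one matching structure might look like. All names (`PACKED`, `PackedF64x4`, `broadcast`, `sum`, and so on) are hypothetical and are not existing MLKit APIs. The structure is a portable reference model in which a 4-tuple of reals stands in for one 256-bit register and a plain `real array` stands in for `BlockF64`; an actual implementation would presumably replace the function bodies with `prim` calls.

```sml
(* Hypothetical signature for packed vectors; not an MLKit API. *)
signature PACKED =
sig
  type elem                                  (* scalar lane type *)
  type t                                     (* packed vector of `lanes` elements *)
  val lanes     : int
  val broadcast : elem -> t                  (* copy one scalar into every lane *)
  val add       : t * t -> t                 (* lane-wise addition *)
  val mul       : t * t -> t                 (* lane-wise multiplication *)
  val sum       : t -> elem                  (* horizontal reduction over lanes *)
  val load      : elem array * int -> t      (* read `lanes` elements from index i *)
  val store     : elem array * int * t -> unit  (* write `lanes` elements at index i *)
end

(* Reference model: four reals stand in for one 256-bit register. *)
structure PackedF64x4 :> PACKED where type elem = real =
struct
  type elem = real
  type t = real * real * real * real
  val lanes = 4
  fun broadcast (x : real) : t = (x, x, x, x)
  fun add ((a0, a1, a2, a3) : t, (b0, b1, b2, b3) : t) : t =
    (a0 + b0, a1 + b1, a2 + b2, a3 + b3)
  fun mul ((a0, a1, a2, a3) : t, (b0, b1, b2, b3) : t) : t =
    (a0 * b0, a1 * b1, a2 * b2, a3 * b3)
  fun sum ((a0, a1, a2, a3) : t) : real = a0 + a1 + a2 + a3
  fun load (a : real array, i : int) : t =
    (Array.sub (a, i), Array.sub (a, i + 1),
     Array.sub (a, i + 2), Array.sub (a, i + 3))
  fun store (a : real array, i : int, (x0, x1, x2, x3) : t) : unit =
    (Array.update (a, i, x0); Array.update (a, i + 1, x1);
     Array.update (a, i + 2, x2); Array.update (a, i + 3, x3))
end

(* Usage: square four elements in place. *)
val a = Array.fromList [1.0, 2.0, 3.0, 4.0]
val v = PackedF64x4.load (a, 0)
val () = PackedF64x4.store (a, 0, PackedF64x4.mul (v, v))
(* a now holds 1.0, 4.0, 9.0, 16.0 *)
```

A structure for packed 64-bit integers could match the same signature with `elem` instantiated to a 64-bit integer type.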
Discussion.
An important aspect here is that the implementation will have to include boxing-operations that implicitly box the vector values into memory. The optimiser can then eliminate box-unboxing and unbox-box compositions. The reason is that, in general, it is impossible to ensure that a value is not passed to a generic function, stored in a data structure, or captured in a closure; it is assumed that all values can be represented in one 64-bit word (perhaps tagged with the LSB being 1, if the GC should not traverse the value).
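The elimination of box-unbox and unbox-box compositions can be illustrated on a toy expression language. This is only a sketch under assumptions: the datatype and its constructor names are hypothetical and do not mirror the actual `LambdaExp` datatype or the optimiser's real rewrite machinery.

```sml
(* Toy expression language; constructor names are hypothetical. *)
datatype exp =
    Var of string
  | Box of exp                  (* store a wide value in memory, yield a pointer *)
  | Unbox of exp                (* load a wide value from a boxed pointer *)
  | Prim of string * exp list   (* e.g. Prim ("add", [e1, e2]) on wide values *)

(* Bottom-up rewrite: Unbox (Box e) => e and Box (Unbox e) => e. *)
fun simplify (Var x) = Var x
  | simplify (Prim (p, es)) = Prim (p, map simplify es)
  | simplify (Box e) =
      (case simplify e of
         Unbox e' => e'
       | e' => Box e')
  | simplify (Unbox e) =
      (case simplify e of
         Box e' => e'
       | e' => Unbox e')

(* The operands are boxed and immediately unboxed; only the outer box remains. *)
val before = Box (Prim ("add", [Unbox (Box (Var "x")), Unbox (Box (Var "y"))]))
val after  = simplify before
(* after = Box (Prim ("add", [Var "x", Var "y"])) *)
```

In this model, a chain of vector operations within one expression pays for a single box at the end instead of one per intermediate result.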
I foresee some issues with implementing support for register allocation on the `ymm` registers. Also, we must make sure that the optimiser (i.e., module `Compiler/Lambda/OptLambda`) does not pass wide 256-bit values to generic functions. Such values cannot be passed as arguments to functions, nor can they be stored in closures; they are solely for operations within basic blocks. Ideally, these restrictions could be enforced in `Compiler/Lambda/LambdaStatSem`.
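As a rough illustration of the kind of check meant here, the following sketch rejects terms in which a wide value is passed to a function or captured in a closure. The term language, the `Word`/`Wide` classification, and the function names are all hypothetical; the real check would operate on `LambdaExp` terms and their types.

```sml
(* Hypothetical classification: Word fits in one 64-bit word, Wide is 256-bit. *)
datatype ty = Word | Wide

(* Toy term language; not the LambdaExp datatype. *)
datatype term =
    V of string * ty
  | P of string * term list * ty         (* primitive with its result type *)
  | App of string * term list            (* call to a (possibly generic) function *)
  | Closure of (string * ty) list * term (* captured variables and body *)

fun typeOf (V (_, t)) = t
  | typeOf (P (_, _, t)) = t
  | typeOf (App _) = Word
  | typeOf (Closure _) = Word

(* Wide values may flow between primitives, but not into calls or closures. *)
fun wellFormed (V _) = true
  | wellFormed (P (_, ts, _)) = List.all wellFormed ts
  | wellFormed (App (_, ts)) =
      List.all (fn t => typeOf t = Word andalso wellFormed t) ts
  | wellFormed (Closure (fvs, body)) =
      List.all (fn (_, t) => t = Word) fvs andalso wellFormed body

val ok  = wellFormed (P ("add", [V ("x", Wide), V ("y", Wide)], Wide))
val bad = wellFormed (App ("f", [V ("x", Wide)]))
(* ok = true, bad = false *)
```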
An interesting application would be to use these operations to implement some of the operations in the `Real64Array` / `Real64Vector` structures efficiently.
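One way such an array operation might be structured is a loop that handles four elements per step and finishes with a scalar tail. The sketch below models this on a plain `real array`; `f64x4`, `load4`, `store4`, `mul4`, and `mulInPlace` are hypothetical helper names, and the tuple-based helpers merely stand in for packed load, multiply, and store instructions.

```sml
type f64x4 = real * real * real * real   (* stand-in for one 256-bit register *)

fun load4 (a : real array, i : int) : f64x4 =
  (Array.sub (a, i), Array.sub (a, i + 1),
   Array.sub (a, i + 2), Array.sub (a, i + 3))

fun store4 (a : real array, i : int, (x0, x1, x2, x3) : f64x4) : unit =
  (Array.update (a, i, x0); Array.update (a, i + 1, x1);
   Array.update (a, i + 2, x2); Array.update (a, i + 3, x3))

fun mul4 ((a0, a1, a2, a3) : f64x4, (b0, b1, b2, b3) : f64x4) : f64x4 =
  (a0 * b0, a1 * b1, a2 * b2, a3 * b3)

(* a[i] := a[i] * b[i] for every index below the shorter length. *)
fun mulInPlace (a : real array, b : real array) : unit =
  let
    val n = Int.min (Array.length a, Array.length b)
    (* Packed part: four lanes per step; returns the first unprocessed index. *)
    fun vec i =
      if i + 4 <= n
      then (store4 (a, i, mul4 (load4 (a, i), load4 (b, i))); vec (i + 4))
      else i
    (* Scalar tail for the remaining zero to three elements. *)
    fun tail i =
      if i < n
      then (Array.update (a, i, Array.sub (a, i) * Array.sub (b, i)); tail (i + 1))
      else ()
  in
    tail (vec 0)
  end

val xs = Array.fromList [1.0, 2.0, 3.0, 4.0, 5.0]
val ys = Array.fromList [2.0, 2.0, 2.0, 2.0, 2.0]
val () = mulInPlace (xs, ys)
(* xs now holds 2.0, 4.0, 6.0, 8.0, 10.0 *)
```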