Description
Currently we have
type AstIxS ms sh = IxS sh (AstInt ms)
type AstInt ms = AstTensor ms PrimalSpan (TKScalar Int64)
and consequently we can't share computation between the indexes returned by the lambda in either a gather or a scatter operation. This is a real problem that already appears when rewriting AST in HordeAd.Core.AstSimplify, though so far there is no benchmark evidence that it has a measurable performance impact in realistic examples. The shareIx workaround fails whenever the index expression we want to share contains variables bound by the lambda, which obviously happens in practice.
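To make the sharing problem concrete, here is a toy sketch, not horde-ad code: the `Expr` type, its `Let` and `Pair` constructors and the multiplication counter in `eval` are all hypothetical, with `Pair` only loosely standing in for a `TKProduct`-valued term. Both index components depend on the lambda-bound variable `i`, so the costly subterm cannot be hoisted out of the lambda. As a list of separate terms, each component carries its own copy of it; as one product-valued term, a single `Let` can scope over both components.

```haskell
-- Toy AST (hypothetical, not horde-ad's AstTensor).
data Expr
  = Var String
  | Lit Integer
  | Add Expr Expr
  | Mul Expr Expr
  | Let String Expr Expr  -- let x = e1 in e2
  | Pair Expr Expr        -- a product-valued term (loosely, cf. TKProduct)
  deriving Show

data Val = VInt Integer | VPair Val Val
  deriving (Show, Eq)

-- Evaluate an expression, also returning how many Mul nodes were
-- executed, as a crude proxy for the work performed.
eval :: [(String, Val)] -> Expr -> (Val, Int)
eval env expr = case expr of
  Var x -> case lookup x env of
    Just v  -> (v, 0)
    Nothing -> error ("unbound variable " ++ x)
  Lit n -> (VInt n, 0)
  Add a b -> arith (+) 0 a b
  Mul a b -> arith (*) 1 a b
  Let x e body ->
    let (v, c1) = eval env e
        (r, c2) = eval ((x, v) : env) body
    in (r, c1 + c2)
  Pair a b ->
    let (va, ca) = eval env a
        (vb, cb) = eval env b
    in (VPair va vb, ca + cb)
  where
    arith op cost a b = case (eval env a, eval env b) of
      ((VInt x, ca), (VInt y, cb)) -> (VInt (op x y), ca + cb + cost)
      _ -> error "arith: expected integers"

-- The costly subterm mentions the lambda-bound variable "i",
-- so it cannot be let-bound outside the lambda.
costly :: Expr
costly = Mul (Var "i") (Var "i")

-- Index as a list of independent terms: there is no single term to
-- wrap in a Let, so each component holds its own copy of `costly`.
indexAsList :: [Expr]
indexAsList = [Add costly (Lit 1), Add costly (Lit 2)]

-- Index as one product-valued term: one Let covers both components,
-- so `costly` is evaluated only once.
indexAsProduct :: Expr
indexAsProduct =
  Let "t" costly (Pair (Add (Var "t") (Lit 1)) (Add (Var "t") (Lit 2)))

main :: IO ()
main = do
  let env = [("i", VInt 3)]
      listResults = map (eval env) indexAsList
  print (map fst listResults, sum (map snd listResults))
  print (eval env indexAsProduct)
```

With `i = 3`, `main` prints `([VInt 10,VInt 11],2)` and then `(VPair (VInt 10) (VInt 11),1)`: the same index values, but two multiplications for the list form versus one for the product form.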
One possible solution is to define something like
type family AstIxS ms sh where
  AstIxS ms '[] = TKUnit
  AstIxS ms (n ': rest) = TKProduct (AstInt ms) (AstIxS ms rest)
and then convert between this and the old definition of AstIxS as needed (assuming the rest of horde-ad would stick to IxS; certainly ox-arrays would). The frequent conversions, which are sadly bound to obfuscate any code working with AST, would be necessary to keep using the collection of useful and type-safe operations on IxS and related types that ox-arrays provides. Probably many of these operations would also need to be defined anew for the fully general AstIxS. The library user would need to either convert or use the new set of operations in order to benefit from the new kind of sharing, though we may want to wrap this in some syntactic sugar to make it less painful. Another issue is that type families are cumbersome to work with in Haskell, so we may end up needing more singletons, unsafe coercions, etc., and this may spill over to library usage as well. There are probably no other problems with this solution than these. A prototype would be needed to verify and benchmark this approach.
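The conversions, and where the extra singletons come from, can be sketched with toy types. Everything here is an assumption for illustration: `Ix` is a hypothetical stand-in for ox-arrays' `IxS`, and the type family `Prod` mirrors the proposed `AstIxS` family but builds plain Haskell `()` and pairs rather than `TKUnit` and `TKProduct`. Converting to the nested product needs no extra evidence, while converting back needs a shape singleton (the hypothetical `SShape`), because a value of type `Prod sh i` cannot be inspected to recover `sh`.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import GHC.TypeLits (Nat)

-- Hypothetical stand-in for ox-arrays' IxS: a shape-indexed list of indexes.
data Ix (sh :: [Nat]) (i :: Type) where
  ZIx   :: Ix '[] i
  (:.$) :: i -> Ix sh i -> Ix (n ': sh) i
infixr 3 :.$

-- Toy counterpart of the proposed AstIxS family, built from () and pairs
-- instead of TKUnit and TKProduct.
type family Prod (sh :: [Nat]) (i :: Type) :: Type where
  Prod '[] i = ()
  Prod (n ': rest) i = (i, Prod rest i)

-- List form to product form: matching on the GADT constructor refines sh,
-- so no extra evidence is needed.
toProd :: Ix sh i -> Prod sh i
toProd ZIx = ()
toProd (i :.$ rest) = (i, toProd rest)

-- Shape singleton: the only way to learn sh when starting from Prod sh i.
data SShape (sh :: [Nat]) where
  SNil  :: SShape '[]
  SCons :: SShape sh -> SShape (n ': sh)

-- Product form back to list form, driven by the singleton.
fromProd :: SShape sh -> Prod sh i -> Ix sh i
fromProd SNil () = ZIx
fromProd (SCons s) (i, rest) = i :.$ fromProd s rest

-- Plain list view, for printing.
ixToList :: Ix sh i -> [i]
ixToList ZIx = []
ixToList (i :.$ rest) = i : ixToList rest

main :: IO ()
main = do
  let ix = 7 :.$ 8 :.$ ZIx :: Ix '[2, 3] Int
      p  = toProd ix  -- (7, (8, ()))
  print p
  print (ixToList (fromProd (SCons (SCons SNil)) p :: Ix '[2, 3] Int))
```

This prints `(7,(8,()))` and then `[7,8]`. Note that `fromProd` also needs the result type annotation, since `Prod` is not injective and GHC cannot infer `sh` from the product alone.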
An alternative solution is to use small rank-1 tensors of Ints as a representation of (lists of) indexes. This has been tried in the workaround tletIx, which, while similar in usage and applicability to shareIx, proved disastrous performance-wise. The advantage is that no type family is required. Conversions, however, are needed just as with the TKProduct solution.
In conclusion, there seem to be no fundamental problems but plenty of usability problems, so it may make sense to wait for the next big rewrite of horde-ad (e.g., one prompted by even bigger usability problems that users discover) and/or for realistic examples that trip on this horde-ad limitation before committing to any solution.