# HORDE-AD: Higher Order Reverse Derivatives Efficiently

This is an Automatic Differentiation library based on the paper [_"Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation"_](https://dl.acm.org/doi/10.1145/3498710) by Krawiec, Krishnaswami, Peyton Jones, Ellis, Fitzgibbon, and Eisenberg, developed in collaboration with the paper's authors. Compared to the paper, the library additionally supports array operations efficiently and can generate symbolic derivative programs, though the latter only for more narrowly typed source programs, where the higher-orderness is limited to a closed set of functionals over arrays.

This is an early prototype, in terms of engine performance, the API, and the preliminary tools and examples built with it. The user should be ready to add missing primitives, as well as any obvious tools that should be predefined but aren't. It's already possible to differentiate basic neural network architectures, such as fully connected, recurrent, convolutional and residual. The library should also be suitable for defining exotic machine learning architectures and non-machine learning systems, given that the notion of a neural network is not hardwired into the formalism, but instead is compositionally and type-safely built up from general automatic differentiation building blocks.

Mature Haskell libraries with similar capabilities, but varying efficiency, are https://hackage.haskell.org/package/ad and https://hackage.haskell.org/package/backprop. See also https://github.com/Mikolaj/horde-ad/blob/master/CREDITS.md. Benchmarks suggest that horde-ad has competitive performance. (TODO: boasting)

<!--
-- TODO: do and redo the benchmarks

The benchmarks at SOMEWHERE show that this library has performance highly compet
-->

It is hoped that the (well-typed) separation of AD logic and the tensor manipulation backend will enable similar speedups on numerical accelerators.

## Computing the derivative of a simple function

Here is an example of a Haskell function to be differentiated:

```hs
-- A function that goes from R^3 to R.
foo :: RealFloat a => (a, a, a) -> a
foo (x, y, z) =
  let w = x * sin y
  in atan2 z w + z * w -- note that w appears twice
```
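
Since `foo` is ordinary Haskell, it can be evaluated directly before any differentiation comes into play; for instance, a minimal check assuming only the definition above:

```hs
-- Evaluate `foo` at a sample point using plain Prelude arithmetic;
-- no horde-ad machinery is involved yet.
main :: IO ()
main = print (foo (1.1, 2.2, 3.3 :: Double))
```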

The gradient of `foo` instantiated to `Double` can be expressed in Haskell with horde-ad as:
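
A rough sketch of the shape this takes, assuming the `crev` tool described below can be applied to `foo` at a concrete input (the wrapper name `gradFooDouble` and the bare `Double` types are illustrative assumptions, not horde-ad's verbatim API):

```hs
-- Sketch, not verbatim horde-ad code: `crev` (see below) is assumed to take
-- the objective function and a concrete input and to return the gradient
-- of the function at that input.
gradFooDouble :: (Double, Double, Double) -> (Double, Double, Double)
gradFooDouble = crev foo
```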
Note that `w` is processed only once during gradient computation and this property of sharing preservation is guaranteed for the `crev` tool universally by horde-ad without any action required from the user. When computing symbolic derivative programs, however, the user has to explicitly mark values for sharing using `tlet` with a more specific type of the objective function, as shown below.

```hs
fooLet :: (RealFloatH (target a), LetTensor target)
       => (target a, target a, target a) -> target a
fooLet (x, y, z) =
  tlet (x * sin y) $ \w ->
    atan2H z w + z * w
```

The symbolic derivative program (here presented with additional formatting) can be obtained using the `revArtifactAdapt` tool:

A quick inspection of the derivative program reveals that computations are not repeated, which is thanks to sharing. A concrete value of the symbolic derivative can be obtained by interpreting the derivative program in the context of the operations supplied by the horde-ad library. The value should be the same as when evaluating `fooLet` with `crev` on the concrete input, as before. A shorthand that creates the symbolic derivative program and evaluates it at a given input is called `rev` and is used exactly the same (but with potentially better performance) as `crev`.
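
As a usage sketch only of that last point (the bindings and the input's concrete tensor type are assumptions, not horde-ad's verbatim API):

```hs
-- Hypothetical sketch: `rev` as a drop-in replacement for `crev`.
-- `inputs` stands for the same concrete input used with `crev` before;
-- its exact type (one of the library's tensor targets) is left as an assumption.
gradViaCrev inputs = crev fooLet inputs  -- evaluate the derivative directly
gradViaRev  inputs = rev  fooLet inputs  -- build the symbolic program, then interpret it
```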

# WIP: The examples below are outdated and will be replaced soon using a new API

## Computing Jacobians

-- TODO: we can have vector/matrix/tensor codomains, but not pair codomains
-- until #68 is done;
-- perhaps a vector codomain example, with a 1000x3 Jacobian, would make sense?
-- 2 years later: actually, we can now have TKProduct codomains.

Now let's consider a function from `R^n` to `R^m`. We don't want the gradient, but instead the Jacobian.
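
As a quick reminder (a standard definition, not specific to horde-ad): for a function f from `R^n` to `R^m`, the Jacobian at a point x is the m×n matrix of partial derivatives,

```math
J_f(x) = \begin{pmatrix}
  \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\
  \vdots & \ddots & \vdots \\
  \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x)
\end{pmatrix}
```

so the gradient of a scalar-valued function, as computed above, is the special case m = 1 (a single row).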