README.rst (9 additions, 121 deletions)

@@ -103,11 +103,19 @@ Concepts

Horovod core principles are based on `MPI <http://mpi-forum.org/>`_ concepts such as *size*, *rank*,
*local rank*, **allreduce**, **allgather**, and *broadcast*. See `this page <docs/concepts.rst>`_ for more details.
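
To make these terms concrete, here is a minimal, illustrative sketch that prints the values each worker sees; the script name and launch command are assumptions for illustration only.

.. code-block:: python

    # print_ranks.py (hypothetical name): illustrate size, rank, and local rank.
    # Launch with several processes, e.g. `mpirun -np 4 python print_ranks.py`.
    import horovod.tensorflow as hvd

    hvd.init()

    # size: total number of Horovod processes across all servers
    # rank: unique id of this process, in the range [0, size)
    # local_rank: unique id of this process within its server (commonly used to pin a GPU)
    print('size=%d, rank=%d, local_rank=%d'
          % (hvd.size(), hvd.rank(), hvd.local_rank()))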

Supported frameworks
--------------------

See these pages for Horovod examples and best practices:

- `Horovod with TensorFlow <#usage>`__ (Usage section below)
- `Horovod with Keras <docs/keras.rst>`_
- `Horovod with PyTorch <docs/pytorch.rst>`_
- `Horovod with MXNet <docs/mxnet.rst>`_

Usage
-----

To use Horovod, make the following additions to your program. This example uses TensorFlow.

1. Run ``hvd.init()``.
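
Only the first step is shown above. For illustration, the sketch below pulls the usual additions together into one toy TensorFlow 1.x program (initialization, GPU pinning, learning-rate scaling, ``hvd.DistributedOptimizer``, and broadcasting initial state). The toy model, learning rate, and step count are assumptions, not text from the full README; see the complete steps and examples there.

.. code-block:: python

    import tensorflow as tf
    import horovod.tensorflow as hvd

    # Step 1: initialize Horovod.
    hvd.init()

    # Pin this process to a single GPU (one GPU per process), keyed by local rank.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Toy model: fit a single scalar weight (stands in for a real model graph).
    w = tf.get_variable('w', initializer=0.0)
    loss = tf.square(w * 1.0 - 2.0)

    # Scale the learning rate by the number of workers and wrap the optimizer
    # so gradients are averaged across workers with allreduce.
    opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    train_op = opt.minimize(loss)

    # Broadcast initial variable states from rank 0 to all other processes.
    hooks = [hvd.BroadcastGlobalVariablesHook(0)]

    with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
        for _ in range(100):
            sess.run(train_op)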

@@ -202,132 +210,12 @@ page for more instructions, including RoCE/InfiniBand tweaks and tips for dealin

7. To run in Singularity, see `Singularity <https://github.com/sylabs/examples/tree/master/machinelearning/horovod>`_.

Keras
-----

Horovod supports Keras and regular TensorFlow in similar ways.

See full training `simple <https://github.com/horovod/horovod/blob/master/examples/keras_mnist.py>`_ and `advanced <https://github.com/horovod/horovod/blob/master/examples/keras_mnist_advanced.py>`_ examples.

**Note**: Keras 2.0.9 has a `known issue <https://github.com/fchollet/keras/issues/8353>`_ that makes each worker allocate all GPUs on the server, instead of the GPU assigned by the *local rank*. If you have multiple GPUs per server, upgrade to Keras 2.1.2 or downgrade to Keras 2.0.8.
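
As a minimal illustration of the pattern, the sketch below runs Horovod with Keras on toy data. It is an illustrative approximation, not the linked MNIST examples; the toy data, model, and hyperparameters are assumptions.

.. code-block:: python

    import numpy as np
    import keras
    import horovod.keras as hvd

    # Initialize Horovod.
    hvd.init()

    # Toy data and model standing in for the MNIST examples linked above.
    x = np.random.rand(256, 10).astype('float32')
    y = np.random.randint(0, 2, size=(256, 1))
    model = keras.models.Sequential(
        [keras.layers.Dense(1, activation='sigmoid', input_shape=(10,))])

    # Scale the learning rate by the number of workers and wrap the optimizer
    # so gradients are averaged across workers.
    opt = keras.optimizers.SGD(lr=0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    model.compile(loss='binary_crossentropy', optimizer=opt)

    # Broadcast initial weights from rank 0 so every worker starts in sync,
    # and keep per-batch logging to rank 0 only.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    model.fit(x, y, batch_size=32, epochs=1, callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)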

Estimator API
-------------

Horovod supports the Estimator API and regular TensorFlow in similar ways.

See a full training `example <examples/tensorflow_mnist_estimator.py>`_.
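
As a minimal illustration of how the pieces fit with an Estimator, here is a sketch on a toy regression problem. The model function, input function, and checkpoint directory are assumptions for illustration, not the linked MNIST example.

.. code-block:: python

    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()

    # Pin each process to one GPU via the session config handed to the Estimator.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    def model_fn(features, labels, mode):
        # Toy linear model standing in for the MNIST model in the full example.
        predictions = tf.layers.dense(features['x'], 1)
        loss = tf.losses.mean_squared_error(labels, predictions)
        opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
        opt = hvd.DistributedOptimizer(opt)
        train_op = opt.minimize(loss, global_step=tf.train.get_or_create_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    def input_fn():
        x = tf.random_normal([32, 4])
        y = tf.reduce_sum(x, axis=1, keepdims=True)
        return {'x': x}, y

    estimator = tf.estimator.Estimator(
        model_fn=model_fn,
        # Write checkpoints from rank 0 only to avoid concurrent writers.
        model_dir='/tmp/horovod_estimator_sketch' if hvd.rank() == 0 else None,
        config=tf.estimator.RunConfig(session_config=config))

    # Broadcast the initial variable state from rank 0 to all other workers.
    estimator.train(input_fn=input_fn, steps=100,
                    hooks=[hvd.BroadcastGlobalVariablesHook(0)])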

MXNet
-----

Horovod supports MXNet and regular TensorFlow in similar ways.

See full training `MNIST <https://github.com/horovod/horovod/blob/master/examples/mxnet_mnist.py>`_ and `ImageNet <https://github.com/horovod/horovod/blob/master/examples/mxnet_imagenet_resnet50.py>`_ examples. The script below provides a simple skeleton based on the MXNet Gluon API.

.. code-block:: python

    # Skeleton only: the full examples linked above define context, dtype, model,
    # params, opt, train_data, batch_size and num_epoch, and call hvd.init()
    # before this point.
    from mxnet import autograd
    import horovod.mxnet as hvd

    # Create DistributedTrainer, a subclass of gluon.Trainer
    trainer = hvd.DistributedTrainer(params, opt)

    # Create loss function
    loss_fn = ...

    # Train model
    for epoch in range(num_epoch):
        train_data.reset()
        for nbatch, batch in enumerate(train_data, start=1):
            data = batch.data[0].as_in_context(context)
            label = batch.label[0].as_in_context(context)
            with autograd.record():
                output = model(data.astype(dtype, copy=False))
                loss = loss_fn(output, label)
            loss.backward()
            trainer.step(batch_size)

**Note**: The `known issue <https://github.com/horovod/horovod/issues/884>`__ when running Horovod with MXNet on a Linux system with GCC version 5.X and above has been resolved. Use MXNet 1.4.1 or later with Horovod 0.16.2 or later to avoid the GCC incompatibility issue. The MXNet 1.4.0 release works with Horovod 0.16.0 and 0.16.1, but the GCC incompatibility issue is not solved there.

PyTorch
-------

Horovod supports PyTorch and TensorFlow in similar ways.

Example (also see a full training `example <examples/pytorch_mnist.py>`__):

.. code-block:: python

    import torch
    import horovod.torch as hvd

    # Initialize Horovod
    hvd.init()

    # Pin GPU to be used to process local rank (one GPU per process)
    torch.cuda.set_device(hvd.local_rank())

    # Define dataset...
    train_dataset = ...

    # Partition dataset among workers using DistributedSampler

docs/index.rst (86 additions, 5 deletions)

Horovod documentation
=====================

Horovod improves the speed, scale, and resource utilization of deep learning training.

Get started
-----------

Choose your deep learning framework to learn how to get started with Horovod.

.. raw:: html

   <button class="accordion">TensorFlow</button>
   <div class="panel">
      <p>To use Horovod with TensorFlow on your laptop:
      <ol>
         <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
         <li>Install the Horovod pip package: <code>pip install horovod</code></li>
         <li>Read <a href="https://horovod.readthedocs.io/en/latest/tensorflow.html">Horovod with TensorFlow</a> for best practices and examples.</li>
      </ol>
      Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
      </p>
   </div>

   <button class="accordion">Keras</button>
   <div class="panel">
      <p>To use Horovod with Keras on your laptop:
      <ol>
         <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
         <li>Install the Horovod pip package: <code>pip install horovod</code></li>
         <li>Read <a href="https://horovod.readthedocs.io/en/latest/keras.html">Horovod with Keras</a> for best practices and examples.</li>
      </ol>
      Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
      </p>
   </div>

   <button class="accordion">PyTorch</button>
   <div class="panel">
      <p>To use Horovod with PyTorch on your laptop:
      <ol>
         <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
         <li>Install the Horovod pip package: <code>pip install horovod</code></li>
         <li>Read <a href="https://horovod.readthedocs.io/en/latest/pytorch.html">Horovod with PyTorch</a> for best practices and examples.</li>
      </ol>
      Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
      </p>
   </div>

   <button class="accordion">Apache MXNet</button>
   <div class="panel">
      <p>To use Horovod with Apache MXNet on your laptop:
      <ol>
         <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
         <li>Install the Horovod pip package: <code>pip install horovod</code></li>
         <li>Read <a href="https://horovod.readthedocs.io/en/latest/mxnet.html">Horovod with MXNet</a> for best practices and examples.</li>
      </ol>
      Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
      </p>
   </div>

   <script>
   var acc = document.getElementsByClassName("accordion");
   var i;

   for (i = 0; i < acc.length; i++) {
      acc[i].addEventListener("click", function() {
         this.classList.toggle("active");
         var panel = this.nextElementSibling;
         if (panel.style.maxHeight) {
            panel.style.maxHeight = null;
         } else {
            panel.style.maxHeight = panel.scrollHeight + "px";
         }
      });
   }
   </script>

Guides
------

.. toctree::
   :maxdepth: 2

   summary_include
   concepts_include
   api
   tensorflow
   keras
   pytorch
   mxnet
   running_include
   benchmarks_include
   inference_include
   gpus_include
   docker_include
   spark_include

@@ -31,8 +111,9 @@ Horovod improves the speed, scale, and resource utilization of deep learning tra

Horovod supports Keras and regular TensorFlow in similar ways.

See full training `simple <https://github.com/horovod/horovod/blob/master/examples/keras_mnist.py>`_ and `advanced <https://github.com/horovod/horovod/blob/master/examples/keras_mnist_advanced.py>`_ examples.

.. NOTE:: Keras 2.0.9 has a `known issue <https://github.com/fchollet/keras/issues/8353>`_ that makes each worker allocate all GPUs on the server, instead of the GPU assigned by the *local rank*. If you have multiple GPUs per server, upgrade to Keras 2.1.2 or downgrade to Keras 2.0.8.