3. To run using Open MPI without the ``horovodrun`` wrapper, see the `Running Horovod with Open MPI <docs/mpirun.rst>`_ page.
4. To run in Docker, see the `Horovod in Docker <docs/docker.rst>`_ page.
5. To run in Kubernetes, see `Kubeflow <https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job>`_, `MPI Operator <https://github.com/kubeflow/mpi-operator/>`_, `Helm Chart <https://github.com/kubernetes/charts/tree/master/stable/horovod/>`_, and `FfDL <https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/>`_.
6. To run in Spark, see the `Spark <docs/spark.rst>`_ page.

Keras
-----

Horovod supports Keras and regular TensorFlow in similar ways.

See full training `simple <https://github.com/horovod/horovod/blob/master/examples/keras_mnist.py>`_ and `advanced <https://github.com/horovod/horovod/blob/master/examples/keras_mnist_advanced.py>`_ examples.

**Note**: Keras 2.0.9 has a `known issue <https://github.com/fchollet/keras/issues/8353>`_ that makes each worker allocate
all GPUs on the server, instead of the GPU assigned by the *local rank*. If you have multiple GPUs per server, upgrade
to Keras 2.1.2, or downgrade to Keras 2.0.8.

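For illustration, here is a minimal sketch of what a Keras script with Horovod can look like. It is not one of the linked examples; the model, dataset, and hyperparameters are placeholders chosen only to keep the snippet self-contained.

.. code-block:: python

    import keras
    import tensorflow as tf
    import horovod.keras as hvd

    # Initialize Horovod and pin this process to one GPU, chosen by local rank
    hvd.init()
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())
    keras.backend.set_session(tf.Session(config=config))

    # Illustrative data and model (MNIST flattened to 784 features)
    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
    model = keras.models.Sequential([
        keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
    ])

    # Scale the learning rate by the number of workers and wrap the optimizer
    opt = keras.optimizers.SGD(lr=0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
                  metrics=['accuracy'])

    # Broadcast initial variables from rank 0 so every worker starts identically
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    model.fit(x_train, y_train, batch_size=128, epochs=1, callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)
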
MXNet
-----

Horovod supports MXNet and regular TensorFlow in similar ways.

See full training `MNIST <https://github.com/horovod/horovod/blob/master/examples/mxnet_mnist.py>`_ and `ImageNet <https://github.com/horovod/horovod/blob/master/examples/mxnet_imagenet_resnet50.py>`_ examples. The script below provides a simple code skeleton based on the MXNet Gluon API.

.. code-block:: python

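    # NOTE: the body of this code block is not shown in this excerpt; what follows
    # is a minimal, illustrative sketch of Horovod with the MXNet Gluon API.
    # The model, data, and hyperparameters are placeholders, not the original script.
    import mxnet as mx
    from mxnet import autograd, gluon
    import horovod.mxnet as hvd

    # Initialize Horovod and pin each process to one GPU (or fall back to CPU)
    hvd.init()
    ctx = mx.gpu(hvd.local_rank()) if mx.context.num_gpus() > 0 else mx.cpu()

    # Tiny illustrative model and synthetic data
    net = gluon.nn.Dense(10, in_units=20)
    net.initialize(ctx=ctx)
    data = mx.nd.random.uniform(shape=(64, 20))
    label = mx.nd.random.randint(0, 10, shape=(64,)).astype('float32')
    train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(data, label), batch_size=8)

    # Broadcast initial parameters from rank 0 so every worker starts identically
    params = net.collect_params()
    hvd.broadcast_parameters(params, root_rank=0)

    # DistributedTrainer averages gradients across workers; scale LR by world size
    trainer = hvd.DistributedTrainer(params, 'sgd', {'learning_rate': 0.01 * hvd.size()})
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

    for x, y in train_data:
        x, y = x.as_in_context(ctx), y.as_in_context(ctx)
        with autograd.record():
            loss = loss_fn(net(x), y)
        loss.backward()
        trainer.step(x.shape[0])
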
Inference
---------

Learn how to optimize your model for inference and remove Horovod operations from the graph `here <docs/inference.rst>`_.

Tensor Fusion
-------------

One of the unique things about Horovod is its ability to interleave communication and computation, coupled with the ability
to batch small **allreduce** operations, which results in improved performance. We call this batching feature Tensor Fusion.

See `here <docs/tensor-fusion.rst>`__ for full details and tweaking instructions.

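For example, the size of the fusion buffer can be tuned through the ``HOROVOD_FUSION_THRESHOLD`` environment variable (in bytes). A minimal sketch, with an illustrative value and a placeholder script name:

.. code-block:: bash

    # Cap the fusion buffer at 32 MB for this run (value and script are placeholders)
    HOROVOD_FUSION_THRESHOLD=33554432 horovodrun -np 4 python train.py
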
Analyzing Horovod Performance
-----------------------------

Horovod has the ability to record the timeline of its activity, called Horovod Timeline.

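A timeline is typically captured by pointing the ``HOROVOD_TIMELINE`` environment variable at an output file; a minimal sketch with placeholder paths:

.. code-block:: bash

    # Record a Horovod Timeline (a JSON trace viewable in chrome://tracing)
    HOROVOD_TIMELINE=/tmp/timeline.json horovodrun -np 4 python train.py
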
4. At the end of the run, you will see the number of images processed per second:

.. code-block:: bash

    total images/sec: 1656.82

**Real data benchmarks**

The benchmark instructions above are for the synthetic data benchmark.

To run the benchmark on real data, you need to download the `ImageNet dataset <http://image-net.org/download-images>`__
and convert it using the TFRecord `preprocessing script <https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_imagenet.sh>`__.

Now, simply add ``--data_dir /path/to/imagenet/tfrecords --data_name imagenet --num_batches=2000`` to your training command:

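The full command depends on the synthetic benchmark steps omitted from this excerpt; as a sketch, assuming TensorFlow's ``tf_cnn_benchmarks`` launched with ``horovodrun`` (host names, paths, and model flags below are placeholders):

.. code-block:: bash

    # Sketch only: hosts, paths, and model flags are illustrative
    horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 \
        python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
            --model resnet101 \
            --batch_size 64 \
            --variable_update horovod \
            --data_dir /path/to/imagenet/tfrecords \
            --data_name imagenet \
            --num_batches=2000
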
Horovod core principles are based on the `MPI <http://mpi-forum.org/>`_ concepts *size*, *rank*,
*local rank*, *allreduce*, *allgather*, and *broadcast*. These are best explained by example. Say we launched
a training script on 4 servers, each having 4 GPUs. If we launched one copy of the script per GPU:

* *Size* would be the number of processes, in this case, 16.

* *Rank* would be the unique process ID from 0 to 15 (*size* - 1).

* *Local rank* would be the unique process ID within the server from 0 to 3.

* *Allreduce* is an operation that aggregates data among multiple processes and distributes results back to them. *Allreduce* is used to average dense tensors. Here's an illustration from the `MPI Tutorial <http://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/>`__:

* *Allgather* is an operation that gathers data from all processes on every process. *Allgather* is used to collect values of sparse tensors. Here's an illustration from the `MPI Tutorial <http://mpitutorial.com/tutorials/mpi-scatter-gather-and-allgather/>`__:

* *Broadcast* is an operation that broadcasts data from one process, identified by root rank, onto every other process. Here's an illustration from the `MPI Tutorial <http://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/>`__:

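To make these terms concrete, here is a minimal sketch (not part of the original text) that queries them and performs an *allreduce* and a *broadcast* with Horovod's MXNet binding; launch one copy per GPU, for example via ``horovodrun``:

.. code-block:: python

    import mxnet as mx
    import horovod.mxnet as hvd

    hvd.init()

    # size: total processes; rank: global ID; local rank: ID within this server
    print('size=%d rank=%d local_rank=%d' % (hvd.size(), hvd.rank(), hvd.local_rank()))

    # Allreduce: average a tensor across all processes
    x = mx.nd.ones((2, 2)) * hvd.rank()
    averaged = hvd.allreduce(x, name='example_allreduce')  # averages by default

    # Broadcast: copy a tensor from root rank 0 to every other process
    y = mx.nd.full((2, 2), hvd.rank())
    y = hvd.broadcast(y, root_rank=0, name='example_broadcast')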