
Commit 048ba40

sblotner authored and alsrgv committed
Restructure Horovod doc landing page (horovod#1158)
This squash merge bundles the following commits. Each is also Signed-off-by: Stephanie Blotner <[email protected]>; additional sign-offs are noted per item.

* Edit Horovod docs (horovod#1119)
* Fix tf-nightly-gpu break (horovod#1124). Signed-off-by: Alex Sergeev <[email protected]>
* Fix minor spacing issue in GPU and summary docs
* Update Horovod doc site font, add landing page accordion
* Reorder and clean up titles in left navigation
* Add basic pages for keras, mxnet, pytorch, and tensorflow
* Fix mpirun 4 GPU example
* Minor updates to README
* Make Open MPI an advanced topic
* Fix REAMDE TF link
* Resolve conflict
* Fix minor spacing issue in GPU and summary docs (horovod#1127)
* Updated horovodrun command (horovod#1126). Signed-off-by: Carsten Jacobsen <[email protected]>
* Ask GCC version when filling out the issue (horovod#1133). Signed-off-by: Alex Sergeev <[email protected]>
* Pin tf-nightly to 1.14.1.dev20190606 & remove torchvision-nightly (horovod#1137). Signed-off-by: Alex Sergeev <[email protected]>
  * Pin tf-nightly to 1.14.1.dev20190606
  * torchvision-nightly -> torchvision
  * Pin torchvision to a version that does not require CUDA
* Replace .step(synchronize=False) with optimizer.skip_synchronize() (horovod#1132). Signed-off-by: Alex Sergeev <[email protected]>
  * Replace .step(synchronize=False) with optimizer.already_synchronized()
  * Fix docs
  * Rename to skip_synchronize() and fix test
* Unpin tf-nightly version (horovod#1140). Signed-off-by: Alex Sergeev <[email protected]>
* Bump version to 0.16.4 (horovod#1139). Signed-off-by: Alex Sergeev <[email protected]>
* remove MSHADOW_USE_F16C (horovod#1141). Signed-off-by: Lin Yuan <[email protected]>
* Adding support for multiple CUDA streams for NCCL operations. (horovod#1128). Signed-off-by: Josh Romero <[email protected]>
  * Adding support for multiple CUDA streams for NCCL operations.
  * Fix compilation without CUDA or NCCL enabled.
  * Updating variable names.
* Add Singularity example page (horovod#1149). Signed-off-by: Alex Sergeev <[email protected]>
* Update Gloo api for data layer (horovod#1120). Signed-off-by: Travis Addair <[email protected]> and Sihan Zeng <[email protected]>
  * Added gloo as a submodule
  * Added cmake build for gloo
  * Added allreduce and broadcast ops for Gloo
  * Enable MPI
  * Fixed transport
  * Use MPI comm from Horovod
  * Changed gloo allreduce to always make use of fusion buffer
  * Copy directly to output buffer
  * Unique ptr to shared ptr
  * Fixed root pointer rank
  * Added float16 support for Gloo
  * Use allgatherv
  * Use GlooAllgather by default
  * Pulled down update to gloo
  * update allgather allreduce and broadcast for unified gloo api
  * update setup.py & MANIFEST.in
  * Add runtime flag to support switching betwee gloo and mpi
  * Resolve review
  * fix iface issue
  * set Gloo to be automatically compiled except on MacOS
  * fix code style
  * integrate compile flag
  * fixed reviews
  * remove cmake from require list if system has cmake installed
  * cmake becomes a blocking issue, temporarily work it around by skip compiling gloo if cmake is not installed.
  * rebase on the latest master
  * remove chmod related code
  * final fix up
* MLSL: move mlsl_init before mpi_init, add mlsl_finalize call (horovod#1156). Signed-off-by: Mikhail Shiryaev <[email protected]>
* Update Horovod GitHub URL (horovod#1147). Signed-off-by: Alex Sergeev <[email protected]>
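
Editor's note on the ``skip_synchronize()`` item above: this renames the hook that Horovod's PyTorch ``DistributedOptimizer`` exposes for separating the gradient allreduce from the weight update. A minimal sketch of the resulting pattern, assuming ``horovod.torch`` 0.16.4 or later; the tiny model and random batch are placeholders for illustration, not taken from this commit:

.. code-block:: python

    import torch
    import torch.nn.functional as F
    import horovod.torch as hvd

    hvd.init()

    # Placeholder model and batch (assumptions for illustration only).
    model = torch.nn.Linear(10, 2)
    data, target = torch.randn(32, 10), torch.randint(0, 2, (32,))

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    optimizer = hvd.DistributedOptimizer(optimizer,
                                         named_parameters=model.named_parameters())

    optimizer.zero_grad()
    loss = F.cross_entropy(model(data), target)
    loss.backward()

    # Run the gradient allreduce explicitly so gradients can be clipped first.
    optimizer.synchronize()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    # Apply the update without a second synchronization; this context manager
    # replaces the old .step(synchronize=False) argument.
    with optimizer.skip_synchronize():
        optimizer.step()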
1 parent 50a3c01 commit 048ba40

File tree

13 files changed: +294 additions, -263 deletions


README.rst

Lines changed: 9 additions & 121 deletions
@@ -103,11 +103,19 @@ Concepts
 Horovod core principles are based on `MPI <http://mpi-forum.org/>`_ concepts such as *size*, *rank*,
 *local rank*, **allreduce**, **allgather** and, *broadcast*. See `this page <docs/concepts.rst>`_ for more details.
 
+Supported frameworks
+--------------------
+See these pages for Horovod examples and best practices:
+
+- `Horovod with TensorFlow <#usage>`__ (Usage section below)
+- `Horovod with Keras <docs/keras.rst>`_
+- `Horovod with PyTorch <docs/pytorch.rst>`_
+- `Horovod with MXNet <docs/mxnet.rst>`_
 
 Usage
 -----
 
-To use Horovod, make the following additions to your program:
+To use Horovod, make the following additions to your program. This example uses TensorFlow.
 
 1. Run ``hvd.init()``.
 
@@ -202,132 +210,12 @@ page for more instructions, including RoCE/InfiniBand tweaks and tips for dealin
 
 7. To run in Singularity, see `Singularity <https://github.com/sylabs/examples/tree/master/machinelearning/horovod>`_.
 
-Keras
------
-Horovod supports Keras and regular TensorFlow in similar ways.
-
-See full training `simple <https://github.com/horovod/horovod/blob/master/examples/keras_mnist.py>`_ and `advanced <https://github.com/horovod/horovod/blob/master/examples/keras_mnist_advanced.py>`_ examples.
-
-**Note**: Keras 2.0.9 has a `known issue <https://github.com/fchollet/keras/issues/8353>`_ that makes each worker allocate
-all GPUs on the server, instead of the GPU assigned by the *local rank*. If you have multiple GPUs per server, upgrade
-to Keras 2.1.2 or downgrade to Keras 2.0.8.
-
-
 Estimator API
 -------------
 Horovod supports Estimator API and regular TensorFlow in similar ways.
 
 See a full training `example <examples/tensorflow_mnist_estimator.py>`_.
 
-MXNet
------
-Horovod supports MXNet and regular TensorFlow in similar ways.
-
-See full training `MNIST <https://github.com/horovod/horovod/blob/master/examples/mxnet_mnist.py>`_ and `ImageNet <https://github.com/horovod/horovod/blob/master/examples/mxnet_imagenet_resnet50.py>`_ examples. The script below provides a simple skeleton of code block based on MXNet Gluon API.
-
-.. code-block:: python
-
-    import mxnet as mx
-    import horovod.mxnet as hvd
-    from mxnet import autograd
-
-    # Initialize Horovod
-    hvd.init()
-
-    # Pin GPU to be used to process local rank
-    context = mx.gpu(hvd.local_rank())
-    num_workers = hvd.size()
-
-    # Build model
-    model = ...
-    model.hybridize()
-
-    # Create optimizer
-    optimizer_params = ...
-    opt = mx.optimizer.create('sgd', **optimizer_params)
-
-    # Initialize parameters
-    model.initialize(initializer, ctx=context)
-
-    # Fetch and broadcast parameters
-    params = model.collect_params()
-    if params is not None:
-        hvd.broadcast_parameters(params, root_rank=0)
-
-    # Create DistributedTrainer, a subclass of gluon.Trainer
-    trainer = hvd.DistributedTrainer(params, opt)
-
-    # Create loss function
-    loss_fn = ...
-
-    # Train model
-    for epoch in range(num_epoch):
-        train_data.reset()
-        for nbatch, batch in enumerate(train_data, start=1):
-            data = batch.data[0].as_in_context(context)
-            label = batch.label[0].as_in_context(context)
-            with autograd.record():
-                output = model(data.astype(dtype, copy=False))
-                loss = loss_fn(output, label)
-            loss.backward()
-            trainer.step(batch_size)
-
-
-**Note**: The `known issue <https://github.com/horovod/horovod/issues/884>`__ when running Horovod with MXNet on a Linux system with GCC version 5.X and above has been resolved. Please use MXNet 1.4.1 or later releases with Horovod 0.16.2 or later releases to avoid the GCC incompatibility issue. MXNet 1.4.0 release works with Horovod 0.16.0 and 0.16.1 releases with the GCC incompatibility issue unsolved.
-
-PyTorch
--------
-Horovod supports PyTorch and TensorFlow in similar ways.
-
-Example (also see a full training `example <examples/pytorch_mnist.py>`__):
-
-.. code-block:: python
-
-    import torch
-    import horovod.torch as hvd
-
-    # Initialize Horovod
-    hvd.init()
-
-    # Pin GPU to be used to process local rank (one GPU per process)
-    torch.cuda.set_device(hvd.local_rank())
-
-    # Define dataset...
-    train_dataset = ...
-
-    # Partition dataset among workers using DistributedSampler
-    train_sampler = torch.utils.data.distributed.DistributedSampler(
-        train_dataset, num_replicas=hvd.size(), rank=hvd.rank())
-
-    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=..., sampler=train_sampler)
-
-    # Build model...
-    model = ...
-    model.cuda()
-
-    optimizer = optim.SGD(model.parameters())
-
-    # Add Horovod Distributed Optimizer
-    optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
-
-    # Broadcast parameters from rank 0 to all other processes.
-    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
-
-    for epoch in range(100):
-        for batch_idx, (data, target) in enumerate(train_loader):
-            optimizer.zero_grad()
-            output = model(data)
-            loss = F.nll_loss(output, target)
-            loss.backward()
-            optimizer.step()
-            if batch_idx % args.log_interval == 0:
-                print('Train Epoch: {} [{}/{}]\tLoss: {}'.format(
-                    epoch, batch_idx * len(data), len(train_sampler), loss.item()))
-
-
-**Note**: PyTorch support requires NCCL 2.2 or later. It also works with NCCL 2.1.15 if you are not using RoCE or InfiniBand.
-
 mpi4py
 ------
 Horovod supports mixing and matching Horovod collectives with other MPI libraries, such as `mpi4py <https://mpi4py.scipy.org>`_,
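
Editor's note: the Usage steps retained above follow Horovod's standard TensorFlow pattern. A minimal sketch of those steps, assuming the TensorFlow 1.x ``tf.train`` API that this release's examples target; the one-layer model on random data is a placeholder, not part of the commit:

.. code-block:: python

    import tensorflow as tf
    import horovod.tensorflow as hvd

    # 1. Initialize Horovod.
    hvd.init()

    # 2. Pin one GPU per process, selected by local rank.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Placeholder graph (assumption, for illustration): a single dense
    # layer on random data; substitute your own model and loss.
    features = tf.random_normal([32, 10])
    labels = tf.random_uniform([32], maxval=2, dtype=tf.int32)
    logits = tf.layers.dense(features, 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

    # 3. Scale the learning rate by the number of workers and wrap the optimizer.
    opt = tf.train.AdagradOptimizer(0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    global_step = tf.train.get_or_create_global_step()
    train_op = opt.minimize(loss, global_step=global_step)

    # 4. Broadcast initial variable state from rank 0 to all other processes.
    hooks = [hvd.BroadcastGlobalVariablesHook(0),
             tf.train.StopAtStepHook(last_step=1000)]

    # 5. Checkpoint only on rank 0 so workers do not corrupt each other's state.
    checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None

    with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                           hooks=hooks,
                                           config=config) as sess:
        while not sess.should_stop():
            sess.run(train_op)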

docs/_static/custom.css

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+/* Custom CSS for landing page accordion */
+
+/* Accordion open/close button style */
+.accordion {
+  background-color: #eee;
+  color: #444;
+  cursor: pointer;
+  padding: 18px;
+  width: 100%;
+  text-align: left;
+  border: none;
+  outline: none;
+  transition: 0.4s;
+  font-family: 'Helvetica Neue';
+  font-size: 16px;
+}
+
+/* Active button style */
+.active, .accordion:hover {
+  background-color: #ccc;
+}
+
+/* Accordion panel style */
+.panel {
+  padding: 0 18px;
+  background-color: white;
+  max-height: 0;
+  overflow: hidden;
+  transition: max-height 0.2s ease-out;
+  font-size: 16px;
+}
+
+/* Plus sign */
+.accordion:after {
+  content: '\02795';
+  font-size: 16px;
+  color: #777;
+  float: right;
+  margin-left: 5px;
+  font-family: 'Helvetica Neue';
+}
+
+/* Minus sign */
+.active:after {
+  content: "\2796";
+}

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -91,6 +91,7 @@
     'github_count': 'true',
     'fixed_sidebar': True,
     'sidebar_collapse': True,
+    'font_family': 'Helvetica Neue'
 }
 
 # Add any paths that contain custom static files (such as style sheets) here,

docs/index.rst

Lines changed: 86 additions & 5 deletions
@@ -2,26 +2,106 @@ Horovod documentation
 =====================
 Horovod improves the speed, scale, and resource utilization of deep learning training.
 
+Get started
+-----------
+Choose your deep learning framework to learn how to get started with Horovod.
+
+.. raw:: html
+
+   <button class="accordion">TensorFlow</button>
+   <div class="panel">
+   <p>To use Horovod with TensorFlow on your laptop:
+   <ol>
+   <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
+   <li>Install the Horovod pip package: <code>pip install horovod</code></li>
+   <li>Read <a href="https://horovod.readthedocs.io/en/latest/tensorflow.html">Horovod with TensorFlow</a> for best practices and examples.</li>
+   </ol>
+   Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
+   </p>
+   </div>
+
+   <button class="accordion">Keras</button>
+   <div class="panel">
+   <p>To use Horovod with Keras on your laptop:
+   <ol>
+   <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
+   <li>Install the Horovod pip package: <code>pip install horovod</code></li>
+   <li>Read <a href="https://horovod.readthedocs.io/en/latest/keras.html">Horovod with Keras</a> for best practices and examples.</li>
+   </ol>
+   Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
+   </p>
+   </div>
+
+   <button class="accordion">PyTorch</button>
+   <div class="panel">
+   <p>To use Horovod with PyTorch on your laptop:
+   <ol>
+   <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
+   <li>Install the Horovod pip package: <code>pip install horovod</code></li>
+   <li>Read <a href="https://horovod.readthedocs.io/en/latest/pytorch.html">Horovod with PyTorch</a> for best practices and examples.</li>
+   </ol>
+   Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
+   </p>
+   </div>
+
+   <button class="accordion">Apache MXNet</button>
+   <div class="panel">
+   <p>To use Horovod with Apache MXNet on your laptop:
+   <ol>
+   <li><a href="https://www.open-mpi.org/faq/?category=building#easy-build">Install Open MPI 3.1.2 or 4.0.0</a>, or another MPI implementation.</li>
+   <li>Install the Horovod pip package: <code>pip install horovod</code></li>
+   <li>Read <a href="https://horovod.readthedocs.io/en/latest/mxnet.html">Horovod with MXNet</a> for best practices and examples.</li>
+   </ol>
+   Or, use <a href="https://horovod.readthedocs.io/en/latest/gpus_include.html">Horovod on GPUs</a>, in <a href="https://horovod.readthedocs.io/en/latest/spark_include.html">Spark</a>, <a href="https://horovod.readthedocs.io/en/latest/docker_include.html">Docker</a>, <a href="https://github.com/sylabs/examples/tree/master/machinelearning/horovod">Singularity</a>, or Kubernetes (<a href="https://github.com/kubeflow/kubeflow/tree/master/kubeflow/mpi-job">Kubeflow</a>, <a href="https://github.com/kubeflow/mpi-operator/">MPI Operator</a>, <a href="https://github.com/helm/charts/tree/master/stable/horovod">Helm Chart</a>, and <a href="https://github.com/IBM/FfDL/tree/master/etc/examples/horovod/">FfDL</a>).
+   </p>
+   </div>
+
+   <script>
+   var acc = document.getElementsByClassName("accordion");
+   var i;
+
+   for (i = 0; i < acc.length; i++) {
+     acc[i].addEventListener("click", function() {
+       this.classList.toggle("active");
+       var panel = this.nextElementSibling;
+       if (panel.style.maxHeight){
+         panel.style.maxHeight = null;
+       } else {
+         panel.style.maxHeight = panel.scrollHeight + "px";
+       }
+     });
+   }
+   </script>
+
+Guides
+------
+
 .. toctree::
    :maxdepth: 2
 
    summary_include
 
-   mpirun
+   concepts_include
 
    api
 
-   concepts_include
+   tensorflow
+
+   keras
+
+   pytorch
+
+   mxnet
 
    running_include
 
    benchmarks_include
 
-   docker_include
+   inference_include
 
    gpus_include
 
-   inference_include
+   docker_include
 
    spark_include
 
@@ -31,8 +111,9 @@ Horovod improves the speed, scale, and resource utilization of deep learning training.
 
    troubleshooting_include
 
+
 Indices and tables
-==================
+------------------
 
 * :ref:`genindex`
 * :ref:`modindex`

docs/keras.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+Horovod with Keras
+==================
+Horovod supports Keras and regular TensorFlow in similar ways.
+
+See full training `simple <https://github.com/horovod/horovod/blob/master/examples/keras_mnist.py>`_ and `advanced <https://github.com/horovod/horovod/blob/master/examples/keras_mnist_advanced.py>`_ examples.
+
+.. NOTE:: Keras 2.0.9 has a `known issue <https://github.com/fchollet/keras/issues/8353>`_ that makes each worker allocate all GPUs on the server, instead of the GPU assigned by the *local rank*. If you have multiple GPUs per server, upgrade to Keras 2.1.2 or downgrade to Keras 2.0.8.
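
Editor's note: the new page links out to the full examples above; as a hedged companion, this is roughly the pattern those examples follow, assuming ``horovod.keras`` with the TensorFlow backend. The small dense model and random arrays are placeholder assumptions, not content from the commit:

.. code-block:: python

    import numpy as np
    import keras
    import tensorflow as tf
    from keras import backend as K
    import horovod.keras as hvd

    # Initialize Horovod.
    hvd.init()

    # Pin each worker to the GPU matching its local rank.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())
    K.set_session(tf.Session(config=config))

    # Placeholder model and data (assumptions, for illustration only).
    x_train = np.random.rand(1024, 20)
    y_train = keras.utils.to_categorical(np.random.randint(10, size=(1024,)), 10)
    model = keras.models.Sequential([
        keras.layers.Dense(64, activation='relu', input_shape=(20,)),
        keras.layers.Dense(10, activation='softmax'),
    ])

    # Scale the learning rate by the number of workers, then wrap the optimizer.
    opt = keras.optimizers.Adadelta(lr=1.0 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    model.compile(loss='categorical_crossentropy', optimizer=opt,
                  metrics=['accuracy'])

    # Broadcast initial variable states from rank 0 to all other processes.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    model.fit(x_train, y_train,
              batch_size=128,
              epochs=4,
              callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)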

docs/mpirun.rst

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,7 @@
-Running Horovod with Open MPI
-=============================
+:orphan:
 
+Run Horovod with Open MPI
+=========================
 ``horovodrun`` introduces a convenient, Open MPI-based wrapper for running Horovod scripts.
 
 In some cases it is desirable to have fine-grained control over options passed to Open MPI. This page describes
@@ -10,14 +11,13 @@ running Horovod training directly using Open MPI.
 
 .. code-block:: bash
 
-    horovodrun -np 4 -H localhost:4 python train.py
+    horovodrun -np 4 python train.py
 
 Equivalent Open MPI command:
 
 .. code-block:: bash
 
     mpirun -np 4 \
-        -H localhost:4 \
         -bind-to none -map-by slot \
         -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
         -mca pml ob1 -mca btl ^openib \