Commit bdce24b

Fix affinity, convert oneccl.md to oneccl.rst (horovod#2350)

Signed-off-by: Yana Shchyokotova <[email protected]>
1 parent bfe9178

File tree

9 files changed: +518 additions, −86 deletions


.buildkite/gen-pipeline.sh

Lines changed: 2 additions & 2 deletions

@@ -18,8 +18,8 @@ tests=$(if [[ "${BUILDKITE_BRANCH:-}" == "${BUILDKITE_PIPELINE_DEFAULT_BRANCH:-}
   printf "test-cpu-gloo-py3_8-tf2_3_0-keras2_3_1-torch1_6_0-mxnet1_5_0-pyspark3_0_1 "
   printf "test-cpu-openmpi-py3_6-tfhead-kerashead-torchhead-mxnethead-pyspark2_4_7 "
   printf "test-cpu-mpich-py3_6-tf1_15_0-keras2_3_1-torch1_4_0-mxnet1_5_0-pyspark2_4_7 "
-  # printf "test-cpu-oneccl-py3_6-tf1_15_0-keras2_3_1-torch1_4_0-mxnet1_5_0-pyspark2_4_7 "
-  # printf "test-cpu-oneccl-ofi-py3_6-tf1_15_0-keras2_3_1-torch1_4_0-mxnet1_5_0-pyspark2_4_7 "
+  printf "test-cpu-oneccl-py3_6-tf1_15_0-keras2_3_1-torch1_4_0-mxnet1_5_0-pyspark2_4_7 "
+  printf "test-cpu-oneccl-ofi-py3_6-tf1_15_0-keras2_3_1-torch1_4_0-mxnet1_5_0-pyspark2_4_7 "
   printf "test-gpu-openmpi-py3_6-tf1_15_0-keras2_2_4-torch1_3_0-mxnet1_4_1-pyspark2_4_7 "
   printf "test-gpu-gloo-py3_6-tf2_0_0-keras2_3_1-torch1_4_0-mxnet1_4_1-pyspark2_4_7 "
   printf "test-gpu-openmpi-gloo-py3_6-tf2_2_0-keras2_3_1-torch1_5_0-mxnet1_4_1-pyspark2_4_7 "

Dockerfile.test.cpu

Lines changed: 4 additions & 3 deletions

@@ -88,20 +88,21 @@ RUN if [[ ${MPI_KIND} == "OpenMPI" ]]; then \
     chmod +x /usr/local/oneccl/bin/mpigxx && \
     cp /tmp/oneCCL-master/mpi/lib/libmpicxx.so /usr/local/oneccl/lib && \
     chmod +x /usr/local/oneccl/lib/libmpicxx.so && \
+    cp /tmp/oneCCL-master/mpi/lib/libmpifort.so /usr/local/oneccl/lib && \
+    chmod +x /usr/local/oneccl/lib/libmpifort.so && \
     sed -i 's/if \[ -z \"\${I_MPI_ROOT}\" \]/if [ -z \"${I_MPI_ROOT:-}\" ]/g' /usr/local/oneccl/env/setvars.sh && \
     sed -i 's/ \$1/ \${1:-}/g' /usr/local/oneccl/env/setvars.sh && \
     echo ". /usr/local/oneccl/env/setvars.sh" > /oneccl_env && \
     chmod +x /oneccl_env && \
-    echo "export CCL_ATL_TRANSPORT=ofi; export CCL_ATL_SHM=0; \
+    echo "export CCL_ATL_TRANSPORT=ofi; \
     echo \"\$(env)\"; \
     echo \"mpirun is \$(which mpirun)\"; \
     echo \"LD_LIBRARY_PATH is \$(echo \$LD_LIBRARY_PATH)\"; \
     echo \"oneCCL links with \$(ldd /usr/local/oneccl/lib/libccl.so)\"; \
     mpirun -np 2 -hosts localhost \$@" > /mpirun_command_ofi && \
     chmod +x /mpirun_command_ofi && \
     cp /mpirun_command_ofi /mpirun_command_mpi && \
-    sed -i 's/export CCL_ATL_TRANSPORT=ofi;//g' /mpirun_command_mpi && \
-    sed -i 's/export CCL_ATL_SHM=0;//g' /mpirun_command_mpi && \
+    sed -i 's/export CCL_ATL_TRANSPORT=ofi;/export CCL_ATL_TRANSPORT=mpi;/g' /mpirun_command_mpi && \
     echo "-L/usr/local/oneccl/lib -lmpi -I/usr/local/oneccl/include" > /mpicc_oneccl && \
     chmod +x /mpicc_oneccl && \
     echo "/mpirun_command_mpi" > /mpirun_command; \
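Previously the MPI-transport variant of the mpirun command file was produced by deleting the `export CCL_ATL_TRANSPORT=ofi;` setting (together with the now-removed `CCL_ATL_SHM=0`); the fixed `sed` call rewrites the transport to `mpi` instead, so the command file always sets it explicitly. A minimal sketch of that rewrite, using hypothetical files under `/tmp` in place of `/mpirun_command_ofi` and `/mpirun_command_mpi`:

```shell
# Sketch only: /tmp paths stand in for the real /mpirun_command_* files.
cat > /tmp/mpirun_command_ofi <<'EOF'
export CCL_ATL_TRANSPORT=ofi; mpirun -np 2 -hosts localhost "$@"
EOF

# Derive the MPI-transport variant from the OFI one, as the Dockerfile does:
cp /tmp/mpirun_command_ofi /tmp/mpirun_command_mpi
sed -i 's/export CCL_ATL_TRANSPORT=ofi;/export CCL_ATL_TRANSPORT=mpi;/g' /tmp/mpirun_command_mpi

grep CCL_ATL_TRANSPORT /tmp/mpirun_command_mpi
# -> export CCL_ATL_TRANSPORT=mpi; mpirun -np 2 -hosts localhost "$@"
```

Rewriting rather than deleting keeps the transport choice explicit in the derived command file instead of falling back to oneCCL's default selection.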

docs/index.rst

Lines changed: 2 additions & 0 deletions

@@ -119,6 +119,8 @@ Guides

    mpi_include

+   oneccl_include
+
    conda_include

    docker_include

docs/install.rst

Lines changed: 1 addition & 1 deletion

@@ -190,7 +190,7 @@ oneCCL
 ~~~~~~

 oneCCL is an Intel library for accelerated collective operations on CPU. See
-`Horovod with Intel(R) oneCCL <oneccl.md>`_ for more details.
+`Horovod with Intel(R) oneCCL <oneccl.rst>`_ for more details.

 Set ``HOROVOD_CPU_OPERATIONS=CCL`` to use oneCCL.
docs/oneccl.md

Lines changed: 0 additions & 78 deletions
This file was deleted.

docs/oneccl.rst

Lines changed: 90 additions & 0 deletions

@@ -0,0 +1,90 @@

.. inclusion-marker-start-do-not-remove


Horovod with Intel(R) oneCCL
============================

To use Horovod with the Intel(R) oneAPI Collective Communications Library (oneCCL), follow the steps below.

1. Install `oneCCL <https://github.com/intel/oneccl>`_.

   To install oneCCL, follow `these steps <https://github.com/intel/oneccl/blob/master/README.md>`_.

   Source ``setvars.sh`` to start using oneCCL.

   .. code-block:: bash

       source <install_dir>/env/setvars.sh

2. Install the `Intel(R) MPI Library <https://software.intel.com/en-us/mpi-library>`_.

   To install the Intel MPI Library, follow `these instructions <https://software.intel.com/en-us/mpi-library/documentation/get-started>`_.

   Source the ``mpivars.sh`` script to establish the proper environment settings.

   .. code-block:: bash

       source <installdir_MPI>/intel64/bin/mpivars.sh release_mt

3. Set the ``HOROVOD_CPU_OPERATIONS`` environment variable.

   .. code-block:: bash

       export HOROVOD_CPU_OPERATIONS=CCL

4. Install Horovod from source:

   .. code-block:: bash

       python setup.py build
       python setup.py install

   or via pip:

   .. code-block:: bash

       pip install horovod

**Advanced:** You can set the affinity of the Horovod background thread with the ``HOROVOD_THREAD_AFFINITY`` environment variable.
See the instructions below.

If there are N Horovod ranks per node, this variable should contain a value for every rank, separated by commas:

.. code-block:: bash

    export HOROVOD_THREAD_AFFINITY=c0,c1,...,c(N-1)

where c0,...,c(N-1) are the core IDs to pin each rank's background thread to.

Set the number of oneCCL workers:

.. code-block:: bash

    export CCL_WORKER_COUNT=X

where X is the number of threads you would like to dedicate to driving communication. This means that for every rank there are X oneCCL
workers available.

Set the oneCCL workers' affinity:

.. code-block:: bash

    export CCL_WORKER_AFFINITY=c0,c1,..,c(X-1)

where c0,c1,..,c(X-1) are the core IDs dedicated to oneCCL workers (the X 'last' cores are used by default). This variable sets the affinity for all
oneCCL workers (``CCL_WORKER_COUNT`` * number of ranks per node) available to the ranks running on one node.

For instance, suppose we have 2 nodes, each with 2 sockets: socket0 has CPUs 0-17,36-53 and socket1 has CPUs 18-35,54-71. We decide to pin the CCL
workers to the last two cores of each socket, while pinning the Horovod background thread to one of the hyper-thread siblings of the CCL workers'
cores. All these cores are excluded from Intel MPI pinning via ``I_MPI_PIN_PROCESSOR_EXCLUDE_LIST`` so that they are dedicated to CCL and Horovod
tasks only, avoiding conflicts with the framework's computational threads.

.. code-block:: bash

    export I_MPI_PIN_PROCESSOR_EXCLUDE_LIST="16,17,34,35,52,53,70,71"
    export I_MPI_PIN_DOMAIN=socket
    export HOROVOD_THREAD_AFFINITY="53,71"
    export CCL_WORKER_COUNT=2
    export CCL_WORKER_AFFINITY="16,17,34,35"
    mpirun -n 4 -ppn 2 -hostfile hosts python ./run_example.py

.. inclusion-marker-end-do-not-remove

docs/oneccl_include.rst

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@

.. include:: ./oneccl.rst
   :start-after: inclusion-marker-start-do-not-remove
   :end-before: inclusion-marker-end-do-not-remove

horovod/common/common.cc

Lines changed: 2 additions & 2 deletions

@@ -156,12 +156,12 @@ void parse_and_set_affinity(const char* affinity, int local_size, int local_rank
   char* affinity_copy = (char*)calloc(affinity_len + 1, sizeof(char));
   memcpy(affinity_copy, affinity, affinity_len);
   char* tmp = affinity_copy;
-  char *endptr;
+  char* endptr;

   std::vector<int> core_ids(local_size);
   int count = 0;

-  while (*tmp != 0 && count < local_size) {
+  while (tmp && count < local_size) {
     auto core_id_str = strsep(&tmp, ",");
     errno = 0;
     auto core_id = std::strtol(core_id_str, &endptr, 10);
