Commit 2c0d946

[doc] Add sphinx scripts for converting docs to htmls (alibaba#34)

1 parent: 149cb8d

25 files changed: +489 −923 lines

README.md (188 additions, 2 deletions)

# BladeDISC Introduction

* [Overview](#overview)
  + [Features and Roadmap](#features-and-roadmap)
    - [Frontend Framework Support Matrix](#frontend-framework-support-matrix)
    - [Backend Support Matrix](#backend-support-matrix)
    - [Deployment Solutions](#deployment-solutions)
  + [Numbers of Typical Workloads](#numbers-of-typical-workloads)
    - [Advantage in Dynamic Shape Workloads](#advantage-in-dynamic-shape-workloads)
* [API QuickView](#api-quickview)
  + [For TensorFlow Users](#for-tensorflow-users)
  + [For PyTorch Users](#for-pytorch-users)
* [Setup and Examples](#setup-and-examples)
* [Publications](#publications)
* [Tutorials and Documents for Developers](#tutorials-and-documents-for-developers)
* [How to Contribute](#how-to-contribute)
* [FAQ](#faq)
  + [Roadmap with mlir-hlo Project](#roadmap-with-mlir-hlo-project)
* [Contact Us](#contact-us)

## Overview

BladeDISC is an end-to-end **DynamIc Shape Compiler** project for machine
learning workloads and one of the key components of Alibaba's
[PAI-Blade](https://www.aliyun.com/activity/bigdata/blade). BladeDISC provides
general, transparent, and easy-to-use performance optimization for
TensorFlow/PyTorch workloads on GPGPU and CPU backends. The architecture
natively supports dynamic shape workloads, with careful attention to the
performance of both static and dynamic shape scenarios. It also supports
multiple, flexible deployment solutions, including a Plugin Mode inside the
TensorFlow/PyTorch runtime and a Standalone Mode for AOT standalone execution.
The project is based on [MLIR](https://mlir.llvm.org/) and closely related to
the [mlir-hlo](https://github.com/tensorflow/mlir-hlo) project.

Refer to [our website](https://alibaba.github.io/BladeDISC/) for more
information, including the setup tutorial, developer guide, demo examples, and
documents for developers.

### Features and Roadmap

#### Frontend Framework Support Matrix

|           | TensorFlow [1] | PyTorch [2] |
|-----------|----------------|-------------|
| Inference | Yes            | Yes         |
| Training  | Yes [3]        | Ongoing     |

[1] TensorFlow 1.12, 1.15, 2.4 & 2.5 are supported and fully verified. Other
versions may need minor adaptation work.

[2] PyTorch versions from 1.6.0 (inclusive) to 1.9.0 (exclusive) have been
fully verified.

[3] Although supported, there is still much room for improvement in Op coverage
for training workloads.

#### Backend Support Matrix

|            | Memory Intensive Part | Compute Intensive Part   | End-to-End Usability |
|------------|-----------------------|--------------------------|----------------------|
| Nvidia GPU | Yes                   | Yes                      | Yes                  |
| AMD GPU    | Ongoing               | Ongoing                  | No                   |
| Hygon DCU  | Yes                   | Yes                      | Yes                  |
| X86        | Yes                   | Not open-sourced yet [1] | No                   |

[1] The compute-intensive part of the X86 backend is already supported in the
internal version. The code decoupling is ongoing and will be open-sourced soon,
as will the end-to-end usability.

#### Deployment Solutions

* Plugin Mode - BladeDISC works as a plugin of TensorFlow or PyTorch. Only the
  supported Ops are clustered and compiled; the unsupported ones are executed
  by the original TensorFlow or PyTorch runtime. We recommend this mode to most
  users for its transparency and ease of use.

* Standalone Mode - In Standalone mode, the input workload is compiled into a
  binary that can be executed by itself, i.e., it does not rely on a TensorFlow
  or PyTorch runtime. In this mode all Ops must be supported.

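The op-clustering idea behind Plugin Mode can be illustrated with a toy sketch. This is purely illustrative, not BladeDISC's actual partitioning code; the op names and the `SUPPORTED` set are hypothetical:

```python
# Toy illustration of Plugin-Mode clustering: consecutive supported ops are
# grouped into clusters handed to the compiler, while unsupported ops stay
# with the original framework runtime. Op names and SUPPORTED are made up.
from itertools import groupby

SUPPORTED = {"add", "mul", "relu", "matmul"}

def partition(ops):
    """Split an op sequence into (is_supported, [ops...]) clusters."""
    return [(supported, list(group))
            for supported, group in groupby(ops, key=lambda op: op in SUPPORTED)]

graph = ["matmul", "add", "relu", "custom_op", "mul", "add"]
for supported, cluster in partition(graph):
    target = "compiled cluster" if supported else "framework runtime"
    print(f"{target}: {cluster}")
```

In this sketch, `["matmul", "add", "relu"]` and `["mul", "add"]` become compiled clusters, while the unsupported `custom_op` falls back to the framework runtime, which is what makes Plugin Mode transparent to the user.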
### Numbers of Typical Workloads

Evaluated on a set of typical machine learning workloads for production
purposes, DISC shows up to a 3x speedup compared with TensorFlow/PyTorch.

![Numbers](./docs/pics/numbers.png)

#### Advantage in Dynamic Shape Workloads

Specifically, for BERT-large inference on T4, which we provide in the
[examples](./docs/tutorials/tensorflow_inference_and_training.md), static-shape
compiler optimization (XLA) shows severe performance degradation due to its
compilation overhead, while DISC shows a 1.75x speedup.

| TensorFlow | XLA     | DISC   |
|------------|---------|--------|
| 1.78 s     | 41.69 s | 1.02 s |
| 1X         | 0.04X   | 1.75X  |

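The source of XLA's overhead in the table above can be sketched with a toy model: a static-shape compiler keys its kernel cache on the exact input shape and must recompile for every new shape, while a dynamic-shape compiler compiles one shape-generic kernel. The classes, shapes, and counts below are hypothetical and only illustrate the caching behavior, not BladeDISC internals:

```python
# Toy model: static-shape compilation recompiles per unseen shape;
# dynamic-shape compilation compiles once. Shapes are hypothetical
# BERT-style (batch, sequence_length) inputs.

class StaticShapeCompiler:
    def __init__(self):
        self.cache = {}          # exact shape -> compiled kernel
        self.compilations = 0

    def run(self, shape):
        if shape not in self.cache:
            self.compilations += 1       # pay compilation cost again
            self.cache[shape] = f"kernel{shape}"
        return self.cache[shape]

class DynamicShapeCompiler:
    def __init__(self):
        self.kernel = None
        self.compilations = 0

    def run(self, shape):
        if self.kernel is None:
            self.compilations += 1       # compile once, shape-generic
            self.kernel = "generic-kernel"
        return self.kernel

shapes = [(1, 64), (1, 77), (1, 128), (1, 64), (1, 91)]
static, dynamic = StaticShapeCompiler(), DynamicShapeCompiler()
for s in shapes:
    static.run(s)
    dynamic.run(s)
print(static.compilations, dynamic.compilations)  # prints: 4 1
```

With four distinct sequence lengths the static compiler pays four compilations; in real inference serving, where sequence lengths vary per request, this overhead dominates, which is the effect the table above measures.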
## API QuickView

### For TensorFlow Users

Only two lines of code are needed in a native TensorFlow program, as follows:

``` python
import numpy as np
import tensorflow as tf

# enable BladeDISC on TensorFlow program
import tensorflow_blade_disc as disc
disc.enable()

# construct TensorFlow Graph and run it
g = tf.Graph()
with g.as_default():
    ...
    with tf.Session() as sess:
        sess.run(...)
```

For more information, please refer to [QuickStart for TensorFlow
Users](./docs/quickstart.md#quickstart-for-tensorflow-users).

### For PyTorch Users

PyTorch users only need the following few lines of code to enable
BladeDISC:

``` python
import torch
import torch.nn as nn

import torch_blade

# construct PyTorch Module
class MyModule(nn.Module):
    ...

module = MyModule()

# x and y are example inputs to the module
with torch.no_grad():
    # blade_module is the optimized module by BladeDISC
    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))

# run the optimized module
blade_module(x, y)
```

`torch_blade.optimize` accepts an `nn.Module` object and outputs the
optimized module. For more information, please refer to [Quickstart
for PyTorch Users](./docs/quickstart.md#quickstart-for-pytorch-users).

## Setup and Examples

* [How to Setup and Build from Source](./docs/build_from_source.md)
* [Use Case of TensorFlow Inference and Training](./docs/tutorials/tensorflow_inference_and_training.md)
* [Use Case of PyTorch Inference](./docs/tutorials/torch_bert_inference.md)

## Publications

* [DISC: A Dynamic Shape Compiler for Machine Learning
  Workloads](https://arxiv.org/pdf/2103.05288.pdf)

## Tutorials and Documents for Developers

* [Tutorial: A Walkthrough of the BladeDISC Pass Pipeline](./docs/developers/pass_pipeline.md)
* [Introduction to the Runtime Abstraction Layer](./docs/developers/runtime_abstraction_layer.md)
* [TorchBlade Overview](./docs/developers/bladedisc_torch_overview.md)
* [Tutorial: How to Add a New Torch Operator Converter](./docs/developers/torch_add_a_new_converter.md)

## How to Contribute

* [Contribute to BladeDISC](./docs/contribution.md)

## FAQ

### Roadmap with mlir-hlo Project

BladeDISC is closely related to the
[mlir-hlo](https://github.com/tensorflow/mlir-hlo) project. Some of its building
blocks, including the MHLO Op definitions, TF-to-MHLO conversions, and some
general-purpose passes, have been upstreamed to the mlir-hlo repository. We will
continue to cooperate closely with the mlir-hlo project over the long term.

## Contact Us

* Mailgroup: [email protected]

* DingTalk group for support and discussion:

![DingTalk](./docs/pics/dingtalk_support.png)

README_.md

Lines changed: 0 additions & 177 deletions. This file was deleted.