# BladeDISC Introduction

* [Overview](#overview)
  + [Features and Roadmap](#features-and-roadmap)
    - [Frontend Framework Support Matrix](#frontend-framework-support-matrix)
    - [Backend Support Matrix](#backend-support-matrix)
    - [Deployment Solutions](#deployment-solutions)
  + [Numbers of Typical Workloads](#numbers-of-typical-workloads)
    - [Advantage in Dynamic Shape Workloads](#advantage-in-dynamic-shape-workloads)
* [API QuickView](#api-quickview)
  + [For TensorFlow Users](#for-tensorflow-users)
  + [For PyTorch Users](#for-pytorch-users)
* [Setup and Examples](#setup-and-examples)
* [Publications](#publications)
* [Tutorials and Documents for Developers](#tutorials-and-documents-for-developers)
* [How to Contribute](#how-to-contribute)
* [FAQ](#faq)
  + [Roadmap with mlir-hlo Project](#roadmap-with-mlir-hlo-project)
* [Contact Us](#contact-us)

## Overview

BladeDISC is an end-to-end **DynamIc Shape Compiler** project for machine
learning workloads, and one of the key components of Alibaba's
[PAI-Blade](https://www.aliyun.com/activity/bigdata/blade). BladeDISC provides
general, transparent, and easy-to-use performance optimization for
TensorFlow/PyTorch workloads on GPGPU and CPU backends. The architecture
natively supports dynamic shape workloads, with careful attention to
performance in both static and dynamic shape scenarios. It also supports
multiple flexible deployment solutions, including Plugin Mode inside the
TensorFlow/PyTorch runtime and Standalone Mode for AOT standalone execution.
The project is based on [MLIR](https://mlir.llvm.org/) and closely related to
the [mlir-hlo](https://github.com/tensorflow/mlir-hlo) project.

Refer to [our website](https://alibaba.github.io/BladeDISC/) for more
information, including setup tutorials, demo examples, and developer
documentation.

### Features and Roadmap

#### Frontend Framework Support Matrix

|           | TensorFlow [1] | PyTorch [2] |
|-----------|----------------|-------------|
| Inference | Yes            | Yes         |
| Training  | Yes [3]        | Ongoing     |

[1] TensorFlow 1.12, 1.15, 2.4 and 2.5 are supported and fully verified. Other
versions may require some slight adaptation work.

[2] PyTorch versions 1.6.0 <= version < 1.9.0 have been fully verified.

[3] Although training is supported, there is still much room for improvement
in Op coverage for training workloads.

#### Backend Support Matrix

|            | Memory Intensive Part | Compute Intensive Part   | End-to-End Usability |
|------------|-----------------------|--------------------------|----------------------|
| Nvidia GPU | Yes                   | Yes                      | Yes                  |
| AMD GPU    | Ongoing               | Ongoing                  | No                   |
| Hygon DCU  | Yes                   | Yes                      | Yes                  |
| X86        | Yes                   | Not open-sourced yet [1] | No                   |

[1] The compute-intensive part of the X86 backend is already supported in the
internal version. The code decoupling is ongoing and will be open-sourced
soon, as will the end-to-end usability.

#### Deployment Solutions

* Plugin Mode - BladeDISC works as a plugin of TensorFlow or PyTorch. Only the
  supported Ops are clustered and compiled, and the unsupported ones are
  executed by the original TensorFlow or PyTorch runtime. We recommend this
  mode to most users for its transparency and ease of use.

* Standalone Mode - In Standalone Mode, the input workload is compiled into a
  binary that can be executed by itself, i.e., it does not rely on a TensorFlow
  or PyTorch runtime. In this mode, all Ops must be supported.

### Numbers of Typical Workloads

Evaluated on a set of typical machine learning workloads for production
purposes, BladeDISC shows up to a 3x speedup compared with
TensorFlow/PyTorch.



#### Advantage in Dynamic Shape Workloads

Specifically, for the BERT-large inference on T4 that we provide in the
[examples](./docs/tutorials/tensorflow_inference_and_training.md), static
compiler optimization (XLA) shows severe performance degradation due to its
compilation overhead, while DISC shows a 1.75x speedup over TensorFlow
(1.78s / 1.02s ≈ 1.75x).

| TensorFlow | XLA    | DISC  |
|------------|--------|-------|
| 1.78s      | 41.69s | 1.02s |
| 1X         |        | 1.75X |

## API QuickView

### For TensorFlow Users

Only two lines of code are needed on top of a native TensorFlow program, as
follows:

``` python
import numpy as np
import tensorflow as tf

# enable BladeDISC on TensorFlow program
import tensorflow_blade_disc as disc
disc.enable()

# construct TensorFlow Graph and run it
g = tf.Graph()
with g.as_default():
    ...
    with tf.Session() as sess:
        sess.run(...)
```
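
For illustration, here is a self-contained sketch with a toy graph. The matmul
graph and its inputs below are hypothetical; the only BladeDISC-specific lines
are the `tensorflow_blade_disc` import and `disc.enable()`:

``` python
import numpy as np
import tensorflow as tf

# enable BladeDISC before building and running the graph
import tensorflow_blade_disc as disc
disc.enable()

# a toy TF1-style graph with a dynamic batch dimension: y = x @ w
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
    w = tf.Variable(tf.ones([4, 2]), name="w")
    y = tf.matmul(x, w)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # the batch size may vary across runs; as a dynamic shape
        # compiler, BladeDISC does not recompile for every new shape
        print(sess.run(y, feed_dict={x: np.random.rand(3, 4).astype(np.float32)}))
```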

For more information, please refer to [QuickStart for TensorFlow
Users](./docs/quickstart.md#quickstart-for-tensorflow-users).

### For PyTorch Users

PyTorch users only need the following few lines of code to enable BladeDISC:

``` python
import torch
import torch.nn as nn

import torch_blade

# construct PyTorch Module
class MyModule(nn.Module):
    ...

module = MyModule()

# x and y are example inputs matching MyModule's forward signature
with torch.no_grad():
    # blade_module is the module optimized by BladeDISC
    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))

# run the optimized module
blade_module(x, y)
```

`torch_blade.optimize` accepts an `nn.Module` object and outputs the
optimized module. For more information, please refer to [Quickstart
for PyTorch Users](./docs/quickstart.md#quickstart-for-pytorch-users).

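Since the returned module typically behaves like a TorchScript module, it can
usually be saved and reloaded with the standard `torch.jit` APIs. A minimal
sketch under that assumption (the file name is arbitrary, and `torch_blade`
must also be installed in the process that loads the module):

``` python
import torch

# assumption: blade_module is TorchScript-compatible, so the standard
# torch.jit serialization APIs apply
torch.jit.save(blade_module, "blade_module.pt")

# later, e.g. in a serving process that also has torch_blade installed
loaded = torch.jit.load("blade_module.pt")
outputs = loaded(x, y)
```
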
## Setup and Examples

* [How to Setup and Build from Source](./docs/build_from_source.md)
* [Use Case of TensorFlow Inference and Training](./docs/tutorials/tensorflow_inference_and_training.md)
* [Use Case of PyTorch Inference](./docs/tutorials/torch_bert_inference.md)

## Publications

* [DISC: A Dynamic Shape Compiler for Machine Learning
  Workloads](https://arxiv.org/pdf/2103.05288.pdf)

## Tutorials and Documents for Developers

* [Tutorial: A Walkthrough of the BladeDISC Pass Pipeline](./docs/developers/pass_pipeline.md)
* [Introduction to the Runtime Abstraction Layer](./docs/developers/runtime_abstraction_layer.md)
* [TorchBlade Overview](./docs/developers/bladedisc_torch_overview.md)
* [Tutorial: How to Add a New Torch Operator Converter](./docs/developers/torch_add_a_new_converter.md)

## How to Contribute

* [Contribute to BladeDISC](./docs/contribution.md)

## FAQ

### Roadmap with mlir-hlo Project

BladeDISC works closely with the
[mlir-hlo](https://github.com/tensorflow/mlir-hlo) project. Part of the
building blocks, including the MHLO Op definitions, TF-to-MHLO conversions,
and some general-purpose passes, has been upstreamed to the mlir-hlo
repository. We will continue to cooperate closely with the mlir-hlo project
over the long term.

## Contact Us



* DingTalk group for support and discussion:
