Xyntra

Project Description

Xyntra is an automatic kernel-fusion compiler pass written in safe Rust.
It ingests ONNX / TorchScript graphs, pattern-matches common op-chains, and emits one fused GPU kernel through wgpu (cross-platform WGSL) or optional CUDA PTX.
The project explores graph rewriting, GPU occupancy modelling, and autotuned code generation while keeping the entire pipeline free of unsafe code.
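To make the idea of pattern-matching an op-chain concrete, here is a minimal sketch that uses only the standard library. It assumes the graph has already been flattened into a linear chain of ops, which is a simplification (a real graph is a DAG). The names `Op`, `Op::Fused`, and `fuse_chain` are hypothetical and are not taken from the Xyntra source.

```rust
/// Hypothetical operator set for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    MatMul,
    Gelu,
    Dropout,
    Softmax,
    /// Stand-in for one generated kernel that covers the listed ops.
    Fused(Vec<Op>),
}

/// Replace every non-overlapping occurrence of `pattern` in a linear op
/// chain with a single `Fused` node. Ops that do not match are copied as-is.
pub fn fuse_chain(ops: &[Op], pattern: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        // An empty pattern would match everywhere, so it is ignored.
        if !pattern.is_empty() && ops[i..].starts_with(pattern) {
            out.push(Op::Fused(pattern.to_vec()));
            i += pattern.len();
        } else {
            out.push(ops[i].clone());
            i += 1;
        }
    }
    out
}
```

Calling `fuse_chain` with the pattern `[MatMul, Gelu, Dropout]` collapses each matching run into one `Fused` entry and leaves the surrounding ops untouched.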

Technologies & Dependencies

🦀 Core Technologies

Rust 2024 Edition – type-safe IR with comprehensive error handling
Safe Rust only – zero unsafe blocks in current implementation

📦 Current Dependencies

Standard library only – no external crates yet
Planned: egg (e-graphs), wgpu (GPU), clap (CLI)

Features & Roadmap

🔧 Core Infrastructure & Foundations

Type-safe primitives – NodeId, TensorShape, OpKind, Graph (in progress)
Error enum with recoverable vs fatal classes (in progress)
Config loader – CLI flags & fusion.toml (in progress)
Modular crate layout – xyntra-core, xyntra-cli, xyntra-ir (in progress)
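The primitives listed above could look roughly like the following sketch. Only the names `NodeId`, `TensorShape`, `OpKind`, and `Graph` come from this README; the fields, the `Node` struct, the `XyntraError` variants, and the choice of which error counts as recoverable are assumptions made for illustration.

```rust
use std::fmt;

/// Opaque node handle; the newtype keeps it from being confused with other integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Static tensor shape, outermost dimension first.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TensorShape(pub Vec<usize>);

impl TensorShape {
    /// Total element count (1 for a rank-0 scalar).
    pub fn num_elements(&self) -> usize {
        self.0.iter().product()
    }
}

/// Hypothetical subset of operator kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpKind { Input, MatMul, Gelu, Dropout }

#[derive(Debug, Clone)]
pub struct Node {
    pub op: OpKind,
    pub inputs: Vec<NodeId>,
    pub shape: TensorShape,
}

/// Assumed split: a shape mismatch only rejects one fusion candidate
/// (recoverable), while a dangling node reference means the graph is broken (fatal).
#[derive(Debug, PartialEq, Eq)]
pub enum XyntraError {
    UnknownNode(NodeId),
    ShapeMismatch { node: NodeId },
}

impl XyntraError {
    pub fn is_recoverable(&self) -> bool {
        matches!(self, XyntraError::ShapeMismatch { .. })
    }
}

impl fmt::Display for XyntraError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            XyntraError::UnknownNode(id) => write!(f, "unknown node {}", id.0),
            XyntraError::ShapeMismatch { node } => write!(f, "shape mismatch at node {}", node.0),
        }
    }
}

impl std::error::Error for XyntraError {}

#[derive(Debug, Default)]
pub struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    /// Inputs must already exist, so the graph stays acyclic by construction.
    pub fn add_node(
        &mut self,
        op: OpKind,
        inputs: Vec<NodeId>,
        shape: TensorShape,
    ) -> Result<NodeId, XyntraError> {
        for &id in &inputs {
            if id.0 as usize >= self.nodes.len() {
                return Err(XyntraError::UnknownNode(id));
            }
        }
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node { op, inputs, shape });
        Ok(id)
    }

    pub fn node(&self, id: NodeId) -> Result<&Node, XyntraError> {
        self.nodes.get(id.0 as usize).ok_or(XyntraError::UnknownNode(id))
    }
}
```

Handing out `NodeId` values only from `add_node` means a caller cannot construct an edge to a node that was never inserted without getting an error back.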

📡 Graph Ingestion & Export

ONNX parser – load .onnx into internal IR
TorchScript loader – parse .pt via tch-rs
IR serialisation – export to DOT / JSON for debugging
Fused-graph writer – emit reduced node graph snapshots
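As an illustration of the DOT export item, the sketch below writes a tiny graph in Graphviz DOT syntax using only the standard library. `IrNode` and `to_dot` are hypothetical names, and the node record is deliberately minimal; the real IR would carry shapes and dtypes as well.

```rust
use std::fmt::Write;

/// Hypothetical minimal node record: an operator label plus the indices of its inputs.
pub struct IrNode {
    pub op: &'static str,
    pub inputs: Vec<usize>,
}

/// Render the nodes as a Graphviz digraph, one labelled vertex per node
/// and one edge per input.
pub fn to_dot(nodes: &[IrNode]) -> String {
    let mut out = String::from("digraph xyntra {\n");
    for (i, n) in nodes.iter().enumerate() {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "  n{i} [label=\"{}\"];", n.op).unwrap();
        for &src in &n.inputs {
            writeln!(out, "  n{src} -> n{i};").unwrap();
        }
    }
    out.push_str("}\n");
    out
}
```

The resulting string can be saved to a file and rendered with the Graphviz `dot` tool to inspect the graph before and after fusion.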

🧩 IR, Pattern Matching & Scheduling

egg-based e-graph integration – rewrite rules & saturation loop
Declarative fusion DSL – macro for matmul -> gelu -> dropout
Scheduling heuristics – cost model for fusion candidates
Fusion legality checker – shape, dtype, broadcast guards
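One possible shape for the legality checker's dtype and broadcast guards is sketched below. It assumes NumPy-style broadcasting (dimensions aligned from the right, with size 1 stretching to match), which is also what ONNX uses for its multidirectional broadcast. `DType`, `Illegal`, `broadcast_shape`, and `check_fusable` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType { F32, F16 }

/// Reasons a pair of operands cannot share one fused kernel.
#[derive(Debug, PartialEq, Eq)]
pub enum Illegal {
    DTypeMismatch,
    /// `axis` indexes into the broadcast result shape.
    NotBroadcastable { axis: usize },
}

/// Compute the broadcast result shape, or report the first axis
/// (scanning from the right) where the two shapes are incompatible.
pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Result<Vec<usize>, Illegal> {
    let rank = a.len().max(b.len());
    let mut out = vec![1; rank];
    for i in 0..rank {
        // Read dims from the right; missing leading dims count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return Err(Illegal::NotBroadcastable { axis: rank - 1 - i }),
        };
    }
    Ok(out)
}

/// Guard used before fusing two operands: dtypes must agree and the
/// shapes must broadcast. Returns the fused output shape on success.
pub fn check_fusable(
    a_shape: &[usize],
    a_dtype: DType,
    b_shape: &[usize],
    b_dtype: DType,
) -> Result<Vec<usize>, Illegal> {
    if a_dtype != b_dtype {
        return Err(Illegal::DTypeMismatch);
    }
    broadcast_shape(a_shape, b_shape)
}
```

For example, shapes `[4, 1, 3]` and `[5, 3]` broadcast to `[4, 5, 3]`, while `[2, 3]` and `[4, 3]` are rejected at axis 0.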

⚡ Kernel Code Generation

WGSL backend – emit compute shaders for wgpu
CUDA PTX backend – optional NV path behind --backend ptx
Shared-memory tiling – configurable tile/block sizes
Vectorisation pass – vec4<f32> style loads/stores
Mixed-precision support (FP16/BF16) (stretch)
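The WGSL backend item can be illustrated with a small string-based emitter for a fused element-wise chain. This is only a sketch under stated assumptions: the `Elementwise` enum, its three ops, and `emit_fused_wgsl` are hypothetical, the binding layout (one input buffer, one output buffer) is chosen for simplicity, and constants are assumed to be finite. It does not cover tiling, vectorisation, or matmul.

```rust
/// Hypothetical element-wise ops that can be chained on one running value.
pub enum Elementwise {
    Relu,
    Scale(f32),
    AddConst(f32),
}

impl Elementwise {
    /// One WGSL statement that updates the running value `v`.
    /// `{:?}` prints 2.0 as "2.0", keeping the literal a float in WGSL.
    fn wgsl(&self) -> String {
        match self {
            Elementwise::Relu => "v = max(v, 0.0);".to_string(),
            Elementwise::Scale(k) => format!("v = v * {k:?};"),
            Elementwise::AddConst(c) => format!("v = v + {c:?};"),
        }
    }
}

/// Emit a single compute shader that applies all `ops` in order,
/// so intermediate results never leave registers.
pub fn emit_fused_wgsl(ops: &[Elementwise], workgroup_size: u32) -> String {
    let mut s = String::new();
    s.push_str("@group(0) @binding(0) var<storage, read> src: array<f32>;\n");
    s.push_str("@group(0) @binding(1) var<storage, read_write> dst: array<f32>;\n\n");
    s.push_str(&format!("@compute @workgroup_size({workgroup_size})\n"));
    s.push_str("fn main(@builtin(global_invocation_id) gid: vec3<u32>) {\n");
    s.push_str("    let i = gid.x;\n");
    // Guard against the last workgroup running past the end of the buffer.
    s.push_str("    if (i >= arrayLength(&dst)) { return; }\n");
    s.push_str("    var v = src[i];\n");
    for op in ops {
        s.push_str(&format!("    {}\n", op.wgsl()));
    }
    s.push_str("    dst[i] = v;\n}\n");
    s
}
```

The point of the fused form is that the value is loaded once, transformed by every op in sequence, and stored once, instead of round-tripping through global memory between separate kernels.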

🚀 Autotuning & Performance

Parameter search harness – Bayesian optimiser over tile sizes
GPU occupancy analysis – register & SM utilisation metrics
Latency histogram – HDR histogram log, printing p50/p95/p99
Flamegraphs – CPU-side hotspots with cargo flamegraph
Roofline model script – FLOP/s vs bandwidth chart (stretch)
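For the p50/p95/p99 figures mentioned above, a simple nearest-rank percentile over recorded samples is enough to get started. This sketch does not implement an HDR histogram; it sorts raw samples, which is fine for small benchmark runs. `percentile` and `summarize` are hypothetical names, and the microsecond unit is an assumption.

```rust
/// Nearest-rank percentile over an ascending-sorted slice: the smallest
/// sample such that at least `p` percent of samples are less than or equal to it.
/// Returns `None` for empty input or `p` outside 0..=100.
pub fn percentile(sorted: &[u64], p: f64) -> Option<u64> {
    if sorted.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    // Multiply before dividing so whole-number percentiles stay exact.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    // Rank is 1-based; p = 0 maps to the first sample.
    Some(sorted[rank.max(1) - 1])
}

/// Return (p50, p95, p99) for unsorted latency samples in microseconds.
pub fn summarize(samples_us: &[u64]) -> Option<(u64, u64, u64)> {
    let mut s = samples_us.to_vec();
    s.sort_unstable();
    Some((
        percentile(&s, 50.0)?,
        percentile(&s, 95.0)?,
        percentile(&s, 99.0)?,
    ))
}
```

With the 100 samples 1 through 100, `summarize` yields `(50, 95, 99)`.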

🔒 Correctness & Validation

Golden unit tests – compare fused vs unfused outputs (in progress)
Gradient checks – optional back-prop correctness suite
Edge-case library – broadcast, dynamic shapes, odd strides (in progress)
Numerical tolerance config – FP32 / FP16 epsilon thresholds
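A golden test that compares fused and unfused outputs needs a tolerance rule. The sketch below uses the common combined check `|ref - fused| <= abs + rel * |ref|`. The `Tolerance` struct, the `first_mismatch` helper, and especially the numeric thresholds are placeholder assumptions, not values taken from the project.

```rust
/// Absolute and relative tolerance for comparing two outputs.
pub struct Tolerance {
    pub abs: f32,
    pub rel: f32,
}

impl Tolerance {
    // Placeholder thresholds; real values would come from configuration.
    pub const FP32: Tolerance = Tolerance { abs: 1e-6, rel: 1e-5 };
    pub const FP16: Tolerance = Tolerance { abs: 1e-3, rel: 1e-2 };
}

/// Return the index of the first element where `fused` deviates from
/// `reference` by more than the tolerance, or `None` if all match.
/// A length mismatch is reported at the length of the shorter slice.
pub fn first_mismatch(reference: &[f32], fused: &[f32], tol: &Tolerance) -> Option<usize> {
    if reference.len() != fused.len() {
        return Some(reference.len().min(fused.len()));
    }
    reference.iter().zip(fused).position(|(&r, &f)| {
        // Written as a negated `<=` on purpose: any comparison involving
        // NaN is false, so a NaN on either side counts as a mismatch.
        !((r - f).abs() <= tol.abs + tol.rel * r.abs())
    })
}
```

Reporting the first failing index, rather than a bare boolean, makes it easier to trace a mismatch back to a tile or boundary condition in the fused kernel.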

📊 Observability & Diagnostics

Structured tracing spans – tracing crate with GPU timestamps
--trace CLI flag – dump kernel timeline to JSON
Occupancy dashboard – live CLI table of SM usage (stretch)

🛠️ Bench & Test Harness

Micro-bench harness – single op-chain latency
Model-zoo benchmarks – BERT, ResNet, ViT comparison
Determinism suite – random seeds & output hashes (in progress)
CI matrix – MSRV check, clippy, fmt, criterion
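The determinism suite's "random seeds & output hashes" idea can be sketched with two small standard-library-only pieces: a seeded generator for reproducible inputs and a stable hash over output bit patterns. SplitMix64 and FNV-1a are substituted here as well-known simple algorithms; the README does not say which generator or hash the project uses. `std::collections::hash_map::DefaultHasher` is avoided because its algorithm is not guaranteed to stay the same across Rust releases.

```rust
/// SplitMix64 pseudo-random generator: same seed, same sequence.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 24 bits.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// 64-bit FNV-1a over the little-endian bit patterns of the outputs.
/// Hashing bits rather than values distinguishes 0.0 from -0.0.
pub fn hash_outputs(values: &[f32]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut h = OFFSET_BASIS;
    for v in values {
        for byte in v.to_bits().to_le_bytes() {
            h ^= byte as u64;
            h = h.wrapping_mul(PRIME);
        }
    }
    h
}
```

A determinism test would generate inputs from a fixed seed, run the fused kernel twice, and assert that both runs produce the same `hash_outputs` value.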

🧰 Developer eXperience (DX)

cargo xtask or justfile – shortcuts (just fuse resnet.onnx)
Pre-commit hook – cargo fmt && cargo clippy --fix
make dev alias – spin-up CI-like environment locally

📦 Packaging & Release

GitHub Release action – build macOS, Linux, Windows binaries
Publish xyntra-core & xyntra-cli to crates.io
SemVer policy & CHANGELOG.md generation
Signed tags + GPG release checklist

🔗 Framework Plugins

PyTorch 2 torch.xyntra.compile() drop-in backend
ONNX Runtime execution-provider stub (libxyntra_ep.so)

📚 Docs & Examples

Quick-start guide – clone → build → fuse tiny MLP
Architecture diagram – ASCII / Mermaid / SVG
Fusion logs demo – before/after latency screenshot

🗃️ Model-Zoo Benchmarks

Scripted download + benchmark of BERT-Base, ResNet-50, ViT-Tiny, GPT-2
Auto-generated result table in README via CI

🌐 Stretch Goals & Research Paths

Horizontal fusion across attention blocks
Dynamic-shape specialisation & cache
Triton IR interoperability adapter
WebAssembly demo – run fused WGSL in browser
Meta-scheduler – ML-predicted tile sizes

🤝 DevOps & Community

GitHub Actions pipeline – lint, test, benches, release
Dual MIT / Apache-2.0 licence – for broad adoption
CONTRIBUTING.md, CODE_OF_CONDUCT.md, issue/PR templates
GitHub Discussions – Q&A, roadmap voting
Annotated blog series – graph rewriting, GPU tuning deep-dives


Capataina/Xyntra
