Add https://github.com/laude-institute/terminal-bench to Training/Evaluation category (#266)

InftyAI-Agent · kerthcet · web-flow · commit 496e304e3bd6 · 2025-08-05T14:38:05.000+01:00
Co-authored-by: kerthcet <kerthcet@users.noreply.github.com>
diff --git a/README.md b/README.md
@@ -264,6 +264,7 @@
 * **[MLE-bench](https://github.com/openai/mle-bench/)**: MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ![Stars](https://img.shields.io/github/stars/openai/mle-bench.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/openai/mle-bench?color=green) ![LastCommit](https://img.shields.io/github/last-commit/openai/mle-bench?color=green)
 * **[OpenCompass](https://github.com/open-compass/opencompass)**: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets. ![Stars](https://img.shields.io/github/stars/open-compass/opencompass.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/open-compass/opencompass?color=green) ![LastCommit](https://img.shields.io/github/last-commit/open-compass/opencompass?color=green)
 * **[opik](https://github.com/comet-ml/opik)**: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. ![Stars](https://img.shields.io/github/stars/comet-ml/opik.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/comet-ml/opik?color=green) ![LastCommit](https://img.shields.io/github/last-commit/comet-ml/opik?color=green)
+* **[terminal-bench](https://github.com/laude-institute/terminal-bench)**: A benchmark for LLMs on complicated tasks in the terminal ![Stars](https://img.shields.io/github/stars/laude-institute/terminal-bench.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/laude-institute/terminal-bench?color=green) ![LastCommit](https://img.shields.io/github/last-commit/laude-institute/terminal-bench?color=green)
 
 ### Workflow
 
diff --git a/website/data.yml b/website/data.yml
@@ -929,6 +929,11 @@ categories:
       homepage_url: https://www.comet.com/docs/opik/
       logo: opik.png
       repo_url: https://github.com/comet-ml/opik
+    - name: terminal-bench
+      description: A benchmark for LLMs on complicated tasks in the terminal
+      homepage_url: https://www.tbench.ai/
+      logo: terminal-bench
+      repo_url: https://github.com/laude-institute/terminal-bench
   - name: Workflow
     items:
     - name: BentoML
diff --git a/website/logos/terminal-bench b/website/logos/terminal-bench