Commit 496e304

Add https://github.com/laude-institute/terminal-bench to Training/Evaluation category (#266)
Co-authored-by: kerthcet <[email protected]>
1 parent f3979ea commit 496e304

3 files changed: 6 additions, 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -264,6 +264,7 @@
 * **[MLE-bench](https://github.com/openai/mle-bench/)**: MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ![Stars](https://img.shields.io/github/stars/openai/mle-bench.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/openai/mle-bench?color=green) ![LastCommit](https://img.shields.io/github/last-commit/openai/mle-bench?color=green)
 * **[OpenCompass](https://github.com/open-compass/opencompass)**: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets. ![Stars](https://img.shields.io/github/stars/open-compass/opencompass.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/open-compass/opencompass?color=green) ![LastCommit](https://img.shields.io/github/last-commit/open-compass/opencompass?color=green)
 * **[opik](https://github.com/comet-ml/opik)**: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. ![Stars](https://img.shields.io/github/stars/comet-ml/opik.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/comet-ml/opik?color=green) ![LastCommit](https://img.shields.io/github/last-commit/comet-ml/opik?color=green)
+* **[terminal-bench](https://github.com/laude-institute/terminal-bench)**: A benchmark for LLMs on complicated tasks in the terminal ![Stars](https://img.shields.io/github/stars/laude-institute/terminal-bench.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/laude-institute/terminal-bench?color=green) ![LastCommit](https://img.shields.io/github/last-commit/laude-institute/terminal-bench?color=green)
 
 ### Workflow

website/data.yml

Lines changed: 5 additions & 0 deletions
@@ -929,6 +929,11 @@ categories:
 homepage_url: https://www.comet.com/docs/opik/
 logo: opik.png
 repo_url: https://github.com/comet-ml/opik
+- name: terminal-bench
+description: A benchmark for LLMs on complicated tasks in the terminal
+homepage_url: https://www.tbench.ai/
+logo: terminal-bench
+repo_url: https://github.com/laude-institute/terminal-bench
 - name: Workflow
 items:
 - name: BentoML
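
The README bullet and the `website/data.yml` item added in this commit carry the same `name`, `description`, and `repo_url`, and the three badge URLs in the README line follow a fixed pattern built from the repository's `owner/repo` path. As a minimal sketch of that relationship (the helper `readme_entry` is hypothetical and not part of this repository; nothing in the commit indicates the README is generated this way), the README line could be derived from the YAML fields like so:

```python
from urllib.parse import urlparse

# Badge URL prefix as it appears in the README entries in this diff.
BADGE_BASE = "https://img.shields.io/github"


def readme_entry(name: str, description: str, repo_url: str) -> str:
    """Build a README bullet in the style of the entries in this diff.

    Hypothetical helper for illustration only.
    """
    # "https://github.com/owner/repo" or ".../owner/repo/" -> "owner/repo"
    slug = urlparse(repo_url).path.strip("/")
    badges = [
        f"![Stars]({BADGE_BASE}/stars/{slug}.svg?style=flat&color=green)",
        f"![Contributors]({BADGE_BASE}/contributors/{slug}?color=green)",
        f"![LastCommit]({BADGE_BASE}/last-commit/{slug}?color=green)",
    ]
    return f"* **[{name}]({repo_url})**: {description} " + " ".join(badges)


# Fields copied from the data.yml item added in this commit.
entry = {
    "name": "terminal-bench",
    "description": "A benchmark for LLMs on complicated tasks in the terminal",
    "repo_url": "https://github.com/laude-institute/terminal-bench",
}
print(readme_entry(**entry))
```

A check like this can help confirm that a new README line and its `data.yml` item stay in sync when both are edited by hand.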

website/logos/terminal-bench

4.63 KB
Binary file not shown.
