Commit cc75453

Update README.md
1 parent cf66e58 commit cc75453

File tree

1 file changed: +17 -10 lines changed

README.md

Lines changed: 17 additions & 10 deletions
@@ -53,12 +53,8 @@ Whether your application is implemented via RAG or fine-tuning, LangChain or Lla
 > 🥳 You can now share DeepEval's test results on the cloud directly on [Confident AI](https://confident-ai.com?utm_source=GitHub)'s infrastructure
 
 - Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that runs **locally on your machine**:
-  - **General metrics:**
   - G-Eval
-  - Hallucination
-  - Summarization
-  - Bias
-  - Toxicity
+  - DAG (deep acyclic graph)
   - **RAG metrics:**
     - Answer Relevancy
     - Faithfulness
@@ -69,6 +65,11 @@ Whether your application is implemented via RAG or fine-tuning, LangChain or Lla
   - **Agentic metrics:**
     - Task Completion
     - Tool Correctness
+  - **Others:**
+    - Hallucination
+    - Summarization
+    - Bias
+    - Toxicity
   - **Conversational metrics:**
     - Knowledge Retention
     - Conversation Completeness
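The metrics recategorized under **Others** in the hunk above can also be run standalone, outside `deepeval test run`. The sketch below is illustrative only and is not part of this commit; it assumes `deepeval`'s standalone metric API (a `HallucinationMetric` class with `measure()`, `score`, and `reason`), with placeholder input/output strings.

```python
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase

# Hallucination compares the actual output against the supplied context
metric = HallucinationMetric(threshold=0.5)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    context=["All customers are eligible for a 30 day full refund at no extra cost."],
)
metric.measure(test_case)
print(metric.score, metric.reason)  # score in 0 - 1, plus an explanation
```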
@@ -150,7 +151,12 @@ from deepeval.metrics import AnswerRelevancyMetric
 from deepeval.test_case import LLMTestCase
 
 def test_case():
-    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
+    correctness_metric = GEval(
+        name="Correctness",
+        criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
+        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
+        threshold=0.5
+    )
     test_case = LLMTestCase(
         input="What if these shoes don't fit?",
         # Replace this with the actual output from your LLM application
@@ -171,11 +177,12 @@ And finally, run `test_chatbot.py` in the CLI:
 deepeval test run test_chatbot.py
 ```
 
-**Your test should have passed ✅** Let's breakdown what happened.
+**Congratulations! Your test case should have passed ✅** Let's break down what happened.
 
-- The variable `input` mimics user input, and `actual_output` is a placeholder for your chatbot's intended output based on this query.
-- The variable `retrieval_context` contains the relevant information from your knowledge base, and `AnswerRelevancyMetric(threshold=0.5)` is an out-of-the-box metric provided by DeepEval. It helps evaluate the relevancy of your LLM output based on the provided context.
-- The metric score ranges from 0 - 1. The `threshold=0.5` threshold ultimately determines whether your test has passed or not.
+- The variable `input` mimics a user input, and `actual_output` is a placeholder for what your application is supposed to output based on this input.
+- The variable `expected_output` represents the ideal answer for a given `input`, and [`GEval`](/docs/metrics-llm-evals) is a research-backed metric provided by `deepeval` for evaluating your LLM outputs against any custom criteria with human-like accuracy.
+- In this example, the metric `criteria` is the correctness of the `actual_output` based on the provided `expected_output`.
+- All metric scores range from 0 - 1, and the `threshold=0.5` threshold ultimately determines whether your test has passed or not.
 
 [Read our documentation](https://docs.confident-ai.com/docs/getting-started?utm_source=GitHub) for more information on how to use additional metrics, create your own custom metrics, and tutorials on how to integrate with other tools like LangChain and LlamaIndex.
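For reference, here is a minimal sketch of what the full `test_chatbot.py` from this quickstart might look like after the change above. Only the `GEval(...)` block and the opening of `LLMTestCase(...)` appear in the hunks shown; the imports, the placeholder `actual_output`/`expected_output` values, and the closing `assert_test` call are assumptions filled in for illustration.

```python
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

def test_case():
    # Custom-criteria metric from the diff above: judges actual_output against expected_output
    correctness_metric = GEval(
        name="Correctness",
        criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        threshold=0.5
    )
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # Replace this with the actual output from your LLM application (placeholder value)
        actual_output="You have 30 days to get a full refund at no extra cost.",
        # The ideal answer for this input (placeholder value)
        expected_output="We offer a 30-day full refund at no extra cost."
    )
    # Passes only if the 0 - 1 GEval score meets threshold=0.5
    assert_test(test_case, [correctness_metric])
```

Running `deepeval test run test_chatbot.py` then executes this function under pytest and reports whether the score cleared the threshold.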
