README.md
17 additions & 10 deletions
@@ -53,12 +53,8 @@ Whether your application is implemented via RAG or fine-tuning, LangChain or Lla
  > 🥳 You can now share DeepEval's test results on the cloud directly on [Confident AI](https://confident-ai.com?utm_source=GitHub)'s infrastructure

  - Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that runs **locally on your machine**:
- - **General metrics:**
  - G-Eval
- - Hallucination
- - Summarization
- - Bias
- - Toxicity
+ - DAG (deep acyclic graph)
  - **RAG metrics:**
  - Answer Relevancy
  - Faithfulness
@@ -69,6 +65,11 @@ Whether your application is implemented via RAG or fine-tuning, LangChain or Lla
  - **Agentic metrics:**
  - Task Completion
  - Tool Correctness
+ - **Others:**
+ - Hallucination
+ - Summarization
+ - Bias
+ - Toxicity
  - **Conversational metrics:**
  - Knowledge Retention
  - Conversation Completeness
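For context, every metric in the list above is consumed the same way: wrap a single LLM interaction in an `LLMTestCase` and hand it to the metric. Below is a minimal standalone sketch, assuming deepeval's documented `AnswerRelevancyMetric` and `LLMTestCase` API; it is illustrative only and not part of this change.

```python
# Minimal sketch: standalone usage of one of the listed metrics,
# assuming deepeval's documented AnswerRelevancyMetric API.
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

metric = AnswerRelevancyMetric(threshold=0.5)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra cost.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra cost."],
)

metric.measure(test_case)            # runs the evaluation locally
print(metric.score, metric.reason)   # score is in [0, 1]; >= threshold counts as passing
```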
@@ -150,7 +151,12 @@ from deepeval.metrics import AnswerRelevancyMetric
# Replace this with the actual output from your LLM application
@@ -171,11 +177,12 @@ And finally, run `test_chatbot.py` in the CLI:
  deepeval test run test_chatbot.py
  ```

- **Your test should have passed ✅** Let's breakdown what happened.
+ **Congratulations! Your test case should have passed ✅** Let's break down what happened.

- - The variable `input` mimics user input, and `actual_output` is a placeholder for your chatbot's intended output based on this query.
- - The variable `retrieval_context` contains the relevant information from your knowledge base, and `AnswerRelevancyMetric(threshold=0.5)` is an out-of-the-box metric provided by DeepEval. It helps evaluate the relevancy of your LLM output based on the provided context.
- - The metric score ranges from 0 - 1. The `threshold=0.5` threshold ultimately determines whether your test has passed or not.
+ - The variable `input` mimics a user input, and `actual_output` is a placeholder for what your application is supposed to output based on this input.
+ - The variable `expected_output` represents the ideal answer for a given `input`, and [`GEval`](/docs/metrics-llm-evals) is a research-backed metric provided by `deepeval` for evaluating your LLM outputs on any custom criteria with human-like accuracy.
+ - In this example, the metric `criteria` is correctness of the `actual_output` based on the provided `expected_output`.
+ - All metric scores range from 0 - 1, and the `threshold=0.5` threshold ultimately determines whether your test has passed or not.

  [Read our documentation](https://docs.confident-ai.com/docs/getting-started?utm_source=GitHub) for more information on how to use additional metrics, create your own custom metrics, and tutorials on how to integrate with other tools like LangChain and LlamaIndex.
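The rewritten walkthrough above centers on `expected_output` and `GEval`. The corresponding `test_chatbot.py` contents are collapsed in the hunk at `@@ -150,7 +151,12 @@`, so the following is only a sketch along those lines, assuming deepeval's documented `GEval`, `LLMTestCase`, and `assert_test` API.

```python
# test_chatbot.py -- hypothetical sketch matching the rewritten walkthrough;
# the real file contents are collapsed in this diff.
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

def test_correctness():
    correctness_metric = GEval(
        name="Correctness",
        criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        threshold=0.5,  # scores are 0-1; >= 0.5 passes the test
    )
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # Replace this with the actual output from your LLM application
        actual_output="You have 30 days to get a full refund at no extra cost.",
        expected_output="We offer a 30-day full refund at no extra cost.",
    )
    assert_test(test_case, [correctness_metric])
```

Running `deepeval test run test_chatbot.py` would then execute this test and report pass or fail against the 0.5 threshold.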