There are three mandatory and one optional parameter required when instantiating an `LLMEvalMetric` class:
- `name`: the name of the metric.
- `criteria`: a description outlining the specific evaluation aspects for each test case.
- `evaluation_params`: a list of type `LLMTestCaseParams`. Include only the parameters that are relevant for evaluation.
- [Optional] `minimum_score`: the minimum evaluation score required for the metric to be considered successful.
All instances of `LLMEvalMetric` return a score ranging from 0 to 1. A metric is only successful if the evaluation score is equal to or greater than `minimum_score`.
:::danger
For accurate and valid results, only the parameters mentioned in `criteria` should be included as members of `evaluation_params`.
:::
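
To make this concrete, here is a minimal sketch of how these parameters fit together. The import paths, the example `criteria`, and the `minimum_score` value are illustrative and may differ slightly between `deepeval` versions:

```python
# Illustrative sketch -- import paths and argument names may vary across deepeval versions.
from deepeval.metrics.llm_eval_metric import LLMEvalMetric
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Only the parameters mentioned in `criteria` are listed in `evaluation_params`.
correctness_metric = LLMEvalMetric(
    name="Correctness",
    criteria="Determine whether the actual output is factually consistent with the expected output.",
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    minimum_score=0.5,
)

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
    expected_output="Paris",
)

correctness_metric.measure(test_case)
print(correctness_metric.score)            # a float between 0 and 1
print(correctness_metric.is_successful())  # True only if score >= minimum_score
```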
By default, `LLMEvalMetric` is evaluated using `GPT-4` from OpenAI.
In this scenario, `test_everything` only passes if all metrics are passing. Run `deepeval test run` again to see the results:
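
For example, assuming `test_everything` lives in `test_example.py` as in the earlier sections of this guide:

```console
deepeval test run test_example.py
```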
If you have reached this point, you've likely run `deepeval test run` multiple times. To keep track of all future evaluation results created by `deepeval`, log in to **[Confident AI](https://app.confident-ai.com/auth/signup)** by running the following command:
```console
deepeval login
```
**Confident AI** is the platform powering `deepeval`, and it offers deep insights to help you quickly figure out how to best implement your LLM application. Follow the instructions displayed in the CLI to create an account, get your Confident API key, and paste it into the CLI.
Once you've pasted your Confident API key in the CLI, run:
```console
deepeval test run test_example.py
```
### View Test Run