v3.0 Pre-Release
🚨 Breaking Changes
⚠️ This release introduces breaking changes in preparation for DeepEval v3.0.
Please review carefully and adjust your code as needed.
The `evaluate()` function now has "configs"
Previously, the `evaluate()` function had 13+ arguments to control display, async behavior, caching, etc., and it was growing out of control. We've now abstracted these into "configs" instead:
```python
from deepeval.evaluate.configs import AsyncConfig
from deepeval import evaluate

# Async behavior now lives in AsyncConfig rather than standalone keyword arguments
evaluate(..., async_config=AsyncConfig(max_concurrent=20))
```
Full docs here: https://www.deepeval.com/docs/evaluation-running-llm-evals#configs-for-evaluate
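To make the ellipsis above concrete, here is a minimal, self-contained sketch of a full `evaluate()` call in the new config style. The test case and metric are illustrative placeholders; only `async_config=AsyncConfig(max_concurrent=20)` comes from the snippet above.

```python
from deepeval import evaluate
from deepeval.evaluate.configs import AsyncConfig
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Illustrative test case and metric; swap in your own
test_case = LLMTestCase(
    input="What does DeepEval do?",
    actual_output="DeepEval is an open-source framework for evaluating LLM applications.",
)

evaluate(
    test_cases=[test_case],
    metrics=[AnswerRelevancyMetric()],
    async_config=AsyncConfig(max_concurrent=20),  # concurrency cap from the example above
)
```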
Red Teaming Officially Migrated to DeepTeam
This shouldn't be a surprise, but DeepTeam now takes care of everything red-teaming related for the foreseeable future. Docs here: https://trydeepteam.com
🥳 New Feature
Dynamic Evaluations for Nested Components
Nested components are a mess to evaluate. In this version, in preparation for v3.0, we've introduced dynamic evals, which let you apply a different set of metrics to different components in your LLM application:
```python
import openai

from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.tracing import observe, update_current_span_test_case

# Attach metrics to this component; they run on the test case set for this span
@observe(metrics=[AnswerRelevancyMetric()])
def complete(query: str):
    response = openai.ChatCompletion.create(
        model="gpt-4o", messages=[{"role": "user", "content": query}]
    ).choices[0].message["content"]
    # Set this span's test case so the attached metrics can score it
    update_current_span_test_case(
        test_case=LLMTestCase(input=query, actual_output=response)
    )
    return response
```
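Because dynamic evals are per-component, the same pattern composes across nested components, each carrying its own metric set. The sketch below is illustrative: the `retrieve`/`rag_pipeline` functions, their placeholder logic, and the choice of ContextualRelevancyMetric for the retriever span are assumptions, not part of the release.

```python
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric, ContextualRelevancyMetric
from deepeval.tracing import observe, update_current_span_test_case

# Inner component: scored on how relevant the retrieved context is to the query
@observe(metrics=[ContextualRelevancyMetric()])
def retrieve(query: str) -> list[str]:
    chunks = ["DeepEval is an open-source LLM evaluation framework."]  # stand-in retriever
    update_current_span_test_case(
        test_case=LLMTestCase(input=query, actual_output="\n".join(chunks), retrieval_context=chunks)
    )
    return chunks

# Outer component: scored only on answer relevancy
@observe(metrics=[AnswerRelevancyMetric()])
def rag_pipeline(query: str) -> str:
    context = retrieve(query)
    answer = f"Based on what I found: {context[0]}"  # stand-in for an LLM call
    update_current_span_test_case(test_case=LLMTestCase(input=query, actual_output=answer))
    return answer
```

Each `@observe` span gets its own test case, so the metrics attached to a component score only that component's inputs and outputs.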
Full docs here: https://www.deepeval.com/docs/evaluation-running-llm-evals#setup-tracing-highly-recommended