
Commit 65d9930

new release
1 parent 330e138 commit 65d9930

File tree: 4 files changed, +6 −20 lines


CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ authors:
   - family-names: Vongthongsri
     given-names: Kritin
 title: deepeval
-version: 3.1.0
+version: 3.1.1
 date-released: "2025-06-08"
 url: https://confident-ai.com
 repository-code: https://github.com/confident-ai/deepeval

deepeval/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__: str = "3.1.0"
+__version__: str = "3.1.1"
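
Since the version string lives in a plain module attribute, it can be read back after upgrading; a minimal sketch, assuming `deepeval/_version.py` remains importable at this path:

```python
# Minimal sketch: read the version string bumped in this commit.
from deepeval._version import __version__

print(__version__)  # expected: "3.1.1"
```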

docs/docs/multimodal-metrics-g-eval.mdx

Lines changed: 3 additions & 17 deletions
@@ -35,7 +35,7 @@ To create a custom metric that uses MLLMs for evaluation, simply instantiate an
 
 ```python
 from deepeval.metrics import MultimodalGEval
-from deepeval.test_case import MLLMTestCaseParams, MLLMTestCase
+from deepeval.test_case import MLLMTestCaseParams, MLLMTestCase, MLLMImage
 
 m_test_case = MLLMTestCase(
     input=["Show me how to fold an airplane"],
@@ -52,7 +52,8 @@ text_image_coherence = MultimodalGEval(
     evaluation_params=[MLLMTestCaseParams.ACTUAL_OUTPUT],
 )
 
-evaluate(test_cases=[m_test_case], metrics=[text_image_coherence])
+text_image_coherence.measure(m_test_case)
+print(text_image_coherence.score, text_image_coherence.reason)
 ```
 
 There are **THREE** mandatory and **SEVEN** optional parameters required when instantiating an `MultimodalGEval` class:
@@ -116,21 +117,6 @@ Note that `score_range` ranges from **0 - 10, inclusive** and different `Rubric`
 This is an optional improvement done by `deepeval` in addition to the original implementation in the `GEval` paper.
 :::
 
-### As a standalone
-
-You can also run `GEval` on a single test case as a standalone, one-off execution.
-
-```python
-...
-
-text_image_coherence.measure(test_case)
-print(text_image_coherence.score, text_image_coherence.reason)
-```
-
-:::caution
-This is great for debugging or if you wish to build your own evaluation pipeline, but you will **NOT** get the benefits (testing reports, Confident AI platform) and all the optimizations (speed, caching, computation) the `evaluate()` function or `deepeval test run` offers.
-:::
-
 ## How Is It Calculated?
 
 The `MultimodalGEval` is an adapted version of [`GEval`](/docs/metrics-llm-evals), so alike `GEval`, the `MultimodalGEval` metric is a two-step algorithm that first generates a series of `evaluation_steps` using chain of thoughts (CoTs) based on the given `criteria`, before using the generated `evaluation_steps` to determine the final score using the `evaluation_params` provided through the `MLLMTestCase`.
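
Read together, the two hunks above replace the batch `evaluate()` call in the docs example with a standalone `measure()` call and add `MLLMImage` to the imports. A minimal end-to-end sketch of the updated example follows; the test case fields and metric arguments that fall between the hunks are not shown in this diff, so the `actual_output`, metric `name`, and `criteria` used here are hypothetical placeholders.

```python
from deepeval.metrics import MultimodalGEval
from deepeval.test_case import MLLMTestCaseParams, MLLMTestCase, MLLMImage

# Test case from the docs example; actual_output is an assumed placeholder
# since the lines between the two hunks are not part of this diff.
m_test_case = MLLMTestCase(
    input=["Show me how to fold an airplane"],
    actual_output=[
        "Here are the folding steps:",
        MLLMImage(url="./step1.png", local=True),  # hypothetical local image
    ],
)

text_image_coherence = MultimodalGEval(
    name="Text-Image Coherence",  # assumed metric name
    criteria="Determine whether the text and images are coherent.",  # assumed criteria
    evaluation_params=[MLLMTestCaseParams.ACTUAL_OUTPUT],
)

# Standalone, one-off execution as introduced by this commit
text_image_coherence.measure(m_test_case)
print(text_image_coherence.score, text_image_coherence.reason)
```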

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "deepeval"
-version = "3.1.0"
+version = "3.1.1"
 description = "The LLM Evaluation Framework"
 authors = ["Jeffrey Ip <[email protected]>"]
 license = "Apache-2.0"
