
Commit 22d4bc4

updated docs according to code
1 parent a0d1770 commit 22d4bc4


docs/docs/evaluation-datasets-synthetic-data.mdx

Lines changed: 22 additions & 16 deletions
@@ -40,8 +40,8 @@ There are 4 approaches `deepeval`'s `Synthesizer` can use to generate synthetic `Golden`s:
 
 1. Generating synthetic `Golden`s using **context extracted from documents.**
 2. Generating synthetic `Golden`s from a **list of provided context.**
-3. Generating synthetic `Golden`s from a **list of provided input.**
-4. Generating synthetic `Golden`s from a **scratch**
+3. Generating synthetic `Golden`s from a **list of provided prompts.**
+4. Generating synthetic `Golden`s from **scratch**
 
 ### 1. Generating From Documents

@@ -70,7 +70,7 @@ There are one mandatory and seven optional parameters when using the `generate_goldens_from_docs` method:
 - [Optional] `chunk_overlap`: an int that determines the overlap size between consecutive text chunks during context extraction. Defaulted to 0.
 - [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
 - [Optional] `enable_breadth_evolve`: a boolean which when set to `True`, introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
-- [Optional] `evolution_types`: a list of `EvolutionType`, specifying methods used during data evolution. Defaulted to all `EvolutionType`s.
+- [Optional] `evolution_types`: a list of `Evolution`, specifying methods used during data evolution. Defaulted to all `Evolution`s.
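
As a rough illustration of how these parameters fit together, here is a minimal sketch of a `generate_goldens_from_docs` call. The `document_paths` parameter name and the file paths are assumptions for illustration and are not shown in this diff:

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Hypothetical sketch: `document_paths` and the file paths below are
# assumed for illustration; only the optional knobs are documented above.
synthesizer.generate_goldens_from_docs(
    document_paths=["example.pdf", "example.txt"],  # assumed parameter name
    chunk_overlap=0,              # overlap between consecutive text chunks
    num_evolutions=1,             # evolution steps per generated input
    enable_breadth_evolve=False,  # keep context modifications narrow
)
```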

 ### 2. Generating From Provided Contexts

@@ -100,26 +100,26 @@ There are one mandatory and five optional parameters when using the `generate_goldens` method:
 - [Optional] `max_goldens_per_context`: the maximum number of golden data points to be generated from each context. Adjusting this parameter can influence the size of the resulting dataset. Defaulted to 2.
 - [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
 - [Optional] `enable_breadth_evolve`: a boolean indicating whether to enable breadth evolution strategies during data generation. When set to `True`, it introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
-- [Optional] `evolution_types`: a list of `EvolutionType`, specifying methods used during data evolution. Defaulted to all `EvolutionType`s.
+- [Optional] `evolution_types`: a list of `Evolution`, specifying methods used during data evolution. Defaulted to all `Evolution`s.
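
For illustration, a minimal sketch of generating goldens from provided contexts. The method name is truncated in the hunk header above, so `generate_goldens` and the list-of-lists shape of `contexts` are assumptions; the context strings are placeholders:

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Assumed method name (truncated in the hunk header above) and assumed
# context shape; the context strings are placeholders.
synthesizer.generate_goldens(
    contexts=[
        ["The Earth completes one orbit of the Sun every 365.25 days."],
        ["Water boils at 100 degrees Celsius at sea level."],
    ],
    max_goldens_per_context=2,  # cap on goldens generated per context
    num_evolutions=1,           # evolution steps per generated input
)
```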

 :::caution
 While the previous methods first use an LLM to generate a series of inputs based on the provided context before evolving them, `generate_goldens_from_inputs` simply evolves the provided list of inputs into more complex and diverse `Golden`s. It's also important to note that this method will only populate the input field of each generated `Golden`.
 :::
 
-### 3. Generating From Provided Inputs
+### 3. Generating From Provided Prompts
 
-If your LLM application **does not rely on a retrieval context**, or if you simply wish to generate a synthetic dataset based on information outside your application's information database, `deepeval` also supports generating synthetic `Golden`s from an initial list of inputs, which serve as examples from which additional inputs will be generated.
+If your LLM application **does not rely on a retrieval context**, or if you simply wish to generate a synthetic dataset based on information outside your application's information database, `deepeval` also supports generating synthetic `Golden`s from an initial list of prompts, which serve as examples from which additional prompts will be generated.
 
 :::info
-While the previous methods first use an LLM to generate a series of inputs based on the provided context before evolving them, `generate_goldens_from_inputs` simply **evolves the provided list of inputs** into more complex and diverse `Golden`s. It's also important to note that this method will only populate the input field of each generated `Golden`.
+While the previous methods first use an LLM to generate a series of inputs based on the provided context before evolving them, `generate_goldens_from_prompts` simply **evolves the provided list of prompts** into more complex and diverse `Golden`s. It's also important to note that this method will only populate the input field of each generated `Golden`.
 :::
 
 ```python
 from deepeval.synthesizer import Synthesizer
 
 synthesizer = Synthesizer()
-synthesizer.generate_goldens_from_inputs(
-    inputs=[
+synthesizer.generate_goldens_from_prompts(
+    prompts=[
         "What is 2+2",
         "Give me the solution to 12/5",
         "5! = ?"
@@ -130,17 +130,17 @@ synthesizer.generate_goldens_from_inputs(
 
 There are one mandatory and three optional parameters when using the `generate_goldens_from_prompts` method:
 
-- `inputs`: a list of strings, representing your initial list of example inputs.
-- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
+- `prompts`: a list of strings, representing your initial list of example prompts.
+- [Optional] `num_evolutions`: the number of evolution steps to apply to each prompt. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial prompts. Defaulted to 1.
 - [Optional] `enable_breadth_evolve`: a boolean which when set to `True`, introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
-- [Optional] `evolution_types`: a list of `InputEvolutionType`, specifying methods used during data evolution. Defaulted to all `InputEvolutionType`s.
+- [Optional] `evolution_types`: a list of `PromptEvolution`, specifying methods used during data evolution. Defaulted to all `PromptEvolution`s.
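
Combining the renamed method from the code block above with the optional parameters in this list, a usage sketch might look like the following; the prompt values and parameter settings are illustrative only:

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Illustrative settings: evolve each example prompt twice and allow a
# wider variety of modifications for a more diverse dataset.
synthesizer.generate_goldens_from_prompts(
    prompts=[
        "What is 2+2",
        "Give me the solution to 12/5",
        "5! = ?",
    ],
    num_evolutions=2,            # evolution steps per prompt
    enable_breadth_evolve=True,  # broaden the evolved variations
)
```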

 ### 4. Generating From Scratch
 
-If you do not have a list of example inputs, or wish to solely rely on an LLM generation for synthesis, you can also generate synthetic `Golden`s simply by specifying the subject, task, and output format you wish your inputs to follow.
+If you do not have a list of example prompts, or wish to rely solely on LLM generation for synthesis, you can also generate synthetic `Golden`s simply by specifying the subject, task, and output format you wish your prompts to follow.
 
 :::tip
-This method is especially helpful when you wish to **evaluate your LLM on a specific task**, such as red-teaming or text-to-SQL use cases!
+Generating goldens from scratch is especially helpful when you wish to **evaluate your LLM on a specific task**, such as red-teaming or text-to-SQL use cases!
 :::
 
 ```python
@@ -156,15 +156,21 @@ synthesizer.generate_goldens_from_scratch(
 )
 ```
 
+This method is a **2-step function** that first generates a list of prompts about a given subject for a certain task and in a certain output format, before using the generated list of prompts to generate more prompts through data evolution.
+
+:::info
+The subject, task, and output format parameters are all strings that are inserted into a predefined prompt template, meaning these parameters are **flexible and will need to be iterated on** for optimal results.
+:::
+
 There are four mandatory and three optional parameters when using the `generate_goldens_from_scratch` method:
 
 - `subject`: a string, specifying the subject and nature of your generated `Golden`s
 - `task`: a string, representing the purpose of these evaluation `Golden`s
 - `output_format`: a string, representing the expected output format. This is not equivalent to python `type`s but simply gives you more control over the structure of your synthetic data.
 - `num_initial_goldens`: the number of goldens generated before consequent evolutions
-- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
+- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated prompt. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial prompts. Defaulted to 1.
 - [Optional] `enable_breadth_evolve`: a boolean which when set to `True`, introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
-- [Optional] `evolution_types`: a list of `InputEvolutionType`, specifying methods used during data evolution. Defaulted to all `InputEvolutionType`s.
+- [Optional] `evolution_types`: a list of `PromptEvolution`, specifying methods used during data evolution. Defaulted to all `PromptEvolution`s.
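
Putting the four mandatory parameters together, a from-scratch call might look like the following sketch. The subject, task, and output format strings are illustrative only, since the note above says they are inserted into a prompt template and need iteration:

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Illustrative free-form strings; per the note above they are inserted
# into a predefined prompt template and usually need iteration.
synthesizer.generate_goldens_from_scratch(
    subject="text-to-SQL queries over a retail orders database",
    task="evaluating an LLM's ability to write correct SQL",
    output_format="a single SQL SELECT statement",
    num_initial_goldens=5,  # goldens generated before evolution
    num_evolutions=1,       # evolution steps per generated prompt
)
```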

 ### Saving Generated Goldens
