docs/docs/evaluation-datasets-synthetic-data.mdx
There are 4 approaches `deepeval`'s `Synthesizer` can use to generate synthetic `Golden`s:

1. Generating synthetic `Golden`s using **context extracted from documents.**
2. Generating synthetic `Golden`s from a **list of provided contexts.**
3. Generating synthetic `Golden`s from a **list of provided prompts.**
4. Generating synthetic `Golden`s from **scratch.**
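All four approaches are driven by a `Synthesizer` instance. As a point of reference for the hedged sketches in each section below, it is assumed here that the class can be imported and constructed with defaults as follows (the exact import path and constructor options may differ across `deepeval` versions):

```python
from deepeval.synthesizer import Synthesizer  # assumed import path

# With no arguments, the synthesizer falls back to deepeval's default LLM
# configuration (typically an OpenAI model, which needs its API key set).
synthesizer = Synthesizer()
```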
### 1. Generating From Documents
There are one mandatory and seven optional parameters when using the `generate_goldens_from_docs` method:

- [Optional] `chunk_overlap`: an int that determines the overlap size between consecutive text chunks during context extraction. Defaulted to 0.
- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
- [Optional] `enable_breadth_evolve`: a boolean which, when set to `True`, introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
- [Optional] `evolution_types`: a list of `Evolution`, specifying methods used during data evolution. Defaulted to all `Evolution`s.
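To make the parameters above concrete, here is a minimal sketch of document-based generation. The `document_paths` argument, the `Evolution` import path, and the specific `Evolution` members are assumptions about `deepeval`'s API and should be checked against your installed version:

```python
from deepeval.synthesizer import Synthesizer, Evolution  # assumed imports

synthesizer = Synthesizer()

# Hypothetical local files; replace with paths to your own documents.
synthesizer.generate_goldens_from_docs(
    document_paths=["knowledge_base.pdf", "faq.txt"],  # assumed parameter name
    chunk_overlap=0,             # no overlap between consecutive text chunks
    num_evolutions=2,            # evolve each generated input twice
    enable_breadth_evolve=True,  # allow a wider variety of context modifications
    evolution_types=[Evolution.REASONING, Evolution.MULTICONTEXT],  # assumed members
)
```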
### 2. Generating From Provided Contexts
There are one mandatory and five optional parameters when using this method:

- [Optional] `max_goldens_per_context`: the maximum number of golden data points to be generated from each context. Adjusting this parameter can influence the size of the resulting dataset. Defaulted to 2.
- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.
- [Optional] `enable_breadth_evolve`: a boolean indicating whether to enable breadth evolution strategies during data generation. When set to `True`, it introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
- [Optional] `evolution_types`: a list of `Evolution`, specifying methods used during data evolution. Defaulted to all `Evolution`s.
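Below is a hedged sketch of the context-based approach. The method name `generate_goldens_from_contexts` and the shape of its `contexts` argument (a list of context chunk lists) are not shown above and are assumptions to verify against your `deepeval` version:

```python
from deepeval.synthesizer import Synthesizer  # assumed import path

synthesizer = Synthesizer()

# Assumed method name and argument shape; each inner list is one context
# made up of related text chunks.
synthesizer.generate_goldens_from_contexts(
    contexts=[
        ["The Earth revolves around the Sun.", "A year lasts roughly 365 days."],
        ["Water boils at 100 degrees Celsius at sea level."],
    ],
    max_goldens_per_context=2,    # cap the goldens generated per context
    num_evolutions=1,             # one evolution step per generated input
    enable_breadth_evolve=False,  # keep the default, narrower evolution
)
```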
### 3. Generating From Provided Prompts

If your LLM application **does not rely on a retrieval context**, or if you simply wish to generate a synthetic dataset based on information outside your application's information database, `deepeval` also supports generating synthetic `Golden`s from an initial list of prompts, which serve as examples from which additional prompts will be generated.
:::info
While the previous methods first use an LLM to generate a series of inputs based on the provided context before evolving them, `generate_goldens_from_prompts` simply **evolves the provided list of prompts** into more complex and diverse `Golden`s. It's also important to note that this method will only populate the input field of each generated `Golden`.
:::
There are one mandatory and three optional parameters when using the `generate_goldens_from_prompts` method:

- `prompts`: a list of strings, representing your initial list of example prompts.
- [Optional] `num_evolutions`: the number of evolution steps to apply to each prompt. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial prompts. Defaulted to 1.
- [Optional] `enable_breadth_evolve`: a boolean which, when set to `True`, introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
- [Optional] `evolution_types`: a list of `PromptEvolution`, specifying methods used during data evolution. Defaulted to all `PromptEvolution`s.
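A minimal sketch of `generate_goldens_from_prompts`, using the parameters listed above; the import path is an assumption, and remember that only the `input` field of each resulting `Golden` is populated:

```python
from deepeval.synthesizer import Synthesizer  # assumed import path

synthesizer = Synthesizer()

# The provided prompts are evolved directly into more complex, diverse inputs.
synthesizer.generate_goldens_from_prompts(
    prompts=[
        "What is the return policy for online orders?",
        "How do I reset my account password?",
    ],
    num_evolutions=2,            # evolve each prompt twice
    enable_breadth_evolve=True,  # broaden the variety of evolved prompts
)
```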
### 4. Generating From Scratch
If you do not have a list of example prompts, or wish to rely solely on LLM generation for synthesis, you can also generate synthetic `Golden`s simply by specifying the subject, task, and output format you wish your prompts to follow.

:::tip
Generating goldens from scratch is especially helpful when you wish to **evaluate your LLM on a specific task**, such as red-teaming or text-to-SQL use cases!
:::

This method is a **2-step function** that first generates a list of prompts about a given subject for a certain task and in a certain output format, before using the generated list of prompts to generate more prompts through data evolution.

:::info
The subject, task, and output format parameters are all strings that are inserted into a predefined prompt template, meaning these parameters are **flexible and will need to be iterated on** for optimal results.
:::
There are four mandatory and three optional parameters when using the `generate_goldens_from_scratch` method:

- `subject`: a string, specifying the subject and nature of your generated `Golden`s.
- `task`: a string, representing the purpose of these evaluation `Golden`s.
- `output_format`: a string, representing the expected output format. This is not equivalent to python `type`s but simply gives you more control over the structure of your synthetic data.
- `num_initial_goldens`: the number of goldens generated before consequent evolutions.
- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated prompt. This parameter controls the **complexity and diversity** of the generated dataset by iteratively refining and evolving the initial prompts. Defaulted to 1.
- [Optional] `enable_breadth_evolve`: a boolean which, when set to `True`, introduces a **wider variety of context modifications**, enhancing the dataset's diversity. Defaulted to `False`.
- [Optional] `evolution_types`: a list of `PromptEvolution`, specifying methods used during data evolution. Defaulted to all `PromptEvolution`s.
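Finally, a hedged sketch of scratch generation. The method name `generate_goldens_from_scratch` follows the naming pattern of the other methods but is not spelled out above, so treat it (and the import path) as an assumption; the argument values are illustrative only:

```python
from deepeval.synthesizer import Synthesizer  # assumed import path

synthesizer = Synthesizer()

# subject, task, and output_format are free-form strings inserted into a
# prompt template, so iterate on their wording for better results.
synthesizer.generate_goldens_from_scratch(
    subject="text-to-SQL queries over an e-commerce orders database",
    task="evaluating an LLM that translates natural language into SQL",
    output_format="a one-sentence natural-language data question",
    num_initial_goldens=5,   # goldens generated before any evolution
    num_evolutions=2,        # evolve each generated prompt twice
)
```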