When a multiple choice (either through Literal or Enum) is contained in a List as an output type, the values of the multiple choice are missing quotation marks. The problem does not occur when the output type is only a multiple choice, because the LLM response is then a plain string.
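To show why the missing quotes matter, here is a minimal sketch that uses only the Python standard library and does not call Outlines. It assumes the generated list is meant to be read back as JSON: a list like [Paris] fails to parse, while the quoted form parses. The helper name parses_as_json_list is hypothetical.

```python
import json


def parses_as_json_list(text: str) -> bool:
    """Return True if `text` is valid JSON whose top-level value is a list."""
    try:
        return isinstance(json.loads(text), list)
    except json.JSONDecodeError:
        # Bare words such as Paris are not valid JSON values.
        return False


# Shape of the output produced for List[Literal[...]] (unquoted values).
print(parses_as_json_list("[Paris]"))            # False
# Shape of the output produced for List[str] (quoted values).
print(parses_as_json_list('["Paris", "Rome"]'))  # True
```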
Example that illustrates and reproduces the problem:
from enum import Enum
from typing import List, Literal

import outlines
import transformers
from outlines.types.dsl import python_types_to_terms, to_regex
TEST_MODEL = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
transformers.AutoModelForCausalLM.from_pretrained(TEST_MODEL),
transformers.AutoTokenizer.from_pretrained(TEST_MODEL),
)
# multiple choices (same issue with Enum as with Literal)
output_type = List[Literal["Paris", "London", "Rome", "Berlin"]]
print(to_regex(python_types_to_terms(output_type))) # \[(Paris|London|Rome|Berlin)(,\ (Paris|London|Rome|Berlin))*\]
result = model("Give me a list of cities.", output_type, max_new_tokens=100)
print(result) # [Paris]
# string
output_type = List[str]
print(to_regex(python_types_to_terms(output_type))) # \[("[^"]*")(,\ ("[^"]*"))*\]
result = model("Give me a list of cities.", output_type, max_new_tokens=100)
print(result) # ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]
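The two regexes printed in the example can be checked directly with Python's re module, without loading a model. The sketch below copies them verbatim, confirms that the List[Literal[...]] pattern only admits unquoted values, and shows one possible shape of a fix: quoting each choice before joining the choices into the alternation. quoted_choice_list_regex is a hypothetical helper for illustration only; it is not part of the Outlines API and not necessarily how the library would fix this.

```python
import json
import re

# Regexes as printed by to_regex(python_types_to_terms(...)) in the report.
LITERAL_LIST_RE = r"\[(Paris|London|Rome|Berlin)(,\ (Paris|London|Rome|Berlin))*\]"
STR_LIST_RE = r'\[("[^"]*")(,\ ("[^"]*"))*\]'


def quoted_choice_list_regex(choices: list[str]) -> str:
    """Hypothetical: build a list regex whose items are JSON-quoted choices."""
    # json.dumps adds the double quotes; re.escape protects regex metacharacters.
    item = "(" + "|".join(re.escape(json.dumps(c)) for c in choices) + ")"
    return r"\[" + item + r"(,\ " + item + r")*\]"


FIXED_RE = quoted_choice_list_regex(["Paris", "London", "Rome", "Berlin"])

# Current behaviour: only unquoted items match the Literal-based pattern.
print(bool(re.fullmatch(LITERAL_LIST_RE, "[Paris, Rome]")))      # True
print(bool(re.fullmatch(LITERAL_LIST_RE, '["Paris", "Rome"]')))  # False
# The str-based pattern requires quoted items.
print(bool(re.fullmatch(STR_LIST_RE, '["Paris", "Rome"]')))      # True
# The hypothetical quoted pattern accepts valid JSON and still restricts choices.
print(bool(re.fullmatch(FIXED_RE, '["Paris", "Rome"]')))         # True
print(bool(re.fullmatch(FIXED_RE, '["Madrid"]')))                # False
```

Quoting via json.dumps keeps the generated items valid JSON strings, so the output can be parsed with json.loads while the alternation still limits generation to the allowed choices.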
I suspect similar issues with string quoting may exist in other complex output types.
torchss