
Missing quotation marks in lists of multiple choice values for output type #1630

@RobinPicard

Description


When a multiple-choice type (either a Literal or an Enum) is nested inside a List output type, the generated choice values are missing quotation marks. The problem does not occur when the multiple-choice type is used on its own as the output type, because the LLM response is then returned as a plain string.

Example to illustrate and reproduce the problem:

from typing import List, Literal
import outlines
import transformers
from outlines.types.dsl import to_regex, python_types_to_terms
from enum import Enum

TEST_MODEL = "microsoft/Phi-3-mini-4k-instruct"

model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained(TEST_MODEL),
    transformers.AutoTokenizer.from_pretrained(TEST_MODEL),
)

# multiple choices (same issue with Enum as with Literal)
output_type = List[Literal["Paris", "London", "Rome", "Berlin"]]

print(to_regex(python_types_to_terms(output_type))) # \[(Paris|London|Rome|Berlin)(,\ (Paris|London|Rome|Berlin))*\]

result = model("Give me a list of cities.", output_type, max_new_tokens=100)
print(result) # [Paris]

# string
output_type = List[str]

print(to_regex(python_types_to_terms(output_type))) # \[("[^"]*")(,\ ("[^"]*"))*\]

result = model("Give me a list of cities.", output_type, max_new_tokens=100)
print(result) # ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]

I suspect there may be similar issues with strings nested in other complex output types.
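To make the mismatch concrete without loading a model, the sketch below uses only Python's standard `re` and `json` modules. `CURRENT` is the regex printed by `to_regex(...)` for `List[Literal[...]]` in the example above, copied verbatim. `QUOTED` is a hypothetical pattern that wraps each choice in double quotes, by analogy with the `("[^"]*")` pattern produced for `List[str]`. It is an assumption about what one might expect, not what outlines currently generates.

```python
import json
import re

# Regex printed for List[Literal["Paris", "London", "Rome", "Berlin"]] in the issue.
CURRENT = r"\[(Paris|London|Rome|Berlin)(,\ (Paris|London|Rome|Berlin))*\]"

# Hypothetical pattern (assumption): each choice wrapped in double quotes,
# mirroring the ("[^"]*") pattern that outlines produces for List[str].
CHOICE = r'("Paris"|"London"|"Rome"|"Berlin")'
QUOTED = rf"\[{CHOICE}(,\ {CHOICE})*\]"


def parses_as_json(text: str) -> bool:
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The current pattern accepts the unquoted output reported in the issue...
assert re.fullmatch(CURRENT, "[Paris]")
# ...but that output is not valid JSON, so it cannot be parsed back into a list.
assert not parses_as_json("[Paris]")

# The quoted pattern rejects the unquoted form and accepts a JSON list of strings.
assert re.fullmatch(QUOTED, "[Paris]") is None
assert re.fullmatch(QUOTED, '["Paris", "Rome"]')
print(json.loads('["Paris", "Rome"]'))  # ['Paris', 'Rome']
```

The same comparison should hold for an Enum-based choice, since the issue reports identical behavior for Enum and Literal.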
