-
Notifications
You must be signed in to change notification settings - Fork 252
Description
Area(s)
area:gen-ai
What's missing?
As discussed in #2179 (comment), we need to decide how to represent tools that are part of a model API instead of functions called by the client, e.g. the OpenAI code interpreter.
Describe the solution you'd like
In general I think these should be represented similarly to function tool calls for simplicity and consistency. For example, OpenAI returns an output part like this for the code interpreter:
{
'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
'code': 'import random\n'
'\n'
'# Generate a random number\n'
'first_random_number = random.randint(1, 100)\n'
'first_random_number',
'container_id': 'cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23',
'outputs': [{'logs': '21', 'type': 'logs'}],
'status': 'completed',
'type': 'code_interpreter_call',
},
which I think should be represented by these parts:
{
'type': 'tool_call',
'name': 'code_interpreter_call',
'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
'arguments': {
'code': 'import random\n\n# Generate a random number\nfirst_random_number = random.randint(1, 100)\nfirst_random_number',
'container_id': 'cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23',
},
},
{
'type': 'tool_call_response',
'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
'response': {
'outputs': [{'logs': '21', 'type': 'logs'}],
'status': 'completed',
},
},
Optionally there could be an extra boolean to indicate that this is a server-side call.
In Gemini, there's already two parts in the output:
{
"executableCode": {
"language": "PYTHON",
"code": "import random\n\nrandom_number = random.randint(1, 100)\nprint(f\"The random number generated is: {random_number}\")\n"
}
},
{
"codeExecutionResult": {
"outcome": "OUTCOME_OK",
"output": "The random number generated is: 6\n"
}
},
Note that there's nothing to use here as a tool call ID, so sticking with that format may mean making that field optional, or generating a random ID.
Alternatively, maybe we should have semantic conventions for common tools like code interpreters. It would be nice for 'outputs': [{'logs': '21', 'type': 'logs'}],
and "output": "21"
to be unified.
Metadata
Metadata
Assignees
Type
Projects
Status
Status