Builtin (server side) tool calls #2585

@alexmojaki

Area(s)

area:gen-ai

What's missing?

As discussed in #2179 (comment), we need to decide how to represent tools that are part of a model API instead of functions called by the client, e.g. the OpenAI code interpreter.

Describe the solution you'd like

In general I think these should be represented similarly to function tool calls for simplicity and consistency. For example, OpenAI returns an output part like this for the code interpreter:

    {
        'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
        'code': 'import random\n'
        '\n'
        '# Generate a random number\n'
        'first_random_number = random.randint(1, 100)\n'
        'first_random_number',
        'container_id': 'cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23',
        'outputs': [{'logs': '21', 'type': 'logs'}],
        'status': 'completed',
        'type': 'code_interpreter_call',
    },

which I think should be represented by these parts:

    {
        'type': 'tool_call',
        'name': 'code_interpreter_call',
        'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
        'arguments': {
            'code': 'import random\n\n# Generate a random number\nfirst_random_number = random.randint(1, 100)\nfirst_random_number',
            'container_id': 'cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23',
        },
    },
    {
        'type': 'tool_call_response',
        'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
        'response': {
            'outputs': [{'logs': '21', 'type': 'logs'}],
            'status': 'completed',
        },
    },

Optionally there could be an extra boolean to indicate that this is a server-side call.
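
A minimal sketch of that mapping, assuming the OpenAI item has the shape shown above. The function name `split_builtin_call` and the `server_side` key are hypothetical, used only to illustrate the optional boolean; neither is part of any existing convention.

```python
from typing import Any


def split_builtin_call(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Split one OpenAI code_interpreter_call item into two message parts."""
    # Inputs to the tool go under 'arguments', results go under 'response'.
    arguments = {k: item[k] for k in ("code", "container_id") if k in item}
    response = {k: item[k] for k in ("outputs", "status") if k in item}
    return [
        {
            "type": "tool_call",
            "name": item["type"],
            "id": item["id"],
            "arguments": arguments,
            "server_side": True,  # hypothetical flag, not an agreed attribute
        },
        {
            "type": "tool_call_response",
            "id": item["id"],
            "response": response,
            "server_side": True,  # hypothetical flag, not an agreed attribute
        },
    ]
```

Both parts share the provider's `id`, so a consumer can pair the call with its response the same way it would for a client-side function call.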

In Gemini, there are already two parts in the output:

    {
        "executableCode": {
            "language": "PYTHON",
            "code": "import random\n\nrandom_number = random.randint(1, 100)\nprint(f\"The random number generated is: {random_number}\")\n"
        }
    },
    {
        "codeExecutionResult": {
            "outcome": "OUTCOME_OK",
            "output": "The random number generated is: 6\n"
        }
    },

Note that there's nothing here to use as a tool call ID, so sticking with the tool_call / tool_call_response format may mean making the id field optional, or generating a random ID.
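
Both options can be sketched in one converter, assuming the Gemini parts have the shape shown above. The function name, the `code_execution` tool name, the `generate_id` parameter, and the `gen_` ID prefix are all hypothetical choices made for illustration.

```python
import uuid
from typing import Any


def gemini_parts_to_tool_parts(
    parts: list[dict[str, Any]], generate_id: bool = True
) -> list[dict[str, Any]]:
    """Map executableCode / codeExecutionResult parts to tool call parts."""
    result: list[dict[str, Any]] = []
    current_id = None  # ID of the most recent call still awaiting a result
    for part in parts:
        if "executableCode" in part:
            # Gemini provides no ID: either synthesize one or omit the field.
            current_id = f"gen_{uuid.uuid4().hex}" if generate_id else None
            call = {
                "type": "tool_call",
                "name": "code_execution",  # hypothetical tool name
                "arguments": dict(part["executableCode"]),
            }
            if current_id is not None:
                call["id"] = current_id
            result.append(call)
        elif "codeExecutionResult" in part:
            response = {
                "type": "tool_call_response",
                "response": dict(part["codeExecutionResult"]),
            }
            if current_id is not None:
                response["id"] = current_id
            result.append(response)
            current_id = None
        else:
            result.append(part)  # pass unrelated parts through unchanged
    return result
```

With `generate_id=False` the `id` key is simply absent, so the call and its response can only be paired by their adjacency in the output.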

Alternatively, maybe we should have semantic conventions for common tools like code interpreters. It would be nice for the OpenAI shape 'outputs': [{'logs': '21', 'type': 'logs'}] and the Gemini shape "output": "21" to be unified.
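
One way such a unification could look, as a rough sketch only: reduce both response shapes to a single output string. The function name `unified_code_output` is hypothetical, and the choice to keep only log text is an assumption, not a proposal for the final attribute layout.

```python
from typing import Any


def unified_code_output(response: dict[str, Any]) -> str:
    """Reduce an OpenAI-style or Gemini-style response to one output string."""
    if "output" in response:
        # Gemini-style: codeExecutionResult carries a single 'output' string.
        return response["output"]
    # OpenAI-style: 'outputs' is a list of typed entries; keep only the logs.
    outputs = response.get("outputs") or []
    return "".join(o["logs"] for o in outputs if o.get("type") == "logs")
```

Applied to the examples above, both `{'outputs': [{'logs': '21', 'type': 'logs'}]}` and `{"output": "21"}` yield the same string.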
