Evaluation results of Gemini-2.5-Pro #1

Open
@jwwang424

Thanks for the interesting work!

I tested gemini-2.5-pro-preview-05-06 on the released Video-Holmes-test benchmark and got an overall average score of 62.3, which differs from your reported Gemini-2.5-Pro result (51.3). I just sent the raw video (with audio) to Gemini's API and used your provided prompt:

Based on the given video, reason and answer the single-choice question. Provide your reasoning between the <think> and </think> tags, and then give your final answer between the <answer> and </answer> tags. The question is: {question}. The options are: {options}. Your answer:

What could be the reasons for the differences in test results?
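
For reference, here is a minimal sketch of how the prompt above can be filled in and how responses can be scored. It is not the benchmark's official evaluation script: the names `build_prompt`, `extract_choice`, and `accuracy` are hypothetical, and the rule of taking the leading letter inside the last `<answer>` tag is an assumption. Answer parsing is one detail that may vary between evaluation setups.

```python
import re
from typing import Optional, Sequence

# Prompt text copied from the issue; {question} and {options} are filled per sample.
PROMPT_TEMPLATE = (
    "Based on the given video, reason and answer the single-choice question. "
    "Provide your reasoning between the <think> and </think> tags, and then give "
    "your final answer between the <answer> and </answer> tags. "
    "The question is: {question}. The options are: {options}. Your answer:"
)

# Assumption: the final answer is whatever sits inside the LAST <answer> block.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)
# Assumption: a valid answer starts with one option letter, e.g. "B", "(B)", "B. ...".
LETTER_RE = re.compile(r"^\(?([A-Za-z])\)?(?:[.:\s]|$)")


def build_prompt(question: str, options: str) -> str:
    """Fill the prompt template for one benchmark sample."""
    return PROMPT_TEMPLATE.format(question=question, options=options)


def extract_choice(response: str) -> Optional[str]:
    """Return the upper-case option letter, or None if no parsable answer exists."""
    blocks = ANSWER_RE.findall(response)
    if not blocks:
        return None
    match = LETTER_RE.match(blocks[-1].strip())
    return match.group(1).upper() if match else None


def accuracy(responses: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of responses whose parsed letter equals the gold letter.

    Unparsable responses count as wrong (an assumption of this sketch).
    """
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return 100.0 * correct / len(gold)
```

In this sketch a response without `<answer>` tags, or one whose answer does not begin with an option letter, is counted as incorrect. A setup that treats such responses differently would report a different score on the same model outputs.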
