
Scaling Open Vocabulary Object Detection: OWLv2 fine-tuning #160

Open
@rbavery


Search before asking

  • I have searched the Multimodal Maestro issues and found no similar feature requests.

Description

https://huggingface.co/docs/transformers/en/model_doc/owlv2

Be able to fine-tune OWLv2 for grounded object detection using JSONL annotations referencing 3-channel imagery. N-channel imagery would be extra dope. Ideally with high bit-depth TIFF support, since my imagery comes as .tif files. I see Pillow in the requirements, so high bit-depth TIFF support might not be possible today without more work to change how imagery is loaded.
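To make the request concrete, here is a minimal sketch of what I have in mind. The JSONL schema (`image`/`prefix`/`boxes` keys) is hypothetical, not what maestro currently expects, and `stretch_to_uint8` is an illustrative helper: since Pillow struggles with many 16-bit multi-band GeoTIFFs, the raw array would in practice come from a reader like tifffile or rasterio (an assumption, not a tested path), and then get percentile-stretched down to uint8 RGB before the OWLv2 processor sees it.

```python
import json
import numpy as np

# Hypothetical JSONL record, one image per line with normalized box prompts.
# The schema is illustrative only; maestro's actual format may differ.
record = json.loads(
    '{"image": "scene_001.tif", '
    '"prefix": "find all storage tanks", '
    '"boxes": [[0.41, 0.22, 0.47, 0.29]]}'
)

def stretch_to_uint8(band, low_pct=2, high_pct=98):
    """Percentile-stretch a high bit-depth band (e.g. uint16) to uint8.

    In practice `band` would be loaded with tifffile or rasterio
    (assumption), since Pillow cannot open many 16-bit GeoTIFFs.
    """
    lo, hi = np.percentile(band, [low_pct, high_pct])
    scaled = np.clip((band.astype(np.float64) - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (scaled * 255).astype(np.uint8)

# Stand-in for one 16-bit single-band satellite tile.
band16 = np.random.randint(0, 65535, size=(256, 256), dtype=np.uint16)

# Fake a 3-channel image by repeating the band; N-channel support would
# need changes further down the loading pipeline.
rgb = np.stack([stretch_to_uint8(band16)] * 3, axis=-1)
print(rgb.shape, rgb.dtype, record["image"])
```

The point is that only the loading/rescaling step needs to change: once the imagery is uint8 3-channel, the rest of the fine-tuning pipeline could stay as-is.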

Use case

I've played around with OWLv2 a bit and compared it to GroundingDINO and Qwen 2.5, and it seems to do a better job of producing bounding boxes on hard images with small objects (satellite images), whereas the other models produce nothing. This makes me think it is potentially a better candidate for fine-tuning. But I'm definitely not certain and have more testing to do.

Additional

In the geospatial computer vision domain we are in the very earliest days of applying VLMs to solve actual problems on massive imagery corpora. There have been some cool experiments recently that have inspired me to try fine-tuning VLMs to test their limits on remotely sensed imagery using modest-sized datasets.

Can't commit to a PR right now (but might be able to in the future).

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Metadata

Assignees: no one assigned
Labels: enhancement (New feature or request)
