How to design negative samples for Florence-2 model training? #144

David-19940718 · 2024-09-18T02:48:31Z

Search before asking

I have searched the Multimodal Maestro issues and found no similar feature requests.

Question

We currently have a good understanding of how to create positive samples for the Florence-2 model, using a format like this:

```json
{
  "image": "IMG_20220316_144445_jpg.rf.a79f523e54855af2323f0cfdb9a4dedc.jpg",
  "prefix": "<OD>",
  "suffix": "5 of hearts<loc_54><loc_213><loc_291><loc_598>6 of hearts<loc_205><loc_251><loc_471><loc_670>7 of hearts<loc_363><loc_309><loc_688><loc_797>8 of hearts<loc_598><loc_395><loc_973><loc_974>"
}
```
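To make the `<OD>` target format concrete, here is a minimal parser and encoder for suffixes like the one above. It assumes Florence-2 uses 1000 location bins (`<loc_0>` to `<loc_999>`) and floor quantization of pixel coordinates relative to image width and height; `parse_od_suffix` and `encode_od_suffix` are hypothetical helpers, not part of Florence-2 or maestro.

```python
import math
import re
from typing import List, Sequence, Tuple

NUM_BINS = 1000  # assumption: coordinates are quantized into <loc_0> ... <loc_999>

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), in bin units or pixels

# One object = a label followed by exactly four location tokens.
_OBJECT_PATTERN = re.compile(r"([^<]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def parse_od_suffix(suffix: str) -> List[Tuple[str, Box]]:
    """Split an <OD> suffix into (label, (x1, y1, x2, y2)) pairs in bin units."""
    objects = []
    for label, *coords in _OBJECT_PATTERN.findall(suffix):
        objects.append((label.strip(), tuple(int(c) for c in coords)))
    return objects


def _quantize(value: float, size: float) -> int:
    """Map a pixel coordinate to a bin index, clamped to [0, NUM_BINS - 1]."""
    return min(max(math.floor(value * NUM_BINS / size), 0), NUM_BINS - 1)


def encode_od_suffix(objects: Sequence[Tuple[str, Box]], image_size: Tuple[int, int]) -> str:
    """Build an <OD> suffix from pixel-space boxes; image_size is (width, height)."""
    width, height = image_size
    parts = []
    for label, (x1, y1, x2, y2) in objects:
        bins = (_quantize(x1, width), _quantize(y1, height),
                _quantize(x2, width), _quantize(y2, height))
        parts.append(label + "".join(f"<loc_{b}>" for b in bins))
    return "".join(parts)


example = "5 of hearts<loc_54><loc_213><loc_291><loc_598>6 of hearts<loc_205><loc_251><loc_471><loc_670>"
print(parse_od_suffix(example))
# [('5 of hearts', (54, 213, 291, 598)), ('6 of hearts', (205, 251, 471, 670))]
```

Parsing the suffix this way is also a convenient place to validate annotations (for example, that every box has exactly four tokens) before training.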

However, I'm unclear on how to properly design negative samples for training. Negative samples are crucial for improving the model's ability to discriminate and reduce false positives. Some questions I have:

- Should negative samples use the same image but with incorrect object descriptions?
- Do we need to use completely unrelated images and descriptions?
- How do we handle the location tags for negative samples?
- What's the recommended ratio of positive to negative samples in the training set?

Any guidance or best practices for creating effective negative samples would be greatly appreciated. This will help ensure we're training the Florence-2 model optimally for object detection tasks.

Additional

If there are any existing resources, documentation, or examples specifically for Florence-2 negative sample creation, please point me in that direction. Also, if there are any tools or scripts the team recommends for generating or augmenting negative samples, that information would be very helpful.

David-19940718 (Author) · 2024-09-18T05:51:01Z

Our model's mAP (mean Average Precision) is degrading while the loss values suggest overfitting. Our current saving strategy is based solely on validation loss, as shown in this snippet:

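```python
from transformers import AutoModelForCausalLM, AutoProcessor


# Excerpt: a method of the training code's checkpoint-manager class.
# `save_model`, `self.best_val_loss` and `self.best_checkpoint_dir` are defined elsewhere.
def save_best(self, processor: AutoProcessor, model: AutoModelForCausalLM, val_loss: float):
    """Saves the best model checkpoint if the validation loss improves.

    Args:
        processor (AutoProcessor): The processor to save.
        model (AutoModelForCausalLM): The model to save.
        val_loss (float): The current validation loss.
    """
    # Only validation loss decides what is "best"; mAP is not considered here.
    if val_loss < self.best_val_loss:
        self.best_val_loss = val_loss
        save_model(self.best_checkpoint_dir, processor, model)
        print(f"New best model saved with validation loss: {self.best_val_loss}")
```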

I'd like your thoughts on how effective this saving strategy is. We use validation loss as the only criterion for keeping the best model, yet our mAP scores don't reflect the improvements we see in the loss. Is relying solely on validation loss the right way to decide which checkpoint to save?

Would it be more beneficial to consider a combination of metrics, such as both validation loss and mAP, to ensure we're not just minimizing loss but also improving the model's precision? Or are there other metrics or strategies you believe would be more suitable for our current situation?
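One way to make the saving criterion configurable is a small helper that tracks a single chosen metric, with a direction: "max" for mAP, "min" for validation loss. This is a minimal sketch assuming mAP is computed on the validation set after each epoch; `MetricCheckpointSaver`, the metric keys and the numbers in the history are hypothetical and illustrative, not maestro APIs or measured results.

```python
import math
from typing import Callable, Dict, Optional


class MetricCheckpointSaver:
    """Hypothetical helper: keep the checkpoint that is best on one chosen metric."""

    def __init__(self, metric: str, mode: str = "max",
                 save_fn: Optional[Callable[[Dict[str, float]], None]] = None):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' (e.g. val_loss) or 'max' (e.g. mAP)")
        self.metric = metric
        self.mode = mode
        self.save_fn = save_fn
        # Start from the worst possible value so the first evaluation always saves.
        self.best = -math.inf if mode == "max" else math.inf

    def _improved(self, value: float) -> bool:
        return value > self.best if self.mode == "max" else value < self.best

    def update(self, metrics: Dict[str, float]) -> bool:
        """Save if metrics[self.metric] beats the best value so far; return True if saved."""
        value = metrics[self.metric]
        if not self._improved(value):
            return False
        self.best = value
        if self.save_fn is not None:
            self.save_fn(metrics)  # e.g. a wrapper around save_model(checkpoint_dir, processor, model)
        return True


# Illustrative history: validation loss keeps dropping at epoch 3 while mAP does not.
history = [
    {"epoch": 1, "val_loss": 0.90, "map50": 0.40},
    {"epoch": 2, "val_loss": 0.80, "map50": 0.45},
    {"epoch": 3, "val_loss": 0.70, "map50": 0.43},
    {"epoch": 4, "val_loss": 0.75, "map50": 0.50},
]
saved_by_map, saved_by_loss = [], []
map_saver = MetricCheckpointSaver("map50", mode="max", save_fn=lambda m: saved_by_map.append(m["epoch"]))
loss_saver = MetricCheckpointSaver("val_loss", mode="min", save_fn=lambda m: saved_by_loss.append(m["epoch"]))
for metrics in history:
    map_saver.update(metrics)
    loss_saver.update(metrics)
print(saved_by_map)   # [1, 2, 4]
print(saved_by_loss)  # [1, 2, 3]
```

With this history, the loss-based saver would end up keeping epoch 3 while the mAP-based saver keeps epoch 4, which is exactly the kind of divergence described above.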

Looking forward to your insights on this matter.

SkalskiP (Maintainer) · 2024-09-18T11:39:31Z

Hi @David-19940718 👋🏻 First of all, I'm thrilled to have users like you who are eager to experiment early on and push the library forward.

Regarding negative samples, I don't think there are any established best practices at the moment, but I'll ask a few people involved in VLM training about it.

One idea that should be fairly simple to implement is to use the COCO dataset as a source of negative samples, for example by splitting training into two phases. In the first phase, you fine-tune only on your dataset; in the second, on a mix of your dataset and COCO. The idea is that the model quickly learns your classes in the first phase and becomes more resistant to overfitting in the second.
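The two-phase idea can be sketched as a function that builds the training records for each phase. This assumes the COCO annotations have already been converted into the same `image`/`prefix`/`suffix` JSONL records as your own data; `build_phase_records` and `extra_ratio` are hypothetical names, and the thread does not settle on a positive-to-negative ratio (0.5 is only a placeholder).

```python
import random
from typing import Dict, List, Sequence

Record = Dict[str, str]  # one JSONL entry: {"image": ..., "prefix": ..., "suffix": ...}


def build_phase_records(custom: Sequence[Record], extra: Sequence[Record], phase: int,
                        extra_ratio: float = 0.5, seed: int = 0) -> List[Record]:
    """Phase 1: custom records only. Phase 2: custom records plus a random sample of
    `extra` (e.g. COCO-derived) records, about extra_ratio * len(custom) of them."""
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2")
    records = list(custom)
    if phase == 1:
        return records
    rng = random.Random(seed)  # seeded so the phase-2 mix is reproducible
    n_extra = min(len(extra), round(extra_ratio * len(custom)))
    records.extend(rng.sample(list(extra), n_extra))
    rng.shuffle(records)
    return records


# Placeholder records; real suffixes would hold labels and <loc_...> tokens.
custom = [{"image": f"card_{i}.jpg", "prefix": "<OD>", "suffix": "..."} for i in range(4)]
coco = [{"image": f"coco_{i}.jpg", "prefix": "<OD>", "suffix": "..."} for i in range(10)]
print(len(build_phase_records(custom, coco, phase=1)))  # 4
print(len(build_phase_records(custom, coco, phase=2)))  # 6
```

How many epochs each phase gets, and whether COCO boxes are kept or dropped from the suffix, are open choices this sketch leaves to the caller.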

As for your second question, being able to use any metric as the condition for saving a checkpoint sounds very reasonable. I'll try to open a GH issue to track adding that support.

David-19940718 (Author) · 2024-09-19T08:20:17Z

Thank you for your detailed and encouraging response. 😄

David-19940718 (Author) · 2024-09-23T03:16:48Z

Hi @SkalskiP,

By introducing appropriate data augmentation, I've observed a significant reduction in overfitting. Under the same experimental conditions, mAP also improved by several percentage points.

It might be worth adding this feature in a future version.

SkalskiP (Maintainer) · 2024-09-24T13:37:30Z

Hi @David-19940718 👋🏻 That looks fantastic! Could you tell me exactly what strategies you employed?

David-19940718 (Author) · 2024-09-25T09:21:52Z

Sure! The main strategies I employed are:

- Random horizontal flipping (50% chance)
- Color jittering (adjusting brightness, contrast, saturation, and hue)

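```python
from torch.utils.data import Dataset
from torchvision import transforms

# JSONLDataset is the repository's JSONL loader; its import path depends on the maestro version.


class DetectionDataset(Dataset):
    def __init__(self, jsonl_file_path: str, image_directory_path: str, split_name: str):
        self.dataset = JSONLDataset(jsonl_file_path, image_directory_path)
        self.mode = split_name
        # Augment only the training split; validation/test images are left untouched.
        self.transform = None
        if split_name == "train":
            self.transform = transforms.Compose([
                # NOTE: this flips only the image. The <loc_...> box coordinates in
                # `suffix` are not mirrored, so flipped samples end up with mismatched
                # boxes unless the suffix is updated as well.
                transforms.RandomHorizontalFlip(p=0.5),
                transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
            ])

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, data = self.dataset[idx]
        prefix = data["prefix"]
        suffix = data["suffix"]
        # Apply data augmentation (training split only)
        if self.transform is not None:
            image = self.transform(image)
        return prefix, suffix, image
```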

SkalskiP (Maintainer) · 2024-09-25T10:29:21Z

Hi @David-19940718 👋🏻 Oh, so you ended up using fairly traditional data augmentation techniques?

From what I see, you applied flipping. I assume you also had to update the object detection suffix (the box coordinates) to match.
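Because `RandomHorizontalFlip` in the snippet above only touches the image, the box tokens have to be mirrored by hand, and the image and suffix must be flipped in the same random draw. A minimal sketch, assuming the 1000-bin `<loc_0>` to `<loc_999>` convention and `(x1, y1, x2, y2)` token order seen in the example suffix; `flip_od_suffix_horizontally` and `random_hflip` are hypothetical helpers, not maestro functions.

```python
import random
import re
from typing import Optional, Tuple

from PIL import Image, ImageOps

NUM_BINS = 1000  # assumption: coordinates are quantized into <loc_0> ... <loc_999>
_BOX_PATTERN = re.compile(r"<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def flip_od_suffix_horizontally(suffix: str) -> str:
    """Mirror every (x1, y1, x2, y2) box in an <OD> suffix around the vertical axis."""
    def mirror(match: re.Match) -> str:
        x1, y1, x2, y2 = (int(g) for g in match.groups())
        # x' = (NUM_BINS - 1) - x; the new left edge comes from the old right edge.
        new_x1, new_x2 = NUM_BINS - 1 - x2, NUM_BINS - 1 - x1
        return f"<loc_{new_x1}><loc_{y1}><loc_{new_x2}><loc_{y2}>"
    return _BOX_PATTERN.sub(mirror, suffix)


def random_hflip(image: Image.Image, suffix: str, p: float = 0.5,
                 rng: Optional[random.Random] = None) -> Tuple[Image.Image, str]:
    """Flip the image and its boxes together with probability p, so labels stay aligned."""
    draw = rng.random() if rng is not None else random.random()
    if draw < p:
        return ImageOps.mirror(image), flip_od_suffix_horizontally(suffix)
    return image, suffix


print(flip_od_suffix_horizontally("5 of hearts<loc_54><loc_213><loc_291><loc_598>"))
# 5 of hearts<loc_708><loc_213><loc_945><loc_598>
```

In `DetectionDataset.__getitem__`, one would drop `RandomHorizontalFlip` from the `Compose` and call `image, suffix = random_hflip(image, suffix)` before the color jitter; `ColorJitter` does not move pixels, so it needs no box update.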

David-19940718 (Author) · 2024-09-25T16:22:59Z

Yes, I just did a simple initial validation. I applied some basic data augmentation techniques to get started and test things out. 😄

SkalskiP (Maintainer) · 2024-09-25T19:23:30Z

@David-19940718 would you perhaps have a moment to draft a PR introducing basic data augmentation?

kengboonang · 2025-01-24T01:58:08Z

Hello! I'd be interested to know if there are any updates on this. I'm currently fine-tuning Florence2-base-ft for object detection and have tried the following:

- leaving out negative samples entirely
- using the following annotations:
  - `none<loc_000><loc_000><loc_000><loc_000>` (only on negative samples)
  - `background<loc_000><loc_1000><loc_000><loc_1000>` (for all samples)

Leaving the negative samples out entirely still gave better results than either of the two annotation methods; with both of them, the model did not converge as well.
