
GPU OOMs on large directory scans #232

@filmo

Description

I'm crawling a large directory structure that contains tens of thousands of high-resolution images.

Using the CNN() method, the process runs out of GPU memory (OOM) before the scan finishes.

```
Traceback (most recent call last):
  File "/home/philglau/dedup_py/main.py", line 85, in <module>
    search('PyCharm')
  File "/home/philglau/dedup_py/main.py", line 50, in search
    encodings = cnn.encode_images(image_dir=image_dir,recursive=True,num_enc_workers=1)
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/imagededup/methods/cnn.py", line 251, in encode_images
    return self._get_cnn_features_batch(image_dir=image_dir, recursive=recursive, num_workers=num_enc_workers)
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/imagededup/methods/cnn.py", line 146, in _get_cnn_features_batch
    arr = self.model(ims.to(self.device))
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
```

I think the problem is that during the scan it reads all the images and, most importantly, runs apply_mobilenet_preprocess(), which performs the transform on the GPU. However, those transformed tensors don't appear to be consumed. In other words, it looks like it's trying to scan the entire directory structure before proceeding with encoding the results.

Or at least that's what it seems like to me. (Alternatively, the encoding is happening, but some GPU memory is not being released after each batch is consumed.)
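If the first guess is right, the failure mode would be: each item is moved to the GPU at load time, so the DataLoader's read-ahead pins device memory for images nobody has encoded yet, and allocation tracks items *scanned* rather than items *processed*. A toy pure-Python model of that interaction (byte counters stand in for real VRAM; the sizes are illustrative, not measured):

```python
# Toy model of the suspected failure mode: if the dataset's __getitem__
# moves each image to the GPU, the loader's prefetching pins device memory
# for items that haven't been consumed yet. Byte counters stand in for VRAM.

ITEM_BYTES = 50 * 1024 * 1024          # ~50 MB per decoded high-res image (illustrative)
device_allocated = 0                    # simulated VRAM in use

def getitem_moves_to_device():
    """Mimics a __getitem__ that does tensor.to('cuda') per item."""
    global device_allocated
    device_allocated += ITEM_BYTES      # allocation happens at *load* time
    return ITEM_BYTES

def prefetch(n_items):
    """Mimics loader workers reading ahead of the consumer."""
    return [getitem_moves_to_device() for _ in range(n_items)]

# With aggressive read-ahead, allocation tracks how far the scan has gotten,
# not how many batches were encoded -- so lowering batch_size cannot help.
queue = prefetch(400)                   # scanner races ahead of the encoder
peak_mb = device_allocated // (1024 * 1024)
print(peak_mb)                          # 20000 MB: close to a 24 GB card's limit
```

This would also match the symptom that changing cnn.batch_size has no effect, since the batch size only governs the consumer, not the read-ahead.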

  • CNN.encode_images() calls _get_cnn_features_batch()
  • _get_cnn_features_batch() calls img_dataloader(image_dir='mypath')
  • img_dataloader() constructs ImgDataset with a basenet_preprocess set
  • ImgDataset.__getitem__() then applies self.basenet_preprocess
  • self.basenet_preprocess then hits apply_mobilenet_preprocess()
  • which calls self.transform(), which moves the data to the GPU

I believe it's all the apply_mobilenet_preprocess() calls that are filling the GPU before their output has a chance to be consumed by the encoder (or at least that's my guess).
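If that guess holds, the usual remedy is to keep per-item preprocessing on the CPU, move only one batch to the device right before the forward pass, and drop the device reference once its features are on the host. Sketched below with plain-Python stand-ins (hypothetical names, not imagededup's actual API; byte counters simulate VRAM):

```python
# Sketch of the bounded-memory pattern: CPU-side preprocessing, one batch
# on the device at a time, device memory reclaimed after each encode.
ITEM_BYTES = 50 * 1024 * 1024          # illustrative per-image footprint
device_allocated = 0
peak_device = 0

def to_device(batch):
    """Mimics batch.to('cuda'): allocates device memory for the batch."""
    global device_allocated, peak_device
    device_allocated += len(batch) * ITEM_BYTES
    peak_device = max(peak_device, device_allocated)
    return batch

def free(batch):
    """Mimics the device tensor going out of scope and being reclaimed."""
    global device_allocated
    device_allocated -= len(batch) * ITEM_BYTES

def encode(batch):
    """Mimics the model forward pass returning small host-side features."""
    return [0.0] * len(batch)

dataset = [ITEM_BYTES] * 400           # 400 images, preprocessed on the CPU
features = []
BATCH = 16
for i in range(0, len(dataset), BATCH):
    gpu_batch = to_device(dataset[i:i + BATCH])
    features.extend(encode(gpu_batch))  # results live on the CPU
    free(gpu_batch)                     # device memory is reclaimed per batch

print(peak_device // (1024 * 1024))     # 800 MB: bounded by batch size alone
```

Under this pattern, peak VRAM depends only on the batch size, which is exactly the knob cnn.batch_size is supposed to control.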

Here's a screenshot from nvtop while CNN.encode_images() is still scanning the directories:

[Screenshot: nvtop showing GPU memory steadily climbing with little to no compute activity]

Shortly thereafter, the encode_images() process crashes once the GPU (an RTX 3090 with 24 GB of VRAM) goes OOM. I've tried lowering the batch size with cnn.batch_size = 16 and other smaller values, but that makes no difference; it still always OOMs.

As shown in the screenshot, memory usage keeps increasing, but there is little or no compute occurring on the GPU during the time it is filling up.
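That flat-compute, rising-memory profile also fits the second guess above: batches being encoded but their device-side output never released (for example, outputs retained on the GPU batch after batch instead of being copied to host memory and freed). A toy contrast of the two behaviors, again with byte counters standing in for VRAM:

```python
# Contrast: retaining device-side output every batch (memory climbs until
# OOM) vs. releasing it after copying to host (memory stays flat).
BATCH_BYTES = 800 * 1024 * 1024        # one 16-image batch at ~50 MB each

def run(retain_on_device, n_batches=25):
    device = 0                         # simulated VRAM in use
    peak = 0
    for _ in range(n_batches):
        device += BATCH_BYTES          # forward pass allocates the batch
        peak = max(peak, device)
        if not retain_on_device:
            device -= BATCH_BYTES      # output moved to host, VRAM freed
    return peak // (1024 * 1024)

print(run(retain_on_device=True))      # 20000 MB -- the OOM pattern seen here
print(run(retain_on_device=False))     # 800 MB -- healthy steady state
```

Either failure mode (prefetch filling the device, or per-batch output never released) would produce the nvtop trace above, so distinguishing them probably requires checking where in the pipeline the tensors first land on the GPU.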
