
GPU OOMs on large directory scans #232

@filmo

Description

I'm crawling a large directory structure that contains tens of thousands of high-resolution images.

Using the CNN() method, the process runs out of GPU memory (OOM) before the scan finishes.

```
Traceback (most recent call last):
  File "/home/philglau/dedup_py/main.py", line 85, in <module>
    search('PyCharm')
  File "/home/philglau/dedup_py/main.py", line 50, in search
    encodings = cnn.encode_images(image_dir=image_dir,recursive=True,num_enc_workers=1)
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/imagededup/methods/cnn.py", line 251, in encode_images
    return self._get_cnn_features_batch(image_dir=image_dir, recursive=recursive, num_workers=num_enc_workers)
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/imagededup/methods/cnn.py", line 146, in _get_cnn_features_batch
    arr = self.model(ims.to(self.device))
  File "/home/philglau/miniconda3/envs/id/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
```

I think the problem is that during the scan it reads all the images and, most importantly, runs apply_mobilenet_preprocess(), which performs the transform on the GPU. However, those transformed tensors don't appear to be consumed. In other words, it looks like it's trying to scan the entire directory structure before proceeding with encoding the results.

Or at least that's what it seems like to me. (Alternatively, the encoding is happening, but some GPU memory is not being released after each batch is consumed.)
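If the first guess is right, the failure mode would be: each item is moved to the GPU at load time, so the DataLoader's read-ahead pins device memory for images nobody has encoded yet, and allocation tracks items *scanned* rather than items *processed*. A toy pure-Python model of that interaction (byte counters stand in for real VRAM; the sizes are illustrative, not measured):

```python
# Toy model of the suspected failure mode: if the dataset's __getitem__
# moves each image to the GPU, the loader's prefetching pins device memory
# for items that haven't been consumed yet. Byte counters stand in for VRAM.

ITEM_BYTES = 50 * 1024 * 1024          # ~50 MB per decoded high-res image (illustrative)
device_allocated = 0                    # simulated VRAM in use

def getitem_moves_to_device():
    """Mimics a __getitem__ that does tensor.to('cuda') per item."""
    global device_allocated
    device_allocated += ITEM_BYTES      # allocation happens at *load* time
    return ITEM_BYTES

def prefetch(n_items):
    """Mimics loader workers reading ahead of the consumer."""
    return [getitem_moves_to_device() for _ in range(n_items)]

# With aggressive read-ahead, allocation tracks how far the scan has gotten,
# not how many batches were encoded -- so lowering batch_size cannot help.
queue = prefetch(400)                   # scanner races ahead of the encoder
peak_mb = device_allocated // (1024 * 1024)
print(peak_mb)                          # 20000 MB: close to a 24 GB card's limit
```

This would also match the symptom that changing cnn.batch_size has no effect, since the batch size only governs the consumer, not the read-ahead.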

  • CNN.encode_images() calls _get_cnn_features_batch()
  • _get_cnn_features_batch() calls img_dataloader(image_dir='mypath')
  • img_dataloader() constructs ImgDataset with a basenet_preprocess set
  • ImgDataset.__getitem__() then applies self.basenet_preprocess
  • self.basenet_preprocess then hits apply_mobilenet_preprocess()
  • which calls self.transform(), which moves the data to the GPU

I believe it's all the apply_mobilenet_preprocess() calls that are filling the GPU before their output has a chance to be consumed by the encoder (or at least that's my guess).
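If that guess holds, the usual remedy is to keep per-item preprocessing on the CPU, move only one batch to the device right before the forward pass, and drop the device reference once its features are on the host. Sketched below with plain-Python stand-ins (hypothetical names, not imagededup's actual API; byte counters simulate VRAM):

```python
# Sketch of the bounded-memory pattern: CPU-side preprocessing, one batch
# on the device at a time, device memory reclaimed after each encode.
ITEM_BYTES = 50 * 1024 * 1024          # illustrative per-image footprint
device_allocated = 0
peak_device = 0

def to_device(batch):
    """Mimics batch.to('cuda'): allocates device memory for the batch."""
    global device_allocated, peak_device
    device_allocated += len(batch) * ITEM_BYTES
    peak_device = max(peak_device, device_allocated)
    return batch

def free(batch):
    """Mimics the device tensor going out of scope and being reclaimed."""
    global device_allocated
    device_allocated -= len(batch) * ITEM_BYTES

def encode(batch):
    """Mimics the model forward pass returning small host-side features."""
    return [0.0] * len(batch)

dataset = [ITEM_BYTES] * 400           # 400 images, preprocessed on the CPU
features = []
BATCH = 16
for i in range(0, len(dataset), BATCH):
    gpu_batch = to_device(dataset[i:i + BATCH])
    features.extend(encode(gpu_batch))  # results live on the CPU
    free(gpu_batch)                     # device memory is reclaimed per batch

print(peak_device // (1024 * 1024))     # 800 MB: bounded by batch size alone
```

Under this pattern, peak VRAM depends only on the batch size, which is exactly the knob cnn.batch_size is supposed to control.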

Here's a screenshot from nvtop while CNN.encode_images() is still scanning the directories:

[Screenshot: nvtop showing GPU memory steadily climbing with little to no compute activity]

Shortly thereafter, the encode_images() process crashes once the GPU (an RTX 3090 with 24 GB of VRAM) goes OOM. I've tried lowering the batch size with cnn.batch_size = 16 and other smaller values, but that makes no difference; it still always OOMs.

As shown in the screenshot, memory usage keeps increasing, but there is little or no compute occurring on the GPU during the time it is filling up.
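That flat-compute, rising-memory profile also fits the second guess above: batches being encoded but their device-side output never released (for example, outputs retained on the GPU batch after batch instead of being copied to host memory and freed). A toy contrast of the two behaviors, again with byte counters standing in for VRAM:

```python
# Contrast: retaining device-side output every batch (memory climbs until
# OOM) vs. releasing it after copying to host (memory stays flat).
BATCH_BYTES = 800 * 1024 * 1024        # one 16-image batch at ~50 MB each

def run(retain_on_device, n_batches=25):
    device = 0                         # simulated VRAM in use
    peak = 0
    for _ in range(n_batches):
        device += BATCH_BYTES          # forward pass allocates the batch
        peak = max(peak, device)
        if not retain_on_device:
            device -= BATCH_BYTES      # output moved to host, VRAM freed
    return peak // (1024 * 1024)

print(run(retain_on_device=True))      # 20000 MB -- the OOM pattern seen here
print(run(retain_on_device=False))     # 800 MB -- healthy steady state
```

Either failure mode (prefetch filling the device, or per-batch output never released) would produce the nvtop trace above, so distinguishing them probably requires checking where in the pipeline the tensors first land on the GPU.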
