Hello, thank you for the great work.
I have a couple of questions regarding the dataset preprocessing:
1. For WebVid-10M, you mentioned using the LLaMA-3 model to filter out videos whose captions do not contain dynamic content. Could you please share the criteria or code used to determine whether a caption contains dynamic content?
2. For Panda-70M, you stated that 5.3 million videos were downloaded. Could you clarify which subset of videos was selected and how it was chosen?
Thank you in advance for your help!