A comprehensive toolkit for processing video data, extracting speech, generating transcripts, and analyzing emotions and body poses.
- SharePoint Integration: Download videos directly from SharePoint
- Speech Separation: Extract clean speech from videos that contain background noise
- Speech-to-Text Transcription: Convert speech to accurate text with speaker identification
- Emotion & Pose Recognition: Analyze facial emotions and body poses in videos
- Parallel Processing: Run CPU- and GPU-intensive tasks simultaneously for better performance
- Sequential Processing Pipeline: Process videos through all steps automatically
- Batch Processing: Process multiple videos without manual intervention
- Secure Authentication: Hugging Face tokens are used only for the current session (never stored on disk)
- Robust Audio Processing: Enhanced algorithms to prevent file corruption
Before you begin, make sure you have the following (a quick way to check the tools is shown after this list):
- Python 3.12
- Poetry (dependency management)
- FFmpeg (audio/video processing)
- A Hugging Face account (for AI model access)
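On macOS/Linux you can quickly confirm that the required tools are on your PATH; the exact command names (for example python3 vs python) may differ on your system:
# Check that the required tools are available and print their versions
command -v python3 >/dev/null && python3 --version || echo "python3 not found"
command -v poetry >/dev/null && poetry --version || echo "poetry not found"
command -v ffmpeg >/dev/null && ffmpeg -version | head -n 1 || echo "ffmpeg not found"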
# Clone the repository
git clone https://github.com/tkhahns/Video_Data_Processing.git
cd Video_Data_Processing
# Install the base dependencies
poetry install
# Or include all optional dependency groups
poetry install --with common --with speech --with emotion --with download
The simplest way to use this toolkit is through the all-in-one pipeline script:
# macOS/Linux
./run_all.sh
# Windows
.\run_all.ps1
This will:
- Prompt for your Hugging Face token (used in-memory only, never saved to disk; a non-interactive alternative is sketched after this list)
- Guide you through downloading videos from SharePoint
- Process the videos through all pipeline stages sequentially
- Output results in timestamped directories
- Report total processing time and success status
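If you prefer to supply the token non-interactively, you can export it for the current shell session only. This is a sketch that assumes the underlying Hugging Face libraries read the standard HF_TOKEN environment variable; the interactive prompt remains the default behavior:
# Read the token without echoing it, and keep it only in this shell's environment
read -r -s -p "Hugging Face token: " HF_TOKEN && echo
export HF_TOKEN
./run_all.sh
# Clear the token from the environment when you are done
unset HF_TOKEN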
Detailed documentation is available for each component:
- Download Videos: Download videos from SharePoint or use existing files
- Speech Separation: Extract clean speech audio from videos
- Transcription: Convert speech to text with speaker identification
- Emotion & Pose Recognition: Analyze facial emotions and body language
Each component can be used individually:
# Speech separation
poetry run scripts/macos/run_separate_speech.sh --input-dir "./my-videos"
# Speech-to-text
poetry run scripts/macos/run_speech_to_text.sh --input-dir "./my-speech"
# Emotion and pose recognition
poetry run scripts/macos/run_emotion_and_pose_recognition.sh --input-dir "./my-videos"
For automated processing of multiple videos:
# Run the complete pipeline in batch mode
./run_all.sh --batch
# Run individual components in batch mode
poetry run scripts/macos/run_separate_speech.sh --input-dir "./my-videos" --batch
poetry run scripts/macos/run_speech_to_text.sh --input-dir "./my-speech" --batch
poetry run scripts/macos/run_emotion_and_pose_recognition.sh --input-dir "./my-videos" --batch
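To run the batch-mode scripts over several input directories unattended, you can wrap them in a simple shell loop; the ./sessions/*/videos layout here is purely illustrative:
# Process every session's video folder without manual intervention
for dir in ./sessions/*/videos; do
  echo "Processing $dir"
  poetry run scripts/macos/run_separate_speech.sh --input-dir "$dir" --batch
  poetry run scripts/macos/run_emotion_and_pose_recognition.sh --input-dir "$dir" --batch
done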
The pipeline is optimized for performance in several ways:
- Parallel Processing: Emotion recognition runs in parallel with speech processing (see the sketch after this list)
- Memory Management: Resources are released after each processing step
- Batch Processing: Process multiple videos without manual intervention
- File Format Handling: Robust audio conversion with validation checks
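As a simplified illustration of that parallel strategy (not the pipeline's internal implementation), the emotion/pose job can be launched in the background from a shell while speech processing runs in the foreground, then joined with wait:
# Start emotion and pose recognition in the background
poetry run scripts/macos/run_emotion_and_pose_recognition.sh --input-dir "./my-videos" --batch &
EMOTION_PID=$!
# Run speech separation in the foreground at the same time
poetry run scripts/macos/run_separate_speech.sh --input-dir "./my-videos" --batch
# Wait for the background job to finish before moving on
wait "$EMOTION_PID"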
Recent updates address the following issues:
- Audio File Corruption: Enhanced error handling prevents corrupted audio files (a simplified sketch of the conversion-and-validation approach follows this list)
- RTTM Errors: Fixed issues with spaces in filenames during speaker diarization
- Memory Management: Improved memory cleanup between processing steps
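The conversion-and-validation idea can be illustrated with plain ffmpeg and ffprobe. This is a generic sketch rather than the project's internal code, and the file names are placeholders:
# Extract mono 16 kHz WAV audio from a video; quoting and an underscore-only
# output name avoid the filename-with-spaces problems mentioned above
ffmpeg -y -i "my video.mp4" -vn -ac 1 -ar 16000 "my_video.wav"
# Validate the result: ffprobe exits non-zero if the file is unreadable or corrupted
if ffprobe -v error -show_entries format=duration -of csv=p=0 "my_video.wav" >/dev/null; then
  echo "Audio file looks valid"
else
  echo "Audio file appears corrupted" >&2
fi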
This project is licensed under the MIT License - see the LICENSE file for details.