Website Logo Scraper

A Python tool for automatically scraping and extracting logo images from websites.

Features

Extracts logo images from websites using various detection methods
Processes multiple websites concurrently to improve efficiency
Scores candidate images with heuristics and selects the most likely logo
Provides comprehensive logging and progress tracking
Uses randomized delays and user agents to reduce the chance of being rate limited

Installation

Clone this repository
Install the required dependencies:

pip install -r requirements.txt

Usage

The script requires an input CSV file with a column named website containing the URLs to scrape.

Basic Usage

python logo_scraper.py --input websites.csv --output logos.csv

Command-line Arguments

--input, -i: Input CSV file with websites (default: websites.csv)
--output, -o: Output CSV file for logo URLs (default: logos.csv)
--workers, -w: Number of worker threads (default: 5)
--delay, -d: Delay between requests in seconds (default: 1.0)
--timeout, -t: Request timeout in seconds (default: 10)

Example

python logo_scraper.py --input companies.csv --output company_logos.csv --workers 10 --delay 2 --timeout 15
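
The --workers and --delay options suggest a bounded thread pool with a pause before each request. The following is a minimal sketch of that pattern, not the code in logo_scraper.py: process_all and scrape_one are hypothetical names, and the 0.5x to 1.5x jitter range is an assumption.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def process_all(websites, scrape_one, workers=5, delay=1.0):
    """Run scrape_one over websites using a bounded thread pool.

    scrape_one is a hypothetical callable that takes a URL and returns a
    result. Each call is preceded by a randomized pause of 0.5x to 1.5x
    `delay` seconds (an assumed jitter range).
    """
    def task(url):
        # Jitter the pause so requests do not fire in lockstep.
        time.sleep(delay * random.uniform(0.5, 1.5))
        try:
            return url, scrape_one(url)
        except Exception as exc:
            # Record the failure instead of stopping the whole batch.
            return url, f"error: {exc}"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order tasks finish in.
        return list(pool.map(task, websites))
```

Catching exceptions inside each task keeps one unreachable site from aborting the run, which matches the error messages described for the status column.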

Input Format

The input CSV file should contain a column named website with the URLs to scrape:

website
example.com
google.com
github.com
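
Since the sample rows are bare domains, the input presumably needs a scheme added before any request is sent. Below is a rough sketch of reading the website column with the standard csv module. The read_websites helper is hypothetical, and defaulting to https:// is an assumption rather than documented behavior.

```python
import csv
import io


def read_websites(csv_file):
    """Yield normalized URLs from the `website` column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or "website" not in reader.fieldnames:
        raise ValueError("input CSV needs a 'website' column")
    for row in reader:
        site = (row["website"] or "").strip()
        if not site:
            continue  # skip rows with an empty website cell
        # Assumption: bare domains are treated as HTTPS.
        if not site.startswith(("http://", "https://")):
            site = "https://" + site
        yield site


sample = io.StringIO("website\nexample.com\nhttps://github.com\n")
print(list(read_websites(sample)))
```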

Output Format

The script outputs a CSV file with the following columns:

website: The original website URL
logo_url: The URL of the extracted logo (if found)
status: The status of the extraction (success, no logo found, or an error message)
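
As an illustration of the three-column layout above, rows could be written with csv.DictWriter. This is a sketch under assumptions: write_results is a hypothetical helper, and writing an empty logo_url when nothing was found is a guess at the behavior.

```python
import csv
import io

FIELDNAMES = ["website", "logo_url", "status"]


def write_results(results, csv_file):
    """Write (website, logo_url, status) tuples to an open CSV file."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    for website, logo_url, status in results:
        writer.writerow({
            "website": website,
            "logo_url": logo_url or "",  # assumed: blank cell when no logo
            "status": status,
        })


buffer = io.StringIO()
write_results(
    [("example.com", "https://example.com/logo.png", "success"),
     ("github.com", None, "no logo found")],
    buffer,
)
print(buffer.getvalue())
```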

How It Works

The logo scraper uses multiple methods to identify logo images:

Looks for images with "logo" in their URL, class name, ID, or alt text
Checks for images inside header elements or inside links to the home page
Examines SVG elements with "logo" in their class names
Checks for meta tags with OpenGraph images
Looks for favicon and apple-touch-icon links
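
Three of the methods above (the "logo" attribute match, OpenGraph images, and icon links) can be sketched with the standard library's html.parser. This is an illustrative approximation only: the real script may use a different HTML library, LogoCandidateParser is a hypothetical name, and the header-position and SVG checks are left out for brevity.

```python
from html.parser import HTMLParser


class LogoCandidateParser(HTMLParser):
    """Collect (source, url) pairs for elements that might be a logo."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for valueless attributes.
        a = {name: (value or "") for name, value in attrs}
        if tag == "img" and a.get("src"):
            # Match "logo" in the URL, class name, ID, or alt text.
            haystack = " ".join(
                a.get(key, "") for key in ("src", "class", "id", "alt")
            ).lower()
            if "logo" in haystack:
                self.candidates.append(("img", a["src"]))
        elif tag == "meta" and a.get("property") == "og:image" and a.get("content"):
            self.candidates.append(("og:image", a["content"]))
        elif tag == "link" and a.get("href"):
            rel = a.get("rel", "").lower().split()
            if "icon" in rel or "apple-touch-icon" in rel:
                self.candidates.append(("icon", a["href"]))


parser = LogoCandidateParser()
parser.feed('<link rel="icon" href="/favicon.ico"><img src="/img/a.png" class="site-logo">')
print(parser.candidates)
```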

For each website, the script:

Sends an HTTP request with a random user agent
Parses the HTML content
Applies logo detection heuristics
Scores potential logo candidates
Returns the highest-scoring logo URL
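
The last two steps, scoring and selection, might look roughly like the sketch below. The weights are hypothetical, since the script's actual scoring rules are not documented in this README; pick_logo and SOURCE_WEIGHTS are illustrative names.

```python
from urllib.parse import urljoin

# Hypothetical weights: an <img> matching "logo" outranks an OpenGraph
# image, which outranks a favicon. The real script may score differently.
SOURCE_WEIGHTS = {"img": 3, "og:image": 2, "icon": 1}


def pick_logo(base_url, candidates):
    """Return (logo_url, status) for a list of (source, url) candidates."""
    if not candidates:
        return None, "no logo found"

    def score(candidate):
        source, url = candidate
        # Small bonus when the URL itself mentions "logo".
        bonus = 1 if "logo" in url.lower() else 0
        return SOURCE_WEIGHTS.get(source, 0) + bonus

    _, best_url = max(candidates, key=score)
    # Resolve relative paths against the page URL.
    return urljoin(base_url, best_url), "success"


print(pick_logo("https://example.com/about",
                [("icon", "/favicon.ico"), ("img", "/static/logo.svg")]))
```

Resolving the winning URL with urljoin means relative paths such as /static/logo.svg come back as absolute URLs, which is what a logo_url column would need to be usable on its own.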

License

MIT

davidumoru/logo-scraper
