web-scraping

Here are 10,531 public repositories matching this topic...

firecrawl / firecrawl

🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

markdown crawler scraper ai html-to-markdown web-crawler scraping web-scraper web-scraping data-extraction webscraping web-data-extraction ai-agents web-search ai-search web-data llm ai-crawler ai-scraping

Updated Mar 6, 2026
TypeScript

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

python crawler framework scraping crawling web-scraping hacktoberfest web-scraping-python

Updated Mar 2, 2026
Python

Mintplex-Labs / anything-llm

The all-in-one AI productivity accelerator. On-device and privacy-first with no annoying setup or configuration.

mcp web-scraping no-code ai-agents kimi multimodal rag moonshot vector-database llm localai local-llm ollama lmstudio deepseek llama3 custom-ai-agents mcp-servers qwen3

Updated Mar 6, 2026
JavaScript

Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monitoring—all for free or enjoy our SaaS plan!

Updated Mar 6, 2026
Python

D4Vinci / Scrapling

🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

Updated Mar 5, 2026
Python

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

Updated Feb 24, 2026
Python

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Mar 5, 2026
TypeScript

Evil0ctal / Douyin_TikTok_Download_API

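🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls as well as online batch parsing and downloading.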

Updated Oct 12, 2025
Python

getmaxun / maxun

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

api crawler scraper automation crawling web-scraper self-hosted web-scraping data-extraction webscraping agents browser-automation no-code web-search rpa robotic-process-automation nocode playwright

Updated Mar 3, 2026
TypeScript

seleniumbase / SeleniumBase

SeleniumBase is a Python framework for browser automation, testing, and scraping. Has CDP Mode for stealth.

python webdriver selenium chromium test-automation pytest web-scraping chromedriver testing-tools cdp bot-detection web-automation python-scraper selenium-python e2e-testing seleniumbase anti-detection playwright web-scraping-python

Updated Mar 6, 2026
Python

yusufkaraaslan / Skill_Seekers

Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

github python pdf documentation automation ocr mcp code-analysis web-scraping ast-parser documentation-generator conflict-detection multi-source github-scraper ai-tools claude-ai mcp-server claude-skills

Updated Mar 2, 2026
Python

mherrmann / helium

Lighter web automation with Python

python firefox chrome webdriver selenium python3 web-scraping helium web-automation selenium-python

Updated Feb 4, 2026
Python

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.