Commit 146f9d4

Update README

1 parent 37fd80e · commit 146f9d4

2 files changed: +46 −21 lines

README.md

Lines changed: 28 additions & 3 deletions
````diff
@@ -523,8 +523,33 @@ async def test_news_crawl():
 
 - **📊 Table-to-DataFrame Extraction**: Extract HTML tables directly to CSV or pandas DataFrames:
 ```python
-crawler_config = CrawlerRunConfig(extract_tables=True)
-# Access tables via result.tables or result.tables_as_dataframe
+crawler = AsyncWebCrawler(config=browser_config)
+await crawler.start()
+
+try:
+    # Set up scraping parameters
+    crawl_config = CrawlerRunConfig(
+        table_score_threshold=8,  # Strict table detection
+    )
+
+    # Execute market data extraction
+    results: List[CrawlResult] = await crawler.arun(
+        url="https://coinmarketcap.com/?page=1", config=crawl_config
+    )
+
+    # Process results
+    raw_df = pd.DataFrame()
+    for result in results:
+        if result.success and result.media["tables"]:
+            raw_df = pd.DataFrame(
+                result.media["tables"][0]["rows"],
+                columns=result.media["tables"][0]["headers"],
+            )
+            break
+    print(raw_df.head())
+
+finally:
+    await crawler.stop()
 ```
 
 - **🚀 Browser Pooling**: Pages launch hot with pre-warmed browser instances for lower latency and memory usage
````
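For context, a self-contained version of the snippet introduced in this hunk might look like the sketch below. It is not part of the commit: the imports, the `BrowserConfig(headless=True)` setup, and the `asyncio` entry point are assumptions filled in around the code shown in the diff.

```python
# Minimal sketch of the table-extraction flow from the README hunk above.
# Assumptions (not in the diff): imports and a headless BrowserConfig.
import asyncio
from typing import List

import pandas as pd
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CrawlResult


async def main() -> None:
    browser_config = BrowserConfig(headless=True)  # assumed setup
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()
    try:
        # table_score_threshold=8 keeps only confidently detected tables
        crawl_config = CrawlerRunConfig(table_score_threshold=8)
        results: List[CrawlResult] = await crawler.arun(
            url="https://coinmarketcap.com/?page=1", config=crawl_config
        )
        raw_df = pd.DataFrame()
        for result in results:
            if result.success and result.media["tables"]:
                # First detected table -> headers + rows -> DataFrame
                raw_df = pd.DataFrame(
                    result.media["tables"][0]["rows"],
                    columns=result.media["tables"][0]["headers"],
                )
                break
        print(raw_df.head())
    finally:
        await crawler.stop()


if __name__ == "__main__":
    asyncio.run(main())
```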
````diff
@@ -544,7 +569,7 @@ async def test_news_crawl():
 claude mcp add --transport sse c4ai-sse http://localhost:11235/mcp/sse
 ```
 
-- **🖥️ Interactive Playground**: Test configurations and generate API requests with the built-in web interface at `/playground`
+- **🖥️ Interactive Playground**: Test configurations and generate API requests with the built-in web interface at `http://localhost:11235/playground`
 
 - **🐳 Revamped Docker Deployment**: Streamlined multi-architecture Docker image with improved resource efficiency
````
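Both bullets in this hunk assume the Dockerized server is listening on port 11235. As a hedged smoke test, a request against the REST API might look like the sketch below; the `/crawl` path and the payload shape are assumptions based on the deployment docs, not something this commit shows.

```python
# Hypothetical smoke test against a local Docker deployment.
# The /crawl endpoint path and the payload keys are assumptions,
# not taken from this commit.
import requests

resp = requests.post(
    "http://localhost:11235/crawl",
    json={"urls": ["https://example.com"]},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```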

docs/examples/crypto_analysis_example.py

Lines changed: 18 additions & 18 deletions
````diff
@@ -383,29 +383,29 @@ async def main():
         scroll_delay=0.2,
     )
 
-    # # Execute market data extraction
-    # results: List[CrawlResult] = await crawler.arun(
-    #     url="https://coinmarketcap.com/?page=1", config=crawl_config
-    # )
-
-    # # Process results
-    # raw_df = pd.DataFrame()
-    # for result in results:
-    #     if result.success and result.media["tables"]:
-    #         # Extract primary market table
-    #         # DataFrame
-    #         raw_df = pd.DataFrame(
-    #             result.media["tables"][0]["rows"],
-    #             columns=result.media["tables"][0]["headers"],
-    #         )
-    #         break
+    # Execute market data extraction
+    results: List[CrawlResult] = await crawler.arun(
+        url="https://coinmarketcap.com/?page=1", config=crawl_config
+    )
+
+    # Process results
+    raw_df = pd.DataFrame()
+    for result in results:
+        if result.success and result.media["tables"]:
+            # Extract primary market table
+            # DataFrame
+            raw_df = pd.DataFrame(
+                result.media["tables"][0]["rows"],
+                columns=result.media["tables"][0]["headers"],
+            )
+            break
 
 
     # This is for debugging only
     # ////// Remove this in production from here..
     # Save raw data for debugging
-    # raw_df.to_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv", index=False)
-    # print("🔍 Raw data saved to 'raw_crypto_data.csv'")
+    raw_df.to_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv", index=False)
+    print("🔍 Raw data saved to 'raw_crypto_data.csv'")
 
     # Read from file for debugging
     raw_df = pd.read_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv")
````
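The uncommented debug round-trip writes the raw table to `tmp/raw_crypto_data.csv` and immediately reads it back. A hypothetical next step, not part of this commit, would coerce the scraped strings to numbers; the cleaning rules below are assumptions, since the real headers depend on whatever table the crawl captured.

```python
# Hypothetical post-processing of the debug CSV written above.
# The cleaning rules are assumptions, not taken from this commit.
import pandas as pd

raw_df = pd.read_csv("tmp/raw_crypto_data.csv")


def to_number(series: pd.Series) -> pd.Series:
    """Strip $ , % decoration and convert a column where possible."""
    cleaned = series.astype(str).str.replace(r"[$,%]", "", regex=True)
    converted = pd.to_numeric(cleaned, errors="coerce")
    # Keep the original column when nothing converts cleanly.
    return converted if converted.notna().any() else series


clean_df = raw_df.apply(to_number)
print(clean_df.dtypes)
```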
