CHANGELOG.md · 05891383960cdee5502b7955d01060c052e973e4 · engineering / unturf / spider.unturf.com

Add detailed crawl diagnostic logging to upstream spider · 05891383

Russell Ballestrini authored Nov 27, 2025

Ported from the Discord bot's async web fetcher improvements.

Add comprehensive logging to understand crawl behavior:
- Log crawl parameters at start (max_depth, URLs, keywords)
- Debug log for crawl queue state during processing
- Detailed link extraction stats with skip reasons:
  - Total links found
  - Links added to crawl queue
  - Links skipped (blocked by robots.txt)
  - Links skipped (wrong domain)
  - Links skipped (max depth reached)
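
The link stats above could be collected roughly as in the sketch below. This is an illustration only, not the spider's actual code: the function `classify_links`, its parameters, the logger name, and the order of the skip checks are all assumptions, and `is_allowed_by_robots` stands in for whatever robots.txt check the crawler already uses.

```python
import logging
from collections import Counter
from urllib.parse import urlparse

# Hypothetical logger name; the real spider may use a different one.
logger = logging.getLogger("spider.crawl")


def classify_links(links, depth, max_depth, allowed_domain, is_allowed_by_robots):
    """Split extracted links into queued vs. skipped and log the counts.

    Purely diagnostic: the decisions mirror ordinary crawl filters, and the
    only addition is counting why each link was skipped.
    """
    stats = Counter(found=len(links))
    to_queue = []
    for url in links:
        if urlparse(url).netloc != allowed_domain:
            stats["skipped_domain"] += 1      # wrong domain
        elif depth + 1 > max_depth:
            stats["skipped_depth"] += 1       # max depth reached
        elif not is_allowed_by_robots(url):
            stats["skipped_robots"] += 1      # blocked by robots.txt
        else:
            to_queue.append(url)
            stats["queued"] += 1

    # Counter returns 0 for reasons that never occurred, so every field logs.
    logger.info(
        "Link extraction at depth %d: found=%d queued=%d "
        "skipped_robots=%d skipped_domain=%d skipped_depth=%d",
        depth,
        stats["found"],
        stats["queued"],
        stats["skipped_robots"],
        stats["skipped_domain"],
        stats["skipped_depth"],
    )
    return to_queue, stats
```

Because the same helper takes a plain list of links, it could be called for both freshly fetched pages and cached pages, so the two code paths would log identical fields.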

Applied to both:
- Fresh page fetching and link extraction
- Cached page link extraction for depth traversal

This diagnostic logging helps identify why crawlers find fewer
pages than expected (e.g., robots.txt blocking, domain filtering,
depth limits).

No crawl logic changes - purely diagnostic visibility.
