    Add detailed crawl diagnostic logging to upstream spider · 05891383
    Russell Ballestrini authored
    Ported from the Discord bot's async web fetcher improvements.
    
    Add comprehensive logging to understand crawl behavior:
    - Log crawl parameters at start (max_depth, URLs, keywords)
    - Debug log for crawl queue state during processing
    - Detailed link extraction stats with skip reasons:
      - Total links found
      - Links added to crawl queue
      - Links skipped by robots.txt
      - Links skipped (wrong domain)
      - Links skipped (max depth reached)
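
The skip-reason counters listed above could be collected roughly as in the sketch below. This is only an illustration, not the spider's actual code: the function name `classify_links`, its parameters, the logger name, and the order in which the checks run are all assumptions.

```python
import logging
from collections import Counter
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical logger name; the real spider may use a different one.
logger = logging.getLogger("spider.diagnostics")


def classify_links(page_url, hrefs, depth, max_depth, allowed_domain, robots):
    """Split extracted links into queued links and per-reason skip counts.

    The order of checks (domain, then depth, then robots.txt) is an
    assumption made for this sketch.
    """
    stats = Counter(total=0, queued=0, robots=0, wrong_domain=0, max_depth=0)
    to_queue = []
    for href in hrefs:
        stats["total"] += 1
        url = urljoin(page_url, href)  # resolve relative links
        if urlparse(url).netloc != allowed_domain:
            stats["wrong_domain"] += 1
        elif depth + 1 > max_depth:
            stats["max_depth"] += 1
        elif not robots.can_fetch("*", url):
            stats["robots"] += 1
        else:
            stats["queued"] += 1
            to_queue.append(url)
    logger.info(
        "%s: %d links found, %d queued, %d skipped by robots.txt, "
        "%d skipped (wrong domain), %d skipped (max depth reached)",
        page_url, stats["total"], stats["queued"], stats["robots"],
        stats["wrong_domain"], stats["max_depth"],
    )
    return to_queue, stats


# Usage with an in-memory robots.txt, so no network access is needed.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private"])
hrefs = ["/about", "/private/x", "https://other.example/page", "/contact"]
queued, stats = classify_links(
    "https://example.com/", hrefs, depth=0, max_depth=1,
    allowed_domain="example.com", robots=robots,
)
```

Because the function only counts and logs, it leaves the decision of what to crawl unchanged, which matches the "purely diagnostic" intent of this commit.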
    
    Applied to both:
    - Fresh page fetching and link extraction
    - Cached page link extraction for depth traversal
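
One way to keep the two code paths consistent is to route both through a single formatting helper that tags each summary with its source. The sketch below assumes hypothetical names (`format_link_stats`, `log_link_stats`, the `source` labels, and the stats keys); the actual spider may structure this differently.

```python
import logging

# Hypothetical logger name; the real spider may use a different one.
logger = logging.getLogger("spider.diagnostics")


def format_link_stats(url, stats, source):
    """Build one summary line; source is assumed to be 'fresh' or 'cached'."""
    return (
        f"[{source}] {url}: "
        f"total={stats.get('total', 0)} "
        f"queued={stats.get('queued', 0)} "
        f"robots={stats.get('robots', 0)} "
        f"wrong_domain={stats.get('wrong_domain', 0)} "
        f"max_depth={stats.get('max_depth', 0)}"
    )


def log_link_stats(url, stats, source):
    """Log the same summary format for fresh fetches and cached pages."""
    logger.info(format_link_stats(url, stats, source))


# Both paths call the same helper, differing only in the source label.
example_stats = {"total": 4, "queued": 2, "robots": 1,
                 "wrong_domain": 1, "max_depth": 0}
log_link_stats("https://example.com/", example_stats, "fresh")
log_link_stats("https://example.com/", example_stats, "cached")
```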
    
    This diagnostic logging helps identify why crawlers find fewer
    pages than expected (e.g., robots.txt blocking, domain filtering,
    depth limits).
    
    No changes to crawl logic; this commit only adds diagnostic visibility.