
Visual Web Scraping: Using Screenshots for Data Extraction

March 2026 -- 9 min read

Traditional web scraping extracts data from HTML source code. But modern websites render content with JavaScript, load data dynamically, and use complex layouts that break conventional scrapers. Visual web scraping uses screenshots to capture the rendered output -- what users actually see -- opening up new possibilities for data extraction and monitoring.

When Visual Scraping Beats Traditional Scraping

Use Case 1: Visual Change Detection

Monitor websites for visual changes by comparing screenshots over time. This catches layout breaks, content changes, and defacements that DOM-based monitoring might miss.

Node.js -- Visual change detection system
const fs = require('fs');
const crypto = require('crypto');

const API_BASE = 'https://screenshotapi-api-production.up.railway.app';
const API_KEY = process.env.SCREENSHOT_API_KEY;

async function captureAndCompare(url, label) {
  const params = new URLSearchParams({
    url,
    width: '1280',
    height: '800',
    format: 'png',
    wait: '2000',
    // Hide dynamic elements that change every load
    css: '.timestamp, .ad-slot, .cookie-banner { display: none !important; }',
  });

  const response = await fetch(
    `${API_BASE}/v1/screenshot?${params}`,
    { headers: { 'Authorization': `Bearer ${API_KEY}` } }
  );
  if (!response.ok) {
    throw new Error(`Screenshot failed (${response.status}) for ${url}`);
  }

  const buffer = Buffer.from(await response.arrayBuffer());
  // Byte-level hash: flags any change, including re-encoding differences
  const hash = crypto.createHash('md5').update(buffer).digest('hex');

  const previousHash = getPreviousHash(label); // from your database
  const changed = previousHash && previousHash !== hash;

  if (changed) {
    console.log(`CHANGE DETECTED: ${label}`);
    // Save both old and new screenshots for comparison
    fs.writeFileSync(`screenshots/${label}_${Date.now()}.png`, buffer);
    // Send alert (email, Slack, webhook)
    await sendAlert(label, url);
  }

  saveHash(label, hash); // store in database
  return { changed, hash };
}

// Monitor multiple pages
const pages = [
  { url: 'https://competitor.com/pricing', label: 'competitor-pricing' },
  { url: 'https://yoursite.com', label: 'homepage' },
  { url: 'https://yoursite.com/checkout', label: 'checkout-flow' },
];

// Top-level await is not available in CommonJS, so wrap the loop
(async () => {
  for (const page of pages) {
    await captureAndCompare(page.url, page.label);
  }
})();

Use Case 2: Competitor Price Monitoring

Capture competitor pricing pages, using CSS injection and selector-based waiting to make sure the content you care about is loaded, visible, and easy to locate before the screenshot is taken.

Python -- Capture pricing with CSS injection and selector waiting
import time

import requests

API_BASE = 'https://screenshotapi-api-production.up.railway.app'
API_KEY = 'YOUR_API_KEY'

def capture_pricing_page(url):
    """Capture a pricing page with dynamic content fully loaded"""
    params = {
        'url': url,
        'width': 1280,
        'height': 2000,      # Tall viewport to capture all plans
        'format': 'png',
        'fullpage': 'true',
        'wait': 3000,         # Wait for animations and lazy-loaded content
        'wait_for_selector': '.pricing-table, .price, [data-price]',
        'css': '''
            .cookie-banner, .popup, .chat-widget { display: none !important; }
            .pricing-table { border: 3px solid red; }
        ''',
    }

    response = requests.get(
        f'{API_BASE}/v1/screenshot',
        params=params,
        headers={'Authorization': f'Bearer {API_KEY}'}
    )

    response.raise_for_status()

    filename = f'pricing_{url.split("//")[1].split("/")[0]}_{int(time.time())}.png'
    with open(filename, 'wb') as f:
        f.write(response.content)

    print(f'Captured {filename} ({len(response.content)} bytes)')
    return filename

# Monitor competitor pricing
competitors = [
    'https://competitor1.com/pricing',
    'https://competitor2.com/plans',
]

for url in competitors:
    capture_pricing_page(url)

Use Case 3: Visual Regression Testing

Before deploying code changes, capture screenshots of key pages and compare with the previous version. This catches CSS bugs, missing elements, and layout shifts that unit tests cannot detect.

CI/CD integration -- Compare before and after
#!/bin/bash
# Run in your CI/CD pipeline after deploying to staging

API_BASE="https://screenshotapi-api-production.up.railway.app"
API_KEY="$SCREENSHOT_API_KEY"
STAGING_URL="https://staging.yoursite.com"

# Pages to test
PAGES=("/" "/pricing" "/docs" "/blog" "/dashboard")
VIEWPORTS=("1280x800" "375x812")  # Desktop and mobile
mkdir -p screenshots

for page in "${PAGES[@]}"; do
  for vp in "${VIEWPORTS[@]}"; do
    width="${vp%x*}"   # e.g. 1280 from 1280x800
    height="${vp#*x}"
    name=$(echo "${page}" | tr '/' '_')

    curl -s "${API_BASE}/v1/screenshot?\
url=${STAGING_URL}${page}&\
width=${width}&height=${height}&\
format=png&wait=2000" \
      -H "Authorization: Bearer ${API_KEY}" \
      -o "screenshots/${name}_${vp}.png"

    echo "Captured ${page} at ${vp}"
  done
done

echo "All screenshots captured. Compare with baseline in screenshots/baseline/"

Use Case 4: Archive and Compliance

Some industries require proof of what a web page displayed at a specific time. Timestamped screenshots, stored alongside metadata such as the URL, viewport, and capture time, give you a durable visual record for audits and disputes.

Advanced: CSS and JS Injection for Better Captures

The CSS and JS injection features let you customize pages before capture, which is essential for visual scraping:

Examples of CSS/JS injection
// Hide dynamic/noisy elements
css: ".ad, .chat-widget, .cookie-banner, .timestamp { display: none !important; }"

// Expand collapsed sections
js: "document.querySelectorAll('details').forEach(d => d.open = true)"

// Scroll to load lazy content
js: "window.scrollTo(0, document.body.scrollHeight)"

// Click "Show More" buttons
js: "document.querySelectorAll('.show-more').forEach(b => b.click())"

// Wait for a specific element to appear
wait_for_selector: "#price-table, .loaded-content, [data-ready='true']"
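Combining several of these in one request is just a matter of sending the parameters together. A small Python helper that assembles the same css, js, and wait_for_selector parameters used throughout this post:

```python
def build_capture_params(url, hide_selectors=None, inject_js=None, wait_selector=None):
    """Assemble screenshot query parameters with optional CSS/JS injection."""
    params = {'url': url, 'format': 'png', 'wait': 2000}
    if hide_selectors:
        # Collapse noisy elements before capture
        params['css'] = ', '.join(hide_selectors) + ' { display: none !important; }'
    if inject_js:
        params['js'] = inject_js
    if wait_selector:
        params['wait_for_selector'] = wait_selector
    return params

# Example: expand <details> sections and hide the usual noise
params = build_capture_params(
    'https://example.com/docs',
    hide_selectors=['.cookie-banner', '.chat-widget'],
    inject_js="document.querySelectorAll('details').forEach(d => d.open = true)",
    wait_selector='.loaded-content',
)
```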

Best Practices

  1. Respect robots.txt: Visual scraping is still scraping. Check the site's policies.
  2. Rate limit yourself: Do not hammer sites with requests. Space them out.
  3. Use appropriate wait times: Dynamic sites need 2-3 seconds to fully render. Use wait and wait_for_selector.
  4. Hide dynamic elements: Use CSS injection to hide timestamps, ads, and other content that changes on every load.
  5. Store metadata: Save the URL, timestamp, and viewport size alongside each screenshot for context.
  6. Use thumbnails for storage: Full-resolution screenshots add up. Use output_width to generate smaller versions for archival.
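Point 6 in practice: a helper that builds the query for an archival-sized capture using the output_width parameter mentioned above (the 400px default here is an arbitrary choice):

```python
def thumbnail_params(url, width=400):
    """Query parameters for an archival-sized capture."""
    return {
        'url': url,
        'format': 'png',
        'output_width': width,  # the API scales the capture down to this width
    }

# Usage follows the same pattern as the pricing example above:
# requests.get(f'{API_BASE}/v1/screenshot', params=thumbnail_params(url), headers=...)
```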

Conclusion

Visual web scraping with screenshots complements traditional scraping by capturing the rendered output of modern web applications. With CSS/JS injection and selector-based waiting, you can customize captures for precise data extraction. Whether you are monitoring competitors, running visual regression tests, or archiving content for compliance, a screenshot API provides a reliable foundation.

Start Visual Scraping

100 free screenshots/month. CSS/JS injection included on all plans.

Get Free API Key
