I Built a Tool to Migrate 500+ Images to WebP in One Hour

I was staring at my Lighthouse report like it owed me money.

Performance: 62.

The culprit? Images. Hundreds of them scattered across markdown files, hosted on Flickr, Imgur, GitHub… all in glorious, unoptimized JPEG and PNG formats.

The manual fix would be:

Download each image
Convert to WebP
Upload to my CDN
Find and replace every URL in every markdown file

For 500 unique images? That’s not a weekend project. That’s a prison sentence.

So I did what any lazy engineer would do: I automated it.

The Problem

My blog uses Hugo with markdown files. Images are referenced everywhere:

# In frontmatter
image: "https://live.staticflickr.com/65535/54519397357_403fc67f4a_k.jpg"

# In gallery shortcodes
{{< gallery id="example_id" >}}
- https://live.staticflickr.com/65535/54525108024_adbff3cc9b_k.jpg
- https://live.staticflickr.com/65535/54520449879_784f0f24ca_k.jpg
{{< /gallery >}}

# Standard markdown
![My photo](https://i.imgur.com/sXyG3GX.jpeg)

Each image had to be:

Downloaded from the original source
Converted to WebP (smaller, faster)
Uploaded to my new CDN
URL replaced in the markdown file

Multiply by 500. No thanks.

The Solution: An ETL Pipeline

I built bulk-webp-url-replacer—a Python tool that does exactly what it says:

python -m bulk_webp_url_replacer \
  --scan-dir ./content \
  --download-dir ./downloads \
  --output-dir ./webp_images \
  --new-url-prefix "https://cdn.example.com/images" \
  --threads 8

What it does:

Extract — Scans all .md files for image URLs (frontmatter, galleries, inline)
Transform — Downloads each image and converts to WebP
Load — Replaces all old URLs with new CDN paths

One command. 500 images. Done.
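
To make the Load step concrete, here is a minimal sketch of how an old URL could be mapped to its new CDN path. The function name new_url_for is hypothetical, and keeping the original filename stem is an assumption based on the example output at the end of this post; the real tool may handle name collisions differently.

```python
import posixpath
from urllib.parse import urlparse


def new_url_for(old_url: str, new_url_prefix: str) -> str:
    """Map an original image URL to its WebP location under the new prefix."""
    # Take the last path segment, e.g. "54519397357_403fc67f4a_k.jpg".
    # urlparse keeps any query string out of the path.
    filename = posixpath.basename(urlparse(old_url).path)
    # Swap the extension for .webp, keeping the original stem.
    stem, _ext = posixpath.splitext(filename)
    return f"{new_url_prefix.rstrip('/')}/{stem}.webp"


print(new_url_for(
    "https://live.staticflickr.com/65535/54519397357_403fc67f4a_k.jpg",
    "https://cdn.example.com/images",
))
# https://cdn.example.com/images/54519397357_403fc67f4a_k.webp
```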

The Technical Bits

Regex Patterns for URL Extraction

Markdown has multiple ways to embed images. My extractor handles them all:

import re

PATTERNS = [
    # YAML frontmatter: image: "https://..."
    re.compile(r'^image:\s*["\']?(https?://[^"\'>\s]+)["\']?\s*$'),
    # TOML frontmatter: image = "https://..."
    re.compile(r'^image\s*=\s*["\']?(https?://[^"\'>\s]+)["\']?\s*$'),
    # Gallery shortcodes: - https://...
    re.compile(r'^\s*-\s+(https?://[^\s]+\.(jpg|jpeg|png|gif|webp))\s*$'),
    # Standard markdown: ![alt](https://...)
    re.compile(r'!\[[^\]]*\]\((https?://[^)]+)\)'),
]
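
Here is a rough sketch of how the patterns above can be applied. It assumes the file is scanned one line at a time (which is what the ^ and $ anchors suggest) and that duplicates are dropped; extract_urls is a hypothetical name, not necessarily what the tool calls it.

```python
import re

# The same four patterns as above, repeated so the sketch runs on its own.
PATTERNS = [
    re.compile(r'^image:\s*["\']?(https?://[^"\'>\s]+)["\']?\s*$'),
    re.compile(r'^image\s*=\s*["\']?(https?://[^"\'>\s]+)["\']?\s*$'),
    re.compile(r'^\s*-\s+(https?://[^\s]+\.(jpg|jpeg|png|gif|webp))\s*$'),
    re.compile(r'!\[[^\]]*\]\((https?://[^)]+)\)'),
]


def extract_urls(text: str) -> list[str]:
    """Return the unique image URLs in text, in first-seen order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for line in text.splitlines():
        for pattern in PATTERNS:
            for match in pattern.finditer(line):
                seen.setdefault(match.group(1), None)
    return list(seen)


sample = '''image: "https://live.staticflickr.com/65535/54519397357_403fc67f4a_k.jpg"
- https://live.staticflickr.com/65535/54525108024_adbff3cc9b_k.jpg
![My photo](https://i.imgur.com/sXyG3GX.jpeg)
![Same photo again](https://i.imgur.com/sXyG3GX.jpeg)
'''

urls = extract_urls(sample)
for url in urls:
    print(url)
```

The repeated Imgur link is reported once, which is why the number of references in a site can be larger than the number of unique images to download.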

Parallel Downloads

Downloading 500 images sequentially? Slow. With ThreadPoolExecutor:

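from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {executor.submit(process_url, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]      # map the finished future back to its URL
        result = future.result()   # re-raises any exception from process_url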

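8 threads = up to 8x faster, since the downloads are I/O-bound and spend most of their time waiting on the network. Simple math, at least until a host starts throttling you.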
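
One thing worth sketching here is failure handling: with hundreds of downloads, a single bad URL should not kill the whole run. The version below is an illustration under assumptions, not the tool's actual code; run_all and fake_process are hypothetical names, and fake_process stands in for the real download-and-convert step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(urls, process_url, max_workers=8):
    """Run process_url over urls in parallel; return (results, errors) dicts."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_url, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; report the failure later
                errors[url] = exc
    return results, errors


# Hypothetical stand-in for the real download-and-convert step.
def fake_process(url):
    if "broken" in url:
        raise ValueError("simulated download failure")
    return url.rsplit("/", 1)[-1]


results, errors = run_all(
    ["https://example.com/a.jpg", "https://example.com/broken.png"],
    fake_process,
)
print(results)       # {'https://example.com/a.jpg': 'a.jpg'}
print(list(errors))  # ['https://example.com/broken.png']
```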

Rate Limiting & Retries

Imgur wasn’t happy with my enthusiasm. HTTP 429 errors everywhere.

The fix: exponential backoff with browser-like headers.

import time

import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
    'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
}

for attempt in range(max_retries):
    response = requests.get(url, headers=HEADERS, timeout=30)
    if response.status_code == 429:
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s...
        continue
    response.raise_for_status()  # surface any other HTTP error
    break  # success: stop retrying
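
If you want to check the backoff logic without hammering a real server, one option is to pass the fetch and sleep functions in as parameters. This is a sketch under that assumption; fetch_with_backoff, FakeResponse, and the default of 5 retries are all hypothetical and not taken from the tool.

```python
import time


def fetch_with_backoff(fetch, url, max_retries=5, sleep=time.sleep):
    """Call fetch(url) until it returns a non-429 status or retries run out.

    fetch is any callable returning an object with a .status_code attribute,
    for example: lambda u: requests.get(u, headers=HEADERS, timeout=30)
    """
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code != 429:
            return response
        sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still rate limited after {max_retries} attempts: {url}")


# Simulated server: rate limits twice, then succeeds.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


statuses = iter([429, 429, 200])
delays = []  # records the requested sleeps instead of actually sleeping
response = fetch_with_backoff(
    lambda u: FakeResponse(next(statuses)),
    "https://i.imgur.com/sXyG3GX.jpeg",
    sleep=delays.append,
)
print(response.status_code, delays)  # 200 [1, 2]
```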

Smart Skipping

The tool saves a mapping.json after each run:

{
  "https://old-url.com/image.jpg": "new-filename.webp"
}

Next run? It skips already-processed images. Incremental migrations FTW.
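
The skipping logic can be as small as the sketch below. It assumes mapping.json has the shape shown above (old URL as key, new filename as value); the helper names load_mapping, pending_urls, and save_mapping are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_mapping(path: Path) -> dict:
    """Return the saved URL -> filename mapping, or {} on the first run."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def pending_urls(urls, mapping):
    """Keep only the URLs that have no entry in the mapping yet."""
    return [url for url in urls if url not in mapping]


def save_mapping(path: Path, mapping: dict) -> None:
    """Write the mapping back to disk as pretty-printed JSON."""
    path.write_text(json.dumps(mapping, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    mapping_path = Path(tmp) / "mapping.json"
    mapping = load_mapping(mapping_path)  # {} because nothing was saved yet
    mapping["https://old-url.com/image.jpg"] = "new-filename.webp"
    save_mapping(mapping_path, mapping)

    # Second "run": only the URL that is not in the mapping is left to do.
    todo = pending_urls(
        ["https://old-url.com/image.jpg", "https://old-url.com/other.png"],
        load_mapping(mapping_path),
    )

print(todo)  # ['https://old-url.com/other.png']
```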

The Results

Before:

612 image references across 72 markdown files
Images scattered across Flickr, Imgur, GitHub
Lighthouse begging for mercy

After:

All images converted to WebP
Hosted on a single CDN
URLs automatically updated
One hour of work (mostly watching the progress bar)

Performance improvement:

Average image size: 60-80% smaller
Lighthouse Performance: 62 → 89

Lessons Learned

Automation scales. What would take days manually took an hour to build and minutes to run.
Rate limiting is real. Always add retries and backoff. Sites like Imgur will throttle you.
Dry-run first. The --dry-run flag saved me from accidentally breaking 72 files.
WebP is worth it. Same quality, fraction of the size. There’s no reason to serve JPEGs in 2026.
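
For the dry-run lesson, the core idea fits in a few lines: compute the replacements either way, but only write the file when dry_run is off. This is a sketch of the idea rather than the tool's implementation; replace_urls is a hypothetical name, and plain string replacement assumes no old URL is a prefix of another.

```python
import tempfile
from pathlib import Path


def replace_urls(path: Path, url_map: dict, dry_run: bool = True) -> int:
    """Swap old URLs for new ones in one file; return the number of replacements."""
    text = path.read_text(encoding="utf-8")
    count = 0
    for old, new in url_map.items():
        count += text.count(old)
        text = text.replace(old, new)
    if count and not dry_run:
        path.write_text(text, encoding="utf-8")  # only touch the file for real runs
    return count


with tempfile.TemporaryDirectory() as tmp:
    post = Path(tmp) / "post.md"
    original = "![My photo](https://i.imgur.com/sXyG3GX.jpeg)\n"
    post.write_text(original, encoding="utf-8")
    url_map = {
        "https://i.imgur.com/sXyG3GX.jpeg": "https://cdn.example.com/images/sXyG3GX.webp",
    }

    preview = replace_urls(post, url_map, dry_run=True)
    unchanged = post.read_text(encoding="utf-8") == original
    applied = replace_urls(post, url_map, dry_run=False)
    updated = post.read_text(encoding="utf-8")

print(preview, unchanged, applied)  # 1 True 1
print(updated)
```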

Try It Yourself

The tool is open source on GitHub.

# Preview what would change
bulk-webp-url-replacer \
  --scan-dir ./content \
  --download-dir ./downloads \
  --output-dir ./webp \
  --dry-run

# Run for real
bulk-webp-url-replacer \
  --scan-dir ./content \
  --download-dir ./downloads \
  --output-dir ./webp \
  --new-url-prefix "https://your-cdn.com/images" \
  --threads 8

Your Lighthouse score will thank you. 🚀

Example Output

After running the migration tool, the URLs are automatically updated to point to the optimized WebP versions:

# In frontmatter
image: "https://raw.githubusercontent.com/HoangGeek/store/refs/heads/main/webp/54519397357_403fc67f4a_k.webp"

# In gallery shortcodes
{{< gallery id="example_id" >}}
- https://raw.githubusercontent.com/HoangGeek/store/refs/heads/main/webp/54525108024_adbff3cc9b_k.webp
- https://raw.githubusercontent.com/HoangGeek/store/refs/heads/main/webp/54520449879_784f0f24ca_k.webp
{{< /gallery >}}

# Standard markdown
![My photo](https://raw.githubusercontent.com/HoangGeek/store/refs/heads/main/webp/sXyG3GX.webp)