Stop Bloating Your Database: The S3 Pointer Pattern
Why 10MB JSON blobs kill your buffer pool, and how to keep MySQL lean with S3 + presigned URLs.
The day I almost bricked production.
It started with a feature request that sounded innocent enough:
“We need to show the raw transaction logs in the admin panel. Just dump the JSON into the DB so we can list them.”
I was young. I was naive. I thought, “Sure, MySQL has a JSON column type. What could go wrong?”
The logs were 10MB.
The table grew.
And then, one random Tuesday, the database choked.
CPU spiked to 99%. Queries that usually took 10ms were taking 10 seconds. My phone started vibrating with PagerDuty alerts, and I was sweating through my shirt.
Here’s what I learned the hard way using massive JSON blobs in a relational database is a surefire way to kill your performance.
The Trap: “It’s Just Data, Put It In The DB”
It feels logical. You have a User. Users have Logs. Put the Logs in the database.
But when those logs are massive blobs (10MB+), you aren’t just “storing data.” You are fundamentally breaking how a database works.
Why Your DB Hates You Now
Databases like MySQL live and die by their Buffer Pool (RAM). You want your RAM filled with useful stuff:
- Index pages (so lookups are instant).
- Hot rows (data people actually access).
When you stuff a 10MB blob into a row, that row becomes a Buffer Pool Bully.
It walks into the club, kicks out all the lean, efficient index pages, and sits there taking up space. Even if you just want to read one field, the DB often has to drag that entire fat payload from disk into RAM.
Your cache hit ratio tanks. Your disk IOPS scream. And your users leave.
I built a simulation so you can see exactly what happens inside your RAM:
BLOB
See what happened? That red bully just nuked your performance cache.
The “Solutions” That Don’t Work
I tried these. Do not try these.
1. “Let’s Use Athena / Data Lake”
The logic: “S3 is cheap, let’s dump it there and query it!” The reality: Athena is designed for analytics, not for your user clicking “View Details”. It takes seconds. Users don’t wait seconds. They rage-click.
2. “The Hybrid Approach”
The logic: “Store small JSON in DB, move big ones to S3!” The reality: Now you have two code paths. Two sources of truth. And an onboarding doc that takes 30 minutes to explain “where the data is.” Don’t be cute. Be boring.
The Smart Fix: The S3 Pointer Pattern
Here is the “Secret Sauce” (it’s actually really standard, but feels like magic).
The DB stores: A pointer (s3_key).
The S3 bucket stores: The fat blob.
That’s it. Your database row is now tiny. It fits in the index. It stays in RAM.
The Flow
1. User asks Backend for data.
2. Backend checks DB: "Oh, file is at logs/123/cat.json"
3. Backend tells S3: "Give me a temporary URL for this file."
4. Backend gives URL to User.
5. User downloads from AWS directly.
Your backend server? Chill. Your database? Chill. Amazon’s massive servers? They carry the heavy load.
“Wait, isn’t direct S3 access dangerous?”
I can hear you screaming: “You want me to open my S3 bucket to the internet? Are you crazy?!”
Calm down. I’m not saying “Public Read”.
I’m saying Presigned URLs.
These are magic tokens your backend generates. They look like this: https://s3.aws.../cat.json?signature=123xyz&expires=300.
- They are valid for 5 minutes (or whatever you set).
- They only unlock one specific file.
- They are cryptographically signed by your backend.
Why you shouldn’t “proxy” the file
You might think: “I’ll just download the file to my backend and pass it to the user. That feels safer.”
This is the “Proxy Trap”.
| Strategy | Backend Impact | Result |
|---|---|---|
| Direct S3 | Zero. Backend signs a tiny string. CPU sleeps. | 🚀 Fast. Scalable. |
| Proxy | Massive. Backend holds 10MB in RAM for every request. | 🐌 Slow. OOM errors. |
Unless you are watermarking a PDF on the fly or working around a strict corporate firewall, don’t be a proxy. Let S3 do its job.
”Show me the Code”
Okay, here is how stupidly simple this is.
1. The Schema
Your DB row loses weight immediately.
CREATE TABLE logs (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
s3_key VARCHAR(255) NOT NULL, -- The only thing that matters
details JSON, -- Store SMALL metadata here if you need to query it
created_at TIMESTAMP,
PRIMARY KEY (id)
);
2. The Read Path (Python/Boto3)
def get_log_download_url(log_id, user):
# 1. Check if user owns this log (CRITICAL STEP)
log = db.fetch_log(log_id)
if log.owner_id != user.id:
raise Forbidden("Nice try, hacker.")
# 2. Ask AWS for a temporary pass
url = s3.generate_presigned_url(
'get_object',
Params={'Bucket': 'my-bucket', 'Key': log.s3_key},
ExpiresIn=300 # 5 minutes
)
# 3. Give it to the frontend
return {"url": url}
Check the Receipt (Cost)
If you are still unconvinced, let’s talk money.
Storing 1TB of junk in RDS (SSD storage) costs about $115/month. Storing 1TB of junk in S3 Standard IA costs about $12.50/month.
You are paying 10x more to make your database slower. Make it make sense.
TL;DR
- Don’t put 10MB blobs in MySQL. It bullies your Buffer Pool.
- Do store files in S3 and keep a pointer (
s3_key) in the DB. - Do use Presigned URLs so your backend doesn’t have to carry the payload.
- Don’t overcomplicate it.
Simple scales. Clever fails.
Related posts
-
Yellorn: The Browser Tab I Wanted for Dirty Data and Webhook Debugging
A human tour of Yellorn: repair broken JSON, XML, YAML, and CSV, publish mock webhook APIs, send HTTP requests, and keep integration debugging in one focused browser workspace.
-
Production Google OAuth In One Prompt: I Let Cursor Drive Chrome DevTools MCP
I asked an AI agent to wire production-grade Google OAuth into yellorn.com. One prompt later, it had clicked through Google Cloud Console, configured the consent screen, registered the OAuth client, stored the secrets in Cloudflare, and shipped to prod. I just sipped coffee.
-
MoneyPrinterV2: What 18,000 Stars Worth of Automated Content Actually Looks Like
An assembly line for AI content — local LLMs write the script, KittenTTS reads it, Gemini paints the pictures. The video uploads itself.
-
Project N.O.M.A.D.: The Knowledge Bunker You Build for a Rainless Day
When the cloud evaporates, what stays on your disk matters.