Blob Storage Patterns in MySQL

notes

This note covers the three main approaches to storing binary data alongside MySQL — inline BLOBs, external file references, and hybrid patterns — with specific guidance on when each one makes sense and where they fall apart in practice.

The Three Approaches

When you need to store files (images, PDFs, serialized objects, small backups) and your application already uses MySQL, you have three basic options:

  1. Store the bytes directly in a BLOB column. MySQL supports TINYBLOB (255 bytes), BLOB (64KB), MEDIUMBLOB (16MB), and LONGBLOB (4GB). The data lives in the table, managed by InnoDB’s buffer pool just like any other row data.

  2. Store files on the filesystem (or object storage) and keep a path or URL in a VARCHAR column. The database becomes an index; the actual bytes live elsewhere.

  3. Hybrid: small objects go into the database, large objects go to disk, with a size threshold that you tune based on your workload.

Most teams pick option 2 reflexively, and honestly, for most workloads that is the right call. But the answer is less obvious than it first appears.

When Inline BLOBs Actually Work

Inline BLOB storage makes sense when:

  • Objects are consistently small (under 1MB, ideally under 256KB)
  • You need transactional consistency between metadata and the binary data
  • The total dataset fits comfortably in your available storage
  • You want backup/restore to be a single mysqldump operation

The InnoDB storage engine stores BLOB data on overflow pages when the row exceeds the page size. For the default 16KB innodb_page_size, InnoDB keeps the first 768 bytes of a BLOB inline with the row and pushes the rest to dedicated overflow pages. This means small BLOBs have decent read performance — they’re already on the same page as the row.

CREATE TABLE documents (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  content_type VARCHAR(100) NOT NULL,
  data MEDIUMBLOB NOT NULL,
  size_bytes INT UNSIGNED NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_name (name)
) ENGINE=InnoDB;

Here’s what breaks in practice: once you have a few thousand rows with multi-megabyte BLOBs, your buffer pool fills up with overflow pages. Regular queries on the same table start evicting useful cached pages. SELECT * becomes lethal because every row now drags in megabytes of overflow data. Your DBA will find you.

The Filesystem Approach

For most production systems, the filesystem (or S3/GCS/R2) is the right home for binary objects:

CREATE TABLE documents (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  content_type VARCHAR(100) NOT NULL,
  storage_path VARCHAR(500) NOT NULL,
  size_bytes BIGINT UNSIGNED NOT NULL,
  checksum CHAR(64) NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_name (name)
) ENGINE=InnoDB;

Your database stays lean. Backups are fast. Queries don’t accidentally drag in gigabytes of binary data. The tradeoff is that you lose atomic consistency between the metadata row and the file on disk — you can delete a row without deleting the file, or vice versa. You need cleanup processes. Orphaned files accumulate. It is manageable, but it is work.

The Hybrid Threshold

The interesting engineering decision is where to set the threshold in a hybrid approach. Based on testing across several workloads, the sweet spot is usually between 256KB and 1MB. Below the threshold, store inline. Above it, store externally.

-- Pseudocode for the hybrid insert
IF file.size <= 256KB THEN
  INSERT INTO documents (name, data, storage_path)
  VALUES (?, file.bytes, NULL);
ELSE
  WRITE file.bytes TO /storage/{hash};
  INSERT INTO documents (name, data, storage_path)
  VALUES (?, NULL, '/storage/{hash}');
END IF;

You can read more about InnoDB’s BLOB storage internals and row format details in the MySQL documentation on InnoDB row formats.

What Actually Goes Wrong

The failure modes are predictable:

  • Buffer pool pressure: large inline BLOBs thrash the buffer pool. Monitor Innodb_buffer_pool_reads and Innodb_buffer_pool_read_requests to catch this early.
  • Replication lag: BLOB-heavy writes increase the binary log size. Replicas fall behind during bulk inserts.
  • Backup bloat: mysqldump of a BLOB-heavy table is painfully slow and produces enormous files.
  • Orphaned files: the filesystem approach leaves dead files around unless you run periodic reconciliation.

None of these are showstoppers. They are all operational costs you should plan for rather than discover in production at 2 AM.

Recommendation

Default to external storage with database references. Use inline BLOBs only when you need transactional atomicity between metadata and binary content and the objects are reliably small. Set a size threshold and enforce it at the application layer. Monitor the buffer pool if you go the inline route.

The right answer depends on your specific workload, access patterns, and operational maturity. There is no universal best practice — just tradeoffs you should understand before you pick one.