DevOps & Infrastructure

Redis Single-Node READONLY Error Mystery Solved

A production system ground to a halt, plagued by a cryptic Redis error. The culprit? Not what you'd expect for a single-node Redis setup.


Key Takeaways

  • A single-node Redis setup experienced a `READONLY` error due to stale client connections and transient network/Docker issues.
  • Common causes like Redis Cluster failovers and OOM errors were ruled out.
  • Configuration flaws like `maxmemory:0` and `noeviction` were identified as separate risks.
  • The fix involved securing replication commands by renaming `REPLICAOF` and `SLAVEOF`, and implementing memory limits.
  • Future debugging protocols now mandate collecting diagnostic data before restarting the affected service.

This isn’t about Redis. It’s about how the invisible cracks in our most fundamental digital infrastructure can send shockwaves through real people’s lives. Imagine a collaborative whiteboard, a shared document, a live chat – all vanishing in an instant. That’s the human cost of a technical glitch that blindsides you, leaving you scrambling in the digital dark. Our production application, a bustling hub for real-time collaboration and caching, recently experienced precisely this kind of catastrophic outage.

For months, a phantom menace haunted our system. Every few months, out of the blue, Redis would decide to crash everything. The logs? A deafening chorus of a single, infuriating error: "READONLY You can't write against a read only replica." Writes failed. Reads choked. The entire real-time experience just… stopped. A quick restart of the Docker container would bring it back, a temporary reprieve, but the dread of its inevitable return always lingered.

Here’s the journey into the rabbit hole, the debugging, and the eventual triumph over this elusive Redis gremlin.

The Setup: Simple on the Surface

Before diving headfirst into logs, clarity on the battlefield was paramount. What was this infrastructure? A single Google Cloud Platform VM, modest in its specs (t2d-standard-1 with 4 GB RAM), running Redis tucked away inside a Docker container. No fancy cluster. No Sentinel. Just a lone Redis node. It sounds straightforward, right? That’s what made the READONLY error so baffling. A single node shouldn’t have a concept of being a ‘read-only replica.’

My first instinct was to verify the role Redis claimed to hold. A quick redis-cli INFO replication confirmed the expected: role:master, connected_slaves:0. It was a master, with no servants. This wasn’t a permanent role change, then. Something else entirely was at play.
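
That check is worth scripting so it can run on every deploy. Here is a minimal sketch, assuming an ioredis client in the Node.js backend (the article doesn’t name its client library, so treat the specifics as illustrative):

import Redis from "ioredis";

// Fail fast if this supposedly standalone node ever reports itself as a replica.
async function assertMaster(redis: Redis): Promise<void> {
  // INFO replication returns "key:value" lines separated by CRLF.
  const info = await redis.info("replication");
  const role = info.match(/role:(\w+)/)?.[1];
  if (role !== "master") {
    throw new Error(`Redis reports role:${role}; expected a standalone master`);
  }
}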

Ruling Out the Usual Suspects

In the sprawling landscape of distributed systems, it’s terrifyingly easy to chase ghosts. I meticulously sifted through potential culprits, discarding them one by one:

  • Redis Cluster & Sentinel: My mind immediately leaped to failovers. Had an automated process, perhaps a rogue Sentinel, demoted our primary? But no, we weren’t running Cluster or Sentinel. There was no orchestrator to trigger such a demotion or shift any precious slots.

  • Distributed Lock Failures: Could a Redlock or similar distributed locking mechanism have gone haywire? Possible, but these typically mess with consensus, not a server’s fundamental replication role.

  • The ‘Read’ Misdirection: If Redis had truly become a replica, even an unhealthy one, reads should have still functioned. The fact that both reads and writes died simultaneously was a massive clue. This wasn’t a standard replica scenario.

Memory Matters, But Not Here

Could the server be gasping for air under memory pressure? I checked the memory stats, bracing for an OOM-killer scenario:

used_memory_human:1.60M
used_memory_rss_human:15.85M
total_system_memory_human:3.83G

Our actual dataset? A mere 672 KB. Redis was using a sliver of RAM. No OOM crash here. However, this memory check did reveal a colossal, glaring hole in our configuration: maxmemory:0 and maxmemory_policy:noeviction. This meant that if Redis did ever fill up, it would simply refuse all writes. A ticking time bomb, absolutely, but not the immediate cause of this particular, intermittent READONLY error.
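
The same audit can run from application code. A sketch, again assuming ioredis, that flags the maxmemory/noeviction trap before it detonates:

import Redis from "ioredis";

// Warn if the instance is configured to refuse writes once memory fills up.
async function auditMemoryConfig(redis: Redis): Promise<void> {
  const info = await redis.info("memory");
  console.log(info.match(/used_memory_human:\S+/)?.[0]);

  // CONFIG GET replies with a flat [key, value, ...] array.
  const [, maxmemory] = (await redis.config("GET", "maxmemory")) as string[];
  const [, policy] = (await redis.config("GET", "maxmemory-policy")) as string[];

  if (maxmemory === "0" && policy === "noeviction") {
    console.warn("maxmemory=0 with noeviction: Redis will reject all writes once RAM fills");
  }
}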

The Shadowy Culprits Emerge

With the common bogeymen banished, the evidence began to coalesce around a few highly probable, yet subtle, invaders in a single-node setup:

  • Accidental REPLICAOF Command: Perhaps a transient network blip, a forgotten script, or an automation gone rogue had, for a fleeting moment, sent a REPLICAOF command. This would temporarily reassign the node’s role.

  • Stale Client Connections: Our Node.js backend and Hocuspocus websocket server maintained long-lived TCP connections. If the network flickered or a Docker container hiccuped, those connections could go stale, and the client could keep surfacing READONLY errors to the application long after whatever transient condition produced them had passed.

  • Docker/Network Instability: Temporary network partitions or even disk IO blocks during AOF/RDB saves could potentially force Redis into a peculiar protective mode that the application clients, clinging to their long-lived connections, would misinterpret.

The ephemeral nature of the issue, coupled with the complete shutdown of both reads and writes, screamed a potent cocktail of stale client connections colliding with a transient Docker or network hiccup. Restarting the container? That simply severs those dead connections, forcing a clean handshake and a fresh start.
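
If stale connections are the vector, the client is the right place for a defense. A hedged sketch, assuming ioredis (which exposes a reconnectOnError hook for exactly this kind of reply); adapt the names and addresses to your own stack:

import Redis from "ioredis";

const redis = new Redis({
  host: "127.0.0.1", // assumption: point this at your Docker/VM address
  port: 6379,
  keepAlive: 10_000, // TCP keepalive, so dead peers are noticed sooner
  reconnectOnError(err) {
    // READONLY means this connection believes the server is a replica.
    // Returning 2 tears the socket down, reconnects, and resends the command.
    return err.message.startsWith("READONLY") ? 2 : false;
  },
  retryStrategy(times) {
    // Gentle backoff, capped at 2 seconds, for transient Docker/network blips.
    return Math.min(times * 200, 2000);
  },
});

The key detail is that READONLY gets treated as a connection problem rather than a command problem, which matches the diagnosis above: the data was fine, the sockets were not.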

Fortifying the Digital Fortress

To finally vanquish this demon and ensure lasting stability, a multi-pronged approach was necessary.

First, the memory time bomb was defused by adding proper limits to /etc/redis/redis.conf:

maxmemory 2gb
maxmemory-policy allkeys-lru

To absolutely, unequivocally prevent any accidental role changes in our single-node environment, the replication commands were locked down tight in redis.conf:

rename-command REPLICAOF ""
rename-command SLAVEOF ""

This established a firm decree: This node will NEVER become a replica.
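
To confirm the decree actually took effect, a deploy-time probe can verify the command is gone. A sketch, again assuming ioredis (the exact error wording varies across Redis versions, but it contains "unknown command"):

import Redis from "ioredis";

// Probe whether REPLICAOF was really removed by rename-command.
async function assertReplicationLocked(redis: Redis): Promise<void> {
  let callable = true;
  try {
    // REPLICAOF NO ONE is a no-op on a master, so it is a safe probe.
    await redis.call("REPLICAOF", "NO", "ONE");
  } catch (err) {
    if ((err as Error).message.includes("unknown command")) {
      callable = false; // expected: the server no longer recognizes REPLICAOF
    } else {
      throw err;
    }
  }
  if (callable) {
    throw new Error("REPLICAOF is still enabled; rename-command was not applied");
  }
}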

And perhaps the most critical change? A new protocol for future incidents. Next time it fails, do not restart immediately. Instead, crucial diagnostics must be run before touching anything. This preserves the exact state of failure for deeper, more precise analysis. The commands:

redis-cli INFO replication
redis-cli INFO stats
redis-cli CONFIG GET 'rep*'

This meticulous debugging process is akin to a surgeon preserving a patient’s delicate state before operating. It’s the only way to truly understand the underlying cause when the symptoms are so perplexing.
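
One way to make that protocol hard to skip is to script it. A sketch, assuming ioredis and default connection settings, that snapshots the relevant state to a timestamped file before anyone touches the container:

import Redis from "ioredis";
import { writeFileSync } from "node:fs";

// Capture the failure state first; restart the container afterwards.
async function captureDiagnostics(): Promise<void> {
  const redis = new Redis(); // assumption: 127.0.0.1:6379
  const replication = await redis.info("replication");
  const stats = await redis.info("stats");
  const repConfig = (await redis.config("GET", "rep*")) as string[];

  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  writeFileSync(
    `redis-diagnostics-${stamp}.txt`,
    [replication, stats, repConfig.join("\n")].join("\n\n")
  );
  redis.disconnect();
}

captureDiagnostics().catch(console.error);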

This wasn’t just a Redis bug; it was a powerful illustration of how complex systems, even when seemingly simple, can harbor hidden vulnerabilities. Understanding them requires patience, a systematic approach, and a willingness to question every assumption. These are the foundational mechanics of the digital world, and we’re all learning to master them, one solved mystery at a time.



Frequently Asked Questions

Will this READONLY error affect my Redis setup?

If you’re running a single-node Redis without explicit REPLICAOF or SLAVEOF command restrictions, and experience network instability or client connection issues, you could be susceptible. It’s best practice to secure these commands.

Is this error common?

This specific cause in a single-node setup isn’t extremely common, as most users employ Sentinel or Cluster for HA. However, any scenario involving network flakiness and long-lived connections could lead to similar client-side misinterpretations or temporary role changes.

Should I always rename replication commands?

For single-node Redis instances not intended to be replicas, yes, it’s a strong security and stability measure. If you are intentionally setting up a replica or using Sentinel/Cluster, you would not rename these commands.

Written by
Open Source Beat Editorial Team

Curated insights, explainers, and analysis from the editorial team.



Originally reported by Dev.to
