Too much of SPOF

SPOF: single point of failure. I know the title of this post is grammatically incorrect, but bear with me.

I had an annoying issue pop up today. My blog stopped responding to all requests without any changes from me. At first I thought it might be related to DigitalOcean’s downtime, but that turned out to be unrelated (the downtime was not affecting nodes in the SF2 region). SPOF #1.

Then I realized something might be happening at the Cloudflare level.
I love using Cloudflare for virtually any project outside of work because it puts all the networking-related work in one place. SPOF #2.

Then there are all the plugins I’ve accumulated on my blog over the past 2-3 years, which have added unnecessary maintenance load and cruft. Each one of those is a point of failure. I won’t say plugins are a single point of failure, but they are many little issues waiting to happen, each dependent on its plugin maintainer.

There are just too many of these single points of failure waiting to happen. I felt like I was chasing a digital criminal within one system. This is why devops isn’t my favorite thing, but often you’re faced with doing it anyway.

After about 2 hours of debugging, countless restarts, and a restore from backup, I still wasn’t able to tell what was happening.

What I am able to tell is that no requests were making it all the way to Apache or the WordPress PHP runtime: nothing was being logged there, even though the machine itself was up and I was able to SSH into it. What might have caused all this was some wonky behavior at the Cloudflare level that redirected too many times. This kind of thing crops up early on when setting things up with Cloudflare, but once it gets going, CF works like magic, until it doesn’t. SPOF again.
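
One way to check for a "redirected too many times" problem from the outside is to follow the redirects by hand, one hop at a time, and see whether the chain ever revisits a URL. The following is only a rough sketch using Python's standard library, not what was actually run during this outage. The function names `fetch_location` and `trace_redirects` are made up for illustration, and the `max_hops` limit of 10 is an arbitrary assumption.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_location(url, timeout=5.0):
    """Request `url` without following redirects.

    Returns the absolute redirect target, or None if the response
    is not a redirect. (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD is enough here: only the status and Location header matter.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def trace_redirects(start_url, fetch=fetch_location, max_hops=10):
    """Follow redirects manually and report whether they loop.

    Returns (chain, looped): `chain` is the list of URLs visited in order,
    and `looped` is True if a URL showed up twice.
    """
    chain = [start_url]
    seen = {start_url}
    for _ in range(max_hops):
        nxt = fetch(chain[-1])
        if nxt is None:
            return chain, False  # reached a non-redirect response
        chain.append(nxt)
        if nxt in seen:
            return chain, True  # same URL again: redirect loop
        seen.add(nxt)
    return chain, False  # gave up after max_hops without seeing a repeat
```

Calling `trace_redirects` with the blog's URL returns the list of hops and a flag. If the flag is true, the last URL in the chain is the one that repeated, which at least narrows down which layer keeps bouncing the request. The `fetch` argument is swappable so the loop detection can be tried against a fake mapping of URLs without touching the network.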