Company | October 16, 2019

How to Avoid Spending the Holiday Season in the Server Room


Maybe not the server room—but sitting out Thanksgiving dinner with a laptop on a remote desktop session, browsing database logs, or hopping between database management consoles diagnosing unplanned downtime or database bottlenecks, can really put a dampener on the holidays. Unfortunately, getting “that text” is more common than you think. And in most cases, it is avoidable.


For example, downtime. In a recent survey [1], one in three companies reported that they had experienced preventable outage incidents or severe degradation in just the last twelve months. And here’s the kicker: 60% of them said the issue could have been prevented with better management, processes, or configuration. During the holidays, uptime becomes even more critical because just minutes of an outage can result in hundreds of thousands in lost revenue at peak demand periods like Cyber Monday. What was perhaps forgivable in the summer becomes a Code Red on Black Friday.

While there can be many causes of unplanned downtime, increasingly it isn’t due to hardware, server, storage, or network issues, given today’s reliable public cloud infrastructure and redundant private cloud data centers. And with modern applications supported by cloud vendors often running on fault-tolerant infrastructure, using modern distributed architectures, and backed by dedicated ops staff who seamlessly manage upgrades and maintenance, it’s less and less often due to the applications either. So, what gives?

Database infrastructure is now hitting extreme complexity levels.

That’s a problem for the holiday season.

Increasingly, what’s to blame are the databases that enterprises run as part of their in-house stack. That’s because modern web applications and reengineered business processes (processing web orders, multiple online payment options, omnichannel inventory management and fulfillment, and connecting customer experience touchpoints) often rely on a sprawl of database deployments across hybrid cloud, multi-cloud, and on-premises infrastructure. Take a closer look and you’ll find it’s often a blend of legacy relational databases alongside key-value, columnar, document, and analytical databases, and other models. While the prevalence of APIs has been a boon to innovation and has enabled application developers to tap into every existing database in their stack, it has come at a cost. That cost is compounded by AppDev teams spinning up more, often proprietary, database technology silos on Amazon Web Services, Microsoft Azure, or Google Cloud Platform for point needs and data model one-offs during digital projects. Think of it as a growing database deployment hairball.

All of this adds up to an enterprise database environment that has become more fragmented than ever, and often IT, DBAs, and operations are left holding the bag at the holidays, juggling different vendors, admin consoles, configurations, ways to trace errors, provisioning needs, and scalability characteristics. Getting proficient with database tools and best practices amid so much proliferation and diversity is a hard ask, and all of it places a drag on operational effectiveness. That’s why database complexity is increasingly the issue behind downtime events, and one that’s harder to trace and fix than before.

Most database architectures really don’t make gracefully handling uptime easy either. Whether in a master/slave, multi-master, or a sharded database arrangement, they still often have single points of failure, and in many cases create a big maintenance overhead in the process. If your database stack sounds like this, then your risks of an untimely Christmas RDP or SSH session are perhaps a little higher.

One way to reduce holiday risk is to use a peer-to-peer, masterless database architecture that has no single point of failure. For example, eBay connects over 170 million buyers and sellers across over a billion listings. By running a peer-to-peer, fully distributed masterless database architecture that’s deployed across all their data centers globally, eBay can meet uptime goals while at the same time handling data velocity demands of more than 5 billion database transactions a day. It’s a way to handle both uptime and performance, without adding overhead.
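The difference between a single write leader and a masterless design can be sketched in a few lines. This is a toy illustration, not eBay’s implementation: it assumes a hypothetical three-node cluster, a leader-based setup with no automatic failover, and a masterless setup that accepts a write whenever a majority (quorum) of replicas is reachable. All function and node names are made up for the example.

```python
from itertools import combinations


def quorum(replica_count):
    """Smallest majority of replica_count replicas."""
    return replica_count // 2 + 1


def single_leader_can_write(up_nodes, leader):
    # Assumption: no automatic failover, so writes need the leader itself.
    return leader in up_nodes


def masterless_can_write(up_nodes, replicas):
    # Assumption: any node can coordinate; a write needs a quorum of replicas.
    return len(up_nodes & replicas) >= quorum(len(replicas))


def write_survival_rate(nodes, failures, can_write):
    """Fraction of all `failures`-node outage scenarios in which writes still succeed."""
    scenarios = list(combinations(sorted(nodes), failures))
    survived = sum(1 for down in scenarios if can_write(nodes - set(down)))
    return survived / len(scenarios)


nodes = {"node-a", "node-b", "node-c"}  # hypothetical cluster

leader_rate = write_survival_rate(
    nodes, 1, lambda up: single_leader_can_write(up, leader="node-a"))
masterless_rate = write_survival_rate(
    nodes, 1, lambda up: masterless_can_write(up, replicas=nodes))

print(f"single leader: {leader_rate:.0%} of one-node outages still accept writes")
print(f"masterless:    {masterless_rate:.0%} of one-node outages still accept writes")
```

In this toy model, losing any one node never blocks writes in the masterless case, while the leader-based case fails whenever the leader is the node that goes down. Real deployments add failover, tunable consistency, and multi-data-center replication, none of which are modeled here.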

Performance issues can turn into critical IT escalations way faster during the holidays.

With peak loads sometimes hitting 5x more than average, everything from ecommerce customer sign-ins to in-store and web inventory lookups to order tracking can start to slow database performance. That’s a recipe for an IT escalation, especially when just a one-second delay in responsiveness can result in a 7% reduction in sales [2]. Even though cloud infrastructure is far more elastic than on-premises infrastructure, the database layer that runs on it is often less so, because it’s just too difficult to provision new nodes quickly to meet demand.

That all creates an ops issue: IT and DBAs often must choose between overprovisioning database infrastructure to safely meet anticipated demand, ending up with more capacity, cost, and complexity than they need, or facing a performance issue when tremendous and sometimes unexpected bursts in activity bring databases to their knees.

Walmart uses a different approach to ensure lights-out responsiveness during the holidays, and every day. By using a distributed database architecture that runs across their data centers and clouds, they can dynamically scale database nodes across all their infrastructure on the fly, and then shrink it down automatically when demand ticks down to a normal level. It’s all responsive to volume, no matter the spike in transactions and queries, while avoiding overprovisioning and the issues that come with it.
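The trade-off between provisioning for the peak and scaling with demand comes down to simple arithmetic. The sketch below is not Walmart’s method. It assumes a hypothetical day with 20 hours of average load and 4 hours at 5x that load, a made-up per-node throughput, 20% headroom, and scaling that reacts instantly at hour boundaries.

```python
def nodes_needed(load_ops, per_node_ops, headroom_pct=20):
    """Nodes required to serve load_ops per second with headroom_pct spare capacity."""
    required = load_ops * (100 + headroom_pct)
    capacity = per_node_ops * 100
    return max(1, -(-required // capacity))  # integer ceiling division


def node_hours(hourly_load, per_node_ops, elastic):
    """Total node-hours for the period, sized per hour (elastic) or for the peak (fixed)."""
    per_hour = [nodes_needed(load, per_node_ops) for load in hourly_load]
    if elastic:
        return sum(per_hour)
    return max(per_hour) * len(per_hour)


# Hypothetical figures: 10,000 ops/s on average, 5x that for a four-hour spike.
hourly_load = [10_000] * 20 + [50_000] * 4
PER_NODE_OPS = 5_000  # assumed throughput of one database node

fixed = node_hours(hourly_load, PER_NODE_OPS, elastic=False)
scaled = node_hours(hourly_load, PER_NODE_OPS, elastic=True)

print(f"provisioned for peak: {fixed} node-hours")
print(f"scaled with demand:   {scaled} node-hours")
```

Under these assumptions, the fixed cluster runs 12 nodes all day while the elastic one runs 3 nodes outside the spike. The gap narrows if new nodes take a long time to join and rebalance, which this sketch ignores.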

Evaluating your risk level is easier than you think.

By working with enterprises like Capital One, Bank of America, McDonald’s, and Macy’s, we’ve built a data platform to handle modern business applications along with some serious experience in how to architect infrastructure for the holidays, and we’re here to help. Because understanding your exposure, and identifying where the risks are, is the first step to architecting your stack to meet the holidays with more confidence—and to avoid playing blame-game pingpong on Super Saturday.

Take the first step with our quiz, “Will I Spend the Holidays in the Server Room?”, and you’ll not only learn your risks, but some of the other common issues that can lead to “that text” at the worst holiday moment.

References

[1] Uptime Institute 2019
[2] Kissmetrics
