There's a lot of talk about the scalability of applications and frameworks. A whole subculture has formed around distributed databases like CouchDB and HBase. Great, but the truth is, most people don't need scalability (as Ted Dziuba harshly points out in his "I'm Going To Scale My Foot Up Your Ass" article). I've seen only a handful of projects that grew fast enough to have to worry about scale: LiveJournal, Digg and Twitter all had growing pains. In most cases they fixed the problems with better programming and only a sprinkling of neat tools.
In the sites I've had to support, the problem has not been scaling; it's redundancy. The web has placed tremendous pressure on even small sites for 24x7 operation. As I've said before, datacenters are not reliable. No matter what Tier the provider claims to be, you're never immune from outages at the datacenter level. Geographical diversity is important: you don't want your server located in the next World Trade Center, or in Houston or New Orleans during hurricane season. My current provider has a datacenter "conveniently" near an airport, as in practically under a runway approach path, and it is only one of several datacenters in that business park.
I want geographic diversity, and my customers want active/active or zero failover time. I want the "distributed" part of CouchDB without having to teach programmers new tricks. SQL is not good in this area. You can have replication, hot spares and what have you, but it never works fast, or cheap, or perfectly. Most SQL databases are designed to be consistent, and true consistency is hard to do over high- or variable-latency connections. Some apps can function in an "eventual consistency" model, and many more could if programmers gave it some thought during the design.
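As a rough illustration of what "eventual consistency" means in practice, here is a minimal sketch of two sites that each accept writes locally and reconcile later. Everything in it is an assumption made for illustration: the `Replica` class, the "east" and "west" site names, the key, and the last-write-wins rule are all hypothetical, and this is not CouchDB's or any real database's replication protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One site's copy of the data: key -> (timestamp, site, value)."""
    site: str
    rows: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Accept the write locally; no round trip to the other datacenter.
        self._apply(key, (timestamp, self.site, value))

    def read(self, key):
        entry = self.rows.get(key)
        return entry[2] if entry else None

    def _apply(self, key, entry):
        # Last write wins (an assumed policy). The site name breaks
        # timestamp ties so every replica picks the same winner.
        current = self.rows.get(key)
        if current is None or entry[:2] > current[:2]:
            self.rows[key] = entry

    def sync_from(self, other):
        # Pull the other site's rows and keep the newer version of each.
        for key, entry in other.rows.items():
            self._apply(key, entry)


east = Replica("east")
west = Replica("west")

# Each site takes a write for the same key while the link is slow or down.
east.write("user:42:email", "old@example.com", timestamp=1)
west.write("user:42:email", "new@example.com", timestamp=2)

# Before replication the two sites disagree.
before = (east.read("user:42:email"), west.read("user:42:email"))

# After a sync in each direction they converge on the same value.
east.sync_from(west)
west.sync_from(east)
after = (east.read("user:42:email"), west.read("user:42:email"))
```

The design point is that neither site waits on the other to accept a write, which is what makes active/active over a high-latency link workable. The cost is the window in which reads disagree, and the application has to be designed to tolerate that window and the chosen conflict rule.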
The first database to give me SQL-like syntax and function, good replication and reliable eventual consistency will make me very happy. In the meantime I'll watch the skies and wring my hands.
Update: High Scalability covered the same topic, but with different conclusions.
