Some snippets for context (that I will post about later):
I have a startup idea. Before I produce an MVP, I’m going to make a pet project to test out the dev-environ that I’m predicting will be best fit.
At this point, I’ve chosen: Golang backend, AngularJS frontend, Twitter-Bootstrap for CSS, jQuery for final stage prettifying
Again, I will post later to this, but for now, I wanted a DB or general data store that was scalable and fast. So… MySQL?!?!? I mean… Cassandra, Hadoop, HBase, Riak, Voldemort, MongoDB, CouchDB, …
Why not MongoDB? I was itching to try MongoDB. It’s still growing, has a great marketing team (not a slight, you need to have buzz/devs using your product to have longevity), is easy to dev for… But my product has highly relational data. Besides, what data is truly monotonic? And who wants to deal with the MongoDB cliff for “Webscale”?
So, Cassandra, right? I was very close to betting on Cassandra. Half open-source, half-Datastax, all passionate community. Started at Facebook. Backed by Apache. Blazing write speeds. Then, I read eBay’s excellent ideas on designing an app with Cassandra in mind, rather than retrofitting Cassandra to an app. Designing a database with queries in mind rather than data in mind seemed reasonable. Not natural, but reasonable. But immediately, I began to see issues. If I needed to denormalize data this once to serve a query, what about variable, more complex queries? For a typical webapp, there are only so many queries that need to so prepared, leading to a defined number of materialized views/denormalizations. Given that my value proposition is centered around the customizable specificity of the data itself, and more generally speaking, the more relational your data is, the less it makes sense to use Cassandra. Sure, for each specific use case, your performance may be an order of a magnitude higher… for the first twenty or so denormalizations. But what happens after 100? And do you really want to have 100 sets of duplicate data that is differently structured? Gaining faster reads via duplication is one thing. Gaining it through denormalization is an entirely different story.
Then I read about Twitter‘s decision. Digg‘s lesson. Facebook‘s efforts, especially as the originators of Cassandra. Youtube‘s Vitess. Let me simplify: Scale is a good problem to have. It means you have both a product and users. And the companies dealing with scale, have either stuck with a MySQL variant or are failing. I suspect because of Cassandra’s ineffectiveness as a general data store.
TL:DR? RDBMS has survived decades. There’s a reason. If bits of your data gains any meaning when connected with other bits of [your] data, and/or you want your app to be agile and able to pivot/add features at any point in the future without massive DB/code refactoring/rewrites, go old-school and use MySQL or a SQL variant. The DB gods have a sense of irony, it seems.