Should I use Cassandra for a 100,000-user project? In MySQL 5, I have full-text search and table partitioning. I’m starting a Q&A system like SO with CodeIgniter, as a move from vBulletin to a new system. The old vBulletin system had 100,000 users and a total post count of around 80,000. Over the next 3 or 4 years, I expect both users and posts to keep growing. So, should I use Cassandra instead of MySQL 5?
If I use Cassandra, I need to change from Grid-Service to Dedicated-Virtual hosting at Media Temple. Because Cassandra is not provided as part of the hosting package, I would need a VPS or DV server. If I use MySQL, hosting is not a problem, but then what about performance and search speed?
By the way, what database is Stack Overflow using?
You say 100,000 users – but how many concurrent users?
“Cassandra is not built in hosting system”
Using a hosted service on a single server suggests a very small-scale operation, and you’re obviously limited by your budget. There’s certainly no advantage to running Cassandra on a single server node.
“In mysql 5 have full text search”
Which is not a very scalable solution – you should definitely think about using a normalized search (which I believe you’d have to do if you were migrating to Cassandra anyway).
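To make "normalized search" a bit more concrete, here is a minimal sketch of the idea: split each post into words and store them in a separate keyword table that ordinary indexes can serve. It uses Python's built-in sqlite3 purely as a stand-in for MySQL, and every table and function name in it (posts, keywords, post_keywords, add_post, search) is made up for illustration. A real setup would also want stemming, stop words, and ranking.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def create_schema(conn):
    # Hypothetical schema: one row per distinct word, plus a link table.
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
        CREATE TABLE post_keywords (
            keyword_id INTEGER NOT NULL REFERENCES keywords(id),
            post_id    INTEGER NOT NULL REFERENCES posts(id),
            PRIMARY KEY (keyword_id, post_id)
        );
    """)


def add_post(conn, body):
    """Store a post and link it to every distinct word it contains."""
    post_id = conn.execute(
        "INSERT INTO posts (body) VALUES (?)", (body,)
    ).lastrowid
    for word in tokenize(body):
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        (keyword_id,) = conn.execute(
            "SELECT id FROM keywords WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT INTO post_keywords (keyword_id, post_id) VALUES (?, ?)",
            (keyword_id, post_id),
        )
    return post_id


def search(conn, query):
    """Return ids of posts that contain every word of the query."""
    words = sorted(tokenize(query))
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    rows = conn.execute(
        f"""SELECT pk.post_id
            FROM post_keywords pk
            JOIN keywords k ON k.id = pk.keyword_id
            WHERE k.word IN ({placeholders})
            GROUP BY pk.post_id
            HAVING COUNT(DISTINCT k.id) = ?
            ORDER BY pk.post_id""",
        (*words, len(words)),
    ).fetchall()
    return [row[0] for row in rows]
```

The point of the design is that a search becomes a plain indexed join, which is the kind of query that spreads across replicas easily.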
Given that you can comfortably scale the MySQL solution to multiple databases using replication before you even think about a fully clustered solution, and you obviously don’t have the budget to do your own hosting, migrating to Cassandra seems like massive overkill.
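As a rough illustration of what scaling with replication looks like from the application side: writes go to the primary, reads are spread over the replicas. The sketch below is only the routing decision, in Python, with a hypothetical ReplicatedRouter class and plain strings standing in for real database connections. It is not how any particular framework or driver does it.

```python
import itertools


class ReplicatedRouter:
    """Pick a server for each statement: replicas for reads, primary for writes."""

    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql):
        stripped = sql.strip()
        verb = stripped.split(None, 1)[0].upper() if stripped else ""
        if verb in self.READ_VERBS and self._replicas is not None:
            return next(self._replicas)
        # Writes, unknown statements, and the no-replica case use the primary.
        return self.primary


router = ReplicatedRouter("primary", ["replica1", "replica2"])
print(router.pick("SELECT * FROM posts WHERE id = 1"))  # replica1
print(router.pick("INSERT INTO posts (body) VALUES ('hi')"))  # primary
```

One caveat with this approach: replicas can lag behind the primary, so a read issued right after a write may need to go to the primary as well.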
From the information you provided, I would suggest sticking with MySQL.
Just as a side note, Facebook started out on MySQL, and only brought in Cassandra (for inbox search) once it was storing over 7 terabytes of inbox data for over 100 million users.
Wikipedia also handles hundreds of gigabytes of text data in MySQL.
I would NOT recommend using Cassandra in your case, for the following reasons:
Cassandra requires a good understanding of the application you’re building. It will be much harder to make changes and to run complex queries against data stored in Cassandra. SQL is more flexible and easier to maintain. Cassandra is good when you need to store huge amounts of data and you know exactly how that data will be accessed and sorted.
MySQL works fine for millions of rows if proper indexes are built.
If you hit bottlenecks with MySQL in the future, you can look at what exactly the problems are and move just those parts to Cassandra. In other words, you can combine both approaches, SQL and NoSQL, in the same project.
As for the MySQL full-text index, I would say it’s useless: it performs too poorly to be used in high-load projects. Look at sphinxsearch.com, which is a great full-text search implementation made for SQL databases.
But if you expect your system to grow fast and serve millions of users, you should consider Cassandra from the beginning.
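On the point about proper indexes, a small sketch of how to check whether a query actually uses one. It runs on Python's built-in sqlite3 as a stand-in, so the plan text differs from what MySQL's EXPLAIN prints, and the posts table and idx_posts_user index are hypothetical names. The habit carries over: look at the plan before and after adding the index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)",
    [(i % 100, "post %d" % i) for i in range(1000)],
)

lookup = "SELECT body FROM posts WHERE user_id = ?"

# Without an index on user_id, every row has to be examined.
plan_before = query_plan(conn, lookup, (42,))

# With the index, only the matching rows are visited.
conn.execute("CREATE INDEX idx_posts_user ON posts (user_id)")
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The first plan should report a full scan of posts and the second a search using idx_posts_user. The exact wording depends on the SQLite version.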