I’m trying to learn more about NoSQL as I’m building a key-based archiving app on a Linux/PHP stack. Can anyone explain the differences between the major solutions (CouchDB, MongoDB, etc.) and the advantages and disadvantages of each? Links would be great, as I’m having a hard time doing research using Google alone.
The Jepsen series at http://aphyr.com/tags/Jepsen is excellent if you want to see how many databases handle network problems:
We’re going to learn about distributed consensus, discuss the CAP theorem’s implications, and demonstrate how different databases behave under partition.
A blogger posted a visual answer to this question about a month after it was asked on Stack Overflow.
It is interesting because it positions the available solutions relative to the CAP theorem.
I would just add that Cassandra can fit on either side of ‘P’, depending on whether you always query using quorum.
Note: The author arbitrarily put RDBMS and data-warehousing solutions on the CA side of the triangle. I know that an available, non-partition-tolerant system is controversial, but that’s not the point.
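To make the quorum remark concrete: in a Dynamo-style store with N replicas, a read that contacts R replicas is guaranteed to overlap the latest write acknowledged by W replicas whenever R + W > N. Here is a minimal sketch of that arithmetic. The function names and the replication factor of 3 are my own illustrative assumptions, not part of any Cassandra API.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n


N = 3  # assumed replication factor, chosen only for illustration

# Reads and writes both at QUORUM: 2 + 2 > 3, so the sets always overlap.
print(reads_see_latest_write(N, quorum(N), quorum(N)))

# Reads and writes both at ONE: 1 + 1 <= 3, so a read may miss the latest write.
print(reads_see_latest_write(N, 1, 1))
```

Roughly speaking, the first configuration gives up availability when a majority of replicas is unreachable, while the second stays available during a partition but may return stale data.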
Here is a comparison of MongoDB, Cassandra, Riak, Couchbase 2.0, HBase distributed on HDFS using ZooKeeper, Berkeley DB 11g (Java Edition HA), and Oracle NoSQL 11g.
The author went through the documentation for each product and quoted the passages that describe its behavior in the following 5 categories:
- Internal partitioning
- Automated flexible data distribution
- Hot swappable nodes
- Automated failover strategy
and then provided short quotes for each of them.
I think it’s pertinent to also look at the clustering features of the various solutions, because a major use case of NoSQL is scaling out. Here is a brief overview with links to vendor-supplied info:
- Couchbase – each node comes with a cluster manager; there is no central cluster coordination component. The vendor says Couchbase Server scales linearly with each node added to the cluster. There is also a feature called XDCR, which provides replication across different geographical locations. CouchBase Cluster Overview
- MongoDB – offers a sharding architecture in which the data is split into shards, a config server maps data to shards, and a router instance (mongos) delivers the data to the client application (clients do not access shards directly). The vendor notes that sharding is a highly complex operation. MongoDB Cluster Overview
- Redis open source – the cluster feature is under development, currently in alpha, and will offer live reconfiguration, fault tolerance and pub/sub. The vendor has announced that some Redis commands will not be supported in cluster mode: complex multi-key operations such as Set unions or intersections, and any operation whose keys are not all available on the same node. Cluster Feature Specs
- Redis Cloud (commercial) – a cloud service with a working clustering feature. The vendor says it can scale up on demand, adding more shards dynamically, and that all Redis operations are supported. Redis Cluster Overview
- Riak – clustering is built in, and data is automatically partitioned across the Riak nodes. Nodes can be added to and removed from the cluster dynamically, and Riak will redistribute the data accordingly. The vendor says the product is designed to be distributed, and that core operations such as read/write and map/reduce actually become faster when more nodes are added. Riak Cluster Overview
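The "add a node and the data is redistributed" behaviour described above is commonly built on consistent hashing. The sketch below is a generic illustration of that idea, not Riak's or Couchbase's actual implementation. The class name `HashRing`, the node names, the key format and the 64 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place the node's virtual points on the ring, keeping it sorted."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def owner(self, key):
        """Return the node owning the first point at or after the key's position."""
        i = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring


keys = [f"archive:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")  # scale out by one node
moved = [k for k in keys if ring.owner(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to node-d:",
      all(ring.owner(k) == "node-d" for k in moved))
```

The point of the design is that adding a node only moves the keys that the new node takes over. A naive `hash(key) % node_count` scheme would reshuffle most keys whenever the node count changes.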