
I am learning about the characteristics of distributed databases, and I came across this website that describes some of their advantages: https://www.atlantic.net/cloud-hosting/about-distributed-databases-and-distributed-data-systems/

According to that site, the advantages of a distributed database are as follows:

Reliability – Building an infrastructure is similar to investing: diversify to reduce your chances of loss. Specifically, if a failure occurs in one area of the distribution, the entire database does not experience a setback.

Security – You can give permissions to single sections of the overall database, for better internal and external protection.

Cost-effective – Bandwidth prices go down because users are accessing remote data less frequently.

Local access – Similarly to #1 above, if there is a failure in the umbrella network, you can still get access to your portion of the database.

Growth – If you add a new location to your business, it’s simple to create an additional node within the database, making distribution highly scalable.

Speed & resource efficiency – Most requests and other interactivity with the database are performed at a local level, also decreasing remote traffic.

Responsibility & containment – Because any glitches or failures occur locally, the issue is contained and can potentially be handled by the IT staff designated to handle that piece of the company.

However, parallelism is not on the list. By parallelism I mean each node processing its share of the data at the same time, not concurrent writes. This makes me wonder: are all distributed databases (e.g. MongoDB, Cassandra, HBase) designed to process data in parallel? If not, which distributed databases support parallel processing and which ones don't?

For what I mean by parallel processing (as opposed to concurrent writes), please see: https://softwareengineering.stackexchange.com/questions/190719/the-difference-between-concurrent-and-parallel-execution
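
To make the distinction concrete, here is a minimal sketch of the kind of per-node processing I have in mind: the same task is sent to every partition and the partial results are combined afterwards. The data, the `local_sum` task and the `scatter_gather` helper are all hypothetical and only illustrate the idea; this is not how any particular database implements it.

```python
from concurrent.futures import Executor, ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows stored on one node.
PARTITIONS = [
    [3, 1, 4],      # rows held by node 0
    [1, 5, 9, 2],   # rows held by node 1
    [6, 5],         # rows held by node 2
]


def local_sum(rows):
    """Work done by one node, using only its own partition."""
    return sum(rows)


def scatter_gather(partitions, executor: Executor):
    """Run the same task on every partition, then combine the partial results."""
    partials = list(executor.map(local_sum, partitions))
    return partials, sum(partials)


if __name__ == "__main__":
    # Threads in CPython overlap in time (concurrent) but do not execute
    # CPU-bound code simultaneously. Passing a ProcessPoolExecutor instead
    # would let the per-partition tasks run in parallel on separate cores.
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials, total = scatter_gather(PARTITIONS, pool)
    print(partials, total)  # [8, 17, 11] 36
```

The result is the same whichever executor is used; only whether the per-partition tasks truly run at the same instant changes, which is exactly the concurrent versus parallel distinction I am asking about.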

Stanleyrr
  • You may be confusing this with the parallel query processing of databases. In a distributed environment, parallel processing is done with the help of MapReduce. As such, neither MongoDB nor PostgreSQL provides parallelism in queries. They all do batch processing. – Abhinav Jun 12 '18 at 05:03
  • Neither Hadoop nor Hive is a database. HBase is. – OneCricketeer Jun 12 '18 at 13:20
  • @Abhinav, if I understand correctly, Cassandra does real-time transactions and not batch processing. But Cassandra cannot do parallel processing? – Stanleyrr Jun 12 '18 at 23:23
  • Thanks @cricket_007. I have corrected it. – Stanleyrr Jun 12 '18 at 23:24
  • There's no reason Cassandra can't do batch operations, in my opinion. I'm fairly sure all the tools you listed distribute workloads over multiple servers (maybe excluding Mongo; Couchbase is the other scalable document store I know of). But it all comes down to the fact that the data is usually indexed and stored on only a single server. Even if it is replicated, there is often a "leader" in the list of replicas that receives the query and processes it. All of this comes at the cost of eventual consistency. – OneCricketeer Jun 13 '18 at 00:04
  • I see, @cricket_007. I think my main confusion lies in the difference between concurrent writes (the tasks overlap in time, but the task on each node doesn't have to start at the same time) and parallel writes (the task on each node actually starts at the same time). Maybe the way Cassandra distributes workloads over multiple servers relates to concurrent writes and not strictly to parallel writes. Is my assumption correct? – Stanleyrr Jun 13 '18 at 00:22
  • I'm only here for the Hadoop tag. As far as I know, Cassandra requires a primary key on the namespace, which is given to a token() or hash function, and that determines the server that receives the data. All read or write requests go through that process. – OneCricketeer Jun 13 '18 at 01:21
  • @Stanleyrr You are right. Cassandra performs real-time data processing, but not MongoDB or Postgres, as they're not NoSQL databases. – Abhinav Jun 13 '18 at 04:55
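
The routing described in the comments above (a key is hashed to a token, and the token determines which server receives the request) can be sketched roughly as follows. This is only an illustration under stated assumptions: MD5 is used as a stand-in hash, and the `Ring` class and node names are hypothetical. It is not Cassandra's actual partitioner or API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a 64-bit integer token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, nodes):
        self._entries = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # Pick the first node whose token is >= the key's token,
        # wrapping around to the first node at the end of the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]


ring = Ring(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "user:3"]:
    # The same key always routes to the same node.
    print(key, "->", ring.owner(key))
```

Because every read or write for a given key lands on the node that owns its token, different keys can be served by different servers at the same time, while work on a single key stays on one owner (plus its replicas).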
