0

Suppose I have a system where around 1000 machines will generate data from a sensor and they have to send it to a central system where this will get stored in a SQL table.

My question is if it'd generally be better to have each system connect directly to the database and insert (this is the only operation needed) data, or send it to the server using a messaging server like ie. HornetQ, and have a single (or few) instance of a software consuming data from HornetQ and writing it to the database system?

I'd like to know how the two approaches compare in terms of CPU/memory cost and scalability, specially at the server side of the system (ie, are database systems designed to handle a lot of clients).

Vitor
  • 2,734
  • 29
  • 40

2 Answers2

0

We have a system more or less like yours.

We have hundreds of data producer. And we are writing those data to mysql database with so many writers.

Having 1000 concurrent writer clients for a database is not a good idea in any world. From my experience I definitely suggest you to use a message queue in your case. It will be a buffer between your data producer and consumer. So in case of any problem on your database side -and believe me there will be- you don't have to don't anything about that on your producer side. You may stop your consumers, do your maintenance and fixes etc. without touching your producers.

About number of consumer it is completely up to you and your system requirements. You can use one or more consumers as you want and need.

There is a couple of gotcha here. If you need to write too much data means you need write-scalability you may need to use sharding but as I said it is related to size of data and number of transaction you need. You may also need to be careful about some stuff like not using auto-increment id field etc.

cool
  • 1,746
  • 1
  • 14
  • 15
  • Do you have any references to back this -> "Having 1000 concurrent writer clients for a database is not a good idea in any world. " Also, what is the problem with auto-increment fields in this scenario? – Vitor Feb 03 '16 at 10:09
  • At the end of the day you are writing to the disk. I am not sure how much data you should write but each one of your writer at least wait for disk io to complete its write operation. There will be also overheads because of 1K database connections etc. if you try to insert into same table with 1000 concurrent threads each one should wait for getting auto-incremented id and there is some kind of mutex to prevent id duplication on database side AFAIK. – cool Feb 03 '16 at 13:48
0

The advantages of using a messaging queue in between are:

  • being able to increase the number of machines generating messages and never mind about scaling your database servers.
  • control the queue consumption to keep your database healthy.
  • In case of database maintenance or crash, your machines data will not be lost.

I don't want to be too verbose, there are other advantages but I think you get the picture.

Paulo Pedroso
  • 3,555
  • 2
  • 29
  • 34