
I'm storing many variables per second in APC. At the same time, I have a CRON task that repeatedly launches a PHP process to read the variables from APC, delete them from APC, and store them in a database. The PHP script fetches all variables identified by a prefix. The problem is that while one PHP process is reading all the variables, deleting them from APC, and inserting them into the database, another process (also launched by CRON) could read the same data, because it has not been deleted yet, and I would end up with duplicated data in my database. Is there any solution for this? Or is it an APC limitation?

Thanks in advance.

Mark

    You shouldn't be using the cache in the first place. Use the database directly. And switch to Postgres if MySQL can't handle the load. – Denis de Bernardy Jun 07 '13 at 09:32
  • There are many solutions available, from creating a daemon PHP process with one thread that saves to APC and another that saves data to the DB. Having the DB set up with a unique constraint will take care of duplicates. Also, advice such as the comment above mine is something you should ignore; focus on getting concurrency working first. – N.B. Jun 07 '13 at 09:38
  • Even though a message queue certainly sounds like the right way to go here, just for the sake of simplicity and staying close to your current implementation: when you say 'another' cron, do you mean a completely different job, or the same job in a later (overlapping) run? – smassey Jun 07 '13 at 09:57
  • @Denis, thanks for the answer, but we are expecting a scenario where the database cannot cope with all this load. Because of that, we want to create a buffer with APC and have a process store the data via big block inserts (instead of doing an insert for every request). Thanks for the answer. – mark Jun 07 '13 at 10:38
  • @N.B. Thanks for the answer; creating a daemon sounds good, but I have had bad experiences with daemons and databases because connections kept open for a long time greatly increase the database's memory load. Nevertheless, I will consider it. You also mentioned a 'unique constraint'. My data will not be unique. Suppose I'm collecting data to analyze later: it is possible to collect legitimately repeated data, but I don't want to corrupt my data because of concurrency problems when inserting it into the database. Thanks for your answer. – mark Jun 07 '13 at 10:40
  • A daemon will have only 1 open connection to the database, there is no need to have N connections. As for unique constraints, your APC entries have keys. Just add the keys to the insert and have a unique constraint on that, have it autoincrement in APC or similar and you don't have to worry about any kinds of duplicates. – N.B. Jun 07 '13 at 10:50
  • @smassey, thanks for your answer. I'm referring to 'the same job in a later (overlapping) run'. The problem is this overlap: a run could read APC data already read by the previously executed job but not yet deleted, and as a result the stored data would be duplicated (or triplicated...). Thanks for your answer. – mark Jun 07 '13 at 10:50
  • Thanks again N.B. I meant only one connection, sorry. When I worked with that, because the connection stayed open over time, it used to consume RAM heavily. The key storage could also be a nice solution ;) Thanks for the answer! – mark Jun 07 '13 at 11:02
  • That's why you optimize it so that after 10-20 inserts you kill the connection and open a new one, much like what PHP-FastCGI does with PHP processes. However, putting a buffer in front of the database in the form of APC isn't the best solution. This looks like a huge misconfiguration of the database itself. On today's SSDs, you can achieve at least 3000 inserts per second (on my regular SSD without any kind of RAID on my test system, I get about 4k inserts per second). I would look into optimizing the DB before building a full-fledged PHP solution. – N.B. Jun 07 '13 at 12:31
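The unique-key idea N.B. describes in the comments above can be sketched as follows. This is only an illustration, not code from the thread: the counter key, entry prefix, DSN, and `events` table with its `UNIQUE` `seq` column are all hypothetical names.

```php
<?php
// Sketch of the unique-key approach: tag every buffered APC entry with an
// atomically incremented sequence number, and let a UNIQUE constraint in
// the database discard any row an overlapping cron run re-inserts.

// Writer side (the request handler):
apc_add('buffer_seq', 0);               // initialise the counter once; no-op if it exists
$seq = apc_inc('buffer_seq');           // atomic increment, safe under concurrency
apc_store("buffer_{$seq}", $payload);   // each entry gets a unique key

// Reader side (the cron job): INSERT IGNORE skips rows whose `seq` already
// exists (UNIQUE KEY on `seq`), so re-reading an undeleted entry is harmless.
$pdo  = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$stmt = $pdo->prepare('INSERT IGNORE INTO events (seq, payload) VALUES (?, ?)');
foreach (new APCIterator('user', '/^buffer_\d+$/') as $entry) {
    $seq = (int) substr($entry['key'], strlen('buffer_'));
    $stmt->execute(array($seq, $entry['value']));
    apc_delete($entry['key']);          // delete only after the insert succeeds
}
```

Deleting each entry only after its insert also means a crashed cron run loses nothing; the next run simply re-inserts and the constraint deduplicates.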

1 Answer


First, I'd love to know out of curiosity the reasoning behind this solution! Very interested!

But this is a common problem with timed jobs. Your more serious concern is that if one of your cron jobs takes a long time to complete and another one starts, the load on your server doubles, each job takes even longer to finish, and you can end up in a spiral that eventually takes your server down. It's an interesting problem!

I've solved this previously by setting a switch that tells the code a previous job is still running. When your job finishes, remove the switch, and the next cron run will proceed. You can keep that switch in APC.
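A minimal sketch of that switch, assuming you use `apc_add()` for it: `apc_add()` only succeeds when the key does not already exist, which makes it a usable atomic lock. The key name and TTL are illustrative choices, not part of the original answer.

```php
<?php
// Cron-job entry point: take the switch before doing any work.
$lockKey = 'cron_flush_lock';   // hypothetical switch key
$lockTtl = 300;                 // safety TTL so a crashed job can't hold the lock forever

if (!apc_add($lockKey, getmypid(), $lockTtl)) {
    // A previous run still holds the switch; exit and let a later cron retry.
    exit(0);
}

try {
    // ... read the prefixed entries from APC, insert them into the
    // database, then delete them from APC ...
} finally {
    apc_delete($lockKey);       // remove the switch so the next run can start
}
```

The TTL is the important safety valve: without it, a job that dies mid-run would leave the switch set and block every subsequent run.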

dKen
  • Hello dKen, thanks for your answer. I will bear in mind your idea about the switch. The reasoning is quite simple: we expect to collect a lot of data and we want to store it in a database. We have thought about creating a receiver, a PHP script that handles requests and stores the data in APC instead of saving it to the database, because we expect a lot of traffic and doing an insert for every request would bring down our database. On the other side, we have another PHP script that reads big blocks of data from APC and stores them in the database with a single insert. Thanks for the answer! – mark Jun 07 '13 at 10:59
  • No problem, good luck. It may be worth doing some load testing and other due diligence on your database to prove that you need to build a solution like this, though. I just worry your solution is more complex than it needs to be, which will give you more headaches down the road. Database overhead is pretty small, and I find it's almost always the code and the way it's written that is slower than the database itself. A 30-minute load test could save you days of dev and pain, and you can always add your APC solution afterwards. Good luck anyway :) – dKen Jun 07 '13 at 11:05