
I am a programmer at an internet marketing company that primarily makes tools. These tools have certain requirements:

  • They run in a browser and must work in all of them.
  • The user either uploads a file (.csv) to process, or provides a URL and API calls are made to retrieve information about it.
  • They move around THOUSANDS of lines of data (think large databases). These tools literally run for hours, usually overnight.
  • The user must be able to watch live as their information is processed and presented to them.

Currently we are writing in PHP, MySQL and Ajax.

My question is: how do I process LARGE quantities of data while providing a good user experience as the tool runs? Currently I use a custom queue system that sends Ajax calls and inserts rows into tables or data into divs.
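For context, a stripped-down sketch of the kind of endpoint that Ajax loop polls might look like this (the table and column names here are made up for illustration, not our real schema):

```php
<?php
// progress.php -- hypothetical endpoint the front-end Ajax loop polls.
// Returns the next batch of processed rows as JSON so the page can
// append them to a table or div. Names are illustrative only.

$lastId = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=tools', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT id, url, status, result
       FROM processed_rows
      WHERE id > :last_id
      ORDER BY id
      LIMIT 500'
);
$stmt->execute(array(':last_id' => $lastId));

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```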

This method is a huge pain in the ass and couldn't possibly be the correct approach. Should I be using a templating system, or is there a better way to refresh chunks of the page with A LOT of data? And I really do mean a lot of data, because we come close to maxing out PHP's memory limit, which is something we always have to watch out for.

Also, I would love to make these tools able to run on the server by themselves. I mean: upload a .csv, close the browser window, and then have an email sent to the user when the tool is done.

Does anyone have any methods (programming standards) for me that are better than using .ajax calls? Thank you.


I wanted to update with some notes in case anyone has the same question. I am looking into the following to see which is the best solution:

  • SlickGrid / DataTables
  • GearMan
  • Web Socket
  • Ratchet
  • Node.js

These are in no particular order and the one I choose will be based on what works for my issue and what can be used by the rest of my department. I will update when I pick the golden framework.

RachelD
  • You should look into web sockets. And instead of sending all of the data over and over again, just send new data. – Rob Feb 01 '13 at 17:04
  • I wrote something like that once. You want my advice? Forget about PHP and JS. Go back to the drawing board and make a proper application in C++. It's not worth the time and effort otherwise; it'll never work efficiently, especially if you have complicated relations between objects (rows of data). – MarcinWolny Feb 01 '13 at 17:12
  • I'm running a database with 2200 users and over 200 tables. The largest dataset is just under 2 Million records and that includes geospatial polygon data. I'm running that off of a 5 year old 2GB dual core PC. I'm not seeing any kind of server lag even when I calculate relative distances to the center of a polygon across North American zip codes. When you say you're running processes that execute for hours, sometimes over night, I'm concerned there's something else going on that doesn't have anything to do with the size of your dataset. How many datasets are you processing? – Strixy Feb 01 '13 at 17:27
  • @Rob Web sockets look very interesting and I'm going to read the documentation this weekend. Thank you. – RachelD Feb 01 '13 at 18:00
  • @MarcinWolny You are completely correct. I know that PHP is NOT the best language for the job here, however it's a requirement of the project :( – RachelD Feb 01 '13 at 18:01
  • @Strixy It's a crawler of sorts. It mainly uses cURL calls to gather data, which is why it takes so long. And I'm processing around 100,000 rows that get boiled down to about 10,000 cURL calls. Again, it's a requirement of the project. – RachelD Feb 01 '13 at 18:02

3 Answers


First of all, you cannot handle big data via Ajax. To let users watch the processing live, you can use WebSockets. As you are experienced in PHP, I can suggest Ratchet, which is quite new.
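As a rough illustration (the ProgressPusher class name, the port, and the autoload path below are my own assumptions, and a real version would push processing progress rather than just echo messages), a minimal Ratchet server follows the shape of its basic example:

```php
<?php
// push_server.php -- minimal Ratchet WebSocket server (sketch only).
require __DIR__ . '/vendor/autoload.php';

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Server\IoServer;
use Ratchet\Http\HttpServer;
use Ratchet\WebSocket\WsServer;

class ProgressPusher implements MessageComponentInterface
{
    protected $clients;

    public function __construct()
    {
        $this->clients = new \SplObjectStorage();
    }

    public function onOpen(ConnectionInterface $conn)
    {
        $this->clients->attach($conn);   // a browser tab connected
    }

    public function onMessage(ConnectionInterface $from, $msg)
    {
        // Broadcast whatever arrives (e.g. a progress update) to every client.
        foreach ($this->clients as $client) {
            $client->send($msg);
        }
    }

    public function onClose(ConnectionInterface $conn)
    {
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e)
    {
        $conn->close();
    }
}

$server = IoServer::factory(new HttpServer(new WsServer(new ProgressPusher())), 8080);
$server->run();
```

The browser then opens a plain WebSocket connection to that port and appends each message to the page as it arrives, instead of polling.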

On the other hand, to do the heavy calculations and store big data, I would use a NoSQL database instead of MySQL.
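For example (MongoDB here is just one concrete NoSQL option, my own pick rather than anything implied by the question, and this sketch assumes the 2013-era PECL "mongo" extension with illustrative names):

```php
<?php
// Sketch: storing crawler results in MongoDB instead of MySQL.
$mongo      = new MongoClient();   // defaults to localhost:27017
$collection = $mongo->selectDB('tools')->selectCollection('processed_rows');

// Insert a batch of result documents without a fixed schema.
$collection->batchInsert(array(
    array('url' => 'http://example.com/a', 'status' => 200, 'fetched_at' => new MongoDate()),
    array('url' => 'http://example.com/b', 'status' => 404, 'fetched_at' => new MongoDate()),
));

// Stream the results back out in insertion order.
foreach ($collection->find()->sort(array('_id' => 1)) as $doc) {
    // ...push $doc to the browser or write it to the report
}
```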

seferov
  • I agree 100% with your first statement. Ajax is not the answer here. What happened was I was told to copy a tool that I thought was only used for smaller data-sets, and Ajax handled it just fine. And then the scope creep happened... so now I'm trying to learn the correct way so I never damn myself to Ajax hell again :) I will read up on Ratchet and NoSQL, thank you very much. – RachelD Feb 01 '13 at 18:04

Since you're kind of pinched for time already, migrating to Node.js may not be something you can do right away. It would help with notifying users when the results are ready, though, as it can push notifications to the browser without polling. And since it uses JavaScript, you might find some of your client-side code is reusable.

Strixy

I think you can run what you need in the background with some kind of queue manager. I use something similar with CakePHP, and it lets me run time-intensive processes asynchronously in the background, so the browser does not need to stay open.

Another plus is that this approach is scalable, since it's easy to increase the number of queue workers running.

Basically, with PHP you just need a cron job that runs every once in a while and starts a worker; the worker checks a queue table in the database for pending tasks. If none are found, it keeps polling in a loop until one shows up.
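As a rough sketch of that idea (not CakePHP-specific; the table layout, the processJob() helper, and the notification address are all invented for illustration), the worker that cron launches could look something like this:

```php
<?php
// worker.php -- started by cron, e.g.  */5 * * * * php /path/to/worker.php
// Polls a "jobs" table for pending tasks; names are illustrative only.

$pdo = new PDO('mysql:host=localhost;dbname=tools', 'user', 'pass');

while (true) {
    // Grab the oldest pending job, if there is one.
    $stmt = $pdo->query(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    );
    $job = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($job === false) {
        sleep(10);   // nothing to do yet; keep looping until a job shows up
        continue;
    }

    // Mark it as running (a real worker would also lock to avoid duplicates).
    $pdo->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")
        ->execute(array($job['id']));

    processJob($job['payload']);   // hypothetical: parse the CSV, make the cURL calls, etc.

    $pdo->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")
        ->execute(array($job['id']));

    // Notify the user, so the browser window never has to stay open.
    mail('user@example.com', 'Tool finished', 'Job ' . $job['id'] . ' is complete.');
}
```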

Nick Zinger
  • Thank you for your answer. We're working on switching this to CakePHP and probably also Node.js. This was just one of those projects that was specced out one way and then changed completely over time. Go scope creep, yay! – RachelD Apr 10 '13 at 19:04
  • In that case I expect you'll be back on SO relatively soon with new issues, lol. But seriously, it sounds like an interesting setup; I would love to combine the two at some point in my projects. – Nick Zinger Apr 11 '13 at 21:01