On a PHP-based website, users will give a Dropbox app permission to access their Dropbox folder. Each user then puts thousands of text files into this folder, maybe all at once, maybe continuously. I need to process these files, save the results to a database, and, if the user is logged in to the website, show the results as quickly as possible (ideally in near real time). What are the best technologies for doing this with the fewest resources? At first it only has to work for 30 people, but later it has to work just as quickly with hundreds of users. Each user will have thousands of files; some of them keep growing and need to be processed more than once, others don't.

I thought about running a command-line PHP script in an infinite loop that periodically copies the files from Dropbox and processes them for every user, but that seems too slow. API calls to the Dropbox server seem very slow, so doing this continuously may not be the best option.
A better idea might be an "Import" button on the website, so that the script only fetches files for the users who have just clicked it.
What do you suggest? It doesn't have to be PHP. I have a dedicated server for this, but I would like to hear hosting-friendly solutions too.

Alternatively, suggest another simple, secure and fast way to get those files onto the server, comparable to the Dropbox method. (I chose Dropbox because it is very easy for the user to set up and use, and the sync is very reliable, secure and fast.)

kissgyorgy
  • So dropbox is not integral to your app? You are considering using it just as a convenient way for your clients to get files to you? – walrii Jun 01 '12 at 04:33
  • yes, exactly. I thought it is possibly the easiest way for my non-tech users, and the most secure and fast way for me. And I don't really want to reinvent the wheel either... – kissgyorgy Jun 01 '12 at 04:35
  • So, for your convenience, you will be causing bandwidth use up and down from Dropbox, as well as a load on their API service as you poll repeatedly to see whether your users have uploaded new files. Dropbox gets the cost and negligible benefit. – walrii Jun 01 '12 at 04:39
  • You can avoid having to poll for changes if you have some sort of upload service on your own server. Then you process the files as they come in. I would try to find some OSS services, such as for photo sharing, document sharing, etc. that you could modify to meet your needs. – walrii Jun 01 '12 at 04:43
  • Stuff like http://stackoverflow.com/questions/102476/open-source-app-that-provides-yousendit-style-functionality http://www.webresourcesdepot.com/open-source-dropbox-alternatives-to-start-building-a-file-storage-sharing-system/ http://www.webstuffshare.com/2010/02/plupload-superb-open-source-file-uploader/ – walrii Jun 01 '12 at 04:54
  • Thanks! I will watch and process a folder on my server, and offer users different upload methods, such as a simple form upload or syncing with software like Syncany. – kissgyorgy Jun 01 '12 at 05:20

1 Answer

If you have inotifywait (part of inotify-tools) and GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can leave this running:

inotifywait -q -m -r -e MOVED_TO -e CLOSE_WRITE --format %w%f Dropbox_dir | parallel -u your_program

Every time a file is uploaded to any directory below Dropbox_dir, your_program is run with that file's path as its argument. By default parallel runs at most one job per CPU core, so your server will not be overloaded if a user uploads 10000 files in one go.
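
Because CLOSE_WRITE fires again each time a file is rewritten, your_program will be called repeatedly for files that keep growing. Here is a minimal sketch of how your_program could emit only the part it has not seen yet, assuming such files are only ever appended to. The function name read_new_part and the per-file offset file are hypothetical; storing the offset in the database would work just as well.

```shell
#!/bin/sh
# Hypothetical helper for your_program: print only the bytes appended to
# FILE since the previous call. The last processed size is remembered in
# STATE (one small offset file per watched file; an assumption made here).
read_new_part() {
  file=$1
  state=$2
  offset=0
  if [ -f "$state" ]; then
    offset=$(cat "$state")
  fi
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  # A file that got smaller was replaced rather than appended to: start over.
  if [ "$size" -lt "$offset" ]; then
    offset=0
  fi
  if [ "$size" -gt "$offset" ]; then
    # tail -c +N starts at byte N (1-based); head caps the output at the
    # size measured above, so bytes arriving meanwhile wait for the next run.
    tail -c "+$((offset + 1))" "$file" | head -c "$((size - offset))"
  fi
  printf '%s\n' "$size" > "$state"
}
```

Files that never change simply produce no output on a repeated call. If two events for the same file can arrive close together, the offset update would also need some form of locking, which this sketch leaves out.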

For every user you will then simply do:

mkdir Dropbox_dir/user_folder

and wait for Dropbox to put a file in there.
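
Since every user gets a folder of their own below Dropbox_dir, your_program can work out which user a file belongs to from the path alone. A small sketch of that step follows; the names WATCH_ROOT and describe_upload are made up for illustration, and WATCH_ROOT is assumed to be the same directory that was passed to inotifywait.

```shell
#!/bin/sh
# Hypothetical first step of your_program: split the path printed by
# inotifywait into the user folder and the path inside that folder.
# WATCH_ROOT must match the directory given to inotifywait.
WATCH_ROOT=${WATCH_ROOT:-Dropbox_dir}

describe_upload() {
  path=$1
  rel=${path#"$WATCH_ROOT"/}   # drop the watched root
  user=${rel%%/*}              # first component is user_folder
  inner=${rel#*/}              # the rest is the file inside it
  # Print "user<TAB>path inside the user folder".
  printf '%s\t%s\n' "$user" "$inner"
}
```

your_program could call describe_upload "$1" and use the first field to decide which user's database rows to update.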

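Because Dropbox keeps its own data in .dropbox.cache and .dropbox, you may need to ignore files that have those names in their path (--line-buffered stops grep from holding paths back in its output buffer, and the backslash makes the dot literal):

inotifywait ... | grep --line-buffered -v '/\.dropbox' | parallel ...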
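
A slightly stricter variant of this filter, as a sketch: it only drops paths in which .dropbox or .dropbox.cache is a whole path component, so a user file that merely starts with ".dropbox" still gets through. The function name skip_dropbox_internal is hypothetical, and the --line-buffered option is assumed to be available (GNU grep and BSD grep both have it).

```shell
#!/bin/sh
# Hypothetical filter for the middle of the pipeline: drop paths that
# contain .dropbox or .dropbox.cache as a whole path component.
# --line-buffered makes grep pass each surviving path on immediately
# instead of waiting for its output buffer to fill.
skip_dropbox_internal() {
  grep --line-buffered -v -E '/\.dropbox(\.cache)?(/|$)'
}
```

It would take the place of the plain grep in the pipeline: inotifywait ... | skip_dropbox_internal | parallel ...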

This would also work for other file transfer methods (FTP/Samba/Rsync/scp and probably more).

You can install GNU Parallel simply by:

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Ole Tange
  • You cannot expect your users to share their full Dropbox credentials so that you can sync all of their files. So your answer may be nice for files on disk, but not for Dropbox itself. They use OAuth, by the way. – hakre Jun 03 '12 at 11:59
  • I always assumed the users would simply share a subfolder with me - i.e. no use of their credentials. Any reason why that would not work? – Ole Tange Jun 04 '12 at 06:48
  • AFAIK that does not work over OAuth. However, I think your description is generally a good answer in case it works on the file system. – hakre Jun 04 '12 at 08:12