
When storing and reading cookies with PHP cURL for web scraping, many resources out there recommend handling cookies through a file, using these options:

curl_setopt($ch, CURLOPT_COOKIEJAR, $CookieJarFilename);

curl_setopt($ch, CURLOPT_COOKIEFILE, $CookieJarFilename);

The bottom line is that they use a single file as the cookie jar (usually a .txt file).

But in a real scenario, our website is not accessed by just one computer. Most likely many computers access it at the same time, along with bots such as Googlebot, Yahoo Slurp, etc.

So with a single .txt file, isn't it obvious that every request will overwrite the same cookie jar and make a real mess of the cookies?

Or am I mistaken here?

What's the 'right' method for handling cookies?

Krish R
bagz_man

2 Answers


If multiple people access your page and you need to run cURL with unique cookies for each of them, there are a couple of ways to handle this.

1) If your user is authenticated and has a $_SESSION started on your end, you can use session_id() as the cookie file name.

2) If your user doesn't need a session (a Google bot, for example), you can build the cookie file name from a timestamp plus an extra random component. For example:

$cookieName = time() . "_" . substr(md5(microtime()), 0, 5) . ".txt";
// Produces a name like:
// 1388788940_91ab4.txt

In this case, however, you cannot reuse the cookie file if the user comes back 5 minutes later (unless you store the cookie file name in a cookie on the user's browser).
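A minimal sketch of option 1, assuming the curl extension is loaded and that the system temp directory is an acceptable place for the jars. `cookieJarPath` and `fetchWithJar` are hypothetical helper names, not part of PHP or cURL. The key is hashed so the file name stays safe whatever the session ID contains.

```php
<?php
// Hypothetical helper: builds a per-visitor cookie jar path.
// $key is any stable identifier, for example session_id().
function cookieJarPath(string $dir, string $key): string
{
    // Hash the key so the file name only contains hex characters.
    return rtrim($dir, '/') . '/cookie_' . hash('sha256', $key) . '.txt';
}

// Hypothetical helper: fetches $url, reading and writing cookies in $jar.
// Returns the response body, or false if the request fails.
function fetchWithJar(string $url, string $jar)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);     // read cookies from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);      // write cookies back to it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    // The jar is written when the handle is released, here at function return.
    return curl_exec($ch);
}
```

With a session already started, a call would look like `fetchWithJar($url, cookieJarPath(sys_get_temp_dir(), session_id()))`, so each visitor gets their own file.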

In either case, make sure you clean up these files periodically. Otherwise you'll end up with tons of cookie files in your directory.
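The periodic cleanup could be sketched roughly like this, for example run from a cron job. `purgeOldCookieJars` is a hypothetical helper, and the default `cookie_*.txt` pattern is only an assumption about how the files are named; adjust it to whatever naming scheme you actually use.

```php
<?php
// Hypothetical helper: deletes cookie jar files in $dir whose last
// modification is older than $maxAgeSeconds. Returns how many were removed.
function purgeOldCookieJars(string $dir, int $maxAgeSeconds, string $pattern = 'cookie_*.txt'): int
{
    $removed = 0;
    $cutoff = time() - $maxAgeSeconds;

    // glob() returns false on error, so fall back to an empty list.
    $files = glob(rtrim($dir, '/') . '/' . $pattern);
    foreach ($files ?: [] as $file) {
        if (is_file($file) && filemtime($file) < $cutoff && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

For instance, `purgeOldCookieJars('/path/to/jars', 3600)` would remove jars that have not been modified for an hour.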

rm-vanda
Sabuj Hassan
    Like he said, if you don't delete these files, Linux will run out of "inodes" which jacks up your whole operating system, even if the disk isn't full. Been there, done that :-) – PJ Brunet Jun 01 '15 at 17:49

If you want PHP to do the cleanup for you:

Use tempnam like bagz_man says, but after using the file, read its contents and store them in your session. You can then delete the temporary file. The next time you need it, create a new file from the stored contents.

The only thing left behind is the session, which PHP takes care of.
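A rough sketch of that idea, assuming the curl extension is loaded and a session has already been started. All three function names and the `curl_cookies` session key are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: writes previously saved cookie text into a fresh
// temporary file and returns its path.
function restoreCookieJar(string $cookieText): string
{
    $jar = tempnam(sys_get_temp_dir(), 'cj_');
    file_put_contents($jar, $cookieText);
    return $jar;
}

// Hypothetical helper: reads the jar back into a string and deletes the file.
function captureCookieJar(string $jar): string
{
    $text = (string) file_get_contents($jar);
    unlink($jar);
    return $text;
}

// Hypothetical helper: one request whose cookies live in the session
// between calls. Returns the response body, or false on failure.
function fetchForSession(string $url)
{
    $jar = restoreCookieJar($_SESSION['curl_cookies'] ?? '');

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    unset($ch); // releasing the handle lets libcurl flush cookies to $jar

    $_SESSION['curl_cookies'] = captureCookieJar($jar);
    return $body;
}
```

The temporary file only exists for the duration of one request, so nothing accumulates on disk.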

Dwight Wilbanks