I downloaded the latest version of phpcrawler, and I can access a test website of my own.

The site contains only an image and some text. When I run the crawler, I receive the text without the image, as expected, because I added the filter $crawler->addNonFollowMatch("/\.(jpg|gif|png)$/i");
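The filter pattern can be checked on its own with preg_match before handing it to the crawler. This is a minimal sketch, assuming the dot is meant to match a literal dot and the match should be case-insensitive; the helper name isImageUrl and the example URLs are hypothetical and not part of PHPCrawler.

```php
<?php
// Hypothetical helper (not part of PHPCrawler): reports whether a URL
// ends in one of the image extensions the filter is meant to skip.
function isImageUrl(string $url): bool
{
    // \. matches a literal dot; the i modifier ignores case.
    return preg_match('/\.(jpg|gif|png)$/i', $url) === 1;
}

var_dump(isImageUrl('http://example.com/logo.PNG'));   // image extension
var_dump(isImageUrl('http://example.com/index.html')); // not an image
```

With an unescaped dot, the pattern would also match any URL that merely ends in "jpg", "gif", or "png" after some other character, which is why the sketch escapes it.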

I cannot get it to save the tmp file. It does not save the unique tmp file in the folder I run the crawler from, and when I tried to save to a named file instead, I had no luck either.

I did run into many deprecation errors on different lines in all the PHP files. For example, with @fopen, the @ causes problems in different areas. I use PHP and can also do regex. David.

Msonic

1 Answer

I am answering my own question, since PHPCrawler questions rarely seem to get answered; I saw one from last year that is still unanswered. I will answer that one too, though it may be too late to do any good. Here is the answer.

I added the following to a copy of phpcrawler that I modified for my needs:

// Write the fetched page source to a file.
$fp = fopen('c:/test/poopoo.txt', 'w');
fwrite($fp, $page_data['source']);
fclose($fp);

Put it before the flush call and before the code that creates your instance of the class.
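A slightly more defensive variant of the same idea might look like the following. This is only a sketch under stated assumptions: the function name savePageSource, the page_ prefix, and the crawl_test directory are hypothetical, and it assumes $page_data['source'] holds the page content as in the snippet above. It checks each step so that a failed write is reported instead of passing silently.

```php
<?php
// Hypothetical helper: writes the page source to a uniquely named file
// in $dir and returns the path, or null if any step fails.
function savePageSource(array $page_data, string $dir): ?string
{
    if (!isset($page_data['source'])) {
        return null;
    }
    // Create the target directory if it does not exist yet.
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return null;
    }
    // tempnam() creates an empty file with a unique name and returns its path.
    $path = tempnam($dir, 'page_');
    if ($path === false) {
        return null;
    }
    // file_put_contents() opens, writes, and closes the file in one call.
    if (file_put_contents($path, $page_data['source']) === false) {
        return null;
    }
    return $path;
}

// Example call with stand-in data; in the crawler this would be the
// $page_data array available where the page is handled.
$html  = '<html><body>Hello</body></html>';
$saved = savePageSource(['source' => $html], sys_get_temp_dir() . '/crawl_test');
echo $saved === null ? "Could not save page\n" : "Saved to $saved\n";
```

Checking the return values this way, rather than suppressing errors with @, makes it easier to see whether the problem is a missing folder, a permissions issue, or empty page data.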

I found that using PHP Simple HTML DOM Parser from this project works well. If you need more control, use regular expressions, though they have a steep learning curve.

Jerry Coffin
  • Congrats on the fix, and thanks for looking out for the PHPCrawler sub-community! When you are able, please make sure to mark your answer as 'accepted' so that others may learn from your success. Cheers~ – Andrew Kozak Apr 05 '12 at 16:26