I've written a script to process HTML files from URLs. However, because my cheap hosting provider limits script runtime to 30 seconds, I've had to alter the script to store the HTML as txt files and run it from a local WAMP server.

I am trying to load each file, extract what I need, then move on to the next file.

With URLs as the source, file_get_html was doing the job perfectly (I could ->find the required elements). With a txt file as the source, file_get_html returns a blank object.

Based on some advice in the post below, I changed file_get_html to file_get_contents, which returned a single large string containing the contents of the text file.

First, make sure that file_get_contents can get data. If it can, file_get_html will be able to load data to simplehtml Dom

If file_get_contents returns a string, which it does, how would I "load data to simplehtml Dom"?

Linked post: File not getting read using file_get_html

I then tried to convert the string into an object with str_get_html; however, this didn't work either.

include('simple_html_dom.php');
$html = file_get_html('file.txt');
var_dump($html);

Returns: object(simple_html_dom)[1] but with no other contents or arrays.

include('simple_html_dom.php');
$html = file_get_contents('file.txt');
var_dump($html);

Returns: string <!DOCTYPE html PUBLIC.....

Questions:

Can anyone give me any advice? What's the best way to load a text file containing HTML markup into an object so that I can use the find method on its contents? I want to avoid loading the file into an array of strings and using regex to process the contents.

Are there any considerations I need to make if using a local WAMP server?

Jim
  • Can you post your code and text file you are trying to read ? – Navneet Singh Nov 29 '12 at 11:12
  • I managed to fix it using str_get_html after I'd used file_get_contents to open the file. The text file is literally an HTML source code dump of a webpage, e.g. ....... – Jim Nov 29 '12 at 11:21

1 Answer

(Answered by the OP in a question. Converted to a community wiki answer. See Question with no answers, but issue solved in the comments (or extended in chat) )

The OP wrote:

I managed to solve this myself. I was sure I'd already tried to extract the HTML from a string, doh!

include('simple_html_dom.php');
$html = file_get_contents('file.txt');
$html = str_get_html($html);
var_dump($html);

Returns: object(simple_html_dom)[1], including all the expected arrays etc.

Instead of trying to create the HTML object directly from the source file using file_get_html, I've read the file contents with file_get_contents and then converted the string to an HTML object using str_get_html. This allows me to use the simple html dom methods, such as find, on elements within the object, e.g.

$html->find('a');
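
For the "load each file, extract what I need, then move on to the next file" part of the question, the same two calls can be wrapped in a loop. This is only a rough sketch, not something from the original post: the function name extract_links_from_dir and the directory layout are hypothetical, and it assumes simple_html_dom.php sits next to the script and that the saved pages use a .txt extension.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical helper: collect the href of every <a> in each saved page.
// $dir is an assumed folder of .txt files holding raw HTML source.
function extract_links_from_dir($dir)
{
    $links = array();

    $paths = glob($dir . '/*.txt');
    if ($paths === false) {
        $paths = array(); // glob() can return false on error
    }

    foreach ($paths as $path) {
        // Step 1: read the file into a plain string.
        $markup = file_get_contents($path);
        if ($markup === false || $markup === '') {
            continue; // unreadable or empty file, skip it
        }

        // Step 2: turn the string into a simple_html_dom object.
        $html = str_get_html($markup);
        if (!$html) {
            continue; // no DOM object was built for this input
        }

        // Step 3: query the object as usual.
        $links[$path] = array();
        foreach ($html->find('a') as $anchor) {
            if ($anchor->href) { // ignore anchors without an href
                $links[$path][] = $anchor->href;
            }
        }

        // Free the parser's memory before moving on to the next file.
        $html->clear();
        unset($html);
    }

    return $links;
}
```

Calling extract_links_from_dir('pages') would then return an array keyed by file path, each entry holding the links found in that file. The clear() call is there because a long run over many files can otherwise hold on to memory from earlier documents.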
Brian Tompsett - 汤莱恩