How to parse information within the HTML <code> tag using PHP

Question

I am using a PHP script to crawl data from HTML pages and am struggling to parse the data inside an HTML <code> tag. For instance, in the following markup, I would like to extract values such as name, location, position and company name.

<code id="content" style="display:none;">
<!--{"required content":{"name":"John Smith",
"location":"UK"}, "position":"Manager", "company":"IBM"}-->
</code>

I would appreciate it if someone can point me in the right direction.

How are you currently parsing the HTML? Are you using DOMDocument? See http://php.net/manual/en/domdocument.loadhtml.php - Liam Sorsby, Nov 04 '13 at 13:43
Don't use regular expressions; use a DOM parser such as PHP's DOM extension: http://php.net/manual/en/book.dom.php - Reeno, Nov 04 '13 at 13:43

score 0 · Accepted Answer · answered Nov 04 '13 at 13:47

It seems you have JSON inside the <code> tag.
So first (after you get the inner HTML of the <code> tag) strip the comment markers ('<!--' and '-->'), and then pass what remains to json_decode().
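
The steps in this answer could be sketched roughly as follows. This is only an illustration under a few assumptions: the sample markup is the question's snippet with the JSON rewritten so that it is valid (the exact nesting of the original payload is a guess), and DOMDocument with DOMXPath is used to fetch the element, as the comments on the question suggest.

```php
<?php
// Hypothetical sample input: the question's markup with the JSON made valid.
// The nesting of "position" and "company" is an assumption.
$html = <<<'HTML'
<code id="content" style="display:none;">
<!--{"required content":{"name":"John Smith","location":"UK"},"position":"Manager","company":"IBM"}-->
</code>
HTML;

// Step 1: load the markup and locate the <code> element by its id.
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$code = $xpath->query('//code[@id="content"]')->item(0);

$data = null;
if ($code !== null) {
    // Step 2: rebuild the element's inner HTML from its child nodes.
    $inner = '';
    foreach ($code->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }

    // Step 3: drop the comment markers, then decode the remaining JSON.
    $json = trim(str_replace(['<!--', '-->'], '', $inner));
    $data = json_decode($json, true);   // true => associative arrays
}

if ($data === null) {
    echo 'Could not decode JSON: ' . json_last_error_msg() . PHP_EOL;
} else {
    echo $data['required content']['name'] . PHP_EOL;
    echo $data['company'] . PHP_EOL;
}
```

If json_decode() returns null on real pages, json_last_error_msg() is usually the quickest way to see whether the payload itself is malformed.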

answered Nov 04 '13 at 13:47

lvil


score -1 · Answer 2 · answered Nov 04 '13 at 13:46

Take a look at PHP's strip_tags function: http://php.net/manual/en/function.strip-tags.php

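This will at least remove the HTML tags from your string. You can also pass a list of tags that you want to keep, while everything else is removed. Note that HTML comments are always stripped and cannot be kept, so strip_tags alone would also discard the JSON inside the comment.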
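
For reference, a small sketch of how strip_tags behaves. The sample string is made up for illustration. As far as the PHP manual describes it, the second argument only controls which tags survive; HTML comments are removed either way.

```php
<?php
// Made-up sample string containing tags and a comment.
$html = '<p>Hello <b>World</b><!-- a note --></p>';

// No second argument: every tag is removed, and so is the comment.
$plain = strip_tags($html);

// Second argument lists tags to keep; the comment is still removed.
$keepBold = strip_tags($html, '<b>');

echo $plain . PHP_EOL;
echo $keepBold . PHP_EOL;
```

This is why strip_tags is a poor fit for the markup in the question: the data of interest lives inside the comment that gets thrown away.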
