Questions tagged [html-tree]

HTML-Tree is the most popular Perl library for parsing HTML into DOM-like trees. It includes HTML::TreeBuilder and HTML::Element.

There are a number of other modules that build on top of HTML-Tree. Some notable ones are:

HTML::TreeBuilder::XPath: adds XPath support to HTML::Element.
pQuery: allows jQuery-like queries.
WWW::Mechanize: automates web browsing in Perl.
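
As a rough illustration of the tree-building workflow that these modules extend, here is a minimal sketch using HTML::TreeBuilder and HTML::Element. The sample markup and the variable names are assumptions made up for the example; the calls used (new_from_content, look_down, as_text, attr, delete) are part of HTML-Tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for this illustration.
my $html = <<'HTML';
<html><body>
<div class="getme"><p>Hello, tree</p></div>
<a href="/one">One</a> <a href="/two">Two</a>
</body></html>
HTML

# Parse the string into a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In scalar context, look_down returns the first matching element (or undef).
my $div  = $tree->look_down( _tag => 'div', class => 'getme' );
my $text = defined $div ? $div->as_text : '';

# In list context, look_down returns every match in document order.
my @links = map { $_->attr('href') } $tree->look_down( _tag => 'a' );

print "$text\n";
print "$_\n" for @links;

# Free the tree explicitly once it is no longer needed.
$tree->delete;
```

Copying the values into plain scalars and arrays before calling delete keeps them usable after the tree is gone.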

33 questions

1 answer

How to put values from a look_down tree array's HTML tag into a regular array in Perl?

This is a snippet of code I've got: #!/usr/bin/perl use strict; use warnings; use LWP::Simple; use Time::Piece; use HTML::Tree; my $url0 = 'http://www.website.ch/blah.aspx'; my $doc0 = get($url0); my $tree0 =…

asked Mar 03 '14 at 18:35

user3333975

2 answers

UPDATED: Editing Hash Array Content

My array holds entries in this format: Monday, June 12. I want to get rid of the "Monday, " part (note the space after the comma). What I'm used to doing is just regexing the tags and then…

arrays perl hash tree html-tree

asked Feb 14 '14 at 05:07

user3269763

2 answers

perl html treebuilder how to handle error condition

The task is quite simple: access a URL and parse it based on the result. If there is an error (404, 500, etc.), take appropriate action. The last piece is the one that I am having an issue with. I have listed both the pieces of code that I…

perl lwp html-tree

asked Jun 11 '13 at 02:02

souser

1 answer

HTML::TagFilter remove div based on class

I'm trying to use a Perl script to pull content from static HTML files on a server. I'd like to pull the content of a specific div. I know the div by its class name ("getme"). I can get to the div using HTML::TreeBuilder->look_down. How can I…

perl html-parsing html-tree

asked May 30 '12 at 15:02

bart

3 answers

Perl: why does this web scraper regex work inconsistently?

I have run into another problem in relation to a site I am trying to scrape. Basically I have stripped most of what I don't want from the page content and thanks to some help given here have managed to isolate the dates I wanted. Most of it seems to…

regex perl web-scraping lwp html-tree

asked Feb 08 '12 at 12:34

SlowLearner

1 answer

How to keep data marked as UTF-8 after parsing with HTML::Tree?

I wrote a script in which I slurp in a UTF-8 encoded HTML file and then parse it into a tree with HTML::Tree. The problem is that after parsing, the strings are no longer marked as UTF-8. As _utf8_on() is not the recommended way to set the flag, I am looking for…

perl utf-8 html-parsing html-tree

asked Aug 29 '11 at 14:23

w.k

1 answer

Static changes by saving of web page as "web page complete"

I save a web site using Firefox 33.0 as "Web Page, Complete". The problem is that the HTML tree of the main HTML file is changed statically. Before saving there was something like this: Stuff before

1 answer

How to fetch the value of a HTML tag using HTML::Tree?

Let's say I have an array which holds the contents of the body tag, as shown below: print Dumper(\@array); $VAR1 = [

perl cpan html-tree html-treebuilder

asked Mar 16 '14 at 10:57

user3199303

1 answer

Matching Multiple 'id' Values Using RegEx in Combination with HTML::TreeBuilder

I've got a list of URLs in an array: http://www.site.sx/doc1.html http://www.site.sx/doc2.html http://www.site.sx/doc3.html . . . Let's view the contents of the first page, namely doc1.html:

python regex dictionary tree html-tree

asked Mar 11 '14 at 06:31

user3404787

2 answers

Suckerupper With Hash Enumeration

I've got some code that a friend of mine helped create: use LWP::Simple; use HTML::TreeBuilder; use Data::Dumper; my $tree = url_to_tree( 'http://www.registrar.ucla.edu/schedule/schedulehome.aspx' ); my @selects =…

perl hash web-scraping web-crawler html-tree

asked Mar 05 '14 at 05:41

user3333975

1 answer

Extracting all links of a certain form

I've got a page that I want all the links off of (e.g. http://www.stephenfry.com/). I want to put all the links that are of the form http://www.stephenfry.com/WHATEVER into an array. What I've got now is just the following method: #!/usr/bin/perl…

regex arrays perl html-tree

asked Feb 16 '14 at 01:14

user3269763

1 answer

Update column values in an HTML file using HTML::TreeBuilder

I have an HTML file with several tables (all tables have the same number of columns and the same column names). The tables are separated by other HTML tags. For each row in each table, I would like to change the value of cell 1 and cell 3. This is what I have…

perl html-tree

asked Feb 10 '13 at 15:07

smithy

1 answer

Extracting Text in body that is not part of tag with HTML::TreeBuilder

I have some ugly HTML that is emailed to my program and looks like this: Saved search results.

Name: 'Some splunk…

perl html-tree

asked Feb 08 '13 at 17:40

Todd

1 answer

HTML parser by perl script

#!/usr/bin/perl use HTML::TreeBuilder; my $tree = HTML::TreeBuilder->new; $tree->parse_file("sample.html"); foreach my $anchor ($tree->find("p")) { print $anchor->as_text, "\n"; } My code is not printing any output. $tree->find("p") is…

html perl parsing html-parsing html-tree

asked Nov 05 '12 at 14:36

dreamer

1 answer

HTML::TreeBuilder->new_from_url() in perl not working

Using HTML::TreeBuilder->new_from_url(), I want to go to a website, say https://abc.com/index.html, and display some values from that HTML page. https://abc.com/index.html asks for user authentication (test/test123 are the username and password). I…

perl html-tree

asked Sep 27 '12 at 11:08

Cindrella
