Questions tagged [html-tree]

HTML-Tree is the most popular Perl library for parsing HTML into DOM-like trees. It includes HTML::TreeBuilder and HTML::Element.
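The sketch below shows roughly how the two modules fit together. HTML::TreeBuilder parses the markup, and the resulting HTML::Element nodes are searched with `look_down` and read with `as_text`. The helper name `texts_by_class` and the sample markup are hypothetical and exist only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from the HTML-Tree distribution

# Hypothetical helper: parse an HTML string and return the text content of
# every element with the given tag name whose class attribute equals $class.
# look_down compares attribute values as whole strings, so class="getme other"
# would not match 'getme'; pass qr/\bgetme\b/ instead if elements may carry
# several classes.
sub texts_by_class {
    my ($html, $tag, $class) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @texts = map { $_->as_text } $tree->look_down(_tag => $tag, class => $class);
    $tree->delete;    # release the tree; harmless on versions that use weak refs
    return @texts;
}

my $html = '<div class="getme"><p>Hello</p></div><div class="other">Skip</div>';
print "$_\n" for texts_by_class($html, 'div', 'getme');    # prints "Hello"
```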

There are a number of other modules that build on top of HTML-Tree. Some notable ones are:

33 questions
1 vote, 1 answer

How to put values from a look_down tree array's HTML tag into a regular array in Perl?

This is a snippet of code I've got: #!/usr/bin/perl use strict; use warnings; use LWP::Simple; use Time::Piece; use HTML::Tree; my $url0 = 'http://www.website.ch/blah.aspx'; my $doc0 = get($url0); my $tree0 =…
user3333975
  • 125
  • 10
1 vote, 2 answers

UPDATED: Editing Hash Array Content

In my array I've got stuff that looks like this; that is, the format is like this: Monday, June 12 I want to get rid of the Monday, <--- n.b.: There is a space after this comma. part. What I'm used to doing is just regexing the tags and then…
1 vote, 2 answers

perl html treebuilder how to handle error condition

The task is quite simple: access a URL and parse it based on the result. In case there is an error (404, 500, etc.), take appropriate action. The last piece is the one that I am having issues with. I have listed both the pieces of code that I…
souser
  • 5,868
  • 5
  • 35
  • 50
1 vote, 1 answer

HTML::TagFilter remove div based on class

I'm trying to use a perl script to pull content from static html files on a server. I'd like to pull the content of a specific div. I know the div by its class name ("getme"). I can get to the div using HTML::TreeBuilder->look_down. How can I…
bart
  • 68
  • 2
  • 8
0 votes, 3 answers

Perl: why does this web scraper regex work inconsistently?

I have run into another problem in relation to a site I am trying to scrape. Basically I have stripped most of what I don't want from the page content and thanks to some help given here have managed to isolate the dates I wanted. Most of it seems to…
SlowLearner
  • 7,907
  • 11
  • 49
  • 80
0 votes, 1 answer

How to keep data marked as UTF-8 after parsing with HTML::Tree?

I wrote a script where I slurp in a UTF-8 encoded HTML file and then parse it into a tree with HTML::Tree. The problem is that after parsing, the strings are no longer marked as UTF-8. As _utf8_on() is not the recommended way to turn the flag on, I am looking for…
w.k
  • 8,218
  • 4
  • 32
  • 55
0 votes, 1 answer

Static changes by saving of web page as "web page complete"

I save a website using Firefox 33.0 as "Web Page, Complete". The problem is that the HTML tree of the main HTML file is changed statically. Before saving, there was something like this: Stuff before
Igor K
  • 1
  • 1
0 votes, 1 answer

How to fetch the value of an HTML tag using HTML::Tree?

Let's say I have an array which holds the contents of the body tag, as shown below: print Dumper(\@array); $VAR1 = [
0 votes, 1 answer

Matching Multiple 'id' Values Using RegEx in Combination with HTML::TreeBuilder

I've got a list of URLs in an array: http://www.site.sx/doc1.html http://www.site.sx/doc2.html http://www.site.sx/doc3.html . . . Let's view the contents of the first page, namely doc1.html:
user3404787
  • 11
  • 1
  • 6
0 votes, 2 answers

Suckerupper With Hash Enumeration

I've got some code that a friend of mine helped create: use LWP::Simple; use HTML::TreeBuilder; use Data::Dumper; my $tree = url_to_tree( 'http://www.registrar.ucla.edu/schedule/schedulehome.aspx' ); my @selects =…
user3333975
  • 125
  • 10
0 votes, 1 answer

Extracting all links of a certain form

I've got a page that I want all the links off of (e.g. http://www.stephenfry.com/). I want to put all the links that are of the form http://www.stephenfry.com/WHATEVER into an array. What I've got now is just the following method: #!/usr/bin/perl…
0 votes, 1 answer

Update column values in an HTML file using HTML::TreeBuilder

I have an HTML file with several tables (all tables have the same number of columns and the same column names). The tables are separated by other HTML tags. For each row in each table, I would like to change the value of cell 1 and cell 3. This is what I have…
smithy
  • 581
  • 2
  • 6
  • 10
0 votes, 1 answer

Extracting Text in body that is not part of tag with HTML::TreeBuilder

I have some ugly HTML that is emailed to my program that looks like: Saved search results. Name: 'Some splunk…
Todd
  • 698
  • 6
  • 19
0 votes, 1 answer

HTML parser by perl script

#!/usr/bin/perl use HTML::TreeBuilder; my $tree = HTML::TreeBuilder->new; $tree->parse_file("sample.html"); foreach my $anchor ($tree->find("p")) { print $anchor->as_text, "\n"; } My code is not printing any output. $tree->find("p") is…
dreamer
  • 478
  • 1
  • 11
  • 24
0 votes, 1 answer

HTML::TreeBuilder->new_from_url() in perl not working

Using HTML::TreeBuilder->new_from_url(), I want to go to a website, say https://abc.com/index.html, and display some values from that HTML page. https://abc.com/index.html asks for user authentication (test/test123 are the username and password). I…
Cindrella
  • 1,671
  • 7
  • 27
  • 47