As an example:
I load in the input from a .txt:
Benjamin,Schuvlein,Germany,1912,M,White
I do some code that I will not post here for brevity and get to the link:
https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ
- I want to scrape multiple things from that page. In the code below, I only do 1.
- I'd also like to make each item be separated by a , in the output .txt.
- And, I'd like the output to be preceded by the input.
I'm using the following packages in the code:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use Data::Dumper;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
use HTML::DOM;
Here's the relevant code:
my $ua = LWP::UserAgent->new;
open(my $o, '>', 'out2.txt') or die "Can't open output file: $!";
# Here is the url, although in practice, it is scraped itself using different code
my $url = 'https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ';
print "My URL is <$url>\n";
my $request = HTTP::Request->new(GET => $url);
$request->push_header('Content-Type' => 'application/json');
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
my $dom_tree = new HTML::DOM;
$dom_tree->write($response->content);
$dom_tree->close;
my $str = $dom_tree->getElementsByTagName('table')->[0]->getElementsByTagName("td")->[10]->as_text();
print $str;
print $o $str;
Desired Output (from that link) is something like:
Benjamin,Schuvlein,Germany,1912,M,White,Queens,New York,Married,Same Place,Head, etc ....
(How much of that output section is scrapable?)
Any help on how to get the link within the link would be much appreciated!