I do something rather similar, actually.
By using GreaseMonkey, you can write a user-script that will interact with the pages however you need. You can get the next page link and scroll things as you like.
You can also store any data locally within Firefox, through functions called GM_getValue and GM_setValue.
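For example, a minimal sketch of that (untested; the @name and @include values are placeholders, and note that newer GreaseMonkey/Tampermonkey releases use the asynchronous GM.getValue/GM.setValue instead):

// ==UserScript==
// @name        Link collector
// @include     https://example.com/*
// @grant       GM_getValue
// @grant       GM_setValue
// ==/UserScript==

// GM_* values are simple types (string/int/bool), so serialize objects yourself.
var savedLinks = JSON.parse(GM_getValue('savedLinks', '[]'));
savedLinks.push({ url: location.href });
GM_setValue('savedLinks', JSON.stringify(savedLinks));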
I take the lazy way out. I just generate a long list of the URLs that I find while navigating the pages, then do a crude document.write and dump my list of URLs out as a batch file that runs wget. At that point I copy-and-paste the batch file and run it.
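Roughly like this (the URLs are placeholders, and savedUrls stands in for whatever list you have actually collected):

// Build one "wget" line per collected URL, then dump the lot as plain text.
var savedUrls = ['http://example.com/file1.zip', 'http://example.com/file2.zip'];
var batch = savedUrls.map(function (u) {
    return 'wget "' + u + '"';
}).join('\n');
// document.write wipes the page, but that's fine -- we only want the text
// so it can be copied into a .bat / shell script.
document.write('<pre>' + batch + '</pre>');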
If you need to run this often enough that it should be automated, there used to be a way to turn GreaseMonkey scripts into Firefox extensions, which have access to more power.
Another option is, as far as I know, currently Chrome-only: you can collect whatever information you need and build a large file from it, then use the download attribute of a link to save it with a single click.
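A minimal sketch of that approach, assuming you already have a savedLinks array (the file name and MIME type are just examples):

// Pack the collected data into a Blob and offer it as a one-click download.
var payload = JSON.stringify(savedLinks, null, 2);
var blob = new Blob([payload], { type: 'application/json' });
var a = document.createElement('a');
a.href = URL.createObjectURL(blob);
a.download = 'links.json';          // the "download" attribute names the saved file
a.textContent = 'Save collected links';
document.body.appendChild(a);
// Or trigger the save immediately:
// a.click();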
Update
I was going to share the full code for what I was doing, but it was so tied to a particular website that it wouldn't really have helped -- so I'll go for a more "general" solution.
Warning: this code was typed on the fly and may not be entirely correct.
// Define the container.
// If you are crawling multiple pages, you'd want to load this from
// localStorage instead of starting empty.
var savedLinks = [];

// Walk through the document and collect the links.
for (var i = 0; i < document.links.length; i++) {
    var link = document.links[i];
    var data = {
        url: link.href,
        desc: link.textContent.trim()
    };
    savedLinks.push(data);
}

// Here you'd want to save your data via localStorage.

// If not on the last page, find the 'next' button and load the next page.
// [load next page here]

// If we *are* on the last page, use document.write to output our list.
//
// Note: document.write totally destroys the current document. It really is
// quite an ugly way to do it, but in this case it works.
document.write(JSON.stringify(savedLinks, null, 2));
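For completeness, the localStorage and next-page steps I hand-waved in the comments could look something like this (the a.next selector is site-specific, so treat it as a placeholder):

// Merge this page's links with anything saved from earlier pages.
var stored = JSON.parse(localStorage.getItem('savedLinks') || '[]');
var all = stored.concat(savedLinks);
localStorage.setItem('savedLinks', JSON.stringify(all));

// Follow the site's "next" link if there is one; otherwise dump everything.
var next = document.querySelector('a.next');   // placeholder selector
if (next) {
    location.href = next.href;
} else {
    localStorage.removeItem('savedLinks');     // done -- clear the stash
    document.write('<pre>' + JSON.stringify(all, null, 2) + '</pre>');
}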