Is there e.g. a crawler that can find (and list the form action etc.) all pages that have forms in my site?
I'd like to log all pages with unique actions to then audit further.
Is there e.g. a crawler that can find (and list the form action etc.) all pages that have forms in my site?
I'd like to log all pages with unique actions to then audit further.
Norconex HTTP Collector is an open source web crawler that can certainly help you. Its "Importer" module has a "TextBetweenTagger" feature to extract text between any start and end text and store it in a metadata field of your choice. You can then filter out those that have no such text extracted (look at the EmptyMetadataFilter option for this).
You can do this without writing code. As far as storing the results, the product uses "Committers". A few committers are readily available (including a filesystem one), but you may want to write your own to "commit" your crawled data wherever you like (e.g. in a database).
Check its configuration page for ideas.