I am creating a Python program that searches the web for specific pieces of text from a source file to see whether they exist online, i.e. whether they are plagiarized. I currently use a combination of Google and BeautifulSoup to find websites that likely contain the text (just as if I had searched Google for the string), but I am at a loss for how to actually search each individual website for it. I am looking for suggestions on how to search for the string of text on each website. (I am currently pretty new to this aspect of Python.) Any advice is appreciated! Thanks for your time!
Viewed 258 times
-
I think BeautifulSoup has methods to extract the text of an HTML page. Otherwise, you can simply traverse the HTML body and extract the text recursively. – Nikos M. May 04 '20 at 22:19
-
Possible duplicate: https://stackoverflow.com/questions/23380171/using-beautifulsoup-to-extract-text-without-tags – Nikos M. May 04 '20 at 22:20
-
Did you find a good way to do it? Would like to know! – Mr.J Apr 24 '22 at 22:53
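For what it's worth, the idea from the first comment (pull the visible text out of each page, then look for the string in it) might be sketched roughly as follows. This is only a minimal sketch under a few assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so it has no dependencies (BeautifulSoup's `get_text()` should serve the same purpose), the names `html_to_text`, `page_contains`, and `url_contains` are made up for illustration, and the `User-Agent` header is a guess at what some sites may require. Matching is an exact substring test after lowercasing and collapsing whitespace, so it will miss paraphrased or lightly edited text.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def normalize(text):
    """Lowercase the text and collapse every whitespace run to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def html_to_text(html):
    """Return the normalized text of an HTML document (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks are joined with a space so adjacent blocks do not run together;
    # the trade-off is that a word split by an inline tag will not match.
    return normalize(" ".join(parser.chunks))


def page_contains(html, phrase):
    """True if the normalized phrase occurs in the page's normalized text."""
    return normalize(phrase) in html_to_text(html)


def url_contains(url, phrase, timeout=10):
    """Download a page and test it for the phrase (needs network access)."""
    # Assumption: a browser-like User-Agent; adjust or drop as needed.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return page_contains(html, phrase)
```

You would call `url_contains(url, phrase)` once for each result URL that the Google step returns. Pages that build their content with JavaScript will not expose that text this way, and a fuzzy comparison (for example with `difflib` from the standard library) may be a better fit if near-matches matter.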