This code searches through website html files and extracts a list of domain names...
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | grep -iEo '[[:alnum:]-]+\.(com|net|org)'
The result looks like this.
- domain1.com
- domain2.com
- domain3.com
I plan to use this code on very large websites, therefore this will generate a very large list of domain names. In addition, the above code generates a lot of duplicate domain names. Therefore, I setup a mysql database with a unique field so duplicates will not be inserted.
Using my limited knowledge of programming I hacked together this line below, but this is not working. When I execute the command, I get no error, just a new command prompt of > and a blinking cursor. I assume I'm not using the correct syntax or methodology, and/or maybe what I want to do is not possible via command line. Any help is much appreciated.
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | domain=“$(grep -iEo '[[:alnum:]-]+\.(com|net|org)’)” | mysql -pPASSWORD -e "INSERT INTO domains.domains (domains) VALUES ($domain)”
And yes, my database name is domains, and my table name is domains, and my field name is domains.