Questions tagged [httrack]

HTTrack (Website copier)

HTTrack is a free and open-source web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

HTTrack allows users to download World Wide Web sites from the Internet to a local computer. By default, HTTrack arranges the downloaded site by the original site's relative link structure. The downloaded (or "mirrored") website can be browsed by opening a page of the site in a browser.

HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is configurable by options and by filters (include/exclude), and has an integrated help system. There is a basic command line version and two GUI versions (WinHTTrack and WebHTTrack); the former can be part of scripts and cron jobs.

HTTrack uses a Web crawler to download a website. Some parts of the website may not be downloaded by default, because HTTrack honors the robots exclusion protocol unless that behavior is disabled in its options. HTTrack can follow links that are generated with basic JavaScript and inside applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.
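The command-line version mentioned above can be used in scripts and cron jobs. A minimal sketch, assuming a placeholder URL and output directory: `-O` sets the output path, `-s0` disables robots.txt handling, and `+`/`-` patterns are include/exclude filters.

```shell
# Mirror a site into ./example-mirror (URL and filter pattern are placeholders)
httrack "https://example.com/" -O ./example-mirror "+*.example.com/*" -s0

# Later, refresh the existing mirror from inside the project directory
cd ./example-mirror && httrack --update
```

The same `--update` invocation can be placed in a crontab entry to keep a mirror current.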

References:

http://www.httrack.com/

http://en.wikipedia.org/wiki/HTTrack

74 questions
0
votes
1 answer

Remove domain URL from website downloaded by HTTrack

I have downloaded a full website with HTTrack, but after downloading, every URL contains the site's domain name, e.g. www.example.com/index.html instead of index.html. Is there any way to remove this URL prefix?
akib
  • 5
  • 4
0
votes
0 answers

What blocks the crawling of my website by HTTrack or Wget?

I am attempting to clone my website to show it offline for a presentation. However, I tried both HTTrack and Wget, and both stop at the second level of the source tree. What could be the reason? This is the Wget cmd: wget -r…
Baldráni
  • 5,332
  • 7
  • 51
  • 79
0
votes
1 answer

Node.js get HTTP_USER_AGENT and Block HTTrack

I want to block all bots (like HTTrack) on my website. Normally, I would use an .htaccess file to block bots via RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]. However, my server is running Node.js Express. How can I get HTTP_USER_AGENT and do a…
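For the Node.js case, the core of the check is just a case-insensitive substring match on the User-Agent header (in Express you would read `req.headers['user-agent']` and respond with 403 on a match). The matching logic can be sketched in shell with a hypothetical user-agent string:

```shell
# Hypothetical user-agent string; HTTrack identifies itself similarly by default
ua="Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"

# Case-insensitive match, equivalent to RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
if printf '%s' "$ua" | grep -qi 'httrack'; then
  echo "blocked"
else
  echo "allowed"
fi
```

Note that the User-Agent header is trivially spoofable, so this only stops crawlers that announce themselves.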
0
votes
1 answer

Download .torrent from YTS

Is it possible to download all torrent files from the YTS website? In HTTrack I get a mirror error, probably caused by the captcha that you need to enter before accessing the site. Is there a way to bypass this or use another method?
dcf007
  • 21
  • 1
  • 5
0
votes
1 answer

Trying to mirror site that uses strapdown.js

There is a site that uses strapdown.js that I am trying to mirror using HTTrack or wget, but I fall short because the site contains Markdown rather than HTML; only strapdown converts the links to HTML links. Hence the client needs to interpret…
Buddy
  • 86
  • 7
0
votes
1 answer

httrack only downloads the index.html file

Usually when I download sites with HTTrack I get all the files: images, CSS, JS, etc. Today, the program finished downloading in just 2 seconds and only grabbed the index.html file, with the CSS, IMG code, etc. inside still linking to external resources. I've already…
user3379220
  • 11
  • 1
  • 4
0
votes
1 answer

How do I push the result of this complex command-line grep statement to a MySQL database?

This code searches through website html files and extracts a list of domain names... httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | grep -iEo '[[:alnum:]-]+\.(com|net|org)' The result looks like…
Wyatt Jackson
  • 303
  • 1
  • 2
  • 11
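The grep stage of the pipeline above can be tried on a small sample without running HTTrack; the table and column names in the generated SQL below are assumptions for illustration, not from the question:

```shell
# Extract bare domain names from sample HTML, then emit INSERT statements
# that could be piped into: mysql -u user -p dbname
echo '<a href="http://www.example.com/">a</a> <a href="http://cdn.test.net/x">b</a>' \
  | grep -iEo '[[:alnum:]-]+\.(com|net|org)' \
  | sort -u \
  | while read -r domain; do
      printf 'INSERT INTO domains (name) VALUES ("%s");\n' "$domain"
    done
```

Note that the regex only captures the last two labels before the TLD, so www.example.com yields example.com.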
-1
votes
1 answer

How to prevent Laravel websites from being copied using HTTrack

How can I prevent Laravel websites from being copied using HTTrack or other software? Thank you.
-1
votes
1 answer

Downloading website with wget or httrack yields error "301 Moved Permanently"

I am trying to download https://untools.co/ with wget or httrack, but I repeatedly get the error "301 Moved Permanently". I get the main page downloaded, but once I open index.html in a browser and try to click on some of the links I get…
Make42
  • 12,236
  • 24
  • 79
  • 155
-1
votes
1 answer

How do I find the directory structure and the file names under a PHP website?

How do I get the directory structure and file names of a PHP website I do not own? Not the code, just the structure and the file names. I tried HTTrack, but since it's a PHP website, it doesn't work.
user12871659
-1
votes
1 answer

Different source code in inspect and in view-source code

While I was looking at the source code of a website, it showed me some random-looking JS code in the body block in view-source, like the following: