I'm writing a web scraping Python script, and one thing I'd like it to do is take a snapshot of certain pages (all of the HTML, style sheets, and images needed to view that page properly offline). HTTrack seems like a good way to do that, and I thought I would be able to call it from within the Python script using

subprocess.call(["httrack", "http://www.example.com", "-O", "\tmp\example"])

But attempting this results in "FileNotFoundError: [WinError 2] The system cannot find the file specified". I've also tried giving it the full file path,

subprocess.call(["C:\Program Files\WinHTTrack\httrack.exe", "http://www.example.com", "-O", "\tmp\Example"])

but then I get the error "SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape".

I think the problem is that I don't understand subprocess correctly, since I can get HTTrack working from the Windows command prompt. Can anyone help me understand the correct way to use subprocess?

Empiromancer
    The `"\t"` in `"\tmp\example"` doesn't jump out at you at all? As to `\U`, it seems you're using Python 3 and aren't showing us the line with a string containing `"\U"` in position 2-3, such as `"C:\Users"`. Anyway, just use [r]aw strings to avoid this problem -- except if a path ends on a backslash, in which case use a regular string and escape each backslash with another backslash, such as `"C:\\"`. – Eryk Sun Jan 14 '16 at 06:49

1 Answer

Resolved thanks to eryksun's comment. It wasn't a problem with the subprocess syntax at all; I just wasn't being careful about escaping all of my backslashes. Putting an r prefix in front of those strings to make them raw strings fixed my code.
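
For anyone who lands here later, this is a minimal sketch of what the fixed call could look like. The executable path, URL, and `-O` output folder are the ones from the question; the helper names `build_httrack_command` and `take_snapshot` are just made up for illustration, and I'm assuming HTTrack is installed at that path.

```python
import subprocess

# Raw strings (r"...") keep every backslash literal, so "\t" and "\U"
# are not interpreted as escape sequences.
HTTRACK_EXE = r"C:\Program Files\WinHTTrack\httrack.exe"  # path from the question
OUTPUT_DIR = r"\tmp\example"  # output folder from the question


def build_httrack_command(url, output_dir=OUTPUT_DIR, exe=HTTRACK_EXE):
    """Return the argument list for HTTrack (hypothetical helper)."""
    return [exe, url, "-O", output_dir]


def take_snapshot(url):
    """Run HTTrack on `url` and return its exit code (hypothetical helper)."""
    return subprocess.call(build_httrack_command(url))


# In a regular string "\t" is a single tab character;
# in a raw string it stays a backslash followed by "t".
print(len("\tmp"), len(r"\tmp"))

# A raw string cannot end in a single backslash, so for a path like
# the drive root use a regular string with doubled backslashes.
DRIVE_ROOT = "C:\\"
print(len(DRIVE_ROOT))
```

Calling `take_snapshot("http://www.example.com")` should then start HTTrack with the same arguments that work from the command prompt. Building the argument list in a separate function makes it easy to print and check the strings before anything is actually launched.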

Empiromancer