2

I'm working on a Python scraper/spider and hit a URL that exceeds the filename character limit, producing the IOError in the title. I'm using httplib2, and when I attempt to retrieve the URL I get a "File name too long" error. I prefer to keep all of my projects within my home directory since I am using Dropbox. Is there any way around this issue, or should I just set up my working directory outside of home?

DrewK
  • It would really help to show the actual error and traceback (and what you were calling at the time). – abarnert Feb 15 '13 at 00:43
  • Also, IIRC, `httplib2` by default doesn't save files; it gives you a `content` object that you can do whatever you want with (like, say, building a local filename out of the URL, response headers, etc. and saving the `content` to that file). Which means that the error would be coming from code that you wrote, and could just write differently… – abarnert Feb 15 '13 at 00:46
  • Error raised by Python: IOError: [Errno 36] File name too long: '.cache/www.example.com'. The URL is exceeding the filename limit. I am only running a "GET" request using httplib2 and receive that error. If I run the .py file outside of my home directory, no problem. After some similar errors I saw an example of someone with the same problem because it was in an encrypted home directory; he had no solution other than moving the .py file outside of the home directory. I hate the idea that the only way to fix this is moving it outside of the home directory. – DrewK Feb 15 '13 at 00:51
  • OK, the problem is that the _cache_ files have filenames too long. Moving the script out of the home directory isn't going to help; you'd have to change its cache directory. Give me a second to look up more details. – abarnert Feb 15 '13 at 00:55
  • @DrewK please show the code. – wRAR Feb 15 '13 at 00:57

3 Answers

6

You are probably hitting a limitation of the encrypted file system (eCryptfs), which allows at most 143 characters in a file name.

Here is the bug: https://bugs.launchpad.net/ecryptfs/+bug/344878

The solution for now is to use a directory outside your encrypted home directory. To check whether your home directory is encrypted, run:

mount | grep ecryptfs

and see if your home directory is listed. If it is, either use a directory outside home or create a new home directory without encryption.

yǝsʞǝla
2

The fact that the filename that's too long starts with '.cache/www.example.com' explains the problem.

httplib2 optionally caches requests that you make. You've enabled caching, and you've given it .cache as the cache directory.

The easy solution is to put the cache directory somewhere else.

Without seeing your code, I can't tell you exactly what to change, but it should be trivial. The documentation for FileCache shows that it takes a dir_name as the first parameter.
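As a minimal sketch of moving the cache, assuming the system temporary directory is not on the encrypted filesystem: the helper name `make_cache_dir` and the directory name `httplib2-cache` are made up for illustration, and the `httplib2` call is left as a comment so the snippet runs without the library installed.

```python
import os
import tempfile


def make_cache_dir(name="httplib2-cache"):
    """Create (if needed) and return a cache directory under the system temp dir.

    Hypothetical helper; assumes the temp dir lives outside the encrypted home.
    """
    path = os.path.join(tempfile.gettempdir(), name)
    os.makedirs(path, exist_ok=True)
    return path


cache_dir = make_cache_dir()

# With httplib2 installed, pass this path instead of ".cache":
#   import httplib2
#   h = httplib2.Http(cache_dir)
```

Any directory on an unencrypted filesystem would do; the temp dir is just a convenient default here.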

Alternatively, you can pass a `safe` function that generates a filename from the URI, overriding the default. That would allow you to generate filenames that fit within the 143-character limit of Ubuntu's encrypted filesystem.
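One way such a function might look, as a sketch rather than anything httplib2 ships: `short_safename` is a hypothetical name, and the scheme (a readable host prefix plus a SHA-256 digest) is an assumption chosen only to keep names well under the 143-character limit cited in the first answer.

```python
import hashlib
import re

MAX_NAME_LEN = 143  # eCryptfs filename limit cited in the first answer


def short_safename(uri):
    """Return a short, filesystem-safe filename for a URI (hypothetical helper)."""
    if isinstance(uri, bytes):
        uri = uri.decode("utf-8", "replace")
    # Readable prefix: the host part, with unsafe characters replaced.
    host = re.sub(r"^\w+://", "", uri).split("/", 1)[0]
    prefix = re.sub(r"[^A-Za-z0-9.-]", "_", host)[:40]
    # The SHA-256 hex digest (64 characters) keeps names distinct per URI.
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return prefix + "," + digest  # at most 40 + 1 + 64 = 105 characters


long_uri = "http://www.example.com/search?q=" + "x" * 500
name = short_safename(long_uri)
assert len(name) <= MAX_NAME_LEN
```

With httplib2 installed, this could be wired in as `httplib2.Http(httplib2.FileCache(".cache", safe=short_safename))`.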

Or, alternatively, you can create your own object with the same interface as FileCache and pass that to the Http object to use as a cache. For example, you could use tempfile to create random filenames, and store a mapping of URLs to filenames in an anydbm or sqlite3 database.
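A rough sketch of such a cache object, assuming the interface is the three methods `get`, `set`, and `delete`. It simplifies the idea above: instead of random temp files plus a URL-to-filename mapping, it stores the cached bytes directly in a single SQLite file, so no URL-derived filenames are created at all. The class name `SqliteCache` is hypothetical.

```python
import sqlite3


class SqliteCache:
    """Key/value cache exposing get/set/delete, backed by one SQLite file."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )
        self.conn.commit()

    def get(self, key):
        # Return the cached value, or None on a miss.
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def set(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def delete(self, key):
        self.conn.execute("DELETE FROM cache WHERE key = ?", (key,))
        self.conn.commit()


cache = SqliteCache(":memory:")  # use a real file path for a persistent cache
cache.set("http://www.example.com/", b"cached response")
```

With httplib2 installed, an instance would be passed as `httplib2.Http(cache)`. The connection is used from a single thread here; a multithreaded spider would need one connection per thread or some locking.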

A final alternative is to just turn off caching, of course.

abarnert
  • OK sweet, that makes perfect sense. Seems super obvious now; I'm new to Ubuntu and was having some issues with that. Thanks! – DrewK Feb 16 '13 at 23:37
2

As you have apparently passed '.cache' to the httplib2.Http constructor, you should change this to something more appropriate or disable the cache.

wRAR