
I'm using the httplib library in Python 2.7 to fetch the headers from a website so that I can establish a) the file size of a download and b) the last-modified date of the file. I've checked with some online tools and both details are present in the headers.

My script appears to work correctly and brings back the required information. However, the response containing the header information is a list of tuples. A sample of the response is below:

[('content-length', '2501479'),
 ('accept-ranges', 'bytes'),
 ('vary', 'Accept-Encoding'),
 ('server', 'off'),
 ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
 ('etag', '"2c8171a-262b67-4afb368edfffc"'),
 ('date', 'Thu, 20 Oct 2011 16:01:11 GMT'),
 ('content-type', 'text/plain')]

What I'm looking to do is extract just the file size ("2501479") and the date ("Thu, 20 Oct 2011 04:30:01 GMT"). Any ideas how I can go about this? I originally tried variable[0], but this returns ('content-length', '2501479'). How can I return only the file size (in theory, the second element of the first tuple in the list)?

phihag
thefragileomen

5 Answers


First, you can make it a little easier to work with by turning your list of tuples into a dictionary:

>>> headers = [('content-length', '2501479'),
...  ('accept-ranges', 'bytes'),
...  ('vary', 'Accept-Encoding'),
...  ('server', 'off'),
...  ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
...  ('etag', '"2c8171a-262b67-4afb368edfffc"'),
...  ('date', 'Thu, 20 Oct 2011 16:01:11 GMT'),
...  ('content-type', 'text/plain')]
>>> 
>>> headers = dict(headers)
>>> int(headers['content-length'])
2501479

For the date, I would parse it with the email.utils.parsedate function, which returns a time tuple rather than a datetime object (the same call works on headers['last-modified']):

>>> import email.utils
>>> email.utils.parsedate(headers['date'])
(2011, 10, 20, 16, 1, 11, 0, 1, -1)
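
To go from that tuple to an actual datetime object, one option is to pass its first six fields to the datetime constructor. This is only a minimal sketch: it assumes the value is the last-modified string from the question and that it is in GMT, as HTTP dates normally are. The names last_modified, parsed and modified_at are just for illustration.

```python
import datetime
import email.utils

last_modified = 'Thu, 20 Oct 2011 04:30:01 GMT'

# parsedate returns a 9-item time tuple, or None if the string cannot be parsed
parsed = email.utils.parsedate(last_modified)

# The first six items are year, month, day, hour, minute, second
modified_at = datetime.datetime(*parsed[:6])
print(modified_at)
```

The resulting datetime is naive (it carries no timezone information), so treat it as GMT when comparing it with other times.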
jterrace

First, convert the tuples into a dict, and then convert the value to int to get a number:

response_tuples = [('content-length', '2501479'), ('accept-ranges', 'bytes')]
response = dict(response_tuples)
try:
    content_length = int(response['content-length'])
except KeyError:
    raise  # Handle missing content-length here
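
As one possible way to fill in that handler, here is a sketch that falls back to None when the header is absent. The sample response without a content-length entry is made up for illustration, and whether None is the right fallback depends on what your script does next.

```python
# Hypothetical response that lacks a content-length header
response_tuples = [('accept-ranges', 'bytes'), ('content-type', 'text/plain')]
response = dict(response_tuples)

try:
    content_length = int(response['content-length'])
except KeyError:
    content_length = None  # the server did not report a size

if content_length is None:
    print('File size unknown')
else:
    print('File size: %d bytes' % content_length)
```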
phihag

You simply have to index it again to reach the element inside the tuple, like this:

length = variable[0][1]
last_mod = variable[4][1]

for size and the date of last modification.

Note: This only works when the indices of content-length and last-modified are always the same.
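
If the order cannot be relied on, the list can instead be searched by header name. The following is a rough sketch; find_header is a hypothetical helper, not part of httplib, and it assumes the names are already lowercase as in the sample response.

```python
variable = [('content-length', '2501479'),
            ('accept-ranges', 'bytes'),
            ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
            ('content-type', 'text/plain')]

def find_header(pairs, name):
    """Return the value of the first header called `name`, or None."""
    for key, value in pairs:
        if key == name:
            return value
    return None

length = find_header(variable, 'content-length')
last_mod = find_header(variable, 'last-modified')
```

This gives the same values as the positional lookup for the sample above, but keeps working if the server returns the headers in a different order.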

naeg

You've got tuples inside a list. Luckily, you can index into them (or dereference them, depending on your terminology) the same way.

So v = x[0] will give you, as you state, the tuple ('content-length', '2501479'); v[0] will give you 'content-length' and v[1] will give you '2501479' (although you probably want to call int(v[1]) on that, perhaps with some error checking).
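
A small sketch of that indexing and conversion, assuming x holds the list from the question (shortened here) and that a non-numeric value should simply be treated as unknown:

```python
x = [('content-length', '2501479'), ('accept-ranges', 'bytes')]

v = x[0]          # the first tuple in the list
name, value = v   # unpack it into its two parts

try:
    size = int(value)
except ValueError:
    size = None   # the header was present but was not a valid integer

print(name)
print(size)
```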

You may be better off putting that list into a dict, though, so you can be certain you are getting the content length even if the order should ever change.

Thankfully, the syntax is almost the same: it uses the [] operator. However, I am going to leave it to you to look at the Python documentation to see how to convert a list into a dict (can't do everything for you!)

Richard Green
mas = [('content-length', '2501479'),
 ('accept-ranges', 'bytes'),
 ('vary', 'Accept-Encoding'),
 ('server', 'off'),
 ('last-modified', 'Thu, 20 Oct 2011 04:30:01 GMT'),
 ('etag', '"2c8171a-262b67-4afb368edfffc"'),
 ('date', 'Thu, 20 Oct 2011 16:01:11 GMT'),
 ('content-type', 'text/plain')]
mas = dict(mas)
# .get() returns the value as a string, or None if the header is missing
mas.get('content-length')
pod2metra