Downloading a file into memory

Question

I am writing a Python script and need only the second line of each of a series of very small text files. I would like to extract it without saving each file to my hard drive, as I currently do.

I have found a few threads that reference the tempfile and StringIO modules, but I was unable to make much sense of them.

Currently I download all of the files and name them sequentially (1.txt, 2.txt, etc.), then go through all of them and extract the second line. I would like to open each file, grab the line, then move on to finding, opening, and reading the next file.

Here is what I do currently with writing it to my HDD:

while (count4 <= num_files):
    file_p = [directory,str(count4),'.txt']
    file_path = ''.join(file_p)        
    cand_summary = string.strip(linecache.getline(file_path, 2))
    linkFile = open('Summary.txt', 'a')
    linkFile.write(cand_summary)
    linkFile.write("\n")
    count4 = count4 + 1
    linkFile.close()

I would be very interested in what tutorial/book you are using to learn Python so I can recommend you a different one. — Tim Pietzcker, Sep 30 '11 at 20:25

Tim Pietzcker · Answer 1 · 2011-09-30T20:29:00.353

You open and close the output file in every iteration.

Why not simply do

with open("Summary.txt", "w") as linkFile:
    while (count4 <= num_files):
        file_p = [directory,str(count4),'.txt']
        file_path = ''.join(file_p)        
        cand_summary = linecache.getline(file_path, 2).strip() # the string module's functions are deprecated; use the str method
        linkFile.write(cand_summary)
        linkFile.write("\n")
        count4 = count4 + 1

Also, linecache is probably not the right tool here since it's optimized for reading multiple lines from the same file, not the same line from multiple files.

Instead, it is better to do

with open(file_path, "r") as infile:
    dummy = infile.readline()
    cand_summary = infile.readline().strip()

Also, if you drop the strip() method, you don't have to re-add the \n, but who knows why you have that in there. Perhaps .lstrip() would be better?
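To make the difference between the three stripping methods concrete, here is a small sketch. The sample string is made up for illustration; it stands in for whatever readline() returns for the second line.

```python
# Hypothetical raw line, as readline() would return it (trailing newline included)
line = "  second line \n"

stripped = line.strip()         # removes whitespace on both sides, including "\n"
right_stripped = line.rstrip()  # removes only trailing whitespace, including "\n"
left_stripped = line.lstrip()   # removes only leading whitespace, keeps "\n"

print(repr(stripped))        # 'second line'
print(repr(right_stripped))  # '  second line'
print(repr(left_stripped))   # 'second line \n'
```

So .lstrip() keeps the newline that the output file needs, while .strip() and .rstrip() both remove it.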

Finally, what's with the manual while loop? Why not use a for loop?

Lastly, after your comment, I understand you want to put the result in a list instead of a file. OK.

All in all:

summary = []
for count in xrange(num_files):
    file_p = [directory,str(count),'.txt'] # or count+1, if you start at 1
    file_path = ''.join(file_p)        
    with open(file_path, "r") as infile:
        dummy = infile.readline()
        cand_summary = infile.readline().strip()
        summary.append(cand_summary)
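If the text comes from a web server rather than from disk, as the title suggests, the same two-readline idea should work on an in-memory file object. The following is only a sketch under some assumptions: it uses Python 3 (the code above is Python 2), the function names second_line and second_line_from_url are hypothetical, and the files are assumed to be UTF-8 text.

```python
import io
from urllib.request import urlopen


def second_line(stream, encoding="utf-8"):
    """Return the stripped second line of a binary file-like object."""
    stream.readline()  # skip the first line
    return stream.readline().decode(encoding).strip()


def second_line_from_url(url):
    """Read the second line straight from the HTTP response (hypothetical helper)."""
    # urlopen() returns a file-like response object, so nothing is saved to disk
    with urlopen(url) as response:
        return second_line(response)


# Demonstration without network access: io.BytesIO stands in for the response
fake_download = io.BytesIO(b"header\nsecond line\nthird\n")
result = second_line(fake_download)
print(result)  # second line
```

Calling second_line_from_url() once per URL and appending the result to summary would replace the download-then-reopen step, assuming the URLs are known in advance.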

I think the question is "how do I maintain the summary in memory without writing to summary.txt" — David Heffernan, Sep 30 '11 at 20:19
I must admit I'm not sure at all what the question is. The title is about "downloading", but there is no download at all in the code... — Tim Pietzcker, Sep 30 '11 at 20:23
The downloading part is in another part of the script, but David is correct, sorry for not explaining it better. There is a site that offers a file to download, I would rather not have to download the file then open it then grab the second line, I would like to know if there is a more direct way. — jimstandard, Sep 30 '11 at 20:26

score 0 · Answer 2 · answered Sep 30 '11 at 20:16

Just replace the file writing with a call to append() on a list. For example:

summary = []
while (count4 <= num_files):
    file_p = [directory,str(count4),'.txt']
    file_path = ''.join(file_p)        
    cand_summary = string.strip(linecache.getline(file_path, 2))
    summary.append(cand_summary)
    count4 = count4 + 1

As an aside, you would normally write count4 += 1. Also, it looks like count4 uses 1-based indexing, which is pretty unusual for Python.

David Heffernan


or use `for count4 in range(1, num_files + 1)` instead of incrementing yourself! – agf Sep 30 '11 at 20:19
@agf Agreed, but I can't be 100% sure that count4 runs from 1. – David Heffernan Sep 30 '11 at 20:24
