
I have written some Python code to read data from a specific page of a PDF file and write it to a CSV file. It only does its job partially: when it writes the data to the CSV file, it puts everything on a single line instead of one line per entry. How should I modify my script to fix this? Thanks in advance.

Here is what I've tried so far:

import csv
from PyPDF2 import PdfFileReader

outfile = open("conversion.csv",'w', newline='')
writer = csv.writer(outfile)

infile = open('some.pdf', 'rb')
reader = PdfFileReader(infile)
contents = reader.getPage(7).extractText().split('\n')
writer.writerow(contents)

print(contents)
infile.close()

The data in the PDF look like this:

Creating a PivotTable Report 162
PivotCaches 165
PivotTables Collection 165
PivotFields 166
CalculatedFields 170

But the CSV output I'm getting looks like this:

Creating a PivotTable Report 162 PivotCaches 165 PivotTables Collection 165 PivotFields 166 CalculatedFields 170
SIM
  • I did that and ran it, but it's still being written on a single line. – SIM Aug 17 '17 at 17:49
  • [writerow**s**](https://docs.python.org/3/library/csv.html#csv.csvwriter.writerows)?? – wwii Aug 17 '17 at 17:52
  • You also should close `outfile`, or you may get an incomplete file. Or use [context managers](http://eigenhombre.com/2013/04/20/introduction-to-context-managers) – Paulo Almeida Aug 17 '17 at 17:55
  • Thanks wwii for your solution, you're very close. Now the data are written on multiple lines, but with each letter in its own cell. – SIM Aug 17 '17 at 17:58
  • What is `content`? [mcve] – wwii Aug 17 '17 at 18:09
  • @Shahin You are getting text and splitting by `\n`, which gets you rows. What do you want the columns to be? A single column with a line of text? As it is, `writerows` is iterating through every line and putting each letter in a separate column. In short, you need a list of lists. – Paulo Almeida Aug 17 '17 at 18:11
  • I'm startled to see that my question got a downvote. Either I couldn't describe the problem properly, or it is hard to solve. However, I can't see what else would make my question clearer than the way I asked it. Thanks anyway. – SIM Aug 17 '17 at 18:21
  • @Shahin I didn't downvote, but what was unclear was your input, and what is still unclear is your desired output. But whatever it is, the solution is now very clear. You have to split your lines however you want them to go in the CSV file (I would guess "text,number") and then `writerows` (or `writerow` each row separately after splitting it). – Paulo Almeida Aug 17 '17 at 18:26
  • Thanks Paulo Almeida for your detailed answer. Gonna check it out and let you know. Btw, that wasn't meant for you. – SIM Aug 17 '17 at 18:42
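Paulo Almeida's suggestion (split each line into "text,number", then call `writerows`) can be sketched as below. This is a minimal sketch, assuming that the page number is always the last space-separated token of a line; `split_entry` is a hypothetical helper name, and the sample lines are the ones shown in the question. Lines that don't end in a number are kept whole in a single cell.

```python
import csv
import io


def split_entry(line):
    """Split 'Some title 162' into ['Some title', '162'].

    Hypothetical helper: assumes the page number is the last
    space-separated token on the line.
    """
    text, _, number = line.rpartition(" ")
    if text and number.isdigit():
        return [text, number]
    # No trailing page number: keep the whole line in one cell.
    return [line]


lines = [
    "Creating a PivotTable Report 162",
    "PivotCaches 165",
    "PivotTables Collection 165",
    "PivotFields 166",
    "CalculatedFields 170",
]

# StringIO stands in for the real output file here.
buffer = io.StringIO()
writer = csv.writer(buffer)
# writerows expects an iterable of rows; each row is a list of fields.
writer.writerows(split_entry(line) for line in lines)
print(buffer.getvalue())
```

Each input line becomes one CSV row with two columns, e.g. `PivotCaches,165`.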

3 Answers


For this specific code:

Since `contents` is a list of lines, write each line as its own row:

contents = reader.getPage(7).extractText().split('\n')
for each in contents:
    writer.writerow([each])  # a bare string would be split into one character per cell

print(contents)

Try this and let me know.
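The "each letter in its own cell" output mentioned in the comments comes from how `writerow` treats its argument: it iterates over it to get the fields, and iterating over a `str` yields single characters. A small sketch (using `io.StringIO` in place of a real file) showing the difference between passing a string and passing a one-item list:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

line = "PivotCaches 165"
# A str is an iterable of characters, so each character becomes a field.
writer.writerow(line)
# Wrapping the string in a list makes the whole line a single field.
writer.writerow([line])

rows = buffer.getvalue().splitlines()
print(rows[0])  # P,i,v,o,t,C,a,c,h,e,s, ,1,6,5
print(rows[1])  # PivotCaches 165
```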

vintol
  • Upon running your code the error I'm getting in the console is: writer.writerow(content+"\n") TypeError: can only concatenate list (not "str") to list – SIM Aug 17 '17 at 18:02
  • let me know how this goes – vintol Aug 17 '17 at 19:11

Suppose you have

>>> print(s)
Line 1
Line 2
Line 3
Line 4

Or a representation of that string:

>>> s
'Line 1\nLine 2\nLine 3\nLine 4'

If you split by `\n`, the line endings are no longer there:

>>> s.split('\n')
['Line 1', 'Line 2', 'Line 3', 'Line 4']

So if you write each line to a file in turn, you get one long line:

>>> with open('/tmp/file', 'w') as f:
...    for line in s.split('\n'):
...       f.write(line)
... 
# will write 'Line 1Line 2Line 3Line 4'

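So you need to add the line endings back when you write to the file. With a plain file object, join the lines with `\n`. With a `csv.writer`, write each line as its own row instead: the writer ends every row with its own line terminator, whereas passing one joined string to `writerow` would put each character in a separate cell:

f.write('\n'.join(contents))  # plain file object
writer.writerows([line] for line in contents)  # csv.writer, assuming contents is a list of strings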

You should also either use a context manager (the `with` statement used above) or close the file explicitly; otherwise you may get only a partial write.
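Putting the context manager and one-row-per-line writing together, here is a minimal sketch that writes a few sample lines to a CSV file in a temporary directory and reads them back. The sample lines and the temporary path are illustrative assumptions; in the question the lines would come from the PDF page and the file would be `conversion.csv`.

```python
import csv
import os
import tempfile

contents = ["PivotCaches 165", "PivotFields 166", "CalculatedFields 170"]

# Illustrative output location; the question writes to "conversion.csv".
path = os.path.join(tempfile.mkdtemp(), "conversion.csv")

# newline='' lets the csv module control the line endings itself.
with open(path, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    # One single-field row per line; the writer appends its own
    # line terminator ('\r\n' by default) after every row.
    writer.writerows([line] for line in contents)
# Leaving the with-block has flushed and closed the file.

with open(path, newline="") as infile:
    rows = list(csv.reader(infile))

print(rows)  # [['PivotCaches 165'], ['PivotFields 166'], ['CalculatedFields 170']]
```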

dawg

This is the solution I was after:

import csv
from PyPDF2 import PdfFileReader

outfile = open("conversion.csv",'w',newline='')
writer = csv.writer(outfile)

infile = open('some.pdf', 'rb')
reader = PdfFileReader(infile)
contents = reader.getPage(15).extractText().split('\n')
for each in contents:
    writer.writerow([each])  # one-item list: the whole line goes into a single cell

infile.close()
outfile.close()

As vintol's answer was very close to the output I was looking for, I'm gonna accept it as the answer.
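The script above uses PyPDF2's older `PdfFileReader`, `getPage()` and `extractText()` names, which recent releases deprecate (PyPDF2 development has since moved to the `pypdf` package). A minimal sketch of the same job, assuming `pypdf` is installed (`pip install pypdf`) and using its `PdfReader`, `.pages` and `.extract_text()`; the helper names `page_lines` and `lines_to_csv` are hypothetical, and both files are handled with context managers:

```python
import csv


def page_lines(pdf_path, page_index):
    """Return the text lines of one PDF page (hypothetical helper).

    Assumes the `pypdf` package; imported here so that lines_to_csv
    can be used without it.
    """
    from pypdf import PdfReader

    with open(pdf_path, "rb") as infile:
        reader = PdfReader(infile)
        return reader.pages[page_index].extract_text().split("\n")


def lines_to_csv(lines, csv_path):
    """Write each line as a one-cell row (hypothetical helper)."""
    with open(csv_path, "w", newline="") as outfile:
        csv.writer(outfile).writerows([line] for line in lines)
```

With the file names and page index from the script above, usage would be `lines_to_csv(page_lines("some.pdf", 15), "conversion.csv")`.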

SIM