I have the following sample data:

YYYYMM q1 q2 q3 q4 q5 q6 q7 q8 q9 q0 d1 d2 d3 d4 d5
197501  2 11 12 26 25 10 29 21 30 22  8  7 14  4 13
197502 27 22  8 20  6 26 21  4 19  9 10  1 11 12 23
197503  8  7 21 22 25  9  4 30  2 19 10 11 28 12 27
197504 29 28 27 17 19  2 30 16 18  3  9 10 11  8 13
197505 11 15 12 31 28 24  1 30 13 18  5  6 16  7 20
197506 24 10 27  8 23 28 25 26  9 22  2 12 29 30  1

I read it with:

df1 = pd.read_csv("Qdays_Ddays.docx", low_memory=False)  # error_bad_lines=False)

This raises an error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

How can I fix this?

Prater
  • https://stackoverflow.com/questions/53256091/how-can-i-fix-error-tokenizing-data-on-pandas-csv-reader – ddejohn Mar 11 '22 at 05:32
  • `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte` – Prater Mar 11 '22 at 05:53
  • Microsoft Word files are not plain text files. Save your data as a plain text file. – tdy Mar 11 '22 at 05:57
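
The plain text route suggested in the comments can be sketched roughly as follows. This is only an illustration under assumptions: the data is taken to have been saved as whitespace-separated plain text, and an in-memory `io.StringIO` buffer stands in for that file (a name such as `Qdays_Ddays.txt` would be hypothetical). Only the first two data rows of the sample are used.

```python
import io

import pandas as pd

# Stand-in for the contents of a plain text export of the data
# (assumption: the columns are separated by runs of spaces, as in the sample).
sample = """YYYYMM q1 q2 q3 q4 q5 q6 q7 q8 q9 q0 d1 d2 d3 d4 d5
197501  2 11 12 26 25 10 29 21 30 22  8  7 14  4 13
197502 27 22  8 20  6 26 21  4 19  9 10  1 11 12 23
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter,
# so the uneven spacing between the numbers does not matter.
df1 = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(df1)
```

With a real text file, the path would be passed to `pd.read_csv` in place of the `io.StringIO` object.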

1 Answer

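You can't read a .docx file with pandas: read_csv expects plain text, and a Word document is a zipped archive of XML files, which is why the tokenizer fails. You can, however, extract the text with python-docx: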

import docx  # provided by the python-docx package
import pandas as pd

# Open the Word document
doc = docx.Document("test.docx")

# Collect the text of each paragraph in the file
result = [p.text for p in doc.paragraphs]
print(result)

# Convert the list of paragraph strings to a single-column DataFrame
df = pd.DataFrame(result)

# If you need a dict instead, choose the orientation:
df.to_dict('series')
# or
df.to_dict('split')
# or
df.to_dict('records')
# or
df.to_dict('index')
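
Note that pd.DataFrame(result) puts each paragraph into a single string column. If the goal is one column per field, as in the question's sample, the paragraph strings could be split on whitespace first. The following is a minimal sketch, not a definitive implementation: `paragraphs_to_frame` is a hypothetical helper, and it assumes the data sits in paragraphs (not in a Word table), that the first non-empty paragraph is the header, and that every value is numeric. The `result` list here is hard-coded from the question's sample in place of `[p.text for p in doc.paragraphs]`.

```python
import pandas as pd


def paragraphs_to_frame(paragraphs):
    """Turn whitespace-separated paragraph strings into a numeric DataFrame.

    Hypothetical helper: the first non-empty paragraph is used as the header.
    """
    # Skip empty paragraphs and split the rest on runs of whitespace.
    rows = [p.split() for p in paragraphs if p.strip()]
    header, data = rows[0], rows[1:]
    df = pd.DataFrame(data, columns=header)
    # The split values are strings, so convert every column to numbers.
    return df.apply(pd.to_numeric)


# Stand-in for the list produced by [p.text for p in doc.paragraphs].
result = [
    "YYYYMM q1 q2 q3 q4 q5 q6 q7 q8 q9 q0 d1 d2 d3 d4 d5",
    "197501  2 11 12 26 25 10 29 21 30 22  8  7 14  4 13",
    "",
    "197502 27 22  8 20  6 26 21  4 19  9 10  1 11 12 23",
]

df = paragraphs_to_frame(result)
print(df)
```

If the document stores the data in a Word table instead, the paragraphs will not contain it and the cells would have to be read from `doc.tables`.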
Gaston Alex