I have an entire table as a single string, like below:

a = "id;date;type;status;description\r\n1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n2;18-Feb-2019;cat2;active;customer is genuine\r\n"

Inside the string there are some non-ASCII escape sequences such as \xe9e, so the string has to be cleaned up so that they do not appear in the text.
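For reference, a small sketch of what that sequence looks like to Python. It assumes the data is an ordinary str and that the goal is simply to drop those characters; the names s and cleaned are only for illustration.

```python
s = "under\xe9e observation"

# "\xe9" is one character (é, code point U+00E9); the "e" after it is a plain letter.
print(len("\xe9e"))                       # 2
print([hex(ord(ch)) for ch in "\xe9e"])   # ['0xe9', '0x65']

# Assumption: removing both characters gives the text wanted in the output.
cleaned = s.replace("\xe9e", "")
print(cleaned)                            # under observation
```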

My expected output is the string above converted to a dataframe like this:

id date        type status description
1  20-Jan-2019 cat1 active customer is under observation
2  18-Feb-2019 cat2 active customer is genuine

My code:

import pandas as pd
b = a.splitlines()  # one list entry per line of the table
c = pd.DataFrame([sub.split(";") for sub in b])

I am getting the following output, but I need the first row as my header, and I also need the non-ASCII characters converted to plain text.

        0   1           2           3   4                               5  6
    0   id  date        type    status  description                     None  None
    1   1   20-Jan-2019 cat1    active  customer is underée observation None  None
    2   2   18-Feb-2019 cat2    active  customer is genuine             None  None

Also, please note that it is creating extra columns with the value None, which should not be the case.

1 Answer

Here is a bit of a hacky answer, but given that your question isn't really clear, this should hopefully be sufficient.

 import re
 import pandas as pd

 a="id;date;type;status;description\r\n1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n2;18-Feb-2019;cat2;active;customer is genuine\r\n"

 b=a.splitlines() #split into lines; handles \r\n and leaves no empty entries

 b[1:]=[re.sub(r'\xe9e', '', line) for line in b[1:]]  #get rid of that \xe9e issue

 df=pd.DataFrame([line.split(';') for line in b[1:]]) #make the dataframe from the data rows
 ##list comprehension allows to generalize this if you add to string##
 df.columns=b[0].split(';') #split the title words for column names
 df['id']=df['id'].astype(int) #ids come out as strings, so make them integers
 df

This should give what you presumably meant by a dataframe: the first row becomes the header and the \xe9e characters are removed.
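A less hacky alternative, as a minimal sketch: let pandas parse the string itself with read_csv on an io.StringIO buffer. This assumes the stray \xe9e should just be dropped, and that the extra None columns in your real data come from trailing ; delimiters (a guess on my part, since the sample string doesn't contain any).

```python
import io

import pandas as pd

a = ("id;date;type;status;description\r\n"
     "1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n"
     "2;18-Feb-2019;cat2;active;customer is genuine\r\n")

# Assumption: the "\xe9e" pair is noise and can simply be removed.
# Line endings are normalized to "\n" to keep the parsing step simple.
cleaned = a.replace("\xe9e", "").replace("\r\n", "\n")

# read_csv uses the first line as the header by default.
df = pd.read_csv(io.StringIO(cleaned), sep=";")

# Assumption: columns that are entirely empty are artifacts of trailing
# delimiters, so drop them. With the sample string this changes nothing.
df = df.dropna(axis=1, how="all")

print(df)
```

With this approach the id column is parsed as integers automatically, so no separate conversion step is needed.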

Jimmy TwoCents