strings to column using python

Question

I have an entire table as a single string, like below:

a = "id;date;type;status;description\r\n1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n2;18-Feb-2019;cat2;active;customer is genuine\r\n"

The string contains escaped characters such as \xe9e, which need to be converted to plain text.

My expected output is to convert above string to a dataframe as below:

id  date         type  status  description
1   20-Jan-2019  cat1  active  customer is under observation
2   18-Feb-2019  cat2  active  customer is genuine

My code:

import pandas as pd

b = a.splitlines()
c = pd.DataFrame([sub.split(";") for sub in b])

I am getting the following output, but I need the first row as my header, and I also need to convert the ASCII codes to UTF-8 text.

        0   1           2           3   4                               5  6
    0   id  date        type    status  description                     None  None
    1   1   20-Jan-2019 cat1    active  customer is underée observation None  None
    2   2   18-Feb-2019 cat2    active  customer is genuine             None  None

Also, please note that it is creating extra columns with the value None, which should not be the case.

`c.columns = c.iloc[0]`, then `c = c.iloc[1:].reset_index(drop=True)` — Erfan, Oct 01 '19 at 07:45
Possible duplicate of [Convert row to column header for Pandas DataFrame,](https://stackoverflow.com/questions/26147180/convert-row-to-column-header-for-pandas-dataframe) — Erfan, Oct 01 '19 at 07:47
@Erfan I have missed one line to update here that it is creating extra columns with value None — Jahnab Kumar Deka, Oct 01 '19 at 07:58
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/200223/discussion-between-erfan-and-jahnab-kumar-deka). — Erfan, Oct 01 '19 at 09:46
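The approach from Erfan's first comment can be sketched as below, applied to the sample string from the question. This is only an illustration under a few assumptions: the header row is converted with `.tolist()` (not in the comment) so the column index does not carry a leftover name, and the stray None columns from the question do not appear here because the sample string has exactly five fields per line.

```python
import pandas as pd

a = "id;date;type;status;description\r\n1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n2;18-Feb-2019;cat2;active;customer is genuine\r\n"

# Same starting point as the question: one list of fields per line.
c = pd.DataFrame([sub.split(";") for sub in a.splitlines()])

# Promote the first row to the header, then drop it from the data.
c.columns = c.iloc[0].tolist()  # .tolist() is an addition to the comment
c = c.iloc[1:].reset_index(drop=True)

print(c)
```

All values, including id, are still strings at this point, and the \xe9e characters are left untouched.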

Jimmy TwoCents · Answer 1 · 2019-10-01T08:22:26.693

Here is a bit of a hacky answer, but given that your question isn't really clear, this should hopefully be sufficient.

 import re
 import pandas as pd

 a="id;date;type;status;description\r\n1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n2;18-Feb-2019;cat2;active;customer is genuine\r\n"

 b=re.split(r'\r\n|\r|\n',a.strip()) #split into lines; strip() avoids a trailing empty line

 b[1:]=[re.sub(r'\xe9e', '', line) for line in b[1:]]  #get rid of that \xe9e issue

 df=pd.DataFrame([line.split(';') for line in b[1:]]) #make the dataframe, one row per line
 ##list comprehension allows to generalize this if you add to string##
 df.columns=b[0].split(';') #split the title words for column names
 df['id']=df['id'].astype(int) #ids are parsed as strings, so convert them to integers
 df

This output is presumably what you meant by a dataframe:

Your code will be restricted to two rows as you have specifically mentioned b=re.split('; |\r|\n1|\n2|\n',a) but I need it for a large file — Jahnab Kumar Deka, Oct 01 '19 at 08:00
you should probably mention that in your question then...I've updated the solution however which should help you — Jimmy TwoCents, Oct 01 '19 at 08:08
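For larger inputs, one alternative is to let pandas do the parsing. The sketch below is not part of the answer above: it wraps the string in `io.StringIO` and hands it to `pd.read_csv` with `sep=";"`. It assumes that "\xe9e" is unwanted debris to be removed, as the expected output in the question suggests; if those characters are meaningful, the `replace` step should be dropped.

```python
import io

import pandas as pd

a = "id;date;type;status;description\r\n1;20-Jan-2019;cat1;active;customer is under\xe9e observation\r\n2;18-Feb-2019;cat2;active;customer is genuine\r\n"

# Assumption: the "\xe9e" sequence is debris, so it is simply removed.
cleaned = a.replace("\xe9e", "")

# read_csv takes the first line as the header and infers column types.
df = pd.read_csv(io.StringIO(cleaned), sep=";")

print(df)
```

Here the first row becomes the header automatically and id is parsed as an integer column. For data stored in a file, passing the file path to `pd.read_csv` with the same `sep=";"` should behave similarly.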
