
I have text data that looks like this:

3,"a","b","e","r"\n4,"1","2","5","7"\n4,"23","45","76","76"

I want to transform it into a table like this:

a  b  e  r
1  2  5  7
23 45 76 76

I've tried to use a pandas DataFrame for that, but the data is quite big (about 40 MB). What should I do to solve this? Sorry for my bad explanation; I hope you can understand what I mean. Thanks!

import os
import pandas as pd
from io import StringIO
a = pd.read_csv(StringIO("12test.txt"), sep=",", header=None, error_bad_lines=False)
df = pd.DataFrame([row.split('.') for row in a.split('\n')])


print(df)

I've tried this, but it doesn't work. I ran into several problems: errors like "'DataFrame' object has no attribute 'split'", a DataFrame that contains the string "12test.txt" instead of the data inside the file, memory problems, etc.

U13-Forward
Kirana

2 Answers


Try:

>>> import pandas as pd
>>> s = '3,"a","b","e","r"\n4,"1","2","5","7"\n4,"23","45","76","76"'
>>> pd.DataFrame([[x.strip('"') for x in i.split(',')[1:]] for i in s.splitlines()[1:]], columns=[x.strip('"') for x in s.splitlines()[0].split(',')[1:]])
    a   b   e   r
0   1   2   5   7
1  23  45  76  76
>>> 

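This uses list comprehensions to split each line on commas, drop the leading number, and strip the double quotes from every field. The first line supplies the column names and the remaining lines become the rows of the pandas.DataFrame.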
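A list comprehension is a compact way to write a loop that builds a list. The sketch below rewrites the same parsing as ordinary for loops and checks that it yields the same rows as the comprehension form. Like the answer, it assumes no quoted field contains a comma, since a plain split(",") would break such a field apart; the values stay strings.

```python
import pandas as pd

s = '3,"a","b","e","r"\n4,"1","2","5","7"\n4,"23","45","76","76"'
lines = s.splitlines()

# Plain-loop version: build one list of cleaned fields per data line
rows = []
for line in lines[1:]:
    cleaned = []
    for field in line.split(",")[1:]:   # [1:] drops the leading number
        cleaned.append(field.strip('"'))  # remove the surrounding quotes
    rows.append(cleaned)

# The same loops written as a (nested) list comprehension
rows_lc = [[field.strip('"') for field in line.split(",")[1:]] for line in lines[1:]]
assert rows == rows_lc

# The first line provides the column names, cleaned the same way
header = [field.strip('"') for field in lines[0].split(",")[1:]]
df = pd.DataFrame(rows, columns=header)
print(df)
```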

U13-Forward
  • Thank you so much for the answer. I tried it, but this error occurred: "pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 0". Sorry, but what is a list comprehension? – Kirana Dec 16 '22 at 04:14

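To parse data that is already held in a string, wrap it in io.StringIO so read_csv can treat it like a file. Removing the number at the start of every line after the first leaves those rows with an empty first field, so after reading, the first column (headed "3") contains only NaN values and can be dropped.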

import io
import re

import pandas as pd

s = '3,"a","b","e","r"\n4,"1","2","5","7"\n4,"23","45","76","76"'
# Remove the number that starts each data line (the header's "3" is kept)
s = re.sub(r'\n[0-9]+', "\n", s)

df = pd.read_csv(io.StringIO(s))

# Drop the first column (headed "3"), which contains only NaN values;
# drop() returns a new DataFrame, so the result must be assigned
df = df.drop(columns=df.columns[0])
print(df)
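A possibly simpler variant, sketched under the assumption that the file separates rows with real line breaks (not literal backslash-n characters) and that every line starts with a number to discard: because all lines have the same number of fields, read_csv can parse them as-is, so the regex step may be unnecessary and the first column can be dropped by position. The helper name load_table is hypothetical. Passing a file path to read_csv, rather than StringIO("12test.txt"), makes pandas read the file's contents instead of the filename string.

```python
import io

import pandas as pd


def load_table(source):
    """Parse the quoted, comma-separated data and drop the leading number column.

    Hypothetical helper: `source` may be a file path or a file-like object,
    anything pandas.read_csv accepts. Assumes real line breaks between rows
    and that every line starts with a number that should be discarded.
    """
    # Every line has the same field count, so read_csv parses it directly;
    # the first header field ("3" in the sample) becomes the first column name.
    df = pd.read_csv(source)
    # Keep every column except the first, selected by position.
    return df.iloc[:, 1:]


sample = '3,"a","b","e","r"\n4,"1","2","5","7"\n4,"23","45","76","76"'
df = load_table(io.StringIO(sample))
print(df)
```

With a file on disk, the call would be load_table("12test.txt"). If the table ever grows too large for memory, read_csv also accepts a chunksize parameter that yields the data as a sequence of smaller DataFrames that can be processed one at a time.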
user11717481