3

I just got a csv file that I want to load the dataset as dataframe using pandas. However, I'm kinda confused this data format.

Here is the sample of data for two lines:

Name=John, Gender=M, BloodType=A, Location=New York, Age=18
Name=Mary, Gender=F, BloodType=AB, Location=Seatle, Age=30

How do I load this dataset into dataframe with columns (Name, Gender, BloodType,...etc)?

I will appreciate that someone gives me tips to start with!

Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51
Jammy
  • 87
  • 1
  • 8

2 Answers2

5

Use the pandas read_csv method to read the csv file. Here is a sample program how you can do that:

import pandas as pd

data = pd.read_csv("path_to_csv_file")

print(data)
Dov Grobgeld
  • 4,783
  • 1
  • 25
  • 36
3

Use read_csv with header=None first:

import pandas as pd

temp=u"""Name=John,Gender=M,BloodType=A,Location=New York,Age=18
Name=Mary,Gender=F,BloodType=AB,Location=Seatle,Age=30"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), header=None)
print (df)
           0         1             2                  3       4
0  Name=John  Gender=M   BloodType=A  Location=New York  Age=18
1  Name=Mary  Gender=F  BloodType=AB    Location=Seatle  Age=30

Then DataFrame.apply with Series.str.split and select second lists, last change columns names:

df1 = df.apply(lambda x: x.str.split('=').str[1])
df1.columns = df.iloc[0].str.split('=').str[0].rename(None)
#if necessary
df1['Age'] = df1['Age'].astype(int)
print (df1)
   Name Gender BloodType  Location  Age
0  John      M         A  New York   18
1  Mary      F        AB    Seatle   30
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252