
I split my data into a training set and a test set for machine learning like this:

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

Now I want to save each set to its own file, 'train_data' and 'test_data', so that I can access them as './train_data' and './test_data'.

But I don't know how. I found `to_csv`, but I don't think it works for this, because when I ran

test_df.to_csv('./test_data')

I got the error `IsADirectoryError: [Errno 21] Is a directory: './test_data'`. What should I do?

  • In which format would you like to save the data, though? CSV works well for spreadsheet-like data, where every row (entry) has the same set of columns (data points). – Marius Feb 01 '21 at 10:11
  • Thank you for replying, Mr Marius. OK, then CSV is good. But when I tried test_df.to_csv('./test_data') I got the error IsADirectoryError: [Errno 21] Is a directory: './test_data' – 西山功一 Feb 01 '21 at 10:14
  • That's because `./test_data` already exists as a directory, and a directory can't be opened as a file. Give the CSV a file name of its own, for example with an extension like `test_data.csv`. I will answer with the correct code. – Marius Feb 01 '21 at 11:23
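A minimal sketch reproducing the error and two ways around it, assuming a small hypothetical `test_df` in place of the real split and a temporary folder in place of the working directory. The exact exception depends on the OS (IsADirectoryError on Linux/macOS, typically PermissionError on Windows), so the sketch catches their common base class `OSError`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for the real test split.
test_df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "label": [0, 1, 0]})

with tempfile.TemporaryDirectory() as workdir:
    # Recreate the situation from the question: a *directory* called test_data already exists.
    folder = os.path.join(workdir, "test_data")
    os.makedirs(folder)

    error_name = None
    try:
        test_df.to_csv(folder)  # asks the OS to open a directory as a writable file
    except OSError as exc:  # IsADirectoryError on Linux/macOS, PermissionError on Windows
        error_name = type(exc).__name__

    # Fix 1: write to a file whose name differs from the directory's.
    sibling = os.path.join(workdir, "test_data.csv")
    test_df.to_csv(sibling)

    # Fix 2: keep test_data as a folder and put the CSV inside it.
    inner = os.path.join(folder, "test_data.csv")
    test_df.to_csv(inner)

    written = [os.path.isfile(sibling), os.path.isfile(inner)]

print(error_name, written)
```

The second option keeps the './test_data' folder the question asks for, with the actual data in './test_data/test_data.csv'.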

1 Answer

import pandas as pd
from sklearn.model_selection import train_test_split

# create or load your DataFrame here
df = ...

train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

# Pass a file path that includes an extension; pandas opens and closes the file itself
train_data.to_csv('train_data.csv')
test_data.to_csv('test_data.csv')
# you can optionally add an encoding, e.g. encoding="utf-8" (pandas' default when writing to a path)
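To check that the saved file loads back unchanged, here is a small round-trip sketch. It assumes a hypothetical three-row `test_data` whose index is shuffled the way `train_test_split` leaves it; `to_csv` writes that index as the first, unnamed column, so `read_csv` needs `index_col=0` to restore it.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for one half of the split; train_test_split shuffles rows,
# so the index is no longer 0, 1, 2, ...
test_data = pd.DataFrame({"feature": [4.0, 1.0, 3.0], "label": [1, 0, 1]}, index=[3, 0, 2])

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "test_data.csv")
    test_data.to_csv(path)  # the index is written as the first, unnamed column
    restored = pd.read_csv(path, index_col=0)  # turn that column back into the index

# Raises AssertionError if values, dtypes or the index changed on the round trip.
pd.testing.assert_frame_equal(test_data, restored)
print(restored.index.tolist())  # [3, 0, 2]
```

If the original row labels aren't needed, passing `index=False` to `to_csv` omits that column, and the file can then be read back without `index_col`.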

Marius