
I am trying to export a pandas DataFrame to an .arff file to use it in Weka. I have seen that the liac-arff module can be used for that purpose. According to the documentation, it seems I have to use arff.dump(obj, fp), but I am struggling with obj (a dictionary), which I'm guessing I have to build myself. How do you suggest doing that properly for a big dataset (3,000,000 rows and 95 columns)? Is there an example you can provide of exporting a pandas DataFrame to an .arff file using Python (v2.7)?

mina

3 Answers


First install the package: $ pip install arff

Then use in Python:

import arff  # the "arff" package from PyPI, not liac-arff (both import as arff)

arff.dump('filename.arff', df.values,
          relation='relation name',
          names=df.columns)

Where df is of type pandas.DataFrame. Voila.
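For reference, the file either package writes is plain text: an @RELATION line, one @ATTRIBUTE line per column, then @DATA followed by comma-separated rows. The sketch below renders only the header from (name, type) pairs, assuming numeric columns are declared NUMERIC and nominal ones as a list of allowed values. It also shows why a relation name containing a space, such as 'relation name', has to be quoted. quote and arff_header are hypothetical helpers, not functions of either package, and the exact quoting style a given library emits may differ.

```python
def quote(name):
    """Quote an ARFF name if it is empty or contains spaces/special characters."""
    special = set(" \t,{}%'\"")
    if name and not any(ch in special for ch in name):
        return name
    # Escape backslashes and single quotes, then wrap in single quotes
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def arff_header(relation, attributes):
    """Render an ARFF header from (name, type) pairs.

    type is either a string such as 'NUMERIC' or a list of nominal values.
    """
    lines = ["@RELATION " + quote(relation), ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: {value1,value2,...}
            kind = "{" + ",".join(quote(str(v)) for v in kind) + "}"
        lines.append("@ATTRIBUTE " + quote(name) + " " + kind)
    lines += ["", "@DATA"]
    return "\n".join(lines) + "\n"


print(arff_header("relation name", [("x", "NUMERIC"), ("target", ["0", "1"])]))
```

This prints @RELATION 'relation name', then @ATTRIBUTE x NUMERIC and @ATTRIBUTE target {0,1}, and finally the @DATA marker.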

Pero

This is how I did it recently using the liac-arff package. Even though the arff package is easier to use, it doesn't let you define column types or the values of categorical attributes.

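import arff  # liac-arff: pip install liac-arff
import pandas as pd

df = pd.DataFrame(...)
t = df.columns[-1]  # the class column; must be defined before it is used below
attributes = [(c, 'NUMERIC') for c in df.columns.values[:-1]]
attributes += [('target', df[t].unique().astype(str).tolist())]
# Nominal values must be strings, so the class value is converted with str()
data = [df.iloc[i].values[:-1].tolist() + [str(df[t].iloc[i])]
        for i in range(df.shape[0])]

arff_dic = {
    'attributes': attributes,
    'data': data,
    'relation': 'myRel',
    'description': ''
}

# open(..., encoding=...) requires Python 3
with open("myfile.arff", "w", encoding="utf8") as f:
    arff.dump(arff_dic, f)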

Values of categorical attributes such as target must be of type str, even if they are numbers.
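Since the obj passed to arff.dump is just a dictionary, building it can be factored into a small function. The sketch below works on plain Python rows so it runs without pandas; with a DataFrame you could pass list(df.columns) and df.itertuples(index=False, name=None), which is an assumption about how you feed it, not part of liac-arff. Columns named in nominal become nominal attributes whose values are converted to str, with None left as a missing value; every other column is declared NUMERIC. build_arff_dict is a hypothetical name, and like the answers here it keeps all rows in memory.

```python
def build_arff_dict(columns, rows, nominal=(), relation="myRel"):
    """Build the dictionary that liac-arff's arff.dump(obj, fp) takes.

    columns: attribute names; rows: iterable of row sequences;
    nominal: names of columns to declare as nominal (values become str).
    """
    rows = [list(r) for r in rows]
    nominal_idx = {i for i, c in enumerate(columns) if c in nominal}
    for row in rows:
        for i in nominal_idx:
            if row[i] is not None:  # keep None so it is written as missing
                row[i] = str(row[i])  # nominal values must be strings
    attributes = []
    for i, c in enumerate(columns):
        if i in nominal_idx:
            # Distinct values in first-seen order (dicts keep order in Python 3.7+)
            values = list(dict.fromkeys(r[i] for r in rows if r[i] is not None))
            attributes.append((c, values))
        else:
            attributes.append((c, "NUMERIC"))
    return {"relation": relation, "description": "",
            "attributes": attributes, "data": rows}


obj = build_arff_dict(["x", "y", "target"], [(1.5, 2, 0), (0.5, 3, 1)],
                      nominal={"target"})
print(obj["attributes"])
```

The resulting dictionary can then be written with arff.dump(obj, f) exactly as in the answer.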

M . Franklin

Inspired by @M. Franklin's answer, which didn't work very well as written, but the idea was there.

import arff  # liac-arff
import pandas as pd

df = pd.DataFrame(...)  # your DataFrame
# int64/float64 columns become NUMERIC; any other column becomes nominal,
# declared with its distinct values converted to strings
attributes = [(j, 'NUMERIC') if df[j].dtypes in ['int64', 'float64']
              else (j, df[j].unique().astype(str).tolist())
              for j in df]

arff_dic = {
  'attributes': attributes,
  'data': df.values,
  'relation': 'myRel',
  'description': ''
}

with open("myfile.arff", "w", encoding="utf8") as f:
  arff.dump(arff_dic, f)

The snippet above writes an ARFF file in the expected format. Good luck out there!

Simon Provost