
I am trying to convert a Python DataFrame to a MATLAB (.mat) file.

I start from a text file containing an EEG signal, which I import with `pandas.read_csv`:

```python
import pandas as pd

MyDataFrame = pd.read_csv("data.txt", sep=';', decimal='.')
```

data.txt is a 2D array with column labels, and this creates the expected DataFrame.

To convert it to .mat, I tried an existing solution whose idea is to convert the DataFrame into a dictionary of lists, but even after trying every variation of it, the conversion still fails:

```python
import scipy.io

scipy.io.savemat('EEG_data.mat', {'struct': MyDataFrame.to_dict("list")})
```

This does create a .mat file, but the DataFrame is not saved properly: essentially all the values are missing, and the few labels that remain are empty when you open them in MATLAB.

I also tried mat4py, which is designed to export Python data structures to MATLAB files, but that did not work either. I don't understand why, because according to the mat4py documentation, converting the DataFrame to a dictionary of lists is exactly what should be done.

ArnoBen
  • This looks like a useful contribution, but can you rewrite it into a question and then supply your answer as an answer? Also it might be helpful to see some example data. – nekomatic Feb 22 '18 at 16:33
  • Of course, I will do this tonight when I get back home. – ArnoBen Feb 22 '18 at 16:47
  • Cool. When I needed to do this conversion the dictionary-of-lists approach worked for me, but clearly there is something different about your data and/or your environment. What versions of python, scipy and MATLAB are you using? – nekomatic Feb 23 '18 at 13:40
  • I'm using Python 3.6, MATLAB R2017b and the version of scipy shipped with Anaconda, which I installed recently (last week). It may come either from my data or from changes in recent versions, I honestly don't know. [Here is what the first lines of my raw data look like.](https://imgur.com/qemBcI6) – ArnoBen Feb 23 '18 at 13:53
  • If you could share that data extract in text form, I'd be happy to play around with it to see if I get the same problem and if so to try and figure out what's going on. – nekomatic Feb 23 '18 at 15:06
  • With pleasure; let me know if you find anything interesting! I should warn you, though, that it's quite a large file (1.7 GB). [Here you go](https://wetransfer.com/downloads/7d2db09280ab59fb73f15939884fc3e720180223151447/4a44f2). This is the raw, untouched file. Of course, while looking for a solution I only used a small fraction of this dataset, such as the first 10 lines, since it seemed to be a structural problem. – ArnoBen Feb 23 '18 at 15:22

2 Answers


I believe that the reason the previous solutions haven't worked for you is that your DataFrame column names are not valid MATLAB struct field names, because they contain spaces and/or start with digit characters.
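MATLAB's rule for variable and field names (as described for `isvarname`) is: start with a letter, then use only letters, digits and underscores, up to `namelengthmax` characters (63 in recent releases). A small checker along those lines, sketched under that assumption, lets you list the offending columns before saving; the name `is_valid_matlab_field` is hypothetical.

```python
import re

# Assumed MATLAB naming rule: an ASCII letter first, then letters, digits or underscores.
_FIELD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MATLAB_NAMELENGTHMAX = 63  # value returned by namelengthmax in recent MATLAB releases


def is_valid_matlab_field(name):
    """Return True if `name` looks usable as a MATLAB struct field name."""
    return (
        isinstance(name, str)
        and len(name) <= MATLAB_NAMELENGTHMAX
        and _FIELD_RE.fullmatch(name) is not None
    )
```

For example, `[c for c in MyDataFrame.columns if not is_valid_matlab_field(c)]` lists the columns that would need renaming.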

When I do:

```python
import pandas as pd
import scipy.io

MyDataFrame = pd.read_csv('eeg.txt', sep=';', decimal='.')
truncDataFrame = MyDataFrame[0:1000]  # reduce data size for test purposes
scipy.io.savemat('EEGdata1.mat', {'struct1': truncDataFrame.to_dict("list")})
```

the result in MATLAB is a struct with the 4 fields reltime, datetime, iSensor and quality. Each of these has 1000 elements, so the data from these columns has been converted, but the rest of your data is missing.

However if I first rename the DataFrame columns:

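```python
# prefix 'col_' so every name starts with a letter, and replace spaces with underscores;
# reassigning (instead of inplace=True on a slice) avoids pandas' SettingWithCopyWarning
truncDataFrame = truncDataFrame.rename(columns=lambda x: 'col_' + x.replace(' ', '_'))
scipy.io.savemat('EEGdata2.mat', {'struct2': truncDataFrame.to_dict("list")})
```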

the result in MATLAB is a struct with 36 fields. This is not the same format as your mat4py solution but it does contain (as far as I can see) all the data from the source DataFrame.
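The lambda above only handles spaces and a leading digit. If other column labels contain characters such as `-` or `.`, or exceed MATLAB's length limit, a more general mapping can be built first. This is a sketch under the same naming-rule assumption (letter first, then letters, digits, underscores, at most 63 characters); `make_matlab_fields` and its `prefix` parameter are hypothetical.

```python
import re

MATLAB_NAMELENGTHMAX = 63  # assumed limit, matching MATLAB's namelengthmax


def make_matlab_fields(columns, prefix="col_"):
    """Map each column label to a unique, MATLAB-safe struct field name.

    Characters other than ASCII letters, digits and underscores become '_',
    names that don't start with a letter get `prefix` (which must itself start
    with a letter), names are truncated to the length limit, and collisions
    are resolved with a numeric suffix.
    """
    mapping = {}
    used = set()
    for col in columns:
        name = re.sub(r"[^A-Za-z0-9_]", "_", str(col))
        if not name or not name[0].isalpha():
            name = prefix + name
        name = name[:MATLAB_NAMELENGTHMAX]
        candidate, n = name, 1
        while candidate in used:
            suffix = f"_{n}"
            candidate = name[:MATLAB_NAMELENGTHMAX - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        mapping[col] = candidate
    return mapping
```

Unlike the lambda, this leaves already-valid names such as `reltime` unchanged: `truncDataFrame = truncDataFrame.rename(columns=make_matlab_fields(truncDataFrame.columns))`.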

(Note that in your question, you are creating a .mat file that contains a variable called struct and when this is loaded into MATLAB it masks the builtin struct datatype - that might also cause issues with subsequent MATLAB code.)

nekomatic
  • Yes, this is exactly what I saw: these 4 fields and not the remaining data. The spaces and digits should indeed explain this behaviour. I manually removed these characters (for the sake of it) and it worked normally. It's a file given to me by my superior, so I'll warn him about this formatting. Thank you immensely for your help. – ArnoBen Feb 28 '18 at 15:24

I finally found a solution thanks to a related post. There, the poster did not create a dictionary of lists but a dictionary of integers, which also worked on my side. It is a small, easily reproducible example. I then tried manually adding lists, with values such as [1, 2], and it did not work. What did work was manually adding tuples!

MyDataFrame needs to be converted to a dictionary; if a dictionary of lists doesn't work, try a dictionary of tuples.

For beginners: lists are written with square brackets, e.g. [1, 2], and tuples with parentheses, e.g. (1, 2).
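The dictionary of tuples can also be built explicitly, one entry per column, which makes the intended structure visible. This is an alternative spelling, not the exact call used below, and it assumes mat4py treats the result the same way; `columns_as_tuples` is a hypothetical helper that accepts a DataFrame or any plain mapping of column name to sequence.

```python
def columns_as_tuples(frame):
    """Return {column_name: tuple_of_values} for a DataFrame or a plain mapping."""
    result = {}
    for name, values in frame.items():
        # Series.tolist() turns NumPy scalars into built-in Python types,
        # which are assumed to be safer for file writers such as mat4py.
        if hasattr(values, "tolist"):
            values = values.tolist()
        result[name] = tuple(values)
    return result
```

Usage would look like `mp.savemat('EEGdata.mat', {'structs': columns_as_tuples(MyDataFrame)})`; the column names still need to be valid MATLAB field names.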

This worked for me:

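```python
import mat4py as mp
import pandas as pd
MyDataFrame = pd.read_csv("data.txt", sep=';', decimal='.')
# apply tuple() to each column, then convert the result to a dictionary
EEGdata = MyDataFrame.apply(tuple).to_dict()
mp.savemat('EEGdata.mat', {'structs': EEGdata})
```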

EEGdata.mat should now be readable by MATLAB, as it is on my machine.

nekomatic
ArnoBen
  • I assume your `mp.savemat` is the same thing as `scipy.io.savemat`? – nekomatic Feb 26 '18 at 16:46
  • It's actually not. In my case, since the export did not work with scipy.io.savemat, I used mat4py (0.4.0), a Python tool specifically designed for this kind of task. It consists of only two functions: savemat and loadmat. – ArnoBen Feb 27 '18 at 13:42
  • This is a valid alternative solution to the accepted one because what you get in MATLAB this way may be a more convenient format for accessing the data by row - `mystruct(345)` contains all fields from the 345th row of the DataFrame, and so on. – nekomatic Feb 28 '18 at 17:37
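The row-oriented layout mentioned in this comment can be sketched in plain Python: each row becomes one dict, so element `k` of the resulting list plays the role of `mystruct(k + 1)` in MATLAB, whose indexing starts at 1. For a DataFrame, pandas' `DataFrame.to_dict("records")` produces the same list of row dictionaries. Whether a given writer stores such a list as a MATLAB struct array is an assumption to check after loading the file; `columns_to_rows` is a hypothetical name.

```python
def columns_to_rows(columns):
    """Turn {name: sequence} (column-oriented) into a list of {name: value} rows."""
    names = list(columns)
    lengths = {len(columns[n]) for n in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) walks the columns in parallel, yielding one tuple per row
    return [dict(zip(names, row)) for row in zip(*(columns[n] for n in names))]
```

With this layout, `rows[344]` in Python corresponds to `mystruct(345)` in MATLAB.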