
I have a txt file as follows:

sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]

I want to read this file and convert it into a dataframe like the one in the image (desired dataframe). Don't mind the values in the mean_e_field column; they are just an example. The values should be the same as in the txt file.

I tried the following and got a dataframe, but I can't transform it into my preferred one: data = pd.read_csv(filename, sep=",", header=None)

I appreciate your answers in advance.

gulo1221
  • You said that you've tried using `read_csv`, but the example data you've provided is not in a csv format (in fact, it seems like YAML). Is your data presented in the format above, i.e. one line per column and a list of values? – filpa Jan 10 '23 at 16:05
  • Yes, my data is a txt file with each list on a separate line. I want to convert it to a dataframe where the first element of each line is the column name and the others are the row values. With read_csv in pandas I could automatically convert my txt file into a dataframe, but the dataframe I got is different from the one I want. – gulo1221 Jan 10 '23 at 16:11

1 Answer


So, several things here.

Your previous attempt, data = pd.read_csv(filename, sep=",", header=None), did not work because you told it to split on commas, so it treats every line as a row and splits that line at each comma. For example, sub_ID: [ 'sub-01','sub-02' ] is split into the two cells sub_ID: [ 'sub-01' and 'sub-02' ].
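To see this split concretely, here is a minimal sketch that runs the same read_csv call on the question's data. The data is inlined through io.StringIO instead of a file purely so the example runs on its own. Blank lines are skipped by default, so each of the three data lines becomes one row with two cells.

```python
import io

import pandas as pd

# The question's data, inlined so the example runs without a file.
text = """sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]
"""

# Same call as in the question: split every line on commas, no header row.
df = pd.read_csv(io.StringIO(text), sep=",", header=None)

# Each line is a row; the key, the brackets and the quotes all stay inside the cells.
print(df)
```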

The example data you've provided seems to be in YAML format:

sub_ID: [ 'sub-01','sub-02' ]

ses_ID: [ 'ses-01','ses-01' ]

mean: [ 0.3456,0.446 ]

If it were CSV, the data would look as follows (it does not):

sub_ID,ses_ID,mean
sub-01,ses-01,0.3456
sub-02,ses-01,0.446
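Data in this CSV layout would load directly with read_csv, because the first line becomes the header and each following line becomes one row. A small sketch, using the values from the question and io.StringIO in place of a real file:

```python
import io

import pandas as pd

# CSV layout: a header line, then one row per subject/session.
csv_text = """sub_ID,ses_ID,mean
sub-01,ses-01,0.3456
sub-02,ses-01,0.446
"""

# read_csv takes column names from the first line and splits each row on commas.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```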

To read this data into a dataframe, you will either need to preprocess it into another format (e.g. csv) or read it as YAML into a dict and pass that to pandas.DataFrame.

For example:

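import yaml

with open("data.txt", "r") as file:
    try:
        # safe_load parses the YAML document into a plain dict of lists.
        data = yaml.safe_load(file)
    except yaml.YAMLError as exc:
        # Stop here: without parsed data there is nothing to build a DataFrame from.
        raise SystemExit(f"Could not parse data.txt as YAML: {exc}")

print(data)
# {'sub_ID': ['sub-01', 'sub-02'], 'ses_ID': ['ses-01', 'ses-01'], 'mean': [0.3456, 0.446]}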

After that, you can create a DataFrame from this dict:

import pandas as pd

# Each dict key becomes a column; each list supplies that column's values.
df = pd.DataFrame(data)
df.head()


+-----+--------+--------+--------+
|     | sub_ID | ses_ID |  mean  |
+-----+--------+--------+--------+
|   0 | sub-01 | ses-01 | 0.3456 |
|   1 | sub-02 | ses-01 |  0.446 |
+-----+--------+--------+--------+

as desired.

If you have certain entries that are not valid YAML, you will need to preprocess the data before loading it into pandas.
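One possible preprocessing step, sketched under the assumption that every non-blank line has the form name: [ ... ] and that each list is a valid Python literal, is to split each line at its first colon and evaluate the list with ast.literal_eval. This is a swapped-in technique rather than part of the YAML approach above, and parse_key_list_lines is a hypothetical helper name.

```python
import ast
import io

import pandas as pd


def parse_key_list_lines(lines):
    """Parse lines like `name: [v1, v2, ...]` into a dict of lists.

    Hypothetical helper: assumes one `name: [...]` pair per non-blank line
    and that the list part is a valid Python literal.
    """
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        # Split only at the first colon so the list contents are untouched.
        key, _, value = line.partition(":")
        # literal_eval safely evaluates Python literals (lists, strings, numbers).
        data[key.strip()] = ast.literal_eval(value.strip())
    return data


# The question's data; with a real file, pass the open file object instead.
text = """sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]
"""

data = parse_key_list_lines(io.StringIO(text))
df = pd.DataFrame(data)
print(df)
```

With a file on disk, the same helper can be called as parse_key_list_lines(f) inside a with open("data.txt") as f: block, since iterating a file yields its lines.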

filpa