
Whenever I try to read a .mat file in Python, I get the following error message:

ValueError: Unknown mat file type, version 9, 99

The particular dataset I want to open is here, named "GSE137764_HCT_GaussiansGSE137764_mooth_scaled_autosome.mat".

I have tried to follow the multiple solutions available here, but I keep getting the same error. In particular, I do not use MATLAB, so I can't really do save('myfile.mat','-v7'), for example. Any ideas?

sam wolfe

1 Answer


It is not a MATLAB file at all. It is a "normal" tab-separated CSV file, just not named like one, so pandas can read it directly:

import pandas as pd
df = pd.read_csv("GSE137764_HCT_GaussiansGSE137764_mooth_scaled_autosome.mat", delimiter="\t", low_memory=False)
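
If you want to check this kind of thing yourself before picking a reader, here is a minimal sketch. It assumes that MAT-files of version 5 and later begin with a text header starting with `MATLAB` (older v4 files have no such header, so a `False` result is a hint rather than proof). The helper name `looks_like_matlab` and the two throwaway files are made up for illustration. As a side note, 9 and 99 are the ASCII codes for a tab and the letter `c`, which would be consistent with the loader reading plain text where it expected a version field.

```python
import os
import tempfile


def looks_like_matlab(path, n=128):
    """Return True if the file starts with the text header of a v5+ MAT-file."""
    with open(path, "rb") as fh:
        head = fh.read(n)
    return head.startswith(b"MATLAB")


# Demo on two throwaway files (made-up contents, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    # A tab-separated table that merely carries a .mat extension.
    text_path = os.path.join(tmp, "table.mat")
    with open(text_path, "wb") as fh:
        fh.write(b"chr1\tchr1\tchr2\n0.1\t0.2\t0.3\n")

    # A stand-in for a real header; this is NOT a valid MAT-file.
    mat_path = os.path.join(tmp, "real.mat")
    with open(mat_path, "wb") as fh:
        fh.write(b"MATLAB 5.0 MAT-file".ljust(128))

    text_is_mat = looks_like_matlab(text_path)
    real_is_mat = looks_like_matlab(mat_path)

print(text_is_mat, real_is_mat)
```

When the check fails, opening the file in a text editor (or printing its first line) usually shows what the delimiter is.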
Daraan
  • How would you select only the entries with header `chr1`, for example? – sam wolfe Jul 24 '23 at 13:24
  • Pandas renames duplicated column names to make them unique, so you end up with chr1, chr1.1, and so on. **1)** You could use `read_csv` with `header=None`, which puts the chrX labels into the first row; afterwards you can use boolean indexing on that row. This is similar to the method where you **2) keep the headers**: `df.loc[:, df.columns.str.match(r"chr1($|\.)")]`. Here `chr1` is the label you want to match, and `($|\.)` means either end of string (`$`) or a literal `.`, which starts the `.X` suffix, where X is the number of the duplicated column. NOTE: Likewise you can **3) rename** the columns with `df.columns = df.columns.str.extract(r"(chr\d+)").values.flatten()` – Daraan Jul 24 '23 at 15:45
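
For anyone landing here later, the column selection from the comment above can be sketched on a tiny in-memory table. The sample data is invented (four columns, two rows) and only mimics the repeated `chrN` headers; the second variant strips the `.N` suffix with `str.replace`, which is an alternative to the `str.extract` rename mentioned above rather than the same call.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: tab-separated, repeated chromosome headers.
sample = "chr1\tchr1\tchr2\tchr10\n0.1\t0.2\t0.3\t0.4\n0.5\t0.6\t0.7\t0.8\n"

df = pd.read_csv(io.StringIO(sample), sep="\t")
# pandas de-duplicates repeated headers: chr1, chr1.1, chr2, chr10

# Option 2: keep the mangled headers and match "chr1" followed by either
# end-of-string or a dot, so that chr10 is not selected by accident.
chr1 = df.loc[:, df.columns.str.match(r"chr1($|\.)")]

# Alternative: strip the ".N" suffixes first, then select by exact label.
renamed = df.copy()
renamed.columns = renamed.columns.str.replace(r"\.\d+$", "", regex=True)
chr1_again = renamed.loc[:, renamed.columns == "chr1"]

print(list(chr1.columns))
print(chr1_again.shape)
```

The anchored pattern matters: a plain `startswith("chr1")` would also pick up chr10 through chr19.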