
I have two sets of data in separate .h5 files (Hierarchical Data Format 5, HDF5), obtained with Python scripts, and I would like to perform statistical analysis to find correlations between them. My experience here is limited; I don't know any R.

I would like to load the data into SPSS, but SPSS doesn't seem to support .h5. What would be the best way to go here? I can write everything to a .csv file, but I would lose the names of the variables. Is there a way to convert the data without losing any information? And why doesn't SPSS support h5 anyway?

I am aware of the existence of the Rpy module. Do you think it is worthwhile to learn programming in R? Would this give me the same arsenal of methods as I have in SPSS?

Thank you for your input!

unutbu
Scipio
  • You can do statistical analysis directly in python using [pandas](http://pandas.pydata.org/), which also supports HDF out of the box using [pytables](http://pytables.org/). – filmor Mar 16 '14 at 12:04

1 Answer


Is there a way to convert the data without losing any information?

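If the HDF5 data is regular enough (for example, each dataset is a one-dimensional array of the same length), you can just load it in Python or R and save it out again as CSV, putting the dataset names in the header row so the variable names survive (or even as an SPSS .sav file if you're a bit more adventurous and/or care about performance).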
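A minimal sketch of the CSV route, assuming every top-level dataset in the file is a numeric 1-D array and all of them have the same length (nested groups, 2-D arrays and string datasets would need extra handling). `read_h5_columns` and `columns_to_csv_text` are hypothetical helper names; reading relies on the third-party `h5py` package, while the CSV part uses only the standard library.

```python
import csv
import io


def columns_to_csv_text(columns):
    """Render equally long 1-D columns as CSV text with a header row of names."""
    names = list(columns)
    if len({len(columns[name]) for name in names}) > 1:
        raise ValueError("all columns must have the same length")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(names)  # header row keeps the variable names
    writer.writerows(zip(*(columns[name] for name in names)))  # one row per observation
    return buf.getvalue()


def read_h5_columns(path):
    """Read every top-level dataset of an HDF5 file into {name: list of values}.

    Assumes numeric 1-D datasets; requires the third-party h5py package.
    """
    import h5py  # imported here so the CSV helper works without h5py installed

    with h5py.File(path, "r") as f:
        return {
            name: f[name][()].tolist()  # [()] reads the whole dataset as a NumPy array
            for name in f
            if isinstance(f[name], h5py.Dataset)  # skip groups
        }
```

Usage might look like `with open("data.csv", "w", newline="") as out: out.write(columns_to_csv_text(read_h5_columns("data.h5")))`, after which SPSS's text import can take the variable names from the first row. If you would rather produce a .sav file directly, the third-party `pyreadstat` package offers `write_sav(df, path)` for a pandas DataFrame.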

Why doesn't SPSS support h5 anyway?

Who knows. It probably should. Oh well.

Do you think it is worthwhile to learn programming in R?

If you find SPSS useful, you may also find R useful. Since you mentioned Python, you may find that useful too, but it's more of a general-purpose language: more flexible, but less focused on math and stats.

Would R give me the same arsenal of methods as I have in SPSS?

Probably, depending on exactly what you're doing. R has most stuff for math and stats, including some fairly esoteric and/or new algorithms in installable packages. It has a few things Python doesn't have (yet), but Python also covers most of the bases for many users.
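Since the question is ultimately about finding correlations, here is a minimal pure-Python sketch of Pearson's r computed for every pair of named variables, just to show that the basic analysis doesn't require SPSS. In practice you would more likely call pandas' `DataFrame.corr()` (Pearson by default) or R's `cor()`; `pearson_r` and `pairwise_correlations` are hypothetical names used only for illustration, and significance tests are not included.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-deviation and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Pearson r for every pair of named columns, keyed by (name_a, name_b)."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)  # iterates over the column names
    }
```

The `columns` argument has the same `{name: values}` shape as data read from an HDF5 file, so the variable names carry through to the result keys.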

John Zwinck
  • That's a very strong claim about feature parity of R and Python, and one that could just as easily be made in the other direction. – kotrfa Feb 28 '18 at 13:57