I get a lot of datasets that arrive as .dat files with syntax files for converting to SPSS (.sps). I'm an R user, so I need to convert the .dat file into a .sav that R can read.

In the past, I've used PSPP to do this manually. (I can't afford SPSS!) But I'd MUCH prefer a programmatic solution.

I thought pspp-convert would do the trick, but there's something I'm not understanding about how that works in terms of inputting the syntax file:

My files are:

  • data.dat
  • data.sps (which correctly points to data.dat)

I tried

pspp-convert data.sps data.sav

But I get

`data.sps' is not a system or portable file. 

That makes sense, since the input is supposed to be a system or portable file, not a syntax file. Am I trying to do something beyond the scope of this CLI?

Generally speaking, there MUST be some way to apply an SPS file to a DAT file to get a SAV file (or any other portable file) back, right?

Chris Wilson
  • What does the SPSS syntax do? And what type of file is the .dat file? It is unlikely that the .dat file cannot be imported directly into R. If the code in the sps file (except for the import of the data) does not change much between files you could translate the spss syntax to R. You could also have a look at https://github.com/lebebr01/SPSStoR – Jan van der Laan Aug 25 '16 at 14:43
  • Summarising: I am not sure that going through a .sav file is the best way, but we need more info to be sure. – Jan van der Laan Aug 25 '16 at 14:46
  • R's foreign library reads .sav files pretty well in my experience. The .sps file is just a program to convert the .dat file (a HUGE ASCII file with no variable information) into a structured data file. But I'd be fine with .POR or anything else. – Chris Wilson Aug 25 '16 at 14:49
  • I think you can use pspp from the command line without interactive mode by passing it a sps file: `pspp data.sps` – Jan van der Laan Aug 25 '16 at 15:07
  • I agree that R can read sav files well, but you are planning to patch a second tool onto R for something that R can also do well. R also imports ASCII files very well. The only thing missing is the translation of the SPSS syntax to R syntax. Whether the convenience of working with only one tool is worth it depends on how much the SPSS syntax changes between files and how complex it is. – Jan van der Laan Aug 25 '16 at 15:12
  • Can you show a bit of the dat file? That's not a standard file format that I'm aware of. – Thomas Aug 25 '16 at 15:37
  • You're absolutely right: I can execute the .sps files from the command line with pspp (with some slight modifications to paths and a fix for one syntax error). The files are here, if interested: http://www.ussc.gov/research-and-publications/commission-datafiles#individual – Chris Wilson Aug 26 '16 at 05:22
  • Thank you for the help! In this case, there are a HUGE number of variables, so reading the ASCII directly would be a nightmare :) – Chris Wilson Aug 26 '16 at 05:23

1 Answer

From an SPSS Statistics point of view, a .dat file extension most often means the data is in a fixed ASCII text format. You would need the accompanying codebook to tell you what variables to read and in what formats. The SPSS Statistics command syntax file (.sps) does this for you. But this file is simply the list of SPSS Statistics commands used to read the ASCII data. It is not a data file itself.
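Since the syntax file is a program rather than data, one way to get a .sav out of it is to run it and then save the resulting dataset. The following is only a minimal sketch in Python, assuming PSPP is installed and on the PATH, that the .sps file reads the .dat file without errors, and that it does not already end with its own SAVE command. The function names and the wrapper file name `convert_wrapper.sps` are hypothetical; `INSERT FILE=` and `SAVE OUTFILE=` are the PSPP commands it relies on.

```python
import shutil
import subprocess
from pathlib import Path


def build_wrapper_syntax(sps_path: str, sav_path: str) -> str:
    """Return syntax that runs the supplied .sps file, then saves a .sav.

    Assumes neither path contains a single quote, since both are embedded
    in single-quoted strings.
    """
    return (
        f"INSERT FILE='{sps_path}'.\n"   # run the commands that read the .dat
        f"SAVE OUTFILE='{sav_path}'.\n"  # write the active dataset as a system file
    )


def convert(sps_path: str, sav_path: str,
            wrapper_path: str = "convert_wrapper.sps") -> bool:
    """Write the wrapper syntax and run it with pspp, if pspp is available."""
    Path(wrapper_path).write_text(build_wrapper_syntax(sps_path, sav_path))
    if shutil.which("pspp") is None:
        return False  # pspp not found; nothing was converted
    subprocess.run(["pspp", wrapper_path], check=True)
    return Path(sav_path).exists()
```

Calling `convert("data.sps", "data.sav")` from the directory holding both files would then be the programmatic equivalent of running the syntax by hand and saving the result.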

Elsewhere you've referred to these files as "portable files". An SPSS Statistics portable file (.por) is a very special case of an ASCII file, structured to be read and written by SPSS Statistics. In any case, if your preferred tool takes an SPSS Statistics portable file (.por), these *.dat files likely aren't it.

Assuming these *.dat files are fixed ASCII text files, you'll need to work out how the information in them is laid out and then use a suitable tool for reading fixed-width ASCII text.
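To make "reading fixed ASCII text" concrete, here is a small sketch in Python using only the standard library. The layout, variable names, and sample records are invented for illustration; a real layout would come from the codebook or from the column ranges in the DATA LIST command of the .sps file. In R, `read.fwf` or `readr::read_fwf` play the same role.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: variable name -> (first column, last column),
# 1-based and inclusive, the convention used for column ranges in DATA LIST.
Layout = Dict[str, Tuple[int, int]]
LAYOUT: Layout = {"id": (1, 4), "age": (5, 6), "state": (7, 8)}


def parse_line(line: str, layout: Layout) -> Dict[str, str]:
    """Slice one fixed-width record into named fields (values kept as strings)."""
    return {
        name: line[start - 1:end].strip()  # convert 1-based inclusive to a slice
        for name, (start, end) in layout.items()
    }


def parse_fixed_width(text: str, layout: Layout) -> List[Dict[str, str]]:
    """Parse every non-blank line of a fixed-width text block."""
    return [parse_line(line, layout) for line in text.splitlines() if line.strip()]


# Two made-up records: id in columns 1-4, age in 5-6, state in 7-8.
sample = "000134NY\n000227CA\n"
rows = parse_fixed_width(sample, LAYOUT)
```

With hundreds of variables, typing the layout by hand is the painful part, which is why running the supplied syntax file is usually the easier route.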

David_Dwyer