I am trying to read a Stata .dta file into either Python or R so that I can work with it, but both give me a version error. How can I resolve this? Here is my Python code:

import pandas as pd

data = pd.read_stata('file.dta')

Here is the error it gives me:

Traceback (most recent call last):
  File "/mnt/c/Users/t/projects/cg/dta_csv.py", line 11, in <module>
    data = pd.io.stata.read_stata('file.dta')
  File "/home/t/.local/lib/python3.10/site-packages/pandas/io/stata.py", line 2090, in read_stata  
    return reader.read()
  File "/home/t/.local/lib/python3.10/site-packages/pandas/io/stata.py", line 1702, in read        
    self._ensure_open()
  File "/home/t/.local/lib/python3.10/site-packages/pandas/io/stata.py", line 1176, in _ensure_open
    self._open_file()
  File "/home/t/.local/lib/python3.10/site-packages/pandas/io/stata.py", line 1206, in _open_file  
    self._read_header()
  File "/home/t/.local/lib/python3.10/site-packages/pandas/io/stata.py", line 1288, in _read_header
    self._read_old_header(first_char)
  File "/home/t/.local/lib/python3.10/site-packages/pandas/io/stata.py", line 1467, in _read_old_header
    raise ValueError(_version_error.format(version=self._format_version))
ValueError: Version of given Stata file is 70. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).

I have also tried R and RStudio with the haven and foreign libraries, with no luck either:

> library(foreign)
> df <- read.dta("file.dta")
Error in read.dta("file.dta") : not a Stata version 5-12 .dta file
> library(haven)
> df <- read.dta("file.dta")
Error in read.dta("file.dta") : not a Stata version 5-12 .dta file

Any suggestions for how I could resolve this?

Here is a link to the file:

Nick Cox
Tendekai Muchenje
  • Does this answer your question (note `read_dta` vs `read.dta`)? [not a Stata version 5-12 .dta file](https://stackoverflow.com/questions/52075779/not-a-stata-version-5-12-dta-file) – I_O Aug 21 '23 at 22:18
  • @I_O Unfortunately not. – Tendekai Muchenje Aug 21 '23 at 22:36
  • [This answer](https://stackoverflow.com/a/65544083/89482) suggests using the `rio` package (edit: but I just tried on your example file and got 'This version of the file format is not supported'). – neilfws Aug 21 '23 at 22:41
  • In that case, please update your question to show that you indeed used `read_dta` from {haven} (in your example code you're loading {haven} but sticking to `read.dta` from {foreign} nonetheless). – I_O Aug 21 '23 at 22:45
  • Who generated the file and when? File version 70 sounds quite ancient, it may be that nothing can read it anymore. – neilfws Aug 21 '23 at 23:33
  • If it is a Stata .dta file, it is likely to be one of the newer versions. I tried to open it with Stata 14 but received the message "file.dta not Stata format". You could ask someone with Stata 18 to open it and use "saveold" to save it. If Stata 18 cannot open it either, it may not be a Stata file. – Zhiqiang Wang Aug 22 '23 at 05:41
  • Even if you can't open it as such, looking at it in any text editor or trying to type it may indicate what kind of file it is. – Nick Cox Aug 22 '23 at 06:19
  • Stata 18 can make no sense of it either. So, I suggest that it isn't a Stata file. I can't say what it is beyond a binary file. – Nick Cox Aug 22 '23 at 06:24
  • There was never a version 70 of the dta file format. https://www.stata.com/help.cgi?dtaversion documents this for those without access to Stata. The problem lies upstream: what or who is indicating that this is a Stata-readable file? – Nick Cox Aug 22 '23 at 09:51

1 Answer

Two facts as a contribution:

  1. Stata 18 can make no sense of it either. Stata 18 is at the time of writing the latest version. Stata's longstanding principle is that all previous versions of .dta files remain readable by any version of Stata that is contemporary or later.

  2. There was never a version 70 of the dta file format. https://www.stata.com/help.cgi?dtaversion documents this for those without access to Stata.

So, I conclude that it isn't a Stata file. I can't say what it is beyond a binary file.
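If you want to check this yourself, one option is to look at the first few bytes of the file. The sketch below is only an illustration and rests on two assumptions: that older .dta formats store the format version in the first byte (which is what the pandas traceback above appears to be reporting), and that formats 117 and later begin with the text `<stata_dta>`. The names `describe_header` and `sniff` are made up for this example, and the set of versions is copied from the pandas error message in the question.

```python
# Format versions listed in the pandas error message quoted in the question.
PANDAS_SUPPORTED = {105, 108, 111, 113, 114, 115, 117, 118, 119}


def describe_header(head: bytes) -> str:
    """Guess what the first bytes of a supposed .dta file indicate."""
    if not head:
        return "empty file"
    # Assumption: dta formats 117+ open with an XML-like tag.
    if head.startswith(b"<stata_dta>"):
        return "XML-like header: dta format 117 or later"
    # Assumption: older dta formats put the version number in byte 0.
    version = head[0]
    if version in PANDAS_SUPPORTED:
        return f"first byte {version}: plausible older dta format"
    return f"first byte {version} ({head[:1]!r}): not a dta version pandas knows"


def sniff(path: str, n: int = 16) -> str:
    """Read the first n bytes of the file at `path` and describe them."""
    with open(path, "rb") as fh:
        return describe_header(fh.read(n))
```

Calling `sniff('file.dta')` on the file from the question should report a first byte of 70, given the error above. Byte value 70 is the ASCII letter `F`, so the "version" may simply be the first character of some other file type; printing a few more bytes with `open('file.dta', 'rb').read(64)` might show a recognisable signature.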

The problem lies upstream: what or who is indicating that this is a Stata-readable file?

Nick Cox