I use Stata 12.

I want to add some country code identifiers from file df_all_cities.csv onto my working data.

However, this line of code:

merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3) 

Gives me the error:

. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);

This was my attempt to work around an earlier problem: the original .dta file would not open in this version of Stata, so I used R to convert it to .csv, but that doesn't work either. I assume that's because "using" in this command doesn't work with .csv files. How should I write it instead?

Nick Cox
Victor Nielsen
  • `merge` will only accept .dta files. "not working with this version of Stata" means, perhaps, that your version is older than that of the .dta file you are trying to use. – Nick Cox Feb 02 '22 at 19:09

1 Answer

Your intuition is right: the command merge cannot read a .csv file directly. (Strictly speaking, using is not a command here; it is a keyword, common in Stata syntax, indicating that a file path follows.)

You first need to read the .csv file with the command insheet, save it in .dta format, and then merge from that saved file. You can do it like this:

* Preserve saves a snapshot of your data which is brought back at "restore"
preserve 
    
    * Read the csv file. clear can safely be used as data is preserved
    insheet using "df_all_cities.csv", clear
    
    * Create a tempfile where the data can be saved in .dta format
    tempfile country_codes
    save `country_codes'

* Bring back into working memory the snapshot saved at "preserve"
restore

* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3) 

Note that insheet also takes using, and that this command does accept .csv files.
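
If the merge later complains about a missing variable, it can help to inspect the imported file before saving it. The following is only a minimal sketch: it assumes df_all_cities.csv is in the working directory and that city and country are the merge keys, as in the question. The isid check is an extra precaution that is not part of the code above.

```stata
* Sketch: inspect the imported .csv before using it in a merge
preserve

    insheet using "df_all_cities.csv", clear

    * Show variable names and types; names like v1, v2, v3 suggest
    * the header row was read as data rather than as variable names
    describe

    * Show the first row; if it contains the column headers, the names were not picked up
    list in 1

    * Stops with an error if either merge key is missing
    confirm variable city country

    * Stops with an error if city and country do not uniquely identify rows,
    * which merge 1:1 requires
    isid city country

restore
```

If describe shows generic names such as v1, v2 and v3, the insheet help file (help insheet) lists the options that control how the first row is read.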

TheIceBear
  • `insheet` should work but has been superseded by `import delimited` as from Stata 13. – Nick Cox Feb 02 '22 at 19:11
  • Ah, I write commands targeting Stata 12 so maybe an old habit. OP seems to use an older version of Stata, so I will keep `insheet` in my answer. But good point. – TheIceBear Feb 02 '22 at 19:14
  • Absolutely. `insheet` is a better guess at what the OP needs. – Nick Cox Feb 02 '22 at 19:21
  • Indeed I use Stata 12 :) – Victor Nielsen Feb 03 '22 at 15:11
  • @TheIceBear I got an error saying "variable city not found r(111);". Both the working file and df_all_cities.csv have a column called "city" though. – Victor Nielsen Feb 03 '22 at 15:16
  • Either one of the files doesn't have it, or the .csv file was not imported properly. Did you check the result after `insheet using "df_all_cities.csv", clear`? If it did not work as intended, see the help file. My guess is that you need the option `names`. My solution only addresses your original question about `merge` not accepting `.csv` files directly. – TheIceBear Feb 03 '22 at 15:23
  • @TheIceBear you're right, indeed the column names ended up in row 1 and the variables are called v1, v2 and v3 – Victor Nielsen Feb 03 '22 at 19:12
  • @TheIceBear sorry to bother you, but I have now added `names`, as in `insheet using "df_all_cities.csv", names, clear`. Now it says "you must start with an empty dataset r(18);" at the insheet line. I tried clearing first. – Victor Nielsen Feb 03 '22 at 20:07
  • If you type `help insheet` in Stata you will find the help file for `insheet`, with examples at the bottom. There you can see that when using two options, for example `insheet using auto.raw, clear double`, you do not put a comma between the options. You seem to have a comma between `names` and `clear`. – TheIceBear Feb 03 '22 at 20:16
  • That solved it! – Victor Nielsen Feb 04 '22 at 13:41
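
Putting the comment thread together, the version that worked presumably looks something like the sketch below. It is the answer's code with the `names` option added and no comma between `names` and `clear`. The file name and merge keys are taken from the question. The `import delimited` line is the Stata 13+ alternative mentioned in the comments, and it is left commented out because the question is about Stata 12.

```stata
* Sketch for Stata 12: read the .csv with insheet, save as .dta, then merge
preserve

    * names: treat the first row as variable names
    * Options are separated by spaces, not commas
    insheet using "df_all_cities.csv", names clear

    * Stata 13 or later: import delimited supersedes insheet
    * import delimited using "df_all_cities.csv", varnames(1) clear

    * Save the imported data in .dta format so merge can read it
    tempfile country_codes
    save `country_codes'

restore

* Merge the country codes onto the working data
merge 1:1 city country using `country_codes', nogen keep(1 3)
```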