1

I have data with very odd delimiters:

1,|ABC1|,|BUD|,|Fed Budget & Appropriations|,|t1|
2,|ABC2|,|LBR|,|Labor, Antitrust & Workplace|,|t2|
3,|ABC3|,|UNM|,|Unemployment|,|t1|

So the delimiter is a comma and each variable, but the first one (the identifier) is between two pipes. The problem is that the fourth variable also uses commas, so I can't simply use commas as delimiters and delete the pipes. I have found a way to work the data by doing some find and replace operations through the terminal, but I would like to do this through Stata. Does anyone have an idea how to?

ArOk
  • 193
  • 7

1 Answers1

2

I put your data example into a text file and found that the delimiters were detected quite well automatically. Then I dropped any variable that was all commas or all missing, using findname from the Stata Journal.

. import delimited "troublesome.txt"
(9 vars, 3 obs)

. list 

     +-------------------------------------------------------------------------+
     | v1     v2   v3    v4   v5                             v6   v7   v8   v9 |
     |-------------------------------------------------------------------------|
  1. | 1,   ABC1    ,   BUD    ,    Fed Budget & Appropriations    ,   t1    . |
  2. | 2,   ABC2    ,   LBR    ,   Labor, Antitrust & Workplace    ,   t2    . |
  3. | 3,   ABC3    ,   UNM    ,                   Unemployment    ,   t1    . |
     +-------------------------------------------------------------------------+

. findname, all(@ == ",")
v3  v5  v7

. drop `r(varlist)'

. findname, all(missing(@))
v9

. drop `r(varlist)'

. destring v1, ignore(",") replace
v1: character , removed; replaced as byte

. list 

     +-----------------------------------------------------+
     | v1     v2    v4                             v6   v8 |
     |-----------------------------------------------------|
  1. |  1   ABC1   BUD    Fed Budget & Appropriations   t1 |
  2. |  2   ABC2   LBR   Labor, Antitrust & Workplace   t2 |
  3. |  3   ABC3   UNM                   Unemployment   t1 |
     +-----------------------------------------------------+
 
Nick Cox
  • 35,529
  • 6
  • 31
  • 47
  • Great! That solved the problem. But I was not careful enough when looking at my data. There is some additional problems that appear when I just read in the data by allowing automatic detection of the delimiters. I have edited my question now, to show what I mean – ArOk Feb 11 '21 at 16:58
  • Sorry, again I did not specify it clear enough. In the second example I still have commas in some of the variables. I am editing the question again – ArOk Feb 11 '21 at 18:31
  • 1
    My bad, I will delete the edit and open up a new question with it. My first question has been answered. Thanks! – ArOk Feb 11 '21 at 21:16