Questions tagged [stata]

Stata is a commercial, general-purpose statistical software package. It is available for Windows, Mac and Unix systems. Stata's capabilities include data management, statistical analysis and graphics.

IMPORTANT: Read the advice below on how to ask high quality Stata-related questions on Stack Overflow.

About the Stata statistical software

Stata is an integrated package with a point-and-click interface and a command syntax. The latter is part of the ado scripting language, which allows for extensive programmability of new features, as well as automation of repetitive tasks.

In addition, Stata offers Mata. This is not only an interactive environment for manipulating matrices, but also a full development environment that can produce compiled and optimized code.

Both ado and Mata languages optionally support object-oriented programming through classes.

'Stata' is an invented word, not an acronym, and therefore should not have all its letters capitalized (i.e., 'STATA' is considered incorrect). See the last item of the Statalist FAQ.

As of 2023, Stata 18 is the most recent version.

How to ask high quality reproducible questions in Stata

For questions involving the use of macros in the context of Stata, please use the dedicated stata-macros tag, in addition to the stata tag.

The secret to writing a high quality reproducible Stata question is the successful creation of a sandboxed example. This should use the shortest possible snippet of code and the minimal amount of example data required to replicate your problem.

Stack Overflow's Stata volunteers are always happy to help but they do not spoon-feed. Lack of effort on your part will make it less likely that you get an answer and increase the chance that your Stata question will be closed and ultimately deleted.

Writing a good question is not a trivial task and requires experience. The latter comes with practice, which in turn requires perseverance. Always respond to comments requesting clarification from potential helpers.

• Can I ask a question if I am new to Stata and I do not know its commands yet?

Before posting a question, please make sure that you have read Getting Started with Stata. You can access these introductory manuals by typing help gs from Stata's command prompt.

There is simply no replacement for acquainting yourself with the basic concepts and syntax of Stata. This is particularly important as effective communication requires you to be able to speak the same language as the other more experienced Stata users on Stack Overflow.

Do not forget that these users want to answer interesting programming problems, rather than act as tutors for teaching the basics. A more general forum such as Statalist or Reddit may be more appropriate for problems relating to basic command usage.

• Can volunteers here give me the code to do [something] in Stata?

Stack Overflow is geared towards solutions for specific programming problems. It is thus important that you explain as clearly as possible your situation and show us what you have tried.

Start by clearly stating your question and telling us your Stata version and platform (Windows, Mac, Linux).
Then give some context. This should focus on succinctly describing in words both your dataset and what you are trying to do.
Next tell us how you attempted to accomplish your goal. This stage includes attaching the code that you used and the output it produced. You should also link to any similar questions that you consulted online.
Finally provide us with example data to run the code and reproduce your problem. These data should not be shown using a screenshot! See further down for help on this step.
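
The version and platform details requested above can be collected from within Stata itself. The following is a minimal sketch that relies only on the built-in about command and the c() system values; which lines you paste into your question is up to you.

```stata
* show edition, version and licence details
about

* the same information as individual values
display "Stata version: " c(stata_version)
display "Operating system: " c(os)
display "Machine type: " c(machine_type)
```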

• Can I get help to translate code from R/Python/SAS/SPSS to Stata?

Questions asking how to translate code from other languages to Stata's ado or Mata languages are valid only if there is a specific problem to be addressed in the attempted Stata code. Consequently, all the items listed in the previous and next sections are also relevant here.

• Can someone explain to me why the Stata code I use does not work?

Check for typos both in the script and in the code snippet provided in your question. The Stata interpreter is unforgiving: what might seem a straightforward programming task can thus quickly become an exercise in frustration. Stata volunteers on Stack Overflow are not here to hunt down typos arising from careless typing.

Example:

```
 * with typos
 locla mymacro HELLO
 genrate var = 5

 * corrected
 local mymacro HELLO
 generate var = 5
```
It is best not to abbreviate commands or to strip out white space. Doing either makes code harder to read and more error-prone. Less experienced users may also find it difficult to recognize even basic commands.

Example:

```
 * abbreviated and cramped
 forval i=1/5 {
 loc mymacro`i' HELLO `i'
 g var`i'=`i'
 }

 * spelled out and indented
 forvalues i = 1 / 5 {
     local mymacro`i' HELLO `i'
     generate var`i' = `i'
 }
```
Do not post your entire do file or code segment, but only the problematic part. In addition, make sure you properly format your code using code blocks. If your code snippet is more than five or six lines, break this into sections if necessary. Where the names of the variables you use are not self-explanatory, please provide comments.

Example:

```
 * hard to read
 sysuse auto
 des
 sum mpg
 gen mempg=r(mean)
 gen smpg=r(sum)
 reg mpg weight length

 * easier to read

 /* load data */

 sysuse auto
 describe

 /* descriptive statistics */

 summarize mpg
 generate mean_mpg = r(mean)
 generate sum_mpg = r(sum)

 /* regression analysis */

 regress mpg weight length
```

Make sure to check the help file for clues on why your code fails. Problems are often caused by invalid syntax. You can access the help file for a command or function by typing help followed by its name at Stata's command prompt.

Example:

```
 list, separate(0)
 option separate() not allowed
 r(198);
```
Here, typing help list reveals that this is not legal syntax. Indeed, the name of the option is separator(#) and not separate(#).

Try to debug the code on your own before you ask here. Stata has useful debugging commands such as set trace (see help trace for more details), which shows how the code executes line by line. Another useful debugging command is pause, which temporarily suspends execution of the code (see help pause for more information).

Example:

```
 set trace on

 forvalues i = 1 / 2 {
     display `i'
 }

 - forvalues i = 1 / 2 {
 - display `i'
 = display 1
 1
 - }
 - display `i'
 = display 2
 2
 - }
```
Use these commands if your problem is not an obvious syntax error and include in your question selected relevant output, which is likely to shed more light on the causes of the problem. In addition, always include the full error code and message that Stata reports.
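
One way to capture the full error message and return code for your question is the capture noisily prefix, sketched below. The misspelled variable name wieght is deliberate and serves only as an illustration.

```stata
sysuse auto, clear

* capture noisily displays the error message but lets the do-file continue
capture noisily regress price mpg wieght

* _rc holds the return code of the captured command
display "return code: " _rc
```

Here the return code should be 111, the code Stata reports when a variable is not found. Both the message and the code belong in your question.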

If you are using a community-contributed command that you have downloaded from SSC, the Stata Journal or another source, it is important that you indicate this early on in your question. In this way, people who might answer do not waste time looking for it in external sites and can more quickly identify problems related specifically to this command.
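
A quick way to check whether a command is installed, and to report where it came from, is the which command. The sketch below uses esttab (part of the estout package on SSC) purely as an illustration; substitute the command you are actually using.

```stata
* show the location of the command's file and any version lines it records
capture noisily which esttab

* a nonzero return code means the command was not found on this machine
if _rc {
    display as text "esttab is not installed; it is available via: ssc install estout"
}
```

Pasting the output of which into your question tells readers exactly which version of the command you are running.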

• Why is it not a good idea to attach a screenshot of my Stata dataset/results?

Please do not upload screenshots!

Screenshots are not as helpful as you hope, primarily because they do not allow people who might answer to copy and paste data into their own Stata and try to reproduce the problem.

Simple datasets can be entered with the edit command, which opens the data editor and allows the user to manually type or paste data.

In addition, there are five other ways you can provide example data for your Stata question.

Programmatically, the input command can be used:

 clear

 input id str5 name income
 1 "Tracy" 90000
 2 "Ramon" 70000
 3 "Kevin" 80000
 end

 list

    +---------------------+
    | id    name   income |
    |---------------------|
 1. |  1   Tracy    90000 |
 2. |  2   Ramon    70000 |
 3. |  3   Kevin    80000 |
    +---------------------+

If your data is confidential, you can demonstrate the problem using the sysuse command to load one of Stata's toy datasets:

 sysuse dir
 auto.dta         bplong.dta       brand2.dta       bsexper3.dta     census.dta     
 auto2.dta        bpwide.dta       bsexper1.dta     cancer.dta       citytemp.dta    
 autornd.dta      brand1.dta       bsexper2.dta     cearep.dta       citytemp4.dta

 sysuse census, clear

 list state region pop marriage in 1 / 5

    +---------------------------------------------+
    | state        region          pop   marriage |
    |---------------------------------------------|
 1. | Alabama      South     3,893,888     49,018 |
 2. | Alaska       West        401,851      5,361 |
 3. | Arizona      West      2,718,215     30,223 |
 4. | Arkansas     South     2,286,435     26,513 |
 5. | California   West     23,667,902    210,864 |
    +---------------------------------------------+

Alternatively, you can directly download an online example dataset with the use command:

 clear
 use http://fmwww.bc.edu/ec-p/data/wooldridge/vote1

 list district voteA expendA shareA in 1 / 5

    +-------------------------------------+
    | district   voteA   expendA   shareA |
    |-------------------------------------|
 1. |        7      68     328.3    97.41 |
 2. |        1      62    626.38    60.88 |
 3. |        2      73     99.61    97.01 |
 4. |        3      69    319.69     92.4 |
 5. |        3      75    159.22    72.61 |
    +-------------------------------------+

For examples with your current dataset use the dataex command:
```
 dataex mpg price foreign in 1 / 5, elsewhere 

 ----------------------- copy starting from the next line -----------------------
    * Example generated by -dataex-. To install: ssc install dataex
 clear
 input int(mpg price) byte foreign
 22 4099 0
 17 4749 0
 22 3799 0
 20 4816 0
 15 7827 0
 end
 label values foreign origin
 label def origin 0 "Domestic", modify
 ------------------ copy up to and including the previous line ------------------
```
In this case, the first five observations of variables mpg, price and foreign are requested. Note the option elsewhere, which is explained in the help file for dataex.

Copy and paste everything between the two dashed lines and use the {} button in the Stack Overflow question editor to format the snippet.

The dataex command is especially useful when:
- It must be clear whether a variable shown as text is really a string variable or a numeric variable with value labels.
- You have date variables, which can otherwise be very awkward for people who might answer to handle.
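
The first point can be illustrated with the auto toy dataset. This is only a sketch: in a listing, make (a string variable) and foreign (a numeric variable with value labels) both appear as text, and only the nolabel option or describe reveals the difference.

```stata
sysuse auto, clear

* both variables look like text in the default listing
list make foreign in 1/3

* nolabel shows the underlying numeric codes of foreign
list make foreign in 1/3, nolabel

* describe reports the storage type and any attached value label
describe make foreign
```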

Finally, you can also use several other Stata commands and functions to simulate data:

 /* generate data in wide form */

 // discard data in memory
 clear

 // set the number of observations in dataset
 set obs 6

 // create a simple identifier
 generate id = _n

 // set the random-number seed for reproducibility
 set seed 12345

 // create a uniformly distributed random variable with values between 0 and 1
 generate var1 = runiform()

 // create a normally-distributed random variable with mean 20 and standard deviation 5
 generate var2 = rnormal(20, 5)

 // create random indicator variable 0/1
 generate var3 = rbinomial(1, 0.5)

 // see the results
 list, separator(0)

    +---------------------------------+
    | id       var1       var2   var3 |
    |---------------------------------|
 1. |  1   .3576297   22.72038      0 |
 2. |  2   .4004426   20.00814      1 |
 3. |  3   .6893833    21.7884      1 |
 4. |  4   .5597356   29.39434      0 |
 5. |  5   .5744513   33.77373      0 |
 6. |  6   .2076905   16.93702      1 |
    +---------------------------------+

 // optionally create value labels for numeric variables such as id above

 label define idlabel 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 6 "six"
 label values id idlabel

 list id, separator(0)

    +-------+
    |    id |
    |-------|
 1. |   one |
 2. |   two |
 3. | three |
 4. |  four |
 5. |  five |
 6. |   six |
    +-------+

 // create (random) date variables
 clear
 set obs 6

 // a daily date numeric variable
 display date("25/11/2018", "DMY")
 21513

 generate var1 = 21513 + _n

 // a random date variable within a specified interval
 generate var2 = floor( ( mdy(12,31,2018) - mdy(1,1,2017)+1 ) * ///
                        runiform() + mdy(1,1,2017) )

 // a half-yearly date numeric variable
 display yh(2018, 1)
 116

 generate var3 = 116 + _n

 // see the raw results
 list var1 var2 var3, separator(0)

    +----------------------+
    |  var1    var2   var3 |
    |----------------------|
 1. | 21514   21004    117 |
 2. | 21515   21351    118 |
 3. | 21516   21529    119 |
 4. | 21517   21532    120 |
 5. | 21518   21104    121 |
 6. | 21519   21523    122 |
    +----------------------+

 // see formatted results

 format %tdDD/NN/CCYY var1
 format %tdDD/NN/CCYY var2
 format %th var3

 list var1 var2 var3, separator(0)

    +----------------------------------+
    |       var1         var2     var3 |
    |----------------------------------|
 1. | 26/11/2018   04/07/2017   2018h2 |
 2. | 27/11/2018   16/06/2018   2019h1 |
 3. | 28/11/2018   11/12/2018   2019h2 |
 4. | 29/11/2018   14/12/2018   2020h1 |
 5. | 30/11/2018   12/10/2017   2020h2 |
 6. | 01/12/2018   05/12/2018   2021h1 |
    +----------------------------------+

 /* generate data in long form */

 clear
 set obs 9

 // create an identifier increasing every three observations
 egen id = seq(), block(3)

 // create a year variable within each id
 bysort id: generate year = 2015 + _n

 // create a normally distributed random variable within each id
 bysort id: generate var = rnormal()

 // calculate the running (cumulative) sum of var within each id
 bysort id: generate sum_var = sum(var)

 // note here the use of the `bysort` prefix, which sorts data and repeats
 // the command for each group of observations

 // see the results by id
 list, sepby(id)

    +-----------------------------------+
    | id   year         var     sum_var |
    |-----------------------------------|
 1. |  1   2016    .1973079    .1973079 |
 2. |  1   2017    1.610224    1.807532 |
 3. |  1   2018   -.8034225    1.004109 |
    |-----------------------------------|
 4. |  2   2016    1.096012    1.096012 |
 5. |  2   2017   -.4407027    .6553089 |
 6. |  2   2018   -1.011427   -.3561177 |
    |-----------------------------------|
 7. |  3   2016    1.019227    1.019227 |
 8. |  3   2017    1.871976    2.891204 |
 9. |  3   2018    .4235664     3.31477 |
    +-----------------------------------+
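
Note that sum() used with generate returns a running sum, as the listing above shows. If a group total repeated on every row is wanted instead, the egen function total() is one option. The sketch below uses a small deterministic dataset (the variable names x, running_x and total_x are made up for illustration) so that the difference is easy to verify by hand.

```stata
clear
set obs 6

* two groups of three observations each
egen id = seq(), block(3)
generate x = _n

* running sum within id: 1, 3, 6 for id 1 and 4, 9, 15 for id 2
bysort id (x): generate running_x = sum(x)

* group total on every row: 6 for id 1 and 15 for id 2
bysort id: egen total_x = total(x)

list, sepby(id)
```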

• Why does Stata not produce the results I want?

You should always include the output from Stata in your question by copying and pasting it directly from the Stata console. Then select the pasted output and click {} in the question editor.

Example:

. sysuse auto
(1978 Automobile Data)

. regress price mpg i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.07
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838
-------------+----------------------------------   Adj R-squared   =    0.2637
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
             |
     foreign |
    Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
------------------------------------------------------------------------------
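
If copying from the Results window is awkward, a plain-text log file is another way to capture output for pasting. This is a sketch under assumptions: the file name example_output.log and the log name example are placeholders, and the file is written to the current working directory.

```stata
* open a named plain-text log, replacing any earlier copy of the file
log using example_output.log, text replace name(example)

sysuse auto, clear
regress price mpg i.foreign

* close the log; its contents can then be pasted into the question
log close example
```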

If Stata returns unexpected results, it is most likely because the code does not do what you intended. Stata simply does what the user instructs it to do.

That said, users on Stack Overflow are not mind-readers. Providing an example of the desired output will greatly increase your chances of getting a helpful response.

If it is a graph, you can post a picture that illustrates the outcome. Otherwise, a table with an adequate amount of expected results is best. This can be generated using an online table creator (such as Table Generator or ASCII Table Generator) and pasted in your question appropriately formatted in code blocks.

Example:

Country Population Mean_Age Sex_Ratio GDP
United States of America 3999 23 1.01 5000
Afghanistan 544 19 0.97 457
China 10000 27 0.96 3400

+--------------------------+------------+----------+-----------+------+
| Country                  | Population | Mean_Age | Sex_Ratio | GDP  |
+--------------------------+------------+----------+-----------+------+
| United States of America | 3999       | 23       | 1.01      | 5000 |
+--------------------------+------------+----------+-----------+------+
| Afghanistan              | 544        | 19       | 0.97      | 457  |
+--------------------------+------------+----------+-----------+------+
| China                    | 10000      | 27       | 0.96      | 3400 |
+--------------------------+------------+----------+-----------+------+

• Are there any examples of high quality questions?

The following questions can be considered as good examples of how you should structure your own Stata-related programming question:

• Where can I get further advice?

It is crucial that you also read the following pages on Stack Overflow:

Finally, you may also find helpful the information on the Statalist FAQ.

Useful Stata resources:

4571 questions

3 answers

How to import SAS7BDAT database to Stata without SAS

Is there a way to import SAS7BDAT files to Stata without SAS? usesas requires SAS.

sas stata

asked Nov 04 '14 at 08:49

Tabi

3 answers

Translating Stata code into R

General newbie when it comes to time series data analysis in R. I am having trouble translating a bit of Stata code into R code for a replication project I am doing. The intent of the Stata code and the Stata code (from the original analysis) are…

r stata

asked Nov 04 '14 at 03:45

Joshua

2 answers

Stata: Replacing egen in a loop

I am trying to count non-missing values subject to a varying if condition. And then take the max for each month. gen xx1=. gen xx2=. forvalues i = 1/12{ bys state year month: replace xx1= 1 if month==`i' & no_monthsreport>=`i' bys state year month:…

stata

asked Oct 30 '14 at 14:31

Rodrigo

1 answer

Mac OS X: Including Stata file in R leads to error

I'm running code that used to work on a different Macbook on a new one with OS X 10.9.5 R studio 0.98.1083 R just installed freshly (first via home-brew, now standard package) I'm trying to open a stata file that contains German umlauts (special…

r stata

asked Oct 29 '14 at 17:56

FooBar

1 answer

Combining multiple graphs with a loop

I would like to create multiple graphs and combine them using a loop. I used the following code: local var Connecticut Delaware Minnesota Missouri Rhode Island Tennessee Vermont Wisconsin Hawaii local n: word count `var' forvalues i=1/`n'{ local…

stata

asked Oct 29 '14 at 16:27

rrodrigorn0

2 answers

Stata delimit in command line

I am working on a .do file created by someone else. This person used a semicolon delimiter in the entire file. I am trying to go through this file and see what is going on. I like to do this by selecting a portion of the code and hitting the…

comments stata

asked Oct 28 '14 at 14:48

bill999

2 answers

Format data for survival analysis using pandas

I'm trying to figure out the quickest way to get survival analysis data into a format that will allow for time varying covariates. Basically this would be a python implementation of stsplit in Stata. To give a simple example, with the following set…

python pandas stata survival-analysis

asked Oct 22 '14 at 23:08

Luke

2 answers

Python-like zip function in Stata?

In Python I can easily extract pairs out of lists: >>>list1 = [1, 2, 3] >>>list2 = [4, 5, 6] >>>zip(list1, list2) [(1, 4), (2, 5), (3, 6)] How can I achieve the same result in Stata? If I have two locals, both containing the same number of…

python stata

asked Oct 21 '14 at 16:19

Parzival

2 answers

How do I calculate the maximum or minimum seen so far in a sequence, and its associated id?

From this Stata FAQ, I know the answer to the first part of my question. But here I'd like to go a step further. Suppose I have the following data (already sorted by a variable not shown): id v1 A 9 B 8 C 7 B 7 A 5 C 4 A 3 A 2…

stata

asked Oct 16 '14 at 03:12

djas

2 answers

Stata comparing lists and finding missing numbers

I have a question that is probably very simple but I can't figure it out right now. I have two long lists of index numbers, they are identical except for the fact that the first list contains some numbers that the second list does not and thus the…

list loops stata

asked Oct 14 '14 at 17:43

user3594343

1 answer

Create consecutive ID based on non-consecutive ID in Stata

Given the following variables (id and partner) in Stata, I would like to create a new variable (pid) that is simply the consecutive partner counter within id (as you can see, partner is not consecutive). Here is a MWE: clear input id partner pid …

stata

asked Oct 13 '14 at 11:22

Bernd Weiss

0 answers

Efficiently writing Stata files with pandas

I export a large Data Frame (18 million observations; 5 columns) called SalesData to Stata native file format using pandas to_stata: SalesData.to_stata(sales) It works but it is extremely slow to the point it is not usable in production. I think I…

python pandas stata

asked Oct 08 '14 at 13:10

Charles

2 answers

Stata: check if any variables follow a naming pattern

I am about to create several temporary variables following the pattern tmp_*, and will drop (delete) them afterwards. I thought I would be clever and do... ds tmp_* assert ": word 1 of `r(varlist)'" == "" ** then I create and do stuff with tmp_bah…

stata

asked Oct 02 '14 at 19:35

Frank

2 answers

No Access to Non-Greedy .*?

My text is: 999 blaw blaw blaw1 999 blaw blaw blaw And I want to choose: blaw blaw blaw1 Now, I could do this using: ([0-9][0-9][0-9] )(.*?)( [0-9][0-9][0-9]) But the problem is I can't use ".*?" in what I'm using. Replacing (.*?) with…

regex stata non-greedy

asked Sep 30 '14 at 00:23

Arash

2 answers

Stata reference variable by column number

Can I refer to variables by the number of the column they reside in? Here's why I want to know: for each observation, I have three vectors in Euclidean space, so my columns are obsID | v1b1 v1b2 v2b1...v5b4 | tv1b1 tv1b2 tv2b1...tv5b4 | nv1b1…

stata

asked Sep 26 '14 at 21:00

Frank
