How do you implement non-greedy matching in Stata using regex? Or does Stata even have this capability?
I want to extract all text that occurs between a hash sign "#" and the first period "." that follows it.
Example code:
clear
set obs 3
generate var1="anything#aaabbbccc.dddeee.fff" in 1
replace var1="anything#aaabbbccc.dddeee" in 2
replace var1="anything#aaabbbccc." in 3
generate var2=regexs(1) if regexm(var1,"#(.*)\.")
list
But in Stata (v.13.1), I can't seem to use the non-greedy quantifier, as in "#(.*?)\.", so the code above gives this:
+--------------------------------------------------+
|                          var1               var2 |
|--------------------------------------------------|
| anything#aaabbbccc.dddeee.fff   aaabbbccc.dddeee |
|     anything#aaabbbccc.dddeee          aaabbbccc |
|           anything#aaabbbccc.          aaabbbccc |
+--------------------------------------------------+
But what I want is this:
+--------------------------------------------------+
|                          var1               var2 |
|--------------------------------------------------|
| anything#aaabbbccc.dddeee.fff          aaabbbccc |
|     anything#aaabbbccc.dddeee          aaabbbccc |
|           anything#aaabbbccc.          aaabbbccc |
+--------------------------------------------------+
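One possible workaround, offered as a sketch rather than a definitive answer: since the match should stop at the first period, a negated character class can stand in for the non-greedy quantifier. The pattern "#([^.]*)\." is my own assumption about what would fit here. It captures any run of characters other than a period after the "#", so the capture cannot run past the first ".".

```stata
* Sketch: emulate non-greedy matching with a negated character class.
* [^.]* matches any run of characters other than a period, so the
* captured group ends at the first "." after "#".
clear
set obs 3
generate var1 = "anything#aaabbbccc.dddeee.fff" in 1
replace var1 = "anything#aaabbbccc.dddeee" in 2
replace var1 = "anything#aaabbbccc." in 3
generate var2 = regexs(1) if regexm(var1, "#([^.]*)\.")
list
```

This only works because the closing delimiter is a single character. For a multi-character delimiter a negated class is not enough. As far as I know, Stata 14 and later add the Unicode functions ustrregexm() and ustrregexs(), which should accept lazy quantifiers such as .*?, but those are not available in 13.1.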