Regular expressions are always worth knowing, but the OP may not need one here. In this particular case, a combination of the functions strpos() and substr() will mostly do the trick.
For example:
clear
input str50 adr
"1000 Currie AV Apt: Minneapolis MN 55403"
"1843 Polk ST NE Apt: b"
"1801 3 AV S Apt: 203 Minneapolis MN 55404"
"2900 Thomas AV S Apt: 1618 MPLS MN 55416"
"8409 Elliott AV S Apt: Bloomington MN 55420"
end
* keep the text before " Apt:", then append the text after the colon
generate adr2 = substr(adr, 1, strpos(adr, ":") - 5) + ///
    substr(adr, strpos(adr, ":") + 1, .)
list
     +--------------------------------------------------------------------------------------+
     |                                         adr                                     adr2 |
     |--------------------------------------------------------------------------------------|
  1. |    1000 Currie AV Apt: Minneapolis MN 55403      1000 Currie AV Minneapolis MN 55403 |
  2. |                      1843 Polk ST NE Apt: b                        1843 Polk ST NE b |
  3. |   1801 3 AV S Apt: 203 Minneapolis MN 55404     1801 3 AV S 203 Minneapolis MN 55404 |
  4. |    2900 Thomas AV S Apt: 1618 MPLS MN 55416      2900 Thomas AV S 1618 MPLS MN 55416 |
  5. | 8409 Elliott AV S Apt: Bloomington MN 55420   8409 Elliott AV S Bloomington MN 55420 |
     +--------------------------------------------------------------------------------------+
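The idea is to use the colon as a reference point. Because the sub-string "Apt:" always has the same length, its position follows from the position of the colon, so it can be cut out of each address. Note that strpos() returns the position of the first colon only, so this assumes that the first colon in each address is the one in "Apt:".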
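If some addresses might not contain "Apt:" at all, the offsets above no longer point at anything meaningful, because strpos() returns 0 when the search string is not found. A minimal sketch of one way to guard against that is below. The second address and the helper variable pos are made up for illustration and are not part of the OP's data.

```stata
clear
input str50 adr
"1801 3 AV S Apt: 203 Minneapolis MN 55404"
"500 Main ST Minneapolis MN 55401"
end

* position of the first colon; strpos() returns 0 when there is none
generate pos = strpos(adr, ":")

* start from the original address and rewrite only the rows that contain a colon
generate adr2 = adr
replace adr2 = substr(adr, 1, pos - 5) + substr(adr, pos + 1, .) if pos > 0

list adr adr2
```

Rows without a colon are left exactly as they were.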
EDIT:
@Nick Cox provides a similar but even more succinct solution:
generate adr3 = subinstr(adr, "Apt: ", "", .)
This simply replaces all instances of "Apt: " with the empty string "".
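Note that the search string above includes the trailing space, so an address that ends in "Apt:" with nothing after it would not be matched. As a sketch under that assumption, one could remove "Apt:" alone and then tidy up the blanks. The third address and the variable name adr4 are hypothetical. I believe strtrim() and stritrim() are the names in recent Stata versions, while older releases call them trim() and itrim().

```stata
clear
input str50 adr
"1000 Currie AV Apt: Minneapolis MN 55403"
"1843 Polk ST NE Apt: b"
"77 Lake ST Apt:"
end

* remove "Apt:" without relying on a trailing space, then tidy the blanks:
* stritrim() collapses runs of internal blanks into one,
* strtrim() drops leading and trailing blanks
generate adr4 = strtrim(stritrim(subinstr(adr, "Apt:", "", .)))

list adr adr4
```

The trade-off is that this also collapses any other double blanks in the address, which may or may not be wanted.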