
Consider the following excerpt from an SEC EDGAR 10-K company filing: https://www.sec.gov/Archives/edgar/data/912382/000136231009004179/0001362310-09-004179.txt

BUSINESS ADDRESS:   
    STREET 1:       107 N PENNSYLVANIA ST
    STREET 2:       STE 600
    CITY:           INDIANAPOLIS
    STATE:          IN
    ZIP:            46204
    BUSINESS PHONE:     3172619000

MAIL ADDRESS:   
    STREET 1:       107 N PENNSYLVANIA ST
    STREET 2:       STE 600
    CITY:           INDIANAPOLIS
    STATE:          IN
    ZIP:            46204

I need a regex in SAS to capture the fields STREET 1, STREET 2, CITY, STATE and ZIP under BUSINESS ADDRESS, but not under MAIL ADDRESS. For example, for STREET 1 I use STREET\s1\s*(.*) in SAS, but it ends up capturing STREET 1 from the mailing address. Thanks!

1 Answer


This regex should work.

BUSINESS ADDRESS:\s*STREET\s1:\s*(.*)\s*STREET\s2:\s*(.*)

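You can continue the pattern, putting each additional field you need in its own set of parentheses. Starting the pattern at BUSINESS ADDRESS: makes sure you get the first values after the business address heading. Because . does not match a line break, each (.*) stops at the end of its line, so the pattern has to be applied to a variable that holds the whole header with its line breaks, not to one line at a time. The problem with the pattern you were using is that it matches in two separate places, once under BUSINESS ADDRESS and once under MAIL ADDRESS, and nothing in it says which one you want. On its own, prxmatch reports the first match in a string, but if you apply the pattern line by line and overwrite the variable each time it matches, for example, the mailing address wins because it comes last. Therefore you have to put something in the pattern that specifies which block you want.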
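As a sketch of continuing the pattern to all five fields, the data step below assumes the header text already sits in one character variable with a line feed ('0A'x) between lines. The dataset and variable names (business_address, text, re) are hypothetical, and the header is rebuilt from the excerpt in the question rather than read from the filing itself.

```sas
/* Sketch: extend the answer's pattern to all five BUSINESS ADDRESS fields.
   Assumption: the header sits in ONE variable, TEXT, with '0A'x (line feed)
   between lines. Dataset and variable names are hypothetical. */
data business_address;
  length text $ 1000 street1 street2 city state zip $ 60;

  /* Rebuild the excerpt from the question; CATX strips leading/trailing blanks
     from each piece and puts a line feed between them */
  text = catx('0A'x,
    'BUSINESS ADDRESS:',
    '    STREET 1:       107 N PENNSYLVANIA ST',
    '    STREET 2:       STE 600',
    '    CITY:           INDIANAPOLIS',
    '    STATE:          IN',
    '    ZIP:            46204',
    '    BUSINESS PHONE:     3172619000',
    'MAIL ADDRESS:',
    '    STREET 1:       107 N PENNSYLVANIA ST',
    '    STREET 2:       STE 600',
    '    CITY:           INDIANAPOLIS',
    '    STATE:          IN',
    '    ZIP:            46204');

  /* Anchored at BUSINESS ADDRESS:, so the MAIL ADDRESS block cannot match.
     "." does not match a line feed, so each (.*) stops at the end of its line */
  re = prxparse('/BUSINESS ADDRESS:\s*STREET\s1:\s*(.*)\s*STREET\s2:\s*(.*)'
             || '\s*CITY:\s*(.*)\s*STATE:\s*(.*)\s*ZIP:\s*(.*)/');

  /* prxposn reads the capture buffers of the successful prxmatch call */
  if prxmatch(re, text) then do;
    street1 = strip(prxposn(re, 1, text));
    street2 = strip(prxposn(re, 2, text));
    city    = strip(prxposn(re, 3, text));
    state   = strip(prxposn(re, 4, text));
    zip     = strip(prxposn(re, 5, text));
  end;

  drop re text;
run;
```

The strip calls are only a precaution in case the real file has trailing blanks on a line, which greedy (.*) would otherwise keep.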

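In SAS you can use the prxposn function, whose second argument is the number of the capture buffer (set of parentheses) to retrieve. The first argument is the regex id returned by prxparse, and a prxmatch call with that id must have succeeded on the same text beforehand. For example: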

address1=prxposn(regex_pattern, 1, edgar10);
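A fuller sketch of the call above, including the prxparse and prxmatch calls that have to come before prxposn. It assumes the filing's header lines are read one at a time and glued into edgar10 with line feeds; DATALINES stands in for the downloaded .txt file, and the dataset names (edgar_header, business_street) are hypothetical.

```sas
/* Sketch: gather the header lines into one variable, EDGAR10, so a pattern
   spanning several lines can be applied. DATALINES stands in for the .txt
   file (assumption: in practice, point INFILE at your downloaded file). */
data edgar_header;
  length edgar10 $ 2000;
  retain edgar10;
  infile datalines end=last_line;
  input;                                    /* load the next raw line into _INFILE_ */
  edgar10 = catx('0A'x, edgar10, _infile_); /* append it after a line feed */
  if last_line then output;                 /* keep only the fully assembled text */
datalines;
BUSINESS ADDRESS:
    STREET 1:       107 N PENNSYLVANIA ST
    STREET 2:       STE 600
    CITY:           INDIANAPOLIS
    STATE:          IN
    ZIP:            46204
    BUSINESS PHONE:     3172619000
MAIL ADDRESS:
    STREET 1:       107 N PENNSYLVANIA ST
    STREET 2:       STE 600
    CITY:           INDIANAPOLIS
    STATE:          IN
    ZIP:            46204
;

data business_street;
  set edgar_header;
  length address1 address2 $ 60;
  /* REGEX_PATTERN holds the numeric id returned by prxparse, not the text */
  regex_pattern = prxparse('/BUSINESS ADDRESS:\s*STREET\s1:\s*(.*)\s*STREET\s2:\s*(.*)/');
  /* prxposn only has capture buffers to return after a successful match */
  if prxmatch(regex_pattern, edgar10) then do;
    address1 = prxposn(regex_pattern, 1, edgar10);  /* buffer 1: STREET 1 */
    address2 = prxposn(regex_pattern, 2, edgar10);  /* buffer 2: STREET 2 */
  end;
  drop regex_pattern;
run;
```

For a real filing, edgar10 has to be declared long enough to hold everything read before the match, so it may be worth stopping the read once the header lines you need have been collected.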

Best.

lapacheco