
I need to get the company names and their ticker symbols into separate arrays. Here is my data, which is stored in a .txt file:

3M Company      MMM
99 Cents Only Stores    NDN
AO Smith Corporation    AOS
Aaron's, Inc.   AAN

and so on

How would I do this using a regex or some other technique?

mickmackusa

4 Answers


Iterate over each line and collect the data with a regular expression:

^(.+?)\s+([A-Z]+)$

Capture group 1 ($1) will contain the company name and group 2 ($2) the ticker symbol. With PHP's preg_match() they are available as $matches[1] and $matches[2].
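A minimal PHP sketch of this per-line approach, assuming the lines have already been read into an array (for example with file() and FILE_IGNORE_NEW_LINES). The helper name splitTickers() is hypothetical; lines that do not match the pattern are simply skipped.

```php
<?php
// Hypothetical helper: split "Company Name   TICKER" lines into two arrays
// using the pattern ^(.+?)\s+([A-Z]+)$ on each line.
function splitTickers(array $lines): array
{
    $names = [];
    $symbols = [];
    foreach ($lines as $line) {
        // Lazy (.+?) expands only until the rest of the line is whitespace
        // followed by an all-uppercase ticker reaching the end of the line.
        if (preg_match('/^(.+?)\s+([A-Z]+)$/', trim($line), $m)) {
            $names[] = $m[1];   // capture group 1: company name
            $symbols[] = $m[2]; // capture group 2: ticker symbol
        }
    }
    return [$names, $symbols];
}

[$names, $symbols] = splitTickers([
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    "Aaron's, Inc.   AAN",
]);
var_export($names);
echo "\n";
var_export($symbols);
```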

Alternatively, you can split each line in two on a run of two or more spaces and trim the two resulting strings. This only works if you are sure the company name and ticker symbol are always separated by at least that many spaces and the company name itself never contains such a run.
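A sketch of the split-and-trim alternative, assuming the columns are separated by spaces (not tabs) and that a run of two or more spaces never occurs inside a company name. It uses preg_split() with a {2,} quantifier instead of a fixed two- or three-space delimiter, and a limit of 2 so each line yields at most two pieces.

```php
<?php
$lines = [
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    "Aaron's, Inc.   AAN",
];

$names = [];
$symbols = [];
foreach ($lines as $line) {
    // Split at the first run of two or more spaces; limit 2 => at most two pieces.
    $parts = preg_split('/ {2,}/', trim($line), 2);
    if (count($parts) === 2) {
        $names[] = trim($parts[0]);
        $symbols[] = trim($parts[1]);
    }
}
var_export($names);
echo "\n";
var_export($symbols);
```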

molf

Is the format of the text file imposed on you? If you have the choice, I'd suggest not using spaces to separate the fields. Instead, use | or $$ or some other delimiter you can be sure won't appear in the content, then simply split each line into an array.
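A sketch of what that could look like, assuming a hypothetical pipe-delimited layout with one name|ticker record per line (the sample below is the question's data rewritten in that assumed format, and companies.txt is a placeholder file name). No regex is needed; explode() with a limit of 2 is enough.

```php
<?php
// Assumed file contents, one "name|ticker" record per line.
// Reading a real file might look like:
// $lines = file('companies.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$lines = [
    '3M Company|MMM',
    '99 Cents Only Stores|NDN',
    'AO Smith Corporation|AOS',
    "Aaron's, Inc.|AAN",
];

$names = [];
$symbols = [];
foreach ($lines as $line) {
    // Limit 2: everything after the first | goes into the symbol field.
    [$names[], $symbols[]] = explode('|', $line, 2);
}
var_export($names);
echo "\n";
var_export($symbols);
```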

Polsonby

Try this regular expression:

^(.+?)\s+([A-Z]{3})$

Perhaps someone with more PHP experience could flesh out a code example using preg_split or something similar.
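With preg_match() it is easy to see why the name group should be lazy and the gap should use \s+: with a greedy variant such as (.+)\s*([A-Z]{3})$, the greedy (.+) keeps the padding spaces inside the name, while ^(.+?)\s+([A-Z]{3})$ does not. This sketch uses the first sample line; note that the {3} quantifier also assumes every ticker is exactly three letters, which holds for the sample data but not for tickers in general.

```php
<?php
$line = '3M Company      MMM';

// Greedy: (.+) grabs as much as possible and \s* is allowed to match nothing,
// so the padding spaces end up inside the name capture.
preg_match('/(.+)\s*([A-Z]{3})$/', $line, $greedy);

// Lazy: (.+?) stops as early as possible and \s+ consumes the padding.
preg_match('/^(.+?)\s+([A-Z]{3})$/', $line, $lazy);

var_export([$greedy[1], $greedy[2]]); // name still ends in spaces
echo "\n";
var_export([$lazy[1], $lazy[2]]);     // name is '3M Company'
```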

Andrew Hare

Since the two columns are separated by a variable amount of whitespace, there are several ways to do this.

You could read the file line by line with file() and use preg_split() to split each line on the whitespace that is followed by a run of uppercase letters reaching the end of the string. Alternatively, you could read the whole file with file_get_contents(), capture both columns with preg_match_all(), then extract them with array_column(). The latter may be slightly faster because it makes only one preg_ function call, but the choice will likely come down to the developer's coding taste and the complexity of the input text.

Code:

//$lines = file('your_text_file.txt', FILE_IGNORE_NEW_LINES);
$lines = [
    '3M Company      MMM',
    '99 Cents Only Stores    NDN',
    'AO Smith Corporation    AOS',
    'Aaron\'s, Inc.   AAN',
];

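$names = $symbols = [];
foreach ($lines as $line) {
    // split on the whitespace that is followed only by the uppercase ticker at the end of the line
    [$names[], $symbols[]] = preg_split('~\s+(?=[A-Z]+$)~', $line);
}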
var_export($names);
echo "\n---\n";
var_export($symbols);

Or:

//$text = file_get_contents('your_text_file.txt');
$text = <<<TEXT
3M Company      MMM
99 Cents Only Stores    NDN
AO Smith Corporation    AOS
Aaron's, Inc.   AAN
TEXT;

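// per line: lazily capture the company name, skip the whitespace gap, capture the uppercase ticker at the end
preg_match_all('~^(.+?)\s+([A-Z]+)$~m', $text, $matches, PREG_SET_ORDER);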
var_export(array_column($matches, 1));
echo "\n---\n";
var_export(array_column($matches, 2));
mickmackusa