The following Perl program uses a regex I wrote for my purpose, but it also matches substrings that occur inside a longer string. How can I match only strings that are separated by spaces, newlines, or tabs?

The test data I used is available here: http://sainikhil.me/stackoverflow/dictionaryWords.txt

use strict;
use warnings;

sub print_a_b {
    my $file = shift;

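    # Declared with "my" so the program compiles under "use strict".
    my $pattern = qr/(a|b|A|B)\S*(a|b|A|B)/;
    open my $fp, '<', $file or die "Cannot open $file: $!";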

    my $cnt = 0;
    while(my $line = <$fp>) {
        if($line =~ $pattern) {
            print $line;
            $cnt = $cnt+1;
        }
    }
    print $cnt;
}

print_a_b @ARGV;
Sai Nikhil

2 Answers

Consider anchoring the pattern with \b, the word-boundary assertion.

It forces the match to start and end at the edge of a word, so the pattern no longer matches in the middle of a longer string.

 \b(a|b|A|B)\S*(a|b|A|B)\b

Simpler, as Avinash Raj adds in the comments:

(?i)\b[ab]\S*[ab]\b

(using the case insensitive flag or modifier)
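
One caveat worth illustrating: \b treats only letters, digits and underscore as word characters, so it is not quite the same as "separated by spaces/newlines/tabs". The sketch below compares the unanchored pattern, the \b version, and a variant that uses whitespace lookarounds. The lookaround variant, the helper name matches, and the sample strings are my own assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Three variants of the pattern, from loosest to strictest.
my $unanchored = qr/[ab]\S*[ab]/i;               # can match inside a longer word
my $word_bound = qr/\b[ab]\S*[ab]\b/i;           # anchored at word boundaries
my $whitespace = qr/(?<!\S)[ab]\S*[ab](?!\S)/i;  # delimited by whitespace or string ends

# Hypothetical helper: return 1 if $text matches $re, 0 otherwise.
sub matches {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $sample ('cabbage', 'alpha', 'x-ab-y') {
    printf "%-8s unanchored=%d word_bound=%d whitespace=%d\n",
        $sample,
        matches($unanchored, $sample),
        matches($word_bound, $sample),
        matches($whitespace, $sample);
}
```

Here "cabbage" is matched only by the unanchored pattern, "alpha" by all three, and "x-ab-y" by the first two but not by the whitespace-delimited variant, because the hyphens count as word boundaries but not as whitespace.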

VonC

If you have multiple words in the same line then you can use word boundaries in a regex like this:

(?i)\b[ab][a-z]*[ab]\b

The pattern code is:

my $pattern = qr/\b[ab][a-z]*[ab]\b/i;
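
To pull out every matching word from a line that holds several words, one option is to apply the pattern with the /g modifier in list context. This is only a sketch: the sub name a_b_words and the sample sentence are made up for the example.

```perl
use strict;
use warnings;

# Compiled pattern: a word that starts and ends with a or b (case-insensitive).
my $pattern = qr/\b[ab][a-z]*[ab]\b/i;

# Hypothetical helper: return every matching word in $line.
# Must be called in list context; in scalar context //g returns a boolean.
sub a_b_words {
    my ($line) = @_;
    return $line =~ /$pattern/g;   # no capture groups, so each match is returned whole
}

my @found = a_b_words("Alba saw a bob near the arena and a cab");
print join(',', @found), "\n";    # prints Alba,bob,arena
```

The lone "a" is skipped because the pattern needs separate first and last characters, and "cab" is skipped because it starts with c.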

However, if you want to match only lines that consist of a single word, you can anchor the pattern to the start and end of the line:

(?i)^[ab][a-z]*[ab]$
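
For a one-word-per-line file like the dictionary in the question, the whole-line form (with ^ as the start-of-line anchor) could be dropped into a counting loop roughly as follows. The sub name count_a_b_lines and the in-memory sample data are assumptions standing in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines that consist solely of a word
# starting and ending with a or b.
sub count_a_b_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;                              # drop the trailing newline
        $count++ if $line =~ /^[ab][a-z]*[ab]$/i;
    }
    return $count;
}

# In-memory stand-in for a dictionary file (made-up data).
my $data = "alpha\ncabbage\nbob\nabandon\nBaobab\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print count_a_b_lines($fh), "\n";                 # prints 3
close $fh;
```

With this data, "alpha", "bob" and "Baobab" are counted, while "cabbage" and "abandon" are not.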

Update: regarding your comment about "lines that begin and end with the same character", you can use this regex:

(?i)\b([a-z])[a-z]*\1\b

If the repeated first and last character can be any character, not only a letter as above, you can use:

(?i)\b(.)[a-z]*\1\b
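
As a rough illustration of the backreference version, the sketch below collects the words whose first and last letters are the same. Because the pattern contains a capture group, a list-context //g would return the captured letters rather than the words, so the loop reads the match offsets from @- and @+ instead. The sub name same_end_words and the sample text are made up.

```perl
use strict;
use warnings;

# \1 refers back to whatever letter the first group captured.
my $same_ends = qr/\b([a-z])[a-z]*\1\b/i;

# Hypothetical helper: return the words in $line whose first and last
# letters are the same.
sub same_end_words {
    my ($line) = @_;
    my @words;
    while ($line =~ /$same_ends/g) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        push @words, substr($line, $-[0], $+[0] - $-[0]);
    }
    return @words;
}

print join(',', same_end_words("level area test window cat")), "\n";
# prints level,area,test,window
```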
Federico Piazza