How can I properly stop and start metacharacter interpolation in regexp in Perl

Question

Editing to be more concise, pardon.

I need to be able to grep from an array using a string that may contain any of the following characters: '.', '+', '/', '-'. The string is captured from the user. The array contains each line of the file I'm searching through, and each line has a unique identifier within it, which is the basis for the search string used in the regexp. (I read the file into the array to avoid keeping it open while the user is interacting with the program, because the file is also used by a cron job and I do not want it open when the cron job runs.)

The code below shows the grep statement I am using. I use `our` and `my` in my programs so that the variables I want available in all namespaces are global and the ones I use only in subroutines are not. In case you want to try to replicate the issue:

#!/usr/bin/perl -w

use strict;
use Switch;
use Data::Dumper;

our $pgm_path = "/tmp/";
our $device_info = "";

our @new_filetype1 = ();
our @new_filetype2 = ();
our @dev_info = ();
our @pgm_files = ();

our %arch_rtgs = ();

our $file = "/path/file.csv";
open my $fh, '<', $file or die "Couldn't open $file!\n";
chomp(our @source_file = <$fh>);
close $fh;

print "Please enter the device name:\n";
chomp(our $dev = <STDIN>);

while ($device_info eq "") {
    # Grep the device info from the sms file
    my @sms_device = grep(/\Q$dev\E/, @source_file);
    if (scalar(@sms_device) > 1) {
        my $which_dup = find_the_duplicate(\@sms_device);
        if ($which_dup eq "program") {
            print "\n-> $dev <- must be a program name instead of a device name." .
            "\nChoose the device from the list you are working on, specifically.\n";
            foreach my $fix (@sms_device) {
                my @fix_array = split(',', $fix);
                print "$fix_array[1]\n";
                undef @fix_array;
            }
            chomp($dev = <STDIN>);
        } else {
            $device_info = $which_dup;
        }
    } elsif (scalar(@sms_device) == 1) {
        ($device_info) = @sms_device;
        @sms_device = ();
    }
}

When I try the code with an anchor:

my @sms_device = grep(/\Q$dev\E^/, @source_file);

The program shows no further activity. It just sits there as if it were waiting for more input from the user, which is not what I expected. I want to anchor the search pattern because there are many similarly named devices that contain the same characters in the same order as the search string, plus additional characters. Unanchored, the regexp ignores those additional characters and matches these devices too. I don't want that: I want to force an exact match of the string in the variable.

Thanks in advance for wading through my terribly inexperienced code and communication attempts at detailing my problem.

Why are you using `our`??? You should never use `our` except when required (e.g. for `@ISA` and `@EXPORT`). — ikegami, Dec 19 '18 at 17:45
It's not clear what you're asking. Does `$regexp` contain a regex pattern, or text to match literally? What problem are you having? You have failed to demonstrate your problem. See [mcve] — ikegami, Dec 19 '18 at 17:46
What do you mean by "it's not working"? Regexp metacharacters like `$` work after `\E`. But maybe your regular expression is not what you think, or you have whitespace at the end of `$regexp` and so your input will not match your regular expression. It's hard to say without seeing concrete data. Please [edit] your question and add hardcoded data to reproduce the problem. — Corion, Dec 19 '18 at 17:54
*"but not being the same thing fundamentally*" : Please clarify what you mean. — Håkon Hægland, Dec 19 '18 at 17:59
@ikegami I use our and my because the Perl books with camels and llamas on them told me to. The regexp may or may not contain any metacharacters and needs to be a literal match, but the issue arises when I try to anchor the regexp to not match any additional characters, like: "ABC" =~ /ABC/; # This is fine. "ABCD" =~ /ABC^/; # This is fine. "ABCD" =~ /ABC/; # I don't want to match this. That is the issue. When I try to anchor, the program "freezes" during the evaluation. — WetCheerios, Dec 19 '18 at 19:50
I seriously seriously doubt that. None of the `our` in your program should be `our`. That's very very wrong. They should ALL be `my`. You should reread what was actually said in the books. — ikegami, Dec 19 '18 at 19:51
I use our for "global" variables, and my for "private" variables — WetCheerios, Dec 19 '18 at 19:54
None of those variables should be global. You should be using `my` for all of them. — ikegami, Dec 19 '18 at 19:54
I don't have the entire program listed here, just the section that is giving me the problem. I have subroutines that I want to access values of some variables in, but don't want to pass in references or the variable itself. I understand this may not be efficient, or elegant, but that is not my focus currently. I am still learning how to use Perl, and will eventually get there. Currently, deadlines and immediate needs drive my code development, not an earnest desire to generate textbook quality code — WetCheerios, Dec 19 '18 at 20:02
Backwards. Not following proper development practices causes development to take LONGER. That's the whole point! — ikegami, Dec 19 '18 at 20:04

ikegami · Answer 1 · 2018-12-19T20:30:42.080

1

The device id followed by the start of the string? /\Q$dev\E^/ makes no sense. You want the device id to be preceded by the start of the string and followed by the end of the string.

grep { /^\Q$dev\E\z/ }

Better yet, let's avoid spinning up the regex engine for nothing.

grep { $_ eq $dev }

For example,

$ perl -e'my $dev = "ccc"; CORE::say for grep { /^\Q$dev\E\z/ } qw( accc ccc ccce );'
ccc

$ perl -e'my $dev = "ccc"; CORE::say for grep { $_ eq $dev } qw( accc ccc ccce );'
ccc
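
A minimal sketch of the same comparison that should also run on older Perls, since it uses `print` rather than `CORE::say`. The sample lines and the sub names (`exact_regex`, `exact_eq`, `unquoted`, `misplaced_anchor`) are made up for illustration; the real contents of the question's file are not shown, so this only assumes each candidate is compared as a whole string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the question's real file is not shown.
my @lines = (
    'find+th/s-str*ng',
    'find+th/s-str*ng-2',
    'xfind+th/s-str*ng',
    'findddth/s-strrrng',
);

# Quoted and anchored at both ends: only an identical string matches.
sub exact_regex {
    my ($dev, @candidates) = @_;
    return grep { /^\Q$dev\E\z/ } @candidates;
}

# Plain string equality, no regex engine involved.
sub exact_eq {
    my ($dev, @candidates) = @_;
    return grep { $_ eq $dev } @candidates;
}

# Anchored but NOT quoted: "+" and "*" act as quantifiers.
sub unquoted {
    my ($dev, @candidates) = @_;
    return grep { /^$dev\z/ } @candidates;
}

# The pattern from the question: "^" placed after the quoted text.
sub misplaced_anchor {
    my ($dev, @candidates) = @_;
    return grep { /\Q$dev\E^/ } @candidates;
}

my $dev = 'find+th/s-str*ng';
print "regex:    $_\n" for exact_regex($dev, @lines);
print "eq:       $_\n" for exact_eq($dev, @lines);
print "unquoted: $_\n" for unquoted($dev, @lines);

my @none = misplaced_anchor($dev, @lines);
print "misplaced anchor matched ", scalar(@none), " line(s)\n";
```

With this sample data, the first two filters should keep only the string identical to `$dev`, the unquoted pattern should pick the look-alike line instead, and the misplaced `^` should match nothing. In the question's loop, an empty `grep` result leaves `$device_info` empty, so the `while` condition stays true and the same search repeats, which is consistent with the freeze described.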

edited Dec 19 '18 at 20:30

answered Dec 19 '18 at 19:58

ikegami


Yes sir. That is a typo. The code should have a '$' instead of a '^'. My bad. – WetCheerios Dec 19 '18 at 20:04
No, it shouldn't. It should be `\z`. And as you can see, it works. If you still think otherwise, please feel free to provide a demonstration of your problem. See [mcve]. – ikegami Dec 19 '18 at 20:05
I really appreciate you trying to help me, but I cannot get your method to work in the perl script, or from the command line [kthmgr@q4098 bin]$ perl -e 'open $fh, "); $regex = "find+th/s-str*ng"; CORE::say $_ for grep { $_ eq $regex } @devs;' CORE::say is not a keyword at -e line 1. kthmgr@q4098 bin]$ perl -e 'open $fh, "); $regex = "find+th/s-str*ng"; print "$_\n" for grep { $_ eq $regex } @devs;' [kthmgr@q4098 bin]$ This could be due to my version of perl being only 5.8, not sure – WetCheerios Dec 19 '18 at 20:26
Re "*CORE::say is not a keyword*", You're using an old version of Perl. Use `print "$_\n"` instead of `CORE::say` – ikegami Dec 19 '18 at 20:30
Ok, using either method, I am not getting a match from the program: Please enter the program name: Match+this No matches found – WetCheerios Dec 19 '18 at 20:31
Again, you have not demonstrated your problem. Why do you insist on hiding things? – ikegami Dec 19 '18 at 20:31
I tried it both ways, if you find the second attempt in the bundle of command lines I put up. I thought indenting with 4 spaces would convert the comment to code like this – WetCheerios Dec 19 '18 at 20:32
The file. What's in the file? – ikegami Dec 19 '18 at 20:32
What more do you need from me? The command doesn't work for me – WetCheerios Dec 19 '18 at 20:33
Re "*What more do you need from me?*", The file. What's in the file? – ikegami Dec 19 '18 at 20:33
Re "*The command doesn't work for me*", Unless there's a problem opening the file, it's working. There are simply no matches in the file. – ikegami Dec 19 '18 at 20:33
Re "*I thought indenting with 4 spaces would convert the comment to code like this*", To fix the question, edit the question. – ikegami Dec 19 '18 at 20:42
@WetCheerios What we need to better help you is the value of `$dev` in your program, and a representative line or three of the file `/path/file.csv`. The code itself works, so something must be wrong/misunderstood with your data. As we don't see your data, please tell us your data and `$dev`. For example, `print "[[$dev]]\n"` outputs what is in `$dev`. – Corion Dec 20 '18 at 12:56

score 0 · Answer 2 · answered Dec 19 '18 at 18:47

I would use quotemeta. Here is an example of how it compares:

my $regexp = '\t';
my $metaxp = quotemeta ($regexp);

while (<DATA>) {
  print "match \$regexp - $_" if /$regexp/;
  print "match \$metaxp - $_" if /$metaxp/;
}

__DATA__
This \t is not a tab
This    is a tab

(there is literally a tab in the second line)

The meta version will match line 1, as it turned "\t" into essentially "\\t" (a literal backslash followed by "t"), and the non-meta (original) version will match line 2, because there the pattern is treated as a tab.

match $metaxp - This \t is not a tab
match $regexp - This    is a tab

Hopefully you get my meaning.

I think adding $regexp = quotemeta ($regexp) (or doing it when you capture the standard input) should meet your need.
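
As a rough sketch of how this relates to `\Q...\E`: both apply the same escaping to the value of the variable, so either form should give a literal match. The input `a.b+c`, the candidate strings, and the helper names `literal_match` and `raw_match` are hypothetical, chosen only to show the difference from an unquoted pattern.

```perl
use strict;
use warnings;

# Hypothetical user input containing regex metacharacters.
my $input = 'a.b+c';

# quotemeta() backslash-escapes the non-word characters in the value.
my $quoted = quotemeta($input);

# \Q...\E interpolates $input first, then applies the same escaping.
my $inline = "\Q$input\E";

print "quotemeta: $quoted\n";
print "\\Q..\\E:    $inline\n";
print "both forms agree\n" if $quoted eq $inline;

# Literal, fully anchored comparison.
sub literal_match {
    my ($pattern, $candidate) = @_;
    return $candidate =~ /^\Q$pattern\E\z/ ? 1 : 0;
}

# Same anchors, but the value is used as a regex pattern.
sub raw_match {
    my ($pattern, $candidate) = @_;
    return $candidate =~ /^$pattern\z/ ? 1 : 0;
}

for my $candidate ('a.b+c', 'aXbbc') {
    printf "%-6s raw=%d literal=%d\n",
        $candidate,
        raw_match($input, $candidate),
        literal_match($input, $candidate);
}
```

Calling `quotemeta` once when the input is captured and using `\Q...\E` at the point of use are interchangeable here; the second keeps the original string intact for printing back to the user.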

Aah, interesting... admittedly, I did not know that about \Q\E. Alas, if he uses \Q, then won't that just literally look for the string "$dev" every time? Obviously not, based on your answer... — Hambone, Dec 20 '18 at 10:51
`\Q` still allows interpolation and other string literal operations (e.g. `\L`). It even allows `\ ` to work normally in double-quoted strings literals — ikegami, Dec 20 '18 at 17:10
