Hi, I am trying to replace strings like these in a file, test.txt:

  <g
   id="g16526">
  <g

  <g
   id="gnnnnn">
  <g

and turn them into

  <g
   id="gg1">
  <g
   ...
  <g
   id="ggn">
  <g

using this Perl script:

    #!C:/Strawberry/perl
    open(FILE, "<test.txt") || die "File not found";
    my @lines = <FILE>;
    close(FILE);
    my $string = '<g
    id=';
    my $string2 = '<g
    <g'; 
    my $anything = ".*";

    my $replace = 'gg';
    my @newlines;
    my $counter = 1;

    foreach(@lines) {
      $_ =~ s/\Qstring$anything\Q$string2/$string$replace$string2$counter/g;
      $counter++;
      push(@newlines,$_);
    }

    open(FILE, ">test.txt") || die "File not found";
    print FILE @newlines;
    close(FILE);

but it doesn't work. Any suggestions appreciated.

zdim
user1420482
  • What type is your text file? If it's XML, you should use XML::Simple to get the id value and replace it. – ilux Sep 16 '17 at 19:42
  • @ilux Re: "_you should use XML::Simple_". `XML::Simple` has been deprecated and even its own docs recommend against its use, for many reasons. It had its place and was important a long time ago, but it shouldn't be used nowadays. The standards are `XML::LibXML` and `XML::Twig`. – zdim Sep 16 '17 at 19:46
  • Thanks @ilux and @zdim, XML::LibXML or XML::Twig are the solution. – user1420482 Sep 16 '17 at 20:00

1 Answer

If this indeed has an XML-like structure as it appears, it should be processed using modules for that, either XML::LibXML or XML::Twig.
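
As a rough illustration of the module route, here is a minimal sketch using XML::LibXML. It assumes the file is well-formed XML, that every `<g>` element whose id starts with "g" should be renumbered in document order, and that the module is installed; the function name `renumber_ids_xml` is made up for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: takes an XML string, returns it with ids renumbered.
sub renumber_ids_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $cnt = 0;
    # local-name() matches <g> whether or not the document declares a
    # namespace (as SVG files usually do); findnodes returns nodes in
    # document order.
    for my $node ($doc->findnodes('//*[local-name()="g"][starts-with(@id, "g")]')) {
        $node->setAttribute(id => 'gg' . ++$cnt);
    }
    return $doc->toString;
}
```

Unlike a regex, this does not depend on where the line breaks or indentation fall between `<g` and `id`.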

But the task as shown is also easily done in an elementary way:

    perl -0777 -wpE'
        BEGIN { $cnt = 0 };
        # \s* allows for indentation before id
        s/<g\n\s*id="g\K(.*?)"/q(g).(++$cnt).q(")/eg;
    ' input.txt

which expects the file format to be as shown (a newline after <g, then id). It reads the whole file into a string, via -0777, which isn't the prettiest approach and may be unsuitable for very large files.

Another way is to set the input record separator to <g, so that every "line" is a block to process:

    perl -wpE'
        # no "local" on $/ here: it would be undone as soon as BEGIN exits
        BEGIN { $/ = "<g"; $cnt = 0 };
        s/id="g\K(.*?)"/q(g).(++$cnt).q(")/eg;
    ' input.txt

where the regex is now free to look for just id="..." and the file is processed record by record rather than slurped whole.

Both print the expected output. They are written as one-liners for easier testing; I suggest transferring them to a script.
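
A script version of the first approach might look like the sketch below. The function name `renumber_ids` and the choice to print to STDOUT rather than overwrite test.txt are assumptions for illustration. It uses `\s+` between `<g` and `id`, so it tolerates indentation and Windows line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Renumber every id="g..." that directly follows a <g tag as gg1, gg2, ...
# Takes the whole file content as one string and returns the modified copy.
sub renumber_ids {
    my ($text) = @_;
    my $cnt = 0;
    # \s+ covers the newline and any indentation between "<g" and "id".
    # \K keeps everything matched so far, so only the old number and the
    # closing quote are replaced.
    $text =~ s/<g\s+id="g\K[^"]*"/'g' . ++$cnt . '"'/eg;
    return $text;
}

# Run as: perl renumber.pl test.txt > out.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    print renumber_ids($text);
}
```

Running it as a script also sidesteps shell quoting: the single-quoted one-liners above will not work as written in cmd.exe on Windows.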

zdim