
I am working on the anonymisation of several fields in a semicolon-separated text file.

For now I have the following command:

perl -aF'(;)' -ne "s/^.{$length}/$x_string/ for @F[2*$index]; print @F" file

Here $index is the position of the field I want to substitute, counting the semicolon-separated fields from 0, $length is the number of characters to replace and $x_string is a simple string of X's.

For example, with $index equal to 1, $length equal to 3 and $x_string equal to XXX, if file has the following content:

azerty;012;test;20181201;;wxc;
ytreza;345;demo;20160214;;nbv;

Then the Perl command prints this:

azerty;XXX;test;20181201;;wxc;
ytreza;XXX;demo;20160214;;nbv;
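Why 2*$index? Because the pattern given to -F contains a capturing group, split keeps every ; as an element of its own, so the fields sit at even positions of @F, the separators at odd positions, and print @F puts the line back together. A small sketch reproducing that split in a standalone script, using the first sample line above (the script itself is only an illustration, not part of the original command):

```perl
use strict;
use warnings;

# First sample line from the question; -n keeps the trailing newline in $_.
my $line = "azerty;012;test;20181201;;wxc;\n";

# Same pattern as -F'(;)': the capturing group makes split keep each ';'.
my @F = split /(;)/, $line;

my $index = 1;
print "field $index: '$F[2 * $index]'\n";    # field 1: '012'
print "element 1: '$F[1]'\n";                # element 1: ';'
print "elements: ", scalar(@F), "\n";        # elements: 13
```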

My problem is that I want to leave a potential header line untouched (printed, but not anonymised). I know how to do it without the for statement (using unless $. == 1, for example), but I don't know how to combine it with the -F option.

Note that I will always have an array of size 1, because my configuration file's structure pairs each index with one length.

I am a total newbie with Perl, so any help with this issue is welcome!

melpomene
Julien Camus
  • The part after `-e` is just Perl code, so you can use `"if( $. > 1 ) { ... s/// }; print @F"`. But if you're always printing, why not use `-p` instead of `-n`? – Corion Dec 06 '18 at 14:45
  • But unless you're on Windows, mixing double quotes and dollar signs is likely bad, as your shell will then interpolate what it thinks are shell variables while you think of them as Perl variables. If you do keep double quotes, use `\$.` instead of `$.` so that `$.` gets passed through to Perl. Also consider converting your one-liner to a script instead. – Corion Dec 06 '18 at 14:46
  • Thank you for your solution! Actually I am doing in-place editing, so the following command works well: `perl -aF'(;)' -ne 'if( $. > 1 ) { s/^.{3}/XXX/ for @F[2] }; print @F' -i file`. But I don't see how to use the `-p` option instead of the `print @F` command. Do you have any tip? Thanks! – Julien Camus Dec 06 '18 at 15:44
  • Ah, no, `-p` wouldn't help much here. You'd need `$_ = join "", @F` so that `-p` prints the modified fields, so your approach using `-n` is good enough. – Corion Dec 06 '18 at 15:55
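For reference, this is why -p needs the fields joined back into $_: assigning an array to a scalar such as $_ stores the number of elements, not their text, whereas join '', @F rebuilds the line exactly as print @F outputs it (with $, left unset). A minimal sketch on a shortened, made-up sample line:

```perl
use strict;
use warnings;

my @F = split /(;)/, "azerty;012;test;\n";

my $count = @F;            # array in scalar context: number of elements
my $text  = join '', @F;   # fields and separators concatenated again

print "$count\n";          # 7
print $text;               # azerty;012;test;
```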

2 Answers


Just make the change (the regex) itself conditional on $., and otherwise do the same as before (print):

perl -aF'(;)' -ne'$F[2*$index] =~ s/^.{$length}/$x_string/ unless $.==1; print @F' file

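There is no need for the for loop, since you change exactly one element of @F. (Also, with -w you would get a warning for using the @ sigil, i.e. a one-element slice, where a single scalar element is meant.) As in the question, $index, $length and $x_string stand for literal values here and in the command below: inside single quotes the shell does not expand them, so either type the values in or use double quotes and escape Perl's own variables (\$F, \$.).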
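The for loop in the question works because foreach aliases $_ to each element it iterates over, so the substitution writes straight into @F; with a single element, binding the substitution to it with =~ does the same thing. A sketch comparing the two forms on the first sample line, with the question's example values hard-coded and the $ sigil used in both cases:

```perl
use strict;
use warnings;

# The question's example values, hard-coded for this illustration.
my ($index, $length, $x_string) = (1, 3, 'XXX');
my $line = "azerty;012;test;20181201;;wxc;\n";

# Loop form: $_ is an alias for the element, so s/// changes @loop itself.
my @loop = split /(;)/, $line;
s/^.{$length}/$x_string/ for $loop[2 * $index];

# Direct form from the answer: bind s/// to the one element with =~.
my @direct = split /(;)/, $line;
$direct[2 * $index] =~ s/^.{$length}/$x_string/;

print join('', @loop), join('', @direct);
```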


Another way is to change $_ directly with a regex and use -p. Since -p prints $_ for every line, even when the code calls next before changing it, you can simply skip the first line:

perl -pe'next if $.==1; s/(?:.*?;){$index}\K.{$length}/$x_string/' file

The regex matches $index sequences that each end with ; (grouped without capturing, thanks to ?:), and then the \K assertion drops all of that from the match, so the substitution applies only to what is matched after it. So this regex changes the $length characters following the $index-th semicolon.
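A sketch of what -p does with this code, written out as a script: the loop roughly mirrors the one perlrun documents for -p, with the printing in a continue block, which still runs after next. The parameters are the question's example values, the header line is made up for illustration, and an in-memory filehandle stands in for file:

```perl
use strict;
use warnings;

# The question's example values, hard-coded for this illustration.
my ($index, $length, $x_string) = (1, 3, 'XXX');

# A made-up header line followed by the question's sample data.
my $data = "name;code;label;date;;ref;\n"
         . "azerty;012;test;20181201;;wxc;\n"
         . "ytreza;345;demo;20160214;;nbv;\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @out;
# Roughly the loop -p wraps around the code; the continue block runs
# even after `next`, so the skipped header is still emitted.
LINE: while (<$fh>) {
    next LINE if $. == 1;
    # Skip $index fields ending in ';', let \K drop them from the match,
    # then replace the following $length characters.
    s/(?:.*?;){$index}\K.{$length}/$x_string/;
}
continue {
    push @out, $_;    # -p would print $_ here instead
}
close $fh;

print @out;
```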

zdim

-n wraps

LINE: while (<>) {
    ... # your program goes here
}

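around your script, so you can add next LINE if $. == 1; to your one-liner to skip the header line entirely. To leave the header unchanged but still print it, print it before calling next LINE, for example: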

perl -aF'(;)' -ne "if (1 .. 1) { print; next LINE } s/^.{$length}/$x_string/ for @F[2*$index]; print @F" file

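This uses the flip-flop operator ..: when an operand is a constant, it is compared with the current line number $., so 1 .. 1 is true for the first line only, and the if block prints that line unchanged before moving on to the next one.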
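The same logic written out as a script, as a sketch only: the loop stands in for the one -n wraps around the one-liner, the parameters are the question's example values, the header line is made up for illustration, and an in-memory filehandle replaces file so that $. is set as it would be when reading a real file:

```perl
use strict;
use warnings;

# The question's example values, hard-coded for this illustration.
my ($index, $length, $x_string) = (1, 3, 'XXX');

# A made-up header line followed by the question's sample data.
my $data = "name;code;label;date;;ref;\n"
         . "azerty;012;test;20181201;;wxc;\n"
         . "ytreza;345;demo;20160214;;nbv;\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @out;
# Roughly the loop that -n wraps around the one-liner.
LINE: while (<$fh>) {
    # A constant operand of a scalar `..` is compared with $., so
    # (1 .. 1) is true only for line 1: keep the header as it is.
    if (1 .. 1) { push @out, $_; next LINE }

    my @F = split /(;)/;    # what -a does with -F'(;)'
    $F[2 * $index] =~ s/^.{$length}/$x_string/;
    push @out, join '', @F;
}
close $fh;

print @out;
```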

JGNI
  • This solution does not print the header line at all. What I want is not to substitute this line but print it anyway. Do you have another idea? Thanks! – Julien Camus Dec 06 '18 at 15:53