I want to read an input file line by line, but the file uses an unknown end-of-line character.

Vim does not recognize it either: it displays the character as ^A and carries straight on with the characters of the next line. Perl behaves the same way. It tries to load all the lines at once, because it ignores this strange end-of-line character.

How can I set this character as the end of line for Perl? I don't want to use any special module for it (because of our strict system); I just want to define the end-of-line character (maybe as a hex code).

Another option is to convert the file into a new one with a proper end-of-line character (by replacing the strange ones). Can I do that in some easy way (something like sed on the input file)? Everything needs to be done in Perl, though.

Is it possible?

At the moment, my reading code looks like this:

open (IN, $in_file);
$event=<IN>;   # read one line
MrWhite
srnka

1 Answer

The ^A character you mention is the "start of heading" control character (ASCII 0x01). You can set the special Perl variable $/ to this character. However, if you want your code to be readable and editable by whoever comes after you (and may use another editor), I would do something like this:

use English;

local $INPUT_RECORD_SEPARATOR = "\cA"; # 'start of heading' character

while (<>)
{
    chomp; # remove the unwanted 'start of heading' character
    print $_ . "\n";
}
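The loop above also covers the "convert the file" option from the question, since it reads ^A-terminated records and prints them newline-terminated. A minimal sketch of the same idea with explicit filehandles follows; the sub name `convert_separators` and the sample data are made up for illustration, and in-memory filehandles stand in for the real input and output files so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: copy records from $in to $out, replacing the
# 0x01 record terminator with a newline.
sub convert_separators {
    my ($in, $out) = @_;
    local $/ = "\cA";          # input records end with 'start of heading'
    while (my $record = <$in>) {
        chomp $record;         # strip the trailing 0x01, if present
        print {$out} $record, "\n";
    }
    return;
}

# Demo with in-memory filehandles (assumption: real code would open
# the input and output files by name instead).
my $source    = "event one\cAevent two\cA";
my $converted = '';
open(my $in,  '<', \$source)    or die "Cannot open input: $!";
open(my $out, '>', \$converted) or die "Cannot open output: $!";
convert_separators($in, $out);
close($in);
close($out);

print $converted;
```

For a one-off conversion from the shell, a one-liner such as `perl -pe 'tr/\x01/\n/' input > output` should have the same effect, where `input` and `output` are placeholder file names.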

From Perldoc:

$INPUT_RECORD_SEPARATOR
$/

The input record separator, newline by default. This influences Perl's idea of what a "line" is.
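As a minimal sketch of how this applies to a named filehandle rather than `<>`: the block below changes `$/` with `local` inside a sub, so the rest of the script still sees newline-terminated lines. The sub name `read_records` and the sample data are assumptions for illustration, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from an open filehandle,
# treating the "start of heading" character (0x01) as the separator.
sub read_records {
    my ($fh) = @_;
    local $/ = "\x01";    # same character as "\cA"; restored on return
    my @records;
    while (my $record = <$fh>) {
        chomp $record;    # chomp removes the current value of $/
        push @records, $record;
    }
    return @records;
}

# Sample data standing in for the real file (assumption for this demo).
my $data = "first event\x01second event\x01third event\x01";

# An in-memory filehandle; for a real file, something like
#   open(my $in, '<', $in_file) or die "Cannot open $in_file: $!";
# would take its place.
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";
my @events = read_records($in);
close($in);

print "$_\n" for @events;
```

Outside `read_records`, `$/` keeps its default value, so any other code that reads ordinary text files is unaffected.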

More on special character escaping on PerlMonks.
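Since the question asks about giving the character as a hex code, here is a small sketch comparing a few spellings of the same one-character string. It only illustrates Perl's own string escapes and is not taken from the PerlMonks page; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Three spellings of the "start of heading" character (ASCII 0x01).
my $control  = "\cA";       # control-character escape
my $hex      = "\x01";      # hex escape
my $from_chr = chr(0x01);   # built from the code point

# Single quotes do not interpret escapes: this is a caret plus "A".
my $literal = '^A';

printf "control eq hex: %s\n", ($control eq $hex      ? "yes" : "no");
printf "hex eq chr:     %s\n", ($hex     eq $from_chr ? "yes" : "no");
printf "length of '^A': %d\n", length($literal);
```

The first two comparisons print "yes", while the single-quoted `'^A'` has length 2, so it is not the control character unless the raw byte itself was typed into the source file.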

Oh and if you want, you can enter the "start of heading" character in VI, in insert mode, by pressing CTRL+V, then CTRL+A.

edit: added local per Drt's suggestion

Frank Kusters
  • Thanks! It works very well :) Thanks for explanation of ^A character, I needed it. – srnka Feb 08 '13 at 12:41
  • When using `$/`, do it like `local $/ = '^A'`, because it is a built-in special variable: `local` changes it for that block only and does not affect other parts of the script. –  Feb 08 '13 at 12:44
  • `^A` represents the "start of heading" character (`"\cA"` aka `chr(0x01)`). A line feed (`"\cJ"` aka `chr(0x0A)`) would be represented using `^J` in that notation. – ikegami Feb 08 '13 at 16:46
  • Also, `"\n"` is not platform specific. `"\n"` results in a newline (`chr(0x0A)`) on all existing platforms. – ikegami Feb 08 '13 at 16:48
  • Strange, when I tested the code above (I ran it through `hd` to check the output), I'm sure my system said `0a` for `^A`. That was on Ubuntu Hardy. Now I'm testing on Ubuntu Precise, and it says `01`, as you are stating. Although that's a really strange line separation character. By the way, in Perl, `"\n"` *is* platform specific ([source](http://perldoc.perl.org/perlport.html#Newlines)). But since it's no longer relevant, I'll leave the post as it is. – Frank Kusters Feb 08 '13 at 19:03