13

I am reading a file via the BufferedReader

String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
   String s = br.readLine();
   if (s == null) break;
   ...
}

I need to know whether the lines are separated by '\n' or '\r\n'. Is there a way I can find out?

I don't want to open the FileInputStream just to scan it first. Ideally I would like to ask the BufferedReader, since it must know.

I am happy to override the BufferedReader to hack it but I really don't want to open the filestream twice.

Thanks,

Note: the current line separator (returned by System.getProperty("line.separator")) cannot be used, as the file could have been written by another app on another operating system.

Jonas
chacko

9 Answers

14

To stay consistent with how BufferedReader splits lines, you can use the following method, which detects \n, \r, \r\n and \n\r line separators:

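import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static String retrieveLineSeparator(File file) throws IOException {
    // try-with-resources closes the stream even if reading fails
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        int current;
        // read() returns -1 at end of stream; available() is not a reliable end-of-file test
        while ((current = in.read()) != -1) {
            if (current == '\n' || current == '\r') {
                String lineSeparator = String.valueOf((char) current);
                int next = in.read();
                // a different terminator character right after the first one
                // makes a two-character separator ("\r\n" or "\n\r")
                if (next != current && (next == '\r' || next == '\n')) {
                    lineSeparator += (char) next;
                }
                return lineSeparator;
            }
        }
    }
    // no line terminator found in the file
    return null;
}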
Antoine
7

After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine which line ending is used in a specific file.

The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the file. Something like this:

String filename = ...
BufferedReader br = new BufferedReader(new FileReader(filename));
StringBuilder line = new StringBuilder();
int c;
// read() returns -1 at end of file
while ((c = br.read()) != -1) {
    if (c == '\n') {
        // line ended with "\n": do stuff with line.toString() and the separator
        line.setLength(0);
    } else if (c == '\r') {
        // peek at the next character to tell "\r\n" from a bare "\r"
        br.mark(1);
        if (br.read() == '\n') {
            // do extra stuff since you know that you've got a "\r\n"
        } else {
            // bare "\r": rewind so the peeked character is read again
            br.reset();
        }
        line.setLength(0);
    } else {
        line.append((char) c);
    }
}
// anything still in line is a final line without a terminator
br.close();
arrdem
3

BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll need to read characters in yourself and find line breaks yourself.

You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end) where end is the line break characters. You could probably base something to do what you want on that. An API might look something like public Line readLine() where Line is an object that contains both the line text and the line end.
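A rough sketch of what such an API could look like, using only the JDK. The names Line and TerminatorAwareLineReader are made up for this illustration; this is not Guava's LineBuffer or LineReader, just one possible shape for the idea under the assumption that "\n", "\r" and "\r\n" are the terminators you care about.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

/** Hypothetical value type: one line of text plus the terminator that ended it. */
final class Line {
    final String text;
    final String end; // "\n", "\r", "\r\n", or "" for a last line with no terminator

    Line(String text, String end) {
        this.text = text;
        this.end = end;
    }
}

/** Hypothetical reader that keeps the terminator instead of discarding it. */
final class TerminatorAwareLineReader {
    private final PushbackReader in;

    TerminatorAwareLineReader(Reader reader) {
        // PushbackReader lets us un-read the one character we peek at after '\r'
        this.in = new PushbackReader(reader, 1);
    }

    /** Returns the next line, or null at end of input. */
    Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // belongs to the following line
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        // End of input: report a final unterminated line if there is one
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }
}
```

Since this reads one character at a time, you would probably want to pass it a BufferedReader wrapped around the file's reader, so the file is still opened only once.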

ColinD
  • @gshauger: You could say that about a whole lot of problems, which doesn't mean it isn't _better_ to use one. In the case of `LineBuffer`, it's internal anyway so adding the whole library wouldn't help... he could just copy that file in though. – ColinD May 24 '11 at 17:27
  • 1
    I wouldn't say that about a lot of problems...only the ones that don't need an unnecessary dependency...which is what you're recommending. Plus this isn't the first time you've unnecessarily flogged the Guava library. – gshauger May 24 '11 at 19:45
  • @gshauger: When someone else has written code that will save you from having to write it yourself, sometimes it's useful to use that, particularly when you consider that little problems like this rarely exist in isolation. I happen to be very familiar with Guava and so I tend to suggest solutions using it when I believe they're easier or more appropriate than doing the extra work with just the JDK. Your apparent distaste for libraries doesn't affect the validity of my answers. (I was mainly suggesting that the OP might want to reference some existing code that can do what he wants.) – ColinD May 24 '11 at 19:57
  • @gshauger: I have a distaste for writing and maintaining large amounts of code that others have already written and tested and will maintain for you and the impact of _that_ on the "quality, scalability, deployability and usability of a properly engineered piece of software". I do agree that dependencies should be chosen carefully, but personally I believe that Guava has an extremely high power to weight ratio and that most Java projects can benefit from using it. In the end, though, it's up to the OP what they wish to do... I'm just providing an option they may not have been aware of. – ColinD May 24 '11 at 21:39
2

The short answer is that you can't find out what the line ending was.

I was looking into what this same function treats as a line ending. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends a line on '\r' or '\n' and skips a '\n' that directly follows a '\r'. This is hardcoded and does not depend on any settings.
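A small demonstration of that behaviour, as a sketch: the class and method names (ReadLineDemo, readAllLines) are made up for the example, and a StringReader stands in for the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ReadLineDemo {
    /** Splits text with BufferedReader.readLine() and returns the resulting lines. */
    static List<String> readAllLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Three different terminators in one input: "\r", "\n" and "\r\n"
        System.out.println(readAllLines("a\rb\nc\r\nd")); // prints [a, b, c, d]
    }
}
```

All three terminators produce the same kind of line, and nothing in the returned strings tells you which one was used.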

Raul
2

BufferedReader does not accept a FileInputStream. Its constructor takes a Reader, so the stream has to be wrapped in an InputStreamReader first.

No, you cannot find out the line terminator character that was used in the file being read by BufferedReader. That information is lost while reading the file.

Unfortunately, all answers below are incorrect.

Edit: And yes you can always extend BufferedReader to include the additional functionality you desire.
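For what it's worth, here is a minimal sketch of one such extension. The class name SeparatorDetectingReader and the method detectLineSeparator are invented for this example. It relies on BufferedReader's own mark() and reset() to look ahead and then rewind, so the file is opened only once; the assumption is that the first separator appears within maxChars characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical subclass that reports the first line separator without consuming input. */
class SeparatorDetectingReader extends BufferedReader {

    SeparatorDetectingReader(Reader in) {
        super(in);
    }

    /**
     * Looks at up to maxChars characters ahead and returns the first line separator
     * found ("\n", "\r" or "\r\n"), or null if none is seen. The read position is restored.
     */
    String detectLineSeparator(int maxChars) throws IOException {
        mark(maxChars + 1); // +1 for the character peeked at after '\r'
        try {
            for (int i = 0; i < maxChars; i++) {
                int c = read();
                if (c == -1) {
                    return null; // end of input, no separator
                }
                if (c == '\n') {
                    return "\n";
                }
                if (c == '\r') {
                    return read() == '\n' ? "\r\n" : "\r";
                }
            }
            return null;
        } finally {
            reset(); // rewind so readLine() still starts at the same position
        }
    }
}
```

You would call detectLineSeparator(...) once right after constructing the reader and then carry on with readLine() as usual. It only reports the first separator, so it does not help with files that mix line endings.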

d-live
1

If you happen to be reading this file into a Swing text component then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:

textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );

to get the actual EOL string that was used in the file.

camickr
  • @EJP That would apply in any solution. – camickr May 26 '11 at 00:44
  • 1
    No it wouldn't. You can imagine an API where you read a line and then retrieve the terminator that was used for that line. – user207421 May 26 '11 at 06:14
  • @EJP, Based on the posters comments I thought he just wanted to know if the file was created on Windows ("\r\n") or Unix ("\n"), in which case he only cared about the first line separator. If he cares about every line, then yes every line would need to be parsed. – camickr May 26 '11 at 15:16
  • @camickr agreed, but that applies to the *problem,* rather than to 'any solution'. – user207421 May 27 '11 at 01:17
  • @EJP, yes I meant any solution to this question, not any solution in general. – camickr May 27 '11 at 01:44
1

Maybe you could use Scanner instead.

You can pass a regular expression to Scanner#useDelimiter() to set a custom delimiter.

String regex="(\r)?\n";
String filename=....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
    String str= scan.next();
    // todo
}

You can wrap an existing BufferedReader in a Scanner like this:

 new Scanner(bufferedReader);
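Note that splitting on a delimiter does not by itself tell you which separator matched. One possible way to get at it with Scanner, as a sketch, is Scanner#findWithinHorizon, which returns the text that matched a pattern. The class and method names here (ScannerSeparatorDemo, firstLineSeparator) are made up, and a StringReader stands in for the file.

```java
import java.io.StringReader;
import java.util.Scanner;

final class ScannerSeparatorDemo {
    /**
     * Returns the first line separator in the scanner's input ("\r\n", "\r" or "\n"),
     * or null if there is none. Note that this advances the scanner past the match.
     */
    static String firstLineSeparator(Scanner scan) {
        // "\r\n" is listed first so a Windows separator is not reported as a lone "\r";
        // a horizon of 0 means the search is not bounded
        return scan.findWithinHorizon("\r\n|[\r\n]", 0);
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(new StringReader("first\r\nsecond"));
        String separator = firstLineSeparator(scan);
        System.out.println("\r\n".equals(separator) ? "CRLF" : "something else");
    }
}
```

Because the scanner moves past the match, the text before the first separator is skipped, so this suits a detection pass more than normal line-by-line reading. With no separator in the input it may buffer everything while searching.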
董诚怡
0

Not sure if this is useful, but sometimes I need to find out the line delimiter well after I've already read the file.

In this case I use this code:

/**
 * Identifies which line delimiter is used in a string.
 *
 * This is useful when processing files that were created on different operating systems.
 *
 * @param str the string with the mystery line delimiter
 * @return {@code \r\n} (Windows), {@code \n} (Unix/Linux) or {@code \r} (legacy Mac OS);
 *         falls back to Unix {@code \n} if none can be identified
 */
public static String identifyLineDelimiter(String str) {
    if (str.contains("\r\n")) {        // Windows
        return "\r\n";
    } else if (str.contains("\n")) {   // Unix/Linux
        return "\n";
    } else if (str.contains("\r")) {   // legacy Mac OS 9; newer OS X uses \n
        return "\r";
    } else {
        return "\n";                   // fall back to '\n' if nothing matches
    }
}
Leo Ufimtsev
-2

If you are using Groovy, you can simply do:

def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'
Felix
  • The only thing I can guess is that the user was asking about `java`, judging by the question's tags. Not sure though. – eRaisedToX Apr 27 '17 at 08:41