
I am writing an application that reads text files (.txt). One of the requirements is to keep the N longest lines in the file (where N is specified by the user) while maintaining the order in which they appear within the file.

So, for example, if a file has 10 lines and N = 4, the 4 longest lines are kept and all other lines are removed, with the kept lines staying in their original order.

I'm using a LinkedHashMap to store the content of each line as the key and the length of the line as the value.

LinkedHashMap<String, Integer> linesInFile = new LinkedHashMap<String, Integer>();
String line = "";
String newFileContent = "";
try {
    FileReader fileReader = new FileReader(file);
    BufferedReader br = new BufferedReader(fileReader);

    while ((line = br.readLine()) != null) {
        linesInFile.put(line, line.length());
    }
    br.close();

    List<Entry<String, Integer>> lineList = new ArrayList<Entry<String, Integer>>(linesInFile.entrySet());
    List<Entry<String, Integer>> lineOrder = new ArrayList<Entry<String, Integer>>(linesInFile.entrySet());
    Collections.sort(lineList, (e1, e2) -> (e2.getValue() - e1.getValue()));
    int i = 0;
    while (N > 0) {
        newFileContent += lineList.get(i).getKey() + System.lineSeparator();
        i++;
        N--;
    }
} catch (Exception e) {
    // do something
}

The problem with the above is it doesn't preserve ordering of the lines in the file. How do I preserve the order in which the lines appear?

For example, if I have the following file:

Log: 123 abc
Error: 456123123 123 xyz
Log: 456 cde
Log: 1231 cde
Error: 123123 ab c
Error: 456123 123 xyz
Log: 123 cde
Error: 456 123 qrz
Error: 123 123 xyz
Log: 456 cde
Log: 456 cde

If I were to keep the 4 longest lines in the file, the file after the changes would be:

Error: 456123123 123 xyz
Error: 123123 ab c
Error: 456123 123 xyz
Error: 456 123 qrz
  • Instead of storing full lines with `linesInFile.put(line, line.length());`, you could just keep track of line numbers with `linesInFile.put(lineNumber, line.length());`. Then you could read the file again and pick out only those lines. You wouldn't waste memory holding the whole file in a map. – Krystian G Jul 26 '19 at 21:09
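
The line-number idea from the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's code: the class name `TwoPassLongestLines` and the method `keepLongest` are made up for the example, and it uses a list of lengths indexed by line number in place of a map. It assumes the file does not change between the two reads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class TwoPassLongestLines {
    static List<String> keepLongest(Path file, int n) throws IOException {
        // Pass 1: remember only the length of each line, indexed by line number.
        List<Integer> lengths = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file)) {
            String line;
            while ((line = br.readLine()) != null) {
                lengths.add(line.length());
            }
        }

        // Sort line numbers by length, longest first. List.sort is stable,
        // so among equally long lines the earlier one comes first.
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < lengths.size(); i++) {
            numbers.add(i);
        }
        numbers.sort((a, b) -> Integer.compare(lengths.get(b), lengths.get(a)));
        int count = Math.max(0, Math.min(n, numbers.size()));
        Set<Integer> keep = new HashSet<>(numbers.subList(0, count));

        // Pass 2: re-read the file and emit the chosen lines in file order.
        List<String> result = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file)) {
            String line;
            for (int i = 0; (line = br.readLine()) != null; i++) {
                if (keep.contains(i)) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```

Only one integer per line is held in memory between the passes, at the cost of reading the file twice.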

2 Answers


If it is ok to recover the original order after processing the file:

  1. Create a class to hold the line (the string) and its line number; call it, say, Line.
  2. Create a PriorityQueue of Lines whose Comparator orders two instances of Line by length, so that the shortest line is the least element (see the Comparator interface for what you have to implement there).
  3. Read your file one line at a time, putting each line into its own Line and adding that to the priority queue. Every time the size of the priority queue exceeds N, remove an item: it will be the least item (by definition of a priority queue), and that will be the shortest line in the collection (by definition of your comparator). The priority queue therefore always holds the N longest lines seen so far.
  4. When you're done reading the file, you'll have (at most) N items in your priority queue. Sort them by their original line number (kept in the Line for exactly this purpose), and there you are.

It's an absolutely ideal application for a priority queue - a nice data structure that has already been implemented for you. All you need to do is create the right Comparator for running it (described above) and another one for sorting on the original line number, and you're done.
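
The steps above might look something like the following in Java. Treat it as a minimal sketch under a few assumptions, not a definitive implementation: the class name `LongestLines` and the method `keepLongest` are invented for the example, the lines are assumed to be already available as a `List<String>`, and ties between equally long lines are broken in favour of the line that appears earlier, which is what the sample output in the question seems to expect.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; the names are illustrative only.
class LongestLines {
    // Holds a line's text together with its position in the file.
    static final class Line {
        final int number;
        final String text;

        Line(int number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    static List<String> keepLongest(List<String> lines, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }

        // The head of the queue is the next line to evict: the shortest one,
        // and among equally short lines the one that appears latest.
        Comparator<Line> evictionOrder = Comparator
                .comparingInt((Line l) -> l.text.length())
                .thenComparing(Comparator.comparingInt((Line l) -> l.number).reversed());
        PriorityQueue<Line> queue = new PriorityQueue<>(evictionOrder);

        int number = 0;
        for (String text : lines) {
            queue.add(new Line(number++, text));
            if (queue.size() > n) {
                queue.poll(); // drop the current least element
            }
        }

        // Restore the original file order of the surviving lines.
        List<Line> kept = new ArrayList<>(queue);
        kept.sort(Comparator.comparingInt((Line l) -> l.number));
        for (Line l : kept) {
            result.add(l.text);
        }
        return result;
    }
}
```

Called as `LongestLines.keepLongest(Files.readAllLines(path), 4)` on the sample file from the question, this returns the four "Error:" lines listed there, in file order. Reading with a BufferedReader and feeding lines into the queue one at a time works the same way and avoids holding the whole file in memory.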

davidbak

I would use something like an ArrayList: iterate through the lines and keep track of the current maximum length. Add every line whose length equals the current maximum. When you encounter a line that is longer than the current maximum, empty the list with the .clear() method and add that line; later lines of the new maximum length are then added as before. Some pseudocode is below.

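arraylist k = new arraylist   // must be an empty list, not null, or k.clear() fails
int maxlength = 0
for i in lines:
   if i.length > maxlength:
       // new longest line: discard everything collected so far
       maxlength = i.length
       k.clear()
       k.add(i)
   else if i.length == maxlength:
       // "else if" so the new longest line is not added twice
       k.add(i)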

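This should leave you with all the lines of the maximum length in an ordered list. Note that this keeps only the lines tied for the maximum length, which is not necessarily N lines.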
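
A Java rendering of that pseudocode, as a rough sketch: the class name `MaxLengthLines` and the method `linesOfMaxLength` are made up for the example, and the input is assumed to be a `List<String>` of the file's lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class MaxLengthLines {
    static List<String> linesOfMaxLength(List<String> lines) {
        List<String> kept = new ArrayList<>();
        int maxLength = 0;
        for (String line : lines) {
            if (line.length() > maxLength) {
                // New longest line: discard everything collected so far.
                maxLength = line.length();
                kept.clear();
                kept.add(line);
            } else if (line.length() == maxLength) {
                // Ties with the current maximum are kept in file order.
                kept.add(line);
            }
        }
        return kept;
    }
}
```

On the sample file from the question this returns a single line, "Error: 456123123 123 xyz", because no other line shares its length.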

Slenderbowman