
I have a long text file.

I want to remove duplicates from the file. The catch is that the key to compare on is the first word of each line, where the words are separated by ":".

For example:

The file lines:

11234567:229283:29833204:2394803
11234567:4577546765:655776:564456456
43523:455543:54335434:53445
11234567:43455:544354:5443

The result I want is:

11234567:229283:29833204:2394803
43523:455543:54335434:53445

I need to keep only the first line of each group of duplicates; the others should be ignored.

I tried this:

Set<String> lines11;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new HashSet<>(10000); // maybe should be bigger
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        lines11.add(line11);
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}

That works, but it removes a line only when the complete line is duplicated.

How can I change it so that it looks only at the first word of each line when checking for duplicates? If the first word has not been seen before, the complete line should be saved; if it has, the line should be ignored.

Panther
Patrik

3 Answers


You need to maintain a Set<String> that holds only the first word of each line.

List<String> lines11;
Set<String> dups;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new ArrayList<>();
    dups = new HashSet<>();
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        String first = line11.split(":")[0]; // assuming your separator is :
        if (!dups.contains(first)) {
            lines11.add(line11);
            dups.add(first);
        }
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}
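
The same idea can be pulled into a small helper that works on a list of lines, which makes it easy to try without touching a file. This is only a sketch: the class name FirstFieldDedup and the method name dedupe are made up for illustration. It relies on Set.add returning false for a key that is already present, and it passes a limit of 2 to split so that only the first ":" is used to cut the line.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original answer.
class FirstFieldDedup {
    // Keeps the first line seen for each key; the key is the text before the first ':'.
    static List<String> dedupe(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String key = line.split(":", 2)[0];
            // Set.add returns false when the key is already in the set
            if (seen.add(key)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Called with the four sample lines from the question, this should return the first and third lines, in that order.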
Eran

I will only show the part that collects the lines, using a HashMap:

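    // Maps the first word of a line to the first complete line that had it.
    // Note: HashMap does not keep insertion order; LinkedHashMap does.
    HashMap<String, String> lines = new HashMap<>();
    String line11;

    while ((line11 = reader11.readLine()) != null) {
        String[] tmp = line11.split(":");
        // only store the line if its first word has not been seen yet
        if (!lines.containsKey(tmp[0])) {
            lines.put(tmp[0], line11);
        }
    }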

So the loop keeps only the first line for each first word, using that word as the key.
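
If the original line order matters when the file is written back, one option is to swap in a LinkedHashMap, which iterates in insertion order. The sketch below assumes that substitution; the class name FirstFieldMap and the method name firstPerKey are illustrative only. putIfAbsent replaces the containsKey/put pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a LinkedHashMap variant of the loop above.
class FirstFieldMap {
    static List<String> firstPerKey(List<String> lines) {
        // LinkedHashMap iterates in the order keys were first inserted
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String line : lines) {
            // stores the line only if this key has no value yet
            byKey.putIfAbsent(line.split(":", 2)[0], line);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

The returned list can then be written out with the same BufferedWriter loop as in the question.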

Yazan
You can keep the lines in a list and use one more set that holds only the first words. For every new line, try to add its first word to the set: if the word is already there, add() does not add it again and returns false. Based on that return value you can add the line to the list, or write it out directly.


List<String> lines11;
Set<String> uniqueRecords;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new ArrayList<>(); // no need to give a size, it grows dynamically
    uniqueRecords = new HashSet<>();
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        // first word = everything before the first ':'
        String firstWord = line11.split(":", 2)[0];
        // add() returns false if the word is already in the set
        if (uniqueRecords.add(firstWord)) {
            lines11.add(line11);
        }
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}
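
Writing the lines out directly, instead of collecting them in a list first, might look like the sketch below. It assumes the output goes to a different file than the input, since opening the same file for writing while it is still being read would truncate it. The class name StreamingDedup and the parameters source and target are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: source and target are assumed to be different files.
class StreamingDedup {
    // Copies source to target, skipping lines whose first word was already seen.
    static void dedupe(Path source, Path target) throws IOException {
        Set<String> seen = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(source);
             BufferedWriter writer = Files.newBufferedWriter(target)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // add() returns true only the first time a word is seen
                if (seen.add(line.split(":", 2)[0])) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }
}
```

Only the set of first words is held in memory this way, which may help if the file is very long.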
Panther