I'm looking for the opposite of Sublime Text 2's Permute Lines -> Unique. I need to be able to display only the duplicate lines in the file (or, if possible, in two different files).

I found the HighlightDuplicates plugin, but I don't know how to then select the highlighted text to copy it to a new file.

PBwebD

1 Answer

You have bundled several questions together, and what you want from each of them is not fully defined.

Given that, let's start with the challenge of getting the duplicate lines from one file into another file.

This simple little bit of Python should work for you.

    """Write the duplicate lines in one file to a text file."""

    from collections import Counter

    fileToRead = 'read_file.txt'
    fileToWrite = 'write_file.txt'

    with open(fileToRead, mode='r') as read_file:
        # Drop the trailing newline so a final line without one still matches.
        file_lines = [line.rstrip('\n') for line in read_file]

    # Count how often each distinct line occurs.
    line_counts = Counter(file_lines)

    # A line is a duplicate if it occurs more than once; blank lines are ignored.
    dupLines = [line for line, count in line_counts.items()
                if count > 1 and line.strip() != '']

    with open(fileToWrite, mode='w') as write_file:
        # Each duplicated line is written once, in order of first appearance.
        for line in dupLines:
            write_file.write(line + '\n')

NOTE:

  • You MUST replace read_file.txt with the name of the file you want to find duplicates in.

  • If you want to, you can replace write_file.txt with the name of the file you want the duplicates written to.

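Give it a run and see if you like the outcome. Since you have not defined what a 'duplicate' means, I have made some assumptions you may not like: lines must match exactly, blank lines are ignored, and each duplicated line is written to the output only once.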

Anyway, drop the script above into ST, edit the bits you need to, and use Tools -> Build to run the code.

Look at the output file and tell us how it differs from what you want.
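If exact matching turns out to be too strict or too loose, the definition of 'duplicate' can be made adjustable. The following is only a minimal sketch of that idea, not a finished tool: the function name `find_duplicates`, its `key` parameter, and the sample data are all made up for illustration.

```python
from collections import Counter


def find_duplicates(lines, key=lambda line: line):
    """Return the lines whose key occurs more than once, in first-seen order.

    `key` is a hypothetical hook that decides what counts as 'the same line'.
    """
    counts = Counter(key(line) for line in lines)
    seen = set()
    duplicates = []
    for line in lines:
        k = key(line)
        # Skip blank lines and report each duplicated key only once.
        if counts[k] > 1 and k.strip() != '' and k not in seen:
            seen.add(k)
            duplicates.append(line)
    return duplicates


# Made-up sample data, including near-duplicates and blank lines.
sample = ['apple', 'Apple', 'pear', 'apple ', 'pear', '', '']

# Exact matching: only 'pear' repeats character for character.
exact = find_duplicates(sample)

# Looser matching: ignore case and surrounding whitespace.
loose = find_duplicates(sample, key=lambda line: line.strip().lower())
```

With exact matching only `pear` is reported. With the looser key, `apple`, `Apple` and `apple ` are treated as the same line, so `apple` is reported as well.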

Once we have an agreed output for a single file, getting you a version that works on two files is the next big challenge.
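For a rough idea of where the two-file version might go, here is a sketch that assumes 'duplicate' means a non-blank line that appears in both files, compared exactly apart from the trailing newline. That assumption is mine and may not be what you need. The function name `lines_in_both` is hypothetical.

```python
def lines_in_both(first_path, second_path):
    """Return the non-blank lines of first_path that also occur in second_path.

    Assumption: lines are compared exactly, ignoring only the trailing newline.
    """
    with open(second_path, mode='r') as second_file:
        # A set gives fast membership checks for the second file's lines.
        second_lines = {line.rstrip('\n') for line in second_file}

    common = []
    seen = set()
    with open(first_path, mode='r') as first_file:
        for line in first_file:
            line = line.rstrip('\n')
            # Keep each shared line once, in the order it appears in the first file.
            if line in second_lines and line.strip() != '' and line not in seen:
                seen.add(line)
                common.append(line)
    return common
```

Calling `lines_in_both('first_file.txt', 'second_file.txt')` (both file names are placeholders) returns a list you can write out in the same way as the single-file script does.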

jwpfox