I am trying to remove lines in a file that start with the same 5 characters. However, the first 5 characters are random (I don't know in advance what they will be).
I have code that reads the last 5 characters of the first line of a file and matches them to the FIRST 5 characters of another line in the file that starts with the same 5 characters. The problem is that when two or more lines start with those same 5 characters, the code does not behave correctly. I need something that reads all the lines in the file and, whenever two lines share the same first 5 characters, removes one of them.
Example (issue):
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT
What I need as the result after one of them is taken out of the file:
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
(no third line)
I would greatly appreciate it if you could also explain in words how I could go about this.
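To make the requirement concrete, here is a minimal sketch of the behavior described above, written in Python as an assumption since no language is specified. In words: go through the lines in order, look at the first 5 characters of each line, and keep the line only if those 5 characters have not been seen on an earlier line. The function name `drop_duplicate_prefixes` and the parameter `prefix_len` are hypothetical, and the sketch assumes the asterisks in the example are only highlighting and are not actually in the file.

```python
def drop_duplicate_prefixes(lines, prefix_len=5):
    """Keep only the first line seen for each distinct prefix.

    Hypothetical helper: lines are compared on their first `prefix_len`
    characters, and later lines with an already-seen prefix are dropped.
    """
    seen = set()   # prefixes encountered so far
    kept = []      # lines that survive, in their original order
    for line in lines:
        key = line[:prefix_len]
        if key in seen:
            continue          # an earlier line already starts with this prefix
        seen.add(key)
        kept.append(line)
    return kept


# Example data from the question, with the highlighting asterisks removed
# (assumption: they are not part of the real file).
lines = [
    "CCTGGATGGCTTATATAAGATGTTAT",
    "GTTATATAATATACCACCGGGCTGCTT",
    "GTTATATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT",
]
result = drop_duplicate_prefixes(lines)
for line in result:
    print(line)
```

For a real file, the lines could be read with something like `open(path).read().splitlines()` and the kept lines written back out afterwards. This sketch always keeps the first of the matching lines; if a different one should be kept, the rule for choosing it would need to be stated.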