I have a text file with thousands of lines. Each line starts with a domain name ending in a period, followed by other information (numbers, spaces, other data).
Some domains have more than one line, each with different numbers and information after the domain, such as domain1 and domain4 in this example:
domain1.foo. 3600 ...
domain1.foo. 1800 ...
domain2.foo. 900 ...
domain3.foo. 60 ...
domain4.foo. 3600 ...
domain4.foo. 1200 ...
domain4.foo. 1200 ...
Duplicate entries for a domain are always on consecutive lines (for example, the lines for domain4 could be lines 50, 51, and 52, but never 50, 60, and 400).
What I am trying to do is use sed to keep only the first line for each domain name and delete the lines that follow it for the same domain, regardless of what comes after the domain. The example would become:
domain1.foo. 3600 ...
domain2.foo. 900 ...
domain3.foo. 60 ...
domain4.foo. 3600 ...
I only have a basic knowledge of regex and would appreciate some help as to how to go about this. I managed to get the list formatted so tabs and double spaces are removed, but I need a little help for this part.
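For reference, here is a minimal sketch of the behaviour described above, not a definitive solution. It assumes GNU sed (for -E together with \n and a backreference in the pattern), fields separated by single spaces, and repeated domains always on adjacent lines. The function name dedupe_domains is just a placeholder for illustration.

```shell
# Sketch: keep only the first line of each run of lines sharing the same
# first field. Assumes GNU sed and single-space-separated fields.
# dedupe_domains is a hypothetical helper name, not part of any tool.
dedupe_domains() {
  # :a      label for the loop
  # $!N     unless on the last line, append the next line to the pattern space
  # s/.../  if both lines start with the same first field, drop the second line
  # ta      if a line was dropped, loop and pull in the next one
  # P; D    otherwise print the first line and restart with the remainder
  sed -E -e ':a' \
         -e '$!N' \
         -e 's/^(([^ ]+) .*)\n\2 .*$/\1/' \
         -e 'ta' \
         -e 'P' \
         -e 'D' "$@"
}

# Run it on the sample data from the question.
dedupe_domains <<'EOF'
domain1.foo. 3600 ...
domain1.foo. 1800 ...
domain2.foo. 900 ...
domain3.foo. 60 ...
domain4.foo. 3600 ...
domain4.foo. 1200 ...
domain4.foo. 1200 ...
EOF
```

The function also accepts a file name as an argument instead of standard input. Because only neighbouring lines are compared, a domain that reappears later in the file after a different domain would be kept again, which matches the guarantee that duplicates are always adjacent.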