What is a fast and succinct way to remove dupes from within a line?
I have a file in the following format:
alpha • a | b | c | a | b | c | d
beta • h | i | i | h | i | j | k
gamma • m | n | o
delta • p | p | q | r | s | q
So there's a headword in column 1, followed by various words delimited by pipes, with an unpredictable amount of duplication. The desired output has the dupes removed, keeping the first occurrence of each word in its original order:
alpha • a | b | c | d
beta • h | i | j | k
gamma • m | n | o
delta • p | q | r | s
My input file is a few thousand lines long. The Greek names above stand for category names (e.g., "baseball"), and the letters stand for English dictionary words, which may contain spaces or accents, e.g., "ball game | batter | catcher | catcher | designated hitter".
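One possible approach, sketched in Python under a few assumptions: the headword is separated from the list by the bullet character "•", items are separated by "|", and only the whitespace around each item should be trimmed (so multi-word or accented items such as "ball game" stay intact). The function name dedupe_line is hypothetical. It relies on dict.fromkeys, which drops later repeats while keeping first-seen order because Python dicts preserve insertion order (3.7+).

```python
def dedupe_line(line: str) -> str:
    """Drop repeated items from a 'headword • a | b | a' line.

    Assumes the headword is separated from the list by the bullet
    character '•' and that items are separated by '|'. Items may
    contain spaces or accented letters; only surrounding whitespace
    is stripped. The first occurrence of each item is kept, in order.
    """
    head, bullet, rest = line.partition("•")
    if not bullet:                  # no bullet: return the line unchanged
        return line
    items = [item.strip() for item in rest.split("|")]
    # dicts keep insertion order (Python 3.7+), so dict.fromkeys()
    # discards later duplicates while preserving first-seen order.
    unique = dict.fromkeys(items)
    return f"{head.strip()} • {' | '.join(unique)}"
```

For example, dedupe_line("delta • p | p | q | r | s | q") returns "delta • p | q | r | s". A plain set would also remove the repeats, but it would not keep the words in their original order.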
This could be programmed many ways, but I suspect there's a smart way to do it. I encounter variations of this scenario a lot, and wonder if there's a concise and elegant way to do this. I am using macOS, so some GNU-specific command-line options are not available (macOS ships the BSD versions of tools like sed).
Bonus complexity: I often have a comment at the end of the line, which should be retained, e.g.,
zeta • x | y | x | z | z ; comment here
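A possible extension, again only a sketch: assuming a comment always starts at the first ";" on the line (so no list item contains ";"), the line can be split there, the part before it deduplicated, and the comment re-attached verbatim. The script name dedupe.py and the helper names are hypothetical; it reads each file named on the command line as UTF-8 so accented words survive.

```python
import sys


def dedupe_items(items_text: str) -> str:
    """Return the '|'-separated items with repeats removed, first-seen order."""
    items = [item.strip() for item in items_text.split("|")]
    return " | ".join(dict.fromkeys(items))


def dedupe_line(line: str) -> str:
    """Deduplicate one line, keeping any trailing '; comment' verbatim.

    Assumption: the comment starts at the first ';' on the line, so no
    list item may itself contain ';'.
    """
    body, semi, comment = line.partition(";")
    head, bullet, rest = body.partition("•")
    if not bullet:                  # not a 'headword • ...' line: leave as-is
        return line
    result = f"{head.strip()} • {dedupe_items(rest)}"
    if semi:
        result += f" ;{comment}"    # re-attach the comment unchanged
    return result


def main(paths) -> None:
    # Reads each file named on the command line as UTF-8 and writes
    # the cleaned lines to stdout.
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                print(dedupe_line(raw.rstrip("\n")))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like python3 dedupe.py input.txt > output.txt; the zeta line above would come out as "zeta • x | y | z ; comment here".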
P.S. This input is actually the output of a prior Stack Overflow question: Command line to match lines with matching first field (sed, awk, etc.)