Good morning everyone, I have a data.ped file made up of thousands of columns and hundreds of lines. The first 6 columns and the first 4 lines of the file look like this:
186 A_Han-4.DG 0 0 1 1
187 A_Mbuti-5.DG 0 0 1 1
188 A_Karitiana-4.DG 0 0 1 1
191 A_French-4.DG 0 0 1 1
And I have an ids.txt file that looks like this:
186 Ignore_Han(discovery).DG
187 Ignore_Mbuti(discovery).DG
188 Ignore_Karitiana(discovery).DG
189 Ignore_Yoruba(discovery).DG
190 Ignore_Sardinian(discovery).DG
191 Ignore_French(discovery).DG
192 Dinka.DG
193 Dai.DG
What I need is to replace (in Unix) the value in the first column of data.ped with the value in the second column of ids.txt, taken from the ids.txt line whose first column matches. For example, I want to replace "186" in the first column of data.ped with "Ignore_Han(discovery).DG" from the second column of ids.txt, because the first column of that ids.txt line is "186". So the output.ped file must look like this:
Ignore_Han(discovery).DG A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG A_Karitiana-4.DG 0 0 1 1
Ignore_French(discovery).DG A_French-4.DG 0 0 1 1
The values in the first column of data.ped are a subset of the values in the first column of ids.txt, so there is always a match.
Edit:
I've tried with this:
awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]; print}' ids.txt data.ped > output.ped
but when I check the result with:
cut -f 1-6 -d " " output.ped
I get this strange output:
A_Han-4.DG 0 0 1 1y).DG
A_Mbuti-5.DG 0 0 1 1y).DG
A_Karitiana-4.DG 0 0 1 1y).DG
A_French-4.DG 0 0 1 1y).DG
while if I use this command:
cut -f 1-6 -d " " output.ped | less
I get this:
Ignore_Han(discovery).DG^M A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG^M A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG^M A_Karitiana-4.DG 0 0 1 1
Ignore_French(discovery).DG^M A_French-4.DG 0 0 1 1
and I can't figure out why there is a ^M on every line.
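One likely explanation, assuming ids.txt was saved with Windows-style (CRLF) line endings: the ^M shown by less is a carriage return at the end of each ids.txt line, so it ends up inside $2 and gets copied into the first column of the output. When printed directly to a terminal, the carriage return moves the cursor back to the start of the line, which would explain why the rest of the line overwrites the name and only "y).DG" remains visible. The sketch below reproduces the situation with small sample files in a scratch directory (the directory and the sample contents are illustrative, not the real data) and strips a trailing carriage return from every input line before doing the lookup:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)

# Sample ids.txt with CRLF line endings (the assumed cause of the ^M).
printf '186 Ignore_Han(discovery).DG\r\n187 Ignore_Mbuti(discovery).DG\r\n' > "$tmp/ids.txt"

# Sample data.ped with plain LF line endings.
printf '186 A_Han-4.DG 0 0 1 1\n187 A_Mbuti-5.DG 0 0 1 1\n' > "$tmp/data.ped"

# Remove a trailing carriage return from each line of both files,
# then build the lookup table from ids.txt and rewrite column 1 of data.ped.
awk '{ sub(/\r$/, "") }
     NR == FNR { a[$1] = $2; next }
     $1 in a   { $1 = a[$1]; print }' "$tmp/ids.txt" "$tmp/data.ped" > "$tmp/output.ped"

cat "$tmp/output.ped"
```

To check whether the real ids.txt actually contains carriage returns, something like `od -c ids.txt | head` should show `\r \n` at the end of each line if it does.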