-1

I have a tab-delimited file, and I want to print its first three columns.

I would prefer to keep my way of doing this as simple and reproducible as possible:

awk -F" " '{print $1,"\t" ,$2, "\t", $3}' old.bed > new.bed

But when I try further analysis on the new file I get an error saying that the file is of an unexpected format...

I check the contents of the file with:

cat -A new.bed | more

chr1     3000870     3000918$
chr1     3000870     3000918$
chr1     3000872     3000920$
chr1     3000872     3000920$
chr1     3000872     3000920$

It looks normal....

What is going wrong, and how can I avoid it?

which_command

2 Answers

4

The $ characters are not in the file; the -A flag of cat adds them to the display. Relevant parts from man cat:

   -A, --show-all
          equivalent to -vET

   -E, --show-ends
          display $ at end of each line

Simply drop the -A flag and the $ won't be displayed anymore.
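To see the difference on a small sample, here is a minimal sketch. It assumes GNU cat (the -A flag is not available in every cat implementation), and the variable names sample and shown are made up for illustration:

```shell
# One BED-like line with real tab characters between the fields.
sample=$(printf 'chr1\t3000870\t3000918')

# Plain cat prints the bytes unchanged; no $ appears.
printf '%s\n' "$sample" | cat

# GNU cat -A renders each tab as ^I and marks each line end with $.
shown=$(printf '%s\n' "$sample" | cat -A)
printf '%s\n' "$shown"
```

The ^I and $ markers exist only in the display; the data passed through the pipe is the same in both cases.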

In addition, I'm not sure the awk command does exactly what you intended. The output is not col1 tab col2 tab col3, but col1 space tab space col2 space tab space col3. That's because every , in the print statement is replaced with the output field separator (OFS, a single space by default), and on top of that you're also printing literal tabs. Here's a simple way to make the columns tab-separated:

awk -v OFS='\t' '{print $1, $2, $3}'
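The two commands can be compared byte by byte with od -c. This is only a sketch on a one-line sample; the variable names line, padded, and clean are hypothetical:

```shell
# One tab-delimited input line.
line=$(printf 'chr1\t3000870\t3000918')

# Command from the question: each comma emits OFS (a space by default),
# so every literal "\t" ends up with a space on either side.
padded=$(printf '%s\n' "$line" | awk -F" " '{print $1,"\t" ,$2, "\t", $3}')

# With OFS set to a tab, the fields are joined by exactly one tab.
clean=$(printf '%s\n' "$line" | awk -v OFS='\t' '{print $1, $2, $3}')

# od -c shows spaces and \t explicitly, which makes the difference visible.
printf '%s\n' "$padded" | od -c
printf '%s\n' "$clean" | od -c
```

The stray spaces in the first output are a plausible cause of the "unexpected format" error, since tools that read BED files generally expect fields separated by single tabs.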
janos
1

Since your input is tab delimited, you can use cut as a simple and reproducible method:

cut -f 1-3 old.bed
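As a quick check that cut keeps the tab delimiters intact, here is a small sketch on a made-up five-column line (the extra name and 0 columns are invented for illustration and are not from the question's file):

```shell
# Hypothetical five-column, tab-delimited line.
row=$(printf 'chr1\t3000870\t3000918\tname\t0')

# cut splits on tabs by default and joins the selected fields with tabs.
first3=$(printf '%s\n' "$row" | cut -f 1-3)

# od -c confirms there is a single \t between fields and no extra spaces.
printf '%s\n' "$first3" | od -c
```

To write the result to a file as in the question, redirect the output, for example cut -f 1-3 old.bed > new.bed.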

As for your $ question, janos has that fully covered in his answer.

Benjamin W.