0

I want to extract readname list one of which is uniquely mapped and the other is multi-mapped.

However, after some try&error, I found

the 1st raw of

samtools view -@ 6 XXX.sorted.bam | if grep -q XS:i: ; then awk '{print $0}' ;else:;fi | head

is always broken like this:

1       28543   1       25M     =       29133   615     CTTCAATGCATTCCACAAATACCTA                                                                                                        GGAGGIGGGIGGGGIIGGIGIGIGG        AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0 NM:i:0                                                                                                    MD:Z:25  YS:i:0  YT:Z:CP

or like that:

AS:i:-6 XS:i:-6 YS:i:0  YT:Z:CP

(i.e. a partial row of sam files)

I doubt the bam file itself is broken, then I do

samtools view -@ 6 XXX.sorted.bam| grep CTTCAATGCATTCCACAAATACCTA

but it returns normally

SRR11074254.925964      99      chr1    28543   1       25M     = 28926 408    CTTCAATGCATTCCACAAATACCTA                                                                                                                                                                                                                                                                                                                              GAGGGIIIIIIIIIIIIIIIIIIII        AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0 NM:i:0

Is there any reason to insert 'end of sentence' or some other staff?

Thank you in advance,

[Modified question]

Thank you for kind orientation and explanations.

Ok, I'll re-explain the situation. Since the original file are heavy and I couldn't grasp whole process many tools did, so let me just explain.

The "samtools view" command takes .bam or .sam file and returns more than 11 fields with \t as a delimiter.

I'd like to extract rows with "XS:i:", so both "grep XS:i:" and awk '/XS:i:/' are fine.

Although it is expected each complete row is to output, a row with complete fields set has emerged.

From another trial I can understand the row isn't broken, I'm wondering it is because of samtools or grep and/or awk.

Is it possible for too long row to be extracted partially by grep or awk '/?' ? Thanks to you all, I understand grep -q isn't appropriate for my purpose.

kofuji
  • 1
  • 1
  • 2
    Welcome to SO. Please check [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Digvijay S Jul 18 '21 at 13:42
  • 1
    `if grep -q XS:i: ; then awk '{print $0}' ;else:;fi` won't produce any output, it'll just hang as you aren't running the awk script on anything, `grep` is consuming all of the output from the preceding pipe. It's not at all clear what you wanted that snippet to do - you could just run `grep 'XS:i:'` without -q and without awk or you could just run `awk '/XS:i:/'` without grep if all you want is lines that contain `XS:i:` so I assume you wanted something more than that to happen but idk what. – Ed Morton Jul 18 '21 at 14:03
  • @rowboat could you give an example of how that could happen, e.g. using a `printf` or `seq` or something piped to such an `if` statement? I'd expect `grep -q` to close the pipe when it finds the first match and so nothing would be arriving at `awk` but I could be wrong. – Ed Morton Jul 18 '21 at 14:34
  • 1
    `seq 50000 | if grep -q 1; then awk '1;NR==2{exit}';fi` w/ GNU tools prints `4` `12775` it might vary with implementation though @EdMorton –  Jul 18 '21 at 14:41
  • @rowboat You are correct. Thanks for that example. – Ed Morton Jul 18 '21 at 14:45

0 Answers0