I would like to cut off the first 9 characters of every 4th line. I could use cut -c 10-, but I don't know how to select only every 4th line without losing the remaining lines.

Input:

@V300059044L3C001R0010004402
AAGTAGATATCATGGAGCCG
+
FFFGFGGFGFGFFGFFGFFGGGGGFFFGG
@V300059044L3C001R0010009240
AAAGGGAGGGAGAATAATGG
+
GFFGFEGFGFGEFDFGGEFFGGEDEGEGF

Output:

@V300059044L3C001R0010004402
AAGTAGATATCATGGAGCCG
+
FGFFGFFGFFGGGGGFFFGG
@V300059044L3C001R0010009240
AAAGGGAGGGAGAATAATGG
+
FGEFDFGGEFFGGEDEGEGF
gnikixam

2 Answers

Try the following, written and tested with the shown samples in GNU awk.

awk 'FNR%4==0{print substr($0,10);next} 1' Input_file

Or, as per @tripleee's suggestion (in the comments), try:

awk '!(FNR%4) { $0 = substr($0, 10) }1' Input_file

Explanation: a detailed, commented version of the first command.

awk '                   ##Start of the awk program.
FNR%4==0{               ##Check whether the line number is divisible by 4 (every 4th line).
  print substr($0,10)   ##Print the line from the 10th character onward.
  next                  ##Skip the remaining statements for this line.
}
1                       ##1 is an always-true condition, so the current line is printed.
' Input_file            ##Name of the input file.
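As a quick check, the two variants above can be run against the sample from the question and compared. This is only a sketch: the scratch file created with mktemp and the variable names out1 and out2 are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical scratch file holding the sample records from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@V300059044L3C001R0010004402
AAGTAGATATCATGGAGCCG
+
FFFGFGGFGFGFFGFFGFFGGGGGFFFGG
@V300059044L3C001R0010009240
AAAGGGAGGGAGAATAATGG
+
GFFGFEGFGFGEFDFGGEFFGGEDEGEGF
EOF

# Variant 1: print the trimmed line explicitly, then skip to the next line.
out1=$(awk 'FNR%4==0{print substr($0,10);next} 1' "$tmp")

# Variant 2: rewrite $0 in place and let the trailing 1 print every line.
out2=$(awk '!(FNR%4) { $0 = substr($0, 10) }1' "$tmp")

# If both variants agree, show only the trimmed 4th lines.
[ "$out1" = "$out2" ] && printf '%s\n' "$out1" | awk 'FNR%4==0'

rm -f "$tmp"
```

Both variants should leave the other three lines of each record untouched, so the output keeps all 8 lines.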
RavinderSingh13
  • Maybe even refactor down to `awk '!(FNR%4) { $0 = substr($0, 10) }1'` – tripleee Dec 17 '20 at 13:27
  • Worked perfectly to resolve the first issue, thanks!!! – gnikixam Dec 17 '20 at 13:28
  • @gnikixam, I think this one should address both cutting off 9 characters on every 4th line and the performance issue, IMHO. – RavinderSingh13 Dec 17 '20 at 13:30
  • Yeah, that's right. But the second aim is to additionally remove XY characters at the end of this line. For example: the last 3 characters on line 4, the last 5 characters on line 8, and so on. This is very time-consuming. – gnikixam Dec 17 '20 at 13:44
  • @gnikixam, I really thought both were the same requirement :) Do the lines where you want to remove trailing characters follow any specific sequence or logic in their line numbers? Kindly let me know. – RavinderSingh13 Dec 17 '20 at 13:51
  • I use `sed 's/AAAAAAAAA.*//'` to remove A-tails (and likewise for Gs, Ts, and Cs). Then I would compare the untrimmed and trimmed sequences to count the difference, and trim that many characters from the end of the QS row (4th line). – gnikixam Dec 17 '20 at 14:05
  • @gnikixam, IMHO this looks like 2 different questions joined into one. I would request that you open a new question for this one with clear details and keep only the 1st question here, so that no one gets confused. Honestly, your 2nd question was not clear, so it is better to open a new question (including the efforts you have already shown here) rather than confuse current and future readers. Cheers, thank you. – RavinderSingh13 Dec 17 '20 at 14:10
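The follow-up described in the comments above (strip a tail from the sequence, then shorten the quality line by the same amount) might be sketched in a single awk pass. This is an untested-on-real-data sketch under stated assumptions: the record below is made up, the quality line is assumed to already be the same length as the sequence, and only the A-tail pattern from the comment is handled (Gs, Ts, and Cs would need analogous patterns).

```shell
# Hypothetical one-record sample: the sequence ends in a 12-character A-tail.
trimmed=$(printf '%s\n' '@read1' 'ACGTACGTAAAAAAAAAAAA' '+' 'FFFFFFFFGGGGGGGGGGGG' |
awk '
FNR%4==2 { sub(/AAAAAAAAA.*/, ""); seqlen = length($0) }  # strip the tail, remember the new length
FNR%4==0 { $0 = substr($0, 1, seqlen) }                   # cut the quality line to that length
1                                                         # print every line
')
printf '%s\n' "$trimmed"
```

Remembering the trimmed length on line 2 of each record avoids a second pass that compares trimmed and untrimmed files.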

GNU sed can select every 4th line with the address 4~4 (first~step), e.g.:

sed -E '4~4s/.{9}//'
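Since the first~step address is a GNU extension, a sed without it could reach the same lines by advancing three lines with n before substituting. The sketch below is an alternative technique, not the answer's command, and the 8-line sample and the variable name out are made up for illustration.

```shell
# Hypothetical sample: every 4th line carries a 9-character prefix to drop.
out=$(printf '%s\n' 'l1' 'l2' 'l3' '123456789keep4' 'l5' 'l6' 'l7' '123456789keep8' |
  sed 'n;n;n;s/^.\{9\}//')   # n prints the current line and loads the next one
printf '%s\n' "$out"
```

Each cycle prints three lines unchanged via n, applies the substitution to the fourth, and then starts over on the next line.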
Thor
  • Thanks for your reply! The command removed all characters except 9, but it's OK. RavinderSingh13's command worked very well! – gnikixam Dec 17 '20 at 13:32
  • should be `sed -E '4~4s/.{9}//'` to delete first 9 characters – Sundeep Dec 17 '20 at 13:41
  • @gnikixam: That is what you asked for ... I suggest reading and creating an [MCVE](https://stackoverflow.com/help/minimal-reproducible-example) – Thor Dec 18 '20 at 09:38
  • @Sundeep: indeed, I had it the wrong way around – Thor Dec 22 '20 at 23:57