I have a file that contains the following (this is only a part; the real file has many more lines):

@SRR12345678.1
GAGCCATATGACCACGCCGGAGAATCTCGCCAAGCAGGCAAAGCTGATGGAAGGCTACGGTGCGCCCTGTTTTTAT
+SRR12345678.1
-@CCCGGGGFFGGCFGGGEEDFDFFGDFCEE,:@FDC8FE8,@FC8FFC,EFDFGE@FA,C9CE99F@7B7+CCE,CF,,6C,,C,+8++8++
@SRR12345678.5
CTTTATGCCCCCACAGTGCGATCAGGAAGTACATCGGCACCAGCATCATTTCCCAGAAGAAGAAGAACATGAACAT
+SRR12345678.5
CCCCCGFGGGGGDGDCFCFEDFEEDC?CDE9FAFGECF>FF8,C,FE8CEEFFF,,,,,,,,,,,,,C,,,,,:,:
@SRR12345678.6
GTCGATGGCCTGAACTACTCACGCTTCGAGAAGCAGATGCCTGCGCTGGCAGGTTTTGCTGAGCAAAATATTTCGT
+SRR12345678.6
-ACCCGFFGGGFCFGGGGGGGGCFGEGD8C878FAFGGCEFFEF7CFC7@,A+CEFD,CF,,,:,,,,:,

And I have this code to add it to a dictionary:

file = open("test.fastq")

d={}
for i in file:
    d_key, *d_value = i.split()
    d[d_key] = d_value

Can I somehow write the loop in one line, as a comprehension? I need to use a dictionary only because the file is really big.

  • What is the expected output? (Here you will just create a dictionary with the full strings as keys and empty lists as values.) Are you aware that there are specialized tools to handle FASTQ files? – mozway Sep 05 '22 at 13:37
  • Don't strive for one-line code just "because of it". A future you (or a colleague) will be quite annoyed trying to decipher it. – AKX Sep 05 '22 at 13:38
  • @mozway, sure. This is a learning exercise. I'm trying to process a big file with Python. The output is a dictionary. – Breathe of fate Sep 05 '22 at 13:39
  • @AKX, it was just a question. Yes or no. – Breathe of fate Sep 05 '22 at 13:40
  • Yes you can, but I believe the output is a bit weird. Do you really expect something like `{'@SRR12345678.1': [], 'GAGCCATATGACCACGCCGGAGAATCTCGCCAAGCAGGCAAAGCTGATGGAAGGCTACGGTGCGCCCTGTTTTTAT': [], '+SRR12345678.1': [], ...}` as output? Doesn't really make much sense… – mozway Sep 05 '22 at 13:42
  • @vaizki, there is a syntax error somewhere... – Breathe of fate Sep 05 '22 at 13:43
  • @vaizki this is not valid code, you can't assign in a comprehension like you did – mozway Sep 05 '22 at 13:43
  • @mozway, I need every second line in the dictionary. – Breathe of fate Sep 05 '22 at 13:44
  • @Breatheoffate your current code doesn't do any "every second line" stuff either. – AKX Sep 05 '22 at 13:46
  • @AKX, I know this. This code just adds every line to the dict with `[]` values. It was just an attempt to put something into a dict when the file is really big. – Breathe of fate Sep 05 '22 at 13:47
  • @Breatheoffate So you say "you have this code to add this to a dictionary", but it _doesn't even do what you want it to do_? – AKX Sep 05 '22 at 13:49

1 Answer

If you want every second line as key/data, one way could be to take advantage of the file iterator:

with open("test.fastq") as f:
    d = {i.strip(): next(f).strip() for i in f}

NB. This requires an even number of lines; otherwise the final `next(f)` raises `StopIteration`.

To handle an odd number of lines:

  • Setting up a default value:
with open("test.fastq") as f:
    d = {i.strip(): next(f, '').strip() for i in f}
  • Dropping the lone key:
with open("test.fastq") as f:
    # the := operator requires Python 3.8+
    d = {i.strip(): s.strip() for i in f if (s:=next(f, None))}

output:

{'@SRR12345678.1': 'GAGCCATATGACCACGCCGGAGAATCTCGCCAAGCAGGCAAAGCTGATGGAAGGCTACGGTGCGCCCTGTTTTTAT',
 '+SRR12345678.1': '-@CCCGGGGFFGGCFGGGEEDFDFFGDFCEE,:@FDC8FE8,@FC8FFC,EFDFGE@FA,C9CE99F@7B7+CCE,CF,,6C,,C,+8++8++',
 '@SRR12345678.5': 'CTTTATGCCCCCACAGTGCGATCAGGAAGTACATCGGCACCAGCATCATTTCCCAGAAGAAGAAGAACATGAACAT',
 '+SRR12345678.5': 'CCCCCGFGGGGGDGDCFCFEDFEEDC?CDE9FAFGECF>FF8,C,FE8CEEFFF,,,,,,,,,,,,,C,,,,,:,:',
 '@SRR12345678.6': 'GTCGATGGCCTGAACTACTCACGCTTCGAGAAGCAGATGCCTGCGCTGGCAGGTTTTGCTGAGCAAAATATTTCGT',
 '+SRR12345678.6': '-ACCCGFFGGGFCFGGGGGGGGCFGEGD8C878FAFGGCEFFEF7CFC7@,A+CEFD,CF,,,:,,,,:,'}
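
To see how the two odd-line variants differ, here is a minimal sketch that runs them on an in-memory file instead of `test.fastq`. The five-line `SAMPLE` and the function names `pair_with_default` and `pair_drop_lone` are made up for illustration; they are not part of the question's data.

```python
import io

# Hypothetical five-line sample: the last header line has no partner.
SAMPLE = "@r1\nACGT\n+r1\nFFFF\n@r2\n"

def pair_with_default(f):
    # Keep the unpaired last line, using '' as its value.
    return {i.strip(): next(f, '').strip() for i in f}

def pair_drop_lone(f):
    # Drop the unpaired last line (the := operator needs Python 3.8+).
    return {i.strip(): s.strip() for i in f if (s := next(f, None))}

with_default = pair_with_default(io.StringIO(SAMPLE))
dropped = pair_drop_lone(io.StringIO(SAMPLE))

print(with_default)
print(dropped)
```

With the default value, the lone `@r2` key survives with an empty string; with the filter, it is left out of the dictionary entirely.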
mozway
  • Yigh, using `next()` on the same iterable that's being iterated over by a dictcomp sounds like it will break sooner or later. – AKX Sep 05 '22 at 13:49
  • @AKX it will only break if the number of lines is odd. But sure, I would rather write a clean parser for real work (not what is being asked here though). – mozway Sep 05 '22 at 13:51
  • I added an example of how to handle an odd number of lines. – mozway Sep 05 '22 at 13:57
  • @mozway, is it possible to get something like this: `1: next(f)`? (entry number: line) – Breathe of fate Sep 05 '22 at 14:00
  • Not sure what this means – mozway Sep 05 '22 at 14:02
  • @mozway `{1: GAGCCATATGACCACGCCGGAGAATCTCGCCAAGCAGGCAAAGCTGATGGAAGGCTACGGTGCGCCCTGTTTTTAT, 2: CTTTATGCCCCCACAGTGCGATCAGGAAGTACATCGGCACCAGCATCATTTCCCAGAAGAAGAAGAACATGAACAT, 3: GTCGATGGCCTGAACTACTCACGCTTCGAGAAGCAGATGCCTGCGCTGGCAGGTTTTGCTGAGCAAAATATTTCGT}` etc... – Breathe of fate Sep 05 '22 at 14:04
  • Yes, just read every 4th line. However, a dictionary with a range as keys is pointless; a simple list would be better. – mozway Sep 05 '22 at 14:10
  • @mozway, sure. I made a solution with a list, but it took a really long time when the file is really big (several thousand entries). How can I read every 4th line? – Breathe of fate Sep 05 '22 at 14:12
  • https://stackoverflow.com/questions/36487709/how-to-iterate-over-every-n-th-line-from-a-file – mozway Sep 05 '22 at 14:15
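
One possible way to read every 4th line, as discussed in the comments above, is `itertools.islice` with a step of 4. This is only a sketch: it assumes every record is exactly four lines (header, sequence, `+` line, qualities), and the in-memory `SAMPLE` and the name `read_sequences` are invented for the example.

```python
import io
from itertools import islice

# Hypothetical two-record sample in the four-line layout.
SAMPLE = "@r1\nACGT\n+r1\nFFFF\n@r2\nTTGA\n+r2\nFFF,\n"

def read_sequences(f):
    # Start at the second line (index 1), then take every fourth line.
    return [line.strip() for line in islice(f, 1, None, 4)]

sequences = read_sequences(io.StringIO(SAMPLE))

# The numbered dictionary from the comments, if one is really needed.
numbered = dict(enumerate(sequences, start=1))

print(sequences)
print(numbered)
```

For a real file, `read_sequences(f)` can be called inside `with open("test.fastq") as f:`. The list alone already gives positional access, so the numbered dictionary adds little beyond 1-based keys.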