I have a very large FASTA file, and I want to extract only the sequences whose header contains Homo sapiens. Approaches that build a dictionary or a list would give the result, but the file is too large to hold in memory, so the matching records have to be written straight to an output file. My sample FASTA file is as follows:
gi|489223532|ref|WP_003131952.1| 30S ribosomal protein S18 [Lactococcus lactis] MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDLTRYYDG
gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 [Homo sapiens] MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEYKEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQKVQKLLNPDQ
gi|66818355|ref|XP_642837.1| hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4] MKTKSSNNIKKIYYISSILVGIYLCWQIIIQIIFLMDNSIAILEAIGMVVFISVYSLAVAINGWILVGRMKKSSKKAQYEDFYKKMILKSKILLSTIIIVIIVVVVQDIVINFILPQNPQPYVYMIISNFIVGIADSFQMIMVIFVMGELSFKNYFKFKRIEKQKNHIVIGGSSLNSLPVSLPTVKSNESNESNTISINSENNNSKVSTDDTINNVM
gi|446106212|ref|WP_000184067.1| MULTISPECIES: antibiotic transporter [Homo sapiens] MTNPFENDNYTYKVLKNEEGQYSLWPAFLDVPIGWNVVHKEASRNDCLQYVENNWEDLNPKSNQVGKKILVGKR
gi|494110381|ref|WP_007051162.1| MULTISPECIES: argininosuccinate lyase [Bifidobacterium] MTENNEHLALWGGRFTSGPSPELARLSKSTQFDWRLADDDIAGSRAHARALGRAGLLTADELQRMEDALDTLQRHVDDGSFAPIEDDEDEATALERGLIDIAGDELGGKLRAGRSRNDQIACLIRMWLRRHSRVIAGLLLDLVNALIEQSEKAGRTVMPGRTHMQHAQPVLLAHQLMAHAWPLIRDVQRLIDWDKRINASPYGSGALAGNTLGLDPEAVARELGFIDGAD
Expected output:
gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 [Homo sapiens] MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEYKEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQKVQKLLNPDQ
gi|446106212|ref|WP_000184067.1| MULTISPECIES: antibiotic transporter [Homo sapiens] MTNPFENDNYTYKVLKNEEGQYSLWPAFLDVPIGWNVVHKEASRNDCLQYVENNWEDLNPKSNQVGKKILVGKR
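To make the goal concrete, here is a minimal sketch of the kind of streaming filter I have in mind. It reads one line at a time and writes matches immediately, so memory use does not grow with file size. It assumes standard FASTA layout, where each record starts with a header line beginning with ">" and the sequence follows on one or more lines (the ">" is not visible in my sample above, so that is an assumption). The function name `filter_fasta` and its parameters are hypothetical names chosen for illustration.

```python
def filter_fasta(in_path, out_path, organism="[Homo sapiens]"):
    """Copy records whose header contains `organism` from in_path to out_path.

    Assumes each record starts with a '>' header line (standard FASTA).
    Returns the number of records written.
    """
    kept = 0
    keep = False  # whether the record currently being read should be written
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:  # iterating a file object reads one line at a time
            if line.startswith(">"):
                # New record: decide once, based on the header text.
                keep = organism in line
                if keep:
                    kept += 1
            if keep:
                # Writes the header and every sequence line that follows it.
                dst.write(line)
    return kept
```

It would be called as `filter_fasta("input.fasta", "human.fasta")`, where both file names are placeholders. If the file really has one record per line with no ">" prefix, as the sample appears here, the header test could be replaced by checking `organism in line` for every line.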