My string is "Escherichia coli str Nissle 1917" and I want to extract from a data frame all the rows containing a similar string in a specific column (organism_name). The result should be the following:
   # assembly_accession  bioproject    biosample     wgs_master refseq_category
1:      GCF_000333215.1 PRJNA224116 SAMEA2272139 CAPM00000000.1              na
2:      GCF_000714595.1 PRJNA224116 SAMN02794012                             na
3:      GCF_003546975.1 PRJNA224116 SAMN07451663                             na
4:      GCF_019967895.1 PRJNA224116 SAMN18749717                             na
    taxid species_taxid                organism_name infraspecific_name isolate
1: 316435           562 Escherichia coli Nissle 1917 strain=Nissle 1917
2: 316435           562 Escherichia coli Nissle 1917 strain=Nissle 1917
3: 316435           562 Escherichia coli Nissle 1917 strain=Nissle 1917
4: 316435           562 Escherichia coli Nissle 1917 strain=Nissle 1917
I tried with agrep, but it doesn't work because of the word "str".
Is there a way to do a fuzzy match, or something similar, to extract these rows from my data frame given my input string?
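One direction that might work, as a minimal sketch rather than a definitive solution: split the input string into words, drop filler words such as "str", and keep the rows whose organism_name contains every remaining word. The toy data frame, the third row, and the `filler` list are assumptions made up for illustration; the real table has more columns and rows.

```r
# Toy stand-in for the real table (third row is a hypothetical non-matching entry).
df <- data.frame(
  assembly_accession = c("GCF_000333215.1", "GCF_000714595.1", "GCF_XXXXXXXXX.1"),
  organism_name = c(
    "Escherichia coli Nissle 1917",
    "Escherichia coli Nissle 1917",
    "Escherichia coli str. K-12 substr. MG1655"
  ),
  stringsAsFactors = FALSE
)

query <- "Escherichia coli str Nissle 1917"

# Split the query on whitespace and drop filler words (assumed list, adjust as needed).
tokens <- strsplit(query, "\\s+")[[1]]
filler <- c("str", "str.", "strain", "substr", "substr.")
tokens <- tokens[!tolower(tokens) %in% filler]

# For each remaining token, test literal (case-sensitive) presence in organism_name,
# then keep only the rows where every token is present.
keep <- Reduce(`&`, lapply(tokens, function(tok) {
  grepl(tok, df$organism_name, fixed = TRUE)
}))

result <- df[keep, ]
print(result)
```

This is exact word matching rather than true fuzzy matching, so it would not tolerate typos inside a word; for that, something based on edit distance (for example `adist` from utils) might be a better fit.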
Thanks a lot