
I have a data frame with 3 variables that I would like to split into 4 variables. The data frame looks like this:

Species_Name    SIXTEENS_Title                           SIXTEENS_Sequence
Daphnia magna   LC382445.1 Daphnia magna mitochondrial   TTCGGAGAAAAGGGGTAC...
Daphnia magna   KY694374.1 Daphnia magna mitochondrial   TTCGGAGAAAAGGGGTAC...

From the SIXTEENS_Title column, I want to extract everything before "Daphnia", i.e. the alphanumeric accession numbers (LC382445.1, KY694374.1). There are over 100 observations, each with a different number.

I've tried str_extract() and str_detect(), but I can't get either of them to work. I want to use the accession numbers (e.g. LC382445.1) to create a new column in R.

zx8754
temsandroses
Related, possible duplicate of https://stackoverflow.com/questions/33683862/first-entry-from-string-split – zx8754 Oct 22 '18 at 11:50
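The approach in the linked question (take the first entry of a string split) can be sketched in base R as follows. This is a minimal sketch that assumes every title begins with the accession number followed by whitespace. The variable names `titles`, `parts` and `accession` are hypothetical and used only for illustration.

```r
# Hypothetical input: two titles copied from the question
titles <- c("LC382445.1 Daphnia magna mitochondrial",
            "KY694374.1 Daphnia magna mitochondrial")

# Split each title on runs of whitespace; strsplit() returns a list of character vectors
parts <- strsplit(titles, "\\s+")

# Take the first piece of each split (assumed to be the accession number)
accession <- vapply(parts, `[`, character(1), 1)
print(accession)
```

Using vapply() instead of sapply() guarantees a character vector of the same length as the input, even when the input is empty.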

1 Answer


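You can do this with sub() and a regular expression that deletes everything from "Daphnia" to the end of the string, together with the whitespace before it.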

df = read.table(text="Species_Name    SIXTEENS_Title         SIXTEENS_Sequence
'Daphnia magna'   'LC382445.1 Daphnia magna mitochondrial'   'TTCGGAGAAAAGGGGTAC...'
'Daphnia magna'   'KY694374.1 Daphnia magna mitochondrial'   'TTCGGAGAAAAGGGGTAC...'",
header=TRUE, stringsAsFactors=FALSE)

sub("\\s*Daphnia.*", "", df$SIXTEENS_Title)
[1] "LC382445.1" "KY694374.1"
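Because the question asks for a new column, the result of sub() can be assigned straight back into the data frame. The sketch below is a minimal example that assumes every title starts with the accession number followed by a space. It uses a slightly different pattern ("\\s.*", i.e. the first whitespace and everything after it), so the result does not depend on the species name being "Daphnia". The column name `Accession` is a hypothetical choice.

```r
# Rebuild the two example rows from the question
df <- data.frame(
  Species_Name = c("Daphnia magna", "Daphnia magna"),
  SIXTEENS_Title = c("LC382445.1 Daphnia magna mitochondrial",
                     "KY694374.1 Daphnia magna mitochondrial"),
  SIXTEENS_Sequence = c("TTCGGAGAAAAGGGGTAC...", "TTCGGAGAAAAGGGGTAC..."),
  stringsAsFactors = FALSE
)

# Drop the first whitespace and everything after it, leaving the leading token
# (assumed to be the accession number); "Accession" is a hypothetical column name
df$Accession <- sub("\\s.*", "", df$SIXTEENS_Title)

# Optional: place the new column next to the title, giving 4 variables in total
df <- df[, c("Species_Name", "Accession", "SIXTEENS_Title", "SIXTEENS_Sequence")]
print(df$Accession)
```

If some titles could start with leading spaces, apply trimws() to the column first. Otherwise this pattern returns an empty string for those rows.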
G5W