I hope everyone is having a blast. I have come across this challenge:
I want to extract one portion of a string in the following manner:
- The string may or may not have a dot or may have plenty of them
- I want to extract the part of the string before the first dot; if there is no dot, I want the whole string
- I want to use a regex to achieve this
test<-c("This_This-This.Not This",
"This_This-This.not_.this",
"This_This-This",
"this",
"this.Not This")
Since I need to use a regex, I have been trying this expression:
str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
but what I get is:
> str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
[1] "This_This-This.Not This" "This_This-This.not_this"
[3] "This_This-This" "this"
[5] "this.Not This"
>
My desired output is:
"This_This-This"
"This_This-This"
"This_This-This"
"this"
"this"
This is my thought process behind the regex:
str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
(^[a-zA-Z].+) = captures the group before the dot. The string always starts with an upper- or lowercase letter, followed by any other characters, hence the .+
[\.\b]? = a dot or a word boundary that may or may not be present, hence the ?
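A possible reading of why this fails, offered as a sketch rather than a definitive diagnosis: `.` matches any character, including a literal dot, and `+` is greedy, so `.+` can run to the end of the string. The trailing `[\.\b]?` is optional, so it is free to match nothing. In addition, inside a character class `\b` is generally not treated as a word boundary (in many regex flavors it means a backspace character instead). The snippet below assumes stringr is installed; the names `original`, `first_part`, and `base_first_part` are made up for illustration. It compares the original pattern with one alternative, a negated character class `[^.]+` that stops at the first dot.

```r
library(stringr)

test <- c("This_This-This.Not This",
          "This_This-This.not_.this",
          "This_This-This",
          "this",
          "this.Not This")

# Original pattern: the greedy `.+` also consumes dots, so the capture group
# extends to the end of each string and the optional class matches nothing.
original <- str_match(test, "(^[a-zA-Z].+)[\\.\\b]?")[, 2]

# Alternative (an assumption, not the only possible fix): capture one or more
# characters that are NOT a dot, anchored at the start of the string.
first_part <- str_match(test, "^([^.]+)")[, 2]
print(first_part)

# The same idea in base R, without stringr: take the first match of the
# pattern in each string.
base_first_part <- regmatches(test, regexpr("^[^.]+", test))
print(base_first_part)
```

With the five test strings above, both `first_part` and `base_first_part` should come out as "This_This-This" three times followed by "this" twice. If the strings must also start with a letter, `^([a-zA-Z][^.]*)` would be one way to keep that requirement.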
This is not giving me what I want. I would be very happy if you could help me understand my mistake here. Thank you so much!