I have a large dataframe containing bank characteristica from which i want so filter certain banks based on their names.
Assume df
is my large dataframe, containing inter alia the bank names:
df <- data.frame(bank = c("BNP PARIBAS", "DEUTSCHE BANK AG", "HSBC HOLDINGS PLC",
"BANCO SANTANDER SA", "BOCE GROUP", "BARCLAYS BANK PLC",
"HSBC INVEST PLC"))
> df
bank
1 BNP PARIBAS
2 DEUTSCHE BANK AG
3 HSBC HOLDINGS PLC
4 BANCO SANTANDER SA
5 BOCE GROUP
6 BARCLAYS BANK PLC
7 HSBC INVEST PLC
A second dataframe names
contains the names of all banks that i want to filter from df
:
names <- data.frame(name = c("HSBC", "BANK OF AMERICA", "DEUTSCHE BANK", "BANK OF CANADA",
"COMMERZBANK", "SANTANDER"))
> names
name
1 HSBC
2 BANK OF AMERICA
3 DEUTSCHE BANK
4 BANK OF CANADA
5 COMMERZBANK
6 SANTANDER
For each element in names
i want to know the index of the rows in df
that match for example the first 5 letters of the name. I imagined something like this
library(dplyr)
names %>%
mutate(index = grep(substr(name, 1, 5), df$bank))
which results in
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 6 or 1
In addition: Warning message:
In grep(substr(c(5L, 1L, 4L, 2L, 3L, 6L), 1, 5), df$bank) :
argument 'pattern' has length > 1 and only the first element will be used
I can use grep
for single elements, giving me the desired results, for the first row of name
e.g.:
> grep(substr(names$name[1], 1, 5), df$bank)
[1] 3 7
How can get a solution that works for the whole vector name
? Notice that some banks in bank
have multiple entries (for different companies belonging to the same group), which i want to include all.