I am looking for a very efficient replacement for a for loop in R,
where data_papers is:
data_papers <- c(1, 3, 47276, 77012, 77012, 79468....)
paper_author:
  paper_id author_id
1        1    521630
2        1    972575
3        1   1528710
4        1   1611750
5        2   1682088
I need to find the authors listed in paper_author for each paper in data_papers. There are around 350,000 papers in data_papers and around 2,100,000 papers in paper_author.
So my output should be a list holding the author_ids of each paper_id in data_papers:
authors:
[[1]]
[1] 521630 972575 1528710 1611750
[[2]]
[1] 826 338038 788465 1256860 1671245 2164912
[[3]]
[1] 366653 1570981 1603466
The simplest approach is a for loop:
authors <- vector("list", length(data_papers))
for (i in seq_along(data_papers)) {
  # all authors of the i-th requested paper
  authors[[i]] <- paper_author$author_id[paper_author$paper_id == data_papers[i]]
}
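But this is very slow: every iteration scans the whole of paper_author to find one paper's authors, and the loop runs once for each of the ~350,000 papers in data_papers.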
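One way to avoid scanning paper_author once per paper is to group the author IDs by paper in a single pass with base R's split(), then look up each requested paper with match(), which uses hashing. This is a minimal sketch, assuming paper_author is a data.frame with numeric paper_id and author_id columns; the toy rows are the five shown in the table above, and the query vector and the names ids, grp and by_paper are hypothetical:

```r
# Toy data: the five rows of paper_author shown in the table
paper_author <- data.frame(
  paper_id  = c(1, 1, 1, 1, 2),
  author_id = c(521630, 972575, 1528710, 1611750, 1682088)
)
# Hypothetical query: paper 2 requested twice, paper 3 absent from the toy table
data_papers <- c(1, 2, 2, 3)

# Distinct paper IDs, and each row's position in that vector (one vectorized pass)
ids <- unique(paper_author$paper_id)
grp <- match(paper_author$paper_id, ids)

# Group author IDs by paper once; element j holds the authors of ids[j]
by_paper <- unname(split(paper_author$author_id, grp))

# Map each requested paper to its group; papers not in paper_author give NA -> NULL
authors <- by_paper[match(data_papers, ids)]
```

Indexing by match() positions rather than by as.character() names sidesteps a subtle mismatch: as.character(100000) is "1e+05" while as.character(100000L) is "100000", so character keys built from double and integer IDs may not line up.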
Another alternative, taken from efficient programming in R, is something like the following:
i <- 1:length(data_papers)
authors[i] <- as.data.frame(paper_author$author_id[which(paper_author$paper_id %in% data_papers[i])])
But I am not able to get this to work.
How could this be done? Thanks.
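The vectorized attempt fails because, with i = 1:length(data_papers), data_papers[i] is the whole vector. %in% then asks whether each row's paper is in any requested paper and returns every matching author pooled into one vector, and authors[i] <- as.data.frame(...) recycles that one-column data frame, so every element gets the same pooled vector. The small sketch below, on the toy rows from the table (the query vector is hypothetical), shows the pooling and the lapply() form that keeps one result per paper. Note that lapply() still scans paper_author once per paper, so it is tidier than the loop but not asymptotically faster:

```r
# Toy data: the five rows of paper_author shown in the table
paper_author <- data.frame(
  paper_id  = c(1, 1, 1, 1, 2),
  author_id = c(521630, 972575, 1528710, 1611750, 1682088)
)
data_papers <- c(1, 2)  # hypothetical query

# With a vector of papers, %in% pools all matches into a single vector:
# every author of paper 1 or paper 2, with no per-paper boundaries
pooled <- paper_author$author_id[paper_author$paper_id %in% data_papers]

# lapply() keeps one list element per paper, but still scans paper_author each time
authors <- lapply(data_papers, function(p) {
  paper_author$author_id[paper_author$paper_id == p]
})
```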