I have two bibliographic datasets, A and B (.bib files exported from Web of Science (WoS) with full record and cited references). Both contain relevant and irrelevant results. Dataset A has already been cleaned, so I have its relevant results A(r) and its irrelevant results A(i) as two separate datasets (.bib files). Dataset B encompasses dataset A completely.
[Figure: visualisation of my two datasets]
Goal: I am looking for a way to remove the irrelevant results A(i), which I have already identified in my first dataset, from my second dataset B.
Approach: If I merged B and A(i), every record of A(i) would occur twice in the merged dataset, so a remove-duplicates function could find them. However, such a function removes only the duplicate copies and keeps one instance of each A(i) record, so the irrelevant records would still remain in B.
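A minimal base R sketch of the merge idea, where every instance of a duplicated title is dropped rather than only the second copy. The toy data frames and the title column name TI are assumptions for illustration; real data would come from the .bib files. Note that this would also drop records that are duplicated within B itself.

```r
# Toy stand-ins for B and A(i); the column name TI (title) is an assumption.
B   <- data.frame(TI = c("Paper one", "Paper two", "Paper three", "Paper four"),
                  stringsAsFactors = FALSE)
A_i <- data.frame(TI = c("Paper two", "Paper four"),
                  stringsAsFactors = FALSE)

# Merge B and A(i), as described in the approach.
merged <- rbind(B, A_i)

# Normalise titles so that case and surrounding whitespace do not hide matches.
key <- tolower(trimws(merged$TI))

# duplicated() flags only the later copies; combining it with fromLast = TRUE
# also flags the first copies, so every instance of a repeated key is TRUE.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep only the records that occur exactly once in the merged data.
B_clean <- merged[!is_dup, , drop = FALSE]
print(B_clean$TI)
```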
Functions to remove duplicates:
# package revtools
library(revtools)
matches <- find_duplicates(data, match_variable = "title")
data_unique <- extract_unique_references(data, matches)
# package bibliometrix
library(bibliometrix)
M_unique <- duplicatedMatching(M, Field = "TI", tol = 0.95)
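For titles that differ only slightly between the two exports, a hedged base R sketch of near-matching could look like the following. It uses `adist()` (edit distance) and a similarity cut-off of 0.95 chosen to mirror the `tol` value above. This is not how either package works internally, and the example titles are made up.

```r
# Hypothetical titles from B and from A(i), with small spelling differences.
B_titles  <- c("Deep learning for text",
               "Graph theory basics",
               "Soil carbon dynamics in forests")
Ai_titles <- c("Deep learning for text.",
               "Soil carbon dynamic in forests")

normalise <- function(x) tolower(trimws(x))
b <- normalise(B_titles)
a <- normalise(Ai_titles)

# Edit distance between every title in B (rows) and every title in A(i) (columns).
d <- adist(b, a)

# Convert distance to a similarity in [0, 1] using the longer of the two titles.
longest <- outer(nchar(b), nchar(a), pmax)
similarity <- 1 - d / longest

# A record of B counts as irrelevant if it is close enough to any A(i) title.
is_irrelevant <- apply(similarity, 1, max) >= 0.95

B_kept <- B_titles[!is_irrelevant]
print(B_kept)
```

The cut-off trades missed matches against false matches, so it is worth inspecting the flagged pairs before deleting anything.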
• Q1: Is there a way to remove all instances of the duplicates (both the duplicates and the originals) identified by a find/remove-duplicates function?
• Q2: Is there a better way to remove A(i) from B, i.e. to remove all instances of duplicates in a dataset?
• Q3: More generally, can I search my dataset for a larger set of specific bibliographic records (a list of papers) and remove them from that dataset?
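One possible way to approach Q2 and Q3 without merging at all is to filter B directly against the list of records to drop. This is a sketch in base R under assumptions: the toy data frames, the column names TI and PY, and the helper `make_key()` are hypothetical, and matching on a normalised title is only one choice of key (a DOI or WoS accession number, where available, would be stricter).

```r
# Toy stand-ins for B and A(i); column names are assumptions.
B <- data.frame(TI = c("Paper one", "Paper two", "Paper three", "Paper four"),
                PY = c(2018, 2019, 2020, 2021),
                stringsAsFactors = FALSE)
A_i <- data.frame(TI = c("paper TWO ", "Paper four"),
                  PY = c(2019, 2021),
                  stringsAsFactors = FALSE)

# Hypothetical helper: build a comparison key that ignores case and padding.
make_key <- function(x) tolower(trimws(x))

# Keep only the rows of B whose key does not appear among the A(i) keys.
B_filtered <- B[!(make_key(B$TI) %in% make_key(A_i$TI)), , drop = FALSE]
print(B_filtered)
```

Because nothing is merged, duplicates that exist inside B for other reasons are left untouched.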
Thank you so much for your help!