Cross data matching algorithm (separate datasets) in R or any machine learning platform

Question

I have two datasets: one with details of contracts and the other with details of organizations. For example, one dataset has company name, description, and company type; the other has contract name, contract description, and CPV code. I want an algorithm that can 1) given a company, find the top 10 contracts that are most closely related to, or potentially interesting for, that company, or 2) given a contract, find the companies most likely to bid on or win it. This might be a one-off, real-time algorithm that matches one row of the first dataset to a best-match cluster in the second dataset. Is it possible to do this kind of row-by-row cross matching across two different datasets? Is it possible to use the text descriptions for this kind of matching? Code examples would be a great help. Thank you. I am also attaching example datasets here.

Company data

Contract data

Please add an example of what your two datasets look like. – Barker, Aug 17 '16 at 16:03
@Barker I have attached the datasets. Please check. – Joe, Aug 17 '16 at 17:25

score 2 · Answer 1 · answered Aug 17 '16 at 18:05


Your question is effectively "Will someone do ~10K worth of data science for me for free?" What you are looking for is a recommender system; more specifically, it seems to be a content-based filtering system. For these to work, you are going to have to look at your two datasets and develop features that quantitatively describe the contracts and the clients. If you have information about which previous contracts the organizations were interested in, you can use a hybrid algorithm that incorporates aspects of collaborative filtering.
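
As a rough illustration of content-based filtering applied to both of the question's use cases, here is a sketch in plain Python (standard library only; the thread does not settle on a language). It assumes the free-text descriptions are the only features, turns them into TF-IDF vectors over one shared vocabulary, and ranks candidates by cosine similarity. The company and contract rows and the helper names (`tokenize`, `build_idf`, `tfidf`, `top_k`) are hypothetical; CPV codes and company type are not used here but could be added as extra features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict of term -> weight."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, candidates, k=10):
    """Rank (name, vector) pairs by cosine similarity to query_vec."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical rows; replace with the real company and contract descriptions.
companies = {
    "MediSupply Ltd": "medical equipment supplier for hospitals and clinics",
    "BuildCo": "construction of roads bridges and public buildings",
}
contracts = {
    "Hospital beds tender": "supply of hospital beds and medical equipment",
    "Bridge repair": "repair and maintenance of road bridges",
    "School IT": "laptops and software licences for schools",
}

# Fit IDF on both datasets so companies and contracts share one weighting.
all_tokens = [tokenize(t) for t in list(companies.values()) + list(contracts.values())]
idf = build_idf(all_tokens)
company_vecs = {n: tfidf(tokenize(t), idf) for n, t in companies.items()}
contract_vecs = {n: tfidf(tokenize(t), idf) for n, t in contracts.items()}

# 1) Given a company, the most related contracts.
print(top_k(company_vecs["MediSupply Ltd"], contract_vecs.items(), k=10))
# 2) Given a contract, the most related companies.
print(top_k(contract_vecs["Bridge repair"], company_vecs.items(), k=10))
```

Fitting the IDF on both datasets together keeps the two sets of vectors comparable; for real data volumes a library implementation such as scikit-learn's `TfidfVectorizer` would likely be more practical than this hand-rolled version.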

R has a package, recommenderlab, that can help you work on these types of problems. I haven't used it, but from skimming it, it seems solid. If you want something more plug-and-play with fewer options, though, I would recommend checking out AzureML. It uses a GUI to guide users through the data science process, including a recommender tutorial. You may also be able to use parts of their text classifier tutorial to help engineer features from your free-form text fields.
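
The output of a text classifier can itself serve as an engineered feature. One hedged way to do this (a nearest-centroid classifier, not AzureML's method and not recommenderlab's API) is sketched below in plain Python: it learns one word-count centroid per category from labelled tenders, then labels any description, including a company's, in the same feature space, so both datasets end up sharing a category feature. The tender texts, category names, and helper names are hypothetical, and the bag-of-words vectors are a simple stand-in for richer representations such as word2vec.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercased word counts; a stand-in for word2vec or other embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled_tenders):
    """One summed word-count vector per category.

    Summing points in the same direction as averaging, so cosine
    similarity to the centroid is unaffected.
    """
    centroids = defaultdict(Counter)
    for text, category in labelled_tenders:
        centroids[category].update(bag_of_words(text))
    return dict(centroids)


def classify(text, centroids):
    """Assign a description to the most similar category (first one wins ties)."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda cat: cosine(vec, centroids[cat]))


# Hypothetical labelled tenders, e.g. the output of an existing tender classifier.
tenders = [
    ("supply of hospital beds and medical equipment", "health"),
    ("vaccination programme for clinics", "health"),
    ("repair of road bridges", "construction"),
    ("new school building construction", "construction"),
]
centroids = train_centroids(tenders)

# A company description is scored in the same feature space as the tenders.
print(classify("medical equipment supplier for hospitals and clinics", centroids))
```

Once companies and tenders carry the same category label, one option is to filter the contracts to a company's category first and then rank within it by a finer similarity measure.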

Best of luck.

Barker

I have developed algorithms that use the descriptions from the dataset. I have used word2vec, H2O, and other text-mining tools to classify tenders into specific categories from the words in their descriptions. For example, I can classify a health-related tender into a "health" category from its description. But I don't have a method to match a company description to this category. – Joe Aug 17 '16 at 19:18
I just need ideas or examples; I don't require a complete solution. I just want a method to cross-match words in two datasets "for free" :). And yes, I will check out Azure ML for solutions. – Joe Aug 17 '16 at 19:24
If you are looking for general approaches and algorithms to use, this question would be better suited to "Cross Validated"; "Stack Overflow" is more for technical help with implementation. As a more direct answer to your question, check out the links I posted to the Wikipedia article on recommender systems, as it includes many examples of classes of methods and the type of data they work on. If you can get your hands on information about past contracts and whether they were accepted, that should really help, as you may not need to collect features for the clients. – Barker Aug 17 '16 at 19:34
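
With past-contract history, collaborative filtering can rank contracts without any client features. Below is a minimal user-based sketch in plain Python (not recommenderlab's API), assuming a hypothetical `history` dict that maps each company to the set of contract IDs it bid on or won. Company similarity is the Jaccard index of their contract sets, and each contract a company has not yet seen is scored by the summed similarity of the other companies that did bid on it.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap of two sets relative to their union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(key, history, k=10):
    """Score items the key has not interacted with, via similar keys.

    history maps a key (e.g. a company) to the set of items (e.g. contract IDs)
    it bid on or won. Each unseen item accumulates the Jaccard similarity of
    every other key that has it.
    """
    seen = history.get(key, set())
    scores = defaultdict(float)
    for other, items in history.items():
        if other == key:
            continue
        sim = jaccard(seen, items)
        if sim == 0:
            continue
        for item in items - seen:
            scores[item] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def transpose(history):
    """Turn company -> contracts into contract -> companies."""
    by_item = defaultdict(set)
    for key, items in history.items():
        for item in items:
            by_item[item].add(key)
    return dict(by_item)


# Hypothetical bid history; real data would come from past tender results.
history = {
    "MediSupply Ltd": {"C1", "C2"},
    "HealthCorp": {"C1", "C2", "C3"},
    "PharmaPlus": {"C2", "C4"},
    "BuildCo": {"C5", "C6"},
}

# 1) Contracts a company has not bid on, ranked by similar companies' bids.
print(recommend("MediSupply Ltd", history))
# 2) Companies likely to bid on a contract, using the transposed history.
print(recommend("C3", transpose(history)))
```

A company or contract with no history gets no recommendations from this scheme (the cold-start problem), which is where combining it with description-based similarity, i.e. a hybrid approach, would come in.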
