I need to do some fairly complex phrase matching. I have a set of files, each containing more than 1000 words of text.

The phrases I am searching for (searchphrase) are like this:

Investment does not mean: i. Claims to money that arise solely from: 1. Commercial contracts for the sale of goods or services by a national or an enterprise of a party to an enterprise in the territory of the other party, or 2. The extension of credit in connection with a commercial transaction, such as trade financing other than loans or claims to money previously covered.

I want to know whether the phrase occurs in each of my files. However, the files will not contain exact replicas of the phrase. Instead, a file (textfile) will be a large document containing a paragraph like:

But investment does not mean claims to money derived solely from commercial transactions designed exclusively for the sale of goods or services by a national or legal person in the territory of one Contracting Party to a national or legal person in the territory of the other Contracting Party, credits to finance commercial transactions such as trade financing, and other credits with a duration of less than three years, as well as credits granted to the State or to a State enterprise.

As you can see, searchphrase is pretty similar in actual meaning to this paragraph from textfile. There is also considerable overlap in the keywords. Hence, I should get a match.
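To make the "keyword overlap" idea concrete, here is a minimal sketch of one way it could be scored, using only the Python standard library. The function names (`keywords`, `overlap_score`, `best_paragraph`), the tiny stopword list, and the choice to split the document on blank lines are all assumptions made for illustration. It measures lexical overlap only, so it would not catch paraphrases that use different words for the same meaning.

```python
import re

# Hypothetical, deliberately small stopword list; a real one would be longer.
STOPWORDS = {
    "a", "an", "the", "of", "or", "and", "to", "in", "by", "for", "that",
    "from", "such", "as", "with", "than", "is", "are", "not", "does", "but", "i",
}

def keywords(text):
    """Lowercase the text and return its set of non-stopword alphabetic tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def overlap_score(searchphrase, paragraph):
    """Fraction of the search phrase's keywords that also occur in the paragraph."""
    wanted = keywords(searchphrase)
    if not wanted:
        return 0.0
    return len(wanted & keywords(paragraph)) / len(wanted)

def best_paragraph(searchphrase, textfile_content):
    """Return (score, paragraph) for the best-scoring paragraph.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", textfile_content) if p.strip()]
    scored = ((overlap_score(searchphrase, p), p) for p in paragraphs)
    return max(scored, key=lambda pair: pair[0], default=(0.0, ""))
```

A file could then be counted as a match when the best score exceeds some threshold, which would have to be tuned by hand on a few known examples.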

What sort of algorithm should I use to implement this? Are there existing modules that already do this job?

amirouche
shoi
