I have a collection of documents, some of which are copies of other documents with the text jumbled and some of the words replaced by synonyms. Here is one example passage along with three such duplicates:
Article 1 (original) : I caught up with John Snow in town making purchases at Kingslanding Hardware store to repair a broken tractor. Snow has farmed soybeans his entire life, as did his father and their fathers. I asked him about his life on the farm.
Article 2 (duplicate) : I obtained John Snow which in city in purchases make rise of the hardware at Kingslanding to repair a broken motor tractor. Snow have soya broad beans complete life have been treated, such as its father and their fathers. I asked him concerning its life on the agriculture company.
Article 3 (duplicate) : I took for above with John Snow in the city that made purchases in the warehouse of the hardware of Kingslanding to repair an broken tractor. Snow has cultivated the soybeans its whole life, like its father and his parents. I asked to him about its life in the farm.
Article 4 (duplicate) : I caught up with myself compared to John Snow downtown making of the purchases to the kingslanding store of material to repair a broken tractor. Snow cultivated soya its life whole, just as his/her father and their fathers. I questioned it about his life with the farm.
I want to compute document similarity so that all of these documents end up tagged as members of the same group. Any suggestions, along with examples or tutorials, would be greatly appreciated.
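To make the goal concrete, here is a minimal sketch of the kind of grouping I have in mind. It is only one possible approach, not a known-good solution: it assumes character trigram sets compared with Jaccard similarity, and it merges any pair above a threshold into one group using union-find. The function names and the `threshold=0.3` default are hypothetical choices for illustration and would need tuning on real data. Character n-grams tolerate reordering and small spelling changes, but they do not by themselves recognize synonyms.

```python
from itertools import combinations


def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams, with whitespace collapsed."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_documents(docs, threshold=0.3, n=3):
    """Group document indices whose n-gram Jaccard similarity is >= threshold.

    Grouping is transitive: if A matches B and B matches C, all three
    land in one group. The threshold is an assumed value, not a
    recommended one.
    """
    profiles = [char_ngrams(d, n) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; fine for small collections, quadratic in general.
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_documents` on a list of article strings returns lists of indices, one list per group. The all-pairs comparison would not scale to a large collection, so I assume something like MinHash with locality-sensitive hashing would be needed there.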