I am trying to measure semantic coherence in a given paragraph or transcript, i.e. whether a speaker goes off track while talking about a topic. More specifically, the speaker is describing a picture (which may contain many sub-details).
For example:
Transcript 1: I like to play sports. There are so many sports fans in the world.
Transcript 2: I like to play sports. There is a deadly virus spreading across the world.
Semantic coherence should be high for Transcript 1 and low for Transcript 2. I am using BERT (bert-as-service) to generate an embedding for each sentence. I then compare sentences i and i+1 in a given transcript by calculating the cosine similarity between their embedding vectors. I have also tried a sliding window, with and without overlap, to calculate cosine similarity.
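To make the comparison concrete, here is a minimal sketch of the two schemes in plain Python. It assumes the sentence embeddings have already been obtained (from bert-as-service or anything else) as lists of floats. The function names, the `window` and `step` parameters, and the choice to represent a window by the mean of its sentence vectors are all my own assumptions for illustration, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def adjacent_similarities(embeddings):
    """Similarity between sentence i and sentence i+1, for every i."""
    return [
        cosine_similarity(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]


def windowed_similarities(embeddings, window=2, step=1):
    """Similarity between consecutive windows of sentences.

    Assumption: each window is summarised by the mean of its vectors.
    step < window gives overlapping windows; step == window gives
    non-overlapping windows.
    """
    starts = range(0, len(embeddings) - window + 1, step)
    windows = [mean_vector(embeddings[s:s + window]) for s in starts]
    return adjacent_similarities(windows)


# Toy 2-d vectors standing in for real sentence embeddings.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_adjacent = adjacent_similarities(toy_embeddings)
toy_windowed = windowed_similarities(toy_embeddings, window=2, step=2)
```

Note that with overlapping windows, consecutive windows share sentences, so their similarity is pushed upward regardless of content.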
The problem I am running into is that the cosine similarities come out very close to each other. For the two transcripts above, for example, the scores are nearly the same, whereas I would expect a much greater difference between them.
I am thinking of trying an LSA model trained on Wikipedia data next, to see whether it differentiates better. Is there a better method of doing this?