
For example, suppose I have a document that contains two sentences: "I am a person. He also likes apples." Do we need to count the cooccurrence of "person" and "He"?

Jing Gu

1 Answer


Each document is separated with a line break. Context windows of cooccurrences are limited to each document.

This is based on the implementation, which states:

"A newline is taken as indicating a new document (contexts won't cross newline)."

So, depending on how you prepare sentences, you may get different results:

Setting 1: ('He', 'person') cooccur (both sentences are on the same line)

...
I am a person. He also likes apples.
...

Setting 2: ('He', 'person') do not cooccur (the sentences are on separate lines)

...
I am a person. 
He also likes apples.
...
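
To make the difference between the two settings concrete, here is a minimal Python sketch of newline-bounded cooccurrence counting. It is only an illustration and not the GloVe code itself: the function name `cooccurrence_counts`, the regex tokenization, and counting every pair inside the window as 1 are all assumptions made for this example. The default of 15 simply mirrors the window size mentioned for the released code.

```python
import re
from collections import Counter


def cooccurrence_counts(text, window_size=15):
    """Count unordered word pairs that fall within `window_size` tokens
    of each other. Each line is treated as its own document, so a
    context window never crosses a newline.

    Hypothetical helper for illustration; not part of GloVe.
    """
    counts = Counter()
    for line in text.split("\n"):
        # Assumption: simple word tokenization that drops punctuation.
        tokens = re.findall(r"\w+", line)
        for i, word in enumerate(tokens):
            # Look back at most `window_size` tokens within the same line.
            for j in range(max(0, i - window_size), i):
                pair = tuple(sorted((tokens[j], word)))
                counts[pair] += 1
    return counts


# Setting 1: both sentences on one line (one document).
setting_1 = "I am a person. He also likes apples."
# Setting 2: one sentence per line (two documents).
setting_2 = "I am a person.\nHe also likes apples."

print(cooccurrence_counts(setting_1)[("He", "person")])
print(cooccurrence_counts(setting_2)[("He", "person")])
```

With Setting 1 the pair is counted because both words sit on the same line; with Setting 2 the count stays at zero because the newline ends the first document before "He" appears.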
Mehdi
  • Hi, I wonder if you could answer another question of mine. In the original paper, the best window size is 8, so why is the default window size in the implementation 15? Thanks for your time. – Jing Gu Jun 29 '19 at 19:32
  • The original paper compared symmetric and asymmetric context windows with 5 different sizes {2,4,6,8,10}. I think the conclusion was that a larger window size increases accuracy on semantic tasks, but this is not the case for syntactic tasks. I don't know if there is a specific reason behind the number 15 in their released code. – Mehdi Jun 29 '19 at 21:30