
I'm using a tutorial (https://www.tidytextmining.com/nasa.html?q=correlation%20ne#networks-of-keywords) to learn about tidy text mining. I am hoping someone might be able to help with two questions:

  1. In this tutorial, the correlation threshold used to build the graph is 0.15. Is this best practice? I can't find any literature to help choose a cutoff.
  2. In the graph from the tutorial, how are the positions of the clusters and words chosen? Are more important words closer to the centre?

Thanks very much

Gabriella

1 Answer

  • I am not aware of any literature on a correlation threshold to use for this kind of network analysis; this will (I believe) depend on your particular dataset and how language is used in your context. This is a heuristic decision. Given what a correlation coefficient measures, I would expect 0.15 to be on the low side of what you might use.

  • The graph is represented visually in a two-dimensional plot via the layout argument of ggraph. You can read more about that here, but the very high-level takeaways are that there are a lot of options, they have a big impact on what your graph looks like, and it's often not clear which one is the best choice.
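
On the threshold question, one practical way to treat the cutoff as a heuristic is to check how many word pairs survive at several candidate values and pick one that leaves a readable graph. The sketch below is only an illustration of that idea, not the tutorial's code: the tutorial works in R, while this uses plain Python so it runs without any packages. The corpus, the `phi` helper, and the `edges_above` helper are all made up for the example. It assumes the correlation is the Pearson correlation of binary "word appears in document" indicators (the phi coefficient).

```python
from itertools import combinations
from math import sqrt

# Toy corpus (invented data): each document is the set of words it contains.
docs = [
    {"ocean", "temperature", "satellite"},
    {"ocean", "temperature"},
    {"ocean", "salinity"},
    {"satellite", "orbit"},
    {"satellite", "orbit", "temperature"},
    {"orbit", "launch"},
]


def phi(a, b, docs):
    """Phi coefficient: Pearson correlation of two binary occurrence indicators."""
    n = len(docs)
    n_both = sum(1 for d in docs if a in d and b in d)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    # A word that appears in every document or in none has no variance.
    return (n * n_both - n_a * n_b) / denom if denom else 0.0


# Correlation for every unordered pair of distinct words.
words = sorted(set().union(*docs))
cors = {(a, b): phi(a, b, docs) for a, b in combinations(words, 2)}


def edges_above(threshold):
    """Word pairs that would be drawn as edges at this cutoff."""
    return [pair for pair, r in cors.items() if r > threshold]


# Compare how dense the network would be at several candidate cutoffs.
for t in (0.15, 0.3, 0.5, 0.7):
    print(f"threshold {t:.2f}: {len(edges_above(t))} edges")
```

Raising the cutoff can only remove edges, so the useful question is where the graph stops being a hairball but still has enough structure to interpret. That point will differ from one dataset to another.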

Julia Silge