Correlation and graph layout in widyr and ggraph when tidy text mining

Question

I'm using a tutorial (https://www.tidytextmining.com/nasa.html?q=correlation%20ne#networks-of-keywords) to learn about tidy text mining. I am hoping someone might be able to help with two questions:

in this tutorial, the correlation used to make the graph is 0.15. Is this best practice? I can't find any literature to help choose a cut off.
In the graph attached from the tutorial, how are clusters centrality chosen? Are more important words closer to the centre?

Thanks very much

score 0 · Answer 1 · answered Oct 15 '21 at 18:24

I am not aware of any literature on a correlation threshold to use for this kind of network analysis; this will (I believe) depend on your particular dataset and how language is used in your context. This is a heuristic decision. Given what a correlation coefficient measures, I would expect 0.15 to be on the low side of what you might use.
The graph is represented visually in a two-dimensional plot via the layout argument of ggraph. You can read more about that here but the very high-level takeaways are that there are a lot of options, they have a big impact on what your graph looks like, and often it's not clear what is the best choice.

1 Answers1