In my previous post on sentiment analysis, I used a dataframe to plot the trajectory of sentiment across the novel Picnic at Hanging Rock. In this post I will use the same dataframe of non-unique, non-stop words longer than three characters (the red line from an earlier post) to create a network of associated words. Words can be grouped by sentence, paragraph, or chapter; since I have already removed stop words and punctuation, I will reuse my previous grouping of every 15 words in the order they appear in the novel. Looking at rows 10 to 20 of my dataframe:
> d2[10:20,]
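For readers following along, the "group" column can be recreated with integer division over the row index. This is a sketch under the assumption that `d2` holds one word per row in novel order; the column names are assumptions from the surrounding text:

```r
# Assumed: d2 has one word per row, in the order the words appear in the novel.
# Assign each run of 15 consecutive words to the same group number.
d2$group <- (seq_len(nrow(d2)) - 1) %/% 15 + 1
```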
You can see that the column "group" has grouped every 15 words. First I create a table of word co-occurrences using the pair_count function, then I use ggraph to create the network graph. The number of co-occurrences is reflected in edge opacity and width. At the time of this writing, ggraph was still in beta and had to be downloaded from GitHub and built locally. The igraph package provides the graph_from_data_frame function.
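The code for this step would look roughly like the sketch below. Note that `pair_count()` shipped with early versions of tidytext and was later superseded by `widyr::pairwise_count()`; the exact layout parameters and seed here are illustrative assumptions, not the original values:

```r
library(dplyr)
library(tidytext)  # early versions provided pair_count()
library(igraph)
library(ggraph)    # in beta at the time; installed from GitHub

# Count how often each pair of words appears within the same 15-word group
word_pairs <- pair_count(d2, group, word, sort = TRUE)

# Build an igraph object from the pair counts and plot it with ggraph,
# mapping the co-occurrence count n to both edge opacity and edge width
set.seed(2016)
word_pairs %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = n, edge_width = n)) +
  geom_node_point(color = "lightblue", size = 3) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()
```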
Let's regroup every 25 words:
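Regrouping only requires changing the divisor and recomputing the pair counts; the plotting code stays the same. Again a sketch under the same assumed column names:

```r
# Widen the window: every 25 consecutive words now share a group
d2$group <- (seq_len(nrow(d2)) - 1) %/% 25 + 1

# Recompute co-occurrences with the new grouping (plotting code unchanged)
word_pairs_25 <- pair_count(d2, group, word, sort = TRUE)
```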
And now include only words with 5 occurrences or more:
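One way to express this filter, sketched with dplyr and the same assumed columns, is to drop rare words before counting pairs:

```r
library(dplyr)

# Keep only words that appear at least 5 times in the whole novel,
# then rebuild the co-occurrence table from the filtered dataframe
common_words <- d2 %>% count(word) %>% filter(n >= 5)
d2_common    <- d2 %>% filter(word %in% common_words$word)

word_pairs_common <- pair_count(d2_common, group, word, sort = TRUE)
```

Filtering the words first (rather than filtering edges afterwards) also removes the isolated low-frequency nodes, which keeps the network graph readable.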