Okay, so I want a single plot that covers several novels and shows the dispersion of a particular word throughout all of them. Every novel has a different length (total number of words), so the "x" axis would have to be the novels and the "y" axis would have to be the length of each novel. Right now I can create a separate plot for every novel, but I want to have all of them together. Here's what I have so far:

input.dir<-("corpus2")
files.v<-dir(input.dir, "\\.txt$")

corpus<-corpus(files.v, input.dir)

tiempo<-tiempo(corpus)

# hits for the keyword "día" (named after the keyword to avoid confusion)
dia.hits<-palabra("día", corpus, tiempo)

dispersion(dia.hits)

#corpus

corpus<-function(files.v, input.dir){
  text.word.vector.l<-list()
  for(i in 1:length(files.v)){
    text.v <- scan(paste(input.dir, files.v[i], sep="/"), what="character", sep="\n")
    Encoding(text.v)<-"UTF-8"
    text.v <- paste(text.v, collapse=" ")
    text.lower.v <- tolower(text.v)
    text.words.v <- strsplit(text.lower.v, "\\W")
    text.words.v <- unlist(text.words.v)
    text.words.v <- text.words.v[which(text.words.v!="")]
    text.word.vector.l[[files.v[i]]] <- text.words.v
  }
  return(text.word.vector.l)
}

#tiempo

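tiempo <- function(argument1){
  tiempo.l<-list()
  for (i in seq_along(argument1)){
    # word positions 1..n for novel i
    time<-seq_along(argument1[[i]])
    # name entries from the input list rather than the global files.v
    tiempo.l[[names(argument1)[i]]]<-time
  }
  return(tiempo.l)
}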

#palabra

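palabra<-function(keyword, argument1, argument2){
  hits.l<-list()
  for (i in seq_along(argument1)) {
    # positions where the keyword occurs in novel i
    hits.v<-which(argument1[[i]]==keyword)
    # NA everywhere except at the hit positions, which are set to 1
    hits.keyword.v<-rep(NA, length(argument2[[i]]))
    hits.keyword.v[hits.v]<-1
    # name entries from the input list rather than the global files.v
    hits.l[[names(argument1)[i]]]<-hits.keyword.v
  }
  return(hits.l)
}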

#dispersion

dispersion<-function(argument1){
  options(scipen=5)
  for (i in 1:length(argument1)) {
    plot(argument1[[i]], main="Dispersion plot",
         xlab="time", ylab="keyword", type="h", ylim=c(0,1), yaxt='n')
  }
}

How can I plot these together? Here's a picture of what I feel it should look like:

[image: example]

What I am trying to do is, more or less, to put all of these plots together: [image]

  • What do you mean by "dispersion?" Are you trying to plot the frequency with which a word appears in each novel, using a barchart? Please read [How to Create a Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve) and update your question. – Len Greski Sep 03 '18 at 22:47
  • What I am trying to do is to plot "where" a specific word appears along a novel. Imagine that we define the length of a novel as its total number of words. Do I make myself clear? – Marcus Vinícius Barbosa Sep 03 '18 at 22:56
  • Thanks, Marcus. Looks like eipi10 provided an answer. – Len Greski Sep 04 '18 at 01:36
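
To make the idea in these comments concrete, here is a minimal base-R sketch. It assumes each novel is already a character vector of words, as returned by the question's `corpus()` function. The helper name `word_positions` and the toy data are hypothetical. Dividing each hit's index by the novel's length puts novels of different lengths on a common 0 to 1 scale, so they can share one set of axes.

```r
# Hypothetical helper: relative position (between 0 and 1) of every
# occurrence of `keyword` in each novel.
# `novels` is assumed to be a named list of character vectors, one word per element.
word_positions <- function(keyword, novels) {
  per_novel <- lapply(names(novels), function(name) {
    words <- novels[[name]]
    hits <- which(words == keyword)
    data.frame(novel = rep(name, length(hits)),
               position = hits / length(words),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_novel)
}

# Toy data standing in for real novels
novels <- list(a = c("la", "noche", "es", "muy", "larga"),
               b = c("noche", "tras", "noche", "sin", "fin", "noche", "ya", "no"))
hits <- word_positions("noche", novels)

# One row of tick marks per novel, all on the same axes
row <- match(hits$novel, names(novels))
plot(hits$position, row, pch = "|", xlim = c(0, 1),
     ylim = c(0.5, length(novels) + 0.5), yaxt = "n",
     xlab = "Relative position in novel", ylab = "", main = "Dispersion plot")
axis(2, at = seq_along(novels), labels = names(novels), las = 1)
```

With real data, the list returned by `corpus(files.v, input.dir)` could be passed in place of the toy `novels` list.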

1 Answer

Your example isn't reproducible, so the code below uses novels by Jane Austen to plot word locations using ggplot2. Hopefully you can adapt this code to your needs.

library(tidyverse)
library(janeaustenr)
library(scales)

# Function to plot dispersion of a given vector of words in novels by Jane Austen
plot.dispersion = function(words) {

  pattern = paste(words, collapse="|")

  # Get locations of each input word in each text
  # Adapted from Text Mining with R (https://www.tidytextmining.com/tfidf.html)
  texts = austen_books() %>% 
    group_by(book) %>% 
    mutate(text = str_split(tolower(text), "\\W")) %>% 
    unnest(text) %>% 
    filter(text != "") %>% 
    mutate(word.num = 1:n(),
           pct = word.num/n()) %>% 
    filter(grepl(pattern, text)) %>% 
    mutate(text = str_extract(text, pattern))

  # Plot the word locations
  ggplot(texts, aes(y=book, x=pct)) +
    geom_point(shape="|", size=5) +
    facet_grid(text ~ .) +
    scale_x_continuous(labels=percent) +
    labs(x="Percent of book", y="") +
    theme_bw() +
    theme(panel.grid.major.x=element_blank(),
          panel.grid.minor.x=element_blank())
}

plot.dispersion(c("independent", "property"))

eipi10
  • Hi. Thanks for your answer. It is really what I'm looking for. I have only one problem. My books are protected by copyright, which is why I cannot share them with you so that you can reproduce what I'm trying to do. I always use them as plain text. Is there any way I can use them in place of the Austen books? How would I do that? – Marcus Vinícius Barbosa Sep 04 '18 at 14:02
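
One possible way to swap in local plain-text files, sketched under assumptions: `austen_books()` returns a data frame with a `text` column (one line of text per row) and a `book` column, so a data frame of the same shape built from your own files should be able to take its place in the pipeline above. The helper name `read_books` is hypothetical. The folder name `corpus2` and the UTF-8 encoding are taken from the question.

```r
# Hypothetical helper: read every .txt file in `input.dir` into a data frame
# with the same two columns as austen_books(): `text` and `book`.
read_books <- function(input.dir) {
  files <- list.files(input.dir, pattern = "\\.txt$")
  books <- lapply(files, function(f) {
    # Assumes the files are UTF-8 encoded, as in the question
    lines <- readLines(file.path(input.dir, f), encoding = "UTF-8", warn = FALSE)
    data.frame(text = lines,
               book = rep(sub("\\.txt$", "", f), length(lines)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, books)
}

# Usage (assumes the folder "corpus2" from the question exists):
# my_books <- read_books("corpus2")
# Then use my_books in place of austen_books() inside plot.dispersion().
```

Because the function only reads files from disk, the copyrighted texts never need to be shared. The book labels on the plot come from the file names with the `.txt` extension removed.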