I'm moderately experienced using R, but I'm just starting to learn to write functions to automate tasks. I'm currently working on a project to run sentiment analysis and topic models of speeches from the five remaining presidential candidates and have run into a snag.

I wrote a function to do a sentence-by-sentence analysis of positive and negative sentiments, giving each sentence a score. Miraculously, it worked and gave me a dataframe with scores for each sentence.

    score        text
1     1        iowa, thank you.
2     2        thanks to all of you here tonight for your patriotism, for your love of country and for doing what too few americans today are doing.  
3     0        you are not standing on the sidelines complaining. 
4     1        you are not turning your backs on the political process.
5     2        you are standing up and fighting back.

So what I'm trying to do now is create a function that takes the scores and figures out what percentage of the total is represented by the count of each score and then plot it using plotly. So here is the function I've written:

scoreFun <- function(x){{
  tbl <- table(x)
  res <- cbind(tbl,round(prop.table(tbl)*100,2))
  colnames(res) <- c('Score', 'Count','Percentage')
  return(res)
}
  percent = data.frame(Score=rownames, Count=Count, Percentage=Percentage)
  return(percent)
}

Which returns this:

saPct <- scoreFun(sanders.scores$score)
saPct

     Count Percentage
-6     1       0.44
-5     1       0.44
-4     6       2.64
-3    13       5.73
-2    20       8.81
-1    42      18.50
0     72      31.72
1     34      14.98
2     18       7.93
3      9       3.96
4      6       2.64
5      2       0.88
6      1       0.44
9      1       0.44
11     1       0.44

What I had hoped it would return is a dataframe with what has ended up being the rownames as a variable called Score and the next two columns called Count and Percentage, respectively. Then I want to plot the Score on the x-axis and Percentage on the y-axis using this code:

d <- subplot(
  plot_ly(clPct, x = rownames, y=Percentage, xaxis="x1", yaxis="y1"),
  plot_ly(saPct, x = rownames, y=Percentage, xaxis="x2", yaxis="y2"),
  margin = 0.05,
  nrows=2
) %>% layout(d, xaxis=list(title="", range=c(-15, 15)),
             xaxis2=list(title="Score", range=c(-15,15)),
             yaxis=list(title="Clinton", range=c(0,50)),
             yaxis2=list(title="Sanders", range=c(0,50)),showlegend = FALSE)
d

I'm pretty certain I've made some obvious mistakes in my function and my plot_ly code, because it's clearly not returning the dataframe I want, and I get the error `Error in list2env(data) : first argument must be a named list` when I run the plotly code. Again, though, I'm not very experienced writing functions and I haven't found a similar issue when I Google, so I don't know how to fix this.

Any advice would be most welcome. Thanks!

ldlpdx
  • what does str(saPct) return? – MLavoie Apr 29 '16 at 22:27
  • `num [1:15, 1:2] 1 1 6 13 20 42 72 34 18 9 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:15] "-6" "-5" "-4" "-3" ... ..$ : chr [1:2] "Count" "Percentage"` – ldlpdx Apr 29 '16 at 23:19
  • so it's a matrix, not a data frame; you need to turn it into a data frame (clPct as well) – MLavoie Apr 30 '16 at 00:43
  • Thanks! I turned it into a data frame, but I lost the row names, which are the `scores` that I want for my plot. Then, I tried to backtrack and rerun the `scoreFun` function and it's not working now. It's giving me this error: `'names' attribute [3] must be the same length as the vector [2]`. My `prop.table` isn't returning the row names as part of the vector. I got partial code from [this question](http://stackoverflow.com/questions/9623763/in-r-how-can-i-compute-percentage-statistics-on-a-column-in-a-dataframe-tabl). There are other code examples there, so I will try them. Thanks again! – ldlpdx Apr 30 '16 at 12:29

1 Answer

@MLavoie, this code from the question I referenced in my comment did the trick. Many thanks!

scoreFun <- function(x){
  tbl <- data.frame(table(x))
  colnames(tbl) <- c("Score", "Count")
  tbl$Percentage <- tbl$Count / sum(tbl$Count) * 100
  return(tbl)
}
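
One thing to watch before plotting: `data.frame(table(x))` stores the scores as a factor, so `Score` has to be converted back to a number if it is going on a numeric x-axis. Here is a minimal sketch of that step, assuming a small made-up score vector (`scores` and `pct` are hypothetical names, not the real speech data); I also added `round()` to match the two-decimal output in the question.

```r
# Same approach as the function above, run on a small made-up score vector.
scoreFun <- function(x) {
  tbl <- data.frame(table(x))            # one row per distinct score
  colnames(tbl) <- c("Score", "Count")
  tbl$Percentage <- round(tbl$Count / sum(tbl$Count) * 100, 2)
  tbl
}

scores <- c(1, 2, 0, 1, 2, -1, 0, 0)     # hypothetical scores, for illustration only
pct <- scoreFun(scores)

# table() keeps the scores as factor levels. Go through as.character() so that
# the level "-1" becomes the number -1 instead of the factor's internal code.
pct$Score <- as.numeric(as.character(pct$Score))

str(pct)
```

With `Score` numeric, an axis range such as `c(-15, 15)` should behave as expected. As far as I know, plotly 4.0 and later also expects columns to be referenced with a formula, e.g. `x = ~Score, y = ~Percentage`, rather than as bare names.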
ldlpdx