I am using the qdap package for polarity analysis. My CSV file contains a sentence with no end punctuation, such as "Sucks to not be removable" (no period). After I run sentSplit on the data frame, this row shows NA.

How do I add endmarks to the incomplete sentences in R? Is there a way to stop these rows from turning into NA?

Dason
Dutta

1 Answer

Many of the qdap functions expect properly formatted and structured data. This generally means sentences with endmarks, and often only one sentence per row, because the endmarks are how the algorithms determine what a sentence is. If the sentences really are incomplete, qdap expects the pipe sign "|" to mark them as such. Here is an example that detects missing endmarks with the end_mark function and then pastes a "|" onto the end of each one:

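library(qdap)

dat <- DATA  # example data set that ships with qdap
dat[1, 4] <- "Sucks to not be removable"  # column 4 is "state"
# end_mark() returns "_" for text that has no endmark
missing <- end_mark(dat[["state"]]) == "_"
dat[["state"]][missing] <- paste0(dat[["state"]][missing], "|")

sentSplit(dat, "state")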

##        person  tot sex adult code                       state
## 1         sam  1.1   m     0   K1  Sucks to not be removable|
## 2        greg  2.1   m     0   K2     No it's not, it's dumb.
## 3     teacher  3.1   m     1   K3          What should we do?
## 4         sam  4.1   m     0   K4        You liar, it stinks!
## 5        greg  5.1   m     0   K5     I am telling the truth!
## 6       sally  6.1   f     0   K6      How can we be certain?
## 7        greg  7.1   m     0   K7            There is no way.
## 8         sam  8.1   m     0   K8             I distrust you.
## 9       sally  9.1   f     0   K9 What are you talking about?
## 10 researcher 10.1   f     1  K10           Shall we move on?
## 11 researcher 10.2   f     1  K10                  Good then.
## 12       greg 11.1   m     0  K11                 I'm hungry.
## 13       greg 11.2   m     0  K11                  Let's eat.
## 14       greg 11.3   m     0  K11                You already?
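
For a quick pre-processing pass before the data reaches qdap, the same idea can be sketched in base R. This is only a rough equivalent under stated assumptions: `add_incomplete_mark` is a hypothetical helper name, not a qdap function, and it treats only ".", "?", "!" and "|" as endmarks, which may be narrower or broader than what end_mark recognises.

```r
# Hypothetical helper (not part of qdap): append "|" to any non-empty
# string that does not already end in ".", "?", "!" or "|".
add_incomplete_mark <- function(x) {
  x <- trimws(as.character(x))            # drop surrounding whitespace
  has_end <- grepl("[.?!|]$", x)          # TRUE when an endmark is present
  needs_mark <- !is.na(x) & nzchar(x) & !has_end
  x[needs_mark] <- paste0(x[needs_mark], "|")
  x                                       # NA and empty strings pass through
}

state <- c("Sucks to not be removable",
           "No it's not, it's dumb.",
           "What should we do?")
fixed <- add_incomplete_mark(state)
print(fixed)
```

Only the first element gains a trailing "|"; the other two already end in an endmark and are left alone. The result can be written back into the text column before calling sentSplit.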

Incidentally, the dev version of qdap (version >= 2.1.1) contains a new set of data-formatting functions, including check_text, which automatically checks for potential formatting issues and prints a report giving the locations of potential problems and possible fixes.

Tyler Rinker