0

I have a dataset with two columns(character) - text, source , I want to create a dataset such that if a particular value in text column is present less than 5 times. I am using below code but it is giving only first occurrence instead I need all the observations which satisfies that condition. For example: 'iphone' is present for 4 times in text , so I need all the 4 observations and their corresponding source but I am getting only one observation for that.

test = sqldf('select * , count(*) as count from base group by text having count < 5')

I am familiar with proc sql in SAS and I have recently started learning R. Does sqldf works in a different way?

3 Answers3

0
# Let me assume a data frame
base <- data.frame(source = 1:100, text = sample(letters,replace=T,100))    
base2 <- merge(base, aggregate(base$text, by=list(base$text), length), by.x="text", by.y="Group.1")
your_output <- base2[base2$x<5, ]

I have used merge and aggregate instead of sqldf.

Using sqldf:

sqldf("select * from base where text in (select text from base group by text having count(*) < 5)")
Kartheek Palepu
  • 972
  • 8
  • 29
0

This query gives you those "text" for which the count of observations is less than 5. You need to write another query which selects all attributes from the master data set with all those "text" present in test. Hope this helps.

Nirvik Banerjee
  • 335
  • 5
  • 16
0

The query in the question works according to standard SQL rules and would work that way in any system that supports SQL.

It seems you really want this:

library(sqldf)
base <- data.frame(text = rep(letters[1:3], 4:6), record = 1:15)  # test input

sqldf('select * 
       from (select text, count(*) as count 
             from base 
             group by text 
             having count < 5)
       join base using(text)')

which for the test input gives:

  text count record
1    a     4      1
2    a     4      2
3    a     4      3
4    a     4      4

(In the future please provide complete self contained code including all inputs that readers can run to reproduce the problem.)

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thanks a lot! May be sas sql behaves in a different way, I should learn standard sql rules first. I got your point, i will provide self contained code from next time onward. – chaitanya pradeep Oct 27 '15 at 08:52