How to count the frequency in binary transactions per column, and add the result after the last row in R?

Question

I have a txt file (data5.txt):

1   0   1   0   0
1   1   1   0   0
0   0   1   0   0
1   1   1   0   1
0   0   0   0   1
0   0   1   1   1
1   0   0   0   0
1   1   1   1   1
0   1   0   0   1
1   1   0   0   0

I need to count the frequency of ones and zeros in each column.

If the frequency of ones >= the frequency of zeros in a column, then I want to put 1 after the last row for that column.

I'm new to R, but I tried this and got an error:

Error in if (z >= d) data[n, i] = 1 else data[n, i] = 0 : 

  missing value where TRUE/FALSE needed

my code:

data<-read.table("data5.txt", sep="")
m =length(data)
d=length(data[,1])/2
n=length(data[,1])+1
for(i in 1:m)
{
    z=sum(data[,i])
    if (z>=d) data[n,i]=1 else data[n,i]=0
}

+1 for providing a minimal, reproducible example, the code you have tried and what went wrong in your first question on SO. Cheers! — Henrik, Nov 10 '13 at 10:50
I'm so grateful, thanks a lot, sir. I'm sorry for not being clear; English is not my first language, and I tried my best. What do you mean by my first question? Do you mean this: http://stackoverflow.com/questions/19848676/clustering-transactional-data-using-pam-in-r Actually, that was a different question. I was asking about clustering some transactions (data mining: clustering + association rule mining). — Meem, Nov 10 '13 at 23:14
For example, I have several transactions (each row represents a transaction):
1,2,5,8
1,3,5,9
2,5,9,11
2,4,5,8
2,4,5,9
So, what I did is apply a clustering method (pam) with the number of clusters = 2 and Jaccard as the similarity function. After clustering, I got this in a txt file:
"x"
"1" 1
"2" 1
"3" 2
"4" 2
"5" 2
This means that the 1st and 2nd transactions are in cluster 1, while the 3rd, 4th, and 5th transactions are in cluster 2. — Meem, Nov 10 '13 at 23:22
But I want to save the itemsets (the items in each transaction) together with their cluster in a txt file. I mean, I want the output file to look like:
C1, 1,2,5,8
C1, 1,3,5,9
C2, 2,5,9,11
C2, 2,4,5,8
C2, 2,4,5,9 — Meem, Nov 10 '13 at 23:24

Henrik · Accepted Answer · 2013-11-10T11:04:52.573

You may try this:

rbind(df, ifelse(colSums(df == 1) >= colSums(df == 0), 1, NA))
#    V1 V2 V3 V4 V5
# 1   1  0  1  0  0
# 2   1  1  1  0  0
# 3   0  0  1  0  0
# 4   1  1  1  0  1
# 5   0  0  0  0  1
# 6   0  0  1  1  1
# 7   1  0  0  0  0
# 8   1  1  1  1  1
# 9   0  1  0  0  1
# 10  1  1  0  0  0
# 11  1  1  1 NA  1
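
The snippet above assumes the file has already been read into a data frame named `df`. A minimal sketch of that setup follows, using `read.table(text = ...)` as a stand-in for `read.table("data5.txt")` so that it runs without the file. The names `ones`, `zeros`, `majority` and `result` are introduced here for illustration only.

```r
# Stand-in for read.table("data5.txt"): the same ten rows, supplied inline.
df <- read.table(text = "
1 0 1 0 0
1 1 1 0 0
0 0 1 0 0
1 1 1 0 1
0 0 0 0 1
0 0 1 1 1
1 0 0 0 0
1 1 1 1 1
0 1 0 0 1
1 1 0 0 0
")

# Count ones and zeros per column.
ones  <- colSums(df == 1)
zeros <- colSums(df == 0)

# 1 where ones are at least as frequent as zeros, NA otherwise.
majority <- ifelse(ones >= zeros, 1, NA)  # illustrative name, not from the answer

# Append the summary as row 11.
result <- rbind(df, majority)
print(result)
```

Splitting the counts into separate variables is only for readability; the one-liner above does the same work in a single expression.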

Update, thanks to a nice suggestion from @Arun:

rbind(df, ifelse(colSums(df == 1) >= ceiling(nrow(df)/2), 1, NA))

or even:

rbind(df, ifelse(colSums(df == 1) >= nrow(df)/2, 1, NA))

Thanks to @SvenHohenstein.
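
These variants agree because each column holds only 0s and 1s, so the number of zeros is `n - ones`, and `ones >= zeros` is the same condition as `ones >= n/2`. Since `ones` is a whole number, comparing against `ceiling(n/2)` gives the same answer as well. A quick sketch that checks this for a range of hypothetical row counts (all variable names here are illustrative):

```r
agree <- TRUE
for (n in 1:20) {                 # hypothetical numbers of rows
  ones  <- 0:n                    # every possible count of ones in a column
  zeros <- n - ones               # the remaining entries are zeros

  by_zeros   <- ones >= zeros
  by_ceiling <- ones >= ceiling(n / 2)
  by_half    <- ones >= n / 2

  # All three conditions must give the same TRUE/FALSE pattern.
  agree <- agree && identical(by_zeros, by_ceiling) && identical(by_zeros, by_half)
}
print(agree)
```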

Possibly I misinterpreted your intended results. If you want 0 when the frequency of ones is smaller than the frequency of zeros, then this suffices:

rbind(df, colSums(df) >= nrow(df) / 2)

Again, thanks to @SvenHohenstein for his useful comments!
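
As for the error in the question: on the first pass of the loop, `data[n, i] = 1` creates row `n` and fills the other columns of that new row with `NA`. On the next pass, `sum(data[, i])` includes that `NA`, so `z` is `NA` and `if (z >= d)` stops with "missing value where TRUE/FALSE needed". A minimal sketch of a loop that avoids this by collecting the results first and appending the row once at the end (the inline data and the names `rows`, `half` and `new_row` are assumptions for illustration):

```r
# Inline stand-in for read.table("data5.txt", sep = "").
data <- read.table(text = "
1 0 1 0 0
1 1 1 0 0
0 0 1 0 0
1 1 1 0 1
0 0 0 0 1
0 0 1 1 1
1 0 0 0 0
1 1 1 1 1
0 1 0 0 1
1 1 0 0 0
")

rows <- nrow(data)               # number of original rows
half <- rows / 2                 # ones must reach at least half of the rows
new_row <- integer(ncol(data))   # one result per column, filled in below

for (i in seq_len(ncol(data))) {
  z <- sum(data[, i])            # data is not modified inside the loop, so no NA appears
  new_row[i] <- if (z >= half) 1L else 0L
}

# Append the summary row only after all column sums are computed.
data <- rbind(data, new_row)
print(data[rows + 1, ])          # row 11: 1 1 1 0 1
```

The loop gives the same last row as the vectorised `colSums(df) >= nrow(df) / 2` comparison; it is shown only to make the cause of the error visible.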

edited Nov 10 '13 at 11:04

answered Nov 10 '13 at 10:39



@Arun You don't need `ceiling` here. – Sven Hohenstein Nov 10 '13 at 10:52

If I understand the OP correctly, the generated values should be 0 and 1, not `NA`. Hence, `rbind(dat, colSums(dat) >= nrow(dat) / 2)` is sufficient. – Sven Hohenstein Nov 10 '13 at 10:55
