
I have a txt file (data5.txt):

1   0   1   0   0
1   1   1   0   0
0   0   1   0   0
1   1   1   0   1
0   0   0   0   1
0   0   1   1   1
1   0   0   0   0
1   1   1   1   1
0   1   0   0   1
1   1   0   0   0

I need to count the frequency of ones and zeros in each column.

If the frequency of ones >= the frequency of zeros, then I want to print 1 after the last row for that column.

I'm new to R, but I tried this and got an error:

Error in if (z >= d) data[n, i] = 1 else data[n, i] = 0 : 
  missing value where TRUE/FALSE needed

My code:

data<-read.table("data5.txt", sep="")
m =length(data)
d=length(data[,1])/2
n=length(data[,1])+1
for(i in 1:m)
{
    z=sum(data[,i])
    if (z>=d) data[n,i]=1 else data[n,i]=0
}
Henrik
Meem
  • +1 for providing a minimal, reproducible example, the code you have tried and what went wrong in your first question on SO. Cheers! – Henrik Nov 10 '13 at 10:50
  • I'm so grateful, thanks a lot, sir. I'm really sorry for not being clear; English is not my first language, and I tried my best. What do you mean by my first question? Do you mean this one: http://stackoverflow.com/questions/19848676/clustering-transactional-data-using-pam-in-r Actually, that was a different question. I was asking about clustering some transactions (data mining: clustering + association rules mining). – Meem Nov 10 '13 at 23:14
  • For example, I have several transactions (each row represents transaction):
    1,2,5,8
    1,3,5,9
    2,5,9,11
    2,4,5,8
    2,4,5,9
    So, what I did is this: I applied a clustering method (I used pam), where the number of clusters = 2 and the similarity function is Jaccard. After clustering, I got this in a txt file:
    "x"
    "1" 1
    "2" 1
    "3" 2
    "4" 2
    "5" 2
    Which means: the 1st and 2nd transactions are in cluster number 1, whereas the 3rd, 4th, and 5th transactions are in cluster 2.
    – Meem Nov 10 '13 at 23:22
  • But I want to save the itemsets (the items in each transaction) together with their cluster in a txt file. I mean, I want the output file to look like:
    C1, 1,2,5,8
    C1, 1,3,5,9
    C2, 2,5,9,11
    C2, 2,4,5,8
    C2, 2,4,5,9
    – Meem Nov 10 '13 at 23:24

1 Answer


You may try this, where df is the data frame read from data5.txt:

rbind(df, ifelse(colSums(df == 1) >= colSums(df == 0), 1, NA))
#    V1 V2 V3 V4 V5
# 1   1  0  1  0  0
# 2   1  1  1  0  0
# 3   0  0  1  0  0
# 4   1  1  1  0  1
# 5   0  0  0  0  1
# 6   0  0  1  1  1
# 7   1  0  0  0  0
# 8   1  1  1  1  1
# 9   0  1  0  0  1
# 10  1  1  0  0  0
# 11  1  1  1 NA  1

Update, thanks to a nice suggestion from @Arun:

rbind(df, ifelse(colSums(df == 1) >= ceiling(nrow(df)/2), 1, NA))

or even:

rbind(df, ifelse(colSums(df == 1) >= nrow(df)/2, 1, NA))

Thanks to @SvenHohenstein.
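
The three conditions above should agree whenever the data contain only 0 and 1 and no missing values: ones + zeros equals the number of rows, so ones >= zeros is the same as ones >= nrow(df)/2, and because the count of ones is a whole number, rounding the threshold up with ceiling() does not change the outcome. A small check, as a sketch; the inline read.table(text = ...) call is only a stand-in for reading data5.txt, and the names by_counts, by_ceiling and by_half are made up for illustration:

```r
# Rebuild the question's data inline (stand-in for read.table("data5.txt"))
df <- read.table(text = "
1 0 1 0 0
1 1 1 0 0
0 0 1 0 0
1 1 1 0 1
0 0 0 0 1
0 0 1 1 1
1 0 0 0 0
1 1 1 1 1
0 1 0 0 1
1 1 0 0 0")

# The three ways of testing "at least as many ones as zeros" per column
by_counts  <- colSums(df == 1) >= colSums(df == 0)
by_ceiling <- colSums(df == 1) >= ceiling(nrow(df) / 2)
by_half    <- colSums(df == 1) >= nrow(df) / 2

# Both comparisons return TRUE for this data (0/1 values only, no NA)
identical(by_counts, by_ceiling)
identical(by_counts, by_half)
```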

Possibly I misinterpreted your intended result. If you want 0 when the frequency of ones is less than the frequency of zeros, then this suffices:

rbind(df, colSums(df) >= nrow(df) / 2)

Again, thanks to @SvenHohenstein for his useful comments!
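
As for the error in the question: the likely cause is that assigning to data[n, i] in the first pass of the loop creates the whole of row n, with NA in every other column. On the next pass, sum(data[, i]) includes that NA, so z is NA and if (z >= d) has nothing to test. The sketch below reproduces this and then appends the row in one step instead. The inline read.table(text = ...) call stands in for data5.txt, and tmp, ones, majority and fixed are illustrative names, not part of the original code:

```r
# Rebuild the question's data inline (stand-in for read.table("data5.txt"))
data <- read.table(text = "
1 0 1 0 0
1 1 1 0 0
0 0 1 0 0
1 1 1 0 1
0 0 0 0 1
0 0 1 1 1
1 0 0 0 0
1 1 1 1 1
0 1 0 0 1
1 1 0 0 0")

n <- nrow(data) + 1

# Writing to row n of column 1 creates the whole row, with NA elsewhere
tmp <- data
tmp[n, 1] <- 1
sum(tmp[, 2])   # NA, so `if (z >= d)` fails on the second column

# One way around it: compute every column's result first, then append once
ones     <- colSums(data)                        # ones per column (values are 0/1)
majority <- as.integer(ones >= nrow(data) / 2)   # 1 if ones >= zeros, else 0
fixed    <- rbind(data, majority)
fixed
```

Keeping the loop would also work if the sums were taken over the original rows only, for example sum(data[1:(n - 1), i]), but the vectorised form avoids modifying the data frame while it is still being read.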

Henrik