I have a dataset as below.
Requirement is to count the number of IP only - Both Parent & Child in a subcluster within a cluster is IP, P&T only - Both Parent & Child in a subcluster within a cluster is P&T IP->P&T - When Parent is IP & Child is P&T in a subcluster within a cluster P&T->IP - When Parent is P&T & Child is IP in a subcluster within a cluster
Final_cluster Relation Subcluster Category
5 Parent 1 IP
5 Child 1 IP
5 Child 1 IP
5 Child 4 IP
5 Parent 4 P&T
5 Parent 5 IP
5 Child 5 P&T
5 Child 5 P&T
5 Child 5 P&T
5 Child 5 P&T
7 Parent 1 P&T
7 Child 1 P&T
7 Parent 2 IP
7 Child 2 IP
7 Parent 3 P&T
7 Child 3 P&T
7 Child 7 IP
7 Child 7 P&T
7 Parent 7 P&T
So, final result would be like:
Cluster IP-> IP P&T->P&T IP-> P&T P&T->IP
5 1 1 2
7 1 2 1
I was able to create count of single category using below sqldf
single_cat <- sqldf("SELECT Final_Cluster, Subcluster, category, COUNT(distinct(category)) AS count_single
FROM final_output_csv
GROUP BY Final_cluster, Subcluster
HAVING COUNT(distinct(category)) = 1")
single_cat_final <- sqldf("SELECT Final_Cluster,category, count(count_single) As total_count
FROM single_cat
GROUP BY Final_cluster,category ")