According to this published paper, BCubed precision and recall (and hence the BCubed F1 measure) are the best technique for evaluating clustering performance. See Amigó, Enrique, et al. "A comparison of extrinsic clustering evaluation metrics based on formal constraints." Information Retrieval 12.4 (2009): 461-486.
It shows the BCubed calculation as in the image below.
So, as far as I understand, we calculate precision and recall for each item and then take the average over all items?
However, my understanding does not match their evaluation, as can be seen in the image below.
For the cluster homogeneity example (left side) in the image above, I calculate the BCubed precision as follows, but the result does not match:
black: 4/4
gray: 4/7
each of the other three: 1/7
So the average precision is: (4/4 + 4/6 + 1/7 + 1/7 + 1/7) / 5
However, this does not match the result in their image, which is 0.59.
BCubed precision of an item is the proportion of items in its cluster which have the item’s category (including itself). The overall BCubed precision is the averaged precision of all items in the distribution. Since the average is calculated over items, it is not necessary to apply any weighting according to the size of clusters or categories. The BCubed recall is analogous, replacing “cluster” with “category”.
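To make the quoted definition concrete, here is a minimal Python sketch of how I read it. The function name `bcubed` and the toy data are my own assumptions, not taken from the paper, and the toy data is not the example from the figure. Note that the average runs over every item, so a category with four items contributes four terms to the sum, not one.

```python
from collections import Counter


def bcubed(clusters, categories):
    """Return (precision, recall, f1) following the quoted BCubed definition.

    clusters[i] is the cluster assigned to item i and categories[i] is its
    gold category. Both names are hypothetical, chosen for this sketch.
    """
    if len(clusters) != len(categories) or not clusters:
        raise ValueError("need two non-empty lists of equal length")

    n = len(clusters)
    cluster_size = Counter(clusters)                  # items per cluster
    category_size = Counter(categories)               # items per category
    pair_size = Counter(zip(clusters, categories))    # items sharing both

    # Per-item precision: share of the item's cluster that has its category.
    precision = sum(
        pair_size[(c, g)] / cluster_size[c] for c, g in zip(clusters, categories)
    ) / n
    # Per-item recall: share of the item's category that is in its cluster.
    recall = sum(
        pair_size[(c, g)] / category_size[g] for c, g in zip(clusters, categories)
    ) / n

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (assumed): two clusters, two categories, five items.
p, r, f = bcubed([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
print(p, r, f)
```

For this toy input the per-item precisions are 2/3, 2/3, 1/3, 1 and 1, so the averaged precision is 11/15; by the same reasoning the recall is also 11/15.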