My problem is best explained by example.

Imagine you have a data set with details of 100,000 vehicles: the vehicle type (car, motorcycle, truck), the colour (red, green, blue), and measurements such as top speed and weight.

Let's say 40,000 are cars. My question is: within the group of all cars, how many 'green cars' need to exist in the data for me to be able to treat 'green cars' as a category in its own right and assess the 'top speed' or 'weight' of 'green cars' on their own? Or, put another way, at what point does the number of 'green cars' become so small that statistics computed for the 'green cars' category are no longer statistically reliable? I want to run this in Python.
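To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the goal is to estimate the mean top speed of a subgroup to within a chosen margin of error, and it uses a normal approximation (for very small groups a t-distribution would probably be more appropriate). The function names, the 20 km/h standard deviation, and the 5 km/h margin are hypothetical values for illustration, not figures from my data.

```python
import math
from statistics import NormalDist, mean, stdev


def required_sample_size(std_dev, margin, confidence=0.95):
    """Smallest n for which a normal-approximation confidence interval
    for the mean has a half-width of at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * std_dev / margin) ** 2)


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a subgroup."""
    n = len(values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(values) / math.sqrt(n)
    centre = mean(values)
    return centre - half_width, centre + half_width


# Hypothetical numbers: top speed of green cars varies with a standard
# deviation of about 20 km/h, and I want the mean to within +/- 5 km/h.
n_needed = required_sample_size(std_dev=20, margin=5)
print(n_needed)

# Hypothetical top speeds for a tiny subgroup, to show how wide the interval is.
low, high = mean_confidence_interval([100, 110, 120, 130, 140])
print(low, high)
```

Under these assumptions the threshold depends on how much the measurement varies and how precise I need the estimate to be, rather than on the share of green cars among all 40,000 cars. Is that the right way to think about it?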

I really hope that makes sense. Thanks.
