In response to @j.jerrod.taylor's answer, let me rephrase my question to clear up any misunderstanding.
I'm new to data mining and am learning how to handle noisy data by smoothing it, using equal-width (distance) binning followed by smoothing by bin boundaries. Assume the dataset 1, 2, 2, 3, 5, 6, 6, 7, 7, 8, 9. I want to:
- perform equal-width binning with 3 bins, and
- smooth the values by bin boundaries, using the bins from the first step.
Based on the definition in (Han, Kamber, Pei, 2012, Data Mining: Concepts and Techniques, Section 3.2.2 Noisy Data):
"In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value."
- Interval width = (max - min)/k = (9 - 1)/3 ≈ 2.67
- Bin intervals = [1, 3.67), [3.67, 6.33), [6.33, 9]
- Original Bin1: 1, 2, 2, 3 | Bin boundaries: (1, 3) | Smoothed values by bin boundaries: 1, 1, 1, 3
- Original Bin2: 5, 6, 6 | Bin boundaries: (5, 6) | Smoothed values by bin boundaries: 5, 6, 6
- Original Bin3: 7, 7, 8, 9 | Bin boundaries: (7, 9) | Smoothed values by bin boundaries: 7, 7, 8, 9
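To make the steps above reproducible, here is a minimal Python sketch of how I understand the procedure. The function names and the `tie` parameter are my own (hypothetical), not from the book; as far as I can tell the book's definition does not say which boundary to pick when a value is equally far from both, so the sketch simply exposes both choices rather than claiming one is correct.

```python
def equal_width_bins(values, k):
    """Split values into k equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bin
        else:
            # The last bin is closed on the right so max(values) lands in it.
            idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return width, bins


def smooth_by_boundaries(bin_values, tie="lower"):
    """Replace each value with the closer of the bin's min and max.

    `tie` is an assumption of this sketch: "lower" or "upper" decides
    which boundary a value takes when it is equidistant from both.
    """
    if not bin_values:
        return []
    lo, hi = min(bin_values), max(bin_values)
    smoothed = []
    for v in bin_values:
        d_lo, d_hi = v - lo, hi - v
        if d_lo < d_hi:
            smoothed.append(lo)
        elif d_hi < d_lo:
            smoothed.append(hi)
        else:
            smoothed.append(lo if tie == "lower" else hi)
    return smoothed


data = [1, 2, 2, 3, 5, 6, 6, 7, 7, 8, 9]
width, bins = equal_width_bins(data, 3)
for b in bins:
    print(b, smooth_by_boundaries(b, "lower"), smooth_by_boundaries(b, "upper"))
```

With `tie="lower"` Bin1 comes out as 1, 1, 1, 3 (what I wrote above) and Bin3 as 7, 7, 7, 9; with `tie="upper"` Bin3 becomes 7, 7, 9, 9. Note that the two 2s in Bin1 are ties as well, so the same ambiguity applies there.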
Question: which boundary should 8 in Bin3 be replaced with when smoothing by bin boundaries, given that it is equidistant from both (+1 from 7 and -1 from 9)?