I am working on a data cleaning exercise where I need to remove special characters like '$#@' from the 'price' column, which is of object type (string). After that, I need to convert it to float type. However, the decimal point position changes when I run the code. For example, 9.99 becomes 999.00. I have tried different sets of codes, but some of them change the values to NaN.
#Step 1 I created a data frame with special data to clean it.
#Create a dictionary of wine data
wine_data = { ' country': ['Italy ', 'It aly ', ' $Chile ', 'Sp ain', '$Spain', 'ITALY', '# Chile', ' Chile', 'Spain', ' Italy'], 'price ': [24.99, np.nan, 12.99, '$9.99', 11.99, 18.99, '@10.99', np.nan, '#13.99', 22.99], '#volume': ['750ml', '750ml', 750, '750ml', 750, 750, 750, 750, 750, 750], 'ran king': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'al cohol@': [13.5, 14.0, np.nan, 12.5, 12.8, 14.2, 13.0, np.nan, 12.0, 13.8], 'total_PHeno ls': [150, 120, 130, np.nan, 110, 160, np.nan, 140, 130, 150], 'color# _INTESITY': [10, np.nan, 8, 7, 8, 11, 9, 8, 7, 10], 'HARvest_ date': ['2021-09-10', '2021-09-12', '2021-09-15', np.nan, '2021-09-25', '2021-09-28', '2021-10-02', '2021-10-05', '2021-10-10', '2021-10-15'] }
Create a Pandas data frame from the dictionary
df = pd.DataFrame(wine_data)
Step 2
To clean the 'price' column and remove special characters, a new column named 'price' was created. The str.replace() method was employed with the regular expression '\D' to remove any non-numeric characters. However, in positions 3, 6, and 8, the decimal point was shifted to the right resulting in values like 999.00 instead of 9.99.
I tried the next codes:
#Great! but, it changes the decimal point in some of the values
df['price'] = df['price'].replace({'\D': ''}, regex=True).astype(float)
#Not Working! price values are changed into NaN
df['price'] = df['price'].str.replace('\D', '')
#Not Working
#I tried to fill it with '0' NaN. But, other values were changed into NaN
df['price'] = df['price'].fillna('0').str.replace(r'\D', r'') df['price'] = df['price'].fillna('0').str.replace(r'\D', r'', regex=True).astype(float)
I make a conscious effort to practice and improve my data cleaning skills by creating problems for myself. However, there are times when I am unable to solve them on my own.your text