
I am working with this csv file. It's a small dataset of laptop information.

import pandas as pd

laptops = pd.read_csv('laptops.csv', encoding="Latin-1")
laptops["Operating System"].value_counts()

Windows      1125
No OS          66
Linux          62
Chrome OS      27
macOS          13
Mac OS          8
Android         2
Name: Operating System, dtype: int64

I want to merge the variations of macOS and Mac OS under a single value "macOS".

I have tried this, which works.

mapping_dict = {
    'Android': 'Android',
    'Chrome OS': 'Chrome OS',
    'Linux': 'Linux',
    'Mac OS': 'macOS',
    'No OS': 'No OS',
    'Windows': 'Windows',
    'macOS': 'macOS'
}

laptops["Operating System"] = laptops["Operating System"].map(mapping_dict)

laptops["Operating System"].value_counts()

Windows      1125
No OS          66
Linux          62
Chrome OS      27
macOS          21
Android         2
Name: Operating System, dtype: int64

Is this the only way, or the best way, of doing it? Assume such a requirement might arise for multiple values (and not just macOS).

Ravindra S
  • I think `map` is good enough in your case. If there are multiple values, the only thing you need to alter is the dictionary, not the `map` call. –  Jun 03 '22 at 15:49
  • @RavindraS check out my solution. I think it will give you the flexibility you might be looking for. – 965311532 Jun 04 '22 at 15:35

4 Answers

1
laptops['Operating System'] = laptops['Operating System'].str.replace(r'(?i)(mac|mc).*os', 'macOS', regex=True)
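To see which spellings this pattern catches, here is a minimal sketch using Python's `re` module directly, on the assumption that `str.replace(..., regex=True)` applies the same substitution to each value. The sample list is made up for illustration and is not taken from laptops.csv.

```python
import re

# Same pattern as above: case-insensitive, "mac" or "mc", anything, then "os".
pattern = r'(?i)(mac|mc).*os'

# Hypothetical sample values (not from the real dataset).
samples = ["Mac OS", "MAC OS", "MC OS", "macOS", "Chrome OS", "Windows", "Linux"]

# Only the matched part of each string is replaced; non-matching values are unchanged.
cleaned = [re.sub(pattern, 'macOS', s) for s in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that the pattern is unanchored and `.*` is greedy, so it will also rewrite any longer string that happens to contain "mac" or "mc" followed later by "os". Checking it against the full list of distinct values before applying it is probably worthwhile.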
Timur Shtatland
  • As commented earlier, this just solves this particular case of Mac Os. What if there are many more such cases? Looking to improve upon the solution I posted. – Ravindra S Jun 03 '22 at 07:05
  • @RavindraS: Updated. It is now more general. Is this what you are looking for? Could you be more specific and list a few more variations on the exact names you want to replace with 'macOS'? – Timur Shtatland Jun 03 '22 at 15:11
1

This code does the trick, but you have to know the possible variants in advance. If that is not feasible, it becomes a different problem, outside the scope of the python and pandas tags.

laptops.loc[laptops['Operating System'].str.lower().isin(['mac', 'osx', 'macos', 'mac os']), 'Operating System'] = 'macOS'

0

You can simply do

laptops['Operating System'] = laptops['Operating System'].replace('Mac OS', 'macOS')
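This also explains why the dictionary in the question had to list every value: `Series.map` with a dict turns any value missing from the dict into NaN, while `Series.replace` leaves unmatched values as they are. A small sketch on made-up sample values (the Series below is hypothetical, not the real column):

```python
import pandas as pd

# Made-up sample values, not the real laptops.csv column.
os_col = pd.Series(["Windows", "Mac OS", "macOS", "Linux"])

# Only the value that needs fixing is listed.
partial = {"Mac OS": "macOS"}

replaced = os_col.replace(partial)  # unmatched values pass through unchanged
mapped = os_col.map(partial)        # values missing from the dict become NaN

print(replaced.tolist())
print(mapped.isna().tolist())
```

So with `replace`, the dictionary only needs the entries that actually change.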
Ach113
  • This just solves one particular case of replacing "Mac OS" with "macOS". There can be other variations of mac os as well. MAC OS, MC OS. And as I said, variations to other values might exist as well. Looking for a generic solution and trying to improve the solution I posted. – Ravindra S Jun 02 '22 at 17:11
  • You can pass it as a list like `.replace(['Mac OS', 'mac OS'], 'macOS')` – Ach113 Jun 02 '22 at 17:13
  • Do you have all the variations in hand, or are you complaining about the number of them being huge? – MohammadReza Hosseini Jun 03 '22 at 22:12
0

I'd do this:

# Build a dict of lists, where each key is the name you want
# to assign and each list contains the variations of that name
aliases = {
    "macOS": ["mac", "osx", "Mac OS"],
    "Windows": ["win", "windows", "Windows"],
}

# Invert it into a flat {variation: main name} map for easy lookup
aliases_map = {alias: name for name, variants in aliases.items() for alias in variants}

# Replace each alias with its main name
laptops["Operating System"] = laptops["Operating System"].replace(aliases_map)

Output of laptops["Operating System"].value_counts():

Windows      1125
No OS          66
Linux          62
Chrome OS      27
macOS          21
Android         2
Name: Operating System, dtype: int64
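Because `replace` matches whole values exactly, spellings such as "MAC OS" or "mac os" would each need their own entry in the list. One possible extension, sketched here on the assumption that the variants differ only in case, spacing, or punctuation, is to normalize each value before the lookup. The helper names `normalize` and `canonical_name` and the sample values are hypothetical.

```python
# Hypothetical alias table: main name -> known variations.
aliases = {
    "macOS": ["mac", "osx", "Mac OS", "MC OS"],
    "Windows": ["win", "windows"],
}


def normalize(name):
    """Lowercase the name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


# Flat map from normalized variation to main name.
# The main name itself is included so it also maps to itself.
lookup = {
    normalize(variant): main
    for main, variants in aliases.items()
    for variant in [main, *variants]
}


def canonical_name(name):
    """Return the main name for a known variation, else the input unchanged."""
    return lookup.get(normalize(name), name)


# Made-up sample values, not taken from laptops.csv.
samples = ["MAC OS", "mac os", "macOS", "Windows", "Chrome OS", "No OS"]
cleaned = [canonical_name(s) for s in samples]
print(cleaned)
```

Since `Series.map` also accepts a function, this could be applied as `laptops["Operating System"].map(canonical_name)`. If the column can contain missing values, passing `na_action="ignore"` should keep them from reaching the function.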
965311532