I have CSV data in the following format:
+-----------------+--------+-------------+
| reservation_num | rate   | guest_name  |
+-----------------+--------+-------------+
| B874576         | 169.95 | Bob Smith   |
| H786234         | 258.95 | Jane Doe    |
| H786234         | 258.95 | John Doe    |
| F987354         | 385.95 | David Jones |
| N097897         | 449.95 | Mark Davis  |
| H567349         | 482.95 | Larry Stein |
| N097897         | 449.95 | Sue Miller  |
+-----------------+--------+-------------+
I would like to add a feature (column) to the DataFrame called 'rate_per_person'. It would be calculated by dividing the rate for a given reservation number by the total number of guests who share that reservation number. For example, reservation H786234 has two guests, so each of their rows would get 258.95 / 2 = 129.475.
Here is my code:
# Importing libraries
import pandas as pd

# Importing the dataset
ds = pd.read_csv('hotels.csv')

for index, row in ds.iterrows():
    row['rate_per_person'] = row['rate'] / ds[row['reservation_num']].count
And the error message:
Traceback (most recent call last):
  File "<ipython-input-3-0668a3165e76>", line 2, in <module>
    row['rate_per_person'] = row['rate'] / ds[row['reservation_num']].count
  File "/Users/<user_name>/anaconda/lib/python3.6/site-packages/pandas/core/frame.py", line 2062, in __getitem__
    return self._getitem_column(key)
  File "/Users/<user_name>/anaconda/lib/python3.6/site-packages/pandas/core/frame.py", line 2069, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/<user_name>/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 1534, in _get_item_cache
    values = self._data.get(item)
  File "/Users/<user_name>/anaconda/lib/python3.6/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/Users/<user_name>/anaconda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2395, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)
KeyError: 'B874576'
Based on the error message, the problem is clearly in the ds[row['reservation_num']].count portion of the last line of code. However, I am unsure of the right way to obtain the number of guests per reservation so that I can create the new feature programmatically.
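For reference, here is a minimal sketch of one common way to do this in pandas; it is not necessarily the only or best approach. The KeyError arises because indexing a DataFrame with a string looks up a column label, and 'B874576' is a value within a row rather than a column name. The sketch instead counts the guests per reservation with groupby(...).transform('count') and divides the whole column at once. It assumes the column names shown in the table above and embeds the sample rows inline (via io.StringIO) in place of hotels.csv. The variable name guests_per_reservation is made up for illustration.

```python
import io

import pandas as pd

# Inline copy of the sample rows, standing in for hotels.csv (assumption:
# the real file uses these same column names and a comma delimiter).
csv_text = """reservation_num,rate,guest_name
B874576,169.95,Bob Smith
H786234,258.95,Jane Doe
H786234,258.95,John Doe
F987354,385.95,David Jones
N097897,449.95,Mark Davis
H567349,482.95,Larry Stein
N097897,449.95,Sue Miller
"""
ds = pd.read_csv(io.StringIO(csv_text))

# Number of guests sharing each reservation number, returned as a Series
# aligned row-for-row with ds (hypothetical variable name).
guests_per_reservation = ds.groupby('reservation_num')['guest_name'].transform('count')

# Vectorized division over the whole column; no row-by-row loop is needed.
ds['rate_per_person'] = ds['rate'] / guests_per_reservation

print(ds)
```

Because transform returns a result with the same index as the original rows, the quotient can be assigned straight back as a new column. Note also that assigning to row inside iterrows() would generally not modify ds, since each row yielded there is a copy.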