1

I have a subset of a dataframe (actual dataframe is a lot larger with over 3000 rows) in Python such that

df <- data.frame(Name= c["John", "Karla", "Sandy", "John", "John", "Sandy"],       
Course Title = c["Training 2", "Training", "Training 2", "Training 5", "Training 2", "Training 2"],              
 Start Date= c["2022-11-08", "2022-11-25", "2023-02-09", "2023-03-15", "2023-03-19", "2023-03-19"], 
Completion Date= c["2022-11-09", "2022-11-28", "2023-02-09", "2023-03-20", "2023-03-21", "2023-03-19"]))

I want to pivot the dataframe in such a way that it groups by the Name, and each name has one single row, and the values in 'Course Title' are the column names, with the column values indicating whether each person completed the training, and with the start & completion dates following each Training column.

I want the result to be this with the columns in this order:

df_pivot <- data.frame(Name= c["John", "Karla", "Sandy"],       
Training = c[1, 1, 1],              
 Training_Start Date= c["2022-11-08", "2022-11-25", "2023-03-19"], 
Completion Date= c["2022-11-09", "2022-11-28", "2023-03-19"],
Training 2 = c[1, NaN, 1], 
Training 2_Start Date= c["2023-03-29", NaN, "2023-02-09"],
Training 2_Completion Date= c["2023-03-21", NaN, "2023-02-09"],
Training 5 = c[1, NaN, NaN],
Training 5_Start Date = c["2023-03-15", NaN, NaN]),
Training 5_Completion Date = c["2023-03-20", NaN, NaN])

This is what I've tried, but the last part of the code gives me an error.

# Pivot the dataframe
df_pivot = df.pivot_table(index='Name', columns='Course Title',
                          values=['Course Title', 'Start Date', 'Completion Date'],
                          aggfunc={'Course Title': 'count', 'Start Date': 'first', 'Completion Date': 'first'})

# Flatten the column names
df_pivot.columns = [f'{col[1]}_{col[0]}' if col[0] != '' else col[1] for col in df_pivot.columns]

# Reset the index
df_pivot = df_pivot.reset_index()

# Reorder the columns
columns = ['Name']
for title in df['Course Title'].unique():
    columns.append(title)
    columns.append(f'{title}_Start Date')
    columns.append(f'{title}_Completion Date')

df_pivot = df_pivot[columns]

KeyError Traceback (most recent call last) in ----> 1 df_pivot = df_pivot[columns]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in getitem(self, key) 2906 if is_iterator(key): 2907 key = list(key) -> 2908 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1] 2909 2910 # take() does not accept boolean indexers

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing) 1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr) 1253 -> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing) 1255 return keyarr, indexer 1256

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing) 1302 if raise_missing: 1303 not_found = list(set(key) - set(ax)) -> 1304 raise KeyError(f"{not_found} not in index") 1305 1306 # we skip the warning on Categorical

Donut
  • 11
  • 4
  • Please post code, data, and results as text, not screenshots ([how to format code in posts](https://stackoverflow.com/help/formatting)). [Why should I not upload images of code/data/errors?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors) http://idownvotedbecau.se/imageofcode – Barmar Jun 08 '23 at 16:47
  • Please post the full traceback of the error. – Barmar Jun 08 '23 at 16:49
  • posted full error – Donut Jun 08 '23 at 17:03
  • `take() does not accept boolean indexers` is not an error message. It's a comment before some other part of the code. – Barmar Jun 08 '23 at 17:05
  • The error means that you have column names in `columns` that aren't columns of the dataframe. – Barmar Jun 08 '23 at 17:06
  • Figured it out, thank you! – Donut Jun 08 '23 at 17:56

0 Answers0