So in the screenshot below, we have 3 different energy sites, ID01, ID18, and ID31. They're in a dummy variable type of format, and for visualization purposes I want to just create a column named 'Sites' that I can use. You'll see the loop I quickly made to do this, but it seems super inefficient. Any pointers on how to achieve this in the fastest way possible?
Asked
Active
Viewed 2,632 times
1
-
3If you want to maximize the chances of getting an answer, it's always best to share text-based datasets... We can't play with the data in your image without having to recreate it, which is too much of a hassle. – Julien Marrec May 14 '18 at 14:06
-
idxmax worked, thank you. – Justin May 14 '18 at 14:17
1 Answers
5
Setup
data = pd.DataFrame([
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 1, 0]
], columns=['ID01', 'ID18', 'ID31']).assign(A=1, B=2)
data
ID01 ID18 ID31 A B
0 1 0 0 1 2
1 0 1 0 1 2
2 0 0 1 1 2
3 1 0 0 1 2
4 0 1 0 1 2
dot
product with strings and objects.
This works if these are truly dummy values 0
or 1
def undummy(d):
return d.dot(d.columns)
data.assign(Site=data.filter(regex='^ID').pipe(undummy))
ID01 ID18 ID31 A B Site
0 1 0 0 1 2 ID01
1 0 1 0 1 2 ID18
2 0 0 1 1 2 ID31
3 1 0 0 1 2 ID01
4 0 1 0 1 2 ID18
argmax
slicing
This works but can produce unexpected results if data is not as represented in question.
def undummy(d):
return d.columns[d.values.argmax(1)]
data.assign(Site=data.filter(regex='^ID').pipe(undummy))
ID01 ID18 ID31 A B Site
0 1 0 0 1 2 ID01
1 0 1 0 1 2 ID18
2 0 0 1 1 2 ID31
3 1 0 0 1 2 ID01
4 0 1 0 1 2 ID18

piRSquared
- 285,575
- 57
- 475
- 624
-
-
-
1@JustinZint make sure to upvote the answer if it was helpful, and if it solved your problem please also mark it as accepted so we can flag the thread as resolved. – Julien Marrec May 14 '18 at 14:42
-
1on a large dataframe, it is a bit faster to use [simple indexing with `np.where`](https://stackoverflow.com/a/51275990/6671176) than `dot` (I timed both methods with a 100000 row dataframe, `dot` took about 0.016 seconds on average, and `np.where` took about 0.00681 seconds on average) – sacuL Jul 11 '18 at 02:06