4

I have a pandas database of some sports data. The columns are name, age, birth city, birth country, rookie, weight, and problem. The original data had birthcity as "City,State" for American players, so when I used a comma delimiter the result was two variables. So now all the American players are shifted over, and I needed to make a "Problem" variable to account for the excess.

How can I shift just the Americans over to the left across thousands of observations? Thanks!

What I have (please excuse the table formatting):

Name Age BirthCity BirthCountry Rookie Weight Problem

Frank 32 Seattle   WA           USA    N      200
Jake  24 Geneva    Switzerland   Y     210

Desired:

Name Age BirthCity BirthCountry Rookie Weight

Frank 32 Seattle   USA           N     200
Jake  24 Geneva    Switzerland   Y     210
JayBee
  • 133
  • 1
  • 5
  • 2
    Consider asking how to fix dataframe's source instead of ad-hoc needs. Web scrape? XML/JSON migration? – Parfait Feb 17 '18 at 19:45

2 Answers2

3

One way is to first delete the 3rd (remember Python counts 0 first) column selectively, simultaneously adding an extra column NaN. Then delete the final Problem series.

# df, start with this dataframe
#
#     Name  Age BirthCity BirthCountry Rookie Weight  Problem
# 0  Frank   32   Seattle           WA    USA      N    200.0
# 1   Jake   24    Geneva  Switzerland      Y    210      NaN

def shifter(row):
    return np.hstack((np.delete(np.array(row), [3]), [np.nan]))

mask = df['Rookie'] == 'USA'
df.loc[mask, :] = df.loc[mask, :].apply(shifter, axis=1)

df = df.drop(['Problem'], axis=1)

#     Name  Age BirthCity BirthCountry Rookie Weight
# 0  Frank   32   Seattle          USA      N    200
# 1   Jake   24    Geneva  Switzerland      Y    210
jpp
  • 159,742
  • 34
  • 281
  • 339
2

Not so easy:

#get all rows by mask
mask = df['Rookie'] == 'USA'
c = ['BirthCountry','Rookie','Weight','Problem']
#shift columns, but necessary converting to strings
df.loc[mask, c] = df.loc[mask, c].astype(str).shift(-1, axis=1)
#converting column Weight to float and then int
df['Weight'] = df['Weight'].astype(float).astype(int)
#remove column Problem
df = df.drop('Problem', axis=1)
print (df)
    Name  Age BirthCity BirthCountry Rookie  Weight
0  Frank   32   Seattle          USA      N     200
1   Jake   24    Geneva  Switzerland      Y     210
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252