0

I have a df that looks like this

country       country_abbr region year v1       v2      v3      ... v100
India         IN            IND   1999 327.0    7058.0  8856.0  ... 7900.0
Saint Lucia   LC            LCA   1990 NaN      NaN     NaN     ... NaN
Samoa         WS            WSM   1985 NaN      NaN     NaN     ... NaN
Somalia       SO            SOM   1999 136.0    643.0   678.0   ... 175.0

I'm looking for a simple way to collapse all of the vi variables into a single variable v (transform the table from wide to long) and then create a new column to hold the values of vi by country and v.

I've tried stuff like this:

col_vals = df.columns.difference(['country', 'country_abbr', 'region', 'year'])
df2 = melt(df, col_vals, 'v', 'count')
df2['count'].astype(int)

But I'm noticing a lot of my data dropping out and I'm not sure why.

Edit to add expected output:

country     year    v   count
India       1999    v1  327.0
India       1999    v2  7058.0
India       1999    v3  8856.0
Somalia     1999    v1  136.0
Somalia     1999    v2  643.0
Somalia     1999    v3  678.0
LMGagne
  • 1,636
  • 6
  • 24
  • 47
  • “combine all of the `vi` variables into a single variable `v`“ ... combine how? (sum, mean, std, string concat, etc ...) – S3DEV Sep 14 '20 at 22:23
  • @S3DEV Essentially transform the table from wide to long, collapsing the `vi` variables into a single column. – LMGagne Sep 14 '20 at 22:30
  • Got it. Thanks for the feedback. May I suggest updating the question with this detail for future readers. (Already done I see.) – S3DEV Sep 14 '20 at 22:36
  • kindly post your expected output. The wide to long part is easy : ``df.melt(["country", "country_abbr", "region", "year"], var_name="v")``. The count is better explained with an expected output – sammywemmy Sep 14 '20 at 22:42

2 Answers2

0
pd.wide_to_long(df, ["v"], i = ["country", "country_abbr", "region", "year"], j ="V").rename(columns = {'v':'Count'})

I think this is what you mean, but I'm not 100% sure, let me know if that what you meant by your count column.

Ben Pap
  • 2,549
  • 1
  • 8
  • 17
0

Is this what you have in mind?

(df.melt(["country", "country_abbr", "region", "year"], 
         var_name="v")
 .dropna()
 .sort_values("country"))

    country country_abbr    region  year    v       value
0   India   IN               IND    1999    v1      327.0
4   India   IN               IND    1999    v2      7058.0
8   India   IN               IND    1999    v3      8856.0
12  India   IN               IND    1999    v100    7900.0
3   Somalia SO               SOM    1999    v1      136.0
7   Somalia SO               SOM    1999    v2      643.0
11  Somalia SO               SOM    1999    v3      678.0
15  Somalia SO               SOM    1999    v100    175.0
sammywemmy
  • 27,093
  • 4
  • 17
  • 31