I am reshaping a DataFrame from wide to long format. However, I get different results in two cases that should be identical (see below).

import pandas as pd
import numpy as np

d_test = pd.DataFrame({"id": [1,2,3,5], "q1": [1,4,4,2], "q2": [4,np.nan,9,0]}, index=["a","b","c","d"])

# Gives an empty DataFrame as result
pd.wide_to_long(d_test,stubnames=["q"], suffix=r"\\d+",  i="id", j="time") 

# This works:
pd.wide_to_long(d_test,stubnames=["q"],  i="id", j="time") 

Note that both calls should be equivalent: according to the documentation, the default value of the suffix argument is the same as the one I passed explicitly.

Can someone help me in understanding what went wrong here?

P.Jo

1 Answer

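Your regex is escaped incorrectly: in a raw string, r"\\d+" is the pattern \\d+, which matches a literal backslash followed by one or more "d" characters, not digits. Since no column name has such a suffix, the result is empty.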

Note that the wide_to_long documentation uses '\\d+', not r'\\d+'.

pd.wide_to_long(d_test, stubnames=['q'], suffix=r'\d+', i='id', j='time')

# or
pd.wide_to_long(d_test, stubnames=['q'], suffix='\\d+', i='id', j='time')

Output:

           q
id time     
1  1     1.0
2  1     4.0
3  1     4.0
5  1     2.0
1  2     4.0
2  2     NaN
3  2     9.0
5  2     0.0
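
To see the difference between the two spellings without pandas, the strings can be compared directly with Python's re module. This is only a small sketch; the variable names are made up for illustration.

```python
import re

raw_double = r"\\d+"    # 4 characters: \ \ d +
plain_double = "\\d+"   # 3 characters: \ d +  (same string as r"\d+")

# The non-raw double backslash and the raw single backslash are the same string.
same_as_raw_single = plain_double == r"\d+"

# Column suffixes in the question are digits such as "1" and "2".
digit_with_raw_double = re.fullmatch(raw_double, "1")      # None: no match
digit_with_plain_double = re.fullmatch(plain_double, "1")  # a match object

# The raw double-backslash pattern only matches a backslash followed by d's.
backslash_d = re.fullmatch(raw_double, r"\dd")

print(same_as_raw_single)
print(digit_with_raw_double, digit_with_plain_double, backslash_d)
```

The raw prefix turns off backslash escaping in the string literal, so with r"..." a single backslash is already enough for the regex engine to see \d.
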
mozway
  • You are correct, thank you! The weird thing, though, is that when I hover the cursor over the function in my IDE (PyCharm), it says the default value is r"\\d+". How does this happen? – P.Jo Jun 16 '23 at 08:17
  • I don't know, not using PyCharm, this might be a bug? – mozway Jun 16 '23 at 08:19