I have a pandas Series with 16,000 rows of apartment descriptions. I am trying to write a function that takes a string and extracts the number of rooms as an integer. Some lines don't contain any information about rooms.

import re

line_example = "Apartment · 121m² · 4 rooms · 2 parking lots"

def rooms_digit_extraction(line):
    # extracts the number of rooms as digits
    pattern = r"\d{1,2} room?s"

    try:
        # returns a list with rooms info if there is any (['4 rooms'] for the example)
        rooms = re.findall(pattern, line)
        # extracts the digit from each rooms entry
        digit = [int(sub.split(' ')[0]) for sub in rooms]
    except TypeError:
        pass

    return digit

my_pandas_series = my_pandas_series.map(lambda x: rooms_digit_extraction(x))

Then the following error appears:

UnboundLocalError: local variable 'digit' referenced before assignment

What's wrong with my function? Any help will be really appreciated!

Thank you!

audiotec

1 Answer

You may use

my_pandas_series.str.extract(r'(\d+)\s*rooms?\b')

See the regex demo.

The .str.extract method searches for a regex match in the input string and returns the value captured with a capturing group.
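
As a minimal sketch of that behavior, assuming a small made-up series (the sample rows and the names `descriptions` and `rooms` are illustrative, not taken from the question's data): rows without a match come back as NaN, and passing `expand=False` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the None row stands in for a missing description.
descriptions = pd.Series([
    "Apartment · 121m² · 4 rooms · 2 parking lots",
    "Studio · 30m² · 1 room",
    "Garage · 15m²",
    None,
])

# expand=False returns a Series instead of a one-column DataFrame.
rooms = descriptions.str.extract(r"(\d+)\s*rooms?\b", expand=False)

# Matches are strings and non-matches are NaN, so convert the result
# to a nullable integer dtype that can hold missing values.
rooms = pd.to_numeric(rooms).astype("Int64")
print(rooms.tolist())
```

The nullable `Int64` dtype is one possible choice here; it keeps the room counts as integers while still allowing missing values for rows that mention no rooms.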

  • (\d+) - Capturing group 1: one or more digits
  • \s* - zero or more whitespace characters
  • rooms? - room or rooms
  • \b - word boundary.
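
As for the error itself: when re.findall receives a value that is not a string, it raises TypeError. The except block then does nothing, digit is never assigned, and return digit fails with UnboundLocalError. A likely cause, assuming the rows with missing descriptions are stored as NaN (a float), is that those rows take this path. If a plain function is still preferred over .str.extract, a corrected version could look like the sketch below. The name ROOMS_PATTERN and the choice to return a single int or None instead of a list are assumptions made for illustration.

```python
import re

# Same pattern as above, compiled once for reuse.
ROOMS_PATTERN = re.compile(r"(\d+)\s*rooms?\b")

def rooms_digit_extraction(line):
    """Return the number of rooms as an int, or None if it cannot be found."""
    # Non-string input (for example float NaN for a missing row) would make
    # the re functions raise TypeError, so handle it up front.
    if not isinstance(line, str):
        return None
    match = ROOMS_PATTERN.search(line)
    # Every path returns a value, so no local variable is left unassigned.
    return int(match.group(1)) if match else None

print(rooms_digit_extraction("Apartment · 121m² · 4 rooms · 2 parking lots"))
print(rooms_digit_extraction(float("nan")))
```

It can be applied with my_pandas_series.map(rooms_digit_extraction); the lambda wrapper is not needed.
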
Wiktor Stribiżew