
I have a dataframe:

df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})

Question: how can I flag the values that contain the same digit repeated 7 or more times in a row?

Expected output:

number          repeat
1111112357896   0
45226212354444  0
150000000064    1
5485329999999   1
4589622567431   0
Shubham Sharma

2 Answers

1

Use a regex with str.contains:

df1['repeat'] = df1['number'].str.contains(r'(\d)\1{6}').astype(int)

Regex:

(\d)     # match and capture a digit
\1{6}    # match the captured digit 6 more times

Output:


           number  repeat
0   1111112357896       0
1  45226212354444       0
2    150000000064       1
3   5485329999999       1
4   4589622567431       0
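
To see what the backreference actually matches, here is a minimal sketch using only the standard `re` module on the same sample strings. The names `RUN_OF_SEVEN` and `first_run` are made up for illustration and are not part of the answer.

```python
import re

# Same pattern as above: one captured digit followed by six copies of it.
RUN_OF_SEVEN = re.compile(r'(\d)\1{6}')

def first_run(s):
    """Return the first run of 7 identical digits found in s, or None."""
    m = RUN_OF_SEVEN.search(s)
    return m.group(0) if m else None

numbers = ['1111112357896', '45226212354444', '150000000064',
           '5485329999999', '4589622567431']

# 1 where a run of 7 identical digits exists, 0 otherwise
flags = [int(first_run(s) is not None) for s in numbers]
print(flags)  # [0, 0, 1, 1, 0]
```

Note that the first value has only six consecutive `1`s, which is why it is not flagged.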
mozway
  • *NB. You can get a `UserWarning` but it's safe to ignore.* – mozway Dec 14 '22 at 05:34
  • To remove that warning, one can use extract instead of contains: `np.where(df1['number'].str.extract(r'(\d)\1{6}').isnull(), 0, 1)` :) – Pawan Jain Dec 14 '22 at 05:36
  • @PawanJain yes but I dislike the idea of extracting something to discard it immediately after. `str.contains` should be more efficient. But it's a nice workaround. – mozway Dec 14 '22 at 05:39
  • Obviously, `contains` is far better for this. I just added that if someone is annoyed with warnings like me, xD – Pawan Jain Dec 14 '22 at 05:47
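
The warning discussed in these comments appears to be tied to the capture group in the pattern. As a possible alternative (not something proposed in the thread), a pattern without any capture group can be built by spelling out each digit. This is only a sketch: `GROUP_FREE` is a hypothetical name, and unlike `\d` it matches ASCII digits only.

```python
import re

# Hypothetical group-free pattern: "0{7}|1{7}|...|9{7}".
# It has no capture group, so there is no backreference either.
GROUP_FREE = "|".join(f"{d}{{7}}" for d in "0123456789")

compiled = re.compile(GROUP_FREE)
print(compiled.groups)  # number of capture groups in the pattern: 0

print(bool(compiled.search('150000000064')))    # True
print(bool(compiled.search('45226212354444')))  # False
```

Assuming the warning is only raised for patterns with capture groups, `df1['number'].str.contains(GROUP_FREE).astype(int)` should give the same flags without it.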
0

Here's an approach:

import pandas as pd

def find_repeats(numbers, cutoff=7):
    """Count the runs of at least `cutoff` identical consecutive characters."""
    repeated_numbers = []
    curr_n = None
    count = 0
    for n in str(numbers):
        if n == curr_n:
            # still inside the current run
            count += 1
            continue

        # the previous run just ended: record it if it was long enough
        if count >= cutoff:
            repeated_numbers.append(curr_n)
        curr_n = n
        count = 1

    # check the end of the string as well
    if count >= cutoff:
        repeated_numbers.append(curr_n)

    # number of qualifying runs (0 or 1 for the sample data)
    return len(repeated_numbers)

df1 = pd.DataFrame({'number': ['1111112357896', '45226212354444', '150000000064', '5485329999999', '4589622567431']})
df1['repeat'] = df1.number.apply(find_repeats)
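
The same run-length idea can also be sketched with `itertools.groupby` from the standard library, which groups consecutive equal characters. This is a variation rather than the code above: `has_long_run` is a hypothetical name, and it returns a 0/1 flag instead of the number of runs.

```python
from itertools import groupby

def has_long_run(number, cutoff=7):
    """Return 1 if any character repeats at least `cutoff` times in a row, else 0."""
    # groupby yields one group per run of consecutive equal characters
    return int(any(sum(1 for _ in group) >= cutoff
                   for _, group in groupby(str(number))))

numbers = ['1111112357896', '45226212354444', '150000000064',
           '5485329999999', '4589622567431']
flags = [has_long_run(s) for s in numbers]
print(flags)  # [0, 0, 1, 1, 0]
```

It can be applied the same way, e.g. `df1.number.apply(has_long_run)`.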
Jeff