0

Can anyone help me with this?

I need to find all words in a list that contain [t OR d] AND [k OR c] but none of [s, z, n, m].

I figured out the first part, but I don't know how to include the stop list:

\w*[t|d]\w*[k|c]\w*

in Python notation

Thank you in advance

AKarpun

7 Answers

2

You can do this in two steps: first find the words containing [t OR d] AND [k OR c], then filter out the matches that contain unwanted letters.

Since you said you figured out the first part, here is the second:

import re
# 'matches' is the list of words found in the first step
matches = [i for i in matches if not re.search(r'[sznm]', i)]
print(matches)
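
Putting the two steps together, a minimal sketch might look like the following. The word list is made up for illustration, and the first step is written here as two plain searches (an assumption, not the asker's pattern) so that the order of the letters does not matter.

```python
import re

# Hypothetical sample data, not from the question.
words = ["track", "stack", "dock", "kid", "tame", "cat"]

# Step 1: keep words containing at least one of t/d and at least one of k/c.
matches = [w for w in words if re.search(r"[td]", w) and re.search(r"[kc]", w)]

# Step 2: drop words containing any stop-list letter.
matches = [w for w in matches if not re.search(r"[sznm]", w)]

print(matches)
```

Here "stack" passes the first step but is dropped in the second because of its "s".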
user
1

If you need the t or d to appear before the k or c, use: [^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*

[^sznm\s\d] means any character except s, z, n, m, whitespace characters (\s), or digits (\d).
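
As a rough illustration of how this ordered pattern behaves, the sketch below applies it to a few made-up words. Testing each word with `fullmatch` is an assumption of this example, not part of the answer; it ensures the whole word has to satisfy the pattern.

```python
import re

# Pattern from the answer: [td] must come before [kc], and every
# character must be outside the stop list, whitespace, and digits.
ordered = re.compile(r"[^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*")

# Hypothetical sample words, chosen for illustration.
words = ["track", "duck", "kid", "stack", "dot"]

# fullmatch tests each word as a whole, so a word such as "stack"
# cannot match on just its "tack" tail.
ordered_matches = [w for w in words if ordered.fullmatch(w)]
print(ordered_matches)
```

Note that "kid" is rejected here because its k comes before its d, which is exactly the ordering restriction this pattern adds.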

Math
1
s = "foobar foo".split()

allowed = ({"k", "c"}, {"t", "d"})  # one letter from each set is required
forbid = {"s", "z", "n", "m"}  # stop list from the question

for word in s:
    if all(any(k in st for k in word) for st in allowed) and all(k not in forbid for k in word):
        print(word)

Or using a list comprehension with set.intersection:

words = [word for word in s if all(st.intersection(word) for st in allowed) and not forbid.intersection(word)]
Padraic Cunningham
1

Based on Padraic's answer.

EDIT: We both missed this condition:

[t OR d] AND [k OR c]

So I fixed it accordingly.

s = "detected dot knight track"

allowed = ({"t","d"},{"k","c"})
forbidden = {"s","z","n", "m"}

for word in s.split():
    letter_set = set(word)
    if all(letter_set & a for a in allowed) and letter_set - forbidden == letter_set:
        print(word)

The result is:

detected
track
Community
volcano
0

Use this code:

import re
re.findall('[abcdefghijklopqrtuvwxy]*[td][abcdefghijklopqrtuvwxy]*[kc][abcdefghijklopqrtuvwxy]*', text)
M.javid
0

I really like the answer by @padraic-cunningham, which does not use re, but here is a pattern that works:

pattern = r'\b(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w+'

The leading \b keeps a match from starting in the middle of a word (otherwise "stack" would yield "tack" when scanning running text). Positive (?=...) and negative (?!...) lookahead assertions are well documented on python.org.
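
A quick way to try the lookahead approach on running text is sketched below. The sample text is made up for illustration, and the pattern in this sketch begins with `\b` so that a match can only start at a word boundary.

```python
import re

# \b anchors the match at the start of a word; without it the scan could
# restart in the middle of a word, after a stop-list letter.
pattern = re.compile(r"\b(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w+")

# Hypothetical sample text, not from the question.
text = "detected dot knight track stack kid"

found = pattern.findall(text)
print(found)
```

Because the lookaheads only inspect the word without consuming it, the order of the required letters does not matter, so "kid" is accepted here.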

tommy.carstensen
0

You need to use lookarounds.

^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$

For example:

>>> import re
>>> l = ['fooktz', 'foocdm', 'foobar', 'kbard']
>>> [i for i in l if re.match(r'^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$', i)]
['kbard']
Avinash Raj