I have a file called file.txt with lines in the following format:
babies:n:baby
flies:n:fly
ladies:n:lady
sheep:n:sheep
furniture:n:furniture
luggages:n:luggage
etc.
Now, I need to extract only the common pattern between the first field (f1) and the third field (f3) of each line, and write it out in the format shown below.
Example: babies
Here, 'babies' and 'baby' share the common prefix 'bab', and 'ies' is the added ending. The same applies to the following words.
Format: <e lm="babies"><i>bab</i><par n="bab"/></e>
keerthana B.
1 Answer
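Your question is not clear; it would help if you could explain more.
However, I think you want to use a regex (regular expression). Here is a nice website for experimenting with regexes: https://regex101.com/
In Python you can use the re module (import re).
If you have a string like "babies:n:baby", you can extract the shared part with the regex: (\w+).*:n:(\1).*
This means:
(\w+) - match a sequence of word characters (letters, digits, underscore) and capture it as group 1
.* - then skip whatever remains of the first word
:n: - then match the literal :n:
(\1) - then match the same text that was captured by the first ()
.* - then skip the rest of the line
Because \w+ is greedy and backtracks until \1 can match, group 1 ends up as the longest prefix shared by both words ('bab' for "babies:n:baby").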
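Python sample:
For one search:
import re
pattern = r"(\w+).*:n:(\1).*"
word = "babies:n:baby"
result = re.search(pattern, word)
print(result.group(1))  # group 1 is the shared prefix: bab
And for many searches:
import re
pattern = r"(\w+).*:n:(\1).*"
text = "babies:n:baby\nflies:n:fly"
# findall returns one (group 1, group 2) tuple per matching line
result = re.findall(pattern, text)
print([first for first, _ in result])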
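If a regex is not required, the shared prefix can also be computed directly. The following is only a minimal sketch of how the requested output line could be built. It assumes every line has exactly three colon-separated fields, that the "common pattern" means the longest common prefix of the first and third fields, and that the par n attribute simply repeats that prefix as in the example from the question. The function names common_prefix and to_entry are made up for illustration.

```python
import os


def common_prefix(a: str, b: str) -> str:
    """Return the longest leading substring shared by a and b."""
    # os.path.commonprefix compares character by character,
    # so it works on arbitrary strings, not only on paths.
    return os.path.commonprefix([a, b])


def to_entry(line: str) -> str:
    """Turn 'babies:n:baby' into the <e>...</e> line from the question."""
    # Assumption: exactly three fields separated by ':'.
    surface, _tag, lemma = line.strip().split(":")
    stem = common_prefix(surface, lemma)
    return f'<e lm="{surface}"><i>{stem}</i><par n="{stem}"/></e>'


# Sample lines taken from the question; in practice they could be
# read from file.txt instead.
lines = ["babies:n:baby", "flies:n:fly", "sheep:n:sheep"]
entries = [to_entry(line) for line in lines]
for entry in entries:
    print(entry)
```

To process the real file, the sample list could be replaced by the lines read from file.txt, skipping empty lines before calling to_entry.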

ggcarmi
I am making the question clearer. For example, I have a paradigm like this: babies:n:baby; ladies:n:baby; fries:n:baby. I want to produce the following format:
bab; lad; fr. Similarly, I have multiple paradigms. I want a general rule to find the maximal match of words. – keerthana B. Mar 07 '19 at 05:47