
I have a file called file.txt with lines like:

babies:n:baby
flies:n:fly
ladies:n:lady
sheep:n:sheep
furniture:n:furniture
luggages:n:luggage
etc.

Now I need to extract only the common pattern between the first field (f1) and the third field (f3) and write it out in a fixed format. For example, in babies:n:baby the two words share the prefix 'bab', and 'ies' is the added ending; the same idea applies to the other words.

Format: <e lm="babies"><i>bab</i><par n="bab"/></e>

1 Answer


Your question is not clear; it would help if you could explain more.

But I think you want to use a regex (regular expression). Here is a nice website for experimenting with regexes: https://regex101.com/

In Python you can use the re module (import re).

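If you have a string like "babies:n:baby", you can extract the shared part with the regex (\w+).*:n:(\1).* which means:

(\w+) - capture a sequence of word characters (letters, digits, underscore)
.* - skip the rest of the first word
:n: - then match the literal :n:
(\1) - then match the same text that was captured by the first ()
.* - skip the rest of the line

The engine tries the longest candidate for (\w+) first and backtracks until the same text also starts the word after :n:, so for "babies:n:baby" the first group ends up as "bab".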

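Python sample:

For one search:

import re

word = "babies:n:baby"
pattern = r"(\w+).*:n:(\1).*"
result = re.search(pattern, word)
# group(1) is the shared prefix; group() would return the whole matched line
print(result.group(1))  # bab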

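And for many lines at once:

import re

text = "babies:n:baby\nflies:n:fly\nladies:n:lady"
pattern = r"(\w+).*:n:(\1).*"
# findall returns one tuple per match (one item per group), so keep the first item
result = [m[0] for m in re.findall(pattern, text)]
print(result)  # ['bab', 'fl', 'lad']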
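
To get from the shared prefix to the output format in the question, a minimal sketch could look like this. It assumes every line has the form surface:tag:lemma, it uses os.path.commonprefix (a character-by-character prefix function from the standard library) instead of the regex, and the function name to_entry is made up for illustration. The par name is simply set to the stem, as in the question's single example; whether that is the intended rule is a guess.

```python
import os.path

def to_entry(line):
    """Turn 'babies:n:baby' into '<e lm="babies"><i>bab</i><par n="bab"/></e>'."""
    # Assumed layout: surface form, tag, lemma, separated by colons
    surface, _tag, lemma = line.strip().split(":")
    # commonprefix compares character by character, so it works on plain words
    stem = os.path.commonprefix([surface, lemma])
    return f'<e lm="{surface}"><i>{stem}</i><par n="{stem}"/></e>'

lines = ["babies:n:baby", "flies:n:fly", "sheep:n:sheep"]
for line in lines:
    print(to_entry(line))
```

To process the real file, the list could be replaced by iterating over open("file.txt"). Words containing characters that are special in XML would need escaping, which this sketch leaves out.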
ggcarmi
  • I am making the question clearer. For example, I have a paradigm like this: babies:n:baby; ladies:n:baby; fries:n:baby. I want to produce the following format: bab; lad; fr. Similarly, I have multiple paradigms. I want a general rule to find the maximal match between words. – keerthana B. Mar 07 '19 at 05:47
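
One possible reading of the paradigm case in the comment is that the stems are what remains after removing the ending shared by all surface forms in the group. A rough sketch under that assumption follows; the function name split_paradigm is hypothetical, and the input is taken to be a list of surface forms that have already been grouped by paradigm.

```python
import os.path

def split_paradigm(words):
    """Return (stems, shared_ending) for the surface forms of one paradigm."""
    # The longest common suffix is the longest common prefix of the reversed words
    ending = os.path.commonprefix([w[::-1] for w in words])[::-1]
    # Cut the shared ending off each word; with an empty ending the word is kept whole
    stems = [w[: len(w) - len(ending)] for w in words]
    return stems, ending

stems, ending = split_paradigm(["babies", "ladies", "fries"])
print("; ".join(stems), "|", ending)
```

Reversing the words lets the same prefix helper find the common ending, so no separate suffix routine is needed.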