How to replace and insert a new substring in python?

Question

This is working, though probably not very efficient, code that replaces a substring with a modified version of it.

Input strings:

text = ["part1 Pirates (2006)",
        "part2 Pirates (2006)"
]

Output strings:

 Pirates PT1 (2006)

 Pirates PT2 (2006)

It has to replace substrings like 'part1', 'part2' and so on with 'PT' plus the same number, and move that between the title and the year. Code:

import re

#'''''''''''''''''''''''''
# are the parentheses balanced?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0


#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa) is True:
        return stringa[stringa.find("(")+1:stringa.find(")")]


#Start
for title in text:

    #Does the year exist? Try to get it ----------> '2006'
    yearStr = getYear(title)

    #Get integer next to 'part' substring --------> '1'
    intPartStr = re.findall(r'part(\d+)', title)

    #Delete 'part' substring ---------------------> ' Pirates (2006)'
    partStr = re.sub(r'part(\d+)', "", title)

    #Build a new string --------------------------> "PT1 (2006)"
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"

    #Update title with new string newStr ---------> " Pirates PT1 (2006)"
    result = re.sub(r'\(([0-9]+)\)', newStr, partStr)

    #End
    print(result)
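As a side note on `getYear`: when a title has no parentheses at all, `parenth` returns True and both `find` calls return -1, so the slice becomes `stringa[0:-1]` instead of a year. A minimal sketch of a regex-based alternative, assuming the year is always four digits inside parentheses (`get_year` is a hypothetical name, not part of the original code):

```python
import re

def get_year(title):
    # hypothetical replacement for getYear: four digits inside parentheses
    m = re.search(r'\((\d{4})\)', title)
    return m.group(1) if m else None

print(get_year("part1 Pirates (2006)"))  # 2006
print(get_year("part1 Pirates"))         # None
```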

But when the list looks like this:

text = ["pt1 Pirates (2006)",
        "part 2 Pirates (2006)"
]

I don't know how to extract the integer next to 'part', 'pt', 'part 2' and so on.
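One way to extract the number, sketched here as an assumption rather than taken from the answers below: make the prefix an alternation `(?:part|pt)` and allow optional whitespace with `\s*` before capturing the digits. The names `part_re` and `numbers` are only illustrative.

```python
import re

text = ["pt1 Pirates (2006)",
        "part 2 Pirates (2006)"
]

# 'part' or 'pt', optional whitespace, then capture one or more digits
part_re = re.compile(r'(?:part|pt)\s*(\d+)')

numbers = []
for title in text:
    m = part_re.match(title)  # match() only looks at the start of the string
    if m:
        numbers.append(m.group(1))

print(numbers)  # ['1', '2']
```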

EDIT:

I assumed this string had the same structure, but it doesn't, sorry.

How can I solve this?

"part 2 the day sports stood still (2021)"

`\w+` doesn't grab all the words of the title.
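`\w+` matches a single word, so it stops at the first space. A possible sketch, assuming the title is everything between the part number and a trailing four-digit year: capture the title with a lazy `.+?` and anchor the year with `$`. Named groups keep the pieces readable; `TITLE_RE` and `reformat_title` are hypothetical names.

```python
import re

# num: part number, title: one or more words (lazy), year: four digits at the end
TITLE_RE = re.compile(
    r'^(?:part|pt)\s*(?P<num>\d+)\s+(?P<title>.+?)\s*\((?P<year>\d{4})\)$'
)

def reformat_title(s):
    m = TITLE_RE.match(s)
    if m is None:
        return s  # leave strings that don't fit the pattern unchanged
    return f"{m.group('title')} PT{m.group('num')} ({m.group('year')})"

print(reformat_title("part 2 the day sports stood still (2021)"))
# the day sports stood still PT2 (2021)
```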

How about extracting the last 6 characters of the string, i.e. `strName[-6:]`, which gives you `(2006)`? – Ahmad Anis, Apr 03 '21 at 17:27
I mean the integer next to 'part', which in this case is 1 and 2. – Apr 03 '21 at 17:29
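For reference, slicing from the end only recovers the year, and only when the year in parentheses is always the last six characters; it does not help with the part number. A quick sketch:

```python
title = "part1 Pirates (2006)"

# the last six characters, parentheses included
year_with_parens = title[-6:]   # '(2006)'
# drop the parentheses: from index -5 up to (not including) -1
year = title[-5:-1]             # '2006'

print(year_with_parens, year)
```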

thibsc · Accepted Answer · 2021-04-03T20:15:07.380

You can do all the substitution at the same time:

import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]

pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

# if you want the result in a new list:
text_formatted = [re.sub(pattern, substitute, title) for title in text]

# or, to modify the original list in place:
for i, title in enumerate(text):
    text[i] = re.sub(pattern, substitute, title)

Regex explanation:

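(?:part|pt)\s?(\d+) matches 'part' or 'pt' without capturing it, then an optional space, and captures the number (group 1)
(\b[\w\s]+\b) captures the title, which may contain several words (group 2)
\((\d+)\) captures the year inside the parentheses (group 3)
'\2 PT\1 (\3)' rebuilds the string from the captured groups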
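The same pattern can be written with `re.VERBOSE` so each piece sits next to its explanation. This is only a reformatting of the pattern above, not a different regex; in verbose mode literal whitespace outside character classes is ignored, so spaces must be written as `\s`.

```python
import re

pattern = re.compile(r"""
    (?:part|pt)\s?(\d+)   # 'part' or 'pt', optional space, number -> group 1
    \s?
    (\b[\w\s]+\b)         # title, may contain spaces -> group 2
    \s?
    \((\d+)\)             # year inside parentheses -> group 3
""", re.VERBOSE)

substitute = r'\2 PT\1 (\3)'

titles = ["part 2 Pirates (2006)", "part 1 The day sports stood still (2021)"]
formatted = [pattern.sub(substitute, t) for t in titles]
print(formatted)
# ['Pirates PT2 (2006)', 'The day sports stood still PT1 (2021)']
```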

Good solution. The OP wants the list to be modified, so it would be better if you could expand the solution to address that. – Joe Ferndz, Apr 03 '21 at 17:47

marcos · score 2 · Answer 2 · 2021-04-03T21:35:41.320

You can simply do a replacement with groups. In the pattern below, group 1 is the 'part'/'pt' prefix, group 2 the part number, group 3 the movie name, and group 4 the year:

import re

movies = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "part 3 Pirates (2006)",
]


pattern = r"^([a-zA-Z]+\s?)(\d) (\w+) (\(\d{4}\))$"
replacement = r"\3 PT\2 \4"


replaced = [
    re.sub(pattern, replacement, movie) for movie in movies
]

print(replaced)
# ['Pirates PT1 (2006)', 'Pirates PT2 (2006)', 'Pirates PT3 (2006)']

edited Apr 03 '21 at 21:35

answered Apr 03 '21 at 17:34


Hi Marcos, your code is easy to read, but you missed this case: 'part 1 Pirates (2006)'. With the space it doesn't work. Maybe I need to add \s. – Apr 03 '21 at 18:08

Oh, right, my bad, I didn't see that. Check it now. – marcos Apr 03 '21 at 19:23
Yes, but I just realized that your solution and the other solutions don't work. I didn't notice that this string was a bit different: "part 2 the day sports stood still (2021)". Maybe it's a bit hard to solve. Please see the edited question. – Apr 03 '21 at 20:10
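A possible adaptation of the anchored pattern from this answer, offered as a sketch rather than the answerer's own fix: `\s*` makes the space after the prefix optional, `\d+` allows multi-digit part numbers, and a lazy `(.+?)` lets the movie name span several words.

```python
import re

movies = [
    "part1 Pirates (2006)",
    "part 3 Pirates (2006)",
    "part 2 the day sports stood still (2021)",
]

# group 1: prefix, group 2: part number, group 3: name (several words), group 4: year
pattern = r"^([a-zA-Z]+\s*)(\d+) (.+?) (\(\d{4}\))$"
replacement = r"\3 PT\2 \4"

replaced = [re.sub(pattern, replacement, movie) for movie in movies]
print(replaced)
# ['Pirates PT1 (2006)', 'Pirates PT3 (2006)', 'the day sports stood still PT2 (2021)']
```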

score 1 · Answer 3 · answered Apr 03 '21 at 17:31

Using Regex:

Ex:

import re


text = ["part1 Pirates (2006)", "part2 Pirates (2006)", "pt1 Pirates (2006)", "part 2 Pirates (2006)"]

ptrn = re.compile(r"(part|pt)\s*(\d+)")
for i in text:
    m = ptrn.match(i)
    if m:
        # print(m.group(2))   # Integer part.
        nstring = ptrn.sub(f"PT {m.group(2)}", i)
        print(nstring)
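The loop above extracts the number but places `PT <n>` at the front of the string. A sketch of one way to move it between the title and the year with the same compiled pattern, assuming the year is always the trailing ` (...)` part: cut off the matched prefix with `m.end()`, then split at the last ` (` using `str.rpartition`. The `results` list is only illustrative.

```python
import re

text = ["part1 Pirates (2006)", "part2 Pirates (2006)", "pt1 Pirates (2006)", "part 2 Pirates (2006)"]

ptrn = re.compile(r"(part|pt)\s*(\d+)")

results = []
for i in text:
    m = ptrn.match(i)
    if m:
        # drop the matched prefix and the space after it: 'Pirates (2006)'
        rest = i[m.end():].lstrip()
        # split at the last ' (' so the title itself may contain spaces
        name, sep, year = rest.rpartition(" (")
        results.append(f"{name} PT{m.group(2)}{sep}{year}")

print(results)
# ['Pirates PT1 (2006)', 'Pirates PT2 (2006)', 'Pirates PT1 (2006)', 'Pirates PT2 (2006)']
```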
