
This code works, though it is probably not very efficient. It replaces a substring with a modified version of itself and moves it to another position in the string.

Input strings:

text = ["part1 Pirates (2006)",
        "part2 Pirates (2006)"
]

Output strings:

 Pirates PT1 (2006)

 Pirates PT2 (2006)

It has to replace substrings like 'part1', 'part2', and so on with 'PT' plus the number, and place the result between the title and the year. Code:

import re

#'''''''''''''''''''''''''
# are the parentheses balanced?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0


#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa):
        return stringa[stringa.find("(")+1:stringa.find(")")]


#Start
for title in text:

    #Does the year exist ? try to Get it ---------> '2006'
    yearStr = getYear(title)

    #Get integer next to 'part' substring  -------> '1'
    intPartStr = re.findall(r'part(\d+)', title)

    #Delete 'part' Substring  --------------------> ' Pirates (2006)'
    partStr = re.sub(r'part(\d+)',"",title)

    #Build a new string  -------------------------> "PT1 (2006)"
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"

    #Update title with new String  newStr --------> " Pirates PT1 (2006)"
    result = re.sub(r'\(([0-9]+)\)',newStr,partStr)

    print (result)
#End

But when the list looks like this:

text = ["pt1 Pirates (2006)",
        "part 2 Pirates (2006)"
]

I don't know how to extract the integer that follows 'part', 'pt', or 'part ' (as in 'part 2'), and so on.

EDIT:

I assumed this string had the same format as the others, but it doesn't, sorry.

How can I handle it?

"part 2 the day sports stood still (2021)"

\w+ doesn't capture all the words in the title.

3 Answers


You can do all the substitution at the same time:

import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]

pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

for title in text:
    # re.sub returns a new string; the list itself is left unchanged
    print(re.sub(pattern, substitute, title))

# if you want the results in a new list:
text_formatted = [re.sub(pattern, substitute, title) for title in text]

Regex explanation:

  • (?:part|pt)\s?(\d+) matches 'part' or 'pt' without capturing it, then captures the part number (group 1)
  • (\b[\w\s]+\b) captures the title (group 2)
  • \((\d+)\) captures the year inside the parentheses (group 3)
  • '\2 PT\1 (\3)' rebuilds the string from the numbered groups
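The same pattern can also be written with named groups and re.VERBOSE, which may be easier to read alongside the explanation above. This is only a sketch: the group names (part, title, year) and the helper name move_part are made up for illustration, and the slice assignment at the end is one possible way to update the existing list rather than build a new one.

```python
import re

# Same structure as the pattern above, spelled out with named groups.
# The group names and the helper name are illustrative assumptions.
PATTERN = re.compile(
    r"""
    (?:part|pt)\s?(?P<part>\d+)   # prefix (not captured), then the part number
    \s?(?P<title>\b[\w\s]+\b)     # the title: words separated by whitespace
    \s?\((?P<year>\d+)\)          # the year inside parentheses
    """,
    re.VERBOSE,
)


def move_part(title):
    """Return the title with PT<n> placed between the name and the year."""
    return PATTERN.sub(r"\g<title> PT\g<part> (\g<year>)", title)


text = [
    "part1 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)",
]

# Slice assignment replaces the contents of the existing list object.
text[:] = [move_part(title) for title in text]
print(text)
```

Titles that do not match the pattern are returned unchanged, because re.sub only rewrites what it matches.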
thibsc
  • Good solution. The OP wants the list itself to be modified, so it would be better if you could expand the solution to cover that. – Joe Ferndz Apr 03 '21 at 17:47

You can do the replacement with capture groups: the 'part' prefix goes in group 1, the part number in group 2, the movie name in group 3, and the year in group 4. For example:

import re

movies = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "part 3 Pirates (2006)",
]


pattern = r"^([a-zA-Z]+\s?)(\d) (\w+) (\(\d{4}\))$"
replacement = r"\3 PT\2 \4"


replaced = [
    re.sub(pattern, replacement, movie) for movie in movies
]

print(replaced)
# Output: ['Pirates PT1 (2006)', 'Pirates PT2 (2006)', 'Pirates PT3 (2006)']
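The pattern above expects a single-word title, because \w+ cannot match the spaces in a title such as "part 2 the day sports stood still (2021)". One possible adjustment, shown here only as a sketch, is to capture the title lazily with .+? so that it stops right before the year. The helper name to_pt, the (?:part|pt) prefix, and the case-insensitive flag are assumptions and not part of the answer.

```python
import re

# Assumed variant of the pattern: 'part' or 'pt', optional spaces, one or more
# digits, then a lazily matched title that ends just before the "(YYYY)" year.
pattern = re.compile(r"^(?:part|pt)\s*(\d+)\s+(.+?)\s*(\(\d{4}\))$", re.IGNORECASE)


def to_pt(movie):
    """Move the part number between the title and the year."""
    return pattern.sub(r"\2 PT\1 \3", movie)


movies = [
    "part1 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 the day sports stood still (2021)",
]

for movie in movies:
    print(to_pt(movie))
```

Because the pattern is anchored with ^ and $, strings that do not follow the "prefix, number, title, (year)" layout are left as they are.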
marcos
  • Hi Marcos, your code is easy to read, but you forgot this case: 'part 1 Pirates (2006)'. With the space it doesn't work. Maybe I need to add \s –  Apr 03 '21 at 18:08
  • Oh, right, my bad, I didn't see that. Check it now. – marcos Apr 03 '21 at 19:23
  • Yes, but I just realized that neither your solution nor the others work. I didn't notice that this string was a bit different: "part 2 the day sports stood still (2021)". Maybe it's a bit hard to solve. Please see the edited question. –  Apr 03 '21 at 20:10

Using Regex:

Ex:

import re


text = ["part1 Pirates (2006)", "part2 Pirates (2006)", "pt1 Pirates (2006)","part 2 Pirates (2006)" ]

ptrn = re.compile(r"(part|pt)\s*(\d+)")
for i in text:
  m = ptrn.match(i)
  if m:
    # print(m.group(2))   # Integer part. 
    nstring = ptrn.sub(f"PT {m.group(2)}", i)
    print(nstring)
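If only the number is needed, which is what the question asks about, the second group of the same kind of pattern can be read directly. A small sketch follows. The helper name part_number is hypothetical, returning None for titles without a part marker is an assumption about the desired behavior, and the leading \b and the case-insensitive flag are additions that the answer does not use.

```python
import re

# \b keeps 'pt' from matching in the middle of another word (an added assumption).
ptrn = re.compile(r"\b(part|pt)\s*(\d+)", re.IGNORECASE)


def part_number(title):
    """Return the integer after 'part'/'pt', or None if there is no marker."""
    m = ptrn.search(title)
    return int(m.group(2)) if m else None


titles = ["pt1 Pirates (2006)", "part 2 Pirates (2006)", "Pirates (2006)"]
for t in titles:
    print(t, "->", part_number(t))
```

Using search instead of match lets the marker appear anywhere in the string, not only at the start.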
Rakesh