
I am new to Python and not familiar with regex patterns. I am using the re package to extract a particular piece of text in my code, but it doesn't seem to work. Please help!


import re

text = '<pre><a href="1.sh">1.sh'

filename = re.match(r'\D+="[*]"\D', text)

print(text)
print(filename)

output:

<pre><a href="1.sh">1.sh
None

I am expecting the filename '1.sh'. It can be taken either from the text within the double quotes or from the text after '>'.

In my scenario the filename varies: it may be filename.txt, number.ps1, or number.sh.
Jack
  • Sorry, I forgot to add my expected output. I am expecting the filename, 1.sh. The filename also varies in my scenario: it may be *.txt or text.ps1; it is not always number.sh. – Jack Aug 15 '20 at 05:59
  • Please edit the question to show the expected output, rather than (only) putting it in the comments. – alani Aug 15 '20 at 06:00
  • Also please confirm that what you are trying to match is the link target rather than the link text (as they are both the same here). – alani Aug 15 '20 at 06:03
  • @alaniwi Sorry, I don't see an edit option.. I am still searching... – Jack Aug 15 '20 at 06:06
  • Under the question you should see some links (share, edit, follow, flag). – alani Aug 15 '20 at 06:08
  • @alaniwi done, Thanks – Jack Aug 15 '20 at 06:39

2 Answers

import re

text = '<pre><a href="1.sh">1.sh'

filename = re.search(r'(?<=href=")[^"]+', text).group()

print(text)
print(filename)

Output:

<pre><a href="1.sh">1.sh
1.sh
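
As a side note, re.match only matches at the very start of the string, and [*] matches a literal asterisk, which is likely why the pattern in the question returned None. The snippet above also raises AttributeError when nothing matches, because re.search returns None in that case. A minimal sketch of a None-safe variant follows; the helper name extract_filename and the two compiled pattern names are hypothetical, and the fallback to the text after the last '>' is an assumption based on the question's wording.

```python
import re

# Hypothetical names for illustration; not part of the original answer.
HREF_RE = re.compile(r'href="([^"]+)"')    # text inside href="..."
LINK_TEXT_RE = re.compile(r'>([^<>]+)$')   # text after the last '>'


def extract_filename(text):
    """Return the href target, else the trailing link text, else None."""
    match = HREF_RE.search(text) or LINK_TEXT_RE.search(text)
    return match.group(1) if match else None


print(extract_filename('<pre><a href="1.sh">1.sh'))
print(extract_filename('<pre><a href="notes.txt">notes.txt'))
print(extract_filename('no link here'))
```

Using a capture group instead of a lookbehind keeps the pattern easy to extend, and checking the match before calling group() avoids the exception on input without a link.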
Vishal Singh

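Try this:

import re

text = '<pre><a href="1.sh">1.sh'
# Inner sub drops the trailing '>link text'; outer sub drops everything up to 'href='.
tag = re.sub(r'>[^>]*$', '', text)
filename = re.sub(r'^.*<\s*a\s+.*href\s*=\s*', '', tag).strip('"')
print(filename)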
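
If the input is real HTML rather than a single known line, a parser may be more robust than nested substitutions. This is a different technique from the one above, shown only as a rough sketch using the standard library's html.parser module; the class name HrefCollector and the function collect_hrefs are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> start tag (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_hrefs(text):
    parser = HrefCollector()
    parser.feed(text)
    parser.close()
    return parser.hrefs


print(collect_hrefs('<pre><a href="1.sh">1.sh'))
```

The result is a list, so a page with several links yields every filename in document order.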
Kuldip Chaudhari