
I would like to retrieve the file names of the images (`img` tags) in a string containing HTML code.

Example below: a string of HTML containing two `img` tags. I need the file names of the images, not the full URLs.

<p>One two three four</p>

<img src="http://localhost:5000/uploads/360e2b55a984178fd102a6cff9d70bc943936461.jpg" 
style="width: 300px; display: block; vertical-align: top; margin: 5px auto; 
text-align: center;">

<p>Five six seven</p>

<img src="http://localhost:5000/uploads/a77381fa354a067ed128bc8fe5cdfc8f85aaedea.jpg" 
style="width: 300px; display: block; vertical-align: top; margin: 5px auto; 
text-align: center;">

<p>Eight nine ten</p>

Maybe this is feasible with a regular expression, but I'm not an expert.

Wiktor Stribiżew
Bronzato
  • Perhaps it would be better to use an HTML parser. – The fourth bird Jun 15 '19 at 13:10
  • Why is this marked as C#? Please clarify why that is the case here. Without that tag, i.e. with JavaScript, retrieving the `src` attribute would be a simple task. – Mark Schultheiss Jun 15 '19 at 13:23
  • This is marked as C# because I need to write this in C#. Sorry, I should have clarified that. – Bronzato Jun 15 '19 at 13:32
  • I provided a somewhat generic answer here, since it is not clear how this HTML is obtained or stored, which may affect what the best answer is. – Mark Schultheiss Jun 15 '19 at 13:45
  • Note that your HTML for the `img` elements also appears to be invalid, as "An img element must have an alt attribute, except under certain conditions." Be sure you meet those conditions. The specification explains them here: https://www.w3.org/TR/2012/WD-html-markup-20120329/img.html – Mark Schultheiss Jun 15 '19 at 13:55

4 Answers


Instead of using a regex, I recommend the HTML Agility Pack: https://html-agility-pack.net/
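A minimal sketch of that approach in C#, assuming the HtmlAgilityPack NuGet package is referenced; the class name `ImgFileNamesHap` is hypothetical. It selects every `img` element that has a `src` attribute with an XPath query, then keeps only the last path segment of each URL:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper class name, for illustration only.
public static class ImgFileNamesHap
{
    // Returns the file name (last URL path segment) of every <img src="..."> in the HTML.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
        if (imgs == null)
            return names;

        foreach (HtmlNode img in imgs)
        {
            string src = img.GetAttributeValue("src", string.Empty);

            // For absolute URLs, use only the path so a query string such as "?v=2" is dropped.
            string path = Uri.TryCreate(src, UriKind.Absolute, out Uri uri) ? uri.AbsolutePath : src;

            // Path.GetFileName is a pure string operation; it does not touch the file system.
            names.Add(Path.GetFileName(path));
        }
        return names;
    }
}
```

Because the parser builds a DOM, a tag split over several lines (as in the question) or a different attribute order needs no special handling.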

Jake Steffen

Use this question to get the `src`: "Regular Expression to get the SRC of images in C#". Then use this one to get the base name: "new FileInfo(path).Name versus Path.GetFileName(path)".

So it is "kind of a duplicate", but you need to combine the code from both questions to accomplish what you want. In general, it is not the best idea to parse HTML with a regex: HTML can be constructed and placed on a page in so many different ways that you should test your pattern against every variant you expect.
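A minimal sketch that combines the two linked ideas, assuming the `src` values are quoted; the class name `ImgSrcFileNames` and the exact pattern are illustrative, not copied from either linked question. A regex captures each `src` value, and `Path.GetFileName` then keeps only the part after the last `/`. `Path.GetFileName` is used here instead of `new FileInfo(path).Name` because it is a pure string operation, whereas `FileInfo` is designed for file-system paths:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper class name, for illustration only.
public static class ImgSrcFileNames
{
    // Captures the quoted value of a src attribute inside an <img ...> tag.
    // [^>]*? may cross line breaks, so a tag split over several lines still matches.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var names = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            string src = m.Groups[1].Value;

            // Keep only the path of absolute URLs, so a query string like "?v=2" is dropped.
            string path = Uri.TryCreate(src, UriKind.Absolute, out Uri uri) ? uri.AbsolutePath : src;

            names.Add(Path.GetFileName(path));
        }
        return names;
    }
}
```

On the sample HTML from the question this should yield `360e2b55a984178fd102a6cff9d70bc943936461.jpg` and `a77381fa354a067ed128bc8fe5cdfc8f85aaedea.jpg`. As the answer warns, a regex still cannot tell a real tag from, say, markup inside an HTML comment or a script string.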

Mark Schultheiss

You can use the following regex, which captures the file name (with its extension) of an image element:

<img\s+.*?src=['\"]?.*\/(.*?\..{3,4})['\"]?

You can also check its matches at the following link.
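A sketch of using a single pattern of this kind from C#. The pattern in the code is an adjusted variant, not the one above: in the original, `.{3,4}` is greedy, so on the question's sample it also captures the closing quote (`...461.jpg"`), and the greedy `.*` lets one match span several tags when the HTML is on a single line. The variant keeps the match inside the `src` value by excluding quotes, whitespace and `>`, and limits the extension to 3 or 4 word characters; the class name `ImgFileNameRegex` is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class name, for illustration only.
public static class ImgFileNameRegex
{
    // Adjusted variant of the answer's pattern (an assumption, not the original):
    //   [^'"\s>]*/   consumes the src value up to the last '/' before the file name
    //   group 1      captures the file name, ending in a 3-4 character extension
    private static readonly Regex Pattern = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*['""]?[^'""\s>]*/([^/'""\s>]+\.\w{3,4})",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var names = new List<string>();
        foreach (Match m in Pattern.Matches(html))
            names.Add(m.Groups[1].Value);
        return names;
    }
}
```

Like the original, it requires at least one `/` before the file name, so a bare `src="photo.jpg"` is not matched.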

Ahmed Yousif

You can use this regex pattern: ([^\/]+jpg)

You can test it at this link.
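A minimal sketch of applying this pattern from C# with `Regex.Matches`; the class name `JpgNameRegex` is hypothetical. On the question's sample it returns the two file names, but it is case-sensitive, only finds names ending in `jpg`, and because `[^\/]` also matches spaces, quotes and line breaks it can capture ordinary text: on `<p>see cat.jpg</p>` it returns `<p>see cat.jpg`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class name, for illustration only.
public static class JpgNameRegex
{
    // The pattern from the answer: a run of non-'/' characters ending in "jpg".
    private static readonly Regex Pattern = new Regex(@"([^\/]+jpg)");

    public static List<string> Extract(string html)
    {
        var names = new List<string>();
        foreach (Match m in Pattern.Matches(html))
            names.Add(m.Groups[1].Value);
        return names;
    }
}
```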