I have an HTML file that contains the following somewhere in the middle:

    <span dir="ltr">http:(...).com</span>

I'm trying to extract the URL, but am having trouble doing so. Because that "ltr" is the only one in the HTML, I came up with this regex:

    (?<=ltr">)(.*)(?=<\/span>)

Using regex101, I confirmed that the regex works. However, I suspect that the way Ansible handles single and double quotes is causing problems.

I'm trying it like this:

    - set_fact:
       regex_test: " {{ htmlres.content | regex_search('(?<=ltr">)(.*)(?=<\/span>)') }}"  

Here htmlres.content is the HTML content returned by an HTTP GET request made earlier in the same playbook. However, running it fails with an error that points into the pattern:

    - set_fact:
       regex_pubdest: " {{ htmlres.content | regex_search('(?<=ltr">)(.*)(?=<\/span>)' }}"
                                                                    ^ here

Is there any way to get around this problem with quotes inside a regex in Ansible? I've managed to get the desired output by doing something slightly different:

    shell: grep -oP 'ltr">\K.*?(?=</span>)' /dir/htmlcontent.txt

The problem is that this only works when reading from a file, and I'm trying to avoid saving the HTML content to a file before running a regex over it. I've tried replacing the file path in the grep command with "{{html.content}}", but unfortunately that stops Ansible from running correctly because of the quotes.
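One direction that might avoid the temporary file: as far as I know, the shell module accepts a `stdin` parameter, so grep could read the content from standard input instead of from a path. An untested sketch, where `grep_result` is just a name I made up and `htmlres` is the registered result of the earlier request:

```yaml
# Sketch only: feed the registered HTML to grep on stdin, so no file is needed.
- name: Extract the URL with grep, reading the HTML from stdin
  shell: grep -oP 'ltr">\K.*?(?=</span>)'
  args:
    stdin: "{{ htmlres.content }}"
  register: grep_result  # hypothetical variable name

- name: Show what grep matched
  debug:
    var: grep_result.stdout
```

With no file argument, grep reads from stdin, so the quotes in the HTML never pass through the command line.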

Any ideas?
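My guess is that the `"` inside the pattern ends the YAML double-quoted string early, before Jinja ever sees the expression. A minimal sketch of two possible workarounds, assuming that a folded block scalar (`>-`) or a single-quoted YAML string leaves the inner `"` alone, and that Python's `re` module (which `regex_search` uses) accepts `</span>` without escaping the slash. The name `url_pattern` is hypothetical, and I made `.*` non-greedy so the match stops at the first `</span>`:

```yaml
# Option 1: a folded block scalar, so YAML does no quote or escape processing.
- set_fact:
    regex_test: >-
      {{ htmlres.content | regex_search('(?<=ltr">).*?(?=</span>)') }}

# Option 2: keep the pattern in a single-quoted YAML string and pass it by name.
- set_fact:
    regex_test: "{{ htmlres.content | regex_search(url_pattern) }}"
  vars:
    url_pattern: '(?<=ltr">).*?(?=</span>)'  # hypothetical variable name
```

Both variants avoid backslashes entirely, which sidesteps the question of how YAML and Jinja each treat escape sequences.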

Thank you!

Ress
