I am on Windows and I am using the "Git for Windows" tools in batch files. The code I extracted from the HTML page looks like this:
<a xmlns="http://www.w3.org/2000/svg" class="ZLl54 Dysyo" href="./g/git-for-windows/c/jgZ6P7bo7Fo"><div class="t17a0d"><span class="o1DPKc">[ANNOUNCE] Git for Windows 2.41.0</span></div><div class="WzoK">Dear Git users, I hereby announce that Git for Windows 2.41.0 is available from: https://</div></a>
and I want to extract /g/git-for-windows/c/jgZ6P7bo7Fo with sed or awk. The first part, /g/git-for-windows/c/, is always the same, but the end of the URL differs.
What I did:
sed 's/^.*\("./g/".*"><div\").*$/\1/' text.txt | tee text2.txt
but it doesn't work.
What I want: I want to extract the topmost (always the latest) link to a new release of "Git for Windows" from the website https://groups.google.com/g/git-for-windows. The description of such a post contains "Announce". Here are my steps:
xidel https://groups.google.com/g/git-for-windows --printed-node-format html -e "//'Links:',//a" | tee text.txt
to get the website as text.
Then I used
cat text.txt | grep -F "announce" | head -1 | tee text1.txt
to keep only the first line that mentions an announcement.
The result is the extracted code I posted above.
My questions: How do I use sed or awk correctly to extract the link /g/git-for-windows/c/jgZ6P7bo7Fo from the code? Or how can I use xidel in a better way, so that the results in the text file are easier to extract?
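For illustration, the extraction asked about could be sketched as follows. This is only a minimal sketch, assuming a POSIX shell such as the Git Bash that ships with Git for Windows (quoting would differ inside a cmd batch file) and assuming the href always starts with ./g/git-for-windows/c/. The variable names line and link and the shortened sample line are made up for the example.

```shell
# Sample input: one line shaped like the grep output above (shortened for the example).
line='<a class="ZLl54 Dysyo" href="./g/git-for-windows/c/jgZ6P7bo7Fo"><div class="t17a0d"><span class="o1DPKc">[ANNOUNCE] Git for Windows 2.41.0</span></div></a>'

# -n suppresses the default output; the trailing p prints only lines where the substitution matched.
# "#" is used as the s-command delimiter so the slashes in the path need no escaping.
# \. matches the literal leading dot of the href, which is left out of the captured group.
# [^"]* stops at the closing quote of the href attribute.
link=$(printf '%s\n' "$line" | sed -n 's#.*href="\.\(/g/git-for-windows/c/[^"]*\)".*#\1#p')

printf '%s\n' "$link"
```

With a file instead of a variable, the same sed expression would be given the file name (for example text1.txt) as its last argument.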
Thank you for your help.