
I want to extract a specific string from a web page using wget.

The string I want is the air pressure: the numeric value followed by hPa.

See: https://www.foreca.de/Deutschland/Berlin/Berlin

I can't find useful information on how to extract a specific string.

This is what I've tried so far:

#!/bin/bash

LAST=$(wget -l1 https://www.foreca.de/Deutschland/Berlin/Berlin -O - | sed -e 'hPa')
echo $LAST

Thanks for helping me out.

David
  • Suggestion: 1) add a few lines that contain `hPa` to the question and show the expected output for those lines; 2) you'll most likely be better off using an XML parser instead of trying to solve this with a regex. – Sundeep Feb 23 '18 at 14:27
  • Post the final expected value of `$LAST`. – RomanPerekhrest Feb 23 '18 at 14:27

1 Answer


A fully fledged solution using an XPath query with saxon-lint:

Command :

$ saxon-lint --html --xpath '//div[contains(text(), "hPa")]/text()' \
    'https://www.foreca.de/Deutschland/Berlin/Berlin'

Output :

1026 hPa

Notes :

If what I wrote bores you and you just want a quick and dirty command, even if it's evil, then use:

$ curl -s https://www.foreca.de/Deutschland/Berlin/Berlin | grep -oP '\d+\s+hPa'
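
To get the value into the `LAST` variable from the question, the quick approach can be wrapped in a small script. This is only a minimal sketch, not a robust scraper: `extract_pressure` is a hypothetical helper name, the sample markup is an assumption about what the page contains, and it uses `grep -oE` with a POSIX character class instead of `-P` so that it doesn't depend on PCRE support in grep.

```shell
#!/bin/bash

# Hypothetical helper: read HTML on stdin and print the first
# "<digits> hPa" match, or nothing if there is no match.
extract_pressure() {
    grep -oE '[0-9]+[[:space:]]+hPa' | head -n 1
}

# Sample markup (an assumption; the real page layout may differ).
sample='<div class="right">1026 hPa</div>'

LAST=$(printf '%s\n' "$sample" | extract_pressure)
echo "$LAST"

# Against the live page, the fetch step would look like this
# (-q silences wget's progress output, -O - writes the page to stdout):
# LAST=$(wget -q -O - 'https://www.foreca.de/Deutschland/Berlin/Berlin' | extract_pressure)
```

Quoting `"$LAST"` in the `echo` keeps the whitespace inside the match intact. Like any regex over HTML, this breaks as soon as the site changes how it prints the value, which is why the XPath query is the sturdier choice.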

Gilles Quénot