Bash wget filter specific word

Question

I want to extract a specific string from a web page that I download with wget.

The string I want is hPa together with its value.

See: https://www.foreca.de/Deutschland/Berlin/Berlin

I can't find useful information on how to extract a specific string.

This is what I've tried so far:

#!/bin/bash

LAST=$(wget -l1 https://www.foreca.de/Deutschland/Berlin/Berlin -O - | sed -e 'hPa')
echo $LAST

Thanks for helping me out.

Suggestion: 1) add a few lines that contain `hPa` to the question and show the expected output for those lines... 2) most likely you'll be better off using an XML parser instead of trying to solve this with regex – Sundeep, Feb 23 '18 at 14:27

Gilles Quénot · Accepted Answer · 2018-02-23T17:21:51.247

A fully fledged solution using XPath:

Command :

$ saxon-lint --html --xpath '//div[contains(text(), "hPa")]/text()' \
    'https://www.foreca.de/Deutschland/Berlin/Berlin'

Output :

1026 hPa

Notes :

Don't parse HTML with regex; use a proper XML/HTML parser, as done here. See: Using regular expressions with HTML tags
saxon-lint is my own project: https://github.com/sputnick-dev/saxon-lint

If what I wrote bores you and you just want a quick and dirty command, even if it's evil, then use:

$ curl -s https://www.foreca.de/Deutschland/Berlin/Berlin | grep -oP '\d+\s+hPa'
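The quick regex route can also be wrapped so that the value ends up in a variable, which is what the script in the question attempts. The following is only a minimal sketch: `extract_hpa` is a hypothetical helper name, the sample markup is made up to stand in for the downloaded page, and the pattern assumes the value appears as digits followed by whitespace and `hPa`. It uses `grep -E` rather than `-P` so that it does not depend on PCRE support.

```shell
#!/bin/bash
# Print the first "<digits> hPa" reading found on standard input.
# extract_hpa is a hypothetical helper name, not part of wget or grep.
extract_hpa() {
    # -o prints only the matched text, -E enables extended regex;
    # head -n 1 keeps the first match if the input contains several.
    grep -oE '[0-9]+[[:space:]]+hPa' | head -n 1
}

# Made-up sample markup standing in for the downloaded page.
sample='<div class="x">Luftdruck</div><div>1026 hPa</div>'

LAST=$(printf '%s\n' "$sample" | extract_hpa)
echo "$LAST"
```

Against the live page, the same function could be fed from wget, for example LAST=$(wget -qO- 'https://www.foreca.de/Deutschland/Berlin/Berlin' | extract_hpa). Whether the page still carries the value in this form is an assumption, and the same caveat about regex on HTML applies.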

edited Feb 23 '18 at 17:21

answered Feb 23 '18 at 14:42

This is not as reliable as the other one, but it's the OP's (your) choice ;) – Gilles Quénot Feb 23 '18 at 17:08
