
I've been searching for a long time and have not been able to find a working answer to my problem.

I have a line from an HTML file extracted with sed '162!d' skinlist.html, which contains the text

<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">.

I want to extract the text Dwarf Red Beard, but that text varies from line to line, so I would like to extract whatever appears between title=" and the next ".

I cannot, for the life of me, figure out how to do this.

BenMorel
JonaK

5 Answers

awk 'NR==162 {print $4}' FS='"' skinlist.html
  • set the field separator to "
  • act only on line 162
  • print the fourth field
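To see why the title lands in field 4, the sketch below splits the sample line from the question on " and prints every field with its number. It feeds the line in with printf instead of reading skinlist.html, purely for the demo, and uses -F'"', which sets the same separator as the answer's FS='"' operand.

```shell
#!/bin/sh
# Sample line from the question, fed directly instead of reading skinlist.html
line='<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">'

# Split on double quotes and print each field with its number
printf '%s\n' "$line" | awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'

# Field 4 is the text between title=" and the closing quote
title=$(printf '%s\n' "$line" | awk -F'"' '{ print $4 }')
printf '%s\n' "$title"
```

The field listing shows that field 3 is " title=" and field 4 is the title itself, so this relies on title being the second quoted attribute on the line.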
Zombo

Solution in sed

sed -n '162 s/^.*title="\(.*\)".*$/\1/p' skinlist.html

This acts only on line 162 of skinlist.html and captures the contents of the title attribute in \1.
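The \(.*\) capture is greedy, which works for the line in the question because the title is the last quoted value. A minimal sketch, assuming a hypothetical line where another attribute (class="skin") follows the title, compares it with a \([^"]*\) variant that cannot run past a quote:

```shell
#!/bin/sh
# Hypothetical line in which another quoted attribute follows title="..."
line='<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard" class="skin">'

# Greedy: \(.*\) runs on to the last quote on the line
greedy=$(printf '%s\n' "$line" | sed -n 's/^.*title="\(.*\)".*$/\1/p')

# [^"]* cannot cross a quote, so the capture stops at the end of the title
strict=$(printf '%s\n' "$line" | sed -n 's/^.*title="\([^"]*\)".*$/\1/p')

printf 'greedy: %s\n' "$greedy"
printf 'strict: %s\n' "$strict"
```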

koola

The shell's variable expansion syntax allows you to trim prefixes and suffixes from a string:

line="$(sed '162!d' skinlist.html)"   # extract the relevant line from the file
temp="${line#* title=\"}"    # remove from the beginning through the first match of ' title="'
if [ "$temp" = "$line" ]; then
    echo "title not found in '$line'" >&2
else
    title="${temp%%\"*}"   # remove from the first '"' through the end
fi
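The same trimming can be wrapped in a small function so it is reusable for any line. This is a sketch; the name extract_title is hypothetical, and the assignments are left unquoted so the \" inside each pattern is read as a literal quote in both bash and dash.

```shell
#!/bin/sh
# Hypothetical helper wrapping the parameter-expansion approach.
# Prints the title, or returns 1 if the line has no ' title="' attribute.
extract_title() {
    input=$1
    rest=${input#* title=\"}       # remove through the first ' title="'
    if [ "$rest" = "$input" ]; then
        return 1                   # pattern not found, nothing was removed
    fi
    rest=${rest%%\"*}              # remove from the first remaining '"' to the end
    printf '%s\n' "$rest"
}

extract_title '<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">'
```

With the real file it would be called as extract_title "$(sed '162!d' skinlist.html)".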
Gordon Davisson

You can pipe the output through another sed, or add expressions to the same sed, such as -e 's/.*title="//g' -e 's/">.*$//g'
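Combined into a single sed call, the line selection and both substitutions look like the sketch below. For the demo it writes the sample line to a temporary file (assuming mktemp is available), so the line address is 1 here; against the real skinlist.html it would be 162.

```shell
#!/bin/sh
# Demo file holding only the sample line, so the line address is 1, not 162
file=$(mktemp)
printf '%s\n' '<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">' > "$file"

# Delete every other line, then strip everything up to title=" and from "> onward
title=$(sed -e '1!d' -e 's/.*title="//g' -e 's/">.*$//g' "$file")
printf '%s\n' "$title"

rm -f "$file"
```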

abasu

Also with sed:

sed -n '162 s/.*"\([a-zA-Z ]*\)"./\1/p' skinlist.html
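Because the question says the title text changes, it is worth noting that the capture only accepts letters and spaces. A sketch using the same expression on the sample line and on a hypothetical title containing a hyphen and a digit shows that the second line produces no output at all:

```shell
#!/bin/sh
# The sample line from the question: [a-zA-Z ]* covers the whole title
good='<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red Beard">'
# Hypothetical title containing characters outside [a-zA-Z ]
bad='<a href="/skin/dwarf-red-beard-734/" title="Dwarf Red-Beard 2">'

good_out=$(printf '%s\n' "$good" | sed -n 's/.*"\([a-zA-Z ]*\)"./\1/p')
bad_out=$(printf '%s\n' "$bad" | sed -n 's/.*"\([a-zA-Z ]*\)"./\1/p')

printf 'good: [%s]\n' "$good_out"
printf 'bad:  [%s]\n' "$bad_out"   # empty: the substitution never matched
```

Widening the class (for example to [^"]*) would accept such titles.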
Endoro