How to get simple text from HTML page with goquery?

Question

I am new to Go. I am using goquery to extract data from an HTML page. The problem is that the data I am looking for is not enclosed in any HTML tag of its own; it is plain text after a <br> tag. How can I extract it?

Edit: Here is the HTML code.

<div class="container">
    <div class="row">
      <div class="col-lg-8">
        <p align="justify"><b>Name</b>Priyanka</p>
        <p align="justify"><b>Surname</b>Patil</p>
        <p align="justify"><b>Adress</b><br>India,Kolhapur</p>
        <p align="justify"><b>Hobbies&nbsp;</b><br>Playing</p>
        <p align="justify"><b>Eduction</b><br>12th</p>
        <p align="justify"><b>School</b><br>New Highschool</p>
       </div>
    </div>
</div>

From this I want "Priyanka" and "12th".

When it is not to be taken (Contraindications):
Contraindicated in patients with severe liver impairment, and hypersensitivity. — Priyanka, Jul 20 '15 at 10:48

siongui · Answer 1 · 2016-04-17T14:04:12.363

3

The following is what you want:

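// Visit every <p align="justify"> inside the container.
doc.Find(".container").Find("[align=\"justify\"]").Each(func(_ int, s *goquery.Selection) {
    // The bold label, e.g. "Name" or "Eduction".
    prefix := s.Find("b").Text()
    // s.Text() is the label followed by the bare text, so strip the label.
    result := strings.TrimPrefix(s.Text(), prefix)
    println(result)
})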

Import the strings package at the top of your file. If you need a complete code example, check here.

edited Apr 17 '16 at 14:04

answered Apr 17 '16 at 13:07

siongui


score 0 · Answer 2 · answered Jul 21 '15 at 08:33

0

Try querying for the <br> tag and getting its siblings:

http://godoc.org/github.com/PuerkitoBio/goquery#Selection.Siblings

answered Jul 21 '15 at 08:33

Tao Wen

