
<div class="island biz-owner-reply clearfix">

    <div class="biz-owner-reply-header arrange arrange--6">
        <div class="arrange_unit biz-owner-reply-photo">
            <div class="photo-box pb-30s">
                <a href="https://s3-media1.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/o.jpg">
                    <img alt="Beckie F." class="photo-box-img" height="30" src="https://s3-media4.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/30s.jpg" srcset="https://s3-media4.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/90s.jpg 3.00x,https://s3-media4.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/ss.jpg 1.33x" width="30">
                </a>
            </div>
        </div>
        <div class="arrange_unit arrange_unit--fill embossed-text-white">
            <strong>
                Comment from Beckie F. of Yard House
            </strong>
            <br>
            Business Customer Service
        </div>
    </div>
    <span class="bullet-after">4/4/2018</span>

    Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!

    <div class="review-footer clearfix"></div>
</div>

I'm trying to get the text of the element with class biz-owner-reply using Selenium and Python. I first find the element and then read its text, as below:

response = ""
responses = review_wrappers[0].find_elements_by_class_name("biz-owner-reply")
if len(responses) > 0:
    response = responses[0].text

However, the result also contains the values from its child elements:

'response':'Comment from Beckie F. of Yard House\nBusiness Customer Service\n4/4/2018 Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!'

How can I get only:

Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!
bleepmeh
  • You want to get only `Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!`, is that right? – yong Apr 13 '18 at 23:29
  • What is your desired output: 1. Comment from Beckie F. of Yard House 2. Business Customer Service 3. Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome! – cruisepandey Apr 14 '18 at 07:14
  • Which text(s) are you trying to retrieve? Is your use case constrained to the `biz-owner-reply` class only, or can you use another class as well? – undetected Selenium Apr 14 '18 at 07:46
  • Here's an answer in javascript. You could do something similar in python: https://stackoverflow.com/questions/8505375/getting-text-from-a-node – Mark Lapierre Apr 14 '18 at 11:27

2 Answers


Selenium can only return element nodes, not text nodes, so we need JavaScript and the HTML DOM API to achieve your goal.

script = """
    return Array.from(arguments[0].childNodes)
        .filter(function(node){return node.nodeType === 3;})
        .map(function(node){return node.nodeValue;})
        .join('');
"""
# childNodes returns every child node of the parent element
# nodeType === 3: text node (text directly inside a tag)
# nodeType === 1: element node (an HTML tag)
# nodeType === 2: attribute node (an attribute of a tag)

ele = driver.find_element_by_css_selector("div.biz-owner-reply")

# strip() drops the whitespace left around the reply text
txt = driver.execute_script(script, ele).strip()
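If you would rather avoid running JavaScript, one possible alternative is to take the element's full text and remove the text of each direct child from it. The sketch below is only an illustration under assumptions: `own_text` and `element_own_text` are hypothetical helper names, and the approach assumes each child's `.text` appears verbatim inside the parent's `.text`, which is usually but not always the case.

```python
def own_text(full_text, child_texts):
    """Return full_text with each child's text removed once, in order."""
    remaining = full_text
    for child in child_texts:
        if child:  # children that render no text have nothing to remove
            remaining = remaining.replace(child, "", 1)
    return remaining.strip()


def element_own_text(element):
    """Hypothetical wrapper around a Selenium WebElement-like object."""
    # "./*" selects direct children only. This uses the same legacy
    # find_elements_by_* style as the question; on Selenium 4 the
    # equivalent call would be element.find_elements(By.XPATH, "./*").
    children = element.find_elements_by_xpath("./*")
    return own_text(element.text, [child.text for child in children])
```

With the question's code this would be called as `response = element_own_text(responses[0])`. Note the limitation: if the element's own text happens to contain a child's text before that child appears, the wrong occurrence is removed, so the JavaScript version above is the more robust one.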

More detail about HTML DOM Node

More detail about HTML DOM NodeList
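To see what "direct text nodes only" means without a browser, the same filtering idea can be sketched with Python's standard `html.parser`. This is not what Selenium does internally; `DirectTextParser` and `direct_text` are hypothetical names, and the sketch assumes reasonably well-formed HTML with properly closed tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DirectTextParser(HTMLParser):
    """Collect text sitting directly inside the first element with a class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # 0 = outside the target, 1 = directly inside it
        self.chunks = []
        self.done = False   # set once the target element has been closed

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        # Depth 1 corresponds to a text node that is a direct child.
        if self.depth == 1:
            self.chunks.append(data)


def direct_text(html, class_name):
    parser = DirectTextParser(class_name)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup's indentation.
    return " ".join("".join(parser.chunks).split())
```

In a Selenium session the input could come from `ele.get_attribute("outerHTML")`, for example `direct_text(ele.get_attribute("outerHTML"), "biz-owner-reply")`.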

yong

The question is a bit unclear, but Yong and I read it the same way: you want to retrieve just the core text of the message, whereas your current result includes the entire reply block.

Suppose, for instance, your SQL table has only three columns:

id, date, text

If you pull the text the way you are doing now, you will get all of it.

If you want to pull just the comment, I reckon you need:

An SQL or XML file with #core_message

responses = $core_message

I would need more information, but the idea is to select just a single element rather than all the information.

Arsenil98