I'm trying to find certain keywords in the HTML source of multiple websites. I want my crawler to find these keywords regardless of whether they are written in uppercase or lowercase in the page source. To do this, I tried calling .lower() in this script:

from selenium import webdriver
import csv

def keywords():
    with open('urls.csv') as csv_file:
        csv_reader = csv.reader(csv_file)

        driver = webdriver.Chrome(executable_path=r'C:\Users\Peter\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')

        list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
        list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
        list_3 = ['keyword 7', 'keyword 8']

        keywords = [list_1, list_2, list_3]

        for row in csv_reader:
            driver.get(row[0])
            html = driver.page_source

            for searchstring in keywords:
                if searchstring.lower() in html.lower():
                    print (row[0], searchstring, 'found')

                else:
                    print (row[0], searchstring, 'not found')

print keywords()

Error:

AttributeError: 'list' object has no attribute 'lower'

So I found out that .lower() doesn't work on lists; it only works on strings.

I've googled the error and my issue but didn't find a solution. Any suggestions on how I can solve this with my current script?

jakeT888
  • I assume .lower() turns the string it's called on into lowercase. If so, you should run it on every item in the list. You can easily use a for loop for that. – Amedeo Baragiola Mar 12 '17 at 15:07

2 Answers

You can make keywords a flat list of strings instead of a list of lists of strings. Here I lowercase the keywords up front.

from selenium import webdriver
import csv

def keywords():
    with open('urls.csv') as csv_file:
        csv_reader = csv.reader(csv_file)

        driver = webdriver.Chrome(executable_path=r'C:\Users\Peter\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')

        list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
        list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
        list_3 = ['keyword 7', 'keyword 8']

        lower_list = lambda x: x.lower()
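        # list() keeps the lowered keywords reusable for every URL (map is one-shot in Python 3)
        keywords = list(map(lower_list, list_1 + list_2 + list_3))

        for row in csv_reader:
            driver.get(row[0])
            html = driver.page_source.lower()  # lowercase the page once per URL

            for searchstring in keywords:
                if searchstring in html:
                    print(row[0], searchstring, 'found')
                else:
                    print(row[0], searchstring, 'not found')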
Bipul Jain
  • Why not simply `keywords = map(lower_list, list_1 + list_2 + list_3)`? Also, it's not clear whether OP is using Python 2 or 3 (they're using `print()` as a function), so it is better to use a list comprehension (it's faster than map with a lambda). In Python 3, `map` would be exhausted after the first iteration. – Ashwini Chaudhary Mar 12 '17 at 15:16

You can use the map function, like so:

l = ['Item 1', 'ITEM 2', 'ITEM 3', 'ItEM 4']
m = map(str.lower, l)
print(list(m))

This gets you ['item 1', 'item 2', 'item 3', 'item 4']

map applies a function to every element of an iterable and, in Python 3, returns a map object, which is itself an iterable. With a flat list of keyword strings you can use it directly in your loop: for searchstring in map(str.lower, keywords)
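One caveat that may be worth illustrating, assuming Python 3: a map object is a one-shot iterator, so if it is created once and then reused for each URL, it yields nothing on the second pass. A minimal sketch (the variable names here are made up for illustration):

```python
keywords = ['Keyword 1', 'KEYWORD 2']

lowered = map(str.lower, keywords)   # lazy, one-shot iterator in Python 3
first_pass = list(lowered)           # consumes the iterator
second_pass = list(lowered)          # nothing left to yield

print(first_pass)
print(second_pass)

# Materialising the result once makes it safe to reuse across many pages.
lowered_list = list(map(str.lower, keywords))
```

Creating the map inside the loop header, as suggested above, avoids the problem because a fresh map object is built for each page.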

EDIT: Oops, I didn't notice that keywords is a list of three lists. You could flatten and lowercase them in one step with [item.lower() for sublist in keywords for item in sublist] and get the results you want.
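Putting that together, here is a rough sketch of the search step with the flattening done once per call. The helper name find_keywords and the sample HTML are hypothetical and only for illustration; in the original script, html would come from driver.page_source.

```python
def find_keywords(html, keyword_groups):
    """Return {keyword: bool} for every keyword in a list of keyword lists."""
    # Flatten the nested lists and lowercase each keyword.
    flat = [item.lower() for sublist in keyword_groups for item in sublist]
    html_lower = html.lower()  # lowercase the page once, not once per keyword
    return {keyword: keyword in html_lower for keyword in flat}


# Hypothetical sample data standing in for a real page source.
keywords = [['Keyword 1', 'keyword 2'], ['KEYWORD 3']]
sample_html = '<html><body><h1>KEYWORD 1</h1><p>Keyword 3</p></body></html>'

results = find_keywords(sample_html, keywords)
for keyword, found in results.items():
    print(keyword, 'found' if found else 'not found')
```

Keeping the matching in a plain function that takes a string makes it possible to test the case-insensitive logic without starting a browser.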