
I'm new here and pretty new to programming as well so please be mindful of that.

I am building a big database and need images to go with the data I already have in it. I've got an SQL file that looks a bit like this:

CREATE TABLE `processor` (
  `id` int(11) NOT NULL,
  `name` text NOT NULL,
  `Product Collection` text,

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `processor` (`id`, `name`, `Product Collection`) VALUES
(361, 'Intel Pentium D Processor 830 (2M Cache, 3.00 GHz, 800 MHz FSB)', 'Legacy Intel Pentium Processor'),
(362, 'Intel Pentium D Processor 840 (2M Cache, 3.20 GHz, 800 MHz FSB)', 'Legacy Intel Pentium Processor'),
(363, 'Intel Pentium D Processor 915 (4M Cache, 2.80 GHz, 800 MHz FSB)', 'Legacy Intel Pentium Processor'),

Now I need to get an image for every single row in my database. So I did some searching and started working with Beautiful Soup to search for something on Google and download an image for it. However, the tutorial I followed didn't read its search terms from a separate file, and as I said, I'm still a newbie with Python and Beautiful Soup. So I was wondering if I could use my SQL file and tell Beautiful Soup to take the name from every row I have and use that as the keyword to search for in Google Images. Maybe I could use the id in a for-loop?

for i in range(1, 2642):
    id = i
    keyword = ...  # get the keyword (processor name) that belongs to the id
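
One way the loop above might be sketched is to let the database hand back every (id, name) pair instead of counting ids by hand. The sample rows start at id 361, so a fixed range starting at 1 may not line up with the real ids. This is only a rough sketch under assumptions: it uses Python's built-in sqlite3 with an in-memory table as a stand-in for the real MySQL database, and iter_keywords is a hypothetical helper name, not something from the tutorial or any library.

```python
import sqlite3


def iter_keywords(conn):
    """Yield (id, name) for every row in the processor table (hypothetical helper)."""
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM processor ORDER BY id")
    for row_id, name in cursor:
        yield row_id, name


# Assumption: an in-memory SQLite table stands in for the real MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processor (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO processor (id, name) VALUES (?, ?)",
    [
        (361, "Intel Pentium D Processor 830 (2M Cache, 3.00 GHz, 800 MHz FSB)"),
        (362, "Intel Pentium D Processor 840 (2M Cache, 3.20 GHz, 800 MHz FSB)"),
        (363, "Intel Pentium D Processor 915 (4M Cache, 2.80 GHz, 800 MHz FSB)"),
    ],
)

# Collect every (id, keyword) pair before closing the connection.
keywords = list(iter_keywords(conn))
conn.close()

for row_id, keyword in keywords:
    # Each keyword would be the search term handed to the image scraper.
    print(row_id, keyword)
```

With mysql.connector, the same SELECT could presumably be run through the existing cursor and read with fetchall(), which would avoid one query per id.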

I know that I might get images that are different from the name of the processor but that's not as big of a deal as this isn't for professional use or anything. It's for my school project so if my scraper downloads a few wrong images, it won't matter that much.

Thanks already!

EDIT:

I tried this (only relevant part of the code):

def run(query, save_directory, num_images=100):
    query = '+'.join(query.split())
    logger.info("Extracting image links")
    images = extract_images(query, num_images)
    logger.info("Downloading images")
    download_images_to_dir(images, save_directory, num_images)
    logger.info("Finished")

def main():
    parser = argparse.ArgumentParser(description='Scrape Google images')
    parser.add_argument('-s', '--search', default= myresult, type=str, help='search term')
    parser.add_argument('-n', '--num_images', default=1, type=int, help='num images to save')
    parser.add_argument('-d', '--directory', default=r'C:\xampp\htdocs\dashboard\IT\GIP\other\ImageDownloader-master\image', type=str, help='save directory')
    args = parser.parse_args()
    run(args.search, args.directory, args.num_images)

if __name__ == '__main__':
    for i in range(1, 2642):
        id = i
        habe = mysql.connector.connect(
            host="localhost",
            user="root",
            passwd="",
            database="habe"
        )
        mycursor = habe.cursor()
        mycursor.execute("SELECT name FROM processor WHERE id=1")
        myresult = mycursor.fetchone()
        i += 1
        main()

But now I'm getting AttributeError: 'tuple' object has no attribute 'split' on the second line of the code above, which is the first line inside run(): query = '+'.join(query.split())
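
The error seems to come from fetchone() returning a whole row as a tuple, even when only one column is selected, so the name has to be pulled out of the tuple before split() can be called on it. Below is a minimal sketch of that idea, not a definitive fix. It assumes Python's built-in sqlite3 with an in-memory table as a stand-in for the MySQL connection, and build_query is a hypothetical helper that repeats the first line of run(). Note that sqlite3 uses ? placeholders, whereas mysql.connector uses %s.

```python
import sqlite3


def build_query(name):
    """Join the words of a processor name with '+' (hypothetical helper)."""
    return '+'.join(name.split())


# Assumption: an in-memory SQLite table stands in for the real MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processor (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "INSERT INTO processor (id, name) VALUES (?, ?)",
    (361, "Intel Pentium D Processor 830 (2M Cache, 3.00 GHz, 800 MHz FSB)"),
)

cursor = conn.cursor()
# Pass the id as a parameter instead of hard-coding it in the SQL string.
cursor.execute("SELECT name FROM processor WHERE id = ?", (361,))
row = cursor.fetchone()  # a one-element tuple, or None if no row matches
conn.close()

if row is not None:
    name = row[0]  # take the string out of the tuple
    query = build_query(name)
    print(query)
```

Checking for None matters when looping over ids that may not exist in the table, since fetchone() returns None when no row matches.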

Sebastien