I'm new here and pretty new to programming as well so please be mindful of that.
I am building a large database and need images to go with the data I already have in it. I've got an SQL file that looks a bit like this:
CREATE TABLE `processor` (
    `id` int(11) NOT NULL,
    `name` text NOT NULL,
    `Product Collection` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `processor` (`id`, `name`, `Product Collection`) VALUES
(361, 'Intel Pentium D Processor 830 (2M Cache, 3.00 GHz, 800 MHz FSB)', 'Legacy Intel Pentium Processor'),
(362, 'Intel Pentium D Processor 840 (2M Cache, 3.20 GHz, 800 MHz FSB)', 'Legacy Intel Pentium Processor'),
(363, 'Intel Pentium D Processor 915 (4M Cache, 2.80 GHz, 800 MHz FSB)', 'Legacy Intel Pentium Processor'),
Now I need to get an image for every single row in my database. I did some searching and started working with BeautifulSoup to search for something on Google and download an image for it. However, the tutorial I followed didn't read its search terms from a separate file, and as I said, I'm still a newbie with Python and BeautifulSoup. So I was wondering whether I could use my SQL file and tell BeautifulSoup to take the name from every row I have and use it as the keyword to search for in Google Images. Maybe I could use the id in a for-loop?
for i in range(1, 2642):
    id = i
    keyword = ...  # get the keyword (processor name) that belongs to the id
    i += 1
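One possible shape for that loop, as a sketch only: rather than counting ids by hand, select every id and name in a single query and iterate over the rows that come back. To keep the example runnable without a server, it uses the standard-library sqlite3 module as a stand-in for mysql.connector (an assumption made purely for illustration; both cursors offer execute() and fetchall()). The helper name fetch_keywords is hypothetical.

```python
import sqlite3  # stand-in for mysql.connector so the sketch runs without a server


def fetch_keywords(cursor):
    """Return (id, name) pairs for every row of the processor table."""
    cursor.execute("SELECT id, name FROM processor ORDER BY id")
    return cursor.fetchall()


# Demo data copied from the first rows of the INSERT statement above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE processor (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.executemany(
    "INSERT INTO processor (id, name) VALUES (?, ?)",
    [
        (361, "Intel Pentium D Processor 830 (2M Cache, 3.00 GHz, 800 MHz FSB)"),
        (362, "Intel Pentium D Processor 840 (2M Cache, 3.20 GHz, 800 MHz FSB)"),
    ],
)

keywords = fetch_keywords(cur)
for processor_id, name in keywords:
    # Each name would be handed to the scraper as the search term here.
    print(processor_id, name)

conn.close()
```

Iterating over the returned rows also avoids hard-coding the id range, so gaps in the ids would not matter.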
I know that I might get images that don't match the name of the processor, but that's not a big deal since this isn't for professional use. It's for my school project, so if my scraper downloads a few wrong images, it won't matter that much.
Thanks already!
EDIT:
I tried this (only relevant part of the code):
def run(query, save_directory, num_images=100):
    query = '+'.join(query.split())
    logger.info("Extracting image links")
    images = extract_images(query, num_images)
    logger.info("Downloading images")
    download_images_to_dir(images, save_directory, num_images)
    logger.info("Finished")

def main():
    parser = argparse.ArgumentParser(description='Scrape Google images')
    parser.add_argument('-s', '--search', default=myresult, type=str, help='search term')
    parser.add_argument('-n', '--num_images', default=1, type=int, help='num images to save')
    parser.add_argument('-d', '--directory', default=r'C:\xampp\htdocs\dashboard\IT\GIP\other\ImageDownloader-master\image', type=str, help='save directory')
    args = parser.parse_args()
    run(args.search, args.directory, args.num_images)

if __name__ == '__main__':
    for i in range(1, 2642):
        id = i
        habe = mysql.connector.connect(
            host="localhost",
            user="root",
            passwd="",
            database="habe"
        )
        mycursor = habe.cursor()
        mycursor.execute("SELECT name FROM processor WHERE id=1")
        myresult = mycursor.fetchone()
        i += 1
        main()
But now I'm getting AttributeError: 'tuple' object has no attribute 'split' on the second line of run(): query = '+'.join(query.split())
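The error can be reproduced without the database, assuming fetchone() hands back each row as a tuple even when only one column is selected (which is how DB-API cursors generally behave). The row value below is hard-coded for illustration and mimics what such a call might return; indexing the tuple with [0] yields the string that split() expects.

```python
# A row as fetchone() would return it for "SELECT name ...": a one-element tuple.
row = ("Intel Pentium D Processor 830 (2M Cache, 3.00 GHz, 800 MHz FSB)",)

try:
    row.split()  # tuples have no split(), so this raises AttributeError
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

name = row[0]  # the first (and only) column is the string itself
query = "+".join(name.split())
print(query)
```

Under that assumption, passing myresult[0] instead of myresult as the search term would give run() a string to work with.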