I am currently working on a web scraping project to automate manual tasks at work. I have set up something that gets me to where I need to be on the website, but now I am trying to parse the data shown on the page.
I am trying to get all the labels and the form value that follows each one. The tricky part is that the form value can be represented as a checkbox, input, dropdown, or textarea. The website runs on Angular, so there are a lot of ng-xxx tags.
There are 100+ forms I am trying to scrape, but not all of them have the same elements visible. Which labels are shown varies from form to form, which is why I am trying to automate looping through the labels and getting their corresponding input values.
I get all my labels via: driver.find_elements(By.XPATH, "//div[@class='panel-body']/form//label")
I want to loop through each label and get the value associated with it, but I am running into issues with how to do so.
The paths to the different types may look like:
x_inputType1 = ".//following::input[@type='text']"
x_inputType2 = ".//following::textarea[not(@class = 'ace_text-input')]"
x_inputType3 = ".//following::div//div[@class='ace_content']"
I tried the following, but the problem I ran into was that it would grab the first element it finds and use that as the output for inputValue.
# Example Input Types
x_inputType1 = ".//following::input[@type='text']"
x_inputType2 = ".//following::textarea[not(@class = 'ace_text-input')]"
x_inputType3 = ".//following::div//div[@class='ace_content']"
# Labels
labels = driver.find_elements(By.XPATH, "//div[@class='panel-body']/form//label")
# Loop through each label and get the input value following.
for label in labels:
    labelName = label.text
    inputValue = label.find_element(By.XPATH, f"{x_inputType1} | {x_inputType2} | {x_inputType3}")
    print(labelName, inputValue)
Maybe I am approaching this wrong?
The expected result is basically: for each label, get the corresponding input that follows it.
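One direction that might be worth trying, sketched here under several assumptions: instead of a union of three `following::` paths, a single `following::*[...][1]` step picks only the nearest matching element after each label, and the value is then read according to the element's tag. The names `FIELD_PREDICATES`, `nearest_field_xpath`, `read_value`, and `scrape_labels` are hypothetical helpers, and the predicates are only adapted from the three XPaths above (in particular, the `ace_content` one is simplified). This has not been verified against the real page, and a label whose field is a checkbox or dropdown would still need its own predicate.

```python
# Same string value as selenium.webdriver.common.by.By.XPATH; spelled out here
# so the sketch has no import-time dependency on Selenium.
XPATH = "xpath"

# Hypothetical predicates, adapted from the three input-type XPaths in the post.
FIELD_PREDICATES = (
    "self::input[@type='text']",
    "self::textarea[not(@class='ace_text-input')]",
    "self::div[@class='ace_content']",
)


def nearest_field_xpath(predicates=FIELD_PREDICATES):
    """Build an XPath selecting the first matching element after the context node."""
    return f"./following::*[{' or '.join(predicates)}][1]"


def read_value(element):
    """Return the `value` attribute for form controls, visible text otherwise."""
    if element.tag_name.lower() in ("input", "textarea"):
        return element.get_attribute("value")
    return element.text


def scrape_labels(labels):
    """Map each label's text to the value of the nearest following field."""
    xpath = nearest_field_xpath()
    results = {}
    for label in labels:
        # find_elements returns an empty list instead of raising when nothing matches
        matches = label.find_elements(XPATH, xpath)
        results[label.text] = read_value(matches[0]) if matches else None
    return results
```

Assuming `driver` is already on the form page, this could be called as `scrape_labels(driver.find_elements(By.XPATH, "//div[@class='panel-body']/form//label"))`. Because the match is simply the nearest following field, a label with no field of its own would pick up the next label's field, so the result may need a sanity check per form.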