I am trying to automate data uploads to a private website using Python mechanize.
I successfully log in and navigate to the upload page, which offers three ways of providing data: a database connection (source-sql), a file upload (source-file), or a remotely hosted CSV (source-url, which is the one I need).
With python-mechanize I reach that page and set the upload form controls needed for my data upload (sourceType, sourceName, url and add).
When I use Chrome, I submit the data by clicking the add button (value "Add"), and the browser navigates to the target script ('addsource.do').
I have also tried the same step with JavaScript disabled in Chrome, and it still works: the browser reaches the target script and shows the submitted data, so JavaScript does not seem to be needed for the submission step.
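Since the submission works with JavaScript disabled, it is a plain HTML form submission: the browser sends the name/value pair of every successful control, and of the two submit buttons only the one that was clicked. The minimal sketch below applies that rule to the controls listed in the mechanize printout further down. The tuple layout and the `submitted_pairs` helper are hypothetical, the file control is left out for brevity, and the multipart encoding itself is not modelled.

```python
# Hypothetical model of the upload form's controls as (type, name, value),
# copied from the mechanize printout in this question.
# The file control is deliberately left out to keep the sketch short.
CONTROLS = [
    ("hidden", "r", "test-occ"),
    ("hidden", "validate", "false"),
    ("select", "sourceType", "source-url"),
    ("text", "sourceName", "AUTO_test-occ_occ_url"),
    ("text", "url", "https://another.example.com/datasets/datasource.csv"),
    ("submit", "add", "Add"),
    ("submit", "clear", "Clear"),
]


def submitted_pairs(controls, clicked):
    """Return the (name, value) pairs a plain HTML submission would send
    when the submit button named `clicked` is activated."""
    pairs = []
    for ctype, name, value in controls:
        # Only the activated submit button counts as a successful control;
        # any other submit button is left out of the form data.
        if ctype == "submit" and name != clicked:
            continue
        pairs.append((name, value))
    return pairs


for name, value in submitted_pairs(CONTROLS, clicked="add"):
    print(f"{name}={value}")
```

Clicking "Clear" instead would send `clear=Clear` and drop `add=Add`, which is why it matters which button mechanize is told to click.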
So I guess my situation is similar to this example from the python-mechanize GitHub repository, examples/forms/example.py:
request2 = f.click()  # mechanize.Request object
try:
    response2 = mechanize.urlopen(request2)
except mechanize.HTTPError as response2:
    pass
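The example keeps the caught mechanize.HTTPError as `response2` because an HTTP error in the urllib2 family is also a response object: it carries the status code, the URL and the body of the error page. The sketch below shows this with the stdlib `urllib.error.HTTPError`, which mechanize's HTTPError is assumed to mirror (please check this against your mechanize version). The exception is built by hand so the code runs offline; its URL and body are placeholders.

```python
import io
import urllib.error

# Build an HTTPError by hand so the sketch runs without a server;
# in real code urlopen() raises it. URL and body are placeholders.
err = urllib.error.HTTPError(
    url="https://example.com/manage/addsource.do",
    code=404,
    msg="Not Found",
    hdrs={},
    fp=io.BytesIO(b"<html>page not found</html>"),
)

try:
    raise err
except urllib.error.HTTPError as response:
    # The exception doubles as a response object.
    print(response)           # HTTP Error 404: Not Found
    print(response.code)      # 404
    print(response.geturl())  # https://example.com/manage/addsource.do
    print(response.read())    # b'<html>page not found</html>'
```

Printing only the exception shows the status line. Reading the body of the error response is often more informative, because servers usually explain the failure there.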
The end of my code is very similar, but it raises an error:
br = mechanize.Browser()
# ... many lines of code (producing and filling in form contents) ...
print("before submit: ", br.geturl())
# OUTPUT CURRENT SELECTED FORM CONTROLS AND VALUES:
f = br.form
print(f)
form_info = (" -- Form name: {}\n"
             " -- Form action: {}\n"
             " -- Form attrs: {}").format(f.name, f.action, f.attrs)
print(form_info)
# QUESTION 1: How should I now submit this form?
# br.submit()
# ... or ...
# f.click(name="add", type="submit")
# I tried the 2nd option, followed by the pattern from the example above:
myrequest = f.click(name="add", type="submit")
# QUESTION 2: how can I print out the 'action' submitted within myrequest?
try:
    response = mechanize.urlopen(myrequest)
except mechanize.HTTPError as response:
    print("EXCEPTION:", response)
This is the output of my code (server name changed to example.com):
before submit: https://example.com/manage/resource.do?r=test-occ
<post https://example.com/manage/addsource.do multipart/form-data
<HiddenControl(r=test-occ)>
<HiddenControl(validate=false)>
<SelectControl(sourceType=[(), source-sql, source-file, *source-url])>
<FileControl(file=<No files added>)>
<TextControl(sourceName=AUTO_test-occ_occ_url)>
<TextControl(url=https://another.example.com/datasets/datasource.csv)>
<SubmitControl(add=Add)>
<SubmitControl(clear=Clear)>>
-- Form name: None
-- Form action: https://example.com/manage/addsource.do
-- Form attrs: {'action': 'addsource.do', 'method': 'post', 'enctype': 'multipart/form-data'}
EXCEPTION: HTTP Error 404: Not Found
Although the form.attrs['action'] value 'addsource.do' looks correct to me, 'HTTP Error 404: Not Found' suggests that the form was submitted to the wrong URL. Or perhaps I am submitting the form the wrong way.
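A relative action such as 'addsource.do' is resolved against the URL of the page that contains the form. A quick stdlib check with `urllib.parse.urljoin` (mechanize's own resolution is assumed to follow the same RFC 3986 rules) reproduces the absolute "Form action" value printed in the output, so the relative action itself resolves as expected.

```python
from urllib.parse import urljoin

# Values copied from the output in this question.
page_url = "https://example.com/manage/resource.do?r=test-occ"
form_action = "addsource.do"  # form.attrs['action'], as written in the HTML

# The last path segment and the query of the page URL are replaced,
# just as a browser resolves a relative form action.
target = urljoin(page_url, form_action)
print(target)  # https://example.com/manage/addsource.do
```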
So, my questions are:
Which is the proper way to submit this form? I am a bit confused between these two options:
br.submit()
or
br.form.click(name="add", type="submit")
If I choose the second option, is there any way of checking the 'action' actually submitted within myrequest before entering the try/except block?
As in the example above, myrequest is a mechanize.Request object.
It has methods like get_data(), get_header() and get_method(), but nothing like get_action(). Is there a way to do that?
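mechanize's Request class is modelled on urllib2's Request, so the target URL (the resolved form action) should be available from `get_full_url()`; this is an assumption to verify against your mechanize version. The sketch below uses the stdlib `urllib.request.Request` so it runs without mechanize or a server; the body and boundary are placeholders, not what `f.click()` actually builds.

```python
from urllib.request import Request

# Stand-in for the request returned by f.click(name="add", type="submit").
# The body and Content-Type are placeholders, not mechanize's real output.
req = Request(
    "https://example.com/manage/addsource.do",
    data=b"placeholder multipart body",
    headers={"Content-Type": "multipart/form-data; boundary=XYZ"},
)

# Inspect the request before sending it.
print(req.get_full_url())  # the URL (form action) the request targets
print(req.get_method())    # POST, because a body is attached
print(req.header_items())  # header names are stored capitalized, e.g. "Content-type"
print(req.data)            # the stdlib exposes the body as .data
```

On a mechanize request the same inspection would presumably be `myrequest.get_full_url()` and `myrequest.get_data()`, printed just before the `try` block.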
Thanks
EDIT: this is the HTML of my upload form:
<form action="addsource.do" method="post" enctype="multipart/form-data">
  <input name="r" type="hidden" value="test-occ">
  <input name="validate" type="hidden" value="false">
  <select id="sourceType" name="sourceType" class="form-select form-select-sm my-1">
    <option value="" disabled="" selected="">Select source type</option>
    <option value="source-sql">Database</option>
    <option value="source-file">File</option>
    <option value="source-url">URL</option>
  </select>
  <div class="row">
    <div class="col-12">
      <input type="file" name="file" id="file" class="form-control form-control-sm my-1" style="display: none;">
      <input type="text" id="sourceName" name="sourceName" class="form-control form-control-sm my-1" placeholder="Source Name" style="">
      <input type="url" id="url" name="url" class="form-control form-control-sm my-1" placeholder="URL" style="">
    </div>
    <div class="col-12">
      <input type="submit" value="Add" id="add" name="add" class="btn btn-sm btn-outline-info-primary my-1" style="">
      <input type="submit" value="Clear" id="clear" name="clear" class="btn btn-sm btn-outline-secondary my-1" style="">
    </div>
  </div>
</form>
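To cross-check the mechanize printout above against this markup, a small stdlib `html.parser` sketch can list the form's attributes and controls. It is only a sanity check, not how mechanize parses forms internally; the `FormLister` class is hypothetical and the HTML is a trimmed copy with the classes and styles dropped.

```python
from html.parser import HTMLParser

# Trimmed copy of the upload form above (classes and styles dropped).
FORM_HTML = """
<form action="addsource.do" method="post" enctype="multipart/form-data">
  <input name="r" type="hidden" value="test-occ">
  <input name="validate" type="hidden" value="false">
  <select id="sourceType" name="sourceType">
    <option value="" disabled="" selected="">Select source type</option>
    <option value="source-sql">Database</option>
    <option value="source-file">File</option>
    <option value="source-url">URL</option>
  </select>
  <input type="file" name="file" id="file">
  <input type="text" id="sourceName" name="sourceName">
  <input type="url" id="url" name="url">
  <input type="submit" value="Add" id="add" name="add">
  <input type="submit" value="Clear" id="clear" name="clear">
</form>
"""


class FormLister(HTMLParser):
    """Collect the <form> attributes and a (type, name) pair per control."""

    def __init__(self):
        super().__init__()
        self.form_attrs = {}
        self.controls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_attrs = attrs
        elif tag == "input":
            # An <input> without a type attribute defaults to "text".
            self.controls.append((attrs.get("type", "text"), attrs.get("name")))
        elif tag == "select":
            self.controls.append(("select", attrs.get("name")))


parser = FormLister()
parser.feed(FORM_HTML)
print(parser.form_attrs)
for ctype, name in parser.controls:
    print(ctype, name)
```

The eight controls it lists match the eight controls in the mechanize output; the `type="url"` input simply shows up there as a TextControl.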