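I am using mechanize and Python to crawl a website and get its data. So far I am able to submit the form and get the content of the first results page, but I am unable to trigger a click on the "next page" link and get the data behind it. My code is as follows: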
    import re
    import mechanize
    from bs4 import BeautifulSoup

    br = mechanize.Browser()
    br.set_handle_robots(False)  # ignore robots.txt
    br.open("http://portal.uspto.gov/employeesearch/")
    br.select_form(name="searchemployeedatabean")
    br.form['name'] = 'a'
    response = br.submit()

    soup = BeautifulSoup(response)
    table = soup.find_all('table')[16]  # the results table
    rows = table.find_all('tr')
    # One list of text nodes per cell, one list of cells per row
    data = [[td.findChildren(text=True) for td in tr.find_all("td")] for tr in rows]

    for a in data:
        if a:  # skip rows without <td> cells
            examiner = " ".join(a[0][1].split())
            phone = a[1][1]
            extension_office = a[3][1]
            office_description = "|".join(re.findall(r'\d+', a[4][1]))
            # print(examiner, phone, extension_office, office_description)
Now, on the results page there is a button with the text "next page >>". I tried clicking it using the code below.
Button HTML:
<a onclick="javascript:gotopage('currentpage', '3')" href="#">next page >></a>
Python code:
    req = br.click_link(text_regex='next page >>')
    r2 = br.open(req)
    r2soup = BeautifulSoup(r2)
But this had no success.
Please tell me how to click on the next button and get the data from each page until there is no next page left.
I found the problem: mechanize doesn't support JavaScript. When mechanize reaches the results page after the submit, the JavaScript does not run, so the pagination click is never triggered. I have achieved what I want using Selenium together with Beautiful Soup, with the following Selenium selector:
    elem1 = driver.find_element_by_link_text("next page >>")
    elem1.click()
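
To keep going "until there is no next page", the click above can be wrapped in a loop. What follows is only a rough sketch, not the exact code I ran: `collect_pages`, `scrape_with_selenium` and the `max_pages` cap are hypothetical names made up for illustration, and it assumes `driver` is already showing the first results page and that the link simply disappears on the last page. It uses `find_element(By.LINK_TEXT, ...)`, which newer Selenium releases expect in place of `find_element_by_link_text`.

```python
def collect_pages(get_html, click_next, max_pages=1000):
    """Return the HTML of every results page, following "next" until it is gone.

    get_html   -- callable returning the HTML of the page currently shown
    click_next -- callable that moves to the next page and returns True,
                  or returns False when there is no next page
    max_pages  -- hypothetical safety cap so a broken site cannot loop forever
    """
    pages = []
    for _ in range(max_pages):
        pages.append(get_html())
        if not click_next():
            break
    return pages


def scrape_with_selenium(driver, link_text="next page >>"):
    """Wire collect_pages to a Selenium driver (assumed to be on page one)."""
    # Imported here so collect_pages stays usable without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def click_next():
        try:
            driver.find_element(By.LINK_TEXT, link_text).click()
        except NoSuchElementException:
            return False  # assumption: the last page has no such link
        return True

    return collect_pages(lambda: driver.page_source, click_next)
```

Each string returned by `scrape_with_selenium(driver)` can then be passed to `BeautifulSoup` and parsed the same way as the first page in the question. If the site is slow to redraw the table after a click, an explicit wait before reading `driver.page_source` would probably be needed as well.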