Use BeautifulSoup To Loop Through And Retrieve Specific URLs
I want to use BeautifulSoup to repeatedly retrieve the URL at a specific position on a page and then follow it. You may imagine that there are 4 different URL lists, each containing 100 different URL links.
Solution 1:
This is a good problem for recursion. Try calling a recursive function to do this:
import requests
from bs4 import BeautifulSoup

def retrieve_urls_recur(url, position, index, deepness):
    # Stop once we have followed `deepness` links
    if index >= deepness:
        return True
    plain_text = requests.get(url).text
    soup = BeautifulSoup(plain_text, 'html.parser')
    links = soup.find_all('a')
    desired_link = links[position].get('href')
    print(desired_link)
    # Recurse into the page we just found, keeping the same position
    return retrieve_urls_recur(desired_link, position, index + 1, deepness)
and then call it with the desired parameters; in your case:
retrieve_urls_recur(url, 2, 0, 4)
Here, 2 is the URL's index in the list of URLs on each page, 0 is the starting counter, and 4 is how deep you want to recurse.
PS: I am using requests instead of urllib, and I didn't test this, although I recently used a very similar function with success.
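For reference, a minimal, untested run of the function might look like this (the starting URL is a hypothetical placeholder; any page whose anchors chain to further pages would do):

# Hypothetical starting page; substitute your real URL here.
start_url = 'http://example.com/links.html'

# Follow the link at position 2 on each page, 4 pages deep,
# printing each followed URL along the way.
retrieve_urls_recur(start_url, 2, 0, 4)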
Solution 2:
Just get the link from find_all() by index:
import ssl
import urllib.request
from bs4 import BeautifulSoup

count = 0
while count < num:
    context = ssl._create_unverified_context()  # skip TLS certificate checks
    htm = urllib.request.urlopen(url, context=context).read()
    soup = BeautifulSoup(htm, 'html.parser')
    url = soup.find_all('a')[position].get('href')  # follow link at `position`
    count += 1
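As a usage note, the loop expects url, position, and num to be defined before it runs. A minimal, untested setup (the starting URL is a made-up placeholder) could be:

url = 'http://example.com/links.html'  # hypothetical starting page
position = 2  # index of the link to follow on each page
num = 4       # how many links to follow in sequence

After the loop finishes, url holds the link reached after following num hops, since each iteration overwrites it with the next href.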