Scrapy Ignoring Content After # Tag In The Url
Hi, I am scraping a site which has a URL like the one below: http://www.example.com/categories-Mobile-Phones.aspx#RSS=pgZZ1QQdivZZctl00_ContentPlaceHolder1_ctl00_ctl03 I had placed this i
Solution 1:
It doesn't matter. With or without the hash, the URI refers to exactly the same page.
The part after the hash is a fragment identifier. It is never sent to the server; your browser only uses it to scroll to that specific part of the page.
Like this...
http://www.w3.org/TR/html4/intro/intro.html#h-2.1.2
...and this...
http://www.w3.org/TR/html4/intro/intro.html
...both retrieve the same page. The former simply tells you where on the page to start reading.
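To see this from Python, here is a minimal sketch using only the standard library's urllib.parse.urldefrag. The helper name request_target is made up for illustration; it is not part of Scrapy.

```python
from urllib.parse import urldefrag


def request_target(url: str) -> str:
    """Return the URL with any '#fragment' removed (hypothetical helper)."""
    # urldefrag returns a named tuple (url, fragment); the fragment is the
    # part a client keeps to itself and does not send in the HTTP request.
    return urldefrag(url).url


with_hash = "http://www.w3.org/TR/html4/intro/intro.html#h-2.1.2"
without_hash = "http://www.w3.org/TR/html4/intro/intro.html"

# Both forms reduce to the same address once the fragment is dropped.
print(request_target(with_hash) == request_target(without_hash))
print(urldefrag(with_hash).fragment)
```

So if the content you want only shows up in the browser when the hash is present, it is probably being loaded by script after the page arrives, and the URL alone will not get it for you.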
EDIT:
# Scrapy rejects URLs without a scheme, so each entry needs the http:// prefix.
start_urls = ['http://themobilestore.in/home-mobiles-&-tablet/?page=1', 'http://themobilestore.in/home-mobiles-&-tablet/?page=2']
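If there are more than a couple of pages, the list can be generated instead of typed out. This is only a sketch: the http:// scheme, the BASE_URL name, and the page range of 1 to 2 are assumptions, so adjust them to whatever the site actually serves.

```python
# Assumed base address; the scheme is added because Scrapy needs one.
BASE_URL = "http://themobilestore.in/home-mobiles-&-tablet/"

# Assumed page range (pages 1 and 2); widen it to match the real listing.
FIRST_PAGE = 1
LAST_PAGE = 2

# One URL per listing page, using the ?page=N query parameter.
start_urls = [
    f"{BASE_URL}?page={page}" for page in range(FIRST_PAGE, LAST_PAGE + 1)
]

print(start_urls)
```

The resulting list can be assigned to the start_urls attribute of the spider class as usual.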