Scrapy Ignoring Content After # Tag In The Url

March 31, 2024

Hi, I am scraping a site that has a URL like the one below:

http://www.example.com/categories-Mobile-Phones.aspx#RSS=pgZZ1QQdivZZctl00_ContentPlaceHolder1_ctl00_ctl03

I had placed this i

Solution 1:

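It doesn't matter. With or without the hash, the URI refers to exactly the same page.

The part after the hash is a fragment identifier. It is handled entirely on the client side and is not sent to the server in the HTTP request, so dropping it does not change what is downloaded. Your browser uses it to scroll to that specific part of the page.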

Like this...

http://www.w3.org/TR/html4/intro/intro.html#h-2.1.2

...and this...

http://www.w3.org/TR/html4/intro/intro.html

...both retrieve the same page. The former simply tells the browser where on the page to start reading.
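The equivalence of the two URLs above can be checked with Python's standard library. This is only a small sketch using `urllib.parse.urldefrag`; the variable names are illustrative and it is not a description of what Scrapy does internally.

```python
from urllib.parse import urldefrag

with_fragment = "http://www.w3.org/TR/html4/intro/intro.html#h-2.1.2"
without_fragment = "http://www.w3.org/TR/html4/intro/intro.html"

# urldefrag splits a URL into the part that is actually requested
# and the fragment that stays on the client side.
requested_url, fragment = urldefrag(with_fragment)

# Once the fragment is removed, both URLs name the same resource.
same_resource = requested_url == without_fragment

print(requested_url)
print(fragment)
print(same_resource)
```

If a site changes what it shows based on the fragment, that is done by JavaScript in the browser, so the data has to be fetched from whatever URL that script requests rather than from the fragment URL itself.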

EDIT:

start_urls = ['http://themobilestore.in/home-mobiles-&-tablet/?page=1', 'http://themobilestore.in/home-mobiles-&-tablet/?page=2']
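The paginated URLs in the list above can also be generated instead of typed out. This is a minimal sketch, assuming the site keeps the `?page=N` pattern and is served over `http://`. Both are assumptions, and `page_urls` is a hypothetical helper name, not part of Scrapy.

```python
from urllib.parse import urlencode

# Assumed base URL. The "http://" scheme is a guess; Scrapy rejects
# request URLs that have no scheme, so one has to be supplied.
BASE_URL = "http://themobilestore.in/home-mobiles-&-tablet/"


def page_urls(base_url, first_page, last_page):
    """Return one URL per page number, using a ?page=N query string."""
    return [
        f"{base_url}?{urlencode({'page': n})}"
        for n in range(first_page, last_page + 1)
    ]


# The resulting list can be assigned to a spider's start_urls attribute.
start_urls = page_urls(BASE_URL, 1, 2)
```

Because the page number is in the query string rather than in a fragment, it is sent to the server and each URL returns a different page.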
