Sitemap Scraper Addon
Extract URLs from Sitemaps
The ScrapeBox Sitemap Scraper addon is included free with ScrapeBox and allows you to extract URLs from .xml or .axd sitemaps. Sitemaps generally list all of a site's pages, so gathering every URL belonging to a site from its sitemap is far easier and faster than harvesting the same information from search engines with various site: operators.
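To illustrate what extracting URLs from a sitemap involves, here is a minimal Python sketch that uses only the standard library. It is not how ScrapeBox is implemented. It assumes the sitemap follows the sitemaps.org XML format, and that an .axd sitemap handler returns that same XML. The function name extract_sitemap_urls is hypothetical. A sitemap file can be either a <urlset> that lists pages or a <sitemapindex> that points at further sitemap files, so the sketch returns both kinds of URL.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) found in one sitemap document.

    A <urlset> lists pages directly; a <sitemapindex> lists further
    sitemap files that would need to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    # <urlset><url><loc>...</loc></url></urlset>
    page_urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # <sitemapindex><sitemap><loc>...</loc></sitemap></sitemapindex>
    child_sitemaps = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return page_urls, child_sitemaps
```

Child sitemaps returned from an index would be fetched and passed through the same function until only page URLs remain.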
The Sitemap Scraper addon also has a “Deep Crawl” facility: it visits every URL listed in the sitemap, then extracts any new URLs found on those pages that are not contained in the sitemap. Some sites list only their most important pages in the sitemap, so a deep crawl can uncover thousands of extra URLs.
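The deep crawl idea can be sketched as a breadth-first crawl seeded with the sitemap URLs. This is an illustration under stated assumptions, not ScrapeBox's code: fetch is a hypothetical caller-supplied function that returns a page's HTML (or None on failure), max_pages is a hypothetical safety limit, and restricting results to the sitemap's own host is an assumption. The text above does not say whether newly found pages are themselves crawled; this sketch does follow them, bounded by max_pages.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def deep_crawl(
    sitemap_urls: list[str],
    fetch: Callable[[str], Optional[str]],  # hypothetical fetcher
    max_pages: int = 1000,                   # hypothetical safety limit
) -> list[str]:
    """Visit sitemap URLs and return same-host URLs missing from the sitemap."""
    known = set(sitemap_urls)
    hosts = {urlparse(u).netloc for u in sitemap_urls}
    queue = deque(sitemap_urls)
    visited: set[str] = set()
    discovered: list[str] = []

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched; skip it
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc not in hosts:
                continue  # stay on the same site
            if absolute not in known:
                known.add(absolute)
                discovered.append(absolute)
                queue.append(absolute)  # newly found pages are crawled too
    return discovered
```

Passing fetch in as a parameter keeps the crawl logic separate from networking, so a real HTTP client or a dictionary of saved pages can be plugged in.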
You can also use keyword filters to control which URLs are and are not crawled. This is ideal on large sites that may contain thousands of unnecessary pages, such as a calendar, or files you wish to avoid, such as .pdf documents. You can also opt to skip URLs using https to avoid secure sections of a website listed in the sitemap file.
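A keyword filter of this kind can be sketched as a simple predicate applied to each URL. The function name filter_urls and its parameters are hypothetical, and case-insensitive substring matching is an assumption; the exact matching rules ScrapeBox uses are not described here. In a crawler, the same check would run before each URL is fetched.

```python
def filter_urls(
    urls: list[str],
    exclude_keywords: tuple[str, ...] = (),
    include_keywords: tuple[str, ...] = (),
    skip_https: bool = False,
) -> list[str]:
    """Keep URLs that pass the (hypothetical) keyword and https rules."""
    kept = []
    for url in urls:
        lowered = url.lower()
        # Optionally skip secure sections of the site.
        if skip_https and lowered.startswith("https://"):
            continue
        # Drop URLs containing any excluded keyword, e.g. "calendar" or ".pdf".
        if any(k.lower() in lowered for k in exclude_keywords):
            continue
        # If include keywords are given, require at least one of them.
        if include_keywords and not any(k.lower() in lowered for k in include_keywords):
            continue
        kept.append(url)
    return kept
```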
Once the sitemap URLs are extracted, they can be viewed or exported to a text file for further use in ScrapeBox, such as checking the PageRank of all URLs, creating an HTML sitemap, extracting page titles, descriptions and keywords, checking Google cache dates, or scanning the list with the ScrapeBox malware checker addon to ensure all your pages are clean. ScrapeBox also has a Sitemap Creator, which lets you create a sitemap from a list of URLs.
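Exporting a URL list and turning a list back into a sitemap can both be sketched in a few lines. This is an illustration, not the Sitemap Creator itself: the function names are hypothetical, the text export assumes one URL per line, and the generated <urlset> includes only the required <loc> element, omitting optional fields such as <lastmod>.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def export_url_list(urls: list[str], path: Path) -> None:
    """Write the URLs to a text file, one per line."""
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemaps.org <urlset> document listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # drop duplicates, keep original order
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # "&" etc. are escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The sitemaps.org protocol limits a single sitemap file to 50,000 URLs, so larger lists would need to be split across several files referenced from a sitemap index.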
Sitemap Scraper Tutorial
View our video tutorial showing the Sitemap Scraper in action. This free addon is included with ScrapeBox and is also compatible with our Automator Plugin.
We have hundreds of video tutorials for ScrapeBox.