6 Exclude Pages from crawling
Pages with duplicate content should be excluded via a robots.txt file so that they are not indexed more than once. If Google fetches several copies of the same content, the site's ranking may suffer. Private pages and pages with thin or low-quality content should also be blocked from Google's crawling and indexing. On a WordPress site, for example, you might also want to exclude wp-admin (where the webmaster logs in to WordPress) from crawling and indexing, since wp-admin is not meant for public view.
You can block any of your private pages from the Google web crawler with the robots.txt file. This file needs to be uploaded to the root directory of your project. It contains the rules that say which links, folders or files crawlers are allowed (or not allowed) to access.
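As a rough illustration of what such a file contains, the sketch below builds the text of a robots.txt file from a list of paths. The helper name `build_robots_txt` and the two example paths are assumptions made for this illustration, not part of any tool; the `User-agent` and `Disallow` lines themselves are standard robots.txt syntax.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt content that blocks the given paths for one user agent.

    Hypothetical helper for illustration; "*" means the rules apply to all crawlers.
    """
    lines = [f"User-agent: {user_agent}"]
    for path in disallowed_paths:
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Example paths (assumed): the WordPress admin area and one private page.
robots_txt = build_robots_txt(["/wp-admin/", "/private-page"])
print(robots_txt)
```

The resulting text would be saved as robots.txt in the root directory of the site, so that it is reachable at www.example.com/robots.txt.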
Here is how you can keep those links, folders or files from being indexed by Google:
- Create a robots.txt file and add it to Google Search Console. If you want to keep any page of your site (private/not public) from being crawled and shown in Google search, you have to specify that page in the robots.txt file.
- Create a file named “robots.txt” in the root directory of your project and add a line for each page you want to block, placed under a “User-agent: *” line so that the rule applies to all crawlers.
E.g. if you want to prevent this page from being crawled (www.example.com/private-page), then the line in the robots.txt file would be as follows: “Disallow: /private-page”
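Before uploading the file, you may want to check that the rule blocks what you expect. The sketch below uses Python's standard `urllib.robotparser` module for this; the rules string and the `/blog` URL are assumptions made for the illustration. Note that this parser is only an approximation of how Google reads the file, so treat the result as a quick sanity check.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block /private-page for all crawlers.
rules = """\
User-agent: *
Disallow: /private-page
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() returns True if the given crawler may request the URL.
private_ok = parser.can_fetch("Googlebot", "https://www.example.com/private-page")
blog_ok = parser.can_fetch("Googlebot", "https://www.example.com/blog")

print("private-page crawlable:", private_ok)
print("blog crawlable:", blog_ok)
```

Here the private page should be reported as not crawlable, while the other URL stays open to crawlers.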
This is how the robots.txt rule would look in the Search Console: