Sitemap contains URLs which are blocked by robots.txt: How to fix it


You may have noticed an error in Google Search Console stating that the sitemap contains URLs which are blocked by robots.txt. This means that some posts or pages on your site are blocked by the robots.txt file. I will discuss the solution in this post.

The robots.txt file plays a very important role on every site. Search engines consult this file when indexing your site's posts or pages; in other words, it tells them which parts of the site to crawl and index.

If you are a new Google Search Console user and have run into this robots.txt error, keep reading for the solution.


What is the robots.txt file?

The robots.txt file is a website's code of conduct. It tells search bots which posts or pages to index and which to avoid, so you can think of it as a set of instructions for search engine bots.

The robots.txt file sits in the root of the website's files, and web crawlers follow its instructions when working through a site. The file also contains a link to the XML sitemap (here you can learn how to create a Blogger sitemap). Search engine bots visiting a website access the robots.txt file first; from it they learn which parts of the site to crawl and which to skip.

It is important to note that not all search engine bots obey the robots.txt file. It is mainly the bots of major search engines, such as Google and Bing, that follow it.
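As a quick illustration, a robots.txt file is just a plain text file of rules like the following (the /private/ path and example.com domain here are placeholders of my own, not something your site needs):

User-agent: *
Disallow: /private/
Allow: /
Sitemap: https://example.com/sitemap.xml

Here every bot (User-agent: *) is told to stay out of /private/ but may crawl everything else, and the last line points crawlers to the XML sitemap.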


What is the reason for the "Indexed, though blocked by robots.txt" warning?

There are two ways a URL on your site can be kept away from Google:

  1. The robots.txt file contains a disallow rule for it
  2. The HTML page carries a noindex tag

If both are in place, you are giving the crawler two directions at once:

  1. Do not crawl the page.
  2. Do not index it.

In most cases, this contradiction is what lies behind the "blocked by robots.txt" error. If you do not want a page indexed by Google, you must give Googlebot the correct rules: if you block crawling in robots.txt, the bot never sees the noindex instruction in the HTML and therefore cannot remove the page from the index.
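To make the conflict concrete, here is a hypothetical example (the example.com domain and /old-page/ path are mine, purely for illustration). The robots.txt rule stops Googlebot from ever fetching the page, so the noindex tag inside it goes unread:

# robots.txt -- crawling of the page is forbidden
User-agent: *
Disallow: /old-page/

<!-- /old-page/index.html -- this tag is never seen, because the page is never crawled -->
<meta name="robots" content="noindex">

To get such a page out of the index, remove the Disallow rule so Googlebot can crawl the page and read the noindex tag.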


How do you fix the "sitemap contains URLs which are blocked by robots.txt" error?

The main reason sitemap URLs get blocked is that the robots.txt file is configured incorrectly. So when you want to keep a URL out of the index, you need to communicate that to Googlebot with the right rule.

To fix the "Sitemap contains URLs which are blocked by robots.txt" error, check the following.


How to fix the "Indexed, though blocked by robots.txt" error on Blogger

If you are a Blogger user and the "Indexed, though blocked by robots.txt" error occurs, follow the steps below to resolve it.

  1. Go to Settings > Crawlers and indexing on the left side of the Blogger dashboard. Activate the “Enable custom robots header tags” option under the Crawlers and indexing section.

  2. Click on the “Home page tags” option, activate “all” and “noodp”, and save.

  3. Click on the “Archive and search page tags” option, activate “noindex” and “noodp”, and save (this keeps duplicate-prone archive and search pages out of the index; see the example after this list).

  4. Click on the “Post and page tags” option, activate “all” and “noodp”, and save.
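For context, the noindex setting in step 3 has the same effect as marking every archive and search page with a robots meta tag like the one below. This is only an illustration of what the directive means; Blogger applies it for you, so you do not add this tag yourself:

<meta content='noindex' name='robots'/>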

After completing the above tasks, you will need to create and submit a valid robots.txt file. To do that:

  1. Copy the code below.

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /label
Disallow: /category
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Replace yoursite.com with your own site's address.

  2. Go to Settings > Crawlers and indexing in the Blogger dashboard.

  3. Activate the “Enable custom robots.txt” toggle.

  4. Click on the “Custom robots.txt” option. A text box will open; paste the code there and save it. (You can sanity-check the rules with the short script after this list.)
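If you want to verify that the rules behave as intended, here is a minimal sketch using Python's standard-library urllib.robotparser. The yoursite.com URLs are placeholders; this check is my own illustration, not part of Blogger's workflow:

from urllib.robotparser import RobotFileParser

# Parse the same rules you pasted into Blogger.
rules = """User-agent: *
Disallow: /search
Disallow: /label
Disallow: /category
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Search pages should be blocked, ordinary posts allowed.
print(rp.can_fetch("Googlebot", "https://yoursite.com/search/label/news"))     # False
print(rp.can_fetch("Googlebot", "https://yoursite.com/2024/01/my-post.html"))  # True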

After completing the above tasks, go to Google Search Console: from the Blogger dashboard, click Settings > Google Search Console. Click on the Coverage option on the left side of the Search Console dashboard. Then select “Valid with warnings” > “Indexed, though blocked by robots.txt” and click the “Validate Fix” button. That's it; the job is done.
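While you wait for Google to revalidate, you can also check for yourself that no sitemap URL is still blocked. The sketch below fetches your live robots.txt and sitemap and tests every listed URL; yoursite.com is again a placeholder, and note that if your sitemap is an index file (as Blogger's /sitemap.xml usually is) you would rerun the loop on each sub-sitemap it lists:

import urllib.request
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"  # placeholder: your blog's address

# Read the live robots.txt rules.
rp = RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()

# Fetch the sitemap and test every <loc> URL against the rules.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
with urllib.request.urlopen(SITE + "/sitemap.xml") as response:
    tree = ET.parse(response)

for loc in tree.findall(".//sm:loc", ns):
    url = loc.text.strip()
    if not rp.can_fetch("Googlebot", url):
        print("Blocked by robots.txt:", url)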


Last Word

If you correct the robots.txt file by following the steps above, Google will crawl your site again; you will have to wait a few days for this. We hope you can now easily solve the "Indexed, though blocked by robots.txt" problem. Once you do, the "sitemap contains URLs which are blocked by robots.txt" problem will no longer exist.
