Optimizing the robots.txt File for WordPress Indexing
April 12th, 2011 by
A robots.txt file tells search engines and other crawlers which parts of your website they may crawl and, as a result, index.
WordPress uses a virtual robots.txt file by default.
This means that if you open your blog’s root directory, you will not find a file named robots.txt unless you created one manually.
However, you can view the virtual robots.txt that WordPress generates by adding “/robots.txt” to your blog’s URL (e.g. yourblog.com/robots.txt).
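To illustrate where crawlers look for the file, here is a minimal Python sketch. The function name robots_txt_url and the example URLs are hypothetical; the only assumption is the standard convention that robots.txt is served from the root of each host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # The file is looked up at the root of scheme + host,
    # no matter how deep the page path is.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical example URL, used for illustration only.
print(robots_txt_url("http://yourblog.com/2011/04/some-post/"))
```

Because the lookup is per host, a blog served from a subdomain is checked at that subdomain's own /robots.txt, and a blog installed in a subdirectory still shares the robots.txt at the root of its host.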

The virtual robots file includes the following 2 lines (unless it is modified by some kind of plugin or by your blog’s privacy settings):
User-agent: *
Disallow:
These lines tell all crawlers (called user agents) that all the pages and directories of your site can be indexed, including your admin pages (such as yourblog.com/wp-admin/ ).
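The effect of these two lines can be checked with Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed and the URL are made up for the example, and the standard-library parser is a simplified stand-in for how a real search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# The two lines of the default virtual robots.txt.
DEFAULT_RULES = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing, so even the admin area is allowed.
print(is_allowed(DEFAULT_RULES, "http://yourblog.com/wp-admin/"))
```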
As you may have guessed, indexing admin pages in search engines is not recommended, so you should definitely do some tweaking to these settings in order to improve SEO and prevent irrelevant pages from being indexed.
I recommend adding the following rules:
- If you use both categories and tags in your blog, do not index both (reason: duplicate content).
- Prevent indexing of search results if your blog has a search feature (reason: duplicate content).
- Consider blocking author and date archives, depending on whether you index categories and tags (reason: duplicate content).
- Prevent indexing of all admin pages.
- Advanced: consider preventing indexing of URLs that include query arguments, depending on whether you use SEF (search-engine-friendly) permalinks. Example: yourblog.com?s=q
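The rules above can be sketched as a small Python helper that writes out the matching robots.txt lines. Everything here is an assumption for illustration: the function build_robots_txt and its parameters are hypothetical, and the paths assume WordPress's default /category/, /tag/ and /author/ bases and the default ?s= search argument. Date archives are left out because their URLs depend on the permalink structure.

```python
def build_robots_txt(keep_taxonomy: str = "tags",
                     block_search: bool = True,
                     block_authors: bool = True) -> str:
    """Build robots.txt text from a few yes/no choices (illustrative only)."""
    if keep_taxonomy not in ("tags", "categories"):
        raise ValueError("keep_taxonomy must be 'tags' or 'categories'")

    lines = ["User-agent: *", "Disallow: /wp-admin/"]  # always block admin pages

    # Index either categories or tags, not both (duplicate content).
    if keep_taxonomy == "tags":
        lines.append("Disallow: /category/")
    else:
        lines.append("Disallow: /tag/")

    if block_search:
        lines.append("Disallow: /?s=")       # default WordPress search URLs
    if block_authors:
        lines.append("Disallow: /author/")   # author archives

    return "\n".join(lines) + "\n"


print(build_robots_txt(keep_taxonomy="tags"))
```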
There are a couple of ways to set these rules:
- Install a plugin like Yoast’s Robots meta. This plugin adds meta tags to the head section of your pages that tell search engines whether or not to index them. It also lets you control search engine indexing for individual posts or pages.
- Create a robots.txt file. This is very simple: write the rules in a plain-text editor such as Notepad and save the file as robots.txt. Alternatively, you can generate the file with an online robots.txt generator. Once you have finished, upload the file to your blog’s root directory.
I use the following rules in my robots.txt file. It covers the steps above plus a few more.
Note that I block search engines from crawling my category pages because I decided to use tags in this blog.
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /wp-login.php
Disallow: /*wp-login.php*
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /author
Disallow: /contact/
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /category/*
Disallow: /category/
Disallow: /login/
Disallow: /wget/
Disallow: /httpd/
Disallow: /i/
Disallow: /f/
Disallow: /t/
Disallow: /c/
Disallow: /j/
Disallow: /*.php$
Disallow: /*?*
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# alexa archiver
User-agent: ia_archiver
Disallow: /

# disable duggmirror by Digg
User-agent: duggmirror
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow: /wp-includes/
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
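Several of these rules rely on wildcards. As a rough sketch of how such patterns are commonly interpreted, the Python snippet below assumes the semantics Google documents: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a prefix match. The function rule_matches is a hypothetical helper, not part of any library, and other crawlers may handle wildcards differently or not at all.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (+ query)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" for each "*" wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"  # "$" in the rule means the URL must end here
    # re.match anchors at the start, which gives prefix matching by default.
    return re.match(regex, path) is not None


print(rule_matches("/*.php$", "/wp-login.php"))   # ends in .php
print(rule_matches("/*.php$", "/index.php?p=1"))  # does not end in .php
print(rule_matches("/*?*", "/?s=q"))              # contains a query string
```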
Once you have set a robots.txt file for your blog, you can test it to see if it does what it should (blocking certain pages).
To test it, you can use the Crawler Access tool in Google Webmaster Tools:
- In Google Webmaster Tools, go to Site Configuration -> Crawler Access.
- On this page, make sure the robots.txt content shown in the text area was downloaded recently by Google and reflects the most recent changes you have made.
- In the URLs box, type different URLs to test against (for example, yourblog.com/wp-admin/) and click on the Test button.
The result displays something like “Blocked by line 3: Disallow: /wp-admin”.
If it doesn’t, you missed something when creating the file.
Another useful way to test the effectiveness of your robots.txt file is to use Robots.txt Analyzer.
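A similar check can be approximated locally before uploading the file. The Python sketch below is a simplified illustration, not a replacement for either tool: first_blocking_line is a hypothetical helper that only does plain prefix matching within a single User-agent group and ignores wildcards and Allow rules.

```python
# A shortened sample file, used for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
"""


def first_blocking_line(robots_txt, path, agent="*"):
    """Return (line_number, rule) of the first Disallow that blocks path, else None."""
    in_group = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_group = value == agent        # simplification: exact name match
        elif field == "disallow" and in_group and value and path.startswith(value):
            return number, line
    return None


hit = first_blocking_line(SAMPLE, "/wp-admin/")
if hit:
    print(f"Blocked by line {hit[0]}: {hit[1]}")
```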

This is a very informative article, especially from an SEO point of view.
Thanks, I’m glad you find it useful.
Well done. Simple enough. Thanks for letting us know about Robots.txt Analyzer too. First time I heard about it.
No problem, glad I could help.
I once had some problems with duplicate content on a website, so I set robots.txt to allow indexing of only certain pages (it was a 5-page website). When I wanted to add more pages to the website, I had big problems getting Google to index the new pages. Indexing took more than a few weeks, while on other websites it took less than 2 hours.
I know, Google can sometimes be unpredictable in its indexing. Just gotta be patient and eventually it works out.
Thanks for the great tip.
What about blocking all posts from a specific category? This was great information. My category name is “episodes”. How would that look in the robots.txt file? Also, if I’ve set my permalink structure to not display the category, would this make a difference for blocking the user agents from that category and all of its posts?
It should be Disallow: /category/episodes/
The permalink settings apply to posts and not to the category itself, so no, they wouldn’t block search engines from that category. You should add that line to the robots.txt file.
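A quick way to see what that line does and does not cover is Python's standard urllib.robotparser. The URLs below are hypothetical, and the post URL assumes a permalink structure without the category in it.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /category/episodes/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The category archive itself is blocked by the rule...
archive_allowed = parser.can_fetch("*", "http://yourblog.com/category/episodes/")
# ...but a post whose URL does not start with /category/episodes/ is not.
post_allowed = parser.can_fetch("*", "http://yourblog.com/some-episode-post/")

print(archive_allowed, post_allowed)
```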
Thanks Omer Greenwald. I was looking for this solution.
Nice, thanks.
Will this one robots.txt file work for all of my multi-site blogs that are installed on the top-level domain that this robots.txt file is written for? Or do I have to create a robots.txt file for each subdomained blog? Currently I have about five blogs installed on a top-level domain using WordPress multisite, but I’m using one robots.txt file that is installed in the root of the top-level domain. For instance: mytopleveldomain.org and mysubdomainsite.mytopleveldomain.org (do I have to create a robots.txt file for the subdomain site)? If so, how would I accomplish this on blogs that are created virtually by multisite?
Hi Penina, unfortunately I don’t have enough experience with multisite to answer that. However, I suggest you go to http://mu.wordpress.org/forums/ and see whether there is a similar question, or post this one.
Thanks for such a good explanation. The Big G is really hard to please.
Will this robots.txt file you created still allow Google to spider and index my blogs, but not the other files?
Wow! Very nice robots.txt code. Thank a lot.
This is a super-useful post, clearly explained at just the level I needed.
I’d noticed that I had lots of duplicate metatags due to the different categories on my WordPress site. I’m a beginner coder but was able to follow your instructions and upload a robots.txt file which (cross fingers) should improve my site’s Google performance.
Thank you!
Thank you for your article on Robots.txt.
On one of my sites, the images were getting indexed and not the actual post. I have used the above robots.txt. Hopefully this will sort out the problem.
I am going to use this robots.txt file for my site, Thank you.
After analyzing it with the Robots.txt Analyzer you recommended, it gives the following errors:
Line 15: Disallow: */trackback # the url should start with /
Line 16: Disallow: */feed # the url should start with /
Line 17: Disallow: */comments # the url should start with /
Thanks for info. Now I will add robots.txt file to my site. Thanks
a few questions:
is it Disallow: /wp-includes or Disallow: /wp-includes/ with a trailing slash?
I am asking because I used Disallow: /wp-includes/ and a lot of files within that directory are still being indexed by Google.
Besides that, I also had Disallow: /wp-content/ in my robots.txt, and that virtually eliminated all my images from Google Image Search…
Any advice?
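On the trailing-slash question, the two forms differ only in how much they match as a prefix. The sketch below uses Python's standard urllib.robotparser with made-up URLs; the helper name allowed and the /wp-includes-old/ directory are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(rule: str, url: str) -> bool:
    """Check url against a one-rule robots.txt for all user agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return parser.can_fetch("*", url)


inside = "http://yourblog.com/wp-includes/js/jquery.js"
sibling = "http://yourblog.com/wp-includes-old/file.js"

# Both forms block files inside the directory.
print(allowed("/wp-includes", inside), allowed("/wp-includes/", inside))
# Only the form without the slash also blocks paths that merely share the prefix.
print(allowed("/wp-includes", sibling), allowed("/wp-includes/", sibling))
```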
Will the code for the robots.txt work if I have WordPress installed in a subdirectory = site.com/blog? Can you work up an example if it is a lot different? Thanks!
I think the virtual robots files are very intelligent.