Magento, Google Merchant Center & Robots.txt

If you are looking to make SEO improvements to your Magento website, you have probably considered using a robots.txt file to keep search engines from crawling specific pages. A word to the wise: if you are listing your products on Google Merchant Center, you may want to tread carefully.

The issue: robots.txt blocking access to Magento product images

The core issue is that your robots.txt file could be blocking Google from accessing your product images (used in Google Shopping feeds, etc.), especially if you’ve just copied the default Magento robots.txt into your site.

If you have already used the default Magento robots.txt file, you may have run into the error below:
'The submitted image URLs seem to be blocked by robots.txt. Google will not be able to display these images together with the products. Please change your robots.txt file to allow Google to download the image.'

If you are receiving the error above, the chances are that your products are slowly being de-listed from product feed listings. Let’s look at how this can be fixed. I’ll start with a sample robots.txt file.
User-agent: *
# Directories
Disallow: /404/
Disallow: /media/
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

This is a simplified example of what your robots.txt file may look like for a Magento store. The problem is with the following line:
Disallow: /media/

This line blocks all search engine bots from accessing the media folder, where the product images (among other files) sit. That includes Googlebot-Image, the crawler Google uses to fetch images.
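To see the effect of that line without waiting on Google, the sample rules can be run through a parser locally. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper and the `example.com` URLs are made up for illustration, and Python's parser is not Google's, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The simplified sample rules from the article.
ROBOTS_TXT = """\
User-agent: *
# Directories
Disallow: /404/
Disallow: /media/
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: a product image under /media/ and an ordinary page.
image_url = "https://example.com/media/catalog/product/s/h/shoe.jpg"
page_url = "https://example.com/blue-shoe.html"

print(is_allowed(ROBOTS_TXT, "Googlebot-Image", image_url))
print(is_allowed(ROBOTS_TXT, "Googlebot-Image", page_url))
```

The first check reports the image as blocked, while the second reports the ordinary page as allowed, which matches the symptom in the Merchant Center error.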

Solution: ensuring your Magento product images are accessible to Google Images

If you understand how the robots.txt file works, you may be tempted to try the following fix, which gives Google’s image crawler (Googlebot-Image) access to the media folder.
User-agent: googlebot-image
Allow: /media/

As it turns out, this doesn’t seem to work. As an alternative, you can simply try removing the ‘Disallow: /media/’ line altogether. However, if that doesn’t work either, you’ve landed yourself in a tricky situation.
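On paper, a Googlebot-Image group does grant access, which is why the failure in practice is surprising. The sketch below shows how a standards-following parser, here Python's standard `urllib.robotparser`, reads a cut-down version of the combined file. It is not Google's own parser, and the `example.com` URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Cut-down rules: the general block plus the Googlebot-Image group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/

User-agent: googlebot-image
Allow: /media/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder product image URL.
image_url = "https://example.com/media/catalog/product/s/h/shoe.jpg"

image_bot_allowed = parser.can_fetch("Googlebot-Image", image_url)
other_bot_allowed = parser.can_fetch("SomeOtherBot", image_url)
print(image_bot_allowed, other_bot_allowed)
```

This parser lets the image crawler through and keeps every other bot out, so the rules themselves are reasonable. The sketch cannot tell you how Merchant Center behaves.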

Try waiting a couple of days after making a change, as it often takes this long for Google to re-fetch the updated robots.txt file. If you are unsure whether the changes have been picked up, look in your Google Webmaster Tools account, under the ‘Crawl > Blocked URLs’ page. Here you will see the contents of your robots.txt file from Google’s perspective.

Even though the above content in your Magento website’s robots.txt file should do the job, it appears you actually need to use the following at the bottom of your robots.txt file to re-allow Google’s image bot to access what it needs:
User-agent: Googlebot

If you urgently need to get your products back in the listings, this solution worked for our client, but it came at a price. Adding these lines to the end of your Magento store’s robots.txt file effectively overrides everything before them for Google’s crawlers, giving Google access to all the pages you originally wanted to block.
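One way to soften that trade-off is to repeat the rules you still need inside the Googlebot group and leave out only `/media/`. This is an untested assumption, not part of the fix that worked for our client. Google documents that a crawler obeys only the most specific group that matches it, so nothing carries over from the `*` group. The sketch below checks such a file with Python's `urllib.robotparser`, which approximates but is not Google's parser. The group contents and `example.com` URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: the Googlebot group repeats every rule except /media/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /404/
Disallow: /media/
Disallow: /catalogsearch/

User-agent: Googlebot
Disallow: /404/
Disallow: /catalogsearch/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs on a hypothetical store.
image_url = "https://example.com/media/catalog/product/s/h/shoe.jpg"
search_url = "https://example.com/catalogsearch/result/?q=shoe"

# Python matches "Googlebot-Image" to the Googlebot group by substring;
# Google documents a similar fallback for its image crawler.
for agent in ("Googlebot", "Googlebot-Image", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, image_url), parser.can_fetch(agent, search_url))
```

Under these rules, both Google crawlers can reach the image but not the search results, and other bots remain blocked from both. Whether Merchant Center accepts the result still needs checking against your own feed.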