May 14, 2015

Providing a unique robots.txt file per site within Magento’s multisite

This is a guide on how to provide a unique robots.txt file per site within a Magento multisite installation.

As this solution isn’t Magento-specific (it’s server-specific), it should work with any version of Magento Community and Magento Enterprise Editions (we tested it on Magento CE 1.4 and Magento CE 1.9).

There are a couple of requirements before we begin:

  1. Your server must be running Apache (this method won’t work for nginx servers)
  2. Your sites should live on different domains (e.g., example.com and example.de)

Having a robots.txt file at the root of your website is essential for many reasons. Search engines require the file to be named robots.txt in order to be able to read and understand it.

What happens, however, when you have multiple sites all sharing the same root directory? With only one robots.txt file, the same rules would affect every individual website. This is obviously not ideal for stores running within a single Magento installation using the multisite feature, as each may have different requirements.

We can get around this with two different methods. The first involves a redirect. You can view the pros and cons of using a redirect in the following table:

Method 1 – Redirect the robots.txt to separate files

Advantages                                  Disadvantages
Easily set up                               The redirect negatively affects the SEO of your site
Requires minimal knowledge of programming

The file we will need is the .htaccess file, which can be found in your root directory. Within this we need to add a simple redirect like so:

# Rewrite any request for robots.txt to a text file named after the domain
RewriteRule ^robots\.txt$ %{HTTP_HOST}.txt [L]

This rule essentially redirects requests for your robots.txt file to a domain_name.txt file. You could therefore create one such file per site and have each read via the redirect. This would work like so:

Domain Name          Redirected robots.txt file
example.com          example.com/example.com.txt
ilovelearning.com    ilovelearning.com/ilovelearning.com.txt

Within each of these files you can place the relevant information that you need per site.
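For instance, a hypothetical example.com.txt (the rules here are purely illustrative) might contain:

User-Agent: *
Disallow: /app/
Disallow: /var/

while ilovelearning.com.txt would carry that store’s own rules.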

Method 2 – Unique robots.txt files using PHP

The second method requires a bit more work. However, it involves no redirects, making it friendlier to search engines crawling through your site.

To begin with, we will have to tell our server that the robots.txt file actually operates as PHP. To do this, add the following to your .htaccess file:

<FilesMatch "^robots\.txt$">
    SetHandler application/x-httpd-php
</FilesMatch>
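Note that SetHandler application/x-httpd-php assumes PHP is running as an Apache module (mod_php); if your server runs PHP another way (for example via PHP-FPM), the handler value you need will differ.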

With this in place, we can now add PHP to the robots.txt file itself. Next we need to add a little code!

<?php
// Send the correct header so crawlers treat the output as plain text
@header("Content-Type: text/plain");

$content = "";
$url = $_SERVER['SERVER_NAME'];

if ($url == "a.site.com") {
    $content .= "User-Agent: * \n";
    $content .= "Disallow: /app/ \n";
    // ...
} elseif ($url == "b.site.com") {
    $content .= "User-Agent: * \n";
    $content .= "Disallow: /skin/ \n";
    // ...
} else {
    // ...
}

echo $content;

A little explanation is in order. We first create an empty variable so that we can add our robots.txt information to it. Next, we ask for the host name that the crawler is currently requesting.

We can then add specific information based upon which URL the file is being called from. This allows us to only use a single robots.txt file throughout the whole of our multisite.

Ensuring that there is a newline at the end of every addition (with the use of “\n”) is very important. We need to ensure that the file displays exactly as intended, otherwise the crawler may not be able to understand what we are telling it.
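For example, without the trailing “\n” the first two directives above would run together as User-Agent: *Disallow: /app/ on a single line, which a crawler cannot reliably parse.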

The above example would output the following based upon the current URL:

For a.site.com:

User-Agent: *
Disallow: /app/

For b.site.com:

User-Agent: *
Disallow: /skin/
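As a design note, if you run many sites the if/elseif chain above can get unwieldy. A minimal alternative sketch, using a lookup array keyed by host name (the host names and paths below are illustrative), could look like this:

<?php
@header("Content-Type: text/plain");

// Hypothetical per-host rule sets; replace with your own domains and paths
$rules = array(
    "a.site.com" => array("/app/"),
    "b.site.com" => array("/skin/"),
);

$host = $_SERVER['SERVER_NAME'];
$content = "User-Agent: * \n";

// Fall back to an empty rule set for any host we don't recognise
$paths = isset($rules[$host]) ? $rules[$host] : array();

foreach ($paths as $path) {
    $content .= "Disallow: " . $path . " \n";
}

echo $content;

Either way, you can confirm each domain serves its own rules by requesting robots.txt on each site (in a browser or with curl) and comparing the output.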


That should be it! You are now giving search engines a different robots.txt file depending on which website they are currently crawling on your Magento multisite installation.
If you desire any further help with Magento, don’t hesitate to check out our comprehensive Magento consultancy services.