Thursday 28 January 2016

Hiding jsessionid parameter from Google using Apache

If you're running a website on JBoss, you may discover that Google has indexed your pages with a jsessionid parameter in the links (for example /page.jsp;jsessionid=...).
The Google crawl bot does not support cookies, so JBoss falls back to URL rewriting: it appends ;jsessionid=... to the links in order to maintain session state without cookies. These parameters can hurt your Google ranking and indexing efficiency, because the same page can be indexed multiple times under different session ids, which dilutes its ranking. They also make for ugly links.
If you still want to support users who don't accept cookies, but would like Google to see cleaner links, you can use Apache's mod_rewrite to redirect Googlebot's requests to the clean URL, leaving the normal behaviour in place for the rest of your users.
Assuming mod_rewrite is enabled in your Apache instance, add this configuration to your Apache config:
 # Strip jsessionid path parameters from URLs requested by Googlebot.
 # The id is matched up to the next / or ;, so suffixes such as .node1 are removed too.
 RewriteEngine On
 RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
 RewriteRule ^(.*);jsessionid=[^/;]+(.*)$ $1$2 [L,R=301]
This rule says: for any request whose user agent contains "googlebot" (matched case-insensitively), send a permanent (301) redirect to the same URL with the jsessionid removed. Everyone else keeps their jsessionid links. It seems to work nicely.
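To see what the condition and rule do to a given request, here is a minimal Python sketch that mimics them with the standard re module. It assumes Python's regex syntax behaves like Apache's PCRE for these simple patterns, and that the input is the URL path only (mod_rewrite does not include the query string in the RewriteRule match). The session id is matched with [^/;]+ (anything up to the next / or ;), so ids with a jvmRoute suffix such as .node1 or with - and _ characters are removed as well. The function name rewrite is hypothetical, used only for illustration.

```python
import re

# Hypothetical Python equivalents of the RewriteCond and RewriteRule patterns.
GOOGLEBOT = re.compile(r"googlebot", re.IGNORECASE)       # RewriteCond ... [NC]
JSESSIONID = re.compile(r"^(.*);jsessionid=[^/;]+(.*)$")   # RewriteRule pattern


def rewrite(user_agent: str, path: str):
    """Return the 301 redirect target, or None if the request passes through.

    `path` is the URL path only; the query string is not part of the match.
    """
    if not GOOGLEBOT.search(user_agent):
        return None  # condition fails: normal visitors keep their jsessionid
    match = JSESSIONID.match(path)
    if match is None:
        return None  # no jsessionid in the path, nothing to rewrite
    return match.group(1) + match.group(2)  # the "$1$2" substitution


bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(rewrite(bot, "/shop/item.jsp;jsessionid=ABC123.node1"))  # /shop/item.jsp
print(rewrite("Mozilla/5.0 Firefox", "/shop/item.jsp;jsessionid=ABC123"))  # None
```

Only the crawler gets redirected; a browser sending the same path is left alone, which is what keeps cookie-less sessions working for real users.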
