Detecting Search Bots PHP Script

by admin on December 1, 2009

in Tutorials

Search Engine Bot Detection

When writing syndication or ad-tracking scripts, it is often important to know which clicks on your links come from search bots rather than actual users, so that you can measure real user interaction separately from bot traffic. Here is a script we wrote to do that.

First, we compile an array of known search bots. (This list is not comprehensive, but it does cover the main web crawlers, and we will update it as we see the need.)

//********************* || LIST OF KNOWN SEARCH BOTS ||
function GetBotList(){
	$BotList = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi", "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory", "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot", "crawler", "www.galaxy.com", "Googlebot", "Googlebot/2.1", "Google Webmaster", "Scooter", "Slurp", "msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz", "Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot", "Mediapartners-Google", "Sogou web spider", "WebAlta Crawler", "MJ12bot");
	return $BotList;
}
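
As a list like this grows, repeated names and names already covered by a shorter entry tend to creep in. Here is a minimal sketch of one way to tidy it, assuming the list is matched by plain substring. TidyBotList() is a hypothetical helper and is not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): removes exact
// duplicates and entries that already contain another, shorter entry.
function TidyBotList(array $botList) {
    $unique = array_values(array_unique($botList));
    $tidy   = array();

    foreach ($unique as $candidate) {
        $covered = false;
        foreach ($unique as $other) {
            // Any user agent that contains $candidate also contains $other,
            // so $candidate adds nothing to a substring check.
            if ($other !== $candidate && strpos($candidate, $other) !== false) {
                $covered = true;
                break;
            }
        }
        if (!$covered) {
            $tidy[] = $candidate;
        }
    }

    return $tidy;
}

// Sample input with a repeated name and a redundant versioned name.
print_r(TidyBotList(array("Googlebot", "Googlebot/2.1", "Scooter", "Scooter", "Slurp")));
```

The sample call keeps Googlebot, Scooter and Slurp. Keep in mind that tidying can change which name is reported if a longer entry used to come first in the list.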

Next, we loop through the search bot array and check each entry against our $_SERVER['HTTP_USER_AGENT'] variable.

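//********************* || SEARCH BOT DETECTION FUNCTION ||
// Returns "BOT: <name>" for the first known bot found in the user agent, or false if none match.
function DetectBot(){
	$BotList   = GetBotList();
	// The header is optional, so fall back to an empty string when it is missing.
	$UserAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

	foreach($BotList as $bot) {
		// ereg() was deprecated in PHP 5.3 and removed in PHP 7. strpos() does a
		// case-sensitive substring check without treating the name as a regular expression.
		if(strpos($UserAgent, $bot) !== false) {
			return "BOT: " . $bot;
		}
	}

	return false;
}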
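
To try the matching logic outside of a web request, it can help to pass the user agent in as an argument. The sketch below assumes a hypothetical DetectBotIn() variant and a shortened bot list; neither is part of the original script.

```php
<?php
// Hypothetical variant of DetectBot() that takes its inputs as arguments,
// so it can be run from the command line without $_SERVER.
function DetectBotIn($userAgent, array $botList) {
    foreach ($botList as $bot) {
        // Case-sensitive substring match; the first hit wins.
        if (strpos($userAgent, $bot) !== false) {
            return "BOT: " . $bot;
        }
    }
    return false;
}

// Shortened list, for illustration only.
$bots = array("Googlebot", "Slurp", "msnbot", "Baiduspider");

// A user agent string that Googlebot has sent.
$crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
// Illustrative desktop browser string.
$browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

var_dump(DetectBotIn($crawler, $bots)); // string(14) "BOT: Googlebot"
var_dump(DetectBotIn($browser, $bots)); // bool(false)
```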

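Finally, we use the PHP get_browser() function to return the browser and platform in a readable form, and we double-check the result against $_SERVER['HTTP_USER_AGENT'] with DetectBot(). Note that get_browser() only works when the browscap setting in php.ini points to a browscap.ini file.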

//********************* || USER AGENT DETECTION FUNCTION ||
// Returns array(browser, platform); known bots are reported as "BOT: <name>".
function DetectBrowserInfo() {
	// get_browser() needs the browscap setting in php.ini and returns false without it.
	$UserAgent   = get_browser(null, true);
	$UserBrowser = isset($UserAgent['parent'])   ? $UserAgent['parent']   : null;
	$UserOS      = isset($UserAgent['platform']) ? $UserAgent['platform'] : "unknown";

	// Call DetectBot() once and reuse the result.
	$Bot = DetectBot();

	if($Bot) {
		$UserBrowser = $Bot;
	} elseif ($UserBrowser === null) {
		$UserBrowser = "BOT: Unknown";
	}

	if($UserOS == "unknown") {
		$UserOS = "BOT: N/A For Traffic";
	}

	return array($UserBrowser, $UserOS);
}
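
In a tracking script, the returned array can then be used to keep bot clicks out of the user totals. Here is a rough sketch, assuming every bot label starts with "BOT:" as in the functions above. IsHumanVisit() and the sample values are hypothetical and only stand in for real DetectBrowserInfo() results.

```php
<?php
// Hypothetical helper: decides whether a click should count as user traffic,
// given an array shaped like the one DetectBrowserInfo() returns.
function IsHumanVisit(array $userAgentDetails) {
    $browser = (string) $userAgentDetails[0];
    // Bot labels start with "BOT:", so strpos() returns 0 for them.
    return strpos($browser, "BOT:") !== 0;
}

// Sample values standing in for DetectBrowserInfo() results.
$visits = array(
    array("Firefox", "Win10"),
    array("BOT: Googlebot", "BOT: N/A For Traffic"),
    array("BOT: Unknown", "BOT: N/A For Traffic"),
);

$humanClicks = 0;
foreach ($visits as $details) {
    if (IsHumanVisit($details)) {
        $humanClicks++;
    }
}

echo $humanClicks; // 1
```

Only the browser label is checked here, because a real browser on an unrecognized platform would still get the "BOT: N/A For Traffic" platform label.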

{ 1 trackback }

Top 35 Web Crawler Search Bots — Streamlined Fusion Blog
December 22, 2009 at 10:41 pm

{ 1 comment… read it below or add one }

ganool February 3, 2010 at 8:41 pm

Sometimes a bad robot uses the same user agent as a search engine bot.
