Original post: https://3xn.nl/projects/2023/09/20/crude-solution-to-ban-bots-by-their-user-agent/
I’ve very much simplified the script that instantly redirects unwanted traffic away from the server. Currently, I am using a very cheap VPS to receive all that traffic.
Here ya go:
<?php
// CC-BY-NC (2023)
// Author: FoxSan - fox@cytag.nl
// This is a functional but dirty hack to block bots, spiders and indexers by looking at the HTTP USER AGENT.
// Traffic that meets the conditions is being yeeted away to any place of your choice.
//////////////////////////////////////////////////////////////
// Emergency bypass
// goto end;
//////////////////////////////////////////////////////////////
// attempt to basically just yeet all bots to another website
$targetURL = "https://DOMAIN.TLD/SUB/";
// Function to check if the user agent appears to be a bot or spider
function isBot()
{
$user_agent = $_SERVER['HTTP_USER_AGENT'];
$bot_keywords = ['bytespider', 'amazonbot', 'MJ12bot', 'YandexBot', 'SemrushBot', 'dotbot', 'AspiegelBot', 'DataForSeoBot', 'DotBot', 'Pinterestbot', 'PetalBot', 'HeadlessChrome', 'GPTBot', 'Sogou', 'ALittle Client', 'fidget-spinner-bot', 'intelx.io_bot', 'Mediatoolkitbot', 'BLEXBot', 'AhrefsBot'];
foreach ($bot_keywords as $keyword) {
if (stripos($user_agent, $keyword) !== false) {
return true;
}
}
return false;
}
// Check if the visitor is a bot or spider
if (isBot()) {
// yeet
header("Location: $targetURL");
// Exit to prevent further processing
exit;
}
end:
// If the visitor is not a bot, spider, or crawler, continue with your website code.
//////////////////////////////////////////////////////////////////////
?>
Here’s a list of stuff that I have in my .htaccess files on various websites.
I want to work on my website, but any other visitor should be booted to another website so I can work in peace. Sidenote: It's forever since I last used this, so it might work. Or not.
---
# YOUR IP address goes here:
RewriteCond %{REMOTE_ADDR} !^000\.000\.000\.000$
# And provides you access to:
RewriteCond %{REQUEST_URI} !^https://DOMAIN.TLD$ [NC]
# Fine, go have all the media as well
RewriteCond %{REQUEST_URI} !\.(jpg|jpeg|png|gif|svg|swf|css|ico|js)$ [NC]
# Any other visitor can go visit the following website:
RewriteRule .* https://DOMAIN.TLD/ [R=302,L]
# Hey, no viewing access to this file
<FilesMatch "^.ht">
Order deny,allow
Deny from all
</FilesMatch>
# Disable Server Signature
ServerSignature Off
# SSL all the things!
RewriteCond %{HTTPS} !=on
RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R,L]
# No WWW
RewriteCond %{HTTP_HOST} ^www\.DOMAIN\.TLD$
RewriteRule ^/?$ "https\:\/\/DOMAIN\.TLD\/" [R=301,L]
# Do we like Symlinks? Yeah we do.
Options +FollowSymlinks
# No open directories or directory listings. What is this... 1998?
Options All -Indexes
IndexIgnore *
# Rewrite rules to block out some common exploits.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]
# PHP doohickies
php_flag register_globals off
php_flag safe_mode off
php_flag allow_url_fopen off
php_flag display_errors off
php_value session.save_path '/tmp'
php_value disable_functions "exec,passthru,shell_exec,system,curl_multi_exec,show_source,eval"
# File Injection Protection, or a code-condom. What.
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{QUERY_STRING} [a-zA-Z0-9_]=http:// [OR]
RewriteCond %{QUERY_STRING} [a-zA-Z0-9_]=(\.\.//?)+ [OR]
RewriteCond %{QUERY_STRING} [a-zA-Z0-9_]=/([a-z0-9_.]//?)+ [NC]
RewriteRule .* - [F]
# /proc/self/environ? Go away!
RewriteCond %{QUERY_STRING} proc/self/environ [NC,OR]
# Disallow Access To Sensitive Files. Enter your own file names.
RewriteRule ^(htaccess.txt|configuration.php(-dist)?|joomla.xml|README.txt|web.config.txt|CONTRIBUTING.md|phpunit.xml.dist|plugin_googlemap2_proxy.php)$ - [F]
# Don't allow any pages to be framed - Defends against CSRF
<IfModule mod_headers.c>
Header set X-Frame-Options SAMEORIGIN
</IfModule>
# Uh. I forgot.
<IfModule mod_autoindex.c>
IndexIgnore *
</IfModule>
# NO SNIFFYWIFFY OwO
<IfModule mod_headers.c>
Header always set X-Content-Type-Options "nosniff"
</IfModule>
# NEEDS TESTING
# Turn on IE8-IE9 XSS prevention tools
#Header set X-XSS-Protection "1; mode=block"
# NEEDS TESTING TOO
# Only allow JavaScript from the same domain to be run.
# Don't allow inline JavaScript to run.
#Header set X-Content-Security-Policy "allow 'self';"
# Example if you don't like Russia and Turkey (Optional A1 is to block anonymous proxies)
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(RU|TR)$
RewriteRule .* https://DOMAIN.TLD/directorywithindexdothtml/ [R=302,L]