Original post: https://3xn.nl/projects/2023/09/20/crude-solution-to-ban-bots-by-their-user-agent/
I’ve very much simplified the script that instantly redirects unwanted traffic away from the server. Currently, I am using a very cheap VPS to receive all that traffic.
Here ya go:
<?php
// CC-BY-NC (2023)
// Author: FoxSan - fox@cytag.nl
// This is a functional but dirty hack to block bots, spiders and indexers by looking at the HTTP USER AGENT.
// Traffic that meets the conditions is being yeeted away to any place of your choice.
//////////////////////////////////////////////////////////////
// Emergency bypass
// goto end;
//////////////////////////////////////////////////////////////
// attempt to basically just yeet all bots to another website
$targetURL = "https://DOMAIN.TLD/SUB/";
// Function to check if the user agent appears to be a bot or spider
function isBot()
{
$user_agent = $_SERVER['HTTP_USER_AGENT'];
$bot_keywords = ['bytespider', 'amazonbot', 'MJ12bot', 'YandexBot', 'SemrushBot', 'dotbot', 'AspiegelBot', 'DataForSeoBot', 'DotBot', 'Pinterestbot', 'PetalBot', 'HeadlessChrome', 'GPTBot', 'Sogou', 'ALittle Client', 'fidget-spinner-bot', 'intelx.io_bot', 'Mediatoolkitbot', 'BLEXBot', 'AhrefsBot'];
foreach ($bot_keywords as $keyword) {
if (stripos($user_agent, $keyword) !== false) {
return true;
}
}
return false;
}
// Check if the visitor is a bot or spider
if (isBot()) {
// yeet
header("Location: $targetURL");
// Exit to prevent further processing
exit;
}
end:
// If the visitor is not a bot, spider, or crawler, continue with your website code.
//////////////////////////////////////////////////////////////////////
?>
Here’s a list of stuff that I have in my .htaccess files on various websites.
I want to work on my website, but any other visitor should be booted to another website so I can work in peace. Sidenote: It's forever since I last used this, so it might work. Or not.
---
# YOUR IP address goes here:
RewriteCond %{REMOTE_ADDR} !^000\.000\.000\.000$
# And provides you access to:
RewriteCond %{REQUEST_URI} !^https://DOMAIN.TLD$ [NC]
# Fine, go have all the media as well
RewriteCond %{REQUEST_URI} !\.(jpg|jpeg|png|gif|svg|swf|css|ico|js)$ [NC]
# Any other visitor can go visit the following website:
RewriteRule .* https://DOMAIN.TLD/ [R=302,L]
# Hey, no viewing access to this file
<FilesMatch "^.ht">
Order deny,allow
Deny from all
</FilesMatch>
# Disable Server Signature
ServerSignature Off
# SSL all the things!
RewriteCond %{HTTPS} !=on
RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R,L]
# No WWW
RewriteCond %{HTTP_HOST} ^www\.DOMAIN\.TLD$
RewriteRule ^/?$ "https\:\/\/DOMAIN\.TLD\/" [R=301,L]
# Do we like Symlinks? Yeah we do.
Options +FollowSymlinks
# No open directories or directory listings. What is this... 1998?
Options All -Indexes
IndexIgnore *
# Rewrite rules to block out some common exploits.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule .* index.php [F]
# PHP doohickies
php_flag register_globals off
php_flag safe_mode off
php_flag allow_url_fopen off
php_flag display_errors off
php_value session.save_path '/tmp'
php_value disable_functions "exec,passthru,shell_exec,system,curl_multi_exec,show_source,eval"
# File Injection Protection, or a code-condom. What.
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{QUERY_STRING} [a-zA-Z0-9_]=http:// [OR]
RewriteCond %{QUERY_STRING} [a-zA-Z0-9_]=(\.\.//?)+ [OR]
RewriteCond %{QUERY_STRING} [a-zA-Z0-9_]=/([a-z0-9_.]//?)+ [NC]
RewriteRule .* - [F]
# /proc/self/environ? Go away!
RewriteCond %{QUERY_STRING} proc/self/environ [NC,OR]
# Disallow Access To Sensitive Files. Enter your own file names.
RewriteRule ^(htaccess.txt|configuration.php(-dist)?|joomla.xml|README.txt|web.config.txt|CONTRIBUTING.md|phpunit.xml.dist|plugin_googlemap2_proxy.php)$ - [F]
# Don't allow any pages to be framed - Defends against CSRF
<IfModule mod_headers.c>
Header set X-Frame-Options SAMEORIGIN
</IfModule>
# Uh. I forgot.
<IfModule mod_autoindex.c>
IndexIgnore *
</IfModule>
# NO SNIFFYWIFFY OwO
<IfModule mod_headers.c>
Header always set X-Content-Type-Options "nosniff"
</IfModule>
# NEEDS TESTING
# Turn on IE8-IE9 XSS prevention tools
#Header set X-XSS-Protection "1; mode=block"
# NEEDS TESTING TOO
# Only allow JavaScript from the same domain to be run.
# Don't allow inline JavaScript to run.
#Header set X-Content-Security-Policy "allow 'self';"
# Example if you don't like Russia and Turkey (Optional A1 is to block anonymous proxies)
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(RU|TR)$
RewriteRule .* https://DOMAIN.TLD/directorywithindexdothtml/ [R=302,L]
Finally, there is an official Joomla (4.x) Docker made for unRAID. But it is not done well. As in, you will lose all your settings and data when you restart your docker. So you need to copy the data outside the docker into the appdata folder. (Or wherever you want it)
Inside the docker image, the Joomla folder resides here:
/var/www/html/
And this is the folder you need to make “external” by adding a path to the edit/install panel.