Update in the script to block bots, spiders and indexers

Original post: https://3xn.nl/projects/2023/09/20/crude-solution-to-ban-bots-by-their-user-agent/

I’ve very much simplified the script that instantly redirects unwanted traffic away from the server. Currently, I am using a very cheap VPS to receive all that traffic.

Here ya go:

<?php

// CC-BY-NC (2023)

// Author: FoxSan - fox@cytag.nl

// This is a functional but dirty hack to block bots, spiders and indexers by looking at the HTTP USER AGENT.
// Traffic that meets the conditions is being yeeted away to any place of your choice.

//////////////////////////////////////////////////////////////
// Emergency bypass
// goto end;
//////////////////////////////////////////////////////////////

// attempt to basically just yeet all bots to another website

$targetURL = "https://DOMAIN.TLD/SUB/";

// Function to check if the user agent appears to be a bot or spider

function isBot()

{

    $user_agent = $_SERVER['HTTP_USER_AGENT'];

$bot_keywords = ['bytespider', 'amazonbot', 'MJ12bot', 'YandexBot', 'SemrushBot', 'dotbot', 'AspiegelBot', 'DataForSeoBot', 'DotBot', 'Pinterestbot', 'PetalBot', 'HeadlessChrome', 'GPTBot', 'Sogou', 'ALittle Client', 'fidget-spinner-bot', 'intelx.io_bot', 'Mediatoolkitbot', 'BLEXBot', 'AhrefsBot'];

    foreach ($bot_keywords as $keyword) {

        if (stripos($user_agent, $keyword) !== false) {

            return true;

        }
    }

    return false;

}

// Check if the visitor is a bot or spider

if (isBot()) {

// yeet

header("Location: $targetURL");

    // Exit to prevent further processing

    exit;

}

end:

// If the visitor is not a bot, spider, or crawler, continue with your website code.

//////////////////////////////////////////////////////////////////////

?>

Loading

Crude solution to ban bots by their user-agent

Okay, this is a very crude way to block bots, spiders and crawlers by their user-agent, but so far, this has been very, very efficient.

Even when one chooses ” yes “, the question will be repeated. This is not a problem, because no one in their right mind is going to add “bot”, “spider” or “crawler” as their user-agent.

So here’s the PHP script that I rammed into a certain website to prevent it from being DDOSsed by (malicious) bots.

<?php

// CC-BY-NC (2023)
// Author: FoxSan - fox@cytag.nl
// This is a functional but dirty hack to block bots, spiders and indexers by looking at the HTTP USER AGENT.
// The form is, iirc, not even working, but that's fine if you only want human visitors.
// It can also throw a 403, but the effect is the same.

////////////////////////////////////////////////////////////////////////////////
// Emergency bypass
// goto end;
////////////////////////////////////////////////////////////////////////////////

// Function to check if the user agent appears to be a bot or spider.
// Enter the bots you would like to block in a list as shown below.
function isBot()
{
    $user_agent = $_SERVER["HTTP_USER_AGENT"];
    $bot_keywords = ['bytespider', 
                     'amazonbot', 
                     'MJ12bot', 
                     'YandexBot', 
                     'SemrushBot', 
                     'dotbot', 
                     'AspiegelBot',
                     'DataForSeoBot',
                     'DotBot',
                     'Pinterestbot',
                     'PetalBot',
                     'HeadlessChrome', 
                     'AhrefsBot'];

    foreach ($bot_keywords as $keyword) {
        if (stripos($user_agent, $keyword) !== false) {
            return true;
        }
    }

    return false;
}

// Check if the visitor is a bot or spider
if (isBot()) {
    // This visitor appears to be a bot or spider, so display a choice.
    // Check if the choice form is submitted
    if (isset($_POST["submit"])) {
        // Check the choice made by the visitor
        $choice = isset($_POST["choice"]) ? $_POST["choice"] : "";

        if ($choice === "yes") {
            // User selected "Yes," block access
            echo "Access denied. If you believe this is an error, please contact us by writing the word [MAILBOX] before the at sign, followed by [DOMAIN.TLD]";
        } elseif ($choice === "no") {
            // User selected "No," proceed to end
            goto end;
        }
    } else {
        // Output the message to the user and make the choice mandatory
        echo "Your user agent suggests you might be a bot, spider, or crawler. Are you one of these three?";

        // Output the radio button choices within a form
        echo '</p>
<form method="post" action="">';
        echo ' <label><input type="radio" name="choice" value="yes" required>Yes</label>';
        echo ' <label><input type="radio" name="choice" value="no">No</label>';
        echo ' <button type="submit" name="submit">Proceed</button>';
        echo "</form>
<p>";
    }

    // Exit to prevent further processing
    exit();
}
end:
// Original website code starts from here.
/////////////////////////////////////////////////////////////
?>

Loading