A web bot (or simply a bot) is code that drives a browser, or makes the same requests a browser would, without a person at the controls. Bots are generally described as headless, meaning there is no user interface, and they are usually designed to perform tasks repetitively or without user interaction. A bot can be designed to crawl (visit many websites and/or website structures) using predefined lists, sitemaps or website links, or to target a specific page. There are both legitimate and illegitimate reasons to use bots. Search engines, for example, use bots to keep their search indexes current, while illegitimate bots can be used for any number of purposes, such as identifying security holes, identifying the web applications you run, or simply denying service to your website or network. Bots, regardless of their purpose, have the ability to be disruptive, and whether you self-host or use web services it is worth having a basic understanding of some common behaviours.
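To make the idea of crawling a little more concrete, here is a minimal Python sketch of the step every crawler repeats: reading a page and collecting the links it could visit next. The page content, the example.com addresses and the names LinkCollector and extract_links are made up for illustration; a real bot would download pages over the network and do a great deal more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the address of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links in html, in the order they appear."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content; a real bot would fetch this over HTTP.
SAMPLE_PAGE = '<a href="/about">About</a> <a href="https://example.org/blog">Blog</a>'
links = extract_links(SAMPLE_PAGE, "https://example.com/")
print(links)
```

A crawler simply repeats this for each new link it finds, which is why one bot can generate a large number of requests in a short time.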
First, let's discuss legitimate bots and how they can be disruptive, starting with the humble website crawler. Crawlers rely on a number of rules about how they behave, and search engines essentially put the burden of limiting that behaviour on the developer or website owner. A common expectation is that you maintain a sitemap, which is a file listing the website pages you want "indexed". On top of this you need a robots.txt file to tell the bot where it is allowed to crawl and where it isn't. Then there is the ads.txt file for authorised third party advertisers and resellers. With every change to the bots' rules, developers must review and respond by making changes to the website. Sound reasonable? The problem is that if you buy a website, you rarely have a developer on call to maintain it, which is where the burden comes in.
Even when you conform to new rules, log files soon show errors for crawl attempts on pages or paths that no longer exist, aren't in your sitemap and haven't existed for some time. This noise has the potential to mask real threats or slow your response to them. So what can you do about it? Nothing, other than applying filters to your log queries and accepting the additional bandwidth load. Nor is there just one search engine indexing your site. There are many, along with social media bots, bots for advertising, bots for business listings and bots for other services that may be unique to your site. That is a lot of non-user traffic, to say the least, and it adds to the disruption I mentioned (as well as skewing analytics in some cases).
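For readers who have never looked inside a robots.txt file, the sketch below shows how a well-behaved crawler might read one, using the robot parser that ships with Python. The file contents, the example.com addresses and the bot name "ExampleBot" are all made up for illustration. Keep in mind that robots.txt is only a request: polite bots honour it, and nothing forces the others to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_text):
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; the rules under "*" apply to it.
allowed_home = parser.can_fetch("ExampleBot", "https://example.com/index.html")
allowed_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/login")
print(allowed_home, allowed_admin)
```

Running the same check against your own robots.txt is a quick way to confirm that the paths you meant to exclude really are excluded.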
Illegitimate bots can also be used to crawl and "index" websites, usually to locate information such as email addresses, links and the contents of certain pages, and they get the same information as any user visiting the website. These kinds of bots are typically the source of email scams targeted at businesses, such as unsolicited offers to do SEO work, website upgrades or other services, or to sell goods from overseas. There are obviously victims of these scams, as otherwise they would cease, and this is how scams like the iTunes Gift Card scams began: there is always someone who will fall for it. Information trawling by bots is how a substantial amount of spam, scamming and overseas marketing manages to build more targeted and sophisticated lists. It is also how businesses can be compromised through non-technical means, because an attacker with enough information can manipulate staff or sound convincing enough to get someone to open an email attachment, give up more sensitive information or open some other avenue of attack.
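If you want to see what a harvesting bot would pick up from one of your own pages, a check along the following lines is enough. This is only a rough sketch: the pattern is deliberately simple and will miss unusual addresses, and the sample text and the name find_exposed_emails are made up for illustration.

```python
import re

# A deliberately simple pattern. It is an assumption that this is good enough
# for a quick audit; it does not cover every valid email address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_exposed_emails(page_text):
    """Return the unique addresses in page_text, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(page_text):
        address = match.lower()
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical page source; paste in the HTML of one of your own pages instead.
SAMPLE = (
    'Contact <a href="mailto:Sales@example.com">sales@example.com</a> '
    "or accounts@example.com."
)
exposed = find_exposed_emails(SAMPLE)
print(exposed)
```

Anything this turns up is available to every bot that visits the page, so it is worth deciding whether each address really needs to be published.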
Illegitimate bots can also challenge your security, especially if you use a very popular web language, application or service. They can be used to scan your domain or subdomain for files, paths, security holes and responses, to name just a few uses. One common use is to request many paths in search of login pages, news feeds, blog portals, ads.txt files (yes, there is an exploit here), backups, configuration or setup pages left active, development websites, system information, code repositories, payment exploits and oh so much more. This approach is sometimes called trawling: looking for fish to catch in the ocean that is the internet. In a seven day period it wouldn't be unusual for our business to receive more than a couple of hundred queries with malicious intent. In terms of language-oriented attacks, PHP is by far the most commonly referenced in those queries. More recently there has been an increase in attempts to exploit the deprecated SSL 3.0 protocol, which has been superseded by TLS; TLS 1.0 and 1.1 have since been deprecated as well, leaving TLS 1.2 and 1.3 as the versions to use for secure website communications. If you run a WordPress platform, be prepared to take extra precautions, as this is the most commonly queried and attacked application that I have seen to date, with repetitive and frequent trawling for WordPress login pages, installation pages and configuration pages. With WordPress it doesn't matter how small you are: you will eventually become a target, so it is very important to use complex passwords and never use generic usernames such as admin.
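To get a feel for how much of this trawling reaches your own site, you can count the requests in your access log that ask for commonly probed paths. The sketch below assumes log lines in the widely used common or combined log format. The list of markers, the sample log lines and the name count_probes are made up for illustration and would need adjusting to your own setup.

```python
import re
from collections import Counter

# Hypothetical list of path fragments that trawling bots often ask for.
PROBE_MARKERS = ("wp-login.php", "wp-admin", "/.git/", "/.env", "phpmyadmin", "/backup")

# Pulls the requested path out of the quoted request, e.g. "GET /path HTTP/1.1".
REQUEST_PATTERN = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def count_probes(log_lines):
    """Count requests whose path contains one of the probe markers."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if not match:
            continue  # Not a request line this sketch understands.
        path = match.group(1).lower()
        for marker in PROBE_MARKERS:
            if marker in path:
                counts[marker] += 1
                break  # Count each request once, under the first marker it matches.
    return counts


# Made-up sample lines using addresses reserved for documentation.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /wp-login.php HTTP/1.1" 404 153',
    '203.0.113.5 - - [01/Jan/2024:10:00:01 +0000] "GET /.git/config HTTP/1.1" 404 153',
    '198.51.100.7 - - [01/Jan/2024:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.9 - - [01/Jan/2024:10:00:03 +0000] "POST /wp-login.php HTTP/1.1" 404 153',
]
probe_counts = count_probes(SAMPLE_LOG)
print(probe_counts)
```

A tally like this will not stop anything on its own, but it does show which applications the bots assume you are running and how often they come knocking.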
Web bots won't be going away anytime soon, but that doesn't mean they are becoming more effective over time; they work on the premise that security is a lagging mechanism. In other words, security is often bolstered only in response to events, and in the meantime attackers have a window in which they can exploit a given weakness. Many a manager has argued that time should be invested in things that generate revenue over things that prevent or reduce loss, but in my opinion security should trump this mentality. Simply look at the front door of your house and ask yourself: how would I feel if someone went from house to house checking to see whether my front door was locked, my windows closed and my gate locked, and checking things I haven't even thought of? How comfortable would I be knowing this could be happening more than 100 times over seven (7) days? Personally, I take security seriously and even that thought makes me feel a little nervous. How secure do you feel?
Thank you for your support and camaraderie in these troubled times. We look forward to doing business with you should you be interested in our services. If you would like to find out more about what we can do for you, please feel free to visit our main website or contact us. Thank you for your time and for reading our blog post. If you would like to share or like our articles on one of our social media platforms, please use the @ActsIntuitively tag where it applies.
Technical Services Manager