Information Gathering / Robots.txt File

Web and API

Description

Robots.txt is a plain-text file served from the root of a website that tells web crawlers which parts of the site they may or may not crawl. Search engine bots that honor it use it to skip the listed pages. The file is advisory only: it does not restrict access, and anyone can read it. This vulnerability occurs when the robots.txt file is missing or misconfigured, so that sensitive parts of the website are crawled and indexed, or when the file itself names paths that were meant to stay hidden (CWE-200). Either case supports information gathering, a reconnaissance activity in which an attacker collects details about the site that may in turn expose sensitive data such as usernames, passwords, credit card numbers, and other confidential information (OWASP Testing Guide).

Risk

If the robots.txt file is missing or misconfigured, search engines may index file directories and other parts of the website that were meant to remain private, and an attacker can read the file to find paths worth probing. This information gathering step can have serious consequences for the security of the website when the exposed paths are not protected by other controls. The risk assessment of this vulnerability is high.

Solution

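Ensure that the robots.txt file is properly configured and kept up to date. Because the file is publicly readable and compliance with it is voluntary, do not rely on it to protect sensitive content: avoid naming sensitive paths in it where possible, and protect those parts of the website with authentication and access controls on the server.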
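One way to review a robots.txt file is to scan its Allow and Disallow rules for paths that look sensitive. The following is a minimal sketch in Python, not a complete audit tool. The function name find_revealing_rules and the keyword list SENSITIVE_HINTS are hypothetical and should be adapted to the application under review.

```python
# Hypothetical keyword list (an assumption for illustration); tune it to
# the application being reviewed.
SENSITIVE_HINTS = ("admin", "backup", "private", "config", "sensitive")


def find_revealing_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Return (directive, path) pairs whose path contains a sensitive hint."""
    findings = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        directive, _, path = line.partition(":")
        directive = directive.strip().lower()
        path = path.strip()
        # Only Allow/Disallow rules with a non-empty path can reveal a location.
        if directive in ("allow", "disallow") and path:
            if any(hint in path.lower() for hint in SENSITIVE_HINTS):
                findings.append((directive, path))
    return findings
```

A rule flagged by this check is only a hint that the path deserves a closer look; whether the path is actually protected has to be verified on the server itself.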

Example

The following example from CVE-2020-15044 illustrates a misconfigured robots.txt file. The empty Disallow line permits crawling of the entire site, and the Allow line explicitly points crawlers to a directory containing sensitive files:

User-agent: *
Disallow:
Allow: /sensitive-directory
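
How a compliant crawler would read the example above can be checked with the urllib.robotparser module from the Python standard library. This is a minimal sketch: the helper name is_crawlable and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow:
Allow: /sensitive-directory
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_crawlable(EXAMPLE_ROBOTS_TXT, "https://example.com/sensitive-directory/") reports that the directory may be crawled, whereas a file containing "Disallow: /sensitive-directory" would report the opposite. Either way the result only describes what well-behaved crawlers do; it says nothing about whether the server actually restricts access.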
