This article contains original research, observations, conjecture, and synthesis. Feel free to leave a comment if you disagree with something.
A hosts file is a list of IP addresses that are associated with particular hostnames. It was originally used for machines to locate other machines on the Internet, where site administrators would synchronize them using FTP or other means. The growth of the Internet after commercialization in the early 1990s led to the development of the Domain Name System (DNS), but even as of 2013, hosts files are still used to override DNS entries.
```
# Example hosts file (all hosts fictitious)
# The first word on the line is an IPv4 or IPv6 address; the rest
# are hostnames. However, some operating systems may allow only
# one hostname on each line.
127.0.0.1 localhost
64.23.67.55 example.com www.example.com
55.230.133.158 mail.example.com
0.0.0.0 badserver.com www.badserver.com maliciousserver.info infectedserver.info
0.0.0.0 ad.tripleclick.net tracking.tripleclick.net socialwidgets.buttbook.com
```
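Because the format is this simple, a cross-platform tool can read it in a few lines. The following is a minimal sketch in Python of the rules stated in the example's comments: `#` starts a comment, the first word is an address, and the remaining words are hostnames. The function name `parse_hosts` is hypothetical, and keeping the first mapping when a name repeats is an assumption; real resolvers may differ.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname to its address in hosts-file text.

    Hypothetical helper. '#' starts a comment, the first word on a line
    is an IPv4 or IPv6 address, and the remaining words are hostnames.
    Assumption: if a name appears twice, the first mapping wins.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without names
        address, *names = fields
        try:
            ipaddress.ip_address(address)  # skip lines whose first word is not an IP
        except ValueError:
            continue
        for name in names:
            table.setdefault(name.lower(), address)  # hostnames are case-insensitive
    return table
```

For instance, `parse_hosts("0.0.0.0 badserver.com www.badserver.com")` maps both names to `0.0.0.0`.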
One can use a hosts file to disable lookup entirely for a hostname by adding an entry mapping it to 0.0.0.0.
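Generating such entries from a list of unwanted hostnames is mechanical. This sketch writes one hostname per line, because the example's comments note that some operating systems allow only one name per entry. The function name `block_entries` and its `sink` parameter are hypothetical.

```python
def block_entries(hostnames, sink="0.0.0.0"):
    """Return hosts-file text mapping each hostname to a null address.

    Hypothetical helper. Emits one name per line for compatibility with
    systems that accept only one hostname per entry. Names are trimmed,
    lowercased, and deduplicated, keeping first-seen order.
    """
    seen = set()
    lines = []
    for name in hostnames:
        name = name.strip().lower()
        if name and name not in seen:
            seen.add(name)
            lines.append(f"{sink} {name}")
    return "\n".join(lines) + "\n" if lines else ""
```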
Some people advocate using this feature to disable lookup entirely for servers containing malicious software, third-party tracking servers such as those used by social recommendation widgets, or machines involved in serving rich-media advertisements whose display slows a computer down and runs up the data bill of a heavily capped Internet access subscription.[1]
As a firewall mechanism, hosts files are easy for a computer's owner to deploy. Their common text-based format across multiple operating systems allows for cross-platform tools and simple hand-editing by novices using a text editor. And because they are stored on a single machine, they follow that machine no matter what networks it joins, which helps on a laptop that connects to several public wireless networks. Hosts files can also serve as a countermeasure against DNS cache poisoning[1] and against malicious advertising, which matters because it is so easy to upload malware to ad networks that popular sites end up hosting even more malware than porn sites.[2]
Hosts files have several drawbacks compared to other means of blocking undesirable connections, which can be split into those of the format itself and those of resolvers using the format.
The hosts file format has at least two theoretical drawbacks.
One is the lack of support for wildcards, such as *.cn, *.ru, or *.someadnetwork.com.
Some tracking providers have begun to exploit this, changing the hostname every day.[3]
Another is the lack of a way to specify NXDOMAIN (the "no such domain" answer), as applications may not treat a 0.0.0.0 result the same way.
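To show what the missing wildcard support would look like, here is a sketch of a matcher in which a pattern like `*.someadnetwork.com` covers every name under that domain, so a tracker that rotates its hostname daily would still be caught. A hosts file, by contrast, needs one exact entry per name. The pattern syntax and the function name `matches_wildcard` are hypothetical, not those of any particular resolver.

```python
def matches_wildcard(hostname, patterns):
    """Return True if hostname matches any pattern.

    Hypothetical syntax: "*.example.com" matches any name ending in
    ".example.com" at any depth (but not "example.com" itself); a
    pattern without "*." must match the whole name exactly.
    """
    name = hostname.lower().rstrip(".")
    for pattern in patterns:
        pattern = pattern.lower().rstrip(".")
        if pattern.startswith("*."):
            if name.endswith(pattern[1:]):  # pattern[1:] keeps the leading dot
                return True
        elif name == pattern:
            return True
    return False
```

Keeping the leading dot in the suffix test means `*.someadnetwork.com` does not accidentally match an unrelated name such as `notsomeadnetwork.com`.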
The resolver built into popular operating systems for home PCs is slow. It scans the hosts file linearly for each resolution instead of loading the entire file into an in-memory tree, trie, or Bloom filter that can be searched quickly. This can make the lookup of a hostname not found in a multi-megabyte hosts file take longer than a second, which forces one popular tool to reinvent the DNS wheel by keeping its own cache of commonly used hostnames at the top of the file.
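The fix of loading the file once can be sketched as follows. This is not any operating system's actual resolver: the class name `HostsTable` is hypothetical, it uses a hash table (a Python `dict`) in place of the tree, trie, or Bloom filter mentioned above, and it re-reads the file only when the file's modification time changes, so each lookup costs one `stat` call plus a dictionary probe rather than a full scan.

```python
import os


class HostsTable:
    """Hypothetical in-memory hosts lookup that parses the file only on change."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.table = {}

    def _reload_if_changed(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            table = {}
            with open(self.path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    fields = line.split("#", 1)[0].split()  # strip comments
                    for name in fields[1:]:
                        table.setdefault(name.lower(), fields[0])  # first entry wins
            self.table, self.mtime = table, mtime

    def lookup(self, hostname):
        """Return the mapped address, or None if the name is not listed."""
        self._reload_if_changed()
        return self.table.get(hostname.lower())
```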
This resolver also tends to support only one hosts file per machine, which the administrator installs system-wide in a place like /etc/hosts, not one per user.
In addition, Windows 8 and later remove certain entries from the hosts file unless the user tells Windows Defender to exclude the file from protection.
This is ostensibly to stop phishing attacks where malware adds malicious entries to the hosts file, but it ends up getting in the way of blocking the slow, privacy-invading scripts used by advertisement and social networks.
And devices running a mobile operating system such as iOS or Android typically don't give the device's owner enough administrative privileges to edit the hosts file.
In some situations, it might be preferable to run your own DNS server on a computer or router appliance on your network. For example, unlike a hosts file, which must be installed on each machine, one DNS server serves every machine on a LAN. A DNS server also supports more flexible file formats that can block an entire domain, including all of its subdomains, rather than only the individual hostnames that a hosts file must list one by one.
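As one concrete example of such a format, the dnsmasq DNS forwarder accepts lines like `address=/example.com/0.0.0.0`, which answer for `example.com` and every name under it. The sketch below, with the hypothetical function name `dnsmasq_block_rules`, turns a list of blocked domains into such lines and drops any domain whose parent is already listed, since the parent's rule already covers it. Choosing dnsmasq is an assumption for illustration; other DNS servers use different syntax.

```python
def dnsmasq_block_rules(domains, sink="0.0.0.0"):
    """Turn blocked domains into dnsmasq "address=" lines (hypothetical helper).

    A dnsmasq rule for a domain also covers its subdomains, so any entry
    whose parent domain is already in the list is redundant and dropped.
    """
    wanted = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    wanted.discard("")

    def covered_by_parent(domain):
        labels = domain.split(".")
        # Check each proper parent: "a.b.c" -> "b.c", then "c".
        return any(".".join(labels[i:]) in wanted for i in range(1, len(labels)))

    kept = sorted(d for d in wanted if not covered_by_parent(d))
    return [f"address=/{d}/{sink}" for d in kept]
```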
Several enthusiasts compile lists of these undesirable servers and make these lists available for download through the web. The spyware removal program Spybot – Search & Destroy has an "inoculate" feature that makes changes to the system to thwart spyware installers; one of these is a hosts file. Someone else makes this Spybot hosts file available separately.[2]
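Combining several of these downloaded lists mostly means taking the union of their blocked names. This sketch, with the hypothetical function name `merge_blocklists`, treats an entry as a block when its address is one of a few common "sink" addresses, and skips the loopback names that many lists include. Both the sink set and the local-name set are assumptions about typical lists, not a standard.

```python
# Assumed set of "null" addresses that blocklists commonly use.
SINK_ADDRESSES = {"0.0.0.0", "127.0.0.1", "::", "::1"}
# Assumed set of local names that must never end up blocked.
LOCAL_NAMES = {"localhost", "localhost.localdomain", "local",
               "broadcasthost", "ip6-localhost", "ip6-loopback"}


def merge_blocklists(*texts):
    """Return the sorted union of hostnames blocked by several hosts-format lists."""
    blocked = set()
    for text in texts:
        for line in text.splitlines():
            fields = line.split("#", 1)[0].split()  # strip comments
            if len(fields) >= 2 and fields[0] in SINK_ADDRESSES:
                blocked.update(name.lower() for name in fields[1:]
                               if name.lower() not in LOCAL_NAMES)
    return sorted(blocked)
```

The merged names can then be written back out with a single sink address, one per line.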
Alexander P. Kowalski (APK), a staunch advocate of hosts files as an element of layered security[3] and the self-proclaimed "Lord of HOSTS", wrote an application to manage hosts files in Windows that draws from about a dozen of these blocking lists.[4][5] The user can choose to build a complete list of threats (over 4 million as of December 2015) or limit it to current threats (far smaller). This application is proprietary because APK fears that malware authors may repackage his software the way the Chromium web browser was repackaged as eFast.[6] (In 2014, he branched out into registry edits that install firewall rules blocking IP addresses shared by many individual hostnames at Akamai and other CDNs, which are impractical to block with a hosts file, and that raise the priority of the hosts file in the DNS section of the Windows registry.[7]) APK Hosts File Engine is said to do things that ad-blocking browser extensions can't do, such as protect applications other than web browsers, cache IP addresses to work around DNS downtime and poisoning, and operate in kernel mode. It has been recommended by an employee of Malwarebytes. As of July 2015, the latest version is version 9.0. APK has built up a reputation for spamming his app on Slashdot, but some Slashdot users seem to like it.
A few Slashdot users other than APK, such as SuricouRaven, shellbeach, and bmo, use a hosts file or another means of DNS-level blocking.
By May 2013, one popular hosts file manager was generating hosts files with over two million entries. This slows name resolution in popular desktop operating systems whose resolver rescans the hosts file linearly for every DNS resolution. This shouldn't be too hard to fix at the operating system level: scan the file once and build a Bloom filter, a data structure that acts as a lossy compression of a set. The so-called "safe browsing" filter in Google Chrome uses the same data structure.[4] This allows the resolver to rapidly determine whether or not a particular hostname is in the hosts file, at the cost of a few false alarms. (A false alarm is a name not in the hosts file that matches the filter by chance.) For a false alarm probability of 1/256 and a set cardinality of 2 million, the optimal size of the Bloom filter is about 23,083,120 bits, or just under 3 MB. But an implementation might round the size up to the next power of 2 (2^25 bits, or 4 MiB) so that it can more easily use something like SHA-256 on the hostname to generate a set of bit addresses into the filter. In addition, the resolver would keep the most commonly accessed entries and the false alarms in a cache in RAM.
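The sizing arithmetic and the SHA-256 addressing idea can be sketched in Python. The standard formulas for an optimally configured Bloom filter are m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions; for n = 2,000,000 and p = 1/256 they give about 23,083,121 bits (the figure above, rounded up) and k = 8. With the size rounded up to 2^25 bits, each bit address can be a 25-bit slice of a single SHA-256 digest, and 8 × 25 = 200 bits fits within the digest's 256. The class name `HostnameBloomFilter`, its parameters, and the slicing scheme are assumptions for illustration, not how Chrome or any operating system actually does it.

```python
import hashlib
import math


def bloom_parameters(n, p):
    """Optimal bit count m and hash count k for n items at false-alarm rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = round(m / n * math.log(2))
    return m, k


class HostnameBloomFilter:
    """Hypothetical Bloom filter of hostnames addressed by slices of one SHA-256 digest.

    The filter has 2**bits_log2 bits, so each address is simply bits_log2
    bits of the digest; k * bits_log2 must therefore fit in 256 bits.
    """

    def __init__(self, bits_log2=25, k=8):
        if bits_log2 < 3 or k * bits_log2 > 256:
            raise ValueError("need bits_log2 >= 3 and k * bits_log2 <= 256")
        self.bits_log2, self.k = bits_log2, k
        self.bits = bytearray(1 << (bits_log2 - 3))  # 2**bits_log2 bits, 8 per byte

    def _addresses(self, hostname):
        digest = int.from_bytes(
            hashlib.sha256(hostname.lower().encode("utf-8")).digest(), "big")
        mask = (1 << self.bits_log2) - 1
        for i in range(self.k):
            yield (digest >> (i * self.bits_log2)) & mask  # i-th slice of the digest

    def add(self, hostname):
        for a in self._addresses(hostname):
            self.bits[a >> 3] |= 1 << (a & 7)

    def might_contain(self, hostname):
        """False means definitely absent; True means present or a false alarm."""
        return all(self.bits[a >> 3] & (1 << (a & 7))
                   for a in self._addresses(hostname))
```

A resolver built this way would consult the filter first and fall back to the full table (or its cache) only when `might_contain` returns True, which filters out most lookups for names that are not in the hosts file.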
A resolver could follow this algorithm: