Using keywords and links to perform threat intelligence analysis on onion websites
With the rapid advancement of technologies on the dark web, cybercrime is rising sharply. Onion websites represent the main source of illegal activities across the dark web. Cyber threat intelligence (CTI) aims to pinpoint the onion websites that serve as grounds for cybercriminal activities. Monitoring the dark web to gather threat intelligence is a daunting and complex task. Intelligence agencies and government-based entities manually search for hidden networks, along with their connections to the dark web, to develop threat intelligence data. Nevertheless, currently available onion websites utilize dynamic IP addresses, which are extremely hard to trace.
A recently published study proposed a novel Threat iNtelligence Tool (TnT) for automatic monitoring of suspicious onion websites. Data obtained via TnT is used to develop threat intelligence by predicting the popularity of suspicious websites on the dark web. TnT evaluates a pair of parameters gathered from every website: the number of keywords and the number of sub links. The tool was tested on a group of onion websites that are currently present on the dark web. These tests identified the most popular onion websites that serve as sources of data and as discussion communities for various forms of criminal activity on the dark web. Throughout this article, we’ll take a closer look at TnT.
How does the TnT tool work?
The TnT tool is written in Python, so it can run on any platform that supports Python. The authors of the paper examined 4,320 dark web onion websites that are known to be used for malicious and/or cybercriminal purposes.
A diagram that illustrates the proposed method of TnT is shown below. TnT is an advanced dark web crawler that starts crawling at a website specified by the operator of the tool. It then retrieves the HTML content of all the pages of the selected URL, analyzes them, and computes a popularity score for each examined page. In line with the objective of the tool, keywords are selected a priori from the discussion and the webpages’ HTML content.
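The link-gathering step of such a crawler can be sketched with Python's standard library alone. This is a minimal illustration, not the authors' code: the class name LinkCollector, the function extract_links, and the sample HTML are all hypothetical, and fetching pages over Tor is deliberately left out so the parsing step can be shown in isolation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all hyperlinks found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real crawler would fetch this through Tor.
sample_html = '<a href="/forum/hacking">Forum</a> <a href="style.css">css</a>'
links = extract_links(sample_html, "http://example.onion/")
```

The resulting list still contains every link on the page, including stylesheets and images, which is why a separate filtering stage is needed afterwards.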
Whenever the TnT tool crawls the homepage of an onion website, it extracts all sub links that match the intended keywords, including hacking, trafficking, tracking, attacking, and DDoS, and passes the output to an advanced filtering module. The aim of this module is to eliminate irrelevant and redundant content, in addition to hyperlinks pointing to other domains and to images. It also eliminates duplicate sub links. For instance, repeated sub links, irrelevant webpage links, and links pointing to .xml, .css, .jpg, .jpeg, and .ico files are filtered out.
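The filtering rules described above could look roughly like the following sketch. It assumes that a sub link "matches" a keyword when the keyword appears in the URL itself, which the article does not specify (the match could equally be made on anchor text). The function name filter_sub_links and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse

# Keywords and file extensions taken from the description above.
KEYWORDS = ("hacking", "trafficking", "tracking", "attacking", "ddos")
SKIPPED_EXTENSIONS = (".xml", ".css", ".jpg", ".jpeg", ".ico")


def filter_sub_links(links, home_url, keywords=KEYWORDS):
    """Keep same-domain, keyword-matching, non-asset links, without duplicates."""
    home_host = urlparse(home_url).netloc.lower()
    seen = set()
    kept = []
    for link in links:
        parsed = urlparse(link)
        if parsed.netloc.lower() != home_host:
            continue  # hyperlink points to another domain
        if parsed.path.lower().endswith(SKIPPED_EXTENSIONS):
            continue  # stylesheet, image, or other asset
        if not any(keyword in link.lower() for keyword in keywords):
            continue  # irrelevant: no intended keyword (assumed URL match)
        if link in seen:
            continue  # duplicate sub link
        seen.add(link)
        kept.append(link)
    return kept


# Hypothetical input, as it might come out of the link-extraction step.
raw_links = [
    "http://abc.onion/hacking/tools",
    "http://abc.onion/hacking/tools",
    "http://abc.onion/ddos.jpg",
    "http://other.onion/hacking",
    "http://abc.onion/about",
]
filtered = filter_sub_links(raw_links, "http://abc.onion/")
```

Only the first link survives here: the others are a duplicate, an image, a link to another domain, and a page with no intended keyword.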
The popularity of a URL relies on the types of content available in this onion domain but not in other onion domains. The TnT tool counts the number of sub links that correspond to each intended keyword present in each webpage. To perform this count, TnT applies a special text mining procedure. Finally, a network tree is constructed for the user-specified URL, and a cumulative score is computed for that URL to predict its popularity ranking.
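As a rough illustration of the counting and scoring step, the sketch below walks an in-memory link tree and sums per-keyword sub link counts. The article does not give the actual scoring formula or the text mining procedure, so the plain sum used here is an assumption, as are the names count_keyword_links and cumulative_score and the sample tree.

```python
from collections import Counter

KEYWORDS = ("hacking", "trafficking", "tracking", "attacking", "ddos")


def count_keyword_links(sub_links, keywords=KEYWORDS):
    """Count how many sub links mention each keyword (assumed URL match)."""
    counts = Counter()
    for link in sub_links:
        lowered = link.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


def cumulative_score(tree, root):
    """Sum keyword-link counts over every page reachable from root.

    `tree` maps a page URL to the list of its sub links. The plain sum is
    an assumed stand-in for the scoring used by the real tool.
    """
    total = Counter()
    visited = set()
    stack = [root]
    while stack:
        page = stack.pop()
        if page in visited:
            continue  # guard against cycles in the link graph
        visited.add(page)
        children = tree.get(page, [])
        total += count_keyword_links(children)
        stack.extend(children)
    return sum(total.values()), total


# Hypothetical link tree for a user-specified URL.
tree = {
    "http://abc.onion/": ["http://abc.onion/hacking", "http://abc.onion/ddos"],
    "http://abc.onion/hacking": ["http://abc.onion/hacking/ddos-tools"],
}
score, per_keyword = cumulative_score(tree, "http://abc.onion/")
```

Computing the same score for several starting URLs and sorting by it would give a simple popularity ranking under these assumptions.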
Grouping of keywords:
The TnT tool groups keywords according to the following:
1- High Priority Keywords: This group includes extremely dangerous products, including weapons (e.g. FN SCAR 17S, FLIR LS32 Thermal Night Vision Black, etc.) and drugs (Heroin, Flakka, Cocaine, Zombie, etc.).
2- Medium Priority Keywords: This group includes various forms of shops that exist on the dark web and specialize in different products, including safe email, mobile, dark web hosting, drugs, weapons, and malware.
3- Low Priority Keywords: This group includes normal keywords that are more or less commonly used on onion websites, such as hack, bitcoin, anonim, market, onion, and dark web.
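One way to represent these three groups in code is a small table that maps each priority to a weight and a keyword list. The article does not say how priorities affect the score, so the weights 3, 2, and 1 are purely illustrative assumptions, the keyword lists are abbreviated samples, and the function name weighted_keyword_score is hypothetical.

```python
# Assumed weights: the source names the groups but not their numeric effect.
PRIORITY_GROUPS = {
    "high": {"weight": 3, "keywords": ["fn scar 17s", "heroin", "flakka", "cocaine"]},
    "medium": {"weight": 2, "keywords": ["safe email", "hosting", "malware"]},
    "low": {"weight": 1, "keywords": ["hack", "bitcoin", "market", "onion", "dark web"]},
}


def weighted_keyword_score(text, groups=PRIORITY_GROUPS):
    """Sum keyword occurrences in text, scaled by each group's weight."""
    lowered = text.lower()
    score = 0
    for group in groups.values():
        hits = sum(lowered.count(keyword) for keyword in group["keywords"])
        score += group["weight"] * hits
    return score


# Hypothetical page text.
page_text = "Bitcoin market selling Cocaine and malware"
page_score = weighted_keyword_score(page_text)
```

In this example the page scores 7: two low priority hits (bitcoin, market), one medium priority hit (malware), and one high priority hit (cocaine).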
Usefulness of the TnT tool:
The TnT threat intelligence tool has been shown to be useful for automatically gathering threat intelligence data from various hidden dark web domains. The gathered data can be extremely useful for intelligence agencies, including the FBI in the US, for close monitoring and subsequent blocking of websites proven to facilitate illegal activities. TnT is based on a simple concept: the popularity of an onion domain increases when it offers multiple services to its users. Reliance on keywords, incoming links, and outgoing links has proven to be very handy when an appropriate crawler is used.