Alerts can be based on clusters. For example, an alert can be generated if 10 messages are clustered and have met a threshold severity or priority. Once messages are formed into clusters, the clusters can be sorted based on cluster count. The system can provide an interface for creating a recipe from a cluster.
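By way of illustration only, cluster-based alerting and sorting might be sketched as follows. The names Cluster, should_alert, and sort_by_count are hypothetical, as is the severity threshold value. The sketch also assumes that "cluster count" means the number of messages in a cluster and that a cluster meets the severity threshold when its most severe message does.

```python
from dataclasses import dataclass, field
from typing import List

MIN_MESSAGES = 10  # cluster size from the example in the text
MIN_SEVERITY = 3   # hypothetical severity threshold


@dataclass
class Cluster:
    cluster_id: str
    # One severity value per clustered message (assumed representation).
    severities: List[int] = field(default_factory=list)

    @property
    def count(self) -> int:
        return len(self.severities)


def should_alert(cluster: Cluster) -> bool:
    """Alert when the cluster is large enough and severe enough."""
    highest = max(cluster.severities, default=0)
    return cluster.count >= MIN_MESSAGES and highest >= MIN_SEVERITY


def sort_by_count(clusters: List[Cluster]) -> List[Cluster]:
    """Order clusters from most to fewest messages."""
    return sorted(clusters, key=lambda c: c.count, reverse=True)
```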
Similarity Clustering
The approach rests on the hypothesis that, although an attacker may generate randomized and/or forged data for each email sent within a phishing campaign, the nature of mail infrastructure, attacker proficiency, and attachment markup makes it possible to dynamically cluster related emails. This is accomplished by comparing distinct sections of the emails; the results of these comparisons are then used to generate an overall score that represents the similarity between emails.
Fuzzy hashing and string similarity algorithms can be used to dynamically cluster related phishing emails through the use of Phishing Similarity Indicators (PSI). A weighted average of the indicators is calculated to produce an overall Phishing Similarity Score (PSS). This similarity score is then used as the basis for creating clusters of related emails, with higher scores resulting in stronger clusters.
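By way of illustration only, the weighted-average scoring and clustering described above might be sketched as follows. The sketch uses the string similarity ratio from Python's difflib as a stand-in for both fuzzy hashing and string similarity. The field names, weights, 0.7 threshold, and the simple strategy of comparing each email against the first member of each existing cluster are assumptions made for the example and are not taken from the description.

```python
from difflib import SequenceMatcher

# Hypothetical email sections and weights; not specified in the text.
WEIGHTS = {"subject": 3.0, "body": 4.0, "sender": 2.0, "attachment_name": 1.0}


def indicator(a: str, b: str) -> float:
    """Similarity of two sections in [0, 1]; a missing section scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def pss(email_a: dict, email_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-section indicators."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        w * indicator(email_a.get(name, ""), email_b.get(name, ""))
        for name, w in weights.items()
    )
    return weighted_sum / total_weight


def cluster(emails: list, threshold: float = 0.7) -> list:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for email in emails:
        for members in clusters:
            if pss(email, members[0]) >= threshold:
                members.append(email)
                break
        else:
            clusters.append([email])
    return clusters
```

Comparing only against the first member of a cluster keeps the sketch short; an implementation could instead compare against every member or against a cluster centroid.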
Because this approach does not produce absolute matches, emails can share individual indicators yet have no real cluster value overall. For example, two emails with the same delivery time may have a high value for that specific indicator, but with no other related indicators they would produce a low similarity score (PSS), and the emails would not be clustered.
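The delivery-time example can be worked through numerically. The indicator values, weights, and clustering threshold below are hypothetical and chosen only to show how one strong indicator is diluted by the weighted average.

```python
# Each entry maps an indicator name to (PSI value, weight).
# All values are hypothetical and chosen only for illustration.
indicators = {
    "delivery_time": (1.0, 1.0),  # identical delivery time
    "subject": (0.1, 3.0),
    "body": (0.05, 4.0),
    "sender": (0.0, 2.0),
}

THRESHOLD = 0.7  # hypothetical minimum PSS for clustering


def weighted_average(values: dict) -> float:
    """Weighted average of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in values.values())
    return sum(value * weight for value, weight in values.values()) / total_weight


score = weighted_average(indicators)  # (1.0 + 0.3 + 0.2 + 0.0) / 10, about 0.15
clustered = score >= THRESHOLD        # False: the emails are not clustered
```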
Phishing Similarity Indicator (PSI)—The result of comparing properties between emails. The indicator also includes a weight, which is used to adjust its contribution to the overall similarity score between datasets. A PSI score has a value between 0 and 1.
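One possible representation of an indicator, offered only as a sketch, pairs the comparison result with its weight and enforces the 0 to 1 range. The class name, field names, and the requirement that weights be non-negative are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhishingSimilarityIndicator:
    name: str      # property being compared, e.g. "subject" (hypothetical)
    value: float   # comparison result, between 0 and 1
    weight: float  # contribution to the overall score (assumed non-negative)

    def __post_init__(self) -> None:
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("PSI value must be between 0 and 1")
        if self.weight < 0:
            raise ValueError("PSI weight must be non-negative")

    @property
    def weighted_value(self) -> float:
        """Value scaled by weight, ready to be summed into a weighted average."""
        return self.value * self.weight
```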