Header Keys
Related phishing emails tend to contain similar Header Keys when delivered to targets within an organization, because they traverse similar mail infrastructure and delivery paths. Even if the Received path or other headers have been forged, legitimate mail infrastructure still adds valid headers to the email. This can be used to compute a phishing similarity indicator.
To create the indicator value, the Header Keys are lowercased, sorted, and appended into a single string, and an SSDeep hash is computed over this header key block. This hash is then compared with the hashes of other emails to compute a similarity score.
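As a rough illustration of the header key block described above, the following sketch uses Python's standard `email` module. The function name `header_key_block`, the `sep` parameter, and the choice to keep repeated keys such as Received are assumptions made for this example, not details taken from the tool itself. The SSDeep step is left out so the sketch runs without third-party packages.

```python
from email import message_from_string


def header_key_block(raw_email: str, sep: str = ":") -> str:
    """Build a lowercased, sorted header key block from a raw email.

    Assumption: repeated keys (e.g. several Received headers) are kept,
    and keys are joined with `sep`; the original text does not specify either.
    """
    msg = message_from_string(raw_email)
    # Message.keys() returns every header name, duplicates included.
    keys = sorted(key.lower() for key in msg.keys())
    return sep.join(keys)


raw = (
    "Received: from mx1.example.org\n"
    "Received: from mail.example.net\n"
    "From: alice@example.net\n"
    "To: bob@example.org\n"
    "Subject: Invoice\n"
    "\n"
    "Body text\n"
)

block = header_key_block(raw)
print(block)
```

The resulting string would then be passed to an SSDeep implementation. With the third-party `ssdeep` Python bindings, for example, that would be `ssdeep.hash(block)` followed by `ssdeep.compare(hash_a, hash_b)` to score two emails against each other.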
Sender
Related phishing emails often share a similar Sender, either because they are sent from the same domain or because their sender email addresses are generated to look alike. This can be used to calculate a domain and/or Sender phishing similarity indicator.
Two lowercased indicator values are created for the Sender: one for the sender domain and one for the full sender email address.
Domain similarity is calculated by doing a bigram comparison of the sender domain with the TLD removed.
Sender similarity is calculated by doing a bigram comparison of both senders.
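The text does not say which bigram measure is used, so the sketch below assumes the Sørensen-Dice coefficient over sets of character bigrams, a common choice for this kind of comparison. The helper names are hypothetical, and the TLD is removed by naively dropping the last dot-separated label, which does not handle multi-part suffixes such as `co.uk`.

```python
def bigrams(text: str) -> set:
    """Return the set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over character bigrams (assumed measure)."""
    a, b = a.lower(), b.lower()
    a_grams, b_grams = bigrams(a), bigrams(b)
    if not a_grams or not b_grams:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return 2 * len(a_grams & b_grams) / (len(a_grams) + len(b_grams))


def domain_without_tld(sender: str) -> str:
    """Lowercase the sender domain and drop its last label (naive TLD removal)."""
    domain = sender.lower().rsplit("@", 1)[-1]
    return domain.rsplit(".", 1)[0] if "." in domain else domain


sender_a = "billing@pay-pal.com"
sender_b = "billing@pay-pa1.net"

domain_score = bigram_similarity(domain_without_tld(sender_a),
                                 domain_without_tld(sender_b))
sender_score = bigram_similarity(sender_a, sender_b)
print(domain_score, sender_score)
```

Using sets ignores repeated bigrams. A multiset variant would weight repetition, at the cost of slightly more bookkeeping.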
Delivery Time
Phishing emails delivered to a target organization close together in time are more likely to be related, so delivery time can be used as a phishing similarity indicator. The time drift in hours between two emails is calculated and used to derive a similarity score.
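The drift calculation can be sketched as follows, assuming the delivery times are already available as timezone-aware `datetime` objects. The function name `drift_hours` is hypothetical, and mapping the drift to a similarity score is deliberately not shown here.

```python
from datetime import datetime, timezone


def drift_hours(first: datetime, second: datetime) -> float:
    """Absolute difference between two delivery times, in hours."""
    return abs((second - first).total_seconds()) / 3600.0


delivered_a = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
delivered_b = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)

print(drift_hours(delivered_a, delivered_b))
```

Taking the absolute value makes the result independent of the order in which the two emails are compared.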
The following values can be used in testing, based on the number of hours of drift between delivery times.