For exemplary purposes, the clustering module 900 shown in
If one of the messages 910 is not sufficiently close to any of the clusters, the clustering module 900 may create a new cluster 970 and assign that message to the new cluster 970. For example, message 4 914 may match a multiple of rules which are not in common with any of the other clusters. The cluster module 900 may then indicate that message 4 914 should be classified under a new cluster 970 and assign message 4 914 to that cluster.
Clustering can be based on the domains of links in an email body, or attachment content, or attachment hashes. Other clustering techniques contemplated include k-means, deep learning (such as a convolutional neural network), or through various other machine learning techniques, such as natural language processing.