
Automatic E-Mail Classification – A Case for Machine Learning

Publication Date: Nov 07, 2019

Written by: Mario Mallia Milanes

Apart from the web browser, email software is one of the most used applications today. It is estimated that around 300 billion emails are sent every day worldwide, each with a specific purpose. One can safely deduce that our dependency on email is great, to the point that when email servers malfunction, we lose business. This dependency has also been noted by spammers and hackers alike, who push innocent-looking mail into our inboxes in the hope of getting something in return.

Issues arise when large volumes of mail hit a single mailbox. How does one separate the wheat from the chaff or, in computing jargon, the ham from the spam? Call Centre operators struggle daily with this issue. An operator must decide whether an email is valid, urgent or flak from an irate client. Beyond that, each valid email must be accurately classified and forwarded to the appropriate solution-provider. This is not an easy task when you must deal with hundreds of emails a day and ensure that none of them exceeds the limits agreed in the Service Level Agreement (SLA).

The question that comes to mind at this point is whether this task can be handed to machines, which seem to have an uncanny preference for boring, repetitive tasks. Luckily the answer is in the affirmative, but it must be added that doing so is no walk in the park.

Weighing the Benefits

Before embarking on any solution, we should ask whether the benefits gained outweigh the effort needed to implement it. Automatic classification is certainly a bonus in this scenario. Algorithms do not tire, and consequently keep up their output regardless of the number of hours on-station. Moreover, a Call Centre would be able to tend to more emails within the same working day using automated means. Extrapolating even further, a possible solution to a routine problem could also be suggested automatically by the algorithm without the need for human intervention.

Coming to a Solution

Machine Learning (ML) can be employed to alleviate the burden of the problem at hand. Thankfully there are many ways to arrive at a plausible solution using ML. But one point needs to be emphasized: ML is not a silver bullet that miraculously kills all the werewolves with a single shot. A lot of preparatory work must be invested so that ML algorithms can give us the desired outcomes.

Before searching for a solution, one should assess the situation at hand by thinking a bit about what needs to be achieved and which assets are available for use within such a project. One can start off by addressing five simple questions to assess readiness:

1.     Do I need to partition my data into distinct classes? – Data classification;

2.     Is there anything weird going on in my data? – Anomaly detection;

3.     Do I need to predict a likely outcome? E.g. what will my sales look like next quarter? – Regression algorithms;

4.     Are there any hidden structures in my data? – Clustering algorithms;

5.     What should I do from this point on? – Reinforcement learning.

These questions will hint at the direction that must be taken in the journey towards automation.

Learning Methods

Once we have decided what is really needed to solve our problem, we can look at the main learning methods that could help us analyze and use our data. Three classes of learning methods can be used in this case. These are:

1.     Supervised Learning:

a.     In this case data classes are pre-defined by the user.

b.     The algorithm must be supplied with labelled training examples, which enable it to associate live data with the pre-defined classes;

2.     Unsupervised Learning:

a.     In this situation no labelled training data or pre-defined classes are supplied.

b.     Data is analyzed by the algorithm, and the resulting classification is discovered by the algorithm itself;

3.     Reinforcement Learning:

a.     In this scenario we need the algorithm to learn a desired behavior when specific situations arise. As data is processed by the algorithm, desired outcomes are rewarded and undesired outcomes are penalized. Such an algorithm therefore comes to favor positive outcomes over negative ones, thus reinforcing its behavior.

The e-Mail Scenario

Going back to our e-mail example, we can safely say that our need is to classify email into various categories. If we were to take the unsupervised learning route, we could seek to identify clusters or groups within the email data, for instance by considering the subject, the recipient, the sender and even the body of the mail. We would then leave the algorithm to its own devices to form clusters without any human intervention.
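
As a rough illustration of the unsupervised route, the sketch below groups a handful of emails with k-means clustering over TF-IDF word weights, using scikit-learn. The sample emails are invented for this example, and both the choice of k-means and the number of clusters (two) are assumptions made for illustration, not part of any particular production setup.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample emails (subject plus body), invented for illustration.
emails = [
    "Invoice overdue: please settle your outstanding invoice payment",
    "Payment reminder for invoice number 1043",
    "Invoice attached, payment due within thirty days",
    "Cannot log in: password reset link not working",
    "Password reset request fails when I log in",
    "Locked out of my account after password reset",
]

# Turn each email into a vector of TF-IDF word weights.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(emails)

# Ask k-means for two groups; the number of clusters is an assumption
# that would normally be tuned against real data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(features)

# No labels were supplied: the cluster numbers are arbitrary and a person
# still has to inspect each group to decide what it represents.
for cluster_id, text in zip(cluster_ids, emails):
    print(cluster_id, text)
```

The same idea extends to other fields mentioned above, such as sender or recipient, by adding them to the text that is vectorized.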

Otherwise, if we take a supervised approach, the data coming from email metadata would have to be pre-classified and a training data-set produced. Once the training set has been produced and run through the algorithm, the algorithm would be able to assign new mail to the pre-established categories. When we are satisfied with the outcome, a reinforcement learning algorithm could help us take correct decisions in a uniform manner.
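
A minimal sketch of the supervised route might look as follows, again using scikit-learn. The training emails and the two categories ("billing" and "support") are hypothetical, and a Naïve Bayes classifier is used here only because it is a common starting point for text. A real training set would need far more examples per category.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical pre-classified emails: this is the training data-set.
training_emails = [
    "Invoice overdue: please settle your outstanding invoice payment",
    "Payment reminder for invoice number 1043",
    "Invoice attached, payment due within thirty days",
    "Cannot log in: password reset link not working",
    "Password reset request fails when I log in",
    "Locked out of my account after password reset",
]
training_labels = [
    "billing", "billing", "billing",
    "support", "support", "support",
]

# Chain text vectorization and the classifier into a single model.
classifier = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    MultinomialNB(),
)
classifier.fit(training_emails, training_labels)

# Assign a new, unseen email to one of the pre-established categories.
new_email = "I never received my password reset email"
predicted = classifier.predict([new_email])[0]
print(predicted)
```

Because the categories are fixed in advance, the quality of the result depends heavily on how carefully the training emails were labelled.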

Classification of data can be done using a variety of algorithms. Usually the following algorithms are used:

1.     Naïve Bayes;

2.     Neural Networks;

3.     Random Forests or Jungles;

4.     Support Vector Machines.

Each algorithm has its strengths and weaknesses, and it is not rare for the performance of each one to be studied empirically so that the best one can be used. Performance considerations focus on the accuracy and the speed of classification.
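
One way to carry out such an empirical study is cross-validation: each candidate is trained and scored on different slices of the same labelled data. The sketch below compares three of the algorithms listed above using scikit-learn. The tiny data-set is invented for illustration, so the numbers it prints say nothing about how these algorithms would rank on real mail, and the timing covers the whole evaluation (training plus prediction) rather than classification alone.

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled emails, invented for illustration.
emails = [
    "Invoice overdue: please settle your outstanding invoice payment",
    "Payment reminder for invoice number 1043",
    "Invoice attached, payment due within thirty days",
    "Cannot log in: password reset link not working",
    "Password reset request fails when I log in",
    "Locked out of my account after password reset",
]
labels = ["billing", "billing", "billing", "support", "support", "support"]

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Support Vector Machine": LinearSVC(),
}

results = {}
for name, estimator in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), estimator)
    start = time.perf_counter()
    # 3-fold cross-validation: train on four emails, test on the other two.
    scores = cross_val_score(pipeline, emails, labels, cv=3)
    elapsed = time.perf_counter() - start
    results[name] = (scores.mean(), elapsed)

for name, (accuracy, seconds) in results.items():
    print(f"{name}: mean accuracy {accuracy:.2f}, evaluated in {seconds:.3f}s")
```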


As one would expect, the implementation part of such a project is not trivial, but fortunately there are many tools that help ease the burden. Anyone deciding to take up such a task is spoilt for choice. Software from both the Open Source and Commercial-off-the-Shelf communities abounds. Libraries and utilities like scikit-learn, Keras, pandas, NumPy, Matplotlib and TensorFlow can be used in conjunction with Python to set up your environment. Many of the common ML algorithms are provided through these libraries as Python functions and classes. The programmer has a free hand over the code being designed and also benefits from ready-made, tried-and-tested implementations of complex algorithms.

Apart from the open source route, one can opt for commercially available software. Cost here is a definite disadvantage, but in return one gets a wealth of support and experience thrown in by the company backing the software. One product which, in my opinion, stands out from the crowd is Microsoft’s Azure. Azure makes an extensive set of features available, a notable one being the Azure Machine Learning Studio, also known as Azure ML.

Azure ML offers a decent range of multi-class and two-class classifiers which can be used for text or email analysis, supplying the user with an arsenal of algorithms covering logistic regression, neural networks, Bayes classifiers and more. These features can also be exposed on the web through APIs, which facilitates the dissemination of your work.


Apart from the correct grouping of emails, a classification exercise has many side effects that can be exploited. For example, one could identify senders that are linked to spam or malware. Customers could also be targeted for specific marketing mailshots. Drilling further, one can also identify which customers are more likely to react positively to our mail. One can also study the sentiment of customers over time: are they becoming more positive towards our service, or more hostile?

One aspect of ML that may easily be overlooked in the enthusiasm for delivering an automated solution is its maintenance. As with people, knowledge must be kept up to date. Algorithms must be re-trained every so often to keep them well tuned to current classification needs. This may take time but, in the end, will surely pay dividends.
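
One possible way to keep a model current, sketched below, is to update it incrementally with each new batch of labelled mail rather than rebuilding it from scratch. This is only one option among several (periodic full retraining is another), and the helper name update_model, the categories and the sample emails are all hypothetical. The sketch relies on scikit-learn's HashingVectorizer, which needs no fitted vocabulary, and on SGDClassifier, which supports partial_fit.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical fixed set of categories, declared up front so that
# every incremental update agrees on the possible labels.
CATEGORIES = ["billing", "support"]

# A hashing vectorizer is stateless, so words that only appear in
# later batches can still be represented.
vectorizer = HashingVectorizer(n_features=2**12, alternate_sign=False)
model = SGDClassifier(random_state=0)


def update_model(emails, labels):
    """Fold one more batch of labelled emails into the existing model."""
    features = vectorizer.transform(emails)
    model.partial_fit(features, labels, classes=CATEGORIES)


# Initial training batch (invented for illustration).
update_model(
    ["Invoice overdue, please settle payment", "Password reset link not working"],
    ["billing", "support"],
)

# Some time later: a fresh batch reflecting newer wording from customers.
update_model(
    ["Refund not yet credited to my card", "Two-factor code never arrives"],
    ["billing", "support"],
)

prediction = model.predict(vectorizer.transform(["Where is my refund?"]))[0]
print(prediction)
```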