Email Management in Multilingual Environments

Email has become one of the most prevalent media for exchanging information. The ease of this communication has produced a large volume of email, which causes a problem termed "Email Overloading". Solving this problem is urgent, and "Email Management" has emerged as a branch of research to alleviate it. At the same time, with the rapid spread of email services over the World Wide Web, multilinguality in email data has become inevitable. It is therefore increasingly important to provide multilingual support for email management techniques. This thesis focuses on two important tasks in email management, "Reconstructing Conversation Threads" and "Automatic Email Filing", with regard to multilingualism in email data.

In my MSc thesis, I studied the problem of "Email Management in Multilingual Environments". In particular, I focused on two important tasks in email management, "Reconstructing Conversation Threads" and "Automatic Email Categorization", with regard to multilingualism in email data.

Reconstructing Conversation Threads

An email conversation thread is defined as a topic-centric discussion unit composed of emails exchanged among the same group of people through replies or forwards. We proposed two different approaches to reconstruct conversation threads in email corpora, based on an evolutionary algorithm and on machine learning.
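To make the definition above concrete, here is a minimal baseline sketch in Python. It is not the evolutionary or machine-learning approach proposed in the thesis. It only groups emails that share a subject line (after stripping reply and forward prefixes) and have overlapping participants. The dictionary keys "id", "subject" and "participants", the function names, and the `min_overlap` threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Prefixes that mail clients commonly add when replying or forwarding.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def group_into_threads(emails, min_overlap=0.5):
    """Group emails by normalized subject and participant overlap.

    `emails` is a list of dicts with the hypothetical keys "id", "subject"
    and "participants" (an iterable of addresses), assumed sorted by time.
    Returns a list of threads, each a list of email ids.
    """
    threads_by_subject = defaultdict(list)  # subject -> list of threads
    for email in emails:
        key = normalize_subject(email["subject"])
        people = set(email["participants"])
        for thread in threads_by_subject[key]:
            union = thread["participants"] | people
            shared = thread["participants"] & people
            # Jaccard overlap between the thread's people and this email's.
            overlap = len(shared) / len(union) if union else 0.0
            if overlap >= min_overlap:
                thread["ids"].append(email["id"])
                thread["participants"] |= people
                break
        else:
            # No existing thread matched, so start a new one.
            threads_by_subject[key].append(
                {"ids": [email["id"]], "participants": people}
            )
    return [t["ids"] for group in threads_by_subject.values() for t in group]
```

A heuristic like this misses threads whose subject line changes or whose topic drifts, which is one reason content-based features such as text similarity are useful for this task.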

For more details, please take a look here.

Automatic Email Categorization

Automatic email categorization is another fundamental problem in email management. We studied the challenges of email categorization and proposed a new learning method that approaches the problem from a different angle to automatically move emails into folders.

For more details, please take a look here.

Dealing with Multilinguality

The methods proposed in my thesis for email management exploit text similarity as an important feature for determining content relationships among emails. To process multilingual emails, we need to deal with two types of multilinguality: intra-email multilinguality (mixed-language emails) and inter-email multilinguality. To process multilingual email data, or multilingual documents in general, we introduced a new robust method based on the language modeling framework. This method builds a multilingual model as a language-independent representation of each document. The estimated multilingual models are then used to determine the similarity of mixed-language and multilingual documents.
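As a rough illustration of the idea of comparing documents through language-independent models, the Python sketch below maps words to shared concept ids, estimates a smoothed unigram model per document, and scores similarity as one minus the Jensen-Shannon divergence. This is not the estimation method of the thesis. The toy `CONCEPTS` lexicon, the function names, and the smoothing constant `mu` are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping words from different languages to shared
# concept ids. A real system would need a much richer resource.
CONCEPTS = {
    "meeting": "c_meeting", "réunion": "c_meeting",
    "tomorrow": "c_tomorrow", "demain": "c_tomorrow",
    "invoice": "c_invoice", "facture": "c_invoice",
}


def to_concepts(text):
    """Tokenize on whitespace and map known words to concept ids."""
    return [CONCEPTS.get(tok, tok) for tok in text.lower().split()]


def language_model(tokens, vocab, mu=0.1):
    """Unigram distribution over `vocab` with additive smoothing `mu`."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), between 0 and 1."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}

    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity(doc_a, doc_b):
    """Similarity of two documents: 1 minus the divergence of their models."""
    tokens_a, tokens_b = to_concepts(doc_a), to_concepts(doc_b)
    vocab = set(tokens_a) | set(tokens_b)
    model_a = language_model(tokens_a, vocab)
    model_b = language_model(tokens_b, vocab)
    return 1.0 - js_divergence(model_a, model_b)
```

Under these assumptions, a mixed-language text such as "meeting demain" and its counterpart "réunion tomorrow" map to the same concept distribution and therefore receive the maximum similarity, while documents about unrelated concepts score lower.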

For more details, please take a look here.

Email Datasets

Furthermore, to conduct experiments that evaluate the performance of the proposed methods, we prepared several email datasets and made them publicly available. Here is the list of these datasets:

For more details, please take a look here.

We conducted several experiments and discussed the results. The experimental results show that, compared to previous methods, the proposed methods not only improve the performance of email management tasks but also improve time efficiency.

Here are the presentation slides of my thesis defence.

MSc thesis related publications


Here are some useful links for research on email data!