Our paper "Authorship Identification Using Dynamic Selection of Features from Probabilistic Feature Set", with Hamed Zamani, Hossein Nasr Esfahani, Pariya Babaie, Samira Abnar, and Azadeh Shakery, has been accepted at the Conference and Labs of the Evaluation Forum (CLEF'14). \o/
Authorship identification is an important problem in law and journalism, and it is one of the major techniques in plagiarism detection. In this paper, to tackle the authorship verification problem, we propose a probabilistic distribution model that represents each document as a feature set, which increases the interpretability of the results and features.
We also introduce a distance measure to compute the distance between two feature sets. Finally, we exploit a KNN-based approach and a dynamic feature selection method to detect the features which discriminate the author’s writing style.
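To give a feel for the pipeline, here is a minimal sketch in Python. It is an illustration only, not the code or the exact measures from the paper: the two toy features, the use of Jensen-Shannon divergence as the distance, the feature selection criterion, and the majority vote are all assumptions made for this example, and every function name is hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Turn a list of observed values into a discrete probability distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def featurize(text):
    """Represent a document as a set of named probability distributions.
    These two features are illustrative choices, not the paper's feature set."""
    words = text.lower().split()
    return {
        "word_length": distribution([len(w) for w in words]),
        "first_letter": distribution([w[0] for w in words]),
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]); an assumed distance."""
    total = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = (pk + qk) / 2
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mk)
    return total


def distance(a, b, features):
    """Mean divergence between two feature sets over the selected features."""
    return sum(js_divergence(a[f], b[f]) for f in features) / len(features)


def select_features(known, others, all_features):
    """Assumed criterion: keep the features on which the author's documents are
    closer to each other than to documents by other authors."""
    selected = []
    for f in all_features:
        within = [js_divergence(a[f], b[f])
                  for i, a in enumerate(known) for b in known[i + 1:]]
        across = [js_divergence(a[f], o[f]) for a in known for o in others]
        if within and across and sum(within) / len(within) < sum(across) / len(across):
            selected.append(f)
    # Fall back to all features if none passes the criterion.
    return selected or list(all_features)


def same_author(unknown, known, others, k=3):
    """KNN-style verification: majority vote among the k nearest documents."""
    features = select_features(known, others, list(unknown))
    labeled = [(distance(unknown, d, features), True) for d in known]
    labeled += [(distance(unknown, d, features), False) for d in others]
    nearest = sorted(labeled, key=lambda t: t[0])[:k]
    votes = sum(1 for _, is_author in nearest if is_author)
    return votes > len(nearest) / 2
```

Because the features are selected per author, two authors can end up being compared on different subsets of features, which is the behavior the dynamic selection step is meant to capture.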
The experimental results on the PAN at CLEF 2013 dataset show the effectiveness of the proposed method. We also show that feature selection is necessary to achieve strong performance. In addition, we conduct a comprehensive analysis of our proposed dynamic feature selection method, which shows that the discriminative features differ from author to author.
For more details, please read this paper:
- H. Zamani, H. N. Esfahani, P. Babaie, S. Abnar, M. Dehghani, A. Shakery, "Authorship Identification Using Dynamic Selection of Features from Probabilistic Feature Set", in Proceedings of the Conference and Labs of the Evaluation Forum (CLEF'14), 2014, pp. 128-140.