Expanded N-Grams for Semantic Text Alignment

Text alignment is a sub-task of the plagiarism detection process. Its aim is to detect pairs of related regions in two distinct documents. How the two regions are related, and how long they should be, depends on the application. When text alignment is used for plagiarism detection, the regions may be related not only in concept and semantics but also in lexical and grammatical structure. As for length, in plagiarism detection it is reasonable for a region to be at least as long as a short paragraph, whereas in parallel corpus construction a region as short as a single sentence may be fine.

As a team from the IIS lab at the University of Tehran, we participated in the PAN2014 plagiarism detection challenge. In general, our approach maps text alignment to the problem of subsequence matching, as previous work has done. We built a framework that lets us combine different feature types and different strategies for merging the features.
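To make the subsequence-matching view concrete, here is a minimal sketch of how matching features ("seeds") between a source and a suspicious document might be found. It assumes word trigrams over lowercased, whitespace-split tokens as the feature type; the function names `word_ngrams` and `find_seeds` are hypothetical and are not taken from our framework.

```python
from collections import defaultdict


def word_ngrams(text, n=3):
    """Return (position, n-gram) pairs over lowercased whitespace tokens."""
    tokens = text.lower().split()
    return [(i, tuple(tokens[i:i + n])) for i in range(len(tokens) - n + 1)]


def find_seeds(source, suspicious, n=3):
    """Return (source_pos, suspicious_pos) pairs whose n-grams are identical."""
    # Index every source n-gram by the positions where it occurs.
    index = defaultdict(list)
    for pos, gram in word_ngrams(source, n):
        index[gram].append(pos)

    # Look up each suspicious n-gram in the source index.
    seeds = []
    for pos, gram in word_ngrams(suspicious, n):
        for src_pos in index.get(gram, []):
            seeds.append((src_pos, pos))
    return seeds


# Illustrative inputs only; not data from the PAN2014 corpus.
source = "the quick brown fox jumps over the lazy dog"
suspicious = "yesterday a quick brown fox jumps over a fence"
seeds = find_seeds(source, suspicious)
print(seeds)
```

Each seed is a pair of token offsets, so a run of seeds that advances together in both documents points to a candidate aligned region. Exact n-gram equality is the strictest possible comparison and is the part a semantic approach would relax.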

We proposed two different ways to relax the comparison of two documents so that it takes the semantic relations between them into account. The first defines a new feature type that carries semantic information about its corresponding document. The second is a new method for comparing features that considers their semantic relations. Finally, we applied the DBSCAN clustering algorithm to merge features that lie in the same neighborhood in both the source and the suspicious documents. Our experiments indicate that different feature sets are suitable for detecting different types of plagiarism.
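The merging step can be illustrated with a small, self-contained DBSCAN over matched feature positions. This is only a sketch under stated assumptions: each feature match is treated as a 2D point (source offset, suspicious offset), distance is Euclidean, and the values of `eps` and `min_pts` are arbitrary illustrative choices, not the settings used in our experiments. The helper names `region_query`, `dbscan`, and `merge_seeds` are hypothetical.

```python
def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one cluster label per point, -1 meaning noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


def merge_seeds(seeds, eps=5.0, min_pts=2):
    """Cluster seeds and return one (source_span, suspicious_span) per cluster."""
    labels = dbscan(seeds, eps, min_pts)
    clusters = {}
    for seed, label in zip(seeds, labels):
        if label != -1:
            clusters.setdefault(label, []).append(seed)
    regions = []
    for members in clusters.values():
        src = [s for s, _ in members]
        susp = [t for _, t in members]
        regions.append(((min(src), max(src)), (min(susp), max(susp))))
    return regions


# Illustrative seed positions: two dense groups and one isolated match.
seeds = [(1, 2), (2, 3), (3, 4), (40, 90), (41, 91), (200, 10)]
regions = merge_seeds(seeds, eps=5.0, min_pts=2)
print(regions)
```

A density-based method fits this step because the number of aligned regions is not known in advance, and isolated matches, which are often coincidental, are discarded as noise instead of being forced into a region.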

To read more about our approach, please read this article:
