31 Jan

Using LexRank to summarize text (using sumy)

Text summarization is a hard problem in NLP. I am currently experimenting with a few text-summarization algorithms in my projects, and one of them is LexRank. It is a graph-based algorithm that uses a similarity function (cosine similarity in the original paper) to compute the similarity between each pair of sentences. Every sentence is a node in the graph, and an edge is created between two sentences whenever their similarity is above a predefined threshold. The sentences (nodes) are then ranked with a PageRank-like scheme.
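To make the idea concrete, here is a rough, standard-library-only sketch of the thresholded-graph approach described above. It is not sumy's implementation and it simplifies the paper: it uses plain term-frequency cosine similarity (no IDF weighting), ignores self-links, and the function names, the threshold of 0.1, and the damping factor of 0.85 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences with a PageRank-like iteration on a thresholded graph.

    threshold and damping are illustrative defaults, not tuned values.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Unweighted graph: an edge wherever similarity exceeds the threshold.
    adj = [
        [
            1.0 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]

    # Power iteration: each node shares its score equally among its neighbours.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(scores[j] * adj[j][i] / degree[j] for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, count=1):
    """Return the top-scoring sentences, kept in their original order."""
    scores = lexrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares vocabulary with many others ends up with a high score, while an isolated sentence only receives the small baseline term, so it drops to the bottom of the ranking.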

In this post we shall use sumy, a Python library that implements LexRank and a few other summarization algorithms. The following example code reads from a plain text file and generates a summary.