31 Jan

Using LexRank to summarize text (using the sumy Python library)

Text summarization is a difficult challenge faced by NLP researchers. Currently I am experimenting with a few text-summarization algorithms in my projects. One of them is LexRank. It is a graph-based algorithm that uses a similarity function (cosine similarity in the original paper) to compute similarities between different sentences. It builds a graph of the document using a pre-defined threshold, creating an edge between two sentences (nodes) whenever their similarity is above the threshold. The authors then use a PageRank-like scheme to rank the sentences (nodes).
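To make the idea concrete, here is a minimal pure-Python sketch of that scheme (the tokenization, threshold value, and damping factor are my own simplifying assumptions, not the paper's exact setup):

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

def lexrank(sentences, threshold=0.5, damping=0.85, iters=50):
    """Score sentences: build a similarity graph, then run a
    PageRank-style power iteration over it."""
    words = [s.lower().split() for s in sentences]  # naive tokenization
    n = len(sentences)
    # Adjacency matrix: an edge wherever similarity exceeds the threshold.
    adj = [[1.0 if i != j and cosine_sim(words[i], words[j]) > threshold
            else 0.0 for j in range(n)] for i in range(n)]
    degree = [sum(row) or 1.0 for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n + damping *
                  sum(adj[j][i] / degree[j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores

sentences = [
    "the cat sat on the mat",
    "the cat is on the mat",
    "dogs run in the park",
]
scores = lexrank(sentences)
ranked = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)
print("highest-ranked sentence:", sentences[ranked[0]])
```

The first two sentences share most of their words, so they link to each other in the graph and end up ranked above the unrelated third sentence.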

In this post we shall use sumy, a Python library that implements LexRank and a few other summarization algorithms. Following is an example that reads from a plain text file and generates a summary.


09 Jan

Though linear regression may seem somewhat dull compared to some of the more modern statistical learning approaches described in later chapters of this book, linear regression is still a useful and widely used statistical learning method. Moreover, it serves as a good jumping-off point for newer approaches: as we will see in later chapters, many fancy statistical learning approaches can be seen as generalizations or extensions of linear regression. Consequently, the importance of having a good understanding of linear regression before studying more complex learning methods cannot be overstated.
- Gareth James et al., An Introduction to Statistical Learning
From an answer to one of my questions on Coursera's forums about the relevance of regression in modern data science.

04 Jan

Diving into Deep Learning Part I: Perceptrons

This new year I am going to be working on deep learning. I have decided to experiment with deep learning techniques in biomedical information retrieval. I have read about neural networks, worked with them on some projects, and even implemented my own neural network library in JavaScript. But this time I am taking baby steps and reviewing all the basic concepts along the way.

I am using Neural Networks and Deep Learning, a free online collection of essays that is being turned into a book, as my reference point. I am also using Deeplearning.net's reading list. Deep Learning by Bengio et al. is a work in progress whose draft is freely available online, and I am using that too.

Using Perceptrons for Implementing Logic gates

Logic gates are simple to understand, and they are often used as examples to introduce students to neural networks. Now if we are to implement an OR gate, we basically have to implement the following truth table:

x y z
0 0 0
0 1 1
1 0 1
1 1 1

A perceptron is defined by two parameters: w and b, the weight vector and the bias respectively. Now consider the perceptron shown in the following figure. It implements the OR gate. Try the perceptron for different values of x and y and observe the output value. You'll find that it mimics the OR gate.

Perceptron for OR gate
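In code, the same check looks like this (the weights w = [1, 1] and bias b = -0.5 are one typical choice for an OR perceptron; I am assuming the figure uses similar values):

```python
def perceptron(x, w, b):
    """Fire (output 1) when the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Assumed OR-gate parameters: weight vector [1, 1], bias -0.5.
w, b = [1, 1], -0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(x, w, b))
```

Running it reproduces the truth table above: only the (0, 0) input leaves the weighted sum below zero, so only that row outputs 0.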