15 Nov

Analyzing sentiments on twitter Part I: Streaming twitter data

Sentiment analysis is a widely studied area of natural language processing. In this series we try to understand sentiment analysis and write our own quick-fix sentiment analyzer. In subsequent posts we’ll explore techniques to visualize social media sentiment.

Streaming Twitter Data

In this post we shall track a hashtag on Twitter and push the matching tweets live to the browser. We assume that you have Node installed and that you know how to configure and run an Express web server on Node.

npm install node-tweet-stream

This installs node-tweet-stream, which lets you stream Twitter data on your Node server. Whenever the server receives a tweet, we push it to the client. The architecture is as follows:

[Figure: stream twitter to browser]

On the server side we emit the tweet every time we receive it:
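
A minimal sketch of that server-side relay. It assumes Socket.IO as the transport to the browser, which this post does not name; the function name relayTweets and the hashtag in the wiring comment are hypothetical. The relay is written against plain event emitters so it can be tried without Twitter credentials.

```javascript
// Sketch only: relays tweets from a tweet stream to connected browsers.
// `tweetStream` is anything that emits 'tweet' events (node-tweet-stream does);
// `io` is anything with an emit(event, payload) method (a Socket.IO server does).
function relayTweets(tweetStream, io) {
  tweetStream.on('tweet', function (tweet) {
    // Push every tweet to all connected clients as soon as it arrives.
    io.emit('tweet', tweet);
  });
  tweetStream.on('error', function (err) {
    console.error('twitter stream error:', err);
  });
}

// Wiring with the real libraries (assumes valid Twitter API credentials and
// that `server` is the HTTP server behind your Express app):
//
//   var Twitter = require('node-tweet-stream');
//   var io = require('socket.io')(server);
//   var stream = new Twitter({
//     consumer_key: '...', consumer_secret: '...',
//     token: '...', token_secret: '...'
//   });
//   relayTweets(stream, io);
//   stream.track('#nodejs'); // hypothetical hashtag
```

Keeping the relay separate from the library setup is only a convenience: the same function works with any source that emits 'tweet' events.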

We listen for tweets on the client side:
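
A minimal sketch of the client-side listener, again assuming Socket.IO. The names listenForTweets, render and maxTweets are hypothetical, as is the 'tweets' element id in the wiring comment.

```javascript
// Sketch only: `socket` is anything that emits 'tweet' events, such as the
// object returned by io() from the Socket.IO browser client.
// `render` is a hypothetical callback that draws the current list of tweets.
function listenForTweets(socket, render, maxTweets) {
  var tweets = [];
  socket.on('tweet', function (tweet) {
    tweets.unshift(tweet);            // newest first
    if (tweets.length > maxTweets) {
      tweets.pop();                   // drop the oldest
    }
    render(tweets.slice());           // hand the view a copy
  });
}

// In the browser, after loading the Socket.IO client script:
//
//   listenForTweets(io(), function (tweets) {
//     document.getElementById('tweets').textContent =
//       tweets.map(function (t) { return t.text; }).join('\n');
//   }, 50);
```

Capping the list keeps the page from growing without bound on a busy hashtag.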

So that’s pretty much all you need to do to get a stream of tweets in your browser. Once you have this stream, you can add your presentation logic to create visualizations or other fancy stuff with the tweets. You could also do computationally intensive work on the tweets on the server and push the result to the client along with the tweet.

12 Nov

Linear regression explained. With javascript code (Lineareg.js)

Disclaimer: I don’t advise using JavaScript for data science. I do write and use learning libraries at times, just for the fun of it and, of course, Atwood’s law. This post is about the theoretical background for linear regression, not the JavaScript implementation.

A while ago I wrote lineareg.js, a JavaScript library that lets you fit a line to a dataset. You can find the source code on GitHub or install it from npm. I realized I never got around to describing it. So here it is:

The crux of the code is in the cost computation.
The hypothesis h = X\theta is our prediction vector.
The difference is D = h - y.
The cost function is J = \frac{1}{2m}\sum_{i=1}^{m}D_i^2, where m is the number of training examples.
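
A minimal sketch of the cost computation, assuming the usual squared-error cost (half the mean of the squared differences between predictions and targets). This is not the lineareg.js source; the function names predict and cost are hypothetical.

```javascript
// Sketch only: not the lineareg.js source.
// X is an array of m rows, each row an array of n feature values
// (include a leading 1 in each row if you want an intercept term).
// theta is an array of n parameters; y is an array of m targets.
function predict(X, theta) {
  return X.map(function (row) {
    return row.reduce(function (sum, x, j) { return sum + x * theta[j]; }, 0);
  });
}

function cost(X, y, theta) {
  var m = y.length;
  var h = predict(X, theta);          // prediction vector
  var sumSquares = h.reduce(function (sum, hi, i) {
    var d = hi - y[i];                // difference for example i
    return sum + d * d;
  }, 0);
  return sumSquares / (2 * m);
}
```

A theta that fits the data exactly gives a cost of zero; any other theta gives a positive cost.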

Now we need to minimize this cost function, and for this we use gradient descent. To find a minimum, gradient descent takes a step in the direction of the steepest negative gradient in every iteration. The number of iterations determines how many such steps are taken.
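
A minimal sketch of batch gradient descent for a squared-error cost. It is not the lineareg.js source: the function name gradientDescent, the learning rate alpha and the zero starting point are assumptions made for illustration.

```javascript
// Sketch only: batch gradient descent for linear regression.
// X: m rows of n features (leading 1 for an intercept), y: m targets,
// alpha: learning rate (step size), iterations: number of steps to take.
function gradientDescent(X, y, alpha, iterations) {
  var m = y.length;
  var n = X[0].length;
  var theta = new Array(n).fill(0);   // start from all zeros

  for (var it = 0; it < iterations; it++) {
    // Gradient of the cost: (1/m) * sum_i (h_i - y_i) * X[i][j]
    var grad = new Array(n).fill(0);
    for (var i = 0; i < m; i++) {
      var h = 0;
      for (var j = 0; j < n; j++) h += X[i][j] * theta[j];
      var d = h - y[i];
      for (var k = 0; k < n; k++) grad[k] += d * X[i][k] / m;
    }
    // Step against the gradient, all parameters updated together.
    for (var p = 0; p < n; p++) theta[p] -= alpha * grad[p];
  }
  return theta;
}
```

For example, on points taken from y = 1 + 2x, a call such as gradientDescent(X, y, 0.1, 5000) should bring theta close to [1, 2]. Too large an alpha can make the steps overshoot and diverge, so it usually needs tuning per dataset.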

You can find the source code on GitHub, or install it from npm:

npm install lineareg

01 Nov

The goal of this blog

Over the last year I have done a lot of work with data. I took courses, both online and in person, on machine learning and data science. I did experiments with Twitter data, biomedical data, and synthetic data. During the summer, I worked on my Google Summer of Code project, creating an interactive data visualization environment. Before that I worked with team IRLabDAIICT on a biomedical information retrieval task, where I also wrote my first paper. When I look back, I regret not having journaled my experiments. The aim of this blog is to serve as a journal of sorts for my experiments on data. This is where I write about the ideas I get, the experiments I did, and the results. These works are interesting and intriguing to me; they may not have great scientific value, but I am sharing them in the hope that someone might find them useful.