
What are Cosine Distance and Cosine Similarity?

Cosine similarity is the cosine of the angle between two vectors, which is equal to their dot product divided by the product of their magnitudes. Cosine distance is commonly defined as 1 minus the cosine similarity. (Wikipedia / Wolfram)

\text{similarity} = \cos(\theta) = {A \cdot B \over \|A\| \|B\|} = \frac{ \sum\limits_{i=1}^{n}{A_i \times B_i} }{ \sqrt{\sum\limits_{i=1}^{n}{(A_i)^2}} \times \sqrt{\sum\limits_{i=1}^{n}{(B_i)^2}} }
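To make the formula concrete, here is a minimal sketch in plain Python. The function names `cosine_similarity` and `cosine_distance` are illustrative choices, not taken from any particular library, and the distance is assumed to follow the common "1 minus similarity" convention.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (lists of numbers)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: the dot product A . B
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: the product of the two Euclidean norms ||A|| * ||B||
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    # Assumed convention: distance = 1 - similarity
    return 1.0 - cosine_similarity(a, b)

print(cosine_similarity([1, 0], [0, 1]))        # perpendicular vectors
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction, different lengths
```

Perpendicular vectors give a similarity of 0, and vectors pointing in the same direction give a similarity of 1 (up to floating-point rounding), whatever their lengths.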

It is used with word2vec embeddings to find words whose vectors lie close to each other.
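As a rough illustration of that lookup, the sketch below ranks words by cosine similarity to a query word. The three-dimensional vectors in `toy_vectors` are made up for the example and are not real word2vec output (learned embeddings typically have hundreds of dimensions), and `most_similar` is a hypothetical helper name rather than a specific library call.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", invented for illustration only
toy_vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "apple": [0.10, 0.20, 0.95],
}

def most_similar(word, vectors, top_n=2):
    """Return the top_n other words, ranked by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

for other, score in most_similar("king", toy_vectors):
    print(other, round(score, 3))
```

With these invented vectors, "queen" ranks above "apple" for the query "king" because its vector points in nearly the same direction.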

It does not account for magnitude, only for angular difference. However, it can be calculated quickly on sparse matrices, since only the non-zero entries need to be considered, and so it has found a place in text classification.
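Both properties can be sketched with sparse vectors stored as dictionaries of non-zero entries. This is only one possible representation, assumed here for illustration. The name `sparse_cosine_similarity`, the word-count documents, and the choice to return 0.0 for an empty vector are all assumptions of this example.

```python
import math

def sparse_cosine_similarity(a, b):
    """a and b map a feature (e.g. a token) to its non-zero weight."""
    # Iterate over the smaller vector; entries missing from either side
    # would contribute zero to the dot product, so they are skipped.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty vectors
    return dot / (norm_a * norm_b)

# Hypothetical word-count vectors for two short documents
doc1 = {"cat": 2, "sat": 1, "mat": 1}
doc2 = {"cat": 1, "sat": 1, "hat": 1}

# Doubling every count changes the magnitude but not the direction
doc1_doubled = {token: 2 * count for token, count in doc1.items()}

print(sparse_cosine_similarity(doc1, doc2))
print(sparse_cosine_similarity(doc1_doubled, doc2))
```

The two printed values agree up to floating-point rounding, which shows that scaling a vector leaves its cosine similarity unchanged. The work done is proportional to the number of non-zero entries rather than to the vocabulary size.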