

October 4, 2013

Understanding the CBOW and Skip-gram models.

From the arXiv preprint: “Efficient Estimation of Word Representations in Vector Space” by Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean

Natural Language Processing or computational linguistics can attempt to predict the next word given the previous few words.

A gram is a chunk of language: possibly 3 characters, more usually a word. An n-gram is a sequence of n of these occurring together.
Google’s N-gram viewer ( word frequency by year )
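
As a rough illustration of the definition above, here is a minimal sketch that pulls n-grams out of a sequence. The function name `ngrams` and the sample sentence are made up for this example; the same function works on words or on characters.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive items in tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Word-level bigrams from a made-up sentence.
words = "the quick brown fox jumps".split()
bigrams = ngrams(words, 2)

# Character-level trigrams: a string is just a sequence of characters.
char_trigrams = ngrams("gram", 3)
```

A five-word sentence yields four bigrams, and the four-letter string yields two character trigrams.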

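The Mikolov models use a hierarchical softmax output layer, with the vocabulary encoded as a Huffman binary tree.

Representing the vocabulary as a Huffman binary tree saves a great deal of processing: predicting a word takes one binary decision per level of the tree instead of one output per word in the vocabulary, and the most frequent words get the shortest paths.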
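
To make the tree idea concrete, here is a minimal sketch of how a hierarchical softmax can assign a probability to a word: each inner node makes a left/right decision with a sigmoid, and a word's probability is the product of the decisions on its path from the root. The four-word tree, the node names and the scores are all invented for illustration; in a real model each score would come from the hidden-layer vector and a learned vector for that inner node.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical tree: each word is a leaf, reached by a list of
# (inner node, go_left) decisions starting at the root.
paths = {
    "the": [("root", True)],
    "cat": [("root", False), ("n1", True)],
    "sat": [("root", False), ("n1", False), ("n2", True)],
    "mat": [("root", False), ("n1", False), ("n2", False)],
}

# Made-up scores, one per inner node (assumed, not from the paper).
scores = {"root": 0.8, "n1": -0.3, "n2": 1.5}

def word_probability(word):
    """Multiply the probability of each left/right decision on the path."""
    p = 1.0
    for node, go_left in paths[word]:
        p_left = sigmoid(scores[node])
        p *= p_left if go_left else 1.0 - p_left
    return p

probs = {word: word_probability(word) for word in paths}
```

The leaf probabilities sum to 1 without ever computing a normalising sum over the whole vocabulary, and the word nearest the root needs only one decision.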

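The softmax is an activation function common in neural net output layers. It generalises the logistic sigmoid to many classes: every output lies between 0 and 1 and all the outputs sum to 1, so each output can be read as the probability of one of a set of mutually exclusive classes.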

\sigma(\textbf{q}, i) = \frac{\exp(q_i)}{\sum_{j=1}^n\exp(q_j)} \text{,}
where the vector q holds the net inputs to the softmax layer, q_i is the net input to node i, and n is the number of nodes in the softmax layer.
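
The formula above can be sketched in a few lines of Python. Subtracting the largest input before exponentiating is an extra step that is not in the formula; it leaves the result unchanged and is only there to avoid overflow. The input values are made up.

```python
import math

def softmax(q):
    """Return exp(q_i) / sum_j exp(q_j) for every entry of q."""
    largest = max(q)  # shift for numerical stability; the ratios are unchanged
    exps = [math.exp(x - largest) for x in q]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up net inputs for a three-node softmax layer.
probs = softmax([2.0, 1.0, 0.1])
```

The outputs keep the ordering of the inputs, each lies between 0 and 1, and together they sum to 1.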

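The hierarchical softmax reduces the output-layer term of the computational complexity. A flat softmax evaluates V outputs, where V is the vocabulary size; a balanced binary tree needs about log2(V); and the Huffman tree needs only about log2(Unigram_perplexity(V)), because frequent words sit closer to the root.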
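
Here is a minimal sketch of where that saving comes from, using a Huffman tree built with Python's `heapq`. The six-word vocabulary and its counts are invented for the example, and the function name `huffman_code_lengths` is not from the paper. The code length of a word is the number of binary decisions needed to reach it.

```python
import heapq
import itertools
import math

def huffman_code_lengths(counts):
    """Map each word to its depth in a Huffman tree built from the counts."""
    tiebreak = itertools.count()  # keeps heap entries comparable on equal counts
    heap = [(c, next(tiebreak), [w]) for w, c in counts.items()]
    heapq.heapify(heap)
    depth = {w: 0 for w in counts}
    while len(heap) > 1:
        c1, _, words1 = heapq.heappop(heap)
        c2, _, words2 = heapq.heappop(heap)
        # Merging two subtrees pushes every word in them one level deeper.
        for w in words1 + words2:
            depth[w] += 1
        heapq.heappush(heap, (c1 + c2, next(tiebreak), words1 + words2))
    return depth

# Made-up word counts, heavily skewed like real text.
counts = {"the": 50, "of": 30, "cat": 10, "sat": 5, "zebra": 3, "quokka": 2}
lengths = huffman_code_lengths(counts)

total = sum(counts.values())
# Average number of binary decisions per word occurrence.
avg_decisions = sum(counts[w] * lengths[w] for w in counts) / total
# Entropy in bits, i.e. log2 of the unigram perplexity.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
balanced = math.log2(len(counts))
```

With these counts the average number of decisions lands just above the unigram entropy and well below the log2(V) of a balanced tree, which is the effect the complexity figure describes.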

For hierarchical softmax the paper cites:
Strategies for Training Large Scale Neural Network Language Models by Mikolov et al, 2011
A Scalable Hierarchical Distributed Language Model by Mnih & Hinton, 2009
Hierarchical Probabilistic Neural Network Language Model by Morin & Bengio, 2005


Mikolov Sutskever Dean Corrado Word Vectors Machine Translation

CBOW predicts a word given the immediately preceding and following words (2 of each in the paper's illustration). The order of occurrence of the context words is not preserved, because their vectors are averaged.
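
A minimal sketch of how CBOW training examples could be laid out, assuming a fixed window of two words on each side. The function names and the sample sentence are made up, and the tiny two-number "vectors" only show that averaging throws away word order.

```python
def cbow_examples(tokens, window=2):
    """Pair each word (the target) with the words around it (the context)."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((left + right, target))
    return examples

def average(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

tokens = "the cat sat on the mat".split()
examples = cbow_examples(tokens)

# Averaging is order-blind: swapping the context vectors changes nothing.
forward = average([[1.0, 2.0], [3.0, 4.0]])
backward = average([[3.0, 4.0], [1.0, 2.0]])
```

For the word "sat" the context is the two words before it and the two after it; words at the edges of the sentence simply get a shorter context.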

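Skip-gram predicts the surrounding words from a word. Again the paper's illustration shows 2 preceding and 2 following, but the targets are any words within a window around the current word rather than at a fixed skip distance. In the paper the window size is picked at random for each word, up to a maximum C, so more distant words are sampled less often.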
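
A minimal sketch of how skip-gram training pairs could be generated: every word is paired with each of its neighbours in turn. It assumes a fixed window of two words each side, which is a simplification, and the function name and sample sentence are invented.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre word, context word) pairs for every position."""
    pairs = []
    for i, centre in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens)
sat_pairs = [p for p in pairs if p[0] == "sat"]
```

Where CBOW makes one example per word, this produces one pair per neighbour, which is one reason skip-gram is slower to train on the same text.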

In “Exploiting Similarities among Languages for Machine Translation” by Mikolov, Le & Sutskever, PCA dimensionality reduction is applied to these distributed representations (illustrated above).

The architectures of CBOW and Skip-gram are similar. CBOW trains faster and suits a large corpus; Skip-gram copes better with smaller corpora and rare words.

From Google Groups, Mikolov posts:

“Skip-gram: works well with small amount of the training data, represents well even rare words or phrases
CBOW: several times faster to train than the skip-gram, slightly better accuracy for the frequent words
This can get even a bit more complicated if you consider that there are two different ways how to train the models: the normalized hierarchical softmax, and the un-normalized negative sampling. Both work quite differently.”


New Punctuation – like Math symbols – Smilies have utility

Before craptacular smilies became a semiotic graveyard of meaningless ani-sparkle the internet had a problem.


There was no punctuation for tone of voice, and across the Atlantic humour failed to translate.

Flame wars were frequently fanned by misunderstood jokery

<sarcasm>Democracy</sarcasm> markup was tried – but non HTML chat and text email needed something plain text – like an exclamation point, question mark.

And from old school telegraphy came the answer – sideways on, a semicolon, a hyphen and a close bracket.


Side on a winking smiling face – now you knew when someone was joking and many a flame war was averted :-) yay

Emote – icons => emoticons

Nowadays historically unaware snobs decry smileys as teen uni-rainbow-sparkle-corn but it is legitimate and necessary punctuation – sure you could write “just joking” – just like you can write “STOP” in a telegram but dots as full stops save bandwidth and typing.

:-(  sad    8-)  nerd glasses    :-o  shock / surprise  and then the nose was eliminated and direction abandoned – sometimes to evade gif substitution.

thus ( => )

:D comedy
tragedy D:

In Japan the smiley was right way up, cat based and called Emoji
zen calm

Motion is sequential narrative stylee

Doppler effect in 3 frames
o.O O.O O.o

Replacing ASCII smileys with animated gifs is like sticking a brand logo over sumi-e calligraphy – ASCII art is 128 chars (7 bit) beautiful.

Abbreviations, portmanteau, compound words, spoonerisms are the evolution of writing to encompass what people say and we are richer for it. ʘ‿ʘ

Unicode Emoji is teh awesomer.

ʕノ•ᴥ•ʔノ ︵ ┻━┻
is a bear flipping a table = exasperation

shrug = wtf

and the classic reddit staple Emoji of disapproval

<3 <3 <3


8====D —— (o)(o)