Category Archives: software

Ubuntu Bash Readline ( Command Line ) Shell Terminal Keyboard Shortcuts

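alt+y - cycle through the kill ring ( straight after ctrl+y )
ctrl+y - paste ( yank ) from the kill ring
ctrl+k - kill after cursor
ctrl+u - kill before cursor
ctrl+a - start of line
ctrl+e - end of line
ctrl+space - set mark ( begin select )
ctrl+r - reverse history search
sudo !! - rerun the previous command with sudo ( make me a sandwich )
cd !$ - !$ is the last argument of the previous command ( e.g. the directory you just gave to ls )
alt+. - paste the last argument of the previous command ( repeat to step further back through history )
alt+f, alt+b - move forward / back a word ( same as ctrl + -> and ctrl + <- )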

 

Binary File Hex Viewers on Ubuntu 12.04

I tried command line beav and gnome Jeex

As these are both binaries I am looking for a binary viewer / hex viewer – I will try beav

sudo apt-get install beav

Running beav on the binary produced by demo-words.sh, trained on the 1st billion bytes of Wikipedia:

beav vectors.bin

results in :

0: 37 31 32 39 30 20 32 30  30 0A 3C 2F 73 3E 20 07  71290 200.</s> .
10: F2 DE 3A 42 8B 9D 3A 0D  43 FC 3A E8 25 46 3A A3  ..:B..:.C.:.%F:.
20: 0E D9 BA C1 C0 02 BA 6F  1C D6 BA 0A 52 A3 B9 41  .......o....R..A
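The dump above is easier to read once you know the layout. As a rough sketch, assuming the usual word2vec binary layout (a text header "vocab_size dimensions" ending in a newline, then each word followed by a space and its vector as little-endian 32-bit floats), the 48 bytes shown can be decoded in Python. The function name parse_word2vec_prefix is made up for this illustration and it only handles the first word.

```python
import struct

# The 48 bytes from the beav dump above, copied by hand.
dump = bytes.fromhex(
    "37 31 32 39 30 20 32 30 30 0A 3C 2F 73 3E 20 07"
    "F2 DE 3A 42 8B 9D 3A 0D 43 FC 3A E8 25 46 3A A3"
    "0E D9 BA C1 C0 02 BA 6F 1C D6 BA 0A 52 A3 B9 41"
)

def parse_word2vec_prefix(data):
    """Decode the header, the first word and as many whole floats as fit.

    Assumes the layout described in the text above; hypothetical helper.
    """
    header, rest = data.split(b"\n", 1)       # "71290 200" ends at the first 0A
    vocab_size, dim = (int(x) for x in header.split())
    word, rest = rest.split(b" ", 1)          # the word ends at the first space
    n = len(rest) // 4                        # whole 4-byte floats available
    floats = struct.unpack("<%df" % n, rest[:4 * n])
    return vocab_size, dim, word.decode("ascii"), floats

vocab_size, dim, word, floats = parse_word2vec_prefix(dump)
print(vocab_size, dim, word)
print(floats)
```

So the "impenetrable" part is just the start of the 200 float components of the vector for the first token, all of them small numbers close to zero.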

trying jeex hex-editor

sudo apt-get install jeex

Why early man didn’t develop Hex is beyond me. After all we have 10 fingers: that’s enough for signed 8 bit numbers with a carry bit, much more useful than decimal.

Well, much as I love the sight of Hex ( flashback to 1970’s NASCOM Z80 Hex code ! ) this is impenetrable, with no easily discernible patterns. So I need a text file, and word2vec has a flag for that ( -binary 0 ).

Python Simple HTTP Server – Web Serve Files over HTTP to properly simulate file permissions

Problem : Everything worked till I put it online ; site stopped working when uploaded to the server ?

When working with JavaScript files, libraries, includes, jQuery, WebGL textures and other files that a web page pulls in, I find it best to open the page over HTTP from a web server rather than straight from disk, as browsers apply cross-site permission restrictions to some file types. I ran into this problem when including images as WebGL textures using three.js.

[SOLVED]
Python Simple HTTP Server – runs in a directory and serves those files and folders over HTTP on localhost port 8000

navigate to the file directory in a BASH prompt ( or shell or terminal ) and run :

python -m SimpleHTTPServer

and open http://localhost:8000 in a web-browser

which will load an index page with the files as links or index.html if that is present

And the HTTP transactions are logged to the console :D

Vary the port by adding a port number, e.g. 8030, to the command thus :

python -m SimpleHTTPServer 8030

and you can serve multiple directories at once, one server per directory, each on its own port
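The commands above are for Python 2. On Python 3 the module was renamed, so the equivalent is python3 -m http.server 8030. The same thing can also be done from a script. This is only a sketch, assuming Python 3.7 or later; the function name start_server and its defaults are my own, not part of the library.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def start_server(directory=".", port=8000, host="127.0.0.1"):
    """Serve `directory` over HTTP in a background thread (hypothetical helper).

    host="127.0.0.1" only accepts connections from this machine;
    use host="0.0.0.0" to accept connections from the local network.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling start_server("site_a", 8000) and start_server("site_b", 8030) would serve two directories side by side ( the directory names are just examples ), and server.shutdown() stops one again. Because the thread is a daemon, a standalone script has to keep running, for instance by waiting on input(), or the servers stop when the script ends.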

As a bonus you can connect from other machines on your local network using the IP address of the server ( 127.0.0.1 is localhost and only works on the server itself ) thus

http://192.168.1.5:8000

or use a dynamic dns service or DMZ or port forwards on your router to serve the site globally ( prolly super insecure so do not ever do this ever :( )

 

[Solved] Ubuntu 12.04 – How to remove volume change sound effect noise

Ubuntu 12.04 makes a very loud and terribly annoying putt-putting sound effect when the volume is changed.

To remove it : Applications -> System Tools -> System Settings -> Sound (icon) -> Sound Effects (tab). Here you can change the alert sound volume or switch it off. [SOLVED]

 

sinclair

October 4, 2013

Understanding the CBOW and Skip-gram models.

From the arXiv preprint : “Efficient Estimation of Word Representations in Vector Space by Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean”

Natural Language Processing or computational linguistics can attempt to predict the next word given the previous few words.

A gram is a chunk of language, possibly 3 characters, probably a word; an n-gram is a sequence of n of these occurring together.
Google’s N-gram viewer ( word frequency by year )
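To make the n-gram idea concrete, here is a tiny Python sketch. The function name ngrams is my own; the same function covers word grams and character grams because both lists and strings can be sliced.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive items in `tokens` as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 2))    # word bigrams
print(ngrams("word", 3))   # character trigrams
```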

The Mikolov model uses a hierarchical softmax in which the vocabulary is represented as a Huffman binary tree, which dramatically saves processing.

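The softmax is an activation function common in neural net output layers. It generalises the sigmoid ( logistic ) function to many outputs: every output lies between 0 and 1 and all the outputs sum to 1, so each output can be read as the probability of one of a set of mutually exclusive classes.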

\sigma(\textbf{q}, i) = \frac{\exp(q_i)}{\sum_{j=1}^n\exp(q_j)} \text{,}
where the vector q is the net input to a softmax node, and n is the number of nodes in the softmax layer.
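The formula translates almost directly into Python. This is a minimal sketch using only the standard library; subtracting the largest input before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(q):
    """Softmax of a list of net inputs q, as in the formula above."""
    m = max(q)                              # shift for numerical stability
    exps = [math.exp(x - m) for x in q]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The largest input always gets the largest probability, and the cost grows with the number of nodes n, which is why a plain softmax over a whole vocabulary is expensive.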

The hierarchical softmax reduces the vocabulary-size term V of the computational complexity to about log2(V) with a balanced binary tree, and to about log2(Unigram_perplexity(V)) with a Huffman tree.
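Why a Huffman tree helps can be seen on a toy vocabulary. The log2 of the unigram perplexity is the entropy of the word frequencies in bits, and the average Huffman code length ( the number of tree nodes visited per word ) is never more than 1 bit above that entropy. The sketch below is my own illustration with made-up word counts, not code from word2vec.

```python
import heapq
import math
from itertools import count

def huffman_code_lengths(freqs):
    """Return {word: code length in bits} for a Huffman tree over word counts."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(f, next(tiebreak), [w]) for w, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {w: 0 for w in freqs}
    while len(heap) > 1:
        f1, _, words1 = heapq.heappop(heap)   # the two rarest subtrees
        f2, _, words2 = heapq.heappop(heap)
        for w in words1 + words2:             # every word below moves one level down
            lengths[w] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), words1 + words2))
    return lengths

def entropy_bits(freqs):
    """Entropy of the unigram distribution, i.e. log2 of the unigram perplexity."""
    total = sum(freqs.values())
    return -sum(f / total * math.log2(f / total) for f in freqs.values())

def average_code_length(freqs, lengths):
    """Frequency-weighted average path length through the tree."""
    total = sum(freqs.values())
    return sum(freqs[w] / total * lengths[w] for w in freqs)

counts = {"the": 50, "of": 25, "cat": 12, "sat": 8, "zebra": 5}  # made-up counts
lengths = huffman_code_lengths(counts)
print(lengths)
print(average_code_length(counts, lengths), entropy_bits(counts), math.log2(len(counts)))
```

Frequent words sit near the root and rare words deep in the tree, so the average path is shorter than the log2(V) of a balanced tree.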

for hierarchical softmax the paper cites :
Strategies for Training Large Scale Neural Network Language Models by Mikolov et al., 2011
A Scalable Hierarchical Distributed Language Model by Mnih & Hinton, 2009
Hierarchical Probabilistic Neural Network Language Model by Morin & Bengio, 2005

 

Mikolov Sutskever Dean Corrado Word Vectors Machine Translation

CBOW predicts a word given the immediately preceding and following words, 2 of each – the order of occurrence is not preserved.

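Skip-gram predicts the surrounding words from a word: again up to 2 preceding and 2 following. The "skip" is that a predicted word need not be immediately adjacent to the input word. In the paper the window size is picked at random for each position, from 1 up to a maximum distance C, so more distant words are used less often than adjacent ones.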
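The difference between the two is easiest to see in the training examples each one generates from the same sentence. This Python sketch is my own illustration with a fixed window of 2; the function names are made up, and real word2vec also randomises the window size, which is left out here.

```python
def context(tokens, i, window=2):
    """Words within `window` positions either side of position i."""
    lo = max(0, i - window)
    return tokens[lo:i] + tokens[i + 1:i + 1 + window]

def cbow_examples(tokens, window=2):
    """CBOW: (context words, target word). The model treats the context
    as an unordered bag, so word order inside it is not used."""
    return [(context(tokens, i, window), tokens[i]) for i in range(len(tokens))]

def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (input word, predicted context word) pair per neighbour."""
    return [(tokens[i], c)
            for i in range(len(tokens))
            for c in context(tokens, i, window)]

sentence = "the quick brown fox jumps".split()
print(cbow_examples(sentence)[2])
print(skipgram_pairs(sentence)[:4])
```

CBOW makes one example per word while Skip-gram makes one per ( word, neighbour ) pair, so Skip-gram has several times more updates to do on the same text.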

In “Exploiting Similarities among Languages for Machine Translation” by Mikolov, Le & Sutskever, PCA dimensionality reduction is applied to these distributed representations (illustrated above).

The architectures of CBOW and Skip-gram are similar – CBOW likes a large corpus and Skip-gram prefers smaller corpora.

From Google Groups, Mikolov posts :

“Skip-gram: works well with small amount of the training data, represents well even rare words or phrases
CBOW: several times faster to train than the skip-gram, slightly better accuracy for the frequent words
This can get even a bit more complicated if you consider that there are two different ways how to train the models: the normalized hierarchical softmax, and the un-normalized negative sampling. Both work quite differently.”