Category Archives: Research

3 Google Engineers observe consistent hyperspace between languages when dimensionality is reduced by a deep neural net

“How Google Converted Language Translation Into a Problem of Vector Space Mathematics”

“To translate one language into another, find the linear transformation that maps one to the other. Simple, say a team of Google engineers”

Machine Translation: Zarjaz! Spludig vir thrig!

Visualised using t-SNE, a dimensionality-reduction technique whose 2-d maps make clusters visible.

t-SNE was developed by Laurens van der Maaten together with Geoffrey Hinton, whose deep neural net group is at the University of Toronto. Indeed one of the three authors of the paper, Ilya Sutskever, studied with Hinton at Toronto and is his partner in a deep neural network startup that Google bought almost upon its inception.

All mimsy were the borogoves.

The full paper on arXiv:

Exploiting Similarities among Languages for Machine Translation
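As I read the paper, the “linear transformation” is a translation matrix W learned from known word pairs so as to minimise the sum of ||W x_i - z_i||^2, where x_i is a source-language word vector and z_i the vector of its translation; a new word is then translated by looking for the target word nearest to W x. The sketch below is only an illustration under my own assumptions: made-up 2-d vectors in which the “Spanish” space is the “English” space rotated by 90°, and the closed-form least-squares solution instead of the gradient descent the paper describes.

```python
# Sketch only: fit a linear map W so that W x_i ~ z_i for paired word vectors.
# Assumptions: toy 2-d vectors, closed-form least squares (normal equations)
# instead of the gradient descent described in the paper.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Rows of a times columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def fit_translation_matrix(src_vectors, tgt_vectors):
    """Least-squares W minimising sum_i ||W x_i - z_i||^2, i.e. W = Z X^T (X X^T)^-1."""
    X = transpose(src_vectors)  # d_src x n, one column per source word
    Z = transpose(tgt_vectors)  # d_tgt x n, one column per target word
    Xt = transpose(X)           # n x d_src
    return matmul(matmul(Z, Xt), invert(matmul(X, Xt)))

def translate_vector(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Hypothetical "dictionary" pairs: English vectors and their Spanish counterparts.
english = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spanish = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]

W = fit_translation_matrix(english, spanish)
print(translate_vector(W, [2.0, 3.0]))  # approximately [-3.0, 2.0]
```

The fitted W recovers the 90° rotation hidden in the toy data. The nearest-neighbour lookup that turns W x back into an actual word is left out here.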

The code & vectors (word2vec in C) that access this hyperspace for the purposes of translation have been made available by the researchers, so you can explore it for yourself. I am sure that many more papers will stem from observations made from this hyper-plane of meaning.

Commented Python Version of word2vec – associated blog announcement
HNews Discussion of word2vec
Example Word2vec App – finds dissimilar word in list.
Python Wrapper & yhat R classifier web interface for word2vec
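The “finds dissimilar word” trick can be done in a few lines: normalise each word’s vector, average them, and report the word least similar to that average. This is a sketch with invented toy vectors and a hypothetical odd_one_out function, not the linked app’s code.

```python
import math

# Invented 3-d toy vectors; real word2vec vectors are learned and much longer.
VECTORS = {
    "breakfast": [0.9, 0.1, 0.0],
    "lunch": [0.8, 0.2, 0.1],
    "dinner": [0.85, 0.15, 0.05],
    "kingfisher": [0.0, 0.1, 0.95],
}

def normalise(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def odd_one_out(words, vectors=VECTORS):
    """Return the word whose unit vector has the lowest dot product with the mean unit vector."""
    units = {w: normalise(vectors[w]) for w in words}
    dim = len(units[words[0]])
    mean = [sum(u[i] for u in units.values()) / len(words) for i in range(dim)]
    return min(words, key=lambda w: sum(a * b for a, b in zip(units[w], mean)))

print(odd_one_out(["breakfast", "lunch", "dinner", "kingfisher"]))  # kingfisher
```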

2nd paper: Efficient Estimation of Word Representations in Vector Space, another Google paper using word2vec: vector(“King”) - vector(“Man”) + vector(“Woman”) ≈ vector(“Queen”)
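The King/Queen arithmetic can be sketched with hand-made toy vectors. Real word2vec vectors are learned and have hundreds of dimensions; the three dimensions and all numbers here are invented for illustration. The resulting vector is compared by cosine similarity against every other word, excluding the three input words, which is how such analogy queries are commonly run.

```python
import math

# Invented toy vectors; dimensions are hand-picked as [royalty, male, female].
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "prince": [0.7, 0.7, 0.1],
    "apple": [0.0, 0.3, 0.3],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors=VECTORS):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```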

Reducing the exceedingly large space of all possible strings of letters to a point of view where meaning is a movement common to every language excitingly points back before Babel to the Ur-language: the Platonic Forms themselves, at least semantically.

Babel Fish from The Hitchhiker’s Guide to the Galaxy

I believe this space is somehow derived from intuitions about the space defined by the weights of a deep neural network for machine translation trained on the Google corpus of all the web pages and books of humankind.

[This proved correct: the vectors are the compact representation learned in the middle (hidden) layer of a neural net.]

So in a way this is a view of human language as seen by a machine learning to translate human languages.

In other words, this deep-neural-net AI is the man in Searle’s Chinese Room, translating things he doesn’t understand: he has no soul and realises nothing, yet his visual translation dictionary appears to reveal the universality of movement that is meaning, common to all human languages and discoverable within human cultural output.

Is this an argument for strong AI or weak AI?

I think a more biologically inspired analogy is a corporation: a business where many desk workers receive and process messages between themselves and the outside world. Few of the workers possess a complete overview of the decision process, and those who do are hopelessly out of touch with how anything is actually achieved. Yet each presses on through self-interest, necessity, hubris and a little altruism generated by related-genetic self-interest, tropistically seeking the prime directives established in the founding document and, more loosely, in the unwritten, orally transmitted office culture. Is a corporation intelligent, self-aware or even conscious? Probably not, but it may think of itself as such and act as though it is. So I hold that a corporation is to a human mind what a mind is to an individual neuron; thus ‘intelligence’ does not truly exist in any one individual, machine, procedure or rule, but is apparent in the whole system, just as mind exists in the human brain. Yet I think this does not account for soul, and it is a not-unuseful model of human behaviour that we appear possessed of souls as well as minds.

Hyperbolically, reductio ad absurdum, this perhaps implies that humanity is itself a type of mind, made perhaps more visible now that this noosphere is so highly connected through the peer-to-peer internet, the web and cellphone networks. Indeed our interaction with the biosphere has a type of mind, and so on, until the universe itself is predictably analogous to a thinking being. Is this the hubris of a universal ‘God’, as it is our hubris to say ‘I am’ though we are made of individual cells?

Paracelsus, Blake and Benoit Mandelbrot: as above, so below; and find everything in a grain of sand; how long a coastline is depends on how closely one looks.

I would suggest this is the evolution of a machine intelligence.

It certainly looks like the hyperplane of meaning.

In other news Google Translate for Animals ;-)

“Every conscious state has this qualitative feel to it. There’s something it feels like to drink beer, and it’s different from listening to music or scratch your head or pick your favourite feeling … all of these states have a certain qualitative feel to them.

Now because of that they have a second feature to them, namely that they are subjective. They are ontologically subjective in the sense that they only exist insofar as they are experienced by a human or animal” – John Searle

Kant’s Transcendental Unity of Apperception is Searle’s Unity of Consciousness, and perhaps the body, soul and mind are analogous.

#ILoveScience #Verstehen
#ILoveArt #heArt

How and Why to Blog – Epistemology – weblog

Blog – from portmanteau web log => weblog => we blog => blog

like a Captain’s log stardate 2013…

Now a Blog is a diary – the primary index is date-of-posting.

But back in the day it was a weblog: an annotated bookmarks file or a web history, dumped and sometimes quickly praised.

Surfing meant following links, not jumping in from search. Search was rubbish in the 1990s.

AltaVista (an early search engine) only ever indexed at most 3% of the web, and less each day, as the web grew faster than AltaVista’s spider could crawl new content.

It had no PageRank and was keyword-based only: synonyms were bugs, and results were misdirectable / spammable with huge lists of keywords invisible to the user.

<head> metadata was much more important.
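To see why keyword-only ranking was so easy to spam, here is a deliberately naive scorer. It is my own toy, not AltaVista’s actual algorithm: a page’s score is simply how many times the query words occur in its text. A hidden keyword list wins, and a word variant like “surf” scores nothing against “surfing”.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def keyword_score(query, page_text):
    """Toy 1990s-style ranking: total occurrences of the query words in the page."""
    counts = Counter(tokenize(page_text))
    return sum(counts[term] for term in tokenize(query))

def rank(query, pages):
    """Page names, best keyword score first."""
    return sorted(pages, key=lambda name: keyword_score(query, pages[name]), reverse=True)

pages = {
    "honest-guide": "A short guide to surfing: waves, boards and surfing etiquette.",
    # Imagine the repeated words hidden from the reader, e.g. white-on-white text.
    "spam-page": "Buy pills now. " + "surfing " * 50,
}

print(rank("surfing guide", pages))                  # ['spam-page', 'honest-guide']
print(keyword_score("surf", pages["honest-guide"]))  # 0: no stemming, no synonyms
```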

To find a site again, one had to retrace each click.

Fundamentally blogging is a research tool – mostly read by the author.

If I write a how-to or tutorial it is mostly for me when 6 months later I need to do the same thing again.

The difficulty is: “I only realised today the relevance of a site I saw a link to yesterday, but I can’t find it in my history and can’t remember how to get there.”

Folksonomies like tags and tag clouds, as well as hierarchical categories, came later.

A blog was curated: pruned, praised links, sometimes sorted by topic and often ‘gems’ or best-in-class.

‘weBlog’ was synonymous with ‘Links page’

Even the early Google had limited reach.

This was the primary mode before Google made search work.

‘Surf the Web’ meant to catch an info wave: the first links from a search engine were usually irrelevant synonymic mistakes, but if a page was on track you hit its links for curated, on-subject gems.

The Web is a web of links. Links were the ONLY way to get to most of the web, from one place to the next following a link trail; search engines had tiny indexes and next to no intelligence before Google and PageRank (PageRank itself is based on counting inbound links as citations).
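PageRank treats each inbound link as a citation, weighted by the rank of the citing page and split across that page’s outbound links. A minimal power-iteration sketch, assuming a toy link graph of my own invention and the commonly used damping factor of 0.85; dangling pages (no outbound links) spread their rank evenly, which is one common convention and not necessarily Google’s.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outbound link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical mini-web: a links page pointing at two mutually linked gems.
web = {
    "links-page": ["gem-a", "gem-b"],
    "gem-a": ["gem-b"],
    "gem-b": ["gem-a"],
    "lonely": ["gem-a"],
}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # gem-a
```

gem-a wins because it collects citations from three pages, while pages nobody links to keep only the small baseline share.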

Today links are sent and most pages come from search – one rarely follows a trail or ‘seam/vein’ of information.

Pages linked this way would more than likely be on track and best-in-class; then go to their links pages… rinse… repeat… er… profit.

So one caught an infowave and surfed it, twisting the board by leaning into the flow: thus ‘surfing’. The term is now an anachronism; Google made it somewhat irrelevant.

Web rings had next site buttons at the bottom of every page.

Next / Prev meant the next or previous website in the ring, whereas today they mean in-article pagination or the next entry in a diary.

Hacker News (Silicon Valley startups and comp sci, http://news.ycombinator.com) for 2013-Aug-08 21:30 (save):

$ wget --recursive --level=1 --page-requisites to capture the discussions and then the links from the discussions. Caveat: the menu, logo and footer links are NOT the discussion.

*** discussions and links should be captured

*** In-page linked images, code and CSS should be grabbed; HTML alone is useless

*** Screenshots, as a baseline format (standards / browsers change) and for thumbnails
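The menu/logo/footer caveat can be handled after the fact: collect every link on a saved front page and keep only the discussion links. This sketch assumes, as a guess about the markup that is worth checking, that HN discussion pages are the hrefs containing “item?id=”; the sample HTML and the discussion_links helper are my own.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href from <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def discussion_links(html):
    """Keep only discussion links, dropping menu, logo and footer links.

    Assumption: discussions are the hrefs containing "item?id=";
    this is a guess about the page markup, not a documented format.
    """
    parser = LinkCollector()
    parser.feed(html)
    seen, result = set(), []
    for href in parser.hrefs:
        if "item?id=" in href and href not in seen:
            seen.add(href)
            result.append(href)
    return result

sample = """
<a href="news"><img src="y18.gif"></a><a href="newest">new</a>
<a href="https://example.com/article">An article</a>
<a href="item?id=101">42 comments</a><a href="item?id=101">discuss</a>
<a href="item?id=102">7 comments</a>
<a href="newsguidelines.html">Guidelines</a>
"""
print(discussion_links(sample))  # ['item?id=101', 'item?id=102']
```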

(*) The question is how to blog effectively:

**Primary audience: my own notes; other readers are secondary

**Need to save archival copies of links in case they go down / change

***Should changes (i.e. updates or further comments) be recorded / subscribed to, as a wikiwiki-style svn edit history or as git-recorded diff patches? [A] Minor issue

****Permalinks are not a solution to this, but permalinks should be linked, as should the original link

**Surf method: load the news page, open discussions in new tabs, and sometimes the original article too (the discussion keeps the article linked from its title, but the reverse is not true: the discussion is not linked from the original article)

**Note: HN churns fast and expires further pages (? are the pages timestamped to the 1st page to stop articles repeating? Reddit repeats, HN doesn’t), so pulling pages 2 and 3 immediately is good, otherwise it can’t be done. -(actionable)- should script pulling the first 250 pages, or as many as needed until repeats are most of the links.
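For recording changes, a git repository of snapshots is one option; a stdlib-only stand-in is Python’s difflib, which produces the same kind of unified diff patch between an archived copy and the current page. The snapshot texts, URL and snapshot_diff helper below are invented for illustration.

```python
import difflib

def snapshot_diff(old_text, new_text, url):
    """Return a unified diff between two archived snapshots of one page ("" if unchanged)."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=url + " (archived)",
            tofile=url + " (current)",
        )
    )

old = "Title: word2vec\nComments: 12\n"
new = "Title: word2vec\nComments: 40\nUpdate: code released\n"
print(snapshot_diff(old, new, "http://example.com/post"))
```

Storing each diff next to the archived copy gives a wiki-style edit history without keeping every full version.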

-(actionable)- should script pulling the first 250 pages for all sites, for reasons of either cache expiry or churn.
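The “pull pages until repeats dominate” script could look like this. fetch_links is a hypothetical callable you would supply (for HN, perhaps fetching news?p=N and extracting its links); the 250-page cap comes from the note above, while the 50% repeat threshold is my own arbitrary choice.

```python
from typing import Callable, Iterable

def pull_until_repeats(fetch_links: Callable[[int], Iterable[str]],
                       max_pages=250, repeat_threshold=0.5):
    """Pull pages 1..max_pages, stopping once most links on a page were already seen.

    fetch_links(page_number) is a hypothetical callable returning the links on that page.
    """
    seen, ordered = set(), []
    for page in range(1, max_pages + 1):
        links = list(fetch_links(page))
        if not links:
            break  # empty or expired page: nothing more to pull
        repeats = sum(1 for link in links if link in seen)
        for link in links:
            if link not in seen:
                seen.add(link)
                ordered.append(link)
        if repeats / len(links) > repeat_threshold:
            break  # repeats are now most of the links, so stop
    return ordered

# Fake site for illustration: page 4 mostly repeats page 3.
fake_site = {1: ["a", "b", "c"], 2: ["d", "e", "f"], 3: ["g", "h", "i"], 4: ["g", "h", "j"]}
calls = []

def fake_fetch(n):
    calls.append(n)
    return fake_site.get(n, [])

print(pull_until_repeats(fake_fetch))
```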

With RSS we can subscribe to sites (the ‘new’ queue) and so never miss a post.
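Tracking the ‘new’ queue is mostly bookkeeping: remember the ids of items already seen and report only fresh ones on each poll. A sketch for RSS 2.0 using the standard library, assuming each item’s guid (falling back to its link) is a stable id; the feed and the new_items helper are made up.

```python
import xml.etree.ElementTree as ET

def new_items(rss_xml, seen_ids):
    """Return (id, title, link) for RSS 2.0 items not yet in seen_ids, and mark them seen.

    Assumption: the guid is a stable id; the link is used when guid is missing.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        item_id = item.findtext("guid") or link
        if item_id and item_id not in seen_ids:
            seen_ids.add(item_id)
            fresh.append((item_id, item.findtext("title", default=""), link))
    return fresh

feed = """<rss version="2.0"><channel><title>Example</title>
<item><title>Post one</title><link>http://example.com/1</link><guid>id-1</guid></item>
<item><title>Post two</title><link>http://example.com/2</link></item>
</channel></rss>"""

seen = set()
print(new_items(feed, seen))  # both items are new on the first poll
print(new_items(feed, seen))  # [] on the second poll
```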

(Q) Why is web history so very broken? I want everything I see saved as I see it, and a branching, searchable, sortable history.

History is worse now than in Netscape 3 back in 1996.
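One way to picture the branching, searchable, sortable history: store each visit with a pointer to the page it was reached from, so that opening two links from one page creates a branch and any page’s click trail can be replayed. The class names and sample visits are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass(eq=False)
class Visit:
    url: str
    title: str
    when: datetime
    # repr=False stops repr() from walking the parent/child cycle forever.
    parent: Optional["Visit"] = field(default=None, repr=False)
    children: List["Visit"] = field(default_factory=list, repr=False)

class BranchingHistory:
    """Browsing history as a tree: every visit records the page it came from."""

    def __init__(self):
        self.visits: List[Visit] = []

    def visit(self, url, title, when, came_from=None):
        v = Visit(url, title, when, parent=came_from)
        if came_from is not None:
            came_from.children.append(v)  # a second child means a branch
        self.visits.append(v)
        return v

    def trail(self, v):
        """The click trail from the starting page down to visit v."""
        path = []
        while v is not None:
            path.append(v.url)
            v = v.parent
        return path[::-1]

    def search(self, text):
        """Visits whose URL or title contains text, newest first."""
        text = text.lower()
        hits = [v for v in self.visits
                if text in v.url.lower() or text in v.title.lower()]
        return sorted(hits, key=lambda v: v.when, reverse=True)

h = BranchingHistory()
hn = h.visit("https://news.ycombinator.com", "Hacker News", datetime(2013, 8, 8, 21, 30))
discussion = h.visit("https://news.ycombinator.com/item?id=1", "word2vec discussion",
                     datetime(2013, 8, 8, 21, 31), came_from=hn)
article = h.visit("http://example.com/article", "Article on word2vec",
                  datetime(2013, 8, 8, 21, 35), came_from=discussion)
other = h.visit("http://example.com/other", "Another story",
                datetime(2013, 8, 8, 21, 40), came_from=hn)  # branches from hn

print(h.trail(article))
print([v.url for v in h.search("word2vec")])
```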