Crunching the letters: Structuralism’s impact on Word Embedding

by Daniel East

We already know that machine learning algorithms are writing our news, but it is less well known how the process works. Piero Molino, talking at With The Best, puts it like this:

“There are all of these names, word vectors, distributional semantics, geometrical models of meaning, embeddings, but they all have the same meaning — to represent words as vectors in a space so we can have a semantic representation to compute similarity.”

Molino is a Machine Learning Scientist at Uber AI Lab who takes a cross-disciplinary approach to the challenges of word embeddings. Word embedding is the process of mapping words and phrases to vectors so that algorithms can use them to map, comprehend and, ultimately, recreate language.

Basically, Piero Molino takes words and turns them into numbers. He is concerned that most of the engineers and researchers in his field are not taking into consideration the vast history of linguistics.

“If you look to structuralism, and not everyone agrees with it, but if you start there you have this idea of interconnectedness. Structuralism says that the phenomena in language are not intelligible except through their interrelation.”

This interrelation, in terms of language modelling, comes from the context of the words and their usage in a sentence rather than the discrete value of any given word. In Saussurean terms, this means separating any “sign” into two values: the “signifier” and the “signified”.

Imagine a stop sign. The signifiers of this sign are the concrete bits that make it up — the white letters against the red paint, the post, the octagonal shape. The signified is what that means in context — so by the side of the road, the sign means ‘stop’. Hanging in a sharehouse, the same sign means ‘we are petty vandals and thought this would be a cool trophy to hang above our couch’.

“The main takeaway is that the meaning of signs is defined by their relationships and contrasts with other signs. Which takes us back to linguistic relationships.”

Molino uses this definition of signs as the basis for a programming approach: a model that assigns values to words by looking at them in context. The core of this approach is measuring co-occurrence. It is possible to create a vector for a word by counting how often it appears within a text, and how often other words occur alongside it whenever it does.
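To make the idea concrete, here is a minimal sketch (not Molino’s own code) of counting co-occurrences over a made-up two-sentence corpus, using a fixed window of two words on either side:

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For each word, count how often every other word appears
    within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus: each sentence is a list of tokens.
corpus = [
    ["the", "dog", "barks", "at", "the", "cat"],
    ["the", "cat", "sleeps"],
]

vectors = cooccurrence_vectors(corpus, window=2)
print(vectors["dog"])  # Counter({'the': 1, 'barks': 1, 'at': 1})
```

Each word ends up with a sparse vector of counts over the rest of the vocabulary, and those counts are its “company”.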

“It was [J.R.] Firth who said, ‘You shall know a word by the company it keeps,’” Molino states, “so when we record a word through co-occurrence vectors we create a distributional semantic model. We can measure the geometric distance of word vectors as a proxy to semantic relatedness.”
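In practice, the “geometric distance” is usually cosine similarity between those count vectors. A sketch, reusing the hypothetical `vectors` built above:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Words that keep similar company score closer to 1.0.
print(cosine(vectors["dog"], vectors["cat"]))  # ~0.71 on the toy corpus
```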

So a word can be qualified by its relationship to other words, right? Well yes, and also no. The problem Molino goes to great lengths to identify is that once you determine the context, you also affix a kind of meaning.

“When mapping words in context, the aperture of that context is very important. If you measure and relate words in context through two-word ‘windows’, you will wind up with different associations than if you take a wider window of 30 words.”

To give an example, ‘dog’ and ‘bark’ might regularly occur in the same sentence, but ‘dog’ and ‘cat’ might regularly appear in the same small window. The relationship between ‘dog’, ‘bark’ and ‘cat’ is profoundly different, but in terms of word vectoring, it is the nature of their context which defines them as related.
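Rerunning the toy counter from earlier with different apertures shows the effect (again an illustration, not the talk’s code):

```python
# Narrow window: only the immediate neighbours of 'dog' count as context.
narrow = cooccurrence_vectors(corpus, window=1)
print(narrow["dog"])  # Counter({'the': 1, 'barks': 1})

# Wide window: the whole sentence becomes context, pulling in 'cat'.
wide = cooccurrence_vectors(corpus, window=5)
print(wide["dog"])    # Counter({'the': 2, 'barks': 1, 'at': 1, 'cat': 1})
```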

When we move from vectoring words to vectoring sentences, the problem is that sentences are sparse and more complex ‘signs’ with more specific ‘signifiers’. And simply summing the word vectors in any given sentence does not provide a satisfactory analysis.

“If I have a sentence like, ‘I drive a car’ you consider the two vectors, ‘car’ and ‘drive’. We can map these occurrences and say they are related. But if I say, ‘The car was driving you’, the vectors are the same but the sentence is completely different. The sum is commutative, so it doesn’t consider word order. A compositional analysis will say these two sentences are similar, but they just aren’t.”
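A quick sketch makes the commutativity problem concrete. Here the classic ‘dog bites man’ pair stands in for the quote’s sentences, since both orderings share an identical vocabulary, and the word vectors are invented purely for the example:

```python
import numpy as np

# Hypothetical 2-d word vectors, invented for illustration only.
emb = {
    "dog": np.array([0.9, 0.1]),
    "bites": np.array([0.2, 0.8]),
    "man": np.array([0.5, 0.5]),
}

s1 = ["dog", "bites", "man"]
s2 = ["man", "bites", "dog"]

# Addition is commutative, so word order vanishes from the sum.
v1 = sum(emb[w] for w in s1)
v2 = sum(emb[w] for w in s2)
print(np.array_equal(v1, v2))  # True, yet the sentences mean different things
```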

Molino encourages programmers to look to the field of semantics in order to understand the current engineering challenges. His view is that words and sentences should be seen in terms of targets and contexts. The tools developers use are all means to the same end: no one model can fully account for linguistic complexity, but there are a great many toolsets to choose from.

“If you accept the core of structuralism, that the meaning of something is defined by its connection to other things, then it all makes sense. Not everyone agrees with this definition, but some of the meaning can be captured by this definition.”