How Do Word Senses Evolve?
To extract meaningful information from the troves of data being collected by “smart” devices such as mobile phones, computers will need to process language the way humans do. That is the goal of natural-language processing, a branch of artificial intelligence.
A paper published online in the Proceedings of the National Academy of Sciences is the first to look at 1,000 years of English development and detect the kinds of algorithms that human minds have used to extend existing words to new senses. This “reverse engineering” of how human language has developed could have implications for natural-language processing by machines.
“To communicate successfully with humans, computers need to be able to use words flexibly but following the same principles that guide the use of language by humans,” explains Barbara Malt, director of Lehigh’s Cognitive Science Program and one of the project collaborators.
Words accumulate families of related senses over the course of history, Malt says. For example, the word “face” originally meant the front part of a head, but over time also came to mean the front part of other objects, such as the “face” of a cliff, and an emotional state, such as putting on a brave “face.”
“This work,” says Malt, “was aimed at investigating the cognitive processes that create these families of senses.”
The team includes lead researcher Yang Xu, a computational linguist at the University of Toronto; Mahesh Srinivasan, assistant professor of psychology at the University of California, Berkeley; and Berkeley student Christian Ramiro.
The team identified an algorithm called “nearest-neighbor chaining,” in which points of input are analyzed as a hierarchy of clusters, as the mechanism that best describes how word senses accumulate over time. The model captured the chaining process that occurs as emerging ideas are expressed using the word with the most closely related existing sense.
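The chaining idea described above can be illustrated with a small Python sketch. It is not the researchers' model: the function name `chain_order`, the sense labels, and the two-dimensional coordinates are all invented for illustration, and distance in this toy space simply stands in for how closely two senses are related. Starting from the earliest sense, each step adds whichever remaining sense lies closest to any sense that has already emerged.

```python
import math


def chain_order(senses, seed):
    """Order senses by nearest-neighbor chaining, starting from a seed sense.

    senses: dict mapping a sense label to a coordinate tuple in a toy
            "semantic space" (hypothetical values, for illustration only).
    seed:   label of the sense assumed to have emerged first.
    Returns a list of (new_sense, attached_to) pairs in predicted order.
    """
    emerged = [seed]
    remaining = [label for label in senses if label != seed]
    steps = []
    while remaining:
        # Pick the (candidate, existing) pair separated by the smallest distance.
        candidate, anchor = min(
            ((c, e) for c in remaining for e in emerged),
            key=lambda pair: math.dist(senses[pair[0]], senses[pair[1]]),
        )
        steps.append((candidate, anchor))
        emerged.append(candidate)
        remaining.remove(candidate)
    return steps


# Made-up coordinates; they do not come from any historical data.
toy_face = {
    "front of head": (0.0, 0.0),
    "facial expression": (1.0, 0.0),
    "front of object": (0.0, 2.0),
    "cliff face": (0.0, 3.0),
    "brave face": (2.0, 0.5),
}

steps = chain_order(toy_face, "front of head")
for new_sense, attached_to in steps:
    print(f"{new_sense!r} extends {attached_to!r}")
```

In this toy run, “brave face” attaches to “facial expression” rather than to the original sense, which is the chaining effect: a new sense can link to a recent extension instead of to the word's starting point. The resulting order reflects only the invented coordinates and makes no claim about the actual history of “face.”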
“It is an open question how the algorithms we explored here can be directly applied to improving machine understanding of novel language use,” says Xu.
After developing the algorithms that predicted the historical order in which word senses have emerged, the team used the Historical Thesaurus of English database to test the predictions against records of English over the past millennium. The findings suggest that word senses emerge through an efficient mechanism that expresses new ideas via a compact set of words.
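One simple way to picture such a test is to ask how often a predicted ordering agrees with the recorded one. The Python sketch below does this with pairwise agreement. It is an assumption-laden illustration, not the evaluation used in the study: the function name `pairwise_agreement`, the sense labels, and the dates are placeholders, not entries from the Historical Thesaurus of English.

```python
from itertools import combinations


def pairwise_agreement(predicted, first_attested):
    """Fraction of sense pairs whose predicted order matches the record.

    predicted:      list of sense labels, earliest predicted first.
    first_attested: dict mapping each label to the date of its first
                    recorded use (placeholder values in this sketch).
    """
    agree = 0
    total = 0
    # combinations() yields (a, b) with a appearing before b in `predicted`.
    for a, b in combinations(predicted, 2):
        if first_attested[a] == first_attested[b]:
            continue  # equal dates give no ordering information
        total += 1
        if first_attested[a] < first_attested[b]:
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical labels and dates, for illustration only.
predicted_order = ["sense_a", "sense_b", "sense_c", "sense_d"]
placeholder_dates = {
    "sense_a": 900,
    "sense_b": 1200,
    "sense_c": 1500,
    "sense_d": 1400,
}

score = pairwise_agreement(predicted_order, placeholder_dates)
print(f"pairwise agreement: {score:.2f}")
```

A score of 1.0 would mean every pair of senses appears in the predicted order, while a score near 0.5 is what a random ordering would be expected to produce.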
“When emerging ideas are encoded in English, they are more likely to be encoded via extending the meaning of an existing word than through creation of a new word,” says Malt. “A popular idea may be that when you have a new idea you need to make up a new word for it, but we found this strategy is actually less common.”
Last year, the same team identified a set of principles governing another aspect of language development: metaphorical mapping, in which word senses evolve from literal domains to metaphorical domains. Words that originally had only concrete meaning (like “grasping” a physical object) have grown to have meaning in the abstract (as in “grasping” an idea).
The team was the first to show that this progression has followed a set of psychological and cognitive principles and that the movement can be predicted.
“Together,” explains Srinivasan, “our studies are beginning to show that the ways in which words have developed new meanings is not arbitrary, but instead reflect fundamental properties of how we think and communicate with one another.”