The internet is a vast network of knowledge, containing the sum of humanity’s greatest accomplishments, algorithms, and stories. However, making sense of this information usually requires the critical eye of a human reader. Greg Durrett, a Texas Computer Science assistant professor, is using statistical machine learning to change that.
Within the broad field of natural language processing (NLP), one of Durrett’s interests is “entity identification,” or “figuring out who the actors are in a given text and how they relate to each other.” For example, phrases like “President Obama,” “Barack Obama,” and “the former president” are different word sequences that can all refer to the same person. While this might be obvious to a human reader, computers often don’t recognize these connections.
Durrett’s research teaches computers to identify subtle nuances of natural language. In addition to recognizing when different phrases refer to the same entity, his model links the actors in a given text to their corresponding, disambiguated Wikipedia pages. Like a human reader, the model gathers and applies background knowledge and contextual clues to tell the difference between Texas longhorns (the breed of cattle) and the Texas Longhorns (the football team).
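To give a rough feel for how context can separate the two senses of “Texas Longhorns,” here is a deliberately simple sketch. It is not Durrett’s model, which is statistical and learned from data; it is a hand-written keyword-overlap heuristic, similar in spirit to the classic Lesk approach to word-sense disambiguation. The candidate page titles, the keyword sets, and the `link_mention` function are all hypothetical, made up for illustration.

```python
import re

# Hypothetical "background knowledge": candidate Wikipedia-style pages for a
# mention, each with a hand-written set of related keywords (illustration only).
CANDIDATES = {
    "texas longhorns": {
        "Texas Longhorn (cattle)": {"cattle", "breed", "ranch", "horns", "herd", "beef"},
        "Texas Longhorns football": {"football", "team", "game", "coach", "season", "stadium"},
    }
}


def link_mention(mention: str, context: str) -> str:
    """Pick the candidate page whose keywords overlap most with the context words."""
    # Lowercase the context and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", context.lower()))
    candidates = CANDIDATES[mention.lower()]
    # Score each candidate by how many of its keywords appear in the context;
    # max() returns the first candidate with the highest score.
    return max(candidates, key=lambda page: len(candidates[page] & words))


print(link_mention("Texas Longhorns",
                   "The Texas Longhorns won the football game after a late field goal."))
print(link_mention("Texas longhorns",
                   "Ranchers raise Texas longhorns, a hardy cattle breed known for long horns."))
```

The first call picks the football page because “football” and “game” appear nearby; the second picks the cattle page because of “cattle,” “breed,” and “horns.” A learned model replaces these hand-picked keyword lists with features and weights estimated from large amounts of text.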
This combination of probabilistic modeling and practical application captures Durrett’s passion for natural language processing. “What led me to NLP was machine learning, and its applied areas are this nice intersection of concepts from math and probabilistic modeling and optimization,” Durrett said. “It’s also very applied, so you still get to build something that produces something cool at the end of the day. I really like that synergy.”
Durrett sees deep learning as the current trend in NLP. Innovations in neural network research and their application to language processing, he said, have enabled great strides in tasks like automatic summarization.
Neural networks are the technology behind Google’s ability to translate text between languages, and they will continue to let researchers like Durrett drive innovation in the near future. Durrett’s research may one day lead to more advanced automated systems that recognize the intricacies of human language, enabling smarter data processing and organization.