In his freshman year, Baade started his undergraduate research. Three years later, he published a paper introducing a new architecture for textless voice conversion. The model aims to convert a phrase from one person's voice to another just by listening. Baade is now finalizing another paper and working on a fourth.
Computer Science and Mathematics senior Alan Baade really enjoys spending hours on problems.
Especially the particularly hard ones, he said. Spending 40 hours on one equation with a small break for sleep somewhere in the middle is rewarding to him.
“I think it's because you can tell at the end of this you are going to understand the material,” Baade said. “You're going to understand computers.”
After completing a UT high school internship research program, Baade came to UT Austin without knowing exactly what he wanted to focus on. Talking to other students in the Turing Scholars Honors program helped him find a place in undergraduate research, cultivating his central interest in machine learning.
Turing Scholars in the Department of Computer Science have the chance to build closer peer-to-peer relationships and pursue advanced academic opportunities. These include specialty honors classes of 30 to 60 students, access to professors and staff within the department, and the option to live in honors housing with other Turing Scholars and students from various other honors programs.
“We take really talented and motivated students from around the country, around the world, and we put them in an environment where we give them lots of opportunities and challenges,” said Calvin Lin, Distinguished Teaching Professor in Computer Science and director of the Turing Scholars Honors program. “They sort of work together and support each other and inspire each other and hopefully do great things.”
Baade received his first undergraduate research opportunity during his freshman year, after cold-emailing a professor one night on the advice of a sophomore Turing Scholar. Three years later, in July, Baade published his second official research paper. He is currently finalizing another and actively working on a fourth.
“I was not planning on doing a PhD whatsoever until at least well after the first semester freshman year,” Baade said.
Baade is credited as the primary author of a July research paper titled “Neural Codec Language Models for Disentangled and Textless Voice Conversion.” Produced in UT’s Speech, Audio, and Language Technologies lab, the paper presents a new architecture for textless voice conversion by introducing a new type of semantic unit to a modified version of Microsoft’s VALL-E, a text-to-speech model. The paper’s model would be able to convert a phrase said by one person into another person’s voice just by listening.
According to Puyuan Peng, a fourth-year graduate student and co-author of the paper, Baade was the primary researcher testing each possible structure and working on its various components. The models introduced in the paper could replicate speech without using text.
“The Microsoft model came out in January of 2023 and it's a (text-to-speech) model. It's very powerful, but the problem is, there are about 7000 languages in the world, and more than half of them don't have a (written) form,” Peng said. “Which means even if this model is super powerful, it cannot be applied to more than half of the languages in the world.”
Earlier tested models took about eight days to train and were mixed and matched with each other for each system. Around eight models were finalized for the paper, each taking about four days to train using two graphics processing units. It was like putting together a puzzle, Baade said.
“As an undergrad, you have a lot of responsibilities (like) classes and things like that, and to be able to balance an independent research project on top of all of that, and not just make progress, but to actually push it all the way to completion, and write it up into a paper and submit it into a top international conference in the field, and get it accepted and travel there and present it to an audience, is pretty impressive to say the least,” said David Harwath, assistant computer science professor and the paper’s third co-author.
The paper’s co-authors hope the architecture, with increased computing power, will be able to convert speech in languages without a written form. Harwath, who is also principal investigator at the Speech, Audio, and Language Technologies lab, said applications could be found in creative spheres and legal industries, and could even give people who have lost their voice a chance to get it back.
“(AI)’s an incredibly fast-moving field right now, and which makes it very exciting to work in,” Harwath said. “We're continuing to do work in this lab to make these models more capable and more generalizable to other languages and with a wider array of applications.”
Turing Scholars can excel in many ways, Lin said, whether through research, in schoolwork, or after graduation. Students like Baade exemplify the purpose of the Turing program, he said.
“In some ways, what we're doing is we're challenging and hopefully inspiring students to set the standards high, to maybe do things that they hadn't even thought of,” Lin said.