Hi Medium, I’m a data scientist at Deeply Inc, an audio deep-learning start-up in Seoul, South Korea. Our company mainly focuses on building services that help identify which events are taking place in an environmental sound scene (e.g. emergency detection, elderly monitoring systems, …), using deep learning and digital signal processing techniques.
INTERSPEECH
INTERSPEECH, one of the biggest conferences on the science and technology of spoken language processing, was held at Songdo ConvensiA in Incheon, South Korea, from Sep. 18 to 22, 2022. Formed by merging two previous conference series (EUROSPEECH and ICSLP), the first INTERSPEECH was held in Beijing in 2000. Since then, INTERSPEECH has gained popularity and reached its 23rd edition this year, 2022.
There is a small discrepancy between our main focus and the conference theme, since our company concentrates primarily on nonverbal audio signals rather than speech itself, but many papers from INTERSPEECH have aided our research so far. So we decided to sign up for the conference this time, and the location was a huge plus for us. We had two main goals for this visit: first, to familiarize ourselves with the current challenges and breakthroughs in audio and speech processing; second, to connect with other researchers and engineers from academia and industry.
Of course, there were many interesting and informative sessions, including tutorials, keynotes, oral sessions, and industry talks, just to name a few. However, my favorite parts of the conference were the oral & poster sessions and the industry talks.
Oral & Poster Sessions
These sessions were especially compelling because the well-categorized curation of the accepted works made it easy to stay up to date with the prominent challenges and novel ideas. More importantly, I attended many presentations on techniques close to my own interests, and I found promising methods that could be applied to our company’s recent projects.
To be specific, K. Koutini and his team from Johannes Kepler University Linz proposed a method to train a transformer network more efficiently on audio signals. As one of their contributions, they open-sourced a network pre-trained on Google’s AudioSet. Since we thought this model had learned a rich representation of the acoustic domain, our team decided to fine-tune it on our own dataset. Sure enough, this technique mitigated some of the issues we had been facing.
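For readers curious what such a fine-tuning setup looks like, here is a minimal PyTorch sketch. The model below is a dummy stand-in (in practice you would load the authors’ released AudioSet checkpoint), and the class count, batch shapes, and learning rate are illustrative assumptions, not the actual values we used.

```python
import torch
import torch.nn as nn

# Tiny stand-in with the same backbone/head split as a pre-trained
# audio transformer; in practice, load the authors' AudioSet
# checkpoint here instead. All sizes below are illustrative only.
class DummyAudioTransformer(nn.Module):
    def __init__(self, embed_dim: int = 768, num_audioset_classes: int = 527):
        super().__init__()
        self.embed = nn.Linear(128, embed_dim)   # per-frame mel projection
        self.head = nn.Linear(embed_dim, num_audioset_classes)

    def forward(self, x):                        # x: (batch, frames, mels=128)
        h = self.embed(x).mean(dim=1)            # mean-pool over time
        return self.head(h)

NUM_TARGET_CLASSES = 10  # hypothetical: our in-house sound-event labels

model = DummyAudioTransformer()
# Swap the 527-way AudioSet head for one matching our own label set,
# then fine-tune the whole network at a small learning rate so the
# pre-trained acoustic representation is only gently adjusted.
model.head = nn.Linear(model.head.in_features, NUM_TARGET_CLASSES)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

# One toy fine-tuning step on a random batch of log-mel spectrograms.
spectrograms = torch.randn(4, 998, 128)
labels = torch.randint(0, NUM_TARGET_CLASSES, (4,))

model.train()
optimizer.zero_grad()
loss = criterion(model(spectrograms), labels)
loss.backward()
optimizer.step()
print(f"fine-tuning step done, loss = {loss.item():.3f}")
```

The key design choice is keeping the pre-trained backbone while replacing only the classification head, so the rich acoustic representation transfers to the new label set.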
Industry Talks
I attended the talk by Changwon Han and Jonghoon Jeong from Samsung Research. They elaborated on the Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) technologies embedded in their home appliances. It was interesting that they also integrated conventional signal processing techniques alongside deep learning models. It was also notable that they put great emphasis on model compression techniques due to the computational limits of small home appliances.
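They did not share implementation details, but to make the compression point concrete, here is a minimal sketch of one common technique, post-training dynamic quantization in PyTorch. The toy keyword-spotting-style model and its sizes are my own assumptions, not anything from the talk.

```python
import torch
import torch.nn as nn

# A small keyword-spotting-style network standing in for an on-device
# speech model; the architecture and sizes are purely illustrative.
model = nn.Sequential(
    nn.Linear(40, 256),  # e.g. 40 MFCC features per frame
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 12),  # e.g. 12 keyword classes
)

# Post-training dynamic quantization: nn.Linear weights are stored as
# int8 and dequantized on the fly, shrinking the model and speeding up
# CPU inference on resource-constrained devices.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

features = torch.randn(1, 40)
with torch.no_grad():
    print(quantized(features).shape)  # torch.Size([1, 12])
```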
I wish more time were set aside for the industry talks at the next conference, because this year they were packed mostly into lunchtime. And it was hard to skip lunch for the talks: there was so much information to process in so little time that attending was calorie-consuming in its own right.
Wrap-ups
Since it was my first in-person international conference, the first day was quite overwhelming. However, as I acclimated to the atmosphere, I really enjoyed interacting with the other participants and had a lot of fun. I’m looking forward to attending more in-person conferences in the future, including INTERSPEECH 2023, which will be held in Dublin, Ireland.