- Invited Seminar [6/29] Self-supervised learning of speech representations (from unlabelled audio and video)
We are pleased to announce the following invited seminar. Your participation is warmly welcomed.
[Seminar Announcement]
◎ Date & Time: Wednesday, 29 June 2022, 11:00 AM
◎ Venue: Engineering Building 4, Room 404
◎ Title: Self-supervised learning of speech representations (from unlabelled audio and video)
◎ Speaker: Prof. Joon Son Chung (School of Electrical Engineering, KAIST)
◎ Host: Prof. Hong-Goo Kang (Department of Electrical and Electronic Engineering)
Supervised learning with deep neural networks has brought phenomenal advances to many fields of research, but the performance of such systems relies heavily on the quality and quantity of annotated databases tailored to the particular application. It can be prohibitively difficult to manually collect and annotate databases for every task. There is a plethora of data on the internet that is not used in machine learning due to the lack of such annotations. Self-supervised learning allows a model to learn representations using properties inherent in the data itself, such as natural co-occurrence.
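The co-occurrence idea above can be made concrete with a cross-modal contrastive objective: each audio clip's co-occurring video clip is treated as its positive, and the other clips in the batch as negatives, so no manual labels are needed. The following is a minimal, illustrative sketch of an InfoNCE-style loss over toy embeddings (the function name, toy vectors, and temperature value are my own assumptions, not from the talk):

```python
import math

def infonce_loss(audio, video, temperature=0.1):
    """Cross-modal InfoNCE sketch: audio clip i's positive is video
    clip i (natural co-occurrence); other clips act as negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cosine(u, v):
        return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

    loss = 0.0
    for i, a in enumerate(audio):
        # Similarity of this audio clip to every video clip in the batch.
        logits = [cosine(a, v) / temperature for v in video]
        # Negative log-softmax of the matched (co-occurring) pair.
        loss -= logits[i] - math.log(sum(math.exp(z) for z in logits))
    return loss / len(audio)

# Toy embeddings: matched audio/video pairs point in similar directions.
audio = [[1.0, 0.0], [0.0, 1.0]]
video = [[0.9, 0.1], [0.1, 0.9]]
print(infonce_loss(audio, video))  # low loss: pairs are aligned
```

Minimising this loss pulls representations of co-occurring audio and video together, which is the mechanism by which phonetic and identity information can be learnt without annotations.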
In this talk, I will introduce my work on self-supervised learning of speech representations. Our work demonstrates that phonetic and identity representations can be learnt from unlabelled audio and video. The learnt representations can be used for downstream tasks such as automatic speech recognition, speaker recognition, face recognition and lip reading, on some of which we outperform fully supervised baselines. Other noteworthy applications of self-supervision include separating simultaneous speech from video, and generating talking head animation from audio recordings.
Joon Son Chung is an assistant professor at the School of Electrical Engineering, KAIST, where he directs the Multimodal AI Lab. Previously, he was a research scientist and team leader at Naver Corporation, where he managed the development of speech recognition models for various applications, including Clova Note. He received his BA and PhD from the University of Oxford, working with Prof. Andrew Zisserman. He has published in top-tier venues including TPAMI and IJCV, and is the recipient of best paper awards at Interspeech and ACCV. His research interests include speaker recognition, cross-modal learning, visual speech synthesis and audio-visual speech recognition.