
ByT5 model for massively multilingual grapheme-to-phoneme conversion

NU author(s): Dr Cong Zhang

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

In this study, we tackle massively multilingual grapheme-to-phoneme (G2P) conversion by implementing G2P models based on ByT5. We curated a G2P dataset from various sources covering around 100 languages and trained large-scale multilingual G2P models based on ByT5. We found that ByT5, operating on byte-level inputs, significantly outperformed the token-based mT5 model on multilingual G2P. Pairwise comparisons with monolingual models in these languages suggest that multilingual ByT5 models generally lower the phone error rate by jointly learning from a variety of languages. The pretrained model can further benefit low-resource G2P through zero-shot prediction on unseen languages, or by providing pretrained weights for fine-tuning, which helps the model converge to a lower phone error rate than randomly initialized weights. To facilitate future research on multilingual G2P, we make our code and pretrained multilingual G2P models available at: https://github.com/lingjzhu/CharsiuG2P.
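As a rough illustration of how such a pretrained byte-level G2P model could be queried, the Python sketch below uses the Hugging Face transformers library. The checkpoint name charsiu/g2p_multilingual_byT5_small_100 and the "<lang>: word" language-tag input convention are assumptions based on the CharsiuG2P repository's documentation, not details stated on this page; verify both against the repository before use.

    # Minimal sketch: multilingual G2P with a pretrained ByT5 checkpoint.
    # The checkpoint name and the "<lang>: word" prefix convention are
    # assumed from the CharsiuG2P repo; confirm them in its README.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # ByT5 operates on raw bytes, so the generic byt5-small tokenizer
    # can encode any language without a learned subword vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
    model = T5ForConditionalGeneration.from_pretrained(
        "charsiu/g2p_multilingual_byT5_small_100"  # assumed checkpoint name
    )

    # Each word carries a language tag so one model serves ~100 languages.
    words = ["<eng-us>: hello", "<fra>: bonjour"]
    inputs = tokenizer(words, padding=True, return_tensors="pt")

    # Decoding yields phone strings, emitted byte by byte.
    preds = model.generate(**inputs, num_beams=1, max_length=50)
    phones = tokenizer.batch_decode(preds, skip_special_tokens=True)
    print(phones)  # illustrative output, e.g. ["həˈloʊ", "bɔ̃ʒuʁ"]

Because the tokenizer works directly on UTF-8 bytes, the same loading and decoding code applies unchanged to any of the covered languages, including unseen ones in the zero-shot setting described above.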


Publication metadata

Author(s): Zhu J, Zhang C, Jurgens D

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: Interspeech 2022

Year of Conference: 2022

Pages: 446-450

Print publication date: 18/09/2022

Online publication date: 18/09/2022

Acceptance date: 06/04/2022

Publisher: International Speech Communication Association (ISCA)

URL: https://doi.org/10.21437/Interspeech.2022-538

DOI: 10.21437/Interspeech.2022-538

