Common Voice Dataset
We’re building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.
Includes both Cantonese and Mandarin Chinese!!
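The Cantonese (Chinese Hong Kong) speech data I sampled is of poor quality: the speakers' voices are not clear enough (nowhere near voice-actor quality), the background noise is fairly loud, and the label files contain errors. There is also a separate Cantonese category.
I suspect that generating the data with an existing TTS system would give much better quality.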