Common Voice Dataset

By admin, 11 October 2024

https://commonvoice.mozilla.org/en/datasets

We’re building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

The dataset includes both Cantonese and Mandarin Chinese.

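A sample of the Cantonese (Chinese, Hong Kong) speech data turned out to be of poor quality: the speakers' voices are not very clear (nowhere near voice-actor quality), the background noise is fairly loud, and the label files contain errors. There is also a separate category named Cantonese.

My impression is that generating the data with an existing TTS system would probably give much better quality.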

Cantonese audio statistics as of 25 June 2025:

Total files: 123,195
Total duration: 8,552 min 7.33 s (513,127.33 s)
Average duration: 4.17 s
Longest clip: 1 min 42.5 s (102.50 s)
Shortest clip: 0.2 s (0.20 s)
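
The post does not say how these figures were produced. As a rough sketch only, summary statistics of this kind could be computed from a list of per-clip durations in seconds. The names `DurationStats`, `summarize` and `format_duration` are hypothetical, and obtaining the durations themselves (for example with an audio library or ffprobe) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DurationStats:
    """Summary of clip durations, all values in seconds (hypothetical helper)."""
    count: int
    total_s: float
    mean_s: float
    max_s: float
    min_s: float


def summarize(durations_s: Iterable[float]) -> DurationStats:
    """Compute count, total, mean, longest and shortest duration."""
    values = list(durations_s)
    if not values:
        raise ValueError("no durations given")
    total = sum(values)
    return DurationStats(
        count=len(values),
        total_s=total,
        mean_s=total / len(values),
        max_s=max(values),
        min_s=min(values),
    )


def format_duration(seconds: float) -> str:
    """Render seconds as 'M min S.SS s', or just 'S.SS s' below one minute."""
    minutes, rest = divmod(seconds, 60)
    if minutes:
        return f"{int(minutes)} min {rest:.2f} s"
    return f"{rest:.2f} s"


if __name__ == "__main__":
    # Toy input, not real Common Voice data.
    stats = summarize([0.2, 4.0, 102.5])
    print("Total files:", stats.count)
    print("Total duration:", format_duration(stats.total_s))
    print("Average duration:", format_duration(stats.mean_s))
    print("Longest clip:", format_duration(stats.max_s))
    print("Shortest clip:", format_duration(stats.min_s))
```

The reported numbers are consistent with each other under this arithmetic: 8,552 min 7.33 s is 8552 × 60 + 7.33 = 513,127.33 s, and 513,127.33 s divided by 123,195 files is about 4.17 s per clip.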

Tags

TTS
AI


