Baidu speech recognition API: http://ai.baidu.com/docs#/ASR-API/top
HTML5 audio recording APIs:
- https://www.html5rocks.com/en/tutorials/getusermedia/intro/
- https://developers.google.com/web/updates/2016/01/mediarecorder
- https://developers.google.com/web/fundamentals/media/recording-audio/
- https://juejin.im/post/5b8bf7e3e51d4538c210c6b0
- RecordRTC - https://github.com/muaz-khan/RecordRTC
The MediaRecorder API can only be used from secure origins: HTTPS or localhost.
Pitch detection demo: https://webaudiodemos.appspot.com/pitchdetect/index.html
Voice memos: https://voice-memos.appspot.com/
Background theory: https://www.zhihu.com/question/20398418
- Spectrum: frequency spectrum
- Pitch: pitch (the pitch of the fundamental frequency)
- Fundamental Frequency: male 62 ~ 523 Hz, female 110 ~ 1000 Hz
- Intensity: sound intensity
- Formant: resonance peak
- Pulses: glottal pulses
- Wideband spectrogram (vowels, consonants): male 0-5000 Hz, female 0-5500 Hz
- Narrowband spectrogram (tones, intonation): male 0-1200 Hz, female 0-2000 Hz
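
The fundamental-frequency ranges listed above can be used to bound a pitch search. The following is a minimal sketch, assuming a plain autocorrelation estimator over one short mono frame; these notes do not say which pitch method to use, and the function name `estimate_pitch` and its default bounds (62 Hz and 1000 Hz, the outer limits of the two ranges above) are illustrative assumptions.

```python
import math


def estimate_pitch(samples, sample_rate, f_min=62.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame, or None.

    Assumption: plain autocorrelation, searched only over lags that
    correspond to frequencies between f_min and f_max.
    """
    n = len(samples)
    if n < 2:
        return None

    # Remove the DC offset so a constant bias does not dominate the sums.
    mean = sum(samples) / n
    x = [s - mean for s in samples]
    if sum(v * v for v in x) == 0.0:
        return None  # silent frame

    # A period of `lag` samples corresponds to sample_rate / lag Hz.
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(n - 1, int(sample_rate / f_min))

    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag
```

For example, calling `estimate_pitch(frame, 16000)` on a 50 ms frame of a steady 200 Hz tone should give a value close to 200. The frame has to be longer than one period of the lowest frequency searched (about 16 ms for 62 Hz), and real speech usually needs extra checks for unvoiced frames and octave errors that this sketch leaves out.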
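Some naive ideas:
1. Extract the audio of a single pinyin syllable from the waveform, then normalize its sample rate and volume
2. Split the syllable's waveform into frames (for example: 25 ms per frame, 15 ms overlap, 10 ms frame shift)
3. Extract the following parameters from each frame; each parameter has its own weight coefficient
3.1 Pitch range
3.2 Intensity range
3.3 F1 range (after multiplying by the Pitch coefficient)
3.4 F2 range (after multiplying by the Pitch coefficient)
3.5 F3 range (after multiplying by the Pitch coefficient)
3.6 F4 range (after multiplying by the Pitch coefficient)
4. Normalize the number of frames to a constant (for example, 100 frames) to obtain a normalized parameter matrix
5. Compare the similarity of two parameter matrices to decide which pinyin syllable the audio most likely is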
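
Steps 2, 4 and 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: every function name is hypothetical, the frame count is normalized by linear interpolation, and similarity is measured as a weighted Euclidean distance; the notes do not fix any of these choices. Pitch and formant extraction (step 3) is left out, so each row is simply whatever per-frame parameter values were extracted.

```python
import math


def split_frames(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Step 2: cut a waveform into overlapping frames (25 ms long, 10 ms shift)."""
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop_len = int(round(sample_rate * hop_ms / 1000))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def rms_intensity(frame):
    """One possible per-frame parameter: root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def resample_rows(rows, target=100):
    """Step 4: stretch or shrink a list of parameter rows to `target` rows.

    Assumption: linear interpolation between neighbouring frames.
    """
    if not rows or target < 2:
        raise ValueError("need at least one row and target >= 2")
    n = len(rows)
    if n == 1:
        return [list(rows[0]) for _ in range(target)]
    out = []
    for k in range(target):
        pos = k * (n - 1) / (target - 1)  # fractional index into rows
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append([(1 - t) * a + t * b for a, b in zip(rows[lo], rows[hi])])
    return out


def weighted_distance(a, b, weights):
    """Step 5: weighted Euclidean distance between two equal-sized matrices."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        total += sum(w * (x - y) ** 2 for w, x, y in zip(weights, row_a, row_b))
    return math.sqrt(total / len(a))


def classify(matrix, templates, weights):
    """Return the name of the template matrix closest to `matrix`."""
    return min(
        templates,
        key=lambda name: weighted_distance(matrix, templates[name], weights),
    )
```

A recording would go through `split_frames`, a per-frame parameter extractor, and `resample_rows`, and the result would then be passed to `classify` together with one reference matrix per pinyin syllable. Because the parameters have very different scales (Hz for pitch and formants, amplitude for intensity), the weights would in practice also have to compensate for scale, which this sketch does not attempt.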