This is a continuation of the Coqui Cantonese Development Notes.
2024-10-30
--> TIME: 2024-10-29 13:49:43 -- STEP: 129/157 -- GLOBAL_STEP: 68425
| > decoder_loss: 0.4340108335018158 (0.41140215849691586)
| > postnet_loss: 0.40669307112693787 (0.39625994965087535)
| > stopnet_loss: 0.2480345070362091 (0.3161741283743881)
| > decoder_coarse_loss: 0.42141956090927124 (0.3995050713997479)
| > decoder_ddc_loss: 0.004036255180835724 (0.0066533084226006916)
| > ga_loss: 0.00028112504514865577 (0.0008964127673254916)
| > decoder_diff_spec_loss: 0.17114706337451935 (0.16164369188075847)
| > postnet_diff_spec_loss: 0.19191323220729828 (0.18958402153595474)
| > decoder_ssim_loss: 0.6188733577728271 (0.6018433372179665)
| > postnet_ssim_loss: 0.6118573546409607 (0.6032356453496356)
| > loss: 1.179294466972351 (1.2175781227821527)
| > align_error: 0.722508043050766 (0.7450022502231969)
| > amp_scaler: 1024.0 (1024.0)
| > grad_norm: tensor(1.0017, device='cuda:0') (tensor(0.9195, device='cuda:0'))
| > current_lr: 1.275e-06
| > step_time: 2.3045 (0.6297789155974869)
| > loader_time: 2.3656 (0.14574224265046823)
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-29-2024_10+40AM-7dc2f6fd
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1360, in train_step
outputs, loss_dict_new, step_time = self.optimize(
^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1226, in optimize
outputs, loss_dict = self._compute_loss(
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1157, in _compute_loss
outputs, loss_dict = self._model_train_step(batch, model, criterion)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1116, in _model_train_step
return model.train_step(*input_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 338, in train_step
loss_dict = criterion(
^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 511, in forward
decoder_ssim_loss = self.criterion_ssim(decoder_output, mel_input, output_lens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 148, in forward
assert not torch.isnan(y_hat_norm).any(), "y_hat_norm contains NaNs"
AssertionError: y_hat_norm contains NaNs
real 191m19.757s
user 389m26.424s
2025-4-8
Downloaded the last model from the platform, which was trained about half a year ago, and listened to its output. It sounds similar to model_10000.
/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_63586.pth --out_path ./demo2.wav
The recordings in the MDCC dataset do not all come from the same speaker.
I am going to build a Chinese corpus to run some experiments: https://github.com/hgneng/ChineseCorpus
- Maybe I can generate a one-character corpus labeled only with toneless jyutping, to train a model that understands jyutping.
- Then I generate audio for the corpus with existing TTS engines, including Ekho TTS.
- To verify the result, I can use STT. I am not sure whether this is what is called a GAN, and I am not sure how to do it in Coqui. I may need to ask an AI whether this approach is workable.
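The "jyutping without tone" step in the list above can be sketched as follows. This is only a minimal illustration, assuming each syllable is written as lowercase letters followed by one tone digit from 1 to 6; the function names `strip_tones` and `toneless_syllables` are hypothetical and not part of Coqui or ChineseCorpus.

```python
import re

# A Jyutping syllable is assumed to be letters followed by a single tone digit 1-6.
_TONE = re.compile(r"\b([a-z]+)[1-6]\b")


def strip_tones(jyutping: str) -> str:
    """Remove the trailing tone digit from each Jyutping syllable."""
    return _TONE.sub(r"\1", jyutping)


def toneless_syllables(lines):
    """Collect the unique toneless syllables from an iterable of Jyutping lines."""
    seen = set()
    for line in lines:
        seen.update(strip_tones(line.lower()).split())
    return sorted(seen)


if __name__ == "__main__":
    print(strip_tones("ngo5 wui2 syut3 jyut6 jyu5"))
```

The sorted syllable list could then serve as the text side of a one-character corpus, with the audio side generated separately.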
2025-4-9
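Here is a very good article: 语音合成技术(深度学习方法简介) (Speech Synthesis Technology: An Introduction to Deep Learning Methods)
Yesterday's idea probably will not work; it is too primitive. By yesterday's reasoning, even a dataset from different speakers could be trained to the expected result. Of course, good results would certainly need a single speaker. Verifying through STT is not feasible either; it is too slow.
Try to fix the errors and increase the amount of training by an order of magnitude, then see whether the results change.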
--> TIME: 2025-04-09 14:47:04 -- STEP: 154/157 -- GLOBAL_STEP: 68450
| > decoder_loss: 0.4403502345085144 (nan)
| > postnet_loss: 0.5414097309112549 (nan)
| > stopnet_loss: 0.2241305559873581 (0.30289583720944147)
| > decoder_coarse_loss: 0.42627546191215515 (0.4042315759829111)
| > decoder_ddc_loss: 0.002295088255777955 (nan)
| > ga_loss: 0.00029222219018265605 (0.0007955534260813754)
| > decoder_diff_spec_loss: 0.16289205849170685 (nan)
| > postnet_diff_spec_loss: 0.2963632047176361 (nan)
| > decoder_ssim_loss: 0.5058364868164062 (0.6033724380003945)
| > postnet_ssim_loss: 0.5834265947341919 (0.6041104174279548)
| > loss: 1.1825339794158936 (nan)
| > align_error: 0.7805334031581879 (nan)
| > amp_scaler: 512.0 (1017.3506493506494)
| > grad_norm: tensor(1.2060, device='cuda:0') (tensor(0.9337, device='cuda:0'))
| > current_lr: 1.275e-06
| > step_time: 1.3925 (0.6281918287277224)
| > loader_time: 0.0102 (0.012237853818125541)
> EVALUATION
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-April-09-2025_12+47PM-7dc2f6fd
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
self.eval_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
self.model.eval_log(
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
logger.eval_audios(steps, audios, self.ap.sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 58, in add_audios
self.add_audio(
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 34, in add_audio
self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
assert array.ndim == 1, "input tensor should be 1 dimensional."
^^^^^^^^^^^^^^^
AssertionError: input tensor should be 1 dimensional.
Warning: Result contains NaN values. Returning zero tensor.
...
Warning: Result contains NaN values. Returning zero tensor.
[!] Waveform is not finite everywhere. Skipping the GL.
real 121m1.519s
user 227m28.137s
sys 308m34.519s
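In the log above, several metrics show a finite current value but `nan` as the running average in parentheses, which means a NaN appeared at some earlier step of the epoch. A small sketch for spotting this in a saved log (such as the `out` file produced by `tee`) is given here; the function name `nan_averages` is hypothetical, and the line format is assumed to match the `| > name: value (average)` lines shown in these notes.

```python
import math
import re

# Matches trainer lines such as "| > loss: 1.18 (nan)": name, current value, running average.
_METRIC = re.compile(r"\|\s*>\s*(\w+):\s*([-+.\w]+)\s*\(([-+.\w]+)\)")


def nan_averages(log_text: str) -> list:
    """Return metric names whose current value is finite but whose running average is NaN."""
    flagged = []
    for name, current, average in _METRIC.findall(log_text):
        try:
            cur, avg = float(current), float(average)
        except ValueError:
            continue  # skip anything that is not a plain number
        if math.isfinite(cur) and math.isnan(avg):
            flagged.append(name)
    return flagged
```

Running it over each step block of the log in order would show at which global step the first average turned into NaN.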
After applying a patch, a new error appeared:
[!] Character '睛' not found in the vocabulary. Discarding it.
2025-4-25
This problem still exists:
[!] Character '睛' not found in the vocabulary. Discarding it.
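It reproduces locally, which is really strange. Fix it locally first, then run on the virtual machine.
The package source on virtualcloud has problems, so switch to the Tsinghua mirror, and take a screenshot of the download to show June. The extracted MDCC data has been added to the cache, so next time there is no need to extract it manually, which takes 18 minutes.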
New init steps:
- clone code: git clone https://github.com/hgneng/TTS.git
- install deps: cd TTS && pip install -e .[all,dev]
- patch: cp /gemini/code/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
- copy dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
- cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /tmp/mdcc-dataset/
- cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /tmp/mdcc-dataset/
- link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /tmp/mdcc-dataset
- run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out
Model test command:
~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_10000_411.pth --out_path ./demo.wav
2025-4-27
This command, which used to work, no longer runs, for unknown reasons:
~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_63586.pth --out_path ./demo3.wav
Traceback (most recent call last):
File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 494, in <module>
main()
File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 338, in main
from TTS.api import TTS
ModuleNotFoundError: No module named 'TTS'
2025-4-28
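The No module named 'TTS' problem is solved. The cause: a piece of software I installed yesterday added an environment setup line, source /etc/profile.d/obd.sh, which conflicted with the conda setup.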
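When an import suddenly fails like this, it can help to print which interpreter is actually running and where the package would be loaded from. The following is a minimal diagnostic sketch; the function name `describe_environment` is hypothetical, and `CONDA_DEFAULT_ENV` is the variable conda normally sets for the active environment.

```python
import importlib.util
import os
import sys


def describe_environment(package: str = "TTS") -> dict:
    """Report the running interpreter and where (if anywhere) `package` would be imported from."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,                      # which python is really in use
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if no conda env is active
        "pythonpath": os.environ.get("PYTHONPATH"),        # may be set by a profile script
        "package_origin": spec.origin if spec else None,   # None means the import would fail
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `executable` points outside the conda environment, a shell profile script has probably changed `PATH` after conda was initialised.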
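I added logging in YUE_CN_Phonemizer, but nothing was printed. It seems the phonemizer never ran, so the Chinese characters were not converted to Latin letters.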
2025-4-29
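The following problem is solved. The cause was that use_phonemes had been changed to false in the config file. I do not know when it was changed, and git has no record of it. But the output of earlier successful runs shows it was indeed true, which is really strange.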
[!] Character '睛' not found in the vocabulary. Discarding it.
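Since the setting changed without being noticed, a small pre-flight check before each training run might catch it earlier. The sketch below works on an already loaded config dictionary; the function name `phoneme_config_problems` is hypothetical, and only the `use_phonemes` and `phonemizer` keys mentioned in these notes are checked.

```python
def phoneme_config_problems(config: dict) -> list:
    """Return human-readable problems with the phoneme settings of a TTS config dict."""
    problems = []
    if not config.get("use_phonemes", False):
        # Per the note above, this is what led to the "not found in the vocabulary" messages.
        problems.append("use_phonemes is not true")
    if not config.get("phonemizer"):
        problems.append("phonemizer is not set")
    return problems


if __name__ == "__main__":
    example = {"use_phonemes": False, "phonemizer": "yue_cn_phonemizer"}
    for problem in phoneme_config_problems(example):
        print("config problem:", problem)
```

Calling it at the top of the training script and aborting on a non-empty result would avoid spending hours on a run with the wrong text pipeline.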
New init steps:
- clone code: git clone https://github.com/hgneng/TTS.git
- install deps: cd TTS && pip install -e .[all,dev]
- patch: cp /gemini/code/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
- #copy dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
- #link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /gemini/data-2/dataset mdcc-dataset
- #cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-dataset/
- #cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-dataset/
- run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out
2025-4-30
--> TIME: 2025-04-29 16:51:42 -- STEP: 1004/1018 -- GLOBAL_STEP: 85725
| > decoder_loss: 0.4629353880882263 (0.42889915789622224)
| > postnet_loss: 0.439887136220932 (0.4946356853581523)
| > stopnet_loss: 0.20786581933498383 (0.30227529310966406)
| > decoder_coarse_loss: 0.44925352931022644 (0.4157491997478018)
| > decoder_ddc_loss: 0.0023014589678496122 (0.006216903324276063)
| > ga_loss: 0.0002632320101838559 (0.0008917728220433813)
| > decoder_diff_spec_loss: 0.16706256568431854 (0.1651524688336123)
| > postnet_diff_spec_loss: 0.1775813102722168 (0.30110443852456015)
| > decoder_ssim_loss: 0.646834135055542 (0.6270531970428758)
| > postnet_ssim_loss: 0.6436614990234375 (0.6645315740094231)
| > loss: 1.1851838827133179 (1.2952861307389247)
| > align_error: 0.7453666925430298 (0.7455814101425301)
| > amp_scaler: 4.0 (4.0)
| > grad_norm: tensor(1.0548, device='cuda:0') (tensor(1.1489, device='cuda:0'))
| > current_lr: 6.000000000000001e-07
| > step_time: 4.5975 (0.5963088309622381)
| > loader_time: 0.1885 (0.010768146628877555)
> EVALUATION
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-April-29-2025_11+24AM-d34af9a2
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
self.eval_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
self.model.eval_log(
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
logger.eval_audios(steps, audios, self.ap.sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 68, in add_audios
self.add_audio(
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 44, in add_audio
self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
assert array.ndim == 1, "input tensor should be 1 dimensional."
^^^^^^^^^^^^^^^
AssertionError: input tensor should be 1 dimensional.
Warning: Result contains NaN values. Returning zero tensor.
...
Warning: Result contains NaN values. Returning zero tensor.
[!] Waveform is not finite everywhere. Skipping the GL.
real 330m21.618s
user 733m40.008s
sys 991m49.889s
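Continuing with some forced dimension fixes. Training results are currently getting worse: the model at 80K steps is not as good as the one at 60K steps. This may be caused by a change in the dataset. Train for a while longer and see how it goes.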
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
self.eval_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
self.model.eval_log(
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
logger.eval_audios(steps, audios, self.ap.sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 68, in add_audios
self.add_audio(
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 44, in add_audio
self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
if audio.ndim != 1:
^^^^^
UnboundLocalError: cannot access local variable 'audio' where it is not associated with a value
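The UnboundLocalError above is what Python raises when a function reads a local name before any value has been assigned to it. The failing line reads `audio`, whereas the original assertion at that location reads `array`, so the patched check may simply be using the wrong variable name. As an alternative to editing `summary.py`, the audio could be normalised before it reaches the writer. The following is a sketch only, assuming NumPy is available and that any extra dimensions are singleton batch or channel axes; `to_mono_1d` is a hypothetical helper, not part of the trainer or of PyTorch.

```python
import numpy as np


def to_mono_1d(audio) -> np.ndarray:
    """Coerce an audio-like value to a finite, 1-D float32 array."""
    array = np.asarray(audio, dtype=np.float32)
    # Drop singleton axes, e.g. (1, T) -> (T,), and turn a bare scalar into shape (1,).
    array = np.atleast_1d(np.squeeze(array))
    if array.ndim > 1:
        # Assumption: anything still multi-dimensional is treated as one long signal.
        array = array.reshape(-1)
    # Replace NaN and infinities with silence so downstream checks do not fail.
    return np.nan_to_num(array, nan=0.0, posinf=0.0, neginf=0.0)
```

A CUDA tensor would first need `.cpu()` before being passed in. Zeroing non-finite samples only hides the symptom; the NaNs produced by the model still need to be tracked down separately.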
2025-5-13
--> TIME: 2025-05-12 18:39:43 -- STEP: 1017/1018 -- GLOBAL_STEP: 105450
| > decoder_loss: 0.45329123735427856 (nan)
| > postnet_loss: 0.45325377583503723 (nan)
| > stopnet_loss: 0.2002808004617691 (nan)
| > decoder_coarse_loss: 0.4443933069705963 (0.41429214013947374)
| > decoder_ddc_loss: 0.0018430311465635896 (nan)
| > ga_loss: 0.00033928133780136704 (nan)
| > decoder_diff_spec_loss: 0.16729258000850677 (nan)
| > postnet_diff_spec_loss: 0.23089689016342163 (nan)
| > decoder_ssim_loss: 0.657738447189331 (0.6257411204022058)
| > postnet_ssim_loss: 0.6858585476875305 (0.66164923212404)
| > loss: 1.2005012035369873 (nan)
| > align_error: 0.7821131497621536 (nan)
| > amp_scaler: 2.0 (4.975417895771876)
| > grad_norm: tensor(0.9057, device='cuda:0') (tensor(1.1377, device='cuda:0'))
| > current_lr: 6.000000000000001e-07
| > step_time: 0.917 (0.642910648587883)
| > loader_time: 0.0076 (0.011983617803222898)
> EVALUATION
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-May-12-2025_01+35PM-d34af9a2
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
self.eval_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
self.model.eval_log(
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
logger.eval_audios(steps, audios, self.ap.sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 68, in add_audios
self.add_audio(
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 44, in add_audio
self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 705, in audio
array = array[0]
~~~~~^^^
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
Warning: Result contains NaN values. Returning zero tensor.
...
Warning: Result contains NaN values. Returning zero tensor.
[!] Waveform is not finite everywhere. Skipping the GL.
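Warning: the input audio tensor is not one-dimensional and needs to be processed.
There is a line in the code that pings a server, apparently to collect usage data. It often fails to connect (because the server is overseas), so I commented it out.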
2025-5-15
Training has finally run for more than a day without errors.
2025-5-16
The run has passed 160K steps, but the loss values are mostly rising, which does not look promising. On the bright side, a new best_model was produced at around 100K steps.
2025-5-19
Training has reached 500K steps, but best_model is still the one from around 100K steps. It looks like this training is having no effect.
~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_10000_411.pth --out_path ./demo.wav
There is a new error:
Traceback (most recent call last):
File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 494, in <module>
main()
File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 423, in main
synthesizer = Synthesizer(
^^^^^^^^^^^^
File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 93, in __init__
self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 185, in _load_tts
raise ValueError("Phonemizer is not defined in the TTS config.")
ValueError: Phonemizer is not defined in the TTS config.
After changing the following config line, audio can be generated with the old model:
"phonemizer": "yue_cn_phonemizer", //"espeak",
But the new model raises an error:
$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/best_model_101019.pth --out_path ./demo2.wav
> Using model: Tacotron2
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:50.0
| > mel_fmax:7600.0
| > pitch_fmin:1.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:scale_stats.npy
| > base:10
| > hop_length:256
| > win_length:1024
/home/hgneng/code/hgneng/TTS/TTS/utils/io.py:54: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
return torch.load(f, map_location=map_location, **kwargs)
Traceback (most recent call last):
File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 494, in <module>
main()
File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 423, in main
synthesizer = Synthesizer(
^^^^^^^^^^^^
File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 93, in __init__
self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 192, in _load_tts
self.tts_model.load_checkpoint(self.tts_config, tts_checkpoint, eval=True)
File "/home/hgneng/code/hgneng/TTS/TTS/tts/models/base_tacotron.py", line 105, in load_checkpoint
state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hgneng/code/hgneng/TTS/TTS/utils/io.py", line 54, in load_fsspec
return torch.load(f, map_location=map_location, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/coqui/lib/python3.11/site-packages/torch/serialization.py", line 1114, in load
return _legacy_load(
^^^^^^^^^^^^^
File "/opt/miniconda3/envs/coqui/lib/python3.11/site-packages/torch/serialization.py", line 1338, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
This may be caused by the phonemizer config change. Try retraining.
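Before retraining, it may be worth ruling out a damaged file: `EOFError: Ran out of input` is what pickle raises when it reaches the end of the file before reading a complete object, which happens with an empty or truncated checkpoint (for example an interrupted download or copy). The following is a minimal check that does not unpickle anything; `checkpoint_status` is a hypothetical helper, and it relies on recent PyTorch versions saving checkpoints as zip archives by default.

```python
import os
import zipfile


def checkpoint_status(path: str) -> str:
    """Classify a checkpoint file as 'missing', 'empty', 'zip' or 'other' without loading it."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    # A complete zip archive has a valid end-of-archive record; a truncated one usually does not.
    return "zip" if zipfile.is_zipfile(path) else "other"


if __name__ == "__main__":
    print(checkpoint_status("recipes/mdcc/tacotron2-DDC/best_model_101019.pth"))
```

A result of `empty`, or `other` for a file that was saved as a zip archive, would point to the file itself rather than to the phonemizer setting.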
New init steps:
- clone code: git clone https://github.com/hgneng/TTS.git
- change package index: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
- install deps: cd TTS && pip install -e .[all,dev]
- patch 1: cp /gemini/code/TTS/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
- patch 2: cp /gemini/code/TTS/patch/summary.py /root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/
- patch 3: cp /gemini/code/TTS/patch/trainer.py /root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py
- run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out
After training, link the latest checkpoint.pth to best_model.pth so that the next training run can continue from the previous progress.
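The linking step could be automated along these lines. This is a sketch under assumptions: checkpoint files are assumed to be named `checkpoint_<step>.pth` inside the run directory, and `link_latest_checkpoint` is a hypothetical helper. An existing regular `best_model.pth` is renamed to `best_model.pth.bak` rather than deleted.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: checkpoint_<global step>.pth
_STEP = re.compile(r"^checkpoint_(\d+)\.pth$")


def link_latest_checkpoint(run_dir: str, link_name: str = "best_model.pth") -> Optional[Path]:
    """Point `link_name` at the checkpoint with the highest step number in `run_dir`."""
    run = Path(run_dir)
    candidates = []
    for path in run.iterdir():
        match = _STEP.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    latest = max(candidates, key=lambda item: item[0])[1]
    link = run / link_name
    if link.is_symlink():
        link.unlink()                                      # replace a stale link
    elif link.exists():
        link.rename(link.with_name(link.name + ".bak"))    # keep a real file as a backup
    link.symlink_to(latest.name)                           # relative link inside the run directory
    return latest
```

Comparing step numbers as integers matters here, because a plain string sort would rank `checkpoint_900.pth` above `checkpoint_2000.pth`.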
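Copy the event files to /gemini/output/ to view the training trends in TensorBoard. Do not create a symbolic link for them; there seems to be a bug that deletes the directory and terminates training.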
Model test command:
~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_10000_411.pth --out_path ./demo.wav