This is a continuation of the Coqui Cantonese Development Notes.
2024-10-30
--> TIME: 2024-10-29 13:49:43 -- STEP: 129/157 -- GLOBAL_STEP: 68425
| > decoder_loss: 0.4340108335018158 (0.41140215849691586)
| > postnet_loss: 0.40669307112693787 (0.39625994965087535)
| > stopnet_loss: 0.2480345070362091 (0.3161741283743881)
| > decoder_coarse_loss: 0.42141956090927124 (0.3995050713997479)
| > decoder_ddc_loss: 0.004036255180835724 (0.0066533084226006916)
| > ga_loss: 0.00028112504514865577 (0.0008964127673254916)
| > decoder_diff_spec_loss: 0.17114706337451935 (0.16164369188075847)
| > postnet_diff_spec_loss: 0.19191323220729828 (0.18958402153595474)
| > decoder_ssim_loss: 0.6188733577728271 (0.6018433372179665)
| > postnet_ssim_loss: 0.6118573546409607 (0.6032356453496356)
| > loss: 1.179294466972351 (1.2175781227821527)
| > align_error: 0.722508043050766 (0.7450022502231969)
| > amp_scaler: 1024.0 (1024.0)
| > grad_norm: tensor(1.0017, device='cuda:0') (tensor(0.9195, device='cuda:0'))
| > current_lr: 1.275e-06
| > step_time: 2.3045 (0.6297789155974869)
| > loader_time: 2.3656 (0.14574224265046823)
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-29-2024_10+40AM-7dc2f6fd
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1360, in train_step
outputs, loss_dict_new, step_time = self.optimize(
^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1226, in optimize
outputs, loss_dict = self._compute_loss(
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1157, in _compute_loss
outputs, loss_dict = self._model_train_step(batch, model, criterion)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1116, in _model_train_step
return model.train_step(*input_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 338, in train_step
loss_dict = criterion(
^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 511, in forward
decoder_ssim_loss = self.criterion_ssim(decoder_output, mel_input, output_lens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 148, in forward
assert not torch.isnan(y_hat_norm).any(), "y_hat_norm contains NaNs"
AssertionError: y_hat_norm contains NaNs
real 191m19.757s
user 389m26.424s
2025-4-8
Downloaded the last model from the platform; it is from half a year ago. Tried listening to its output; it sounds similar to model_10000.
/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_63586.pth --out_path ./demo2.wav
The MDCC dataset is not from a single speaker.
I am going to build a Chinese corpus to do some experiments: https://github.com/hgneng/ChineseCorpus
- Maybe I can generate a one-character corpus with only jyutping (without tones) to train a model that can understand jyutping (see the sketch after this list).
- Then I can generate waveforms for the corpus from existing TTS engines, including Ekho TTS.
- To verify the result, I can use STT. I am not sure whether this is what is called a GAN, and I am not sure how to do it in Coqui. I may need to ask an AI whether this is a workable approach.
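As a purely illustrative sketch of the "jyutping without tone" idea above (the helper name is mine, not part of ChineseCorpus):

import re

def strip_tones(jyutping: str) -> str:
    # Remove the trailing tone digit (1-6) from each jyutping syllable,
    # e.g. "ngo5 wui2 syut3 jyut6 jyu5" -> "ngo wui syut jyut jyu".
    return " ".join(re.sub(r"[1-6]$", "", syllable) for syllable in jyutping.split())

print(strip_tones("ngo5 wui2 syut3 jyut6 jyu5"))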
2025-4-9
Here is a very good article: 语音合成技术(深度学习方法简介) (Speech Synthesis Technology: a brief introduction to deep learning methods).
Yesterday's idea probably will not work; it is too primitive. Following that line of thinking, even a dataset from different speakers could be trained to the expected result, although good quality would certainly require a single speaker. Verifying the result through STT is also not feasible; it is too slow.
I will try to fix the error, increase the amount of training by an order of magnitude, and see whether the results change.
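A minimal sketch of the kind of guard I have in mind for the failing SSIM loss in TTS/tts/layers/losses.py. Exactly where it belongs is an assumption on my part; the "Returning zero tensor" warnings in the run below presumably come from something of this shape.

import torch

def nan_to_zero(result: torch.Tensor) -> torch.Tensor:
    # If a computed loss contains NaNs, warn and fall back to a zero tensor
    # so the step can continue instead of tripping the NaN assertion.
    if torch.isnan(result).any():
        print("Warning: Result contains NaN values. Returning zero tensor.")
        return torch.zeros_like(result)
    return result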
--> TIME: 2025-04-09 14:47:04 -- STEP: 154/157 -- GLOBAL_STEP: 68450
| > decoder_loss: 0.4403502345085144 (nan)
| > postnet_loss: 0.5414097309112549 (nan)
| > stopnet_loss: 0.2241305559873581 (0.30289583720944147)
| > decoder_coarse_loss: 0.42627546191215515 (0.4042315759829111)
| > decoder_ddc_loss: 0.002295088255777955 (nan)
| > ga_loss: 0.00029222219018265605 (0.0007955534260813754)
| > decoder_diff_spec_loss: 0.16289205849170685 (nan)
| > postnet_diff_spec_loss: 0.2963632047176361 (nan)
| > decoder_ssim_loss: 0.5058364868164062 (0.6033724380003945)
| > postnet_ssim_loss: 0.5834265947341919 (0.6041104174279548)
| > loss: 1.1825339794158936 (nan)
| > align_error: 0.7805334031581879 (nan)
| > amp_scaler: 512.0 (1017.3506493506494)
| > grad_norm: tensor(1.2060, device='cuda:0') (tensor(0.9337, device='cuda:0'))
| > current_lr: 1.275e-06
| > step_time: 1.3925 (0.6281918287277224)
| > loader_time: 0.0102 (0.012237853818125541)
> EVALUATION
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-April-09-2025_12+47PM-7dc2f6fd
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
self.eval_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
self.model.eval_log(
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
logger.eval_audios(steps, audios, self.ap.sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 58, in add_audios
self.add_audio(
File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 34, in add_audio
self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
assert array.ndim == 1, "input tensor should be 1 dimensional."
^^^^^^^^^^^^^^^
AssertionError: input tensor should be 1 dimensional.
Warning: Result contains NaN values. Returning zero tensor. (repeated 18 times)
[!] Waveform is not finite everywhere. Skipping the GL.
real 121m1.519s
user 227m28.137s
sys 308m34.519s
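The "input tensor should be 1 dimensional" assertion suggests the eval audio reaches SummaryWriter.add_audio with an extra dimension. A minimal sketch of the kind of change I patched into trainer/logging/tensorboard_logger.py; squeezing the array is my assumption about the fix, and the body below is an excerpt of the logging method only, not the full logger:

import numpy as np

def add_audio(writer, title, audio, step, sample_rate):
    # Flatten shapes like (1, N) or (N, 1) to the 1-D array that
    # torch.utils.tensorboard expects, then log it as before.
    audio = np.asarray(audio).squeeze()
    writer.add_audio(title, audio, step, sample_rate=sample_rate)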
After applying the patch, a new error appeared:
[!] Character '睛' not found in the vocabulary. Discarding it.
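To see which transcript characters fall outside the configured vocabulary before the next run, a quick hypothetical check like this could help (the pipe-separated metadata layout, the column index, and the placeholder vocabulary string are assumptions about train.csv and tacotron2-DDC.json):

import csv

# Placeholder: paste the "characters" string from tacotron2-DDC.json here.
VOCAB = set("abcdefghijklmnopqrstuvwxyz123456 ")

missing = set()
with open("/tmp/mdcc-dataset/train.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|"):
        if len(row) > 1:
            missing |= {ch for ch in row[1] if ch not in VOCAB}

print("Characters missing from vocabulary:", "".join(sorted(missing)))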
New init steps:
- clone code: git clone https://github.com/hgneng/TTS.git
- patch: cp /gemini/code/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
- install deps: cd TTS && pip install -e .[all,dev]
- copy dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
- cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /tmp/mdcc-dataset/
- cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /tmp/mdcc-dataset/
- link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /tmp/mdcc-dataset
- run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out
Model test command:
~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_10000_411.pth --out_path ./demo.wav