This is a continuation of the Coqui Cantonese Development Notes.
2024-10-30
--> TIME: 2024-10-29 13:49:43 -- STEP: 129/157 -- GLOBAL_STEP: 68425
| > decoder_loss: 0.4340108335018158 (0.41140215849691586)
| > postnet_loss: 0.40669307112693787 (0.39625994965087535)
| > stopnet_loss: 0.2480345070362091 (0.3161741283743881)
| > decoder_coarse_loss: 0.42141956090927124 (0.3995050713997479)
| > decoder_ddc_loss: 0.004036255180835724 (0.0066533084226006916)
| > ga_loss: 0.00028112504514865577 (0.0008964127673254916)
| > decoder_diff_spec_loss: 0.17114706337451935 (0.16164369188075847)
| > postnet_diff_spec_loss: 0.19191323220729828 (0.18958402153595474)
| > decoder_ssim_loss: 0.6188733577728271 (0.6018433372179665)
| > postnet_ssim_loss: 0.6118573546409607 (0.6032356453496356)
| > loss: 1.179294466972351 (1.2175781227821527)
| > align_error: 0.722508043050766 (0.7450022502231969)
| > amp_scaler: 1024.0 (1024.0)
| > grad_norm: tensor(1.0017, device='cuda:0') (tensor(0.9195, device='cuda:0'))
| > current_lr: 1.275e-06
| > step_time: 2.3045 (0.6297789155974869)
| > loader_time: 2.3656 (0.14574224265046823)
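Each metric line in the trainer output above has the form `| > name: current (running average)`. To spot a divergence without reading the log by eye, a small parser can pull these values out of the `out` file written by `tee`. This is a minimal sketch that assumes exactly the line layout shown here. The names `parse_metrics` and `non_finite` are hypothetical, and entries that are not plain numbers are skipped: `grad_norm` is printed as a tensor repr, and `current_lr` has no average.

```python
import math
import re

# Matches lines such as "| > decoder_loss: 0.434 (0.411)".
# Assumes the "| > name: current (average)" layout shown in the log above.
METRIC_RE = re.compile(r"^\|\s*>\s*(\w+):\s*(\S+)\s+\((\S+)\)\s*$")


def parse_metrics(lines):
    """Return {name: (current, average)} for the numeric metric lines.

    Later lines overwrite earlier ones with the same name.
    """
    metrics = {}
    for line in lines:
        match = METRIC_RE.match(line.strip())
        if not match:
            continue  # header lines, current_lr, grad_norm tensor reprs, etc.
        name, current, average = match.groups()
        try:
            # float() also accepts "nan" and "inf", so diverged values are kept.
            metrics[name] = (float(current), float(average))
        except ValueError:
            continue  # not a plain number
    return metrics


def non_finite(metrics):
    """Names of metrics whose current value is NaN or infinite."""
    return [name for name, (cur, _) in metrics.items() if not math.isfinite(cur)]
```

Because later lines overwrite earlier ones, passing the whole `out` file returns only the last logged step. Splitting the file on the `--> TIME:` lines first gives per-step values.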
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-29-2024_10+40AM-7dc2f6fd
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1360, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
                                        ^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1226, in optimize
    outputs, loss_dict = self._compute_loss(
                         ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1157, in _compute_loss
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1116, in _model_train_step
    return model.train_step(*input_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 338, in train_step
    loss_dict = criterion(
                ^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 511, in forward
    decoder_ssim_loss = self.criterion_ssim(decoder_output, mel_input, output_lens)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 148, in forward
    assert not torch.isnan(y_hat_norm).any(), "y_hat_norm contains NaNs"
AssertionError: y_hat_norm contains NaNs
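The assertion fires in the SSIM loss on `y_hat_norm`, the min-max normalized decoder output. Upstream Coqui TTS normalizes each sample as `(x - min) / (max - min + 1e-8)` over the unpadded frames (helper `sample_wise_min_max`). This note assumes the fork's check at line 148 sits on top of the same formula. With that formula a single `inf` frame is enough to produce NaN: `max` becomes `inf`, and `(inf - min) / inf` evaluates to NaN. A NaN frame propagates the same way. So the assertion most likely means `decoder_output` was already non-finite before normalization.

The sketch below is a simplified one-dimensional, pure-Python stand-in that checks the input first, so the error names the bad frames rather than the normalized output. The real code works on `[batch, frames, mel]` tensors. The function name `masked_min_max` is hypothetical.

```python
import math


def masked_min_max(values, length, eps=1e-8):
    """Min-max normalize values[:length]; frames past `length` are padding.

    Simplified 1-D stand-in for the per-sample normalization used before
    the SSIM loss. Padded frames are returned as 0.0, mirroring the mask
    multiplication applied after normalization.
    """
    valid = values[:length]
    bad = [i for i, v in enumerate(valid) if not math.isfinite(v)]
    if bad:
        # Fail early on the *input*, instead of asserting on the
        # normalized output where the cause is no longer visible.
        raise ValueError(f"non-finite input at frames {bad}")
    lo, hi = min(valid), max(valid)
    scale = hi - lo + eps  # eps keeps a constant signal from dividing by zero
    return [(v - lo) / scale for v in valid] + [0.0] * (len(values) - length)
```

In the real model the equivalent check would go on `decoder_output` right before `self.criterion_ssim(...)` is called. That is where the search for the first non-finite step should start.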
real 191m19.757s
user 389m26.424s
New init steps:
- clone code: git clone https://github.com/hgneng/TTS.git
- install deps: cd TTS && pip install -e .[all,dev]
- extract dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
- copy training split: cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /tmp/mdcc-dataset/
- copy validation split: cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /tmp/mdcc-dataset/
- link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /tmp/mdcc-dataset
- run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh | tee out
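The setup steps above must all succeed before `run.sh` can find the data. A quick preflight check can confirm that the extracted dataset, the two CSV files and the `mdcc-dataset` symlink are in place before a multi-hour run starts. This is a minimal sketch. The function name `preflight` is hypothetical, the default paths are the ones used in the steps above, and it checks only that the files exist, not what the CSVs contain.

```python
from pathlib import Path


def preflight(recipe_dir, dataset_dir="/tmp/mdcc-dataset"):
    """Return a list of setup problems (an empty list means all checks passed).

    Mirrors the init steps: train.csv and valid.csv copied into the
    extracted dataset, and an 'mdcc-dataset' symlink in the recipe dir.
    """
    problems = []
    dataset = Path(dataset_dir)
    if not dataset.is_dir():
        problems.append(f"dataset not extracted: {dataset}")
    for name in ("train.csv", "valid.csv"):
        if not (dataset / name).is_file():
            problems.append(f"missing {dataset / name}")
    link = Path(recipe_dir) / "mdcc-dataset"
    if not link.is_symlink():
        problems.append(f"no symlink at {link}")
    elif link.resolve() != dataset.resolve():
        # Compare resolved paths so /tmp vs. /private/tmp style aliases match.
        problems.append(f"{link} points to {link.resolve()}, not {dataset}")
    return problems
```

For example, `preflight("/gemini/code/TTS/recipes/mdcc/tacotron2-DDC")` should return an empty list once the extract, copy and link steps have run.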
Model test command:
~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_10000_411.pth --out_path ./demo.wav