Fine tune Coqui Cantonese TTS

By admin, 28 October 2024

This continues the Coqui Cantonese Development Notes.

2024-10-30

   --> TIME: 2024-10-29 13:49:43 -- STEP: 129/157 -- GLOBAL_STEP: 68425
    | > decoder_loss: 0.4340108335018158  (0.41140215849691586)
    | > postnet_loss: 0.40669307112693787  (0.39625994965087535)
    | > stopnet_loss: 0.2480345070362091  (0.3161741283743881)
    | > decoder_coarse_loss: 0.42141956090927124  (0.3995050713997479)
    | > decoder_ddc_loss: 0.004036255180835724  (0.0066533084226006916)
    | > ga_loss: 0.00028112504514865577  (0.0008964127673254916)
    | > decoder_diff_spec_loss: 0.17114706337451935  (0.16164369188075847)
    | > postnet_diff_spec_loss: 0.19191323220729828  (0.18958402153595474)
    | > decoder_ssim_loss: 0.6188733577728271  (0.6018433372179665)
    | > postnet_ssim_loss: 0.6118573546409607  (0.6032356453496356)
    | > loss: 1.179294466972351  (1.2175781227821527)
    | > align_error: 0.722508043050766  (0.7450022502231969)
    | > amp_scaler: 1024.0  (1024.0)
    | > grad_norm: tensor(1.0017, device='cuda:0')  (tensor(0.9195, device='cuda:0'))
    | > current_lr: 1.275e-06 
    | > step_time: 2.3045  (0.6297789155974869)
    | > loader_time: 2.3656  (0.14574224265046823)
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-29-2024_10+40AM-7dc2f6fd
Traceback (most recent call last):
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
   self._fit()
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
   self.train_epoch()
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
   outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1360, in train_step
   outputs, loss_dict_new, step_time = self.optimize(
                                       ^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1226, in optimize
   outputs, loss_dict = self._compute_loss(
                        ^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1157, in _compute_loss
   outputs, loss_dict = self._model_train_step(batch, model, criterion)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1116, in _model_train_step
   return model.train_step(*input_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 338, in train_step
   loss_dict = criterion(
               ^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
   return self._call_impl(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
   return forward_call(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 511, in forward
   decoder_ssim_loss = self.criterion_ssim(decoder_output, mel_input, output_lens)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
   return self._call_impl(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
   return forward_call(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 148, in forward
   assert not torch.isnan(y_hat_norm).any(), "y_hat_norm contains NaNs"
AssertionError: y_hat_norm contains NaNs
real    191m19.757s
user    389m26.424s

2025-4-8

Downloaded the last model from the platform, which was trained half a year ago, and listened to its output. It sounds similar to model_10000.

/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_63586.pth --out_path ./demo2.wav

The MDCC dataset does not come from a single speaker.

I am going to build a Chinese corpus to run some experiments: https://github.com/hgneng/ChineseCorpus

Maybe I can generate a one-character corpus containing only Jyutping without tones, to train a model that understands Jyutping.
Then I can generate the audio for the corpus from existing TTS engines, including Ekho TTS.
To verify the result, I can use STT. I am not sure whether this is the so-called GAN approach, and I am not sure how to do it in Coqui. I may need to ask an AI whether this is workable.
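
As a minimal sketch of the toneless-Jyutping idea, assuming the corpus text is space-separated Jyutping syllables that each end in a tone digit 1-6 (as in the test sentence used above), the tones could be stripped like this. The names `strip_tones` and `syllable_set` are hypothetical and not part of Coqui TTS.

```python
import re

# Assumption: a Jyutping syllable is lowercase letters followed by one tone digit (1-6).
_TONE = re.compile(r"([a-z]+)[1-6]\b")


def strip_tones(jyutping: str) -> str:
    """Remove the trailing tone digit from every Jyutping syllable."""
    return _TONE.sub(r"\1", jyutping)


def syllable_set(lines):
    """Collect the distinct toneless syllables in a corpus (one sentence per line)."""
    syllables = set()
    for line in lines:
        syllables.update(strip_tones(line).split())
    return sorted(syllables)


if __name__ == "__main__":
    print(strip_tones("ngo5 wui2 syut3 jyut6 jyu5"))
```

The sorted syllable list is one possible starting point for a one-syllable-per-entry corpus.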

2025-4-9

Here is a very good article: 语音合成技术（深度学习方法简介） (Speech Synthesis Technology: An Introduction to Deep Learning Methods)

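Yesterday's idea probably will not work; it is too primitive. By yesterday's reasoning, even a database with different speakers could be trained to produce the desired result. Of course, good results should come from a single speaker. Verifying through STT is not feasible either; it is too slow.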

Try fixing the error and increasing the amount of training by an order of magnitude, to see whether the result changes.

   --> TIME: 2025-04-09 14:47:04 -- STEP: 154/157 -- GLOBAL_STEP: 68450
    | > decoder_loss: 0.4403502345085144  (nan)
    | > postnet_loss: 0.5414097309112549  (nan)
    | > stopnet_loss: 0.2241305559873581  (0.30289583720944147)
    | > decoder_coarse_loss: 0.42627546191215515  (0.4042315759829111)
    | > decoder_ddc_loss: 0.002295088255777955  (nan)
    | > ga_loss: 0.00029222219018265605  (0.0007955534260813754)
    | > decoder_diff_spec_loss: 0.16289205849170685  (nan)
    | > postnet_diff_spec_loss: 0.2963632047176361  (nan)
    | > decoder_ssim_loss: 0.5058364868164062  (0.6033724380003945)
    | > postnet_ssim_loss: 0.5834265947341919  (0.6041104174279548)
    | > loss: 1.1825339794158936  (nan)
    | > align_error: 0.7805334031581879  (nan)
    | > amp_scaler: 512.0  (1017.3506493506494)
    | > grad_norm: tensor(1.2060, device='cuda:0')  (tensor(0.9337, device='cuda:0'))
    | > current_lr: 1.275e-06 
    | > step_time: 1.3925  (0.6281918287277224)
    | > loader_time: 0.0102  (0.012237853818125541)
> EVALUATION
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-April-09-2025_12+47PM-7dc2f6fd
Traceback (most recent call last):
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
   self._fit()
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
   self.eval_epoch()
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
   self.model.eval_log(
 File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
   logger.eval_audios(steps, audios, self.ap.sample_rate)
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
   self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 58, in add_audios
   self.add_audio(
 File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 34, in add_audio
   self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
 File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
   audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
   assert array.ndim == 1, "input tensor should be 1 dimensional."
          ^^^^^^^^^^^^^^^
AssertionError: input tensor should be 1 dimensional.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
Warning: Result contains NaN values. Returning zero tensor.
[!] Waveform is not finite everywhere. Skipping the GL.
real    121m1.519s
user    227m28.137s
sys     308m34.519s
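
To spot which metrics have a NaN running average in a long `out` log, a small parser can help. This is only a sketch that assumes the line layout shown above, `| > name: current  (average)`; the function names `parse_metrics` and `nan_metrics` are hypothetical.

```python
import math
import re

# Matches lines such as "| > decoder_loss: 0.4403502345085144  (nan)".
# Lines whose values are not plain numbers (e.g. grad_norm tensors) are skipped.
_NUMBER = r"[-+0-9.eE]+|nan|inf"
_METRIC = re.compile(rf"\|\s*>\s*(\w+):\s*({_NUMBER})\s*\(({_NUMBER})\)")


def parse_metrics(lines):
    """Return {name: (current, average)} for every plain numeric metric line."""
    metrics = {}
    for line in lines:
        match = _METRIC.search(line)
        if match:
            name, current, average = match.groups()
            metrics[name] = (float(current), float(average))
    return metrics


def nan_metrics(lines):
    """Names of metrics whose current value or running average is NaN."""
    return sorted(
        name
        for name, values in parse_metrics(lines).items()
        if any(math.isnan(v) for v in values)
    )
```

Calling `nan_metrics(open("out"))` on the training log would list the affected metrics for the last logged step of each name.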

After applying a patch, there is a new error:

 [!] Character '睛' not found in the vocabulary. Discarding it.

2025-4-25

This problem still exists:

 [!] Character '睛' not found in the vocabulary. Discarding it.

It can be reproduced locally, which is really strange. Fix it locally first, then run on the virtual machine.

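The package source on virtualcloud has problems and needs to be switched to the Tsinghua mirror; send June a screenshot of the download. The extracted MDCC data has been added to the cache, so next time there is no need to extract it manually, which takes 18 minutes.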

New init steps:

clone code: git clone https://github.com/hgneng/TTS.git
install deps: cd TTS && pip install -e .[all,dev]
patch: cp /gemini/code/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
copy dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /tmp/mdcc-dataset/
cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /tmp/mdcc-dataset/
link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /tmp/mdcc-dataset
run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out

Model test command:

~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_10000_411.pth --out_path ./demo.wav

2025-4-27

This command, which used to work, no longer runs, for unknown reasons:

~/code/hgneng/TTS$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/model_63586.pth --out_path ./demo3.wav
Traceback (most recent call last):
 File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 494, in <module>
   main()
 File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 338, in main
   from TTS.api import TTS
ModuleNotFoundError: No module named 'TTS'

2025-4-28

The No module named 'TTS' problem is solved. The cause was that software installed yesterday added an environment setup line, source /etc/profile.d/obd.sh, which conflicted with the conda setup.
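
A quick way to check for this kind of environment conflict is to ask the interpreter which executable is running and whether it can locate the package. A minimal standard-library sketch; `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report which interpreter is running and where it would import `name` from."""
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else "NOT FOUND"
    return f"python={sys.executable} module={name} location={location}"


if __name__ == "__main__":
    # NOT FOUND suggests the active interpreter is not the conda environment
    # in which `pip install -e .` was run, or that PATH was overridden.
    print(describe_module("TTS"))
```

If the reported interpreter path is not inside the expected conda environment, a profile script that rewrites PATH is a likely suspect.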

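I added logging in YUE_CN_Phonemizer, but nothing was printed. It seems the phonemizer is not being run, so the Chinese characters are not converted to Latin letters.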

2025-4-29

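The following problem is solved. The cause was that use_phonemes in the config file had been changed to false. I do not know when it was changed, and git has no record of it. However, the output of earlier successful runs shows it was indeed true, which is really strange.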

[!] Character '睛' not found in the vocabulary. Discarding it.
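
Since a silently flipped `use_phonemes` was the cause, a small pre-flight check on the config before training or synthesis may save time. This is a sketch, not Coqui's own loader: it assumes the config is JSON that may contain `//` line comments, and `load_commented_json` and `check_phoneme_settings` are hypothetical helper names.

```python
import json
import re

# Match either a JSON string (kept) or a // comment (dropped), so that
# "//" inside string values such as URLs is left alone.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def load_commented_json(text: str) -> dict:
    """Parse JSON text that may contain // line comments."""
    cleaned = _STRING_OR_COMMENT.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text
    )
    return json.loads(cleaned)


def check_phoneme_settings(config: dict) -> list:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if not config.get("use_phonemes", False):
        problems.append("use_phonemes is false: raw characters are fed to the model")
    elif not config.get("phonemizer"):
        problems.append("use_phonemes is true but no phonemizer is set")
    return problems
```

Running the check on recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json at the start of run.sh would surface an unexpected change before hours of training are spent.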

New init steps:

clone code: git clone https://github.com/hgneng/TTS.git
install deps: cd TTS && pip install -e .[all,dev]
patch: cp /gemini/code/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
#copy dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
#link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /gemini/data-2/dataset mdcc-dataset
#cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-dataset/
#cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-dataset/
run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out

2025-4-30

   --> TIME: 2025-04-29 16:51:42 -- STEP: 1004/1018 -- GLOBAL_STEP: 85725
     | > decoder_loss: 0.4629353880882263  (0.42889915789622224)
     | > postnet_loss: 0.439887136220932  (0.4946356853581523)
     | > stopnet_loss: 0.20786581933498383  (0.30227529310966406)
     | > decoder_coarse_loss: 0.44925352931022644  (0.4157491997478018)
     | > decoder_ddc_loss: 0.0023014589678496122  (0.006216903324276063)
     | > ga_loss: 0.0002632320101838559  (0.0008917728220433813)
     | > decoder_diff_spec_loss: 0.16706256568431854  (0.1651524688336123)
     | > postnet_diff_spec_loss: 0.1775813102722168  (0.30110443852456015)
     | > decoder_ssim_loss: 0.646834135055542  (0.6270531970428758)
     | > postnet_ssim_loss: 0.6436614990234375  (0.6645315740094231)
     | > loss: 1.1851838827133179  (1.2952861307389247)
     | > align_error: 0.7453666925430298  (0.7455814101425301)
     | > amp_scaler: 4.0  (4.0)
     | > grad_norm: tensor(1.0548, device='cuda:0')  (tensor(1.1489, device='cuda:0'))
     | > current_lr: 6.000000000000001e-07 
     | > step_time: 4.5975  (0.5963088309622381)
     | > loader_time: 0.1885  (0.010768146628877555)
 > EVALUATION 
 ! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-April-29-2025_11+24AM-d34af9a2
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
    self.model.eval_log(
  File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
    logger.eval_audios(steps, audios, self.ap.sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
    self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 68, in add_audios
    self.add_audio(
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 44, in add_audio
    self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
    audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
    assert array.ndim == 1, "input tensor should be 1 dimensional."
           ^^^^^^^^^^^^^^^
AssertionError: input tensor should be 1 dimensional.
Warning: Result contains NaN values. Returning zero tensor.
...
Warning: Result contains NaN values. Returning zero tensor.
 [!] Waveform is not finite everywhere. Skipping the GL.
real    330m21.618s
user    733m40.008s
sys     991m49.889s

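Continuing with some forced dimension fixes. Training results are getting worse: the model at 80K steps is not as good as the one at 60K steps. This may be because the dataset changed. Train for a while longer and see.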

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
    self.model.eval_log(
  File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
    logger.eval_audios(steps, audios, self.ap.sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
    self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 68, in add_audios
    self.add_audio(
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 44, in add_audio
    self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
    audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 702, in audio
    if audio.ndim != 1:
       ^^^^^
UnboundLocalError: cannot access local variable 'audio' where it is not associated with a value

2025-5-13

   --> TIME: 2025-05-12 18:39:43 -- STEP: 1017/1018 -- GLOBAL_STEP: 105450
     | > decoder_loss: 0.45329123735427856  (nan)
     | > postnet_loss: 0.45325377583503723  (nan)
     | > stopnet_loss: 0.2002808004617691  (nan)
     | > decoder_coarse_loss: 0.4443933069705963  (0.41429214013947374)
     | > decoder_ddc_loss: 0.0018430311465635896  (nan)
     | > ga_loss: 0.00033928133780136704  (nan)
     | > decoder_diff_spec_loss: 0.16729258000850677  (nan)
     | > postnet_diff_spec_loss: 0.23089689016342163  (nan)
     | > decoder_ssim_loss: 0.657738447189331  (0.6257411204022058)
     | > postnet_ssim_loss: 0.6858585476875305  (0.66164923212404)
     | > loss: 1.2005012035369873  (nan)
     | > align_error: 0.7821131497621536  (nan)
     | > amp_scaler: 2.0  (4.975417895771876)
     | > grad_norm: tensor(0.9057, device='cuda:0')  (tensor(1.1377, device='cuda:0'))
     | > current_lr: 6.000000000000001e-07 
     | > step_time: 0.917  (0.642910648587883)
     | > loader_time: 0.0076  (0.011983617803222898)
 > EVALUATION 
 ! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-May-12-2025_01+35PM-d34af9a2
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1665, in eval_epoch
    self.model.eval_log(
  File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 417, in eval_log
    logger.eval_audios(steps, audios, self.ap.sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/base_dash_logger.py", line 83, in eval_audios
    self.add_audios(scope_name="EvalAudios", audios=audios, step=step, sample_rate=sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 68, in add_audios
    self.add_audio(
  File "/root/miniconda3/lib/python3.11/site-packages/trainer/logging/tensorboard_logger.py", line 44, in add_audio
    self.writer.add_audio(title, audio, step, sample_rate=sample_rate)
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/writer.py", line 845, in add_audio
    audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/summary.py", line 705, in audio
    array = array[0]
            ~~~~~^^^
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
Warning: Result contains NaN values. Returning zero tensor.
...
Warning: Result contains NaN values. Returning zero tensor.
 [!] Waveform is not finite everywhere. Skipping the GL.
警告：输入的音频张量不是一维的，需要进行处理。 (Warning: the input audio tensor is not one-dimensional and needs to be processed.)
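
The failures above (a tensor that is not 1-dimensional, a 0-dimensional array, and NaN samples) all call for the same kind of guard before audio is handed to the logger. The following is only a plain-Python illustration of that guard on nested lists; the actual patch operates on arrays or tensors and is not shown in these notes. `to_1d_finite` is a hypothetical name.

```python
import math


def _flatten(value):
    """Yield scalars from an arbitrarily nested list/tuple; a bare scalar yields itself."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def to_1d_finite(audio):
    """Flatten `audio` to a 1-D list and replace NaN/inf samples with 0.0."""
    samples = [float(x) for x in _flatten(audio)]
    return [x if math.isfinite(x) else 0.0 for x in samples]
```

Note that flattening concatenates channels; for multichannel input, selecting the first channel may be the better choice.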

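One line in the code pings a server, apparently to collect usage data. It often fails to connect (because the server is overseas), so I commented it out.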

2025-5-15

Training finally ran for more than a day without errors.

2025-5-16

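Training has passed 160K steps, but the loss values are mostly rising, which does not look promising. On the bright side, a new best_model was produced at around 100K steps.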

2025-5-19

Training has reached 500K steps, but best_model is still the one from 100K steps. It looks like this training is not having any effect.

There is a new error:

Traceback (most recent call last):
  File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 494, in <module>
    main()
  File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 423, in main
    synthesizer = Synthesizer(
                  ^^^^^^^^^^^^
  File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 185, in _load_tts
    raise ValueError("Phonemizer is not defined in the TTS config.")
ValueError: Phonemizer is not defined in the TTS config.

After changing the following config line, audio can be generated with the old model:

"phonemizer": "yue_cn_phonemizer", //"espeak",

But the new model raises an error:

$ TTS/bin/synthesize.py --text "ngo5 wui2 syut3 jyut6 jyu5" --config_path recipes/mdcc/tacotron2-DDC/tacotron2-DDC.json --model_path recipes/mdcc/tacotron2-DDC/best_model_101019.pth --out_path ./demo2.wav
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
/home/hgneng/code/hgneng/TTS/TTS/utils/io.py:54: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(f, map_location=map_location, **kwargs)
Traceback (most recent call last):
  File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 494, in <module>
    main()
  File "/home/hgneng/code/hgneng/TTS/TTS/bin/synthesize.py", line 423, in main
    synthesizer = Synthesizer(
                  ^^^^^^^^^^^^
  File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/home/hgneng/code/hgneng/TTS/TTS/utils/synthesizer.py", line 192, in _load_tts
    self.tts_model.load_checkpoint(self.tts_config, tts_checkpoint, eval=True)
  File "/home/hgneng/code/hgneng/TTS/TTS/tts/models/base_tacotron.py", line 105, in load_checkpoint
    state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hgneng/code/hgneng/TTS/TTS/utils/io.py", line 54, in load_fsspec
    return torch.load(f, map_location=map_location, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/coqui/lib/python3.11/site-packages/torch/serialization.py", line 1114, in load
    return _legacy_load(
           ^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/coqui/lib/python3.11/site-packages/torch/serialization.py", line 1338, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input

This may be caused by the phonemizer config change. Try retraining.
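
Before retraining, it may also be worth ruling out a truncated or empty checkpoint file: `EOFError: Ran out of input` is what `pickle` raises when a file ends before a complete object has been read, and checkpoints written by recent PyTorch versions are zip archives by default. A minimal standard-library sketch; `inspect_checkpoint` is a hypothetical helper and its verdict strings are illustrative.

```python
import os
import zipfile


def inspect_checkpoint(path: str) -> str:
    """Classify a checkpoint file without loading it."""
    if not os.path.isfile(path):
        return "missing"
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    if zipfile.is_zipfile(path):
        return f"zip archive, {size} bytes"
    return f"not a zip archive, {size} bytes (legacy format or truncated copy)"


if __name__ == "__main__":
    print(inspect_checkpoint("recipes/mdcc/tacotron2-DDC/best_model_101019.pth"))
```

Comparing the reported size with the file on the training platform would show whether the download was incomplete.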

New init steps:

clone code: git clone https://github.com/hgneng/TTS.git
change repo: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
install deps: cd TTS && pip install -e .[all,dev]
patch 1: cp /gemini/code/TTS/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/
patch 2: cp /gemini/code/TTS/patch/summary.py /root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/
patch 3: cp /gemini/code/TTS/patch/trainer.py /root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py
run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out
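
The steps above could be collected into a small runner so they are not retyped on every fresh machine. This is a sketch only: the commands are copied from the list, the paths are specific to that environment, and `run_steps` is a hypothetical helper that defaults to a dry run.

```python
import subprocess

# Commands copied from the init steps above; paths are specific to that environment.
INIT_STEPS = [
    ("clone code", "git clone https://github.com/hgneng/TTS.git"),
    ("change repo", "pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple"),
    ("install deps", "cd TTS && pip install -e .[all,dev]"),
    ("patch 1", "cp /gemini/code/TTS/patch/tensorboard_logger.py /root/miniconda3/lib/python3.11/site-packages/trainer/logging/"),
    ("patch 2", "cp /gemini/code/TTS/patch/summary.py /root/miniconda3/lib/python3.11/site-packages/torch/utils/tensorboard/"),
    ("patch 3", "cp /gemini/code/TTS/patch/trainer.py /root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py"),
    ("run training", "cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh 2>&1 | tee out"),
]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False. Stops at the first failure."""
    done = []
    for label, command in steps:
        print(f"[{label}] {command}")
        if not dry_run:
            # bash is requested explicitly because `time` is a bash keyword.
            # For the piped training step, check=True reflects the exit status
            # of `tee`, not of the training script.
            subprocess.run(command, shell=True, check=True, executable="/bin/bash")
        done.append(label)
    return done


if __name__ == "__main__":
    run_steps(INIT_STEPS)  # dry run: only prints the commands
```

Passing `dry_run=False` executes the commands in order; each step runs in its own shell, which is why the `cd` is part of the command string.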

After training, link the latest checkpoint.pth to best_model.pth so that the next training run can resume from the previous progress.
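
A sketch of that linking step, assuming checkpoints are saved as `checkpoint_<step>.pth` in the run directory. The naming pattern, the `.bak` backup behaviour and the helper name `link_latest_checkpoint` are assumptions; adjust them to the actual files.

```python
import os
import re
from pathlib import Path

# Assumed naming pattern for saved checkpoints.
_STEP = re.compile(r"checkpoint_(\d+)\.pth$")


def link_latest_checkpoint(run_dir, link_name="best_model.pth"):
    """Point run_dir/link_name at the checkpoint with the highest step number."""
    run_dir = Path(run_dir)
    candidates = []
    for path in run_dir.iterdir():
        match = _STEP.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no checkpoint_<step>.pth files in {run_dir}")
    _, latest = max(candidates, key=lambda item: item[0])

    link = run_dir / link_name
    if link.is_symlink():
        link.unlink()  # drop a link left by a previous run
    elif link.exists():
        # Keep a real best_model file instead of deleting it.
        link.rename(link.with_name(link.name + ".bak"))
    os.symlink(latest.name, link)  # relative link, so the run directory can be moved
    return latest
```

The function returns the checkpoint it linked, which makes it easy to log which step the next run resumes from.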

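Copy the event files to /gemini/output/ to see the training trends in tensorboard. Do not create a symbolic link; there seems to be a bug that deletes the directory and terminates training.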
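
A sketch of the copy step, assuming the event files use TensorBoard's default `events.out.tfevents.*` naming and sit directly in the run directory; `copy_event_files` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_event_files(run_dir, output_dir):
    """Copy TensorBoard event files from run_dir to output_dir as real files, not links."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(run_dir).glob("events.out.tfevents.*")):
        shutil.copy2(src, output_dir / src.name)  # copy2 also preserves timestamps
        copied.append(src.name)
    return copied
```

Re-running the copy periodically during training refreshes the curves, since existing files in the output directory are simply overwritten.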

Model test command:

Continued on the following page: https://cto.eguidedog.net/node/1407
