Cantonese TTS Implementation Notes
1. Implement Cantonese Frontend
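Following the document "Implementing a New Language Frontend" and the Mandarin code, implement the corresponding Cantonese parts.
Implement the Cantonese phonemizer under https://github.com/coqui-ai/TTS/tree/dev/TTS/tts/utils/text/phonemizers.
Regarding phonemes: I'm not sure which standard coqui's Mandarin frontend uses. It looks like IPA but isn't quite; for example, "ao": ["aʌ"] uses ʌ, which standard IPA transcriptions of Mandarin do not use for this final. Perhaps I can follow Amazon's documentation, Chinese (Cantonese) (yue-CN), for the Cantonese phoneme mapping (it may be the same system as Jyutping on Wikipedia), and also check whether eSpeak uses the same one. Aim to unify toward IPA and eSpeak (if eSpeak is broadly consistent with IPA but differs in places, eSpeak may need fixing).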
2024-9-4
Implemented cantonese/jyutpingToPhonemes.py. It uses pycantonese.jyutping_to_ipa (https://github.com/jacksonllee/pycantonese/issues/44) for the conversion. pycantonese is a useful Cantonese linguistics and NLP library. The output of pycantonese.jyutping_to_ipa should match Wikipedia. Amazon's IPA mapping doesn't seem to fully match Wikipedia's. The situation in eSpeak seems more complicated, and I don't want to touch it. I plan to modify the other phonemizer files tomorrow.
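pycantonese.jyutping_to_ipa does the actual conversion. As a rough illustration of what that step has to handle, here is a minimal sketch (not pycantonese's implementation) that splits a Jyutping string into (initial, final, tone) triples and maps the initials to IPA using the standard Jyutping initial table. Finals and tones are left unconverted; split_jyutping and INITIAL_TO_IPA are hypothetical names.

```python
import re

# Standard Jyutping initials and their IPA values.
INITIAL_TO_IPA = {
    "gw": "kʷ", "kw": "kʷʰ", "ng": "ŋ",
    "b": "p", "p": "pʰ", "m": "m", "f": "f",
    "d": "t", "t": "tʰ", "n": "n", "l": "l",
    "g": "k", "k": "kʰ", "h": "h", "w": "w",
    "z": "ts", "c": "tsʰ", "s": "s", "j": "j",
}
# Try two-letter initials before one-letter ones ("gw" before "g").
_INITIALS = sorted(INITIAL_TO_IPA, key=len, reverse=True)
# A syllable is a run of letters followed by a tone digit 1-6.
_SYLLABLE = re.compile(r"([a-z]+)([1-6])")
# Syllabic nasals have no initial: m4 (唔), ng5 (五).
_SYLLABIC = {"m", "ng"}


def split_jyutping(text):
    """Split 'gwong2dung1waa2' into (initial, final, tone) triples."""
    syllables = []
    for letters, tone in _SYLLABLE.findall(text.lower()):
        initial = ""
        if letters not in _SYLLABIC:
            for cand in _INITIALS:
                # The initial must leave a non-empty final behind.
                if letters.startswith(cand) and len(letters) > len(cand):
                    initial = cand
                    break
        syllables.append((initial, letters[len(initial):], int(tone)))
    return syllables
```

For example, split_jyutping("gwong2dung1waa2") gives [("gw", "ong", 2), ("d", "ung", 1), ("w", "aa", 2)]. Keeping the tone as a separate digit is also why the digits 1-6 appear in the phonemized text later in these notes.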
2024-9-5
Finished the Cantonese phonemizer. Code at https://github.com/hgneng/TTS
Reinstalled the conda coqui environment.
pycantonese's corpus is not complete enough. "这是,样本中文。" is translated as ||si6||||||bun2||zung1|man2|||, so two characters are missing. I need to write my own Cantonese-to-Jyutping Python module (or extend pycantonese).
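To see exactly which characters the lookup missed, a small check like the following can help. It assumes the input has the shape of pycantonese.characters_to_jyutping output: a list of (word, jyutping) tuples with None for words it cannot romanize. find_unromanized is a hypothetical helper, and punctuation is ignored so only genuinely missing Han characters are reported.

```python
def find_unromanized(pairs):
    """Return words that contain Han characters but got no Jyutping.

    `pairs` is assumed to look like pycantonese.characters_to_jyutping
    output: a list of (word, jyutping_or_None) tuples.
    """
    def has_han(word):
        # CJK Unified Ideographs basic block only.
        return any("\u4e00" <= ch <= "\u9fff" for ch in word)

    return [word for word, jyutping in pairs if not jyutping and has_han(word)]
```

With hand-made input such as [("这", None), ("是", "si6"), (",", None)], it returns ["这"]; the comma is skipped because it is not a Han character.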
2024-9-6
Converting the text to Traditional Chinese with the opencc module first makes pycantonese work; there is no need to extend pycantonese.
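The two-step frontend (Simplified to Traditional, then Jyutping lookup) can be sketched as below. The dependencies are passed in as functions so the sketch runs without opencc or pycantonese installed. In the real frontend, to_traditional would presumably be the convert method of an opencc converter configured for s2t (the config name, "s2t" vs "s2t.json", differs between opencc packages), and lookup would be pycantonese.characters_to_jyutping. text_to_jyutping is a hypothetical name, and joining syllable groups with spaces is a choice made here for readability.

```python
from typing import Callable, List, Optional, Tuple

# Shape assumed for the lookup: like pycantonese.characters_to_jyutping.
Lookup = Callable[[str], List[Tuple[str, Optional[str]]]]


def text_to_jyutping(text: str, to_traditional: Callable[[str], str], lookup: Lookup) -> str:
    """Normalize to Traditional Chinese first, then look up Jyutping.

    Words the lookup cannot romanize (punctuation, unknown words) are dropped.
    """
    traditional = to_traditional(text)
    return " ".join(jp for _word, jp in lookup(traditional) if jp)
```

The point of the ordering is that the lookup's corpus is keyed on Traditional characters, so normalizing first avoids the gaps seen with Simplified input.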
I'm not sure about the step below: I couldn't find config.phoneme_language, so I don't know whether my implementation is correct.
https://docs.coqui.ai/en/latest/implementing_a_new_language_frontend.html
After you implement your phonemizer, you need to add it to TTS/tts/utils/text/phonemizers/__init__.py to be able to map the language code in the model config (config.phoneme_language) to the phonemizer class and initiate the phonemizer automatically.
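The idea in the quoted step is a registry: the config's language code selects a backend name, and the backend name selects a class. A minimal self-contained sketch of that pattern, using the language code yue-cn and backend name yue_cn_phonemizer that appear in the training logs later in these notes. The class and function names here are illustrative only and do not reproduce coqui's actual __init__.py; check the real file for its dictionaries and factory function.

```python
class BasePhonemizer:
    """Stand-in for a phonemizer base class (illustrative only)."""

    def __init__(self, language):
        self.language = language

    @staticmethod
    def name():
        raise NotImplementedError

    def phonemize(self, text):
        raise NotImplementedError


class YueCnPhonemizer(BasePhonemizer):
    @staticmethod
    def name():
        return "yue_cn_phonemizer"

    def phonemize(self, text):
        return text  # placeholder: real version does text -> Jyutping -> IPA


# backend name -> class, and language code -> backend name
PHONEMIZERS = {cls.name(): cls for cls in (YueCnPhonemizer,)}
DEF_LANG_TO_PHONEMIZER = {"yue-cn": YueCnPhonemizer.name()}


def phonemizer_for_language(language):
    """Instantiate the phonemizer registered for a config language code."""
    try:
        backend = DEF_LANG_TO_PHONEMIZER[language]
    except KeyError:
        raise ValueError(f"No phonemizer found for language {language}.") from None
    return PHONEMIZERS[backend](language=language)
```

A missing registry entry produces the same kind of "No phonemizer found for language yue-cn" error that shows up on 2024-9-12, which is a hint that the mapping step was not wired up yet.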
I haven't done the following step yet (TODO1):
You should also add tests to tests/text_tests if you want to make a PR.
This project has a Cantonese speech corpus (8 GB, for academic research only) with very good, broadcaster-quality recordings.
https://github.com/HLTCHKUST/cantonese-asr
P.S. Orca has many keyboard shortcuts in Firefox, and typing Chinese triggers them, which makes Chinese input very difficult.
2. Training Model
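Reference document: Training a Model. As with Mandarin, start with the Tacotron2 model; consider the XTTS model later.
It seems that adapting this example until it runs is enough to start training: https://github.com/coqui-ai/TTS/blob/dev/recipes/kokoro/tacotron2-DDC/run.sh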
2024-9-9
The MDCC dataset requires a signed license; I sent it today.
I plan to read its paper later: http://arxiv.org/pdf/2201.02419
The paper says "Common Voice zh-HK" (from Wikipedia in 2019) is "the biggest existing dataset". I should look at it later.
2024-9-10
Read 3/8 of the paper.
Copied the kokoro folder to mdcc and modified it: recipes/mdcc/tacotron2-DDC
I don't understand the characters setting in tacotron2-DDC.json. It doesn't look correct, but it may not be used.
Ready to train tomorrow.
2024-9-11
Finished reading the paper. MDCC uses "Fairseq S2T Transformer" for training and CER (character error rate) as the evaluation metric, with a result of 10.15%, compared to 8.69% on Common Voice zh-HK. (So what's the advantage of MDCC?) This decision is based on the fact that the MDCC data are cleaner, shorter, and therefore easier to learn than those in Common Voice zh-HK.
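For reference, CER is the character-level edit distance (substitutions + deletions + insertions) between the reference transcript and the hypothesis, divided by the number of reference characters. A minimal sketch using the standard Levenshtein dynamic program; edit_distance and cer are hypothetical helper names, not part of the MDCC code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: strings of characters)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edits needed to turn ref into hyp, per ref character."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, cer("abcd", "abxd") is 0.25: one substitution over four reference characters. A CER of 10.15% means roughly one character error per ten reference characters.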
~/code/hgneng/TTS$ bash recipes/mdcc/tacotron2-DDC/run.sh
> Avg mel spec mean: -2.2504470286478218
> Avg mel spec scale: 0.7808214242600025
> Avg linear spec mean: -1.4933440478801683
> Avg linear spec scale: 0.8465782427037042
> stats saved to /home/hgneng/code/hgneng/TTS/recipes/mdcc/tacotron2-DDC/scale_stats.npy
...
AttributeError: module 'TTS.tts.datasets' has no attribute 'mdcc'
I need to implement a function called mdcc in TTS/tts/datasets/formatters.py.
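A sketch of such a formatter, modeled on the shape of coqui's built-in formatters: a function taking root_path and meta_file and returning a list of dicts with text, audio_file, speaker_name and root_path keys. The MDCC metadata layout is an assumption here (a CSV with a header row and audio_path / text_path columns, each transcript in its own text file, paths relative to root_path); check the actual dataset files before relying on it. The constant speaker name "mdcc" is also a placeholder.

```python
import csv
import os


def mdcc(root_path, meta_file, **kwargs):
    """Hypothetical MDCC formatter; the CSV columns used here are assumptions."""
    items = []
    meta_path = os.path.join(root_path, meta_file)
    with open(meta_path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            text_file = os.path.join(root_path, row["text_path"])
            with open(text_file, encoding="utf-8") as tf:
                text = tf.read().strip()
            if not text:
                continue  # empty transcripts would yield no tokens later
            items.append({
                "text": text,
                "audio_file": os.path.join(root_path, row["audio_path"]),
                "speaker_name": "mdcc",  # placeholder: single speaker for now
                "root_path": root_path,
            })
    return items
```

Dropping empty transcripts at this stage might also head off the later "assert len(out_dict["token_ids"]) > 0" failure, though that error has other causes too (see 2024-9-24 and 2024-9-25).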
2024-9-12
Implemented the mdcc formatter in TTS/tts/datasets/formatters.py.
Got the following error:
File "/home/hgneng/code/hgneng/TTS/TTS/tts/utils/text/tokenizer.py", line 206, in init_from_config
raise ValueError(
ValueError: No phonemizer found for language yue-cn.
You may need to install a third party library for this language.
I also need to look into multi-speaker training.
2024-9-13
For multi-speaker training, we need to implement SpeakerManager.
Added code to TTS/tts/utils/text/tokenizer.py to support yue-cn.
Got the following error:
File "/home/hgneng/code/hgneng/TTS/TTS/tts/models/tacotron2.py", line 386, in _create_logs
pred_spec = postnet_outputs[0].data.cpu().numpy()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Got unsupported ScalarType BFloat16
Need to upload the data and code to https://platform.virtaicloud.com/ to run it.
P.S. When the ibus Chinese input method stops working, you can restart it like this:
$ ibus-daemon -drx
$ ibus restart  # optional
2024-9-14
- Set up an environment with Python 3.11 and PyTorch >= 2.1 with CUDA
- Set up the mdcc-dataset symlink
- Edit run.sh to restore some one-time logic
- Install dependencies: gemini/code/TTS# pip install .[all,dev]
- Generating scale_stats.npy on virtaicloud is very slow; generate it locally and upload it.
I should fix the following error:
> EPOCH: 0/1000
--> /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-14-2024_10+24AM-133a5102
[*] Pre-computing phonemes...
0%| | 0/65120 [00:00<?, ?it/s]jau5waak6bei2keoi5saau2siu2ge3jan4leoi6ci2zou2aa3dong1tung4maai4haa6waa1
[!] Character '5' not found in the vocabulary. Discarding it.
jau5waak6bei2keoi5saau2siu2ge3jan4leoi6ci2zou2aa3dong1tung4maai4haa6waa1
[!] Character '6' not found in the vocabulary. Discarding it.
jau5waak6bei2keoi5saau2siu2ge3jan4leoi6ci2zou2aa3dong1tung4maai4haa6waa1
[!] Character '2' not found in the vocabulary. Discarding it.
jau5waak6bei2keoi5saau2siu2ge3jan4leoi6ci2zou2aa3dong1tung4maai4haa6waa1
[!] Character '3' not found in the vocabulary. Discarding it.
jau5waak6bei2keoi5saau2siu2ge3jan4leoi6ci2zou2aa3dong1tung4maai4haa6waa1
[!] Character '4' not found in the vocabulary. Discarding it.
jau5waak6bei2keoi5saau2siu2ge3jan4leoi6ci2zou2aa3dong1tung4maai4haa6waa1
[!] Character '1' not found in the vocabulary. Discarding it.
2%|██▉ | 1491/65120 [00:33<17:08, 61.85it/s]
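The warnings show that the tone digits in the phonemized output are not in the configured character set, so they are silently discarded. A minimal sketch for catching this before training: collect every character that appears in the phonemized transcripts and compare it against the configured set. missing_characters is a hypothetical helper; feed it whichever character field the tokenizer actually uses (presumably the phonemes field when use_phonemes is true).

```python
def missing_characters(samples, characters):
    """Return, sorted, the characters used in samples but absent from characters."""
    seen = set()
    for text in samples:
        seen.update(text)
    return sorted(seen - set(characters))
```

Run over the string in the log above against a plain a-z set, missing_characters(["jau5waak6bei2keoi5"], "abcdefghijklmnopqrstuvwxyz") returns ["2", "5", "6"], the same kind of gap the warnings report.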
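After I changed the character set configuration and restarted the environment, I got the following error. It's network-related; retrying a few times may get past it.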
> Start Tensorboard: tensorboard --logdir=/gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-14-2024_11+14AM-850dac97
/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py:552: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
self.scaler = torch.cuda.amp.GradScaler()
> Model has 47872052 parameters
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 712, in _error_catcher
yield
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 812, in _raw_read
data = self._fp_read(amt) if not fp_closed else b""
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 797, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/http/client.py", line 473, in read
s = self.fp.read(amt)
^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/socket.py", line 706, in readinto
return self._sock.recv_into(b)
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/ssl.py", line 1314, in recv_into
return self.read(nbytes, buffer)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/ssl.py", line 1166, in read
return self._sslobj.read(len, buffer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/requests/models.py", line 816, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 934, in stream
data = self.read(amt=amt, decode_content=decode_content)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 877, in read
data = self._raw_read(amt)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 811, in _raw_read
with self._error_catcher():
File "/root/miniconda3/lib/python3.11/contextlib.py", line 158, in __exit__
self.gen.throw(typ, value, traceback)
File "/root/miniconda3/lib/python3.11/site-packages/urllib3/response.py", line 717, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.") from e # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='user-images.githubusercontent.com', port=443): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/gemini/code/TTS/TTS/bin/train_tts.py", line 71, in <module>
main()
File "/gemini/code/TTS/TTS/bin/train_tts.py", line 58, in main
trainer = Trainer(
^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 583, in __init__
ping_training_run()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/analytics.py", line 12, in ping_training_run
_ = requests.get(URL, timeout=5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 725, in send
history = [resp for resp in gen]
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 725, in <listcomp>
history = [resp for resp in gen]
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 266, in resolve_redirects
resp = self.send(
^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 747, in send
r.content
File "/root/miniconda3/lib/python3.11/site-packages/requests/models.py", line 899, in content
self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/requests/models.py", line 822, in generate
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='user-images.githubusercontent.com', port=443): Read timed out.
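One epoch takes about 20 minutes, so a full training run is estimated at roughly 300+ hours (1000 epochs × 20 min ≈ 333 h). I could try a more powerful machine.
After "Pre-computing phonemes..." finishes, it gets stuck at this stage: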
> DataLoader initialization
| > Tokenizer:
| > add_blank: False
| > use_eos_bos: False
| > use_phonemes: True
| > phonemizer:
| > phoneme language: yue-cn
| > phoneme backend: yue_cn_phonemizer
| > Number of instances : 65120
After pressing CTRL+C, a checkpoint was saved automatically:
> Keyboard interrupt detected.
> Saving model before exiting...
> CHECKPOINT : /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-14-2024_11+52AM-850dac97/checkpoint_0.pth
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-14-2024_11+52AM-850dac97
2024-9-18
Started a machine and ran pip install ., but pip seemed to download every version of every package, so it never finished. I don't know why; will try again tomorrow.
2024-9-19
After switching to another machine and changing the pip index to the Tsinghua mirror, it works again. However, one epoch now takes 2 hours instead of 20 minutes.
After realizing the GPU wasn't being used, I changed run.sh and ran it again. It gets stuck at:
> DataLoader initialization
| > Tokenizer:
| > add_blank: False
| > use_eos_bos: False
| > use_phonemes: True
| > phonemizer:
| > phoneme language: yue-cn
| > phoneme backend: yue_cn_phonemizer
| > Number of instances : 65120
2024-9-20
The process stopped after 2 hours because phoneme_cache was incomplete. It's related to scale_stats.npy. I need to upload it from my local machine: the VM's disk is very slow, so large numbers of files should be processed locally.
The network timeout issue persists. Increasing the timeout solves it:
File "/root/miniconda3/lib/python3.11/site-packages/trainer/analytics.py", line 12, in ping_training_run
_ = requests.get(URL, timeout=5)
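Raising the timeout works. Since ping_training_run appears (judging by its name and its module, trainer/analytics.py) to be only an analytics ping, another option when patching it is to retry with backoff and then give up gracefully. A generic sketch; call_with_retries is a hypothetical helper. With requests, pass requests.exceptions.RequestException in retry_on, because requests.exceptions.ConnectionError is not a subclass of the built-in ConnectionError.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, retry_on=(TimeoutError, ConnectionError)):
    """Call fn(); on a listed exception, wait and retry with exponential backoff.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Usage in the patched file might look like call_with_retries(lambda: requests.get(URL, timeout=30), retry_on=(requests.exceptions.RequestException,)), wrapped in a try/except if a failed ping should never stop training.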
Got the following error:
> TRAINING (2024-09-20 10:41:04)
! Run is removed from /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-20-2024_09+46AM-850dac97
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1503, in train_epoch
for cur_step, batch in enumerate(self.train_loader):
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/root/miniconda3/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 212, in __getitem__
return self.load_data(idx)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 268, in load_data
token_ids = self.get_token_ids(idx, item["text"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 251, in get_token_ids
token_ids = self.get_phonemes(idx, text)["token_ids"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 228, in get_phonemes
out_dict = self.phoneme_dataset[idx]
~~~~~~~~~~~~~~~~~~~~^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 614, in __getitem__
ph_hat = self.tokenizer.ids_to_text(ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/utils/text/tokenizer.py", line 120, in ids_to_text
return self.decode(id_sequence)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/utils/text/tokenizer.py", line 84, in decode
text += self.characters.id_to_char(token_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/utils/text/characters.py", line 305, in id_to_char
return self._id_to_char[idx]
~~~~~~~~~~~~~~~~^^^^^
KeyError: 51
Caused by not rebuilding phoneme_cache after adding tone numbers 1-6 to the character set. (scale_stats.npy is identical after adding tones 1-6.)
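The traceback fits that explanation: cached token IDs are decoded through id_to_char, so IDs computed under an old character set no longer line up with the current table (hence KeyError: 51). One way to make this fail loudly is to store a fingerprint of the character set next to the cache and rebuild when it changes. A minimal sketch; the marker file name and both helpers are hypothetical and not part of coqui's phoneme cache.

```python
import hashlib
import json
import os

MARKER = "vocab_fingerprint.json"  # hypothetical marker file name


def _fingerprint(characters):
    # Order matters: token IDs depend on each character's position in the set.
    return hashlib.sha256(characters.encode("utf-8")).hexdigest()


def cache_is_fresh(cache_dir, characters):
    """True if cache_dir was built with exactly this character set."""
    path = os.path.join(cache_dir, MARKER)
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("fingerprint") == _fingerprint(characters)


def mark_cache(cache_dir, characters):
    """Record the character set; call only after the cache is fully rebuilt."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, MARKER), "w", encoding="utf-8") as f:
        json.dump({"fingerprint": _fingerprint(characters)}, f)
```

The intended flow: if cache_is_fresh returns False, delete the cache directory, let training rebuild it, then call mark_cache. Writing the marker only after a complete rebuild also guards against the half-written cache seen on 2024-9-20.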
Got the following error:
> TRAINING (2024-09-20 12:29:55)
/root/miniconda3/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
--> TIME: 2024-09-20 12:30:12 -- STEP: 0/1018 -- GLOBAL_STEP: 0
| > decoder_loss: 1.3073298931121826 (1.3073298931121826)
| > postnet_loss: 3.433417797088623 (3.433417797088623)
| > stopnet_loss: 2.9247257709503174 (2.9247257709503174)
| > decoder_coarse_loss: 1.3068547248840332 (1.3068547248840332)
| > decoder_ddc_loss: 0.018958045169711113 (0.018958045169711113)
| > ga_loss: 0.06158587336540222 (0.06158587336540222)
| > decoder_diff_spec_loss: 0.1912860870361328 (0.1912860870361328)
| > postnet_diff_spec_loss: 4.505105018615723 (4.505105018615723)
| > decoder_ssim_loss: 0.8266807794570923 (0.8266807794570923)
| > postnet_ssim_loss: 0.7951085567474365 (0.7951085567474365)
| > loss: 6.987125873565674 (6.987125873565674)
| > align_error: 0.8974450752139091 (0.8974450752139091)
| > amp_scaler: 32768.0 (32768.0)
| > grad_norm: 0 (0)
| > current_lr: 2.5000000000000002e-08
| > step_time: 5.318 (5.318024396896362)
| > loader_time: 11.2825 (11.282504320144653)
warning: audio amplitude out of range, auto clipped.
! Run is removed from /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-20-2024_11+30AM-850dac97
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1503, in train_epoch
for cur_step, batch in enumerate(self.train_loader):
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1325, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/root/miniconda3/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
raise exception
AssertionError: Caught AssertionError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 212, in __getitem__
return self.load_data(idx)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 268, in load_data
token_ids = self.get_token_ids(idx, item["text"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 251, in get_token_ids
token_ids = self.get_phonemes(idx, text)["token_ids"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/TTS/tts/datasets/dataset.py", line 230, in get_phonemes
assert len(out_dict["token_ids"]) > 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Next time, upload dataset.tar and extract it to /tmp/, which should work around the slow data disk.
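The workaround relies on moving one large archive instead of tens of thousands of small files. The shell equivalent is tar -cf dataset.tar mdcc-dataset locally and tar -xf dataset.tar -C /tmp/ on the VM; the same thing with Python's tarfile module is sketched below (pack_dir and unpack_to are hypothetical names). The archive is uncompressed on the assumption that audio barely compresses anyway.

```python
import os
import tarfile


def pack_dir(src_dir, archive_path):
    """Pack src_dir into one uncompressed tar, keeping its folder name."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))


def unpack_to(archive_path, dest_dir):
    """Extract the archive under dest_dir (e.g. /tmp on the VM)."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, links outside dest_dir, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

After unpacking, the recipe only needs a symlink pointing at /tmp/mdcc-dataset, as in the 2024-9-24 steps.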
2024-9-24
Started a new machine with PyTorch 2.2:
- Change pip repo: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
- clone code: git clone https://github.com/hgneng/TTS.git
- install deps: pip install -e .[all,dev]
- copy dataset: cp -r /gemini/data-2/mdcc-dataset /tmp/
- link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /tmp/mdcc-dataset
- run training: cd /gemini/code/TTS && bash recipes/mdcc/tacotron2-DDC/run.sh
For the error "assert len(out_dict["token_ids"]) > 0":
- don't install TTS as a regular package; use an editable install: pip install -e .[all,dev]
- add debug output as in https://github.com/coqui-ai/TTS/issues/1624
- remove an item from the csv: 447_1804052037_72795_743.7_744.36 (keep fit)
- TODO: clean all items that contain no Chinese characters from the csv file.
2024-9-25
Mapped all unknown characters/phonemes to "v" to fix the "assert len(out_dict["token_ids"]) > 0" error.
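The fallback can be sketched as a simple character-level replacement: anything outside the vocabulary becomes the placeholder, so a non-empty transcript can never turn into an empty token sequence. Where exactly the mapping was applied in the fork isn't stated here; map_unknown is a hypothetical helper, and the placeholder must itself be in the vocabulary or it would be discarded too.

```python
def map_unknown(text, vocab, placeholder="v"):
    """Replace every character not in vocab with placeholder."""
    allowed = set(vocab)
    if placeholder not in allowed:
        raise ValueError("placeholder must itself be in the vocabulary")
    return "".join(ch if ch in allowed else placeholder for ch in text)
```

The trade-off is that foreign-language text becomes a run of junk tokens the model still has to fit, which is why the filtering TODO below still matters.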
It starts training. CPU = 400%/400%, MEM=12G/12G, GPU=20%-40%/100%, GPU_MEM=1.5G/6G
TODO: although training works now, English text and audio probably harm it. I should remove them later when improving quality. The following command detects whether a file contains Chinese:
grep -P "[\x{4E00}-\x{9FFF}]" <file>
Maybe this is not enough. We should also detect whether a file contains characters other than Chinese and filter them out.
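A Python version of both checks, suitable for filtering the metadata CSV line by line: the lenient mode matches the grep range above (CJK Unified Ideographs, U+4E00 to U+9FFF), and the strict mode rejects any line containing characters other than Han characters, common Chinese punctuation and whitespace. keep_transcript is a hypothetical helper, and the punctuation list is an assumption to adjust for the real data. Rare Cantonese characters outside the basic block (CJK Extension A/B) would be rejected by strict mode, so the range may need widening.

```python
import re

# Same range as the grep: CJK Unified Ideographs, U+4E00..U+9FFF.
HAS_HAN = re.compile(r"[\u4e00-\u9fff]")
# Stricter: only Han characters, common Chinese punctuation and whitespace.
ONLY_HAN = re.compile(r"[\u4e00-\u9fff，。！？、；：「」『』（）\s]+")


def keep_transcript(text, strict=False):
    """Decide whether a transcript should stay in the training set."""
    if strict:
        return ONLY_HAN.fullmatch(text) is not None
    return HAS_HAN.search(text) is not None
```

For example, "你好 hello" passes the lenient check but fails the strict one, which is exactly the mixed-language case the note above worries about.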
--> EVAL PERFORMANCE
| > avg_loader_time: 0.5965745482836499 (+0)
| > avg_decoder_loss: 1.0214300359950192 (+0)
| > avg_postnet_loss: 1.232000221611758 (+0)
| > avg_stopnet_loss: 1.1567586131879342 (+0)
| > avg_decoder_coarse_loss: 1.021938735128462 (+0)
| > avg_decoder_ddc_loss: 0.0019014181905437951 (+0)
| > avg_ga_loss: 0.00905237895490669 (+0)
| > avg_decoder_diff_spec_loss: 0.15972242662656735 (+0)
| > avg_postnet_diff_spec_loss: 0.5937308297616564 (+0)
| > avg_decoder_ssim_loss: 0.9673015845058323 (+0)
| > avg_postnet_ssim_loss: 0.9616055363636179 (+0)
| > avg_loss: 3.203245741112037 (+0)
| > avg_align_error: 0.9778342093512542 (+0)
> BEST MODEL : /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-25-2024_11+50AM-9ab3ff82/best_model_1018.pth
> Number of output frames: 5
> EPOCH: 1/1000
--> /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-25-2024_11+50AM-9ab3ff82
> TRAINING (2024-09-25 12:31:38)
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-September-25-2024_11+50AM-9ab3ff82
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1327, in train_step
batch = self.format_batch(batch)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1058, in format_batch
batch = self.model.format_batch(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/base_tts.py", line 215, in format_batch
stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[64, 8, -1]' is invalid for input of size 2688
real 41m35.819s
user 48m57.862s
sys 37m51.650s
related issues:
2024-10-10
fixed a git network issue:
# git pull --verbose
POST git-upload-pack (155 bytes)
error: RPC failed; curl 16 Error in the HTTP2 framing layer
fatal: expected flush after ref listing
(base) root@gjob-dev-499388056936370176-taskrole1-0:/gemini/code/TTS# git config --global http.version HTTP/1.1
Changing transformers>=4.33.0 to transformers>=4.45.1 stops pip from downloading so many version-metadata files and significantly speeds up pip install. Removing unnecessary dependencies also helps.
2024-10-11
According to https://docs.coqui.ai/en/stable/training_a_model.html#multi-speaker-training, we need to configure SpeakerManager, which means labeling the speakers in the dataset.
So it may be better to first synthesize training data with a good TTS for testing, and then record our own data.
> EPOCH: 1/1000
--> /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-11-2024_10+55AM-3fa542c8
> TRAINING (2024-10-11 11:42:41)
[cameron debug]text_input: torch.Size([64, 10])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 8])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 10])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 9])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 48])
[cameron debug]stop_targets: torch.Size([64, 42])
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-11-2024_10+55AM-3fa542c8
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1327, in train_step
batch = self.format_batch(batch)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1058, in format_batch
batch = self.model.format_batch(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/base_tts.py", line 217, in format_batch
stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[64, 8, -1]' is invalid for input of size 2688
real 47m34.345s
user 53m4.819s
sys 44m36.364s
It seems the validation batch size changing from 64 to 16 may cause the issue:
> EPOCH: 1/1000
--> /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-11-2024_11+54AM-3fa542c8
> TRAINING (2024-10-11 12:30:18)
[cameron debug]stop_targets: torch.Size([16, 308])
[cameron debug]text_input: torch.Size([16, 108])
[cameron debug]stop_targets: torch.Size([16, 315])
[cameron debug]text_input: torch.Size([16, 92])
[cameron debug]stop_targets: torch.Size([16, 315])
[cameron debug]text_input: torch.Size([16, 97])
[cameron debug]stop_targets: torch.Size([16, 315])
[cameron debug]text_input: torch.Size([16, 91])
[cameron debug]stop_targets: torch.Size([16, 322])
[cameron debug]text_input: torch.Size([16, 105])
[cameron debug]stop_targets: torch.Size([16, 322])
[cameron debug]text_input: torch.Size([16, 95])
[cameron debug]stop_targets: torch.Size([16, 322])
[cameron debug]text_input: torch.Size([16, 98])
[cameron debug]stop_targets: torch.Size([16, 329])
[cameron debug]text_input: torch.Size([16, 104])
[cameron debug]stop_targets: torch.Size([16, 329])
[cameron debug]text_input: torch.Size([16, 102])
[cameron debug]stop_targets: torch.Size([16, 329])
[cameron debug]text_input: torch.Size([16, 100])
[cameron debug]stop_targets: torch.Size([16, 336])
[cameron debug]text_input: torch.Size([16, 108])
[cameron debug]stop_targets: torch.Size([16, 336])
[cameron debug]text_input: torch.Size([16, 91])
[cameron debug]stop_targets: torch.Size([16, 336])
[cameron debug]text_input: torch.Size([16, 102])
[cameron debug]stop_targets: torch.Size([16, 343])
[cameron debug]text_input: torch.Size([16, 122])
[cameron debug]stop_targets: torch.Size([16, 343])
[cameron debug]text_input: torch.Size([16, 102])
[cameron debug]stop_targets: torch.Size([16, 350])
[cameron debug]text_input: torch.Size([16, 105])
[cameron debug]stop_targets: torch.Size([16, 350])
[cameron debug]text_input: torch.Size([16, 106])
[cameron debug]stop_targets: torch.Size([16, 357])
[cameron debug]text_input: torch.Size([16, 106])
[cameron debug]stop_targets: torch.Size([16, 357])
[cameron debug]text_input: torch.Size([16, 105])
[cameron debug]stop_targets: torch.Size([16, 364])
[cameron debug]text_input: torch.Size([16, 110])
[cameron debug]stop_targets: torch.Size([16, 364])
[cameron debug]text_input: torch.Size([16, 110])
[cameron debug]stop_targets: torch.Size([16, 371])
[cameron debug]text_input: torch.Size([16, 115])
[cameron debug]stop_targets: torch.Size([16, 378])
[cameron debug]text_input: torch.Size([16, 109])
[cameron debug]stop_targets: torch.Size([16, 378])
[cameron debug]text_input: torch.Size([16, 115])
[cameron debug]stop_targets: torch.Size([16, 385])
[cameron debug]text_input: torch.Size([16, 114])
[cameron debug]stop_targets: torch.Size([16, 392])
[cameron debug]text_input: torch.Size([16, 129])
[cameron debug]stop_targets: torch.Size([16, 399])
[cameron debug]text_input: torch.Size([16, 113])
[cameron debug]stop_targets: torch.Size([16, 399])
[cameron debug]text_input: torch.Size([16, 110])
[cameron debug]stop_targets: torch.Size([16, 406])
[cameron debug]text_input: torch.Size([16, 134])
[cameron debug]stop_targets: torch.Size([16, 413])
[cameron debug]text_input: torch.Size([16, 112])
[cameron debug]stop_targets: torch.Size([16, 413])
[cameron debug]text_input: torch.Size([16, 135])
[cameron debug]stop_targets: torch.Size([16, 427])
[cameron debug]text_input: torch.Size([16, 130])
[cameron debug]stop_targets: torch.Size([16, 427])
[cameron debug]text_input: torch.Size([16, 127])
[cameron debug]stop_targets: torch.Size([16, 434])
[cameron debug]text_input: torch.Size([16, 127])
[cameron debug]stop_targets: torch.Size([16, 441])
[cameron debug]text_input: torch.Size([16, 137])
[cameron debug]stop_targets: torch.Size([16, 448])
[cameron debug]text_input: torch.Size([16, 132])
[cameron debug]stop_targets: torch.Size([16, 462])
[cameron debug]text_input: torch.Size([16, 144])
[cameron debug]stop_targets: torch.Size([16, 462])
[cameron debug]text_input: torch.Size([16, 144])
[cameron debug]stop_targets: torch.Size([16, 469])
[cameron debug]text_input: torch.Size([16, 126])
[cameron debug]stop_targets: torch.Size([16, 476])
[cameron debug]text_input: torch.Size([16, 129])
[cameron debug]stop_targets: torch.Size([16, 476])
[cameron debug]text_input: torch.Size([16, 141])
[cameron debug]stop_targets: torch.Size([16, 483])
[cameron debug]text_input: torch.Size([16, 134])
[cameron debug]stop_targets: torch.Size([16, 497])
[cameron debug]text_input: torch.Size([16, 145])
[cameron debug]stop_targets: torch.Size([16, 511])
[cameron debug]text_input: torch.Size([16, 148])
[cameron debug]stop_targets: torch.Size([16, 525])
[cameron debug]text_input: torch.Size([16, 146])
[cameron debug]stop_targets: torch.Size([16, 546])
[cameron debug]text_input: torch.Size([16, 163])
[cameron debug]stop_targets: torch.Size([16, 560])
[cameron debug]text_input: torch.Size([16, 167])
[cameron debug]stop_targets: torch.Size([16, 574])
[cameron debug]text_input: torch.Size([16, 187])
[cameron debug]stop_targets: torch.Size([16, 609])
[cameron debug]text_input: torch.Size([16, 177])
[cameron debug]stop_targets: torch.Size([16, 637])
[cameron debug]text_input: torch.Size([16, 214])
[cameron debug]stop_targets: torch.Size([16, 672])
[cameron debug]text_input: torch.Size([16, 219])
[cameron debug]stop_targets: torch.Size([16, 728])
[cameron debug]text_input: torch.Size([16, 217])
[cameron debug]stop_targets: torch.Size([16, 777])
[cameron debug]text_input: torch.Size([16, 242])
[cameron debug]stop_targets: torch.Size([16, 861])
[cameron debug]text_input: torch.Size([11, 227])
[cameron debug]stop_targets: torch.Size([11, 931])
> Number of output frames: 5
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-11-2024_11+54AM-3fa542c8
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1327, in train_step
batch = self.format_batch(batch)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1058, in format_batch
batch = self.model.format_batch(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/base_tts.py", line 217, in format_batch
stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[64, 8, -1]' is invalid for input of size 2688
[cameron debug]text_input: torch.Size([64, 10])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 8])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 10])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 9])
[cameron debug]stop_targets: torch.Size([64, 35])
[cameron debug]text_input: torch.Size([64, 48])
[cameron debug]stop_targets: torch.Size([64, 42])
real 35m53.726s
user 54m0.334s
sys 43m25.322s
2024-10-12
To speed up debugging, I reduced the size of the training set.
The invalid-size error is caused by gradual_training in the config. The discussion in https://github.com/coqui-ai/TTS/issues/370#issuecomment-796887391 suggests that I should not use gradual_training here.
After removing gradual_training, training proceeds.
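The numbers in the log show why the shapes break.

- format_batch in base_tts.py calls stop_targets.view(batch, frames // r, -1). Here r is config.r, the number of frames the decoder emits per step.
- That view only works when the frame count splits evenly into frames // r groups.
- In the failing call, 2688 elements over a batch of 64 means 42 frames per row.
- The log prints "Number of output frames: 5", and 42 // 5 = 8 groups cannot hold 42 frames evenly.

My working assumption, not verified in the Coqui source, is this: gradual_training changed r and the batch size (16 to 64 in the log) during the run, while the data loader kept padding frames to a multiple of the old r. The pure-Python sketch below only reproduces the arithmetic. pad_to_multiple and view_is_valid are hypothetical helpers, not Coqui functions.

```python
def pad_to_multiple(frames: int, r: int) -> int:
    """Round a mel frame count up to the next multiple of r (frames per decoder step)."""
    return frames + (-frames % r)


def view_is_valid(batch: int, frames: int, r: int) -> bool:
    """Mirror the size check behind stop_targets.view(batch, frames // r, -1).

    The tensor holds batch * frames elements; the view asks for batch * (frames // r)
    rows, so the inferred last dimension only exists if that row count divides the total.
    """
    rows = batch * (frames // r)
    return rows > 0 and (batch * frames) % rows == 0


if __name__ == "__main__":
    batch, frames, r = 64, 42, 5             # values inferred from the failing batch
    print(batch * frames)                    # 2688, as in the RuntimeError
    print(view_is_valid(batch, frames, r))   # False: cannot form [64, 8, -1]
    padded = pad_to_multiple(frames, r)
    print(padded, view_is_valid(batch, padded, r))  # 45 True
```

Padding to a multiple of the *current* r always makes the view valid. Removing gradual_training keeps r fixed, so the padding and the view can no longer disagree.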
Training stopped after 10 epochs. It appears to be caused by image generation: the crash happens while plotting alignments for the test sentences.
| > Synthesizing test sentences.
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-12-2024_12+21PM-a76a8482
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1789, in _fit
self.test_run()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1698, in test_run
test_outputs = self.model.test_run(self.training_assets)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/base_tacotron.py", line 172, in test_run
test_figures["{}-alignment".format(idx)] = plot_alignment(
^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/utils/visual.py", line 18, in plot_alignment
im = ax.imshow(
^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/__init__.py", line 1473, in inner
return func(
^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 5895, in imshow
im.set_data(X)
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/image.py", line 729, in set_data
self._A = self._normalize_image_array(A)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/image.py", line 697, in _normalize_image_array
raise TypeError(f"Invalid shape {A.shape} for image data")
TypeError: Invalid shape (2,) for image data
real 43m54.625s
user 83m9.846s
sys 71m53.565s
2024-10-14
It seems that malformed data is passed to plot_alignment:
plot_alignment: tensor([[[0.3459, 0.3393, 0.3148],
[0.3423, 0.3409, 0.3168]]], device='cuda:0')
> EVALUATION
[cameron debug] plot_alignment: [[0.00525638 0.00515768 0.00483796 ... 0.00428045 0.00399799 0.00383315]
[0.00524201 0.00514426 0.00482417 ... 0.00427541 0.00399991 0.00382796]
[0.00521557 0.00512407 0.00481664 ... 0.00427768 0.00400931 0.00384087]
...
[0.0048878 0.00469761 0.00445868 ... 0.00380792 0.00346853 0.00341414]
[0.00488604 0.00469578 0.0044569 ... 0.00380466 0.00346461 0.00341131]
[0.00488425 0.00469378 0.00445499 ... 0.00380148 0.00346084 0.00340842]]
[cameron debug] plot_alignment: [[0.00447653 0.00450786 0.00442694 ... 0.00460552 0.00456613 0.00436251]
[0.00448597 0.00451314 0.00443333 ... 0.00460422 0.00456982 0.00438775]
[0.00448847 0.00452268 0.00444501 ... 0.00461531 0.00457178 0.0044022 ]
...
[0.00443988 0.00470385 0.00454191 ... 0.00565624 0.00548315 0.00526332]
[0.00443774 0.00470567 0.00454191 ... 0.00567711 0.00550073 0.00527762]
[0.00443601 0.00470734 0.00454206 ... 0.00569411 0.00551517 0.00528983]]
| > Synthesizing test sentences.
[cameron debug] plot_alignment: tensor([[[0.3459, 0.3393, 0.3148],
[0.3423, 0.3409, 0.3168]]], device='cuda:0')
[cameron debug] alignment.detach().cpu().numpy().squeeze(): [[0.3459418 0.33926764 0.31479052]
[0.34226763 0.34093702 0.31679538]]
[cameron debug] plot_alignment: tensor([[[1.],
[1.],
[1.],
[1.],
[1.]]], device='cuda:0')
[cameron debug] alignment.detach().cpu().numpy().squeeze(): [1. 1. 1. 1. 1.]
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-14-2024_11+00AM-591c0f5f
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1789, in _fit
self.test_run()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1698, in test_run
test_outputs = self.model.test_run(self.training_assets)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/base_tacotron.py", line 172, in test_run
test_figures["{}-alignment".format(idx)] = plot_alignment(
^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/utils/visual.py", line 20, in plot_alignment
im = ax.imshow(
^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/__init__.py", line 1473, in inner
return func(
^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 5895, in imshow
im.set_data(X)
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/image.py", line 729, in set_data
self._A = self._normalize_image_array(A)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/matplotlib/image.py", line 697, in _normalize_image_array
raise TypeError(f"Invalid shape {A.shape} for image data")
TypeError: Invalid shape (5,) for image data
real 25m36.855s
user 72m23.987s
sys 64m50.134s
Here is the fix: https://github.com/coqui-ai/TTS/commit/a42fc37d659263a1fc0adeb63e8c476563e6a456#diff-9c5e107d293345eea14a3314a4c17f89e5123723b93d74b84a75add31bd65f87L14
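The debug output explains the TypeError. .squeeze() removes every length-1 axis, so the [1, 5, 1] alignment (batch 1, 5 decoder steps, apparently a single input symbol) becomes a 1-D array of shape (5,). imshow rejects it because it expects a 2-D image.

The following numpy sketch shows one way to keep the alignment 2-D by dropping only leading batch axes. to_2d_alignment is a hypothetical helper. This is an assumption, not necessarily the change made in the linked commit.

```python
import numpy as np


def to_2d_alignment(alignment) -> np.ndarray:
    """Return a [decoder_steps, encoder_steps] array suitable for plt.imshow.

    Unlike a bare .squeeze(), only leading batch axes of size 1 are dropped,
    so a [1, 5, 1] alignment becomes (5, 1) instead of the 1-D (5,).
    Torch tensors should be converted first with .detach().cpu().numpy().
    """
    arr = np.asarray(alignment, dtype=np.float32)
    while arr.ndim > 2 and arr.shape[0] == 1:
        arr = arr[0]  # drop one leading singleton (batch) axis
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D alignment after removing batch axes, got {arr.shape}")
    return arr


if __name__ == "__main__":
    degenerate = np.ones((1, 5, 1))             # same shape as the failing test sentence
    print(degenerate.squeeze().shape)           # (5,): what imshow rejected
    print(to_2d_alignment(degenerate).shape)    # (5, 1)
```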
2024-10-17
Training on the full dataset takes about 10 days, so I'm going to learn how to monitor progress in TensorBoard. The file events.out.tfevents.1728968264.gjob-dev-501230719612473344-taskrole1-0.18959.0 in the run folder appears to be the TensorBoard data file. On virtaicloud, TensorBoard has to be set up when the project is created, so I have to wait for the next training run. I only need to link the output directory to /gemini/output, and that should be enough for it to work.
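Until TensorBoard is set up on the platform, the event file can also be inspected from Python. This is a minimal sketch that assumes the tensorboard package is installed; its EventAccumulator class parses event files. latest_event_file and print_scalars are hypothetical helpers.

The number after "tfevents." in the file name is the Unix time at which the file was created. For example, 1728968264 is 2024-10-15, matching the October-15 run folder. The sketch uses it to pick the newest file. Scalar tag names are listed rather than assumed, since I have not checked what Coqui calls them.

```python
import re
from pathlib import Path
from typing import Optional

# events.out.tfevents.<unix time>.<host>.<pid>.<n>
EVENT_RE = re.compile(r"^events\.out\.tfevents\.(\d+)\.")


def latest_event_file(log_dir: str) -> Optional[Path]:
    """Return the newest TensorBoard event file under log_dir (searched recursively)."""
    candidates = []
    for path in Path(log_dir).rglob("events.out.tfevents.*"):
        match = EVENT_RE.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    return max(candidates)[1] if candidates else None


def print_scalars(event_file: Path, last_n: int = 3) -> None:
    """Print the last few points of every scalar series in one event file."""
    # Imported here so latest_event_file() works without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator(str(event_file))
    acc.Reload()  # parse the file from disk
    for tag in acc.Tags()["scalars"]:
        for event in acc.Scalars(tag)[-last_n:]:
            print(f"{tag} step={event.step} value={event.value:.4f}")
```

Usage: `print_scalars(latest_event_file("/gemini/code/TTS/recipes/mdcc/tacotron2-DDC"))`. This works once a run has written at least one event file.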
2024-10-21
--> TIME: 2024-10-18 06:09:40 -- STEP: 153/157 -- GLOBAL_STEP: 62325
| > decoder_loss: 0.4451596736907959 (0.41713695452104205)
| > postnet_loss: 0.5398772954940796 (0.39982837750241645)
| > stopnet_loss: 0.19535207748413086 (0.29992604051150534)
| > decoder_coarse_loss: 0.4318344295024872 (0.4041759136065939)
| > decoder_ddc_loss: 0.0022528718691319227 (0.006061769612559596)
| > ga_loss: 0.00025947566609829664 (0.0007990106915160285)
| > decoder_diff_spec_loss: 0.1679614633321762 (0.16270109212476444)
| > postnet_diff_spec_loss: 0.29700812697410583 (0.1883718377234889)
| > decoder_ssim_loss: 0.5047341585159302 (0.6026039742955973)
| > postnet_ssim_loss: 0.5795313715934753 (0.6017567484207406)
| > loss: 1.1585509777069092 (1.2064239261976257)
| > align_error: 0.7649699449539185 (0.7485802002202452)
| > amp_scaler: 1024.0 (1024.0)
| > grad_norm: tensor(0.8445, device='cuda:0') (tensor(1.0120, device='cuda:0'))
| > current_lr: 9.9e-06
| > step_time: 1.729 (0.7916449144774789)
| > loader_time: 0.2858 (0.31298755508622306)
! Run is kept in /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/mdcc-ddc-October-15-2024_12+57PM-a42fc37d
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1785, in _fit
self.train_epoch()
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1360, in train_step
outputs, loss_dict_new, step_time = self.optimize(
^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1226, in optimize
outputs, loss_dict = self._compute_loss(
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1157, in _compute_loss
outputs, loss_dict = self._model_train_step(batch, model, criterion)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/trainer/trainer.py", line 1116, in _model_train_step
return model.train_step(*input_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/models/tacotron2.py", line 338, in train_step
loss_dict = criterion(
^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 494, in forward
decoder_ssim_loss = self.criterion_ssim(decoder_output, mel_input, output_lens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/layers/losses.py", line 132, in forward
ssim_loss = self.loss_func((y_norm * mask).unsqueeze(1), (y_hat_norm * mask).unsqueeze(1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/gemini/code/TTS/TTS/tts/utils/ssim.py", line 251, in forward
score = ssim(
^^^^^
File "/gemini/code/TTS/TTS/tts/utils/ssim.py", line 129, in ssim
_validate_input([x, y], dim_range=(4, 5), data_range=(0, data_range))
File "/gemini/code/TTS/TTS/tts/utils/ssim.py", line 65, in _validate_input
assert data_range[0] <= t.min(), f"Expected values to be greater or equal to {data_range[0]}, got {t.min()}"
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Expected values to be greater or equal to 0, got nan
real 3913m2.327s
user 4830m37.921s
sys 3289m43.980s
Here is a related issue: https://github.com/coqui-ai/TTS/issues/2398
Here is a possible fix, suggested by AI: https://github.com/hgneng/TTS/commit/0c6a3f5e8eae6c8dd99e13fb5e6036d0cc2d104e
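The assertion fires because the SSIM loss expects its normalized inputs to lie in [0, 1], and a NaN fails that check. The NaN may come from the model output itself (mixed precision is enabled, as the amp_scaler line shows). It may also come from dividing by zero when a sample is constant.

The numpy sketch below illustrates one kind of guard. safe_min_max is a hypothetical helper, and this is not necessarily what the linked commit does. It does three things:

- replaces non-finite values;
- adds an epsilon to the denominator;
- clips into the range SSIM expects.

```python
import numpy as np


def safe_min_max(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each sample of a [batch, frames, mel_bins] array into [0, 1].

    - NaN/inf are replaced by 0 before the min/max are taken;
    - eps keeps a constant sample (max == min) from producing 0/0 = NaN;
    - clip guards against rounding slightly outside [0, 1].
    """
    x = np.nan_to_num(np.asarray(x, dtype=np.float32), nan=0.0, posinf=0.0, neginf=0.0)
    axes = tuple(range(1, x.ndim))  # every axis except batch
    lo = x.min(axis=axes, keepdims=True)
    hi = x.max(axis=axes, keepdims=True)
    return np.clip((x - lo) / (hi - lo + eps), 0.0, 1.0)


if __name__ == "__main__":
    batch = np.array([[[1.0, np.nan, 3.0]], [[2.0, 2.0, 2.0]]])  # one NaN sample, one constant sample
    out = safe_min_max(batch)
    print(np.isfinite(out).all(), out.min() >= 0.0, out.max() <= 1.0)
```

Zeroing NaN keeps training alive, but it can hide a model that is diverging. A more conservative alternative is to skip the optimizer step whenever the loss is not finite.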
2024-10-28
The virtual machine crashed at EPOCH 411/1000. The speech produced by the model is still not intelligible at all. I need to restore training.
New initialization steps:
- clone code: git clone https://github.com/hgneng/TTS.git
- install deps: cd TTS && pip install -e .[all,dev]
- extract dataset: cd /gemini/data-1/ && time tar xvf mdcc-dataset.tar -C /tmp/
- cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/train.csv /tmp/mdcc-dataset/
- cp /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/valid.csv /tmp/mdcc-dataset/
- link dataset: cd /gemini/code/TTS/recipes/mdcc/tacotron2-DDC/ && ln -sf /tmp/mdcc-dataset
- run training: cd /gemini/code/TTS && time bash recipes/mdcc/tacotron2-DDC/run.sh | tee out
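The "! Run is kept in" messages show that Coqui keeps the run folder after a failure. If the run folder under /gemini/code survived the crash, resuming from the newest checkpoint may save most of the 411 epochs. Below is a minimal sketch for finding that checkpoint. It assumes each run folder contains files named checkpoint_<step>.pth; latest_checkpoint is a hypothetical helper.

The folder it returns could then be passed to the trainer, for example as TrainerArgs(continue_path=...) in the recipe's train script. This assumes the installed trainer version supports continue_path.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

CKPT_RE = re.compile(r"^checkpoint_(\d+)\.pth$")


def latest_checkpoint(recipe_dir: str) -> Optional[Tuple[Path, int]]:
    """Find the checkpoint with the highest step across all run folders in recipe_dir.

    Returns (checkpoint_path, step), or None if no checkpoint_<step>.pth exists.
    The run folder to resume from is checkpoint_path.parent.
    """
    best = None
    for path in Path(recipe_dir).glob("*/checkpoint_*.pth"):
        match = CKPT_RE.match(path.name)
        if match:
            step = int(match.group(1))
            if best is None or step > best[1]:
                best = (path, step)
    return best


if __name__ == "__main__":
    found = latest_checkpoint("/gemini/code/TTS/recipes/mdcc/tacotron2-DDC")
    if found is None:
        print("no checkpoint_<step>.pth found")
    else:
        ckpt, step = found
        print(f"resume from {ckpt.parent} (step {step})")
```

Picking by step number rather than file modification time avoids being misled by files copied or touched after the crash.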
Continued at https://cto.eguidedog.net/node/1391
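Comments (13)
About this TTS
I have installed this TTS on Debian. How do I use it? Thanks.
I learned about this TTS from 小草莓; I just bookmarked it…
About this TTS
Would it be possible to build a version of this TTS that Orca can call?
First I need to figure out how to make the frog TTS (Coqui) support Chinese; Orca support can come later…
About this TTS
Doesn't it support Chinese yet? Is it possible to make it support Chinese?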
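Not yet. I'm not sure how hard it would be to add Chinese support, and I haven't had time to look into it. I expect it would need a Chinese speech corpus suitable for deep learning, followed by training on it to create a Chinese model. With my current skills, that is still too difficult for me.
I know of this TTS but haven't used it. Is it perhaps just a framework?
Besides this one there is also MaryTTS. As I understand it, are these frameworks for training, so that trained voices have to be obtained separately?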
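My understanding is that for English there are already trained models: with only a small amount of recorded speech, the model can extract the speaker's features and synthesize a voice very close to theirs. For Chinese, such a model probably doesn't exist yet; building one would require deep-learning training.
Length: 12 s
Treat the full stop as an explicit terminator: if you append a full stop to short texts, the unexpected warbling no longer occurs. For example: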
tts.tts_to_file("你好。")
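The tip can be applied automatically before calling tts_to_file. This is a minimal sketch: ensure_terminator is a hypothetical helper. The set of punctuation treated as already ending a sentence is an assumption, not something Coqui defines.

```python
# Assumption: punctuation treated as already ending a sentence (full-width and ASCII).
TERMINATORS = "。！？.!?…"


def ensure_terminator(text: str, terminator: str = "。") -> str:
    """Strip trailing whitespace and append `terminator` unless the text already ends a sentence."""
    stripped = text.rstrip()
    if stripped and stripped[-1] not in TERMINATORS:
        return stripped + terminator
    return stripped


# Usage with Coqui, following the tip above:
# tts.tts_to_file(ensure_terminator("你好"))  # passes "你好。" to the model
```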
Thank you so much for the expert advice! This really is magic!
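Could you teach me? These past two days I've been trying to make a simple call, without success. The various projects on GitHub read like gibberish to me and I can't get started. My home computer isn't powerful enough either, so now I want to try Google Colaboratory.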
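Coqui is based on PyTorch, and Colab seems to run TensorFlow, so it may not work well. My computer has no GPU and can still run Coqui; it just takes about half a minute to produce a result. It is easiest to install on Ubuntu: there are only a few commands, but the downloads take a long time.
In theory it should work; I haven't tried it.
You'll need your own VPN/proxy (梯子) to reach these sites.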
https://gist.github.com/erogol/97516ad65b44dbddb8cd694953187c5b
https://github.com/coqui-ai/TTS/discussions/1074
Colab supports PyTorch; Stable Diffusion (SD) image generation also uses PyTorch, and there are many Colab templates.