
cnnmmd

Note


The resources covered here include adult content.

Rights and Usage


This plugin retrieves the following content:

https://huggingface.co/kaunista/kaunista-style-bert-vits2-models/tree/main/Anneli-nsfw

This plugin uses the following resources:

Speech data (speech synthesis): Trained model: Anneli-nsfw / kaunista (Style-Bert-VITS2) [※F]
Character (image generation): Trained model: Animagine XL / cagliostrolab / linaqruf (Stable Diffusion XL) [*1]

*1
The characters in the generated images are adults.
※F
The natural intonation of this model shows what SBV2 (Style-Bert-VITS2) is capable of, but the source of its training data is unknown. If it is found to resemble any particular performer, we will replace it with another model.

Installation to Startup


Add the following entries to each of the custom configuration files: [*1] [*2]

${dirtop}/cnnmmd/manage/cnf/cnfsrc_custom.txt
cnnmmd_xoxxox_tlkweb_201 + master https://pubgit.xoxxox.net/cnnmmd/cnnmmd_xoxxox_tlkweb_201
cnnmmd_xoxxox_ttsvit_201 + master https://pubgit.xoxxox.net/cnnmmd/cnnmmd_xoxxox_ttsvit_201
cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201 + master https://pubgit.xoxxox.net/cnnmmd/cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201
${dirtop}/cnnmmd/manage/cnf/depend_custom.txt
cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201
- cnnmmd_xoxxox_tlkweb_201
- cnnmmd_xoxxox_ttsvit_201
- cnnmmd_xoxxox_tlkweb
- cnnmmd_xoxxox_ttsvit
- cnnmmd_xoxxox_appweb
- cnnmmd_xoxxox_appcmf
- cnnmmd_xoxxox_appmid
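If you prefer to script this step, here is a minimal sketch in shell. It is not part of cnnmmd: `append_block_once` and `add_plugin_entries` are hypothetical helpers, and the sketch assumes both files are plain text with one entry per line, exactly as listed above. A block is skipped when its first line is already in the file, so running it twice does not duplicate entries.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cnnmmd) for registering this plugin.
# ASSUMPTION: both custom files are plain text, one entry per line.

# Append the remaining arguments (one per line) to file $1, unless the
# block's first line is already present in the file as a whole line.
append_block_once() {
  local file="$1"; shift
  touch "$file"
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the new block does not get glued onto the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  if ! grep -qxF -- "$1" "$file"; then
    printf '%s\n' "$@" >> "$file"
  fi
}

# $1: the cnf directory, i.e. ${dirtop}/cnnmmd/manage/cnf
add_plugin_entries() {
  local cnf="$1"
  local url="https://pubgit.xoxxox.net/cnnmmd"
  local name
  # One source line per repository in cnfsrc_custom.txt.
  for name in cnnmmd_xoxxox_tlkweb_201 cnnmmd_xoxxox_ttsvit_201 \
              cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201; do
    append_block_once "$cnf/cnfsrc_custom.txt" "$name + master $url/$name"
  done
  # The whole dependency block goes into depend_custom.txt at once.
  append_block_once "$cnf/depend_custom.txt" \
    cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201 \
    "- cnnmmd_xoxxox_tlkweb_201" \
    "- cnnmmd_xoxxox_ttsvit_201" \
    "- cnnmmd_xoxxox_tlkweb" \
    "- cnnmmd_xoxxox_ttsvit" \
    "- cnnmmd_xoxxox_appweb" \
    "- cnnmmd_xoxxox_appcmf" \
    "- cnnmmd_xoxxox_appmid"
}
```

Usage, with `dirtop` set to your own top directory: `add_plugin_entries "${dirtop}/cnnmmd/manage/cnf"`. The dependency block is checked by its header line rather than line by line, because entries such as `- cnnmmd_xoxxox_appweb` may already appear under other plugins.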

Fetch and launch it as follows:

# Get:
$ yes | ./manage.sh create cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201 -d
# Launch:
$ yes | ./manage.sh launch cnnmmd_xoxxox_mgrcmf_cmf_ply_vit_sim_201 -d

After launching, load the following workflow into ComfyUI and run it: [*3]

${dirtop}/cnnmmd/export/app/xoxxox/appcmf/doc/flwcmf_cmf_ply_vit_sim_201.json
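Before loading the workflow, you can confirm that the file exists in the `doc` directory named above. This is a minimal sketch: `find_workflow` is a hypothetical helper, not part of cnnmmd, and its JSON check runs only if `python3` happens to be installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 is the value of dirtop.
# Prints the workflow path, or fails if the file is missing or not valid JSON.
find_workflow() {
  local path="$1/cnnmmd/export/app/xoxxox/appcmf/doc/flwcmf_cmf_ply_vit_sim_201.json"
  if [ ! -f "$path" ]; then
    echo "workflow not found: $path" >&2
    return 1
  fi
  # Optional sanity check; ASSUMPTION: python3 may or may not be present.
  if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$path" >/dev/null || return 1
  fi
  printf '%s\n' "$path"
}
```

For example, `find_workflow "${dirtop}"` prints the path to open (or drag) into ComfyUI.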

*1
This plugin comes from a private Git repository.
*2
For security, it is safer to download the plugin first, check its contents, and allow its scripts to run only if there are no problems:
Reference: howist
*3
This is a simple flow that makes a character speak the input text. It also switches between two images (patterns 0 and 1) to express emotion. The emotion in the voice depends on factors such as the number of exclamation marks (!) in the text.
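The footnote says the voice's emotion depends on factors such as the number of exclamation marks in the text. The exact rule the flow applies is not documented here, so this sketch only counts the ASCII `!` characters in a line of input, to show how candidate texts differ on that one factor. `count_bangs` is a hypothetical helper, not part of the plugin, and full-width `！` is not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many ASCII '!' characters are in $1.
# It deletes every '!' and compares the lengths before and after.
count_bangs() {
  local s="$1" stripped
  stripped=${s//\!/}   # backslash keeps '!' literal (and avoids history expansion)
  echo $(( ${#s} - ${#stripped} ))
}

# Example inputs, from calm to excited.
for line in 'Hello.' 'Hello!' 'Hello!!!'; do
  printf '%s -> %s\n' "$line" "$(count_bangs "$line")"
done
```

This prints `Hello. -> 0`, `Hello! -> 1`, and `Hello!!! -> 3`.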