
cnnmmd

overview


This is a tool that connects various characters with various AI engines.




The main features of this tool are:

Various characters and AI engines can be flexibly combined [※1]
Any number of triggers and actions can be set [※2]
Parts can be selected and flows created (configuration and flow) via both GUI and CLI [※3]
Everything can be customized, and plugins can be created and shared [※4]

※1
You can connect various characters (from self-made to ready-made; virtual ones ranging from 2D images to 3D models on web browsers, desktops, or VR/MR; real ones such as dolls and figures) with various AI engines (speech recognition, speech synthesis, language generation, sentiment analysis, object recognition; local, remote, or cloud; CPU, GPU, or API).
※2
The basic flow is a one-to-one conversation (chat) between the user and a character, but many-to-many conversations between characters are also possible (memory function). In addition, since there is no limit to the number of endpoints used for interaction, characters can be made to behave in a variety of ways by wiring triggers (changes in the environment) to actions (character movements): changing facial expressions based on sentiment-analysis results, having the character react appropriately when touched, and so on.
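
As a rough illustration of the trigger-to-action idea, the Python sketch below maps a sentiment-analysis result to an expression-change command over HTTP. The connector URL, endpoint paths, and field names are assumptions for illustration, not this tool's actual API:

    # Hypothetical example: wire a sentiment-analysis trigger to an
    # expression-change action. Endpoints and field names are assumed.
    import requests

    CONNECTOR = "http://localhost:8000"  # hypothetical intermediate connector

    def react_to_speech(text: str) -> None:
        # Trigger: ask a sentiment-analysis engine to classify the text.
        result = requests.post(f"{CONNECTOR}/sentiment", json={"text": text})
        emotion = result.json()["emotion"]
        # Action: tell the character client to change its facial expression.
        expression = {"joy": "smile", "sadness": "frown"}.get(emotion, "neutral")
        requests.post(f"{CONNECTOR}/action", json={"expression": expression})

    react_to_speech("It's great to see you again!")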
※3
Conversation flows can be created visually, with no need to write code (ComfyUI version). Complex flows involving branching and repetition can also be handled (script version). In either case, flows can be composed and recombined like building blocks: for example, a generated image can be sent to a microcontroller screen, or a flow built for a web browser can be repurposed for a game engine by swapping a single conversion node.
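
As a rough sketch of this "building blocks" composition (the node functions below are hypothetical placeholders, not this tool's real nodes), retargeting a flow can amount to swapping one output node:

    # Hypothetical example: a flow is a chain of nodes; only the output
    # node changes when the same flow is pointed at a different client.
    def recognize(audio: bytes) -> str:
        # Placeholder for a speech-recognition node.
        return "hello"

    def generate(prompt: str) -> str:
        # Placeholder for a language-generation node.
        return f"reply to: {prompt}"

    def to_browser(reply: str) -> None:
        # Output node for a web-browser client.
        print("browser:", reply)

    def to_microcontroller(reply: str) -> None:
        # Output node for a microcontroller LCD screen.
        print("lcd:", reply)

    def run_flow(audio: bytes, output) -> None:
        # The flow itself never changes; only the output node is swapped.
        output(generate(recognize(audio)))

    run_flow(b"...", to_browser)          # web-browser target
    run_flow(b"...", to_microcontroller)  # microcontroller target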
※4
You can customize every part of the tool and add plugins (even the core parts of the tool are plugins): you can incorporate your own models and servers and run only the containers you need. The interfaces between clients (characters) and servers, between servers and engines, and between the intermediate connector and server groups, as well as the procedures for installing, removing, starting, and stopping plugins, are all unified.
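
For illustration, a unified plugin lifecycle could look like the Python sketch below. The Plugin protocol and the example class are assumptions, not this tool's actual plugin API:

    # Hypothetical example: every plugin, engine or client, is handled
    # through the same install/remove/start/stop interface.
    from typing import Protocol

    class Plugin(Protocol):
        def install(self) -> None: ...
        def remove(self) -> None: ...
        def start(self) -> None: ...
        def stop(self) -> None: ...

    class VoicevoxEngine:
        """Hypothetical speech-synthesis plugin running in its own container."""
        def install(self) -> None:
            print("pull container image")
        def remove(self) -> None:
            print("delete container image")
        def start(self) -> None:
            print("start container")
        def stop(self) -> None:
            print("stop container")

    def deploy(plugin: Plugin) -> None:
        # The connector uses the same calls for every plugin.
        plugin.install()
        plugin.start()

    deploy(VoicevoxEngine())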
*
Generative AI now makes it easy to set up networks and servers and to write code in various languages. Since this tool allows full customization and the creation and publication of plugins, you should be able to build prototypes suited to your situation and needs, from client-side (character) apps to server-side extensions.


environment


Connections with the following characters and AI engines have currently been verified:

Operating environments for the intermediate connector
PC + OS (Windows (WSL2)) + Containers (Docker / Docker Desktop)
PC (Mac) + OS (macOS) + Containers (Docker Desktop)
PC + OS (Linux (Ubuntu)) + Container (Docker)
Microcomputer (Raspberry Pi) + OS (Linux (Ubuntu)) + Container (Docker)
*
Engines (AI-related models)
Speech recognition model:
・Local (wav2vec2-large-japanese [gpu] / OpenAI: Whisper [cpu / gpu])
・Service (OpenAI: Whisper [api])
Speech synthesis model:
・Local (VOICEVOX [cpu / gpu] / Style-Bert-VITS2 [cpu / gpu])
・Service (NIJI Voice [api])
Language generation model:
・Local (Vecteus-v1 [gpu])
・Services (OpenAI: GPT [api] / NovelAI [api])
Sentiment analysis model:
・Local (luke-japanese-large-sentiment-analysis-wrime [gpu])
・Service (OpenAI: GPT [api])
Image generation model:
・Local (* [gpu])
・Services (OpenAI: DALL-E [api] / NovelAI [api])
Object recognition model:
・Local (OpenCV: Haar Cascades [cpu])
*
Client (character side)
On workflow (ComfyUI) + Mobile device + OS (iOS / Android) / PC + OS (Windows / macOS / Linux)
Web browser + Mobile device + OS (iOS / Android) / PC + OS (Windows / macOS / Linux)
Desktop app (Electron) + PC + OS (Windows / macOS / Linux)
Game engine (Unity) + PC + OS (Windows / macOS)
VR/MR app (Unity / Virt-A-Mate) + VR/MR device (Quest) + PC (Windows)
Microcontroller: bare metal (M5Stack) + LCD / microphone (M5Stack)
Figures (Nendoroid Dolls) + Microcontroller: bare metal (M5Stack) + microphone / camera / speaker / motor (M5Stack)

*
The intermediate connector runs in environments ranging from PCs to microcontrollers. The verified AI engines cover speech recognition, speech synthesis, language generation, sentiment analysis, image generation, and object recognition, all supporting CPU/GPU operation and API calls. Character connections have been tested across media ranging from the virtual to the real, including images, videos, 3D, VR, MR, dolls, and figures.

*
All server applications run in containers, so a container runtime (Docker) is essential.
*
At this time, the client-side (character apps) and server-side (AI engines and other apps) code is provided only as samples (correct operation is not guaranteed); what this tool itself provides is the intermediate connector functionality and workflow creation.

constraints


Currently, the following constraints exist:

A container runtime is required to use the tool [※1]
Conversation connections are currently unstable [※2]
Backward compatibility is not guaranteed
※1
OS-level container technology (Docker) is used to prevent conflicts between apps and to allow various AI engines to be used without restrictions.
※2
Because the communication protocol is shared with low-performance devices such as microcontrollers, communication is currently limited to HTTP polling (connectionless at the TCP level; connection-keeping mechanisms such as WebSockets are planned for the future). For now, the tool is suitable for users who can work within this limitation.
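
For illustration, an HTTP-polling client might look like the Python sketch below; the endpoint URL and response shape are assumptions, not this tool's actual interface:

    # Hypothetical example: the client repeatedly asks the server for new
    # messages instead of holding a connection open (no WebSockets yet).
    import time
    import requests

    SERVER = "http://localhost:8000/messages"  # hypothetical endpoint

    def poll_forever(interval: float = 1.0) -> None:
        while True:
            resp = requests.get(SERVER, timeout=5)
            for message in resp.json():  # assumed: a list of new messages
                print("received:", message)
            time.sleep(interval)         # connectionless between polls

    # poll_forever()  # runs until interrupted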

rights


The original code of this tool is licensed under the GNU GPL. [※1]

The resources this tool uses (models, libraries, applications) are each subject to their own rights; resources that require special care are clearly noted in the description of each plugin. [※2]


※1
Any code that incorporates this code must also be made publicly available (under the GPL).
※2
Some of the models this tool's engines use (speech synthesis, image generation, video generation, etc.) can work with audio, images, and video trained on material from individuals. Beyond personal use, this tool is not intended for any act that infringes portrait rights, copyrights, or authors' moral rights.