A speech-to-text-to-speech program that does things.
vocalization
Vocalization (from now on just voc) is a speech-to-text-to-speech program.
The process it uses is as follows:
- Loads a `whisper.cpp` compatible model (available from https://huggingface.co/ggerganov/whisper.cpp)
- Initializes access to `espeak`
- Live listens to the microphone, with the settings from `config.json` taken into account (live-reload)
- Sends the audio to `whisper` to decode
  - If there is >= ~1s of audio, then it goes through as normal
  - If there is < ~1s of audio, then it gets padded with silence
- The text from there is sent to `espeak` to speak out.
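
The padding step above can be sketched roughly like this. This is only an illustration, not voc's actual code: the helper name `pad_with_silence`, the exact 1.0 s threshold, and the assumption that audio is held as 16 kHz mono `f32` samples (the rate whisper.cpp models expect) are all mine.

```rust
/// Assumed for illustration: whisper.cpp models expect 16 kHz mono input.
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical helper (not from voc's source): pads `samples` with trailing
/// silence (0.0) until the buffer holds at least `min_secs` seconds of audio.
/// Buffers that are already long enough are left untouched.
fn pad_with_silence(samples: &mut Vec<f32>, min_secs: f32) {
    // Minimum number of samples needed to cover `min_secs` at the assumed rate.
    let min_len = (SAMPLE_RATE_HZ as f32 * min_secs).ceil() as usize;
    if samples.len() < min_len {
        // `resize` appends zeros (silence) and keeps the existing samples.
        samples.resize(min_len, 0.0);
    }
}
```

Calling `pad_with_silence(&mut buffer, 1.0)` right before handing the buffer to the decoder would cover the `< ~1s` case, while longer recordings pass through unchanged.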
Future Goals
- Integrate a neural TTS (`piper` is the leading option)
- Output to a fake microphone instead
- UX improvements
- Better silence detection and noise suppression
Usage
It's evil, you shouldn't. It's ROCm only and probably requires some system libs that I can't list here, because I already had them installed and don't know which ones they are.
But if you want to:
- Change the model path in the `config.json` file
- Start the program and let it do its work
Extras
Yes I know I've committed the config. No I don't think it matters. ;3 There's nothing in the config that's useful to anyone outside of my home.