# Vonovox

Last update: June 2, 2025


# Introduction

  • Vonovox is a voice changer that uses RVC for its conversion.

  • Vonovox was developed by dr87.


# Is Vonovox Safe?

RVC models are PyTorch models (PyTorch is a Python library used for AI). PyTorch saves a model to a file by serializing it with Python's pickle module. Since pickle can execute arbitrary code when a model is loaded, it could theoretically be used to deliver malware, but Vonovox has a built-in feature that prevents code execution when loading a model. In addition, HuggingFace runs a security scanner that checks for unsafe pickle exploits and also uses ClamAV to scan for dangerous files.
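To make the pickle risk concrete, here is a minimal sketch of one common defense: an unpickler that only resolves names on an allow-list. This is not Vonovox's actual protection (the guide does not say how that works); `ALLOWED_GLOBALS`, `RestrictedUnpickler`, `safe_loads`, and `Malicious` are hypothetical names used only for illustration. PyTorch itself offers a similar idea through the `weights_only=True` argument of `torch.load`.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize bytes while rejecting non-allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Malicious:
    """Stand-in for a hostile payload: a plain pickle.loads would call print()."""

    def __reduce__(self):
        return (print, ("arbitrary code ran",))


if __name__ == "__main__":
    # A harmless, weights-like payload loads normally.
    print(safe_loads(pickle.dumps(OrderedDict(weight=[0.1, 0.2]))))
    # The hostile payload is rejected before anything is executed.
    try:
        safe_loads(pickle.dumps(Malicious()))
    except pickle.UnpicklingError as err:
        print("rejected:", err)
```

The point of the design is that the check happens while the file is being read, so a blocked payload never gets the chance to run.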


# System & Hardware Requirements


  • Windows 10 or Later

and

  • At least 6GB of RAM
  • At least 6GB of free disk storage

# For GPU-conversion

TLDR: Make sure you have an Nvidia RTX 20xx or better. A GTX 10xx or GTX 900 will also work, but you may run into issues with games and higher delay. If you only have an iGPU (mostly AMD Radeon Graphics or Vega), use an online hosted alternative instead.

Long answer:

Minimum:

  • A dedicated graphics card: Nvidia GeForce GTX 900 Series or later.

Recommended:

  • A dedicated graphics card: Nvidia GeForce RTX 20XX Series or later.
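The requirements above can be sketched as a small check on the GPU name that Task Manager shows. This is only an illustration of the tiers listed in this guide; `gpu_tier` is a hypothetical helper, it is not part of Vonovox, and it only recognizes names that contain "GTX" or "RTX" followed by a model number.

```python
import re


def gpu_tier(name: str) -> str:
    """Classify a GPU name as 'recommended', 'minimum', or 'unsupported'."""
    match = re.search(r"\b(RTX|GTX)\s*(\d{3,4})", name, re.IGNORECASE)
    if match is None:
        # iGPUs, AMD cards, or names this sketch does not recognize.
        return "unsupported"
    family, model = match.group(1).upper(), int(match.group(2))
    if family == "RTX":
        # The RTX line starts at the 20xx series.
        return "recommended"
    # GTX: the 900 series and later (9xx, 10xx, 16xx) meet the minimum.
    return "minimum" if model >= 900 else "unsupported"


if __name__ == "__main__":
    for gpu in (
        "NVIDIA GeForce RTX 3060",
        "NVIDIA GeForce GTX 1060 6GB",
        "AMD Radeon(TM) Graphics",
    ):
        print(f"{gpu}: {gpu_tier(gpu)}")
```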

# Virtual Audio Cable

A Virtual Audio Cable (VAC) is what you need to use the voice changer on Discord & Games.

  • Run setup64, not 64a, after extracting the zip to a new folder

  • Installing the Virtual Cable changes your default audio devices. Click Yes when it asks you to open the audio device settings (or press WIN+R and type "mmsys.cpl" if you already closed it), then change your Recording and Playback devices back to your usual devices. Do the same for the default communication device (right-click -> Set as Default Communication Device).


# Windows

  • Make sure you have an Nvidia GPU that is powerful enough to run Vonovox. Not sure what GPU you have? Open Task Manager > Performance tab and check the names listed for GPU0 and GPU1.


# Download NVIDIA on Windows

  • Go to Vonovox's GitHub repo and download the latest release of Vonovox.

# Opening on Windows

  • First, make sure you have 7-Zip or WinRAR for extracting / unzipping.

  • After the download finishes, extract the zip file. Open the folders until you see a .bat file called setup.bat, then run it.

  • Vonovox will start downloading everything it needs to run. Be patient as it can take up to 5 minutes to download everything it needs.

  • Once it's done downloading everything it will display Setup complete! in the command line. You can now go ahead and run start.bat.


# Voice Models


# Adding Models

  • Click on Select .pth file in the blue square located near the top.
  • Only RVC models will work. GPT-SoVITS models, or any other type, will not work.
  • Select your .pth file and click upload.
  • No need for an Index file.

# Changing Models

If you wish to use a different model, you can overwrite the model you are currently using with a new model.


# Audio Setup


# Discord & Games

In Vonovox select:

  • Input: Your microphone
  • Output: Virtual Cable or your headphones if you wish to hear the model first

In Discord and games, select:

  • Input: Virtual Cable
  • Output: Your headphones

# Settings


  • Embedder: Select between contentvec or spin trained models. Most current models are trained on contentvec. Make sure you read the model's description to find out what embedder it uses.

  • F0 det: Pitch algorithm. Both RMVPE and FCPE are good options.

  • Pitch smoothing factor: Pitch smoothing dampens pitch changes. It still follows the exact curve of the f0 predictor, so it maintains 100% accuracy, just at a lower magnitude. This gives normal speaking voices better stability, since f0 can sometimes be overly aggressive and cause pitch wobble on minor pitch fluctuations.

  • Output volume: Controls how loud the output volume is.
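One way to picture pitch smoothing is to shrink each frame's distance from the speaker's average pitch while keeping the shape of the curve. The sketch below does that in the log-frequency domain. It is an assumption about how such a factor could work, not Vonovox's actual algorithm; `smooth_pitch` and the 0.0 to 1.0 range of `factor` are hypothetical.

```python
import math


def smooth_pitch(f0_hz, factor):
    """Shrink pitch excursions around the mean voiced pitch.

    factor = 0.0 leaves the contour untouched; factor = 1.0 flattens it.
    Unvoiced frames (0 Hz) are passed through as 0.0.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return [0.0 for _ in f0_hz]
    # Work in log2 so that equal musical intervals are treated equally.
    center = sum(math.log2(f) for f in voiced) / len(voiced)
    smoothed = []
    for f in f0_hz:
        if f <= 0:
            smoothed.append(0.0)
        else:
            deviation = math.log2(f) - center
            smoothed.append(2 ** (center + (1.0 - factor) * deviation))
    return smoothed


if __name__ == "__main__":
    contour = [100.0, 200.0, 0.0, 100.0, 200.0]
    print([round(f, 1) for f in smooth_pitch(contour, 0.5)])
```

Rises and falls stay in the same places, but each one is smaller, which is the "same curve, lower magnitude" behavior described above.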


# Noise Reduction:

  • RNNoise Reduction: Filters out most input background noise with minimal added latency. This reduces the chance of Vonovox trying to infer on noise.

  • VAD Noise Reduction: Completely mutes the output when speech is not detected. When speech is detected, it uses a 400ms release window. It is also much better at filtering breath noises than RNNoise.

  • AP-BWE 48k Upscaler: An upscaler that extends the bandwidth of speech by adding missing high-frequency information, for output at a 48 kHz sample rate.
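The release window in the VAD option can be sketched as a simple gate: it opens when speech is detected and stays open for 400 ms after the last speech frame, so word endings are not cut off. This is a minimal sketch, not Vonovox's code; `vad_gate` is a hypothetical function and the 20 ms frame length is an assumption.

```python
def vad_gate(speech_flags, frame_ms=20, release_ms=400):
    """Return one open/closed decision per frame.

    speech_flags: booleans from some voice activity detector, one per frame.
    The gate is open on speech and for release_ms after the last speech
    frame; otherwise it is closed and the output would be muted.
    """
    release_frames = release_ms // frame_ms
    remaining = 0
    gate = []
    for is_speech in speech_flags:
        if is_speech:
            remaining = release_frames  # restart the release window
            gate.append(True)
        elif remaining > 0:
            remaining -= 1  # still inside the release window
            gate.append(True)
        else:
            gate.append(False)
    return gate


if __name__ == "__main__":
    flags = [False, True] + [False] * 25
    gate = vad_gate(flags)
    print("open frames:", sum(gate), "of", len(gate))
```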


# Voice Settings:

  • Pitch: Shifts the pitch of the output voice. Negative values lower the pitch and positive values raise it. If you have a male voice and are using a female model, aim for 10 to 14. The best value depends on your voice, so try numbers in that range until you find a sweet spot.

  • Formant Shift: Alters harmonic frequencies and changes the voice timbre without affecting the pitch.

  • Block Size: Critical setting. The optimal block size is the lowest you can get without the audio being choppy. Listen to your output. This is GPU dependent: the more powerful the GPU, the lower the block size you can use. However, Vonovox's optimizations allow much smaller block sizes to work on lower end GPUs. At extremely low block sizes, quality may be reduced.

  • Lookahead Buffer: Gives the model more or less context to work with. Recommended 2.0 for best quality/latency ratio. The added latency of this setting is far less impactful than the block size.
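To get a feel for the Pitch values, the sketch below converts a shift into a frequency ratio. It assumes the Pitch setting is measured in semitones, as the transpose setting is in RVC, where 12 semitones is one octave (double the frequency). `pitch_ratio` and `shift_f0` are hypothetical helper names.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def shift_f0(f0_hz, semitones):
    """Apply the shift to a list of pitch values in Hz."""
    ratio = pitch_ratio(semitones)
    return [f * ratio for f in f0_hz]


if __name__ == "__main__":
    for value in (-12, 0, 10, 12, 14):
        print(f"pitch {value:+d} -> x{pitch_ratio(value):.2f}")
```

Under that assumption, the suggested range of 10 to 14 raises the pitch by roughly 1.8x to 2.2x, which is about one octave.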



# Extras


# Realtime Sound File Inferencing

You are able to load and play sound files, converted to your model's voice in realtime.

The sound file replaces your input mic while active. Whatever sound is coming from your loaded file is your "new microphone" while the sound is playing. That means it will infer and play the sound file as if it was your own voice in realtime. You can play speech, singing, or whatever you want. Just make sure the audio is clean, as the client still needs to run inference on it, no different than the real mic.

When a sound file is playing, it will zero out the input from your real mic, meaning you don't have to worry about overlapping your voice with playback. The mic automatically unmutes when the sound stops playing. Muting and unmuting are also handled properly when pausing and resuming playback of audio files.

A seek timer and a playback timer let you go to specific times in your sound file.
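The mic/file switching described above can be sketched as choosing, for every audio block, which source feeds the model. This is only an illustration of the behavior, not Vonovox's implementation; `FilePlayer` and `next_input_block` are hypothetical names, and audio is represented as plain lists of samples.

```python
from dataclasses import dataclass


@dataclass
class FilePlayer:
    samples: list
    position: int = 0
    playing: bool = False

    def read(self, n):
        """Return the next n samples, padding the last block with silence."""
        block = self.samples[self.position:self.position + n]
        self.position += len(block)
        if self.position >= len(self.samples):
            self.playing = False  # end of file: hand control back to the mic
        return block + [0.0] * (n - len(block))


def next_input_block(mic_block, player):
    """Pick the block that is sent to the voice model for this step."""
    if player.playing:
        # The real mic is ignored (zeroed out) while the file plays.
        return player.read(len(mic_block))
    return mic_block


if __name__ == "__main__":
    player = FilePlayer(samples=[1.0, 2.0, 3.0], playing=True)
    mic = [9.0, 9.0]
    for _ in range(3):
        print(next_input_block(mic, player))
```

Pausing would correspond to setting `playing` to `False` without resetting `position`, so the mic takes over until playback resumes.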



# Models to try

  • You will need to connect your account to weights.gg to be able to download these models
    • Click on the 3 dots (...) on weights.gg models, then Download model.

Female:

Duckus Egirl voice made by lusbert

Psych2Go voice made by dan

Male:

Bob Ross voice made by dieseldog34

Markiplier voice made by hobqueer


# FAQ


# Do I need an extremely expensive mic for good quality?

We had a conversation about this in https://discord.com/channels/1159260121998827560/1159290161683767298/1352325982689951765 and https://discord.com/channels/1159260121998827560/1159290161683767298/1356265862704926907. RVC works by downsampling your voice audio to 16 kHz, because the f0 estimators only work at that sample rate. After that, the model outputs the result at its original sample rate (without any upscaling). So there is no need for an extremely expensive mic; a decent one should do the job.
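A quick bit of arithmetic shows why the 16 kHz step matters: audio sampled at a given rate can only represent frequencies up to half that rate (the Nyquist limit). The sketch below just computes that limit for a few common sample rates; `nyquist_hz` is a hypothetical helper name.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for rate in (16_000, 44_100, 48_000):
        print(f"{rate} Hz sample rate -> content up to {nyquist_hz(rate):.0f} Hz")
```

After downsampling to 16 kHz, anything above 8 kHz that your mic captured is gone before the model sees it, so extra detail from a high-end mic in that range does not reach the conversion.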



# You have reached the end.

Report Issues