``Last update: April 15, 2026``
***
:::content-center
## Introduction
:::

- W-Okada is a realtime voice changer that uses RVC for its conversion.

- There are 3 versions of this realtime voice changer: the [Official Original W-Okada](https://github.com/w-okada/voice-changer) made by Wok, [Deiteris' Fork](https://github.com/deiteris/voice-changer) made by Deiteris, and [Tg-Develop's Fork](https://github.com/tg-develop/voice-changer) made by Tg-Develop. These 3 links point to the source code GitHub repositories of the three projects and are for reference only; to install, follow the guide below.

- This guide covers Deiteris' fork of Wokada, since it has better performance and quality than the original Wokada.

- RVC does **NOT** mean realtime voice changer. RVC means Retrieval-based-Voice-Conversion.

***
#### Is this Program Safe?

RVC models are PyTorch models (PyTorch is a Python library used for AI).
PyTorch saves a model to a file by serializing it with Python's pickle module.
Since pickle can execute arbitrary code when a model is loaded, a model file could theoretically be used to carry malware, but this software has a **built-in feature to prevent code from being executed along with the model.**
Also, **HuggingFace has a [Security Scanner](https://huggingface.co/docs/hub/security-pickle#hubs-security-scanner)** that scans for unsafe pickle exploits and also uses ClamAV to scan for dangerous files.
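
To make the pickle risk concrete, here is a minimal Python sketch of one general protection technique: an allow-list unpickler, as described in Python's `pickle` documentation. It is not this software's actual mechanism, which this guide does not describe. `Payload`, `RestrictedUnpickler`, `safe_loads`, and `is_blocked` are hypothetical names, and the payload only calls `print` to stay harmless.

```python
import io
import pickle


class Payload:
    """Stand-in for a tampered model file: unpickling it would call print()."""

    def __reduce__(self):
        # A real exploit would return something harmful instead of print.
        return (print, ("code ran during unpickling",))


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals on an explicit allow-list."""

    # Assumption for this sketch: only OrderedDict is allowed.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Load pickled bytes, refusing any global that is not allow-listed."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


def is_blocked(data: bytes) -> bool:
    """Return True if safe_loads refuses to load the data."""
    try:
        safe_loads(data)
    except pickle.UnpicklingError:
        return True
    return False
```

Plain data such as dictionaries of numbers still loads, while anything that asks the unpickler to look up a function is rejected before it can run.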

***
‎      
#### Pros & Cons :icon-tasklist:
==- *Learn more*
!!! *The pros & cons are subjective to your necessities.*        
!!! 
||| ✔️ **PROS** 
- Currently stable
- Good Performance
- Supports: Nvidia GPUs, AMD GPUs, Intel GPUs on Windows, Intel Mac, Apple Silicon Mac, x86_64 CPUs, Linux, Windows
- Uses a Web User Interface, meaning it can be run on the Cloud
- Uses FP16 inference by default, and lets you choose FP32 for better quality/precision
||| ❌ **CONS** 
- Uses a Web User Interface, which has issues on some browsers and bugs when renaming or deleting models
- Has not been actively developed recently
- Has cutoff issues when Extra is set above 2.7s
- Doesn't let you choose the embedder, so only RVC models trained on contentvec (the majority) work
- Doesn't support: Intel GPUs on Linux, ARM64 CPUs, NPUs.
|||
===
***
###### ‎

***

## System & Hardware Requirements

***

- Windows 10 or Later
- macOS 12 Monterey or later, on an Apple Silicon or Intel Mac
- Any Linux Distro

and

- At least 6GB of RAM
- At least 6GB of free disk storage

***
##### For GPU-conversion

`Minimum:`

- An integrated graphics card: AMD Radeon Vega 7 (with AMD Ryzen 5 5600G) or later, with 2GB VRAM (in FP32 mode) or ~1GB VRAM (in FP16 mode, if supported). This is NOT recommended at all: we generally advise against running the realtime voice changer on an iGPU.

- A dedicated graphics card: Nvidia GeForce GTX 900 Series or later, or AMD Radeon RX 400 series or later, or Intel Arc A300 series or later (Windows Only).

`Recommended:`

- A dedicated graphics card: Nvidia GeForce RTX 20 Series or later, or AMD Radeon RX 5000 series or later, or Intel Arc A500 series or later (Windows Only).

***
##### For CPU-conversion

TLDR: don't bother. You can't run games alongside it; Discord might be the only thing that works decently, and you might potentially damage your CPU. People with no GPU usually have old CPUs, so the delay will be high too. Not worth it.

`Minimum:`
- Intel Core i5-4690K or AMD FX-6300.

`Recommended:`
- Intel Core i5-10400F or AMD Ryzen 5 1600X.

!!!warning CPU-Conversion is not recommended at all
If you plan on playing games at the same time, do not use CPU-conversion. With CPU, the delay will be massive and your PC will not run smoothly at all. If you have a higher-end CPU you can make it work, but people with higher-end CPUs most likely also have higher-end GPUs, so use your GPU if possible.
!!!

***
## Virtual Audio Cable

#### A Virtual Audio Cable (VAC) is what you need to use the realtime voice changer on Discord & Games.

- A VAC (Virtual Audio Cable) makes a fake audio device, used to re-route the audio of different programs.
- In the context of AI realtime voice changing, it's used to feed the AI-converted voice output into other programs, such as Discord, as their input.

!!! For Windows
Download this: [VAC Lite (Virtual-Audio-Cable by Muzychenko)](https://software.muzychenko.net/freeware/vac470lite.zip).
(Be sure not to use any other VAC, such as VB Audio Cable.)
!!!

- Run `setup64`, not 64a, after extracting the zip to a new folder

- Installing VAC Lite changes your default audio devices. Click **Yes** when it asks you to open the audio device settings (or press WIN+R and type "mmsys.cpl" if you already closed it), and change your **Recording** and **Playback** devices back to your usual devices. Do the same for the default communication device (right click -> set as default communication device).

!!! For Mac
Download either: 
[Blackhole Virtual Audio Cable](https://existential.audio/blackhole)
or
[VB-Audio](https://vb-audio.com/Cable)
!!!

!!! For Linux
For Debian / Ubuntu-based Systems (Ubuntu, Mint, Pop!_OS), run in the terminal:
```bash
sudo apt-get update && sudo apt-get install -y portaudio19-dev
```

For Fedora / RHEL-based Systems (CentOS, Rocky Linux), run in the terminal:
```bash
sudo yum install -y portaudio
```

For Arch / Arch-based Systems (Endeavour, Manjaro Linux), run in the terminal:
```bash
sudo pacman -Syu portaudio
```
!!!

***
## Windows

- Download based on your GPU. You don't know what GPU you have? Open Task Manager > Performance tab and check for your GPU0 and GPU1 names. Prioritize the Nvidia one if you have one, else use the other.

<img src="../wokada-deiteris-img/cap.png" alt="Task Manager" width="600" height="auto">

####
!!!
Use Online Hosted if you have an integrated GPU (AMD Radeon Graphics ; AMD Radeon Vega ; Intel UHD) or if you do not have a GPU at all
!!!
***

### Download for Nvidia GPUs on Windows

- The latest version as of December 7th 2024 is: [nvidia-b2332 (click here to download)](https://huggingface.co/Shadicti/deiteris-Fork/blob/main/voice-changer-windows-nvidia-b2332.zip)

!!!danger
If you have a GTX 700 card or below, use AMD/Intel version instead.
!!!

***
### Download for Nvidia RTX 5000-series GPUs on Windows

- The Nvidia RTX 5000 series, the newest generation of GPUs, requires a separate download: [nvidia-5000-Series (click here to download)](https://github.com/IllIlIlIllIl/voice-changer/releases/tag/b2335). If you have an older GPU, you do not need it; use the normal Nvidia link instead.

!!!danger
Download all 3 files, then extract the .zip file; this automatically unpacks ALL 3 FILES into one folder. Then open the `MMVCServerSIO` folder and run `MMVCServerSIO.exe` (shown as `MMVCServerSIO` if file extensions are hidden).
!!!

***
### Download for AMD, INTEL GPUs and x86_64 CPUs on Windows

- The latest version as of December 7th 2024 is: [dml-b2332 (click here to download)](https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-windows-amd64-dml.zip).  DirectML should work for any modern DirectX 12 compatible GPU.

!!!danger
Intel UHD Graphics do NOT work at this point in time. Use Online Alternative.
!!!

***
### Opening on Windows

- First, make sure you have [7zip](https://www.7-zip.org/) or [WinRAR](https://www.win-rar.com/download.html) for extracting / unzipping.

- After the download, extract the zip file. Open the folders until you see an exe application called `MMVCServerSIO`, and run that.

!!!warning
If nothing opens, open a browser and go to `http://127.0.0.1:18888/`. This is the local URL where the WebUI runs.
!!!
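
If the page does not load, it can help to check whether the server is listening at all. The following is a minimal Python sketch using only the standard library; `webui_is_listening` is a hypothetical helper name, and the default address and port are the ones from the URL above.

```python
import socket


def webui_is_listening(host: str = "127.0.0.1", port: int = 18888, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is serving on that port yet.
        return False
```

Calling `webui_is_listening()` while the server window is open should return `True` once loading has finished. A `False` result suggests the server is still starting or has crashed.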

***
## Mac

***
### Download for Mac Silicon

- The latest version as of December 7th 2024 is: [arm-b2332 (click here to download)](https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-macos-arm64-cpu.tar.gz)

***
### Download for Mac Intel

- The latest version as of December 7th 2024 is: [macos-amd-b2332 (click here to download)](https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-macos-amd64-cpu.tar.gz)

***
### Opening on Mac

- Double-click the downloaded file (`voice-changer-macos-arm64-cpu.tar.gz` on Apple Silicon, `voice-changer-macos-amd64-cpu.tar.gz` on Intel). The realtime voice changer will unpack and the MMVCServerSIO folder will appear.

- Open the extracted MMVCServerSIO folder.

- Double-click `MMVCServerSIO` to run the realtime voice changer.

!!! Apple quarantine stops you from running the realtime voice changer
You do not get a popup notification for this, so if it does not open or says "Pytorch is damaged", do the following:

1. Open the Terminal

2. Run the following command: `xattr -dr com.apple.quarantine <PUT IN THE PATH TO YOUR MMVCServerSIO FOLDER HERE>`
For example, if you extracted the realtime voice changer to your desktop, the command may look as follows: `xattr -dr com.apple.quarantine ~/Desktop/MMVCServerSIO`

3. Now, open the extracted MMVCServerSIO folder and run `MMVCServerSIO` to run the realtime voice changer.
!!!

!!!warning
If nothing opens, open a browser and go to `http://127.0.0.1:18888/`. This is the local URL where the WebUI runs.
!!!

***
## Linux

Installing the CUDA Toolkit or the AMD HIP SDK is **NOT REQUIRED**. All other necessary libraries are bundled with the application.

### Download for Nvidia GPUs on Linux

You need to download both of these files:

https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-linux-amd64-cuda.tar.gz.aa

https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-linux-amd64-cuda.tar.gz.ab

### Download for AMD GPUs on Linux

You need to download all of these files:

https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-linux-amd64-rocm.tar.gz.aa

https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-linux-amd64-rocm.tar.gz.ab

https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-linux-amd64-rocm.tar.gz.ac

### Download for x86_64 CPUs on Linux

You only need this file:

https://github.com/deiteris/voice-changer/releases/download/b2332/voice-changer-linux-amd64-cpu.tar.gz

### Opening on Linux

Graphical archive tools may not handle split archives, so use the terminal. For the Nvidia build, the following command merges the parts and extracts them: `cat voice-changer-linux-amd64-cuda.tar.gz.* | tar xzf -`. For the AMD build, change **cuda** to **rocm**. The CPU build is a single file with no parts to merge, so extract it directly: `tar xzf voice-changer-linux-amd64-cpu.tar.gz`.

After you extract the files using the command above, a new folder called `MMVCServerSIO` will appear.
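
For reference, merging the parts and extracting them can also be sketched in Python with the standard library only. This is an illustrative equivalent of `cat ... | tar xzf -`, not something the project ships; `merge_and_extract` is a hypothetical helper name.

```python
import glob
import shutil
import tarfile
import tempfile


def merge_and_extract(prefix: str, dest: str = ".") -> None:
    """Join prefix.* parts in name order, then unpack the resulting .tar.gz."""
    parts = sorted(glob.glob(prefix + ".*"))  # .aa, .ab, .ac ...
    if not parts:
        raise FileNotFoundError(f"no parts found for {prefix}")
    with tempfile.TemporaryFile() as merged:
        # Concatenate the parts byte for byte, exactly like `cat` does.
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)
        merged.seek(0)
        with tarfile.open(fileobj=merged, mode="r:gz") as archive:
            archive.extractall(dest)
```

For example, `merge_and_extract("voice-changer-linux-amd64-cuda.tar.gz")` run in the download folder would look for the `.aa` and `.ab` parts next to it.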

- Open a Terminal and navigate into that folder:
  ```bash
  cd MMVCServerSIO
  ```

- You may need to make the application executable. Run this command just in case:
  ```bash
  chmod +x ./MMVCServerSIO
  ```

- Now, run the realtime voice changer:
  ```bash
  ./MMVCServerSIO
  ```

!!!warning
After the server finishes loading in your terminal, it will not open a window on its own. Open a web browser and go to `http://127.0.0.1:18888/` to access the user interface.
!!!

***
## Opening on Multi-PC Setups

This is only for people who have 2 PCs and want to use one for gaming and the other only for the Wokada Deiteris Fork.

- Create a file named `.env` in the same folder where `MMVCServerSIO.exe` is located. Open it with Notepad and copy-paste the settings from the [GitHub link](https://github.com/deiteris/voice-changer/issues/180#issuecomment-2359166278).

- After that, create another file with the `.bat` file extension, open it with Notepad, and copy-paste the required contents from the same [GitHub link](https://github.com/deiteris/voice-changer/issues/180#issuecomment-2359166278).

- Now run the bat file. After it starts, you should be able to open the link. For example, if you specified `HOST=192.168.0.1` and `ALLOWED_ORIGINS='["https://192.168.0.1:18888"]'`, you should be able to open `https://192.168.0.1:18888` in your browser and use the realtime voice changer UI from other machines in your local network.
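
Because the quoting in `ALLOWED_ORIGINS` is easy to get wrong, here is a small Python sketch that builds the two lines named above for a given address. It covers only `HOST` and `ALLOWED_ORIGINS`; any other settings from the linked GitHub comment are not reproduced here. `env_lines` is a hypothetical helper name.

```python
import json


def env_lines(host: str, port: int = 18888) -> list[str]:
    """Build the HOST and ALLOWED_ORIGINS lines for a given LAN address."""
    origins = [f"https://{host}:{port}"]
    return [
        f"HOST={host}",
        # ALLOWED_ORIGINS is a JSON list wrapped in single quotes.
        f"ALLOWED_ORIGINS='{json.dumps(origins)}'",
    ]
```

Printing `"\n".join(env_lines("192.168.0.1"))` gives the two example values shown above, ready to paste into the `.env` file.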

***
## Voice Models
***

### Managing Models

#### Adding Models

<img src="../wokada-deiteris-img/edit.png" alt="Edit Button in Wokada Deiteris Fork to Add Models" width="430" height="auto">

#####

- Click on `Edit` in the small blue square located near the top left side
- Pick any slot you want, click `upload`
- Only RVC models will work. Models of any other type, such as GPT-SoVITS, will not work.
- Select `Type: RVC`, then `select file` on the `Model` slot and upload your `.pth` file.
- No need for an `Index` file, but you can upload it. This controls the accent of the voice model.

***
#### Renaming Models

!!!danger A Common Bug
Attempting to rename a model directly within the Web User Interface will cause the program to crash. This is a known bug. Use one of the two methods below to safely rename your models.
!!!

**Method 1: Re-uploading the Model**

This is the simplest method.

1.  Find the model's `.pth` file on your computer.
2.  Rename the file to your desired new name.
3.  In the voice changer UI, click `Edit`, select the slot of the model you want to rename, and click `upload`.
4.  Re-upload the renamed `.pth` file to the same slot. This will overwrite the old model and update its name.

**Method 2: Editing the Configuration File**

This method doesn't require re-uploading.

1.  Navigate to your `MMVCServerSIO` folder.
2.  Inside, open the `model_dir` folder. You will see several numbered folders, each corresponding to a model slot in the UI.
3.  Open the folder for the slot number you want to rename.
4.  Inside this folder, you will find a `params.json` configuration file. Open this file with a text editor like Notepad.
5.  Look for the `"name":` field in the file. Change the text in the quotes to your desired new model name.

<img src="../wokada-deiteris-img/json-rename-model.png" alt="Editing the model name in the JSON file with Notepad" width="800" height="auto">

6.  Save the `.json` file. The name will be updated in the voice changer UI.
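
Method 2 can also be scripted. The sketch below is a minimal Python example that assumes what the steps above describe: slot folders named by their number inside `model_dir`, each with a `params.json` that has a top-level `"name"` field. `rename_model` is a hypothetical helper name. Back up the file before trying it.

```python
import json
from pathlib import Path


def rename_model(server_dir: str, slot: int, new_name: str) -> None:
    """Set the "name" field in model_dir/<slot>/params.json."""
    params_path = Path(server_dir) / "model_dir" / str(slot) / "params.json"
    params = json.loads(params_path.read_text(encoding="utf-8"))
    params["name"] = new_name  # every other field is left untouched
    params_path.write_text(
        json.dumps(params, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

For example, `rename_model("MMVCServerSIO", 3, "My Voice")` would rename the model in slot 3.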

***
#### Deleting Models

If you wish to delete a model, you can simply overwrite the slot with a new model by following the steps in the **Adding Models** section. If you want to completely empty a slot, navigate to the `MMVCServerSIO/model_dir` folder, open the folder of the slot number you want to delete, and delete all the files inside it.
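
Emptying a slot by hand can be sketched in Python as well. This assumes the same folder layout described above (`model_dir/<slot number>`). `clear_slot` is a hypothetical helper name, and it deletes files permanently, so double-check the slot number first.

```python
from pathlib import Path


def clear_slot(server_dir: str, slot: int) -> list[str]:
    """Delete every file directly inside model_dir/<slot>; return their names."""
    slot_dir = Path(server_dir) / "model_dir" / str(slot)
    removed = []
    for entry in sorted(slot_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

The slot folder itself is kept; only the files inside it are removed.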

***
### Merging Models (Merge Lab) :icon-git-merge:

The Merge Lab allows you to combine multiple RVC V2 voice models (`.pth` weights only, not index files) into a single, new hybrid model. This is useful for creating unique voices.

1.  **Open Merge Lab:** Scroll down in the user interface and click on the `Merge Lab` button.

  <img src="../wokada-deiteris-img/merge-lab-button.png" alt="Merge Lab Button in Wokada Deiteris Fork" width="300" height="auto">

2.  **Select Model Type:** From the `Type` dropdown menu, choose the type of models you wish to merge. Only models that share the same sample rate and type (e.g., "pyTorchRVCv2, 32000Hz, 768" which are all RVC v2 models with the 32kHz Sample Rate, or "pyTorchRVC, 40000Hz, 256" which are all RVC v1 models with the 40kHz Sample Rate) will be shown and can be merged together.

  <img src="../wokada-deiteris-img/merge-lab.png" alt="Merge Lab in Wokada Deiteris Fork" width="600" height="auto">

3.  **Adjust Weights:** Use the sliders next to each model's name to set its "weight" or influence in the merged model (RVC models are PyTorch files; the `.pth` file holds the weights that contain the voice). The numbers (from 0 to 100) represent the percentage of each voice in the mix.

4.  **Merge and Download:** Once you have set the desired proportions, click the `Merge` button. Your browser will automatically download the new, `merged.pth` model file, which you can rename to whatever you want.

!!! Manual Download
The merged model is **not** automatically added to your model list. You must upload it to an empty slot yourself by following the steps in the **Adding Models** section.
!!!

!!!danger Index Merging
You **can't merge index files** (in RVC context, the trained accent of the voice). Only the actual `.pth` voice file can be merged.
!!!
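
As an intuition for what the sliders do, here is a generic weighted-average sketch in plain Python. The exact math used by the Merge Lab is not documented in this guide, so treat this as an assumption. `merge_weights` is a hypothetical name, and real models hold tensors rather than short lists.

```python
def merge_weights(
    models: list[dict[str, list[float]]], sliders: list[float]
) -> dict[str, list[float]]:
    """Blend same-shaped parameter dicts using slider values as relative weights."""
    total = sum(sliders)
    if total <= 0:
        raise ValueError("at least one slider must be above 0")
    shares = [s / total for s in sliders]  # e.g. 75 and 25 become 0.75 and 0.25
    merged = {}
    for key in models[0]:
        # Walk the same position in every model and mix the values.
        columns = zip(*(m[key] for m in models))
        merged[key] = [
            sum(share * value for share, value in zip(shares, column))
            for column in columns
        ]
    return merged
```

Averaging like this only makes sense when every model has the same parameter names and shapes, which is consistent with the Merge Lab only listing models of the same type and sample rate together.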

***
## Audio Setup
***

### Discord & Games

In the realtime voice changer (Wokada), select:

- Input: Your microphone
- Output: Virtual Audio Cable
- Monitor (if you wish to hear the realtime voice changer on your headphones as well): Your headphones

In Discord and games, select:

- Input: Virtual Audio Cable
- Output: Your headphones

***
### Client and Server Setup

Audio: `CLIENT`

- Uses MME (normal audio processed through Windows, which every application uses by default)
- You can use the `echo`, `sup1`, and `sup2` boxes in this mode
***
Audio: `SERVER`

- Use S.R. 48000
- I recommend choosing devices with the [Windows WASAPI] prefix for less delay, because WASAPI uses your audio devices (e.g. microphone) directly, before processing through Windows.
- Input and Output have to use the same driver (Windows WASAPI); you can't use MME for Input and Windows WASAPI for Output.
- You cannot use the built-in noise suppression in this mode
***

As a general rule of thumb: ASIO > WASAPI > MME (this also affects delay)

If CLIENT does not work, use SERVER with the "MME" or "Windows WASAPI" prefix. You cannot use the built-in noise suppression and echo fix if you use SERVER.

***
### Settings Explained
***
- `PASSTHRU button:` Sends your actual voice and not the realtime voice changer through the virtual audio cable. You want this to be GLOWING GREEN or GREY (grey for dark mode users) for the realtime voice changer to work.

- `F0 det:` Pitch extraction algorithm. Both RMVPE (for the best precision and robustness) and FCPE (for less precision & robustness but lower delay) are good options.

- `Chunk:` Controls the delay (lower number means less delay, but please check out the recommended settings for what your GPU is capable of).

- `Extra:` Controls voice model quality. 2.7s is the max, anything above can cause cutoff issues.
***
#### `VOL:`
- `in:` This raises the microphone volume before it goes into the realtime voice changer (Recommended to leave it on the default or if needed, not to go too high, else it increases background noise and makes the voice sound worse).

- `OUT:` Raises the realtime voice changer's output volume.

- `MON:` Increases volume of your headphones that you set on "mon" if you selected to hear yourself with the realtime voice changer.
***
- `Pitch:` Shifts the pitch of the voice. Negative values lower the pitch, positive values raise it. If you have a male voice and use a female voice model, aim for 10 - 14; this depends on your voice, so try values in that range until you find a sweet spot.

- `Formant Shift:` Alters harmonic frequencies and changes the voice timbre without affecting the pitch

- `Index:` This controls the accent of the voice model. In most cases, using Index on Realtime Voice Changer can add realism if you speak the language the model was trained in. If you have a heavy foreign accent, you may use this at a low rate. Beware, this increases CPU usage
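
To get a feel for the `Pitch` numbers, the sketch below converts a shift into a frequency multiplier. It assumes the Pitch value is measured in semitones, which this guide does not state explicitly. `pitch_ratio` is a hypothetical helper name.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of the given number of semitones."""
    # Twelve semitones make one octave, and one octave doubles the frequency.
    return 2.0 ** (semitones / 12.0)
```

Under that assumption, a Pitch of 12 doubles the frequency of your voice (one octave up), and the suggested 10 - 14 range corresponds to roughly 1.78x to 2.24x.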
***
- `In. Sens:` microphone threshold, increasing this will cause less background noise to get picked up if it's a problem

- `Sup2:` Noise suppression on your microphone.

- `Sup1:` Weaker noise suppression. Not recommended at all, because it barely has any impact while reportedly making the voice inconsistent

- `Echo:` If you still experience echo issues despite enabling sup2, moving In. Sens to the right, and lowering your Windows system volume, this will help you as a last resort

***
## Settings

***
### Advanced Settings

- `Protocol:` rest (Use SIO if you want less delay but if you encounter any issues with SIO switch back to rest. Rest has slightly more delay than SIO)
- `Crossfade length:` Controls how smoothly the AI stitches different processed parts "chunks" of your voice back together. 0.1 or 0.15 (0.1 for fastest voice, 0.15 for improved quality but increases delay by 50 ms)
- `SilenceFront:` Reduce GPU usage when idle. This only reduces GPU resources when you're not talking or making sounds
- `Force FP32 mode:` on (THIS IS OFF BY DEFAULT! Turning this on improves stability. Increases VRAM usage by 200 MB)
- `Disable JIT compilation:` off for faster loading speed of the program, on for slightly better performance (10-15 ms) for Nvidia only.
- `Convert to ONNX:` Reduces delay and slightly reduces gpu usage. Enabling this increases CPU usage by around 5-10%. Reduces the quality of the voice a bit. If you decide to enable this, pair it with rmvpe_onnx for even less delay
- `Protect:` Reduces the occurrence of robotic sibilants and robotic breathing, but also reduces the effect of the index file. Lower values increase this protection, higher values decrease it. The default value is 0.5, which means that the protection is disabled; reduce this value to 0.33 to enable it
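
To illustrate what `Crossfade length` refers to, here is a minimal linear crossfade in plain Python: the end of one chunk fades out while the start of the next fades in. This is a sketch of the general idea only. The fade curve the voice changer actually uses is not documented here, and `crossfade` is a hypothetical name.

```python
def crossfade(prev_tail: list[float], next_head: list[float]) -> list[float]:
    """Linearly fade from prev_tail to next_head over their shared length."""
    n = len(prev_tail)
    if n != len(next_head) or n < 2:
        raise ValueError("both segments need the same length (at least 2 samples)")
    out = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the first sample, 1.0 at the last
        out.append(prev_tail[i] * (1.0 - t) + next_head[i] * t)
    return out
```

A longer overlap gives a smoother join but needs more audio before output can start, which matches the 50 ms of extra delay when going from 0.1 to 0.15.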

***
### Finding my own settings for Chunk

First start with 500 ms, check what number your perf is and go closer to that number but not lower.

Example: if your perf is 200, go down to 250 with your chunk. Chunk affects perf value, and Extra as well.

<img src="../wokada-deiteris-img/green.png" alt="Wokada Deiteris Fork Green Perf Value" width="170" height="auto">

If your perf value is green, your selected chunk is stable. You can experiment and go down in chunk for less delay, or increase extra for more quality (going above 2.7s extra is not recommended; anything above uses more resources for no clear benefit).

<img src="../wokada-deiteris-img/yellow.png" alt="Wokada Deiteris Fork Yellow Perf Value" width="170" height="auto">

If your perf value is yellow, your selected chunk is enough, but audio may be unstable if you run other processes at the same time. Operation in this range will also incur high GPU usage. Increasing Chunk size or reducing Extra is recommended.

<img src="../wokada-deiteris-img/red.png" alt="Wokada Deiteris Fork Red Perf Value" width="170" height="auto">

If your perf value is red, the realtime voice changer is unstable. Increase chunk size or reduce Extra.
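
The rule of thumb from this section (keep the chunk above your perf value, with some headroom) can be written as a tiny Python helper. The 50 ms default simply mirrors the 200 to 250 example above and is an assumption, not an official value. `suggest_chunk` is a hypothetical name.

```python
def suggest_chunk(perf_ms: float, headroom_ms: float = 50.0) -> float:
    """Suggest a chunk size (ms) that stays above the measured perf value."""
    if perf_ms <= 0 or headroom_ms < 0:
        raise ValueError("perf_ms must be positive and headroom_ms non-negative")
    return perf_ms + headroom_ms
```

For example, `suggest_chunk(200)` gives 250. Chunk itself changes the perf value, so re-check perf after every adjustment.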

***
### Known working settings for Chunk and Extra

!!! These settings are intentionally higher than what your GPU is capable of
If you are playing a video game with the realtime voice changer, you will have to set the chunk higher than what your GPU can usually handle.
This is because both the game and the realtime voice changer run on the GPU. The game will always take higher priority by default, so the listed settings are safe options that should run with most games.
If you run into issues, you will need to lower the quality and limit your FPS, or increase chunk. It is best to tweak your game's settings first.

Once you are comfortable with the program, it is recommended to go back up to **Finding my own settings for Chunk**.
!!!

+++ Nvidia

||| GPU
:::content-left
RTX xx90 (e.g. 3090)

RTX xx80 Ti (e.g. 3080 Ti)

RTX xx80 (e.g. 3080)

RTX xx70 Ti (e.g. 3070 Ti)

RTX xx70 (e.g. 3070)

RTX xx60 Ti (e.g. 3060 Ti)

RTX xx60 (e.g. 3060)

RTX xx50 (e.g. 3050)

GTX 16xx-series

GTX 10xx-series

GTX 900-series

MX 330
:::
||| Max Settings
:::content-center
30 - 60 ms chunk + 2.7s extra

30 - 60 ms chunk + 2.7s extra

100 - 120 ms chunk + 2.7s extra

50 - 80 ms chunk + 2.7s extra

50 - 80 ms chunk + 2.7s extra

50 - 90 ms chunk + 2.7s extra

60 - 90 ms chunk + 2.7s extra

110 - 130 ms chunk + 2.7s extra

140 - 180 ms chunk + 2.7s extra

200 ms chunk + 2.0s extra

250 ms chunk + 1.0s extra

500 ms chunk + 0.6s extra
:::
||| For gaming
:::content-right
perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 50 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 80 ms chunk

perf number + 80 ms chunk

perf number + 100 ms chunk
:::
|||

+++ AMD
||| GPU
:::content-left
7xxx XT cards

6xxx XT cards

5xxx XT cards

7xxx cards

6xxx cards

5xxx cards

RX 6600M

RX 580

RX 570

RX 560
:::
||| Max Settings
:::content-center
60 - 80 ms + 2.7s extra

70 - 100 ms + 2.7s extra

80 - 120 ms + 2.7s extra

*bugged* 256 ms + 2.7s extra

128 ms + 2.7s extra

140 - 200ms + 2.0s extra

128ms + 2.7s extra

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk
:::
||| For gaming
:::content-right
perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 40 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 60 ms chunk

perf number + 80 ms chunk
:::
|||
+++ AMD iGPU
||| GPU
:::content-left
AMD Radeon(TM) Graphics (with Ryzen 7 5800H)

AMD Radeon RX Vega 10 (with Ryzen 7 3700U)

AMD Radeon RX Vega 8 (with Ryzen 3 3200G)
:::
||| Chunk + Extra
256 ms + 2.7s extra

600 ms + 0.6s extra

700 ms + 1.0s extra
|||
+++ Mac & CPUs
||| Mac and CPU
Mac M1

Mac M1 Air

Mac M2

Mac M2 Air

Ryzen 7 5800x
||| F0 + Chunk + Extra
fcpe ; for chunk check the perf number and add 50 to it ; 1.0s extra

fcpe + 230ms + 2.7s extra

rmvpe_onnx + 650ms + 1.0s extra

fcpe ; for chunk check the perf number and add 50 to it ; 2.7s extra

rmvpe_onnx + 260 ms + 0.6s extra
|||
+++ 

***
## Extras
***

### Information

!!! What's the best choice for AMD users?
This fork is a lot better for AMD GPUs than the original w-okada. The original requires converting models to ONNX, which is annoying, requires more CPU and GPU resources, and has a lot more delay and other small inconveniences/bugs.

Example: on an AMD RX 6650 XT, the lowest latency on the original w-okada is 298 ms chunk. On this fork, the lowest latency is around 60 - 80 ms chunk
!!!

!!! Which is better for Nvidia: the original w-okada or Deiteris' fork?
Deiteris' fork is better for Nvidia users who normally use the prebuilt w-okada version, because this version uses GPU accelerated extra compared to the original which uses CPU.

For the RTX GPUs the delay performance differences are minimal, but quality performance is better. For older cards like GTX or MX, this fork performs better in all aspects.

Example: an Nvidia RTX 3070 reaches 170 - 213 ms chunk latency on the prebuilt w-okada, and 42 ms on a manually set up w-okada environment. On this fork it can reach 30 - 38 ms chunk latency, depending on the Extra value. Keep in mind these settings were tested to the max, without a video game or other intensive operations running in the background
!!!

***
### Reduce more Delay (Windows Only)
***
#### Prerequisite: Match Sample Rates (for both WASAPI & ASIO)
This first step is mandatory for both methods. You must select the same `sample rate` for your microphone and the virtual audio cable before proceeding.

!!!
If you don't know how to open your sound devices, press **WIN+R**, type "**mmsys.cpl**", then hit enter.
!!!

1.  Navigate to the `Recording` tab, right-click on your microphone, and select `Properties`.
2.  Go to the last tab, `Advanced`, and set the sample rate to **48000 Hz**.
3.  Ensure both options for **Exclusive Mode** are activated.

<img src="../wokada-deiteris-img/microphone-properties.png" alt="Microphone Properties" width="450" height="auto">

4.  Now, go to the `Playback` tab. Right-click on your virtual audio cable (e.g., Line 1) and go to `Properties`.
5.  In the `Advanced` tab, adjust the sample rate to match your microphone: **48000 Hz**.

<img src="../wokada-deiteris-img/vac-properties.png" alt="Virtual Audio Cable Properties" width="450" height="auto">

With the sample rates matched, you can now proceed to configure either WASAPI or ASIO.

+++ WASAPI
!!! What does WASAPI do?
WASAPI accesses your audio devices directly, while the driver that you use by default (which is "MME") *goes through multiple layers within the Windows audio subsystem*, causing more delay. This cuts total delay by about **50-80 ms**.
!!!

#### Enable WASAPI
Assuming you completed the prerequisite step, you can now select the correct inputs and outputs in the voice changer as follows:

- **AUDIO:** `SERVER`
- **S.R.:** Match the sample rate you chose above, which should be `48000`.
- **Input:** `[WINDOWS WASAPI] Your Microphone`
- **Output:** `[WINDOWS WASAPI] Your Virtual Audio Cable (e.g., Line 1)`

<img src="../wokada-deiteris-img/wasapi-server.png" alt="Wokada WASAPI Server Settings" width="600" height="auto">

!!!warning
You cannot use the noise suppression (`sup1`, `sup2`) or `echo` functions in `SERVER` mode.
!!!

Then, on your game or Discord, you select:

- **Input:** Your Virtual Audio Cable (e.g., Line 1 Output)
- **Output:** Your Headphones/Speakers

#### Common Errors
!!!danger sounddevice.PortAudioError: Error opening Stream: Invalid sample rate [PaErrorCode -9997]
You did not match the sample rate of your virtual audio cable to your microphone. Return to the prerequisite step and ensure both are set to the same value (48000 Hz).
!!!
+++ ASIO
!!!
I would recommend using WASAPI first if you are a normal user, as ASIO is more complex to set up.
!!!
!!! What does ASIO do?
Like WASAPI, ASIO accesses your audio devices directly, bypassing multiple layers within the Windows audio subsystem that "MME" (the default driver) has to go through. It has a lower algorithmic delay and can reduce total delay by **50-80ms**.
!!!

#### Step 1: Download and Install FlexASIO
- Download and run the installer from here: [FlexASIO Download](https://github.com/dechamps/FlexASIO/releases/download/flexasio-1.9/FlexASIO-1.9.exe)

#### Step 2: Download and Install FlexASIO GUI
- First, you need the .NET Desktop runtime. Download and install it from here: [.NET 6.x Desktop runtime](https://dotnet.microsoft.com/en-us/download/dotnet/6.0)
- Afterwards, download and install the FlexASIO GUI: [FlexASIO GUI Download](https://github.com/flipswitchingmonkey/FlexASIO_GUI/releases/download/v0.35/FlexASIO.GUIInstaller_0.35.exe)

#### Step 3: Configuring FlexASIO GUI
Run `FlexASIO GUI`. If it doesn't open, you missed installing the .NET runtime from the previous step. Copy the following settings:

- **Backend:** `Windows WASAPI`
- **Buffer Size:** ✅ Set to `256`
- **Input Device:** Select your Microphone.
- **Output Device:** Select your Virtual Audio Cable (e.g., Line 1).
- **Latency:** ✅ Set Input Latency: `0.2` ; ✅ Set Output Latency: `0.2`
- **Output:** ✅ Set ; ✅ AutoConvert

<img src="../wokada-deiteris-img/flexasio-gui.jpg" alt="FlexASIO GUI Configuration" width="600" height="auto">

!!! Latency Explanation
Having the input latency at 0.0 can make your microphone crackle. Using 0.1 often works fine. If you experience crackles, experiment with this value (e.g., 0.12, 0.15) until it stops. The lower you can go, the better. If you don't want to experiment, you can keep it at `0.2`.
!!!
!!!danger
Click **SAVE TO DEFAULT FLEXASIO.TOML**. Do not forget this step. You can close the GUI afterwards.
!!!
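
For a sense of scale, the buffer size can be converted to time. The Python sketch below assumes the Buffer Size field is measured in samples, which is an assumption about the GUI. `buffer_latency_ms` is a hypothetical helper name.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time span covered by one audio buffer, in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0
```

Under that assumption, a buffer of 256 at 48000 Hz covers about 5.3 ms of audio per buffer.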

#### Step 4: Setting it up on the voice changer
!!!warning
The Deiteris Fork works with ASIO, while some older versions of the original w-okada do not.
!!!
In the voice changer app:
- Select **AUDIO:** `Server`
- Select **S.R.:** `48000`
- Select the **input** and **output** from ASIO. You can select "ALL" in the first column to filter for ASIO devices to make it easier.
- **Ch.:** For both input and output, it's best to leave them on "default". The numbers are for true ASIO devices, which FlexASIO isn't.
- **Monitor:** Use a Windows WASAPI device. Windows DirectSound also works, but it may cause issues that matching the sample rates doesn't fix.

<img src="../wokada-deiteris-img/flexasio-server.png" alt="Wokada FlexASIO Server Settings" width="600" height="auto">

Then, on your game or Discord, you select:

- **Input:** Your Virtual Audio Cable (e.g., Line 1 Output)
- **Output:** Your Headphones/Speakers

#### Common Errors
!!!danger sounddevice.PortAudioError: Error opening Stream: Invalid sample rate [PaErrorCode -9997]
You did not match the sample rate of your virtual audio cable to your microphone. Return to the prerequisite step and ensure both are set to 48000 Hz.
!!!
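
If you are unsure which device is out of step, a small check like the one below can help. It is only a sketch: it expects a list of dictionaries shaped like the entries returned by `sounddevice.query_devices()` (the `name` and `default_samplerate` keys), and the device names in the example are made up.

```python
def find_rate_mismatches(devices, expected_rate=48000):
    """Return the names of devices whose default sample rate is not expected_rate."""
    return [
        device["name"]
        for device in devices
        if int(round(device["default_samplerate"])) != expected_rate
    ]


# Hypothetical sample data. On a real system you could pass
# list(sounddevice.query_devices()) instead (needs `pip install sounddevice`).
example_devices = [
    {"name": "Microphone (USB Mic)", "default_samplerate": 48000.0},
    {"name": "Line 1 (Virtual Audio Cable)", "default_samplerate": 44100.0},
]
print(find_rate_mismatches(example_devices))
```

Any device listed in the output is the one to change to 48000 Hz in the Windows sound settings.
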
+++

***
:::content-center
## Troubleshooting
:::

==- :icon-alert: Important: HAGS on Windows 10/11
If you are experiencing lag, stuttering, or slow training speeds, **disable HAGS** (Hardware-Accelerated GPU Scheduling) on Windows 10/11. It is known to interfere with VRAM management for Local AI apps. 
[Read the HAGS Glossary Entry](https://docs.aihub.gg/extra/glossary/#hags-hardware-accelerated-gpu-scheduling) for the full explanation and how to disable it.
===

==- :icon-terminal: Getting Detailed Server Logs (Debug Mode)
The web interface (client) is just a control panel; the actual voice conversion and backend processes happen on the server. If the server closes unexpectedly or you run into issues, you can launch it in **Debug Mode** to see the exact error logs before the window closes.

To do this, you need to append `--log-level debug` when launching the server.

+++ Windows (Command Prompt)
1. Open your extracted `MMVCServerSIO` folder.
2. Click on the File Explorer's address bar at the top, type `cmd`, and press **Enter**.
3. In the black window that appears, run the following command:
```cmd
MMVCServerSIO.exe --log-level debug
```
*(Alternatively, you can create a shortcut of the `.exe`, open its Properties, and add ` --log-level debug` to the end of the **Target** line).*

+++ Windows (PowerShell)
1. Open your extracted `MMVCServerSIO` folder.
2. Click on the File Explorer's address bar at the top, type `powershell`, and press **Enter**.
3. Run the following command (note the `.\` required by PowerShell):
```powershell
.\MMVCServerSIO.exe --log-level debug
```

+++ Mac & Linux
1. Open your Terminal and navigate to your `MMVCServerSIO` folder.
2. Run the executable with the debug flag:
```bash
./MMVCServerSIO --log-level debug
```
+++

Once the server runs or crashes, **copy all the text from the command line window**. Save it to a `.txt` file, or paste it to a site like [Pastebin](https://pastebin.com/) to share with others when asking for support.
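
If the window closes too fast to copy anything, you can capture the output in a file instead. The sketch below is one way to do that with Python's standard library; the helper name and the log location are our own choices, and the demo runs a harmless stand-in command so it works on any machine.

```python
import os
import subprocess
import sys
import tempfile


def run_and_save_log(command, log_path):
    """Run `command`, write its stdout and stderr to `log_path`, return the exit code."""
    with open(log_path, "wb") as log_file:
        completed = subprocess.run(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge error output into the same file
        )
    return completed.returncode


# Stand-in command so the sketch runs anywhere. For the real server, pass
# ["MMVCServerSIO.exe", "--log-level", "debug"] from inside the MMVCServerSIO folder.
demo_command = [sys.executable, "-c", "print('server log line')"]
log_path = os.path.join(tempfile.gettempdir(), "server_debug_log.txt")
exit_code = run_and_save_log(demo_command, log_path)
print("exit code:", exit_code, "log saved to:", log_path)
```

Note that while the output goes to the file, nothing is printed in the console, and the call only returns once the server has stopped or crashed.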

!!!info Other Log Levels
By default, the server runs on the `info` log level. While the server also supports `warning`, `error`, and `critical` levels, you should avoid using them for troubleshooting. They filter out background information, hiding the context developers need to figure out why your server crashed.
!!!
===

==- :icon-download: Failed to download or verify
- If you have a slow or unstable internet connection, the first launch may end with `Failed to download or verify: ...` followed by "Press Enter to continue". This means the pretrain download failed. There are 2 ways to fix it:
- **Method 1:** Retry with a better connection later.
- **Method 2:** 
  1. Go to the "pretrain" folder in the `MMVCServerSIO` folder.
  2. Delete everything inside it.
  3. Download the [Zipped Version of the Pretrained folder](https://github.com/Nick088Official/Wokada-Deiteris-Fork-Pretrain/releases/download/b2332/pretrain.zip)
  4. Extract the contents of `pretrain.zip` into the "pretrain" folder (ensure there is no nested "pretrain" folder).
  5. Run `MMVCServerSIO.exe` again.
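
The nested-folder mistake from Method 2 is easy to check for. This is a minimal sketch with a helper name of our own; the demo builds a throwaway folder that simulates the wrong layout, so on your machine you would pass the path to your real "pretrain" folder instead.

```python
import tempfile
from pathlib import Path


def find_nested_pretrain(pretrain_dir):
    """Return the nested 'pretrain' folder inside pretrain_dir, or None if there is none."""
    nested = Path(pretrain_dir) / "pretrain"
    return nested if nested.is_dir() else None


with tempfile.TemporaryDirectory() as root:
    pretrain = Path(root) / "pretrain"
    (pretrain / "pretrain").mkdir(parents=True)  # simulate a wrong extraction
    found = find_nested_pretrain(pretrain)
    # A non-None result means the files sit one level too deep and should be moved up.
    print("nested folder found:", found is not None)
```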
===

==- :icon-zap: Crackle Fix
1. Open Task Manager > Details.
2. Right-click `audiodg.exe` > Set Priority > **High**.
3. Right-click `audiodg.exe` > Set Affinity > Uncheck everything except **CPU 2** (only keep CPU 2 active).
4. *Automation Tip:* Use [Process Lasso](https://bitsum.com/) to automate this, or create a `.bat` file (you may need to run it as administrator) with: 
  `powershell "ForEach($PROCESS in GET-PROCESS audiodg) { $PROCESS.ProcessorAffinity=4; $PROCESS.PriorityClass='High' }"`
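
The `ProcessorAffinity=4` in that one-liner is a bitmask: bit N set means CPU N is allowed, and CPUs are counted from 0 as in Task Manager. The sketch below shows the arithmetic in case you want to pin `audiodg.exe` to a different core; the helper name is our own.

```python
def affinity_mask(cpu_indices):
    """Build a processor-affinity bitmask: bit N set means CPU N is allowed."""
    mask = 0
    for index in cpu_indices:
        mask |= 1 << index
    return mask


print(affinity_mask([2]))     # 4, i.e. only CPU 2, as in the one-liner
print(affinity_mask([0, 1]))  # 3, i.e. CPU 0 and CPU 1
```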
===

==- :studio_microphone: Discord Crackle Fix
- Ensure you have performed the general Crackle Fixes above first.
- If it only glitches in Discord:
  - Turn off **Echo Cancellation** in Discord.
  - Turn off **Noise Suppression** in Discord.
===

==- :icon-rocket: GPU Idling / Performance
1. In the W-Okada folder, run `force_gpu_clocks.bat` to keep GPU speeds steady.
2. Run `reset_gpu_clocks.bat` when you are finished using the app to return to normal GPU behavior.
===

==- :icon-alert: Pipeline not initialized
- Ensure you are on the latest OS version and GPU drivers.
- Ensure you have selected an RVC model in the UI *before* clicking "Start Server".
- Ensure your model name and folder path contain **no spaces or special characters**.
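
To spot a problematic name quickly, you can test the full path to the model. The sketch below assumes that "safe" means only ASCII letters, digits, underscores, hyphens and dots; that rule is our assumption, not something the voice changer documents, and the example path is made up.

```python
import re

# Assumption: only these characters count as safe in a folder or file name.
SAFE_PART = re.compile(r"^[A-Za-z0-9._-]+$")
DRIVE_LETTER = re.compile(r"^[A-Za-z]:$")


def unsafe_path_parts(path):
    """Return the folder or file names in `path` that contain spaces or special characters."""
    parts = [part for part in re.split(r"[\\/]", path) if part]
    return [
        part for part in parts
        if not DRIVE_LETTER.match(part) and not SAFE_PART.match(part)
    ]


# Hypothetical path for illustration.
print(unsafe_path_parts(r"C:\voice changer\model_dir\My Model (v2).pth"))
```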
===

==- :icon-bug: Silent Crash / XHR Poll Error (Pro Audio & DJ Hardware)
If you click "Start Server" in a Wokada fork and the command-line window closes instantly without an error, followed by `[SIO] rconnection failed Error: xhr poll error` pop-ups in the browser:

- **Possible Cause:** You have Professional Audio or DJ **hardware drivers** installed (e.g., Native Instruments Traktor Kontrol). When the voice changer uses `PortAudio` to scan for audio devices on startup, querying an ASIO/hardware driver while its physical device is **unplugged** causes a fatal low-level C crash. This kills the server instantly and silently. This was found at https://discord.com/channels/1159260121998827560/1488310350603616476

!!!info DAW Software is Safe
You do **not** need to uninstall your DAW (Traktor Pro 4, FL Studio, Ableton, etc.). The issue is strictly caused by the hardware drivers, not the music software itself.
!!!

**Solutions:**
- **Method 1 (Recommended): Disable the drivers**
  1. Right-click your Windows Start button and open **Device Manager**.
  2. Expand the **Sound, video and game controllers** category.
  3. Find your pro-audio hardware drivers (e.g., `Native Instruments Traktor Kontrol S8 Driver`).
  4. Right-click them and select **Disable device**. You can easily re-enable them later when you need to use your audio equipment.

- **Method 2: Uninstall the drivers**
  - If you no longer use the physical hardware, completely uninstall its drivers from your system. The voice changer should open normally afterwards.

- **Method 3: Plug the hardware in**
  - Plugging your physical DJ Controller or Audio Interface into your PC and turning it on before starting the voice changer may prevent the driver from crashing when scanned.

- **Method 4 (Advanced): Hide the ASIO driver in Registry**
  - If you suspect an ASIO driver is causing the issue and Method 1 didn't work, you can temporarily hide it from Windows. Open the Registry Editor (`regedit`), navigate to `Computer\HKEY_LOCAL_MACHINE\SOFTWARE\ASIO`, right-click the folder of your specific audio software, and rename it (e.g., add `.backup` to the end). Rename it back when you want to use it again.
===

==- :question: I couldn't find my answer.
- Report your issue [here](https://docs.aihub.gg/contributions).
===

***
## FAQ
***
### Why does it run in a browser and not its own window?

Because it uses a Web User Interface (WebUI) written in JavaScript and TypeScript. Most open-source AI programs are designed to run in the browser (though usually through something like Gradio), since a WebUI works both on the cloud and locally. The original Wokada also ran on a WebUI; it just opened it in its own window.

***
### Which Web Browser should I use?

It's best to try and test for yourself. Some people have had issues on Chrome, others on Firefox; it may depend on the settings you use and on how the browser handles the JavaScript/TypeScript interface. The browser most often reported to have issues is Opera GX, which is why we don't recommend it.

***
### Why are most Video Tutorials like on YouTube old? Is there going to be an updated one?

YouTube tutorials take far more time to make and, in this case, get outdated easily, since AI progresses fast and keeps changing for the better, with new settings and versions. Written guides are easier to update because you don't have to remake an entire video. It's unknown whether we will ever release a video, but if we do, it will be linked in this guide.

***
### Do I need an extremely expensive mic for good quality?

This was discussed in https://discord.com/channels/1159260121998827560/1159290161683767298/1352325982689951765 and https://discord.com/channels/1159260121998827560/1159290161683767298/1356265862704926907.
RVC downsamples your voice audio to 16 kHz because the f0 estimators only work at that sample rate; the model then outputs the result at its own original sample rate (without any upscaling). So there is no need for an extremely expensive mic; a decent one should do the job.
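
To put a number on that: a digital signal can only carry frequencies up to half its sample rate (the Nyquist limit). A quick arithmetic sketch, using the 48000 Hz rate from the setup steps as the example mic rate:

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate_hz / 2


print(nyquist_limit_hz(48000))  # 24000.0, what a 48 kHz recording can carry
print(nyquist_limit_hz(16000))  # 8000.0, what is left after downsampling to 16 kHz
```

In other words, whatever extra detail a high-end mic captures above roughly 8 kHz is discarded before the voice is analysed.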

***
### Are there unique Voice Models?
RVC Voice Models need to be trained on something, so the models themselves can't be unique, but you can use the [Merge Lab](https://docs.aihub.gg/realtime-voice-changer/local/deiteris-w-okada-fork/#merging-models-merge-lab) to create a new unique merged model.

***
### Is there a way to use Spin embedder rvc voice models?
The Wokada Deiteris Fork doesn't support models trained with the Spin embedder (of which there are very few), but there is a Pull Request for it, https://github.com/deiteris/voice-changer/pull/213, which is essentially the [Tg-Develop's Fork](https://docs.aihub.gg/realtime-voice-changer/local/tg-develops-w-okada-fork/).

***
###### ‎
:::content-center
#### `You have reached the end.`

[!badge variant="info" size="xl" corners="pill" icon="paper-airplane" iconAlign="right" text="Report Issues"](https://docs.aihub.gg/contributions/)
:::
