BeltoVox: The Ultimate User Guide
Everything you need to know to turn your voice into perfectly formatted, polished text in any application.
Welcome to BeltoVox, a professional-grade Windows dictation tool that uses AI transcription to type what you say into any application.
Getting Started: The API Keys
BeltoVox is a "Bring Your Own Key" (BYOK) application. Your data stays private, and you pay the AI providers directly, only for what you use.
Option A: Groq
Recommended - Fast & Free/Cheap
Groq is the fastest way to use BeltoVox. Transcriptions are nearly instantaneous.
- Sign Up: Go to Groq Console.
- Generate Key: Navigate to API Keys in the sidebar. Click Create API Key.
- Copy: Save the key (starts with gsk_).
- Pricing: Groq currently offers a generous Free Tier. For paid usage, it is typically priced by "tokens," but is extremely cost-effective.
Option B: OpenAI
Industry Standard
- Sign Up: Go to OpenAI Platform.
- Add Credits: OpenAI requires a prepaid balance (minimum $5). Go to Settings > Billing.
- Generate Key: Go to API Keys and click Create new secret key.
- Copy: Save the key (starts with sk-).
- Pricing:
  - Whisper (Speech): ~$0.006 per minute of audio.
  - GPT-4o-mini (Refining): Fractions of a cent per request.
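Because the two providers use different key prefixes, you can sanity-check which key you copied before pasting it into BeltoVox. The snippet below is only a sketch: `guess_provider` is a hypothetical helper, not part of BeltoVox, and it relies solely on the `gsk_` and `sk-` prefixes mentioned above.

```python
# Hypothetical helper, not part of BeltoVox.
# It only looks at the key prefixes described in this guide.
def guess_provider(api_key: str) -> str:
    """Guess which provider issued a key, based on its prefix."""
    key = api_key.strip()  # ignore stray whitespace from copy/paste
    if key.startswith("gsk_"):
        return "Groq"
    if key.startswith("sk-"):
        return "OpenAI"
    return "unknown"


print(guess_provider("gsk_example123"))  # Groq
print(guess_provider("sk-example123"))   # OpenAI
```

A result of "unknown" usually means the key was truncated or copied from the wrong page.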
The Dashboard: A Deep Dive
Right-click the tray icon and select Show Dashboard to access these tabs.
Tab 1: API Settings
- Provider Selection: Choose between OpenAI or Groq.
- Credentials: Paste your encrypted keys here.
Tab 2: Triggers & Output
- Global Keyboard Trigger: Set your favorite shortcut (Default: Ctrl + Space). This works even if BeltoVox is hidden.
- Mouse Triggers:
  - Button: Map dictation to Side Button 1, 2, or Middle Click.
  - Mode: Use Push-to-Talk (hold to speak) or Toggle (tap to start/stop).
- Status Overlay: Enable a floating circle that stays on top and shows if you are recording.
- Delivery Mode:
  - Direct Insert: Types the text instantly at your cursor.
  - History Only: Only saves the text to the dashboard for later manual copying.
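The difference between Push-to-Talk and Toggle is easiest to see as a tiny state machine. The sketch below illustrates the behavior described above; it is not BeltoVox's actual code, and the `TriggerState` class and the mode names `"push_to_talk"` and `"toggle"` are made up for the example.

```python
# Illustrative sketch only; class and mode names are hypothetical.
class TriggerState:
    """Tracks whether recording is active for a given trigger mode."""

    def __init__(self, mode: str):
        if mode not in ("push_to_talk", "toggle"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.recording = False

    def on_press(self) -> bool:
        if self.mode == "push_to_talk":
            self.recording = True                # holding starts recording
        else:
            self.recording = not self.recording  # each tap flips the state
        return self.recording

    def on_release(self) -> bool:
        if self.mode == "push_to_talk":
            self.recording = False               # letting go stops recording
        # In toggle mode, releasing the button changes nothing.
        return self.recording


ptt = TriggerState("push_to_talk")
print(ptt.on_press(), ptt.on_release())  # True False

tog = TriggerState("toggle")
print(tog.on_press(), tog.on_release(), tog.on_press())  # True True False
```

Push-to-Talk suits short phrases, while Toggle is more comfortable for longer dictation because you do not have to hold the button down.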
Tab 3: AI & Language
- Input Language: Choose your native language or use Auto-detect. We support English, Hungarian, German, Thai, and more.
- AI Processing Modes:
  - Standard: Literal transcription.
  - Refiner: Cleans grammar and removes "filler" words.
  - Professional: Rewrites your speech into a formal business tone.
  - Friendly: Perfect for quick, warm chat messages.
  - Translate to English: Automatically translates any foreign speech into English text.
- Smart Formatting: Handles complex punctuation and capitalization automatically.
Tab 4: Audio
- Microphone: Select specific hardware if you have multiple mics.
- Sample Rate: High quality (44.1kHz) or Optimized (16kHz). 16kHz is recommended for faster AI processing.
- Feedback: Adjust the Beep Volume or disable sounds entirely.
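To see why 16kHz can be faster, compare how much raw audio each setting produces. This is a rough sketch that assumes uncompressed 16-bit mono PCM; BeltoVox may encode or compress audio differently before sending it, so treat the numbers as an illustration of the ratio, not exact upload sizes. The `pcm_bytes` function is a hypothetical helper.

```python
# Assumption: uncompressed PCM, 2 bytes per sample, 1 channel (mono).
def pcm_bytes(sample_rate_hz: int, seconds: float,
              bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size in bytes of raw PCM audio for the given duration."""
    return int(sample_rate_hz * seconds * bytes_per_sample * channels)


optimized = pcm_bytes(16_000, 60)     # one minute at 16kHz
high_quality = pcm_bytes(44_100, 60)  # one minute at 44.1kHz

print(optimized)                           # 1920000
print(high_quality)                        # 5292000
print(round(high_quality / optimized, 2))  # 2.76
```

Under these assumptions, one minute at 44.1kHz carries roughly 2.76 times as much data as the same minute at 16kHz.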
Tab 6: Usage & Stats
- Recording Time: Total time spent speaking.
- API Total Requests: How many times the AI was called.
- Est. Spend: A calculated estimate of your cost (using standard OpenAI/Groq rates) to help you track your budget.
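If you want to cross-check the Est. Spend figure yourself, the arithmetic looks roughly like this. It is a sketch, not BeltoVox's actual formula: it uses the ~$0.006 per minute Whisper rate quoted earlier in this guide, and because the guide gives no exact per-request refining cost, `refine_usd_per_request` is left as a parameter you fill in from your provider's current pricing.

```python
WHISPER_USD_PER_MINUTE = 0.006  # approximate OpenAI rate quoted in this guide


def estimate_spend(recording_seconds: float, requests: int,
                   refine_usd_per_request: float = 0.0) -> float:
    """Rough cost estimate in USD (hypothetical helper, not BeltoVox code)."""
    speech = (recording_seconds / 60) * WHISPER_USD_PER_MINUTE
    refining = requests * refine_usd_per_request
    return round(speech + refining, 4)


# 90 minutes of dictation, transcription only:
print(estimate_spend(90 * 60, requests=0))  # 0.54
```

Pass your own per-request rate as the third argument to include refining costs in the total.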
Visual Indicators (Status Overlay)
The floating circle changes color to show the current state, so you can stay informed without opening the app:
- Ready to listen.
- Capturing your voice.
- Recording is on hold.
- AI is typing or refining.
Quick Control & System Tray
Left-click Tray Icon
Opens a compact "Quick Control" window. This is perfect for when you want a minimal interface while working.
Right-click Tray Icon
- Quickly switch input languages.
- Access Settings.
- Open "Launch on Startup" options.
Pro Tips for Better Accuracy
Short Bursts
AI works best with clips between 5 seconds and 1 minute.
Clear Environment
Background noise can confuse the "Refiner" mode.
Administrator Rights
If your hotkey doesn't work in certain apps (like Task Manager), run BeltoVox as Administrator.