neo.one Model Latency Performance

Dubverse’s neo.one model demonstrates exceptional performance in Text-to-Speech (TTS) generation, offering a balance of speed and quality comparable to industry-leading neural voice models. Here’s a detailed breakdown of its latency metrics and key features.

Understanding TTS Latency

In the context of TTS technology, latency refers to the delay between receiving the input text and generating the audio output. It’s crucial to measure and optimize the various components of latency for the best user experience.

Components of Latency

  1. Network Latency: The time it takes for your request to reach Dubverse’s servers.
  2. Time to First Byte (TTFB): The time from initiating the API request to receiving the first byte of audio.
  3. Audio Synthesis Latency: The time it takes to generate the complete audio response.

The latency for the first couple of requests might be higher due to cold start times. Subsequent requests will typically show improved performance.

neo.one Latency Metrics

Characters per Second

  • neo.one: 276.13 (average)
  • Industry Comparison:
    • Amazon Polly (Neural): 459
    • LMNT: 337
    • Microsoft Azure (Neural): 292
    • Google Cloud TTS (Studio): 287

Speed Factor

  • neo.one: 23.80x (average)
  • Industry Comparison:
    • Amazon Polly (Neural): 27.52x
    • LMNT: 20.49x
    • Google Cloud TTS (Studio): 17.23x
    • Microsoft Azure (Neural): 17.02x
    • Cartesia: 6.9x

neo.one’s speed factor is highly competitive, outperforming LMNT, Google Cloud TTS, Microsoft Azure, and Cartesia, and trailing only Amazon Polly in this comparison.

These metrics are based on the report published by Artificial Analysis on October 17, 2024. For the full comparison, visit Artificial Analysis TTS Comparison.

Time to First Byte (TTFB)

  • neo.one: 193.33 ms (average)

Key Features for Optimizing Latency

Streaming Support

neo.one supports audio streaming, allowing you to begin playback as soon as the first audio chunk is received. This significantly reduces perceived latency, especially for longer texts.

Example of streaming implementation:

import time

import requests


def get_stream(url, load, headers=None):
    """Stream a TTS response, yielding (audio_chunk, elapsed_seconds) tuples.

    The final (ttfb, total_bytes) pair is returned from the generator body,
    so callers can read it from the StopIteration raised when the stream ends.
    """
    s = requests.Session()
    start_time = time.time()
    ttfb = None
    total_bytes = 0

    with s.post(url, json=load, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        for data in resp.iter_content(chunk_size=1024 * 8):
            if ttfb is None:
                # Time to First Byte: the first audio chunk has arrived
                ttfb = time.time() - start_time
            total_bytes += len(data)
            yield data, time.time() - start_time

    return ttfb, total_bytes
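
Because the (ttfb, total_bytes) pair is returned from the generator body, a caller retrieves it from the StopIteration raised when the stream ends. Below is a minimal consumer sketch built on get_stream; the endpoint URL, payload, and headers are left to the caller:

def consume_stream(url, load, headers=None):
    # Collects every chunk and returns (audio_bytes, ttfb, total_bytes, total_time).
    # A real player would feed each chunk to the audio device instead of buffering.
    gen = get_stream(url, load, headers)
    chunks = []
    elapsed = 0.0
    while True:
        try:
            data, elapsed = next(gen)
            chunks.append(data)
        except StopIteration as done:
            ttfb, total_bytes = done.value
            break
    return b"".join(chunks), ttfb, total_bytes, elapsed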

Customizable Bitrate

neo.one allows you to adjust the bitrate of the audio output, balancing between audio quality and file size. Lower bitrates can reduce transmission time, further minimizing latency.

Flexible Audio Formats

Choose from various audio formats to optimize for your specific use case, considering factors like quality, file size, and compatibility.
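
As a purely illustrative sketch of how such settings would be passed in the request payload (the load dictionary used throughout this page), the audio_format and bitrate keys below are hypothetical names, not confirmed fields of the Dubverse API; check the API reference for the fields actually supported by your plan.

# Illustrative only: "audio_format" and "bitrate" are hypothetical key names.
load = {
    "text": "Hello from neo.one!",
    "speaker_no": 1,
    "config": {
        "audio_format": "mp3",   # pick a format suited to your playback stack
        "bitrate": 64,           # kbps; lower bitrates shrink the payload and cut transfer time
    },
}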

Measuring Performance

To accurately measure neo.one’s performance, we provide a comprehensive testing script that calculates various metrics. You can use this script to replicate our results and test the performance in your own environment.

To use this script:

  1. Replace 'YOUR_API_KEY_HERE' with your actual API key.
  2. Run the script to test neo.one’s performance.
  3. The script will output various performance metrics, allowing you to compare with our reported averages.

Remember to handle your API key securely and not share it publicly.
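
The full testing script is available from Dubverse; as a rough stand-in, the sketch below uses the get_stream and consume_stream helpers from above to compute TTFB, total generation time, and characters per second. The endpoint URL and authorization header name are placeholders, not documented values.

API_URL = "https://<your-dubverse-tts-endpoint>"   # placeholder endpoint
HEADERS = {"Authorization": "YOUR_API_KEY_HERE"}   # placeholder header; replace with your actual API key


def benchmark(text, speaker_no=1):
    load = {"text": text, "speaker_no": speaker_no, "config": {}}
    audio, ttfb, total_bytes, total_time = consume_stream(API_URL, load, HEADERS)

    print(f"TTFB: {ttfb * 1000:.2f} ms")
    print(f"Total generation time: {total_time:.2f} s")
    print(f"Characters per second: {len(text) / total_time:.2f}")
    # Speed factor = audio duration / generation time; deriving the duration
    # requires decoding the returned audio (e.g. with a library such as mutagen).
    return audio


if __name__ == "__main__":
    benchmark("Dubverse neo.one latency benchmark sentence.")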

Customizable Configuration

neo.one allows you to adjust various parameters to optimize for your specific use case. In the load dictionary of the test script, you can modify:

  • text: The input text for TTS conversion
  • speaker_no: The ID of the speaker voice
  • config: Additional configuration options, such as streaming settings

Experiment with these parameters to find the optimal configuration for your use case.
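
For instance, a quick sweep over a few candidate payloads shows how these choices affect TTFB and total time. The sketch below reuses consume_stream and the placeholder API_URL and HEADERS from the benchmarking sketch above; the config contents are left empty because the supported keys depend on your plan and the API reference.

candidate_loads = [
    {"text": "Short test sentence.", "speaker_no": 1, "config": {}},
    {"text": "Short test sentence.", "speaker_no": 2, "config": {}},
    # Add variants of "config" (e.g. streaming or format settings) once you
    # have confirmed the supported keys in the API reference.
]

for load in candidate_loads:
    _, ttfb, total_bytes, total_time = consume_stream(API_URL, load, HEADERS)
    print(f"speaker_no={load['speaker_no']}: "
          f"TTFB={ttfb * 1000:.1f} ms, total={total_time:.2f} s, bytes={total_bytes}")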

Tips for Minimizing Latency

  1. Text Chunking: For long texts, split content into smaller chunks and process them sequentially. This allows for faster initial playback (see the sketch after this list).

  2. Optimize Server Location: Choose the server closest to your primary user base to reduce network latency.

  3. Caching: Implement caching strategies for frequently used phrases or responses to eliminate processing time for repeated content.

  4. Parallel Processing: For applications requiring multiple TTS conversions, consider implementing parallel API calls to reduce overall processing time.
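
As referenced in the chunking tip above, here is a minimal sketch of splitting text at sentence boundaries and streaming the chunks one after another, so playback can begin as soon as the first chunk’s audio arrives. It reuses get_stream plus the placeholder API_URL and HEADERS from the benchmarking sketch; the chunk size is an arbitrary example value.

import re


def chunk_text(text, max_chars=250):
    # Split on sentence boundaries, then pack sentences into chunks of up to max_chars.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def stream_long_text(text, speaker_no=1):
    for chunk in chunk_text(text):
        load = {"text": chunk, "speaker_no": speaker_no, "config": {}}
        for data, _elapsed in get_stream(API_URL, load, HEADERS):
            yield data   # hand each audio chunk to the player as it arrives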

Conclusion

neo.one offers competitive latency performance comparable to leading neural TTS models. With features like streaming support, customizable configurations, and flexible audio formats, it provides a robust solution for applications requiring responsive and high-quality TTS capabilities.

For more information on leveraging neo.one’s performance in your projects, refer to our API documentation or contact our support team at [email protected] for custom integration solutions.

Performance Variations

Please note that the reported performance metrics may vary depending on your current plan and usage. Factors such as server location, network conditions, and concurrent requests can affect actual performance. For the most accurate assessment of neo.one’s performance for your specific use case, we recommend running tests using your own infrastructure and typical workload.

If you’re experiencing performance issues or need guidance on optimizing neo.one for your specific needs, please contact our support team at [email protected].