Large Language Models (LLMs) have drawn significant attention for their ability to generate text, and generative AI more broadly is reshaping core business functions. Progress in generative AI for chat interfaces and task automation is well documented, but applying the technology to voice-based conversational experiences has received far less attention.
Enter Large Speech Models (LSMs), the speech counterparts of LLMs. Modern contact centers still rely heavily on rigid conversational experiences such as Interactive Voice Response (IVR), and LSMs have the potential to transform them. IBM's LSMs, developed by IBM watsonx development teams and IBM Research, are built on transformer technology and use extensive training data and a large number of model parameters to achieve high speech-recognition accuracy. They are designed specifically for customer care use cases, including self-service phone assistants and real-time call transcription, where more accurate transcriptions lead directly to better customer experiences.
IBM has recently deployed new LSMs for English and Japanese, available exclusively in a closed beta to Watson Speech to Text and Watson Assistant phone customers. The release marks a significant milestone for voice-based interactions.
IBM's LSMs stand out in both accuracy and efficiency. In IBM's internal benchmarks, the new LSM achieved a Word Error Rate (WER) 42% lower than OpenAI's Whisper model on short-form English use cases. It is also more resource-efficient: it is 5x smaller than Whisper (5x fewer parameters) and runs 10x faster on the same hardware. Because the LSM supports streaming, it transcribes audio as it arrives and finishes as soon as the audio ends. Whisper, by contrast, uses block-mode processing, working through audio in fixed 30-second windows regardless of how long the audio actually is.
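For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The sketch below is a minimal, illustrative implementation, not IBM's benchmarking code. It assumes whitespace tokenization and lowercasing only; production scoring usually applies fuller text normalization. The `relative_reduction` helper is a hypothetical name showing how a figure like "42% lower" is derived from two WER values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Minimal sketch: lowercases and splits on whitespace only (assumption).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Rolling row of the Levenshtein table: prev[j] is the minimum number of
    # edits needed to turn the first i-1 reference words into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One inserted word ("the") against a five-word reference: WER = 1/5.
    print(word_error_rate("please transfer me to billing",
                          "please transfer me to the billing"))
    # Hypothetical WER values, not benchmark results: 0.10 vs 0.058.
    print(relative_reduction(0.10, 0.058))
```

Note that "42% lower" describes a relative reduction, so it depends on the baseline: the same relative gain corresponds to a larger absolute drop when the baseline WER is higher.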
In essence, IBM's LSMs represent a notable advance in voice-based AI, combining high accuracy and efficiency with clear potential for customer care scenarios. As the AI landscape evolves, models like these could move contact centers beyond rigid IVR systems toward more natural voice-based conversational experiences.