The
world's first Text-to-Speech system to achieve production-quality audio under
aggressive low-precision compute, unlocking a new cost frontier and bringing
on-prem voice AI within reach for any enterprise, at any scale.
Smallest.ai
now runs Lightning V2, its real-time text-to-speech model, on Tenstorrent
hardware – marking the first production-grade TTS system to match the cost of
text tokens without degradation in audio quality.
This
move delivers 3.6× lower infrastructure cost with higher throughput and faster
response times than leading GPU alternatives. For enterprises running voice AI
at scale, this is not an incremental improvement. It is a structural shift in
what voice AI costs to deploy and operate.
Lightning
V2 is the first TTS system in the world to achieve production-quality audio
under aggressive low-precision compute — and it is available now, exclusively
on Tenstorrent, AI compute for any scale.
Built
for How Real-Time AI Actually Works
Tenstorrent's
architecture was designed for continuous, low-latency inference — the kind of
work that real-time voice demands. Data moves directly between compute cores
on-chip, without routing through external memory. For a voice model that
generates speech through iterative refinement, this translates into faster
responses and significantly more efficient execution.
Its
native support for low-precision formats such as BlockFloat8 makes it a natural
fit for the level of optimisation Lightning V2 requires — hardware and model
working in concert, not in compromise.
"Traditional
GPUs are built around massively parallel streaming multiprocessors — powerful
for throughput, but for single-stream, real-time inference like TTS, the
bottleneck isn't compute, it's memory movement. Tenstorrent's architecture is
fundamentally different.
We
use a Network-on-Chip connecting cores directly with large distributed SRAM,
and data moves core-to-core without round-tripping through DRAM. We've built
efficient AI hardware that reduces the cost of existing workflows and unlocks
applications previously not economically feasible."
—
Amr Elashmawi, VP of Strategy & Business Development, Tenstorrent
The
Engineering Behind This
This
is not just a port. Lightning V2 was rebuilt from the ground up for how
Tenstorrent moves data. Over 95% of the model now runs in reduced-precision
arithmetic (LoFi), with more than 80% operating in BlockFloat8 — achieving zero
audible degradation at production quality. It is the first TTS system in the
world to combine both at this scale without compromising voice.
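The core idea behind reduced-precision formats like BlockFloat8 can be sketched in a few lines: a small block of values shares a single exponent, so each value needs only a sign and a short mantissa. This is a generic illustration of block floating point, not Tenstorrent's exact BlockFloat8 layout or Lightning V2's pipeline; the block size and mantissa width here are assumptions.

```python
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=7):
    """Illustrative block floating-point quantization.

    Each block of values shares one exponent; per-value storage shrinks
    to a sign bit plus a short integer mantissa. A sketch only, not
    Tenstorrent's actual BlockFloat8 format.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.concatenate([x, np.zeros(pad)]).reshape(-1, block_size)

    # Shared exponent per block: chosen so the block's largest magnitude
    # still fits in the mantissa range.
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.where(max_mag > 0, max_mag, 1.0)
    exp = np.floor(np.log2(safe))
    scale = 2.0 ** (exp - (mantissa_bits - 1))

    # Round every value in the block to an integer multiple of the
    # block's scale, then reconstruct.
    out = (np.round(blocks / scale) * scale).flatten()
    return out[:len(x)] if pad else out

x = np.array([0.51, -0.23, 0.0401, 1.7, -0.0003])
q = bfp_quantize(x)   # values snapped to the block's shared grid
```

The trade-off this makes visible: values far smaller than the block maximum (like the −0.0003 above) lose relative precision, which is exactly why choosing *which* layers tolerate this format matters.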
Optimising
a voice model is a different problem from optimising a language model. A
language model produces tokens — discrete outputs where small numerical errors
are naturally absorbed. A voice model produces a continuous audio waveform:
tens of thousands of data points per second, each one shaping what a listener
hears.
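The contrast can be made concrete with a toy comparison (all numbers here are illustrative; nothing is taken from Lightning V2): a token decision survives small numerical noise because only the argmax matters, while the same noise added directly to a waveform becomes audible broadband hiss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Language model: output passes through argmax, so small numerical
# noise on the logits rarely changes which token wins.
logits = np.array([2.1, 8.7, 0.3, 5.2])
noisy_logits = logits + rng.normal(0, 0.05, logits.shape)
same_token = logits.argmax() == noisy_logits.argmax()   # noise absorbed

# TTS model: the output IS the signal. The same small noise lands
# directly on the waveform the listener hears.
t = np.linspace(0, 1, 24000, endpoint=False)        # 1 s at 24 kHz
clean = 0.5 * np.sin(2 * np.pi * 220 * t)           # a 220 Hz tone
noisy = clean + rng.normal(0, 0.05, t.shape)
snr_db = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
# snr_db lands around the high teens in dB: clearly audible hiss,
# even though the numerical perturbation is identical in scale.
```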
Push
the wrong part of the model into lower precision and the result isn't a
slightly worse output – it's robotic distortion, background hiss, or pitch that
collapses mid-sentence. The human ear catches what standard engineering metrics
completely miss.
During
development, the team encountered a layer that scored perfectly on every
numerical test and was still causing audible breakage. Finding it took over a
month. That depth of work — custom compute routines, layer-by-layer
experimentation, perceptual validation — is what makes the result possible.
Lightning V2 wasn't optimised until it broke. It was co-designed with the
hardware until both performed at their best.
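A toy version of that failure mode, with hypothetical numbers: an artifact that is negligible by mean-squared error yet sits exactly where the ear is sensitive. A frequency-domain view exposes what the aggregate metric averages away.

```python
import numpy as np

sr = 24000
t = np.arange(sr) / sr
reference = 0.5 * np.sin(2 * np.pi * 220 * t)

# A faint, steady 3 kHz whine: the kind of artifact a listener flags
# instantly, injected at an amplitude a numerical test shrugs off.
broken = reference + 0.005 * np.sin(2 * np.pi * 3000 * t)

mse = np.mean((broken - reference) ** 2)       # tiny: "passes" numerically
# The spectrum of the error concentrates all that energy in one bin.
spectrum = np.abs(np.fft.rfft(broken - reference))
peak_hz = np.argmax(spectrum) * sr / len(t)    # the whine's frequency
```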
"Tenstorrent's
architecture is fundamentally different than existing paradigms. Working with
the NoC and larger SRAM that define the data movement fluidity, we unlocked 4x
gains, driving a structural shift in inference economics."
—
Ranjith, Senior AI Inference Performance Engineer, Smallest.ai
The
Performance in Numbers
Supporting
550 simultaneous voice calls (assuming zero idle time between requests):
• On NVIDIA L40S: 11 GPUs — approximately $100,000
• On Tenstorrent P100: 27 accelerators — approximately $27,000 — 3.6× lower cost
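A back-of-envelope check of the figures above, using the rounded totals as quoted:

```python
calls = 550                 # simultaneous calls, zero idle time (as above)
l40s_total = 100_000        # 11 × NVIDIA L40S, approximate total
tenstorrent_total = 27_000  # 27 × Tenstorrent P100, approximate total

ratio = l40s_total / tenstorrent_total   # ≈ 3.7 with these rounded totals
per_call_gpu = l40s_total / calls        # ≈ $182 of hardware per concurrent call
per_call_tt = tenstorrent_total / calls  # ≈ $49 per concurrent call
```

The rounded totals imply roughly 3.7×; the 3.6× headline figure presumably reflects unrounded pricing.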
Voice
AI at Population Scale: Finally Possible
In
a future where billions of people speak to voice AI every day, the economics of
how it runs matters as much as the quality of how it sounds. Until now, the
organisations with the most to gain from deploying voice AI at scale have faced
the steepest barrier to doing so.
Regulated
industries — financial services, healthcare, and telecoms companies that need
data to stay on their own infrastructure can now meet GDPR, HIPAA, and
sovereign data requirements without the hardware cost making it impractical.
Growth-stage
companies — Teams that couldn't previously justify the capital spend can now
access the same quality of voice infrastructure as the largest players in their
market.
High-volume
operations — Contact centres, voice agents, and multilingual support operations
can scale without infrastructure cost becoming the ceiling on what's possible.
"Voice
AI has always been held back by the assumption that you need expensive
infrastructure to do it well. What we've built with Tenstorrent proves
otherwise."
—
Akshat Mandloi, CTO, Smallest.ai
Lightning
V3 Is Already on the Horizon
Smallest.ai's
next model, Lightning V3, already outperforms OpenAI, Cartesia, and ElevenLabs
on key voice quality benchmarks. The same co-design methodology is now being
applied to bring V3 to Tenstorrent — with early optimisation work pointing to
gains that could surpass what V2 has achieved.
Note:
Lightning V3 is not part of this release. Lightning V2 is available on
Tenstorrent now, with usage-based pricing and no upfront commitments.
About Smallest.ai
Smallest.ai
is a research-first Voice AI company building proprietary speech models and
production-grade voice agents for regulated enterprises. The company develops
state-of-the-art speech-to-text, text-to-speech, and real-time voice systems,
enabling end-to-end automation of high-volume conversations across support,
collections, onboarding, and servicing, without relying on third-party APIs.
Designed for financial
services and other regulated industries, Smallest.ai
is SOC 2, GDPR, HIPAA, and PCI compliant, supports on-prem and private cloud
deployments, and operates reliably in multilingual environments. Its platform
is used in production by enterprises across banking, insurance, BPO, and
telecommunications in the US and India.
About
Tenstorrent
Tenstorrent is an AI
compute company led by CEO Jim Keller — architect of Apple A4/A5, AMD Zen, and
Tesla's Full Self-Driving chip. The company builds RISC-V-based AI processors
and systems for developers, enterprises, and sovereign infrastructure
worldwide. In addition to servers and workstations, Tenstorrent licenses its
Ascalon RISC-V CPU and Tensix AI cores to chip designers including Samsung and
LG. Backed by Bezos Expeditions, Samsung, LG Electronics, Hyundai Motor Group,
Fidelity, and others, Tenstorrent has raised over $1B and operates from Santa
Clara, Austin, Toronto, Belgrade, Tokyo, and Bangalore.
tenstorrent.com
