The
world's first Text-to-Speech system to achieve production-quality audio under
aggressive low-precision compute, unlocking a new cost frontier and bringing
on-prem voice AI within reach for any enterprise, at any scale.
Smallest.ai
now runs Lightning V2, its real-time text-to-speech model, on Tenstorrent
hardware – marking the first production-grade TTS system to match the cost of
text tokens without degradation in audio quality.
This
move delivers 3.6× lower infrastructure cost with higher throughput and faster
response times than leading GPU alternatives. For enterprises running voice AI
at scale, this is not an incremental improvement. It is a structural shift in
what voice AI costs to deploy and operate.
Lightning
V2 is the first TTS system in the world to achieve production-quality audio
under aggressive low-precision compute — and it is available now, exclusively
on Tenstorrent, AI compute for any scale.
Built
for How Real-Time AI Actually Works
Tenstorrent's
architecture was designed for continuous, low-latency inference — the kind of
work that real-time voice demands. Data moves directly between compute cores
on-chip, without routing through external memory. For a voice model that
generates speech through iterative refinement, this translates into faster
responses and significantly more efficient execution.
Its
native support for low-precision formats such as BlockFloat8 makes it a natural
fit for the level of optimisation Lightning V2 requires — hardware and model
working in concert, not in compromise.
"Traditional
GPUs are built around massively parallel streaming multiprocessors — powerful
for throughput, but for single-stream, real-time inference like TTS, the
bottleneck isn't compute, it's memory movement. Tenstorrent's architecture is
fundamentally different.
We
use a Network-on-Chip connecting cores directly with large distributed SRAM,
and data moves core-to-core without round-tripping through DRAM. We've built
efficient AI hardware that reduces the cost of existing workflows and unlocks
applications previously not economically feasible."
—
Amr Elashmawi, VP of Strategy & Business Development, Tenstorrent
The
Engineering Behind This
This
is not just a port. Lightning V2 was rebuilt from the ground up for how
Tenstorrent moves data. Over 95% of the model now runs in reduced-precision
arithmetic (LoFi), with more than 80% operating in BlockFloat8 — achieving zero
audible degradation at production quality. It is the first TTS system in the
world to combine both at this scale without compromising voice.
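The core idea behind reduced-precision formats like BlockFloat8 can be sketched in a few lines: a small block of values shares a single exponent, so each value needs only a sign and a short mantissa. This is a generic illustration of block floating point, not Tenstorrent's exact BlockFloat8 layout or Lightning V2's pipeline; the block size and mantissa width here are assumptions.

```python
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=7):
    """Illustrative block floating-point quantization.

    Each block of values shares one exponent; per-value storage shrinks
    to a sign bit plus a short integer mantissa. A sketch only, not
    Tenstorrent's actual BlockFloat8 format.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.concatenate([x, np.zeros(pad)]).reshape(-1, block_size)

    # Shared exponent per block: chosen so the block's largest magnitude
    # still fits in the mantissa range.
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.where(max_mag > 0, max_mag, 1.0)
    exp = np.floor(np.log2(safe))
    scale = 2.0 ** (exp - (mantissa_bits - 1))

    # Round every value in the block to an integer multiple of the
    # block's scale, then reconstruct.
    out = (np.round(blocks / scale) * scale).flatten()
    return out[:len(x)] if pad else out

x = np.array([0.51, -0.23, 0.0401, 1.7, -0.0003])
q = bfp_quantize(x)   # values snapped to the block's shared grid
```

The trade-off this makes visible: values far smaller than the block maximum (like the −0.0003 above) lose relative precision, which is exactly why choosing *which* layers tolerate this format matters.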
Optimising
a voice model is a different problem from optimising a language model. A
language model produces tokens — discrete outputs where small numerical errors
are naturally absorbed. A voice model produces a continuous audio waveform:
tens of thousands of data points per second, each one shaping what a listener
hears.
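The contrast can be made concrete with a toy comparison (all numbers here are illustrative; nothing is taken from Lightning V2): a token decision survives small numerical noise because only the argmax matters, while the same noise added directly to a waveform becomes audible broadband hiss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Language model: output passes through argmax, so small numerical
# noise on the logits rarely changes which token wins.
logits = np.array([2.1, 8.7, 0.3, 5.2])
noisy_logits = logits + rng.normal(0, 0.05, logits.shape)
same_token = logits.argmax() == noisy_logits.argmax()   # noise absorbed

# TTS model: the output IS the signal. The same small noise lands
# directly on the waveform the listener hears.
t = np.linspace(0, 1, 24000, endpoint=False)        # 1 s at 24 kHz
clean = 0.5 * np.sin(2 * np.pi * 220 * t)           # a 220 Hz tone
noisy = clean + rng.normal(0, 0.05, t.shape)
snr_db = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
# snr_db lands around the high teens in dB: clearly audible hiss,
# even though the numerical perturbation is identical in scale.
```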
Push
the wrong part of the model into lower precision and the result isn't a
slightly worse output – it's robotic distortion, background hiss, or pitch that
collapses mid-sentence. The human ear catches what standard engineering metrics
completely miss.
During
development, the team encountered a layer that scored perfectly on every
numerical test and was still causing audible breakage. Finding it took over a
month. That depth of work — custom compute routines, layer-by-layer
experimentation, perceptual validation — is what makes the result possible.
Lightning V2 wasn't optimised until it broke. It was co-designed with the
hardware until both performed at their best.
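A toy version of that failure mode, with hypothetical numbers: an artifact that is negligible by mean-squared error yet sits exactly where the ear is sensitive. A frequency-domain view exposes what the aggregate metric averages away.

```python
import numpy as np

sr = 24000
t = np.arange(sr) / sr
reference = 0.5 * np.sin(2 * np.pi * 220 * t)

# A faint, steady 3 kHz whine: the kind of artifact a listener flags
# instantly, injected at an amplitude a numerical test shrugs off.
broken = reference + 0.005 * np.sin(2 * np.pi * 3000 * t)

mse = np.mean((broken - reference) ** 2)       # tiny: "passes" numerically
# The spectrum of the error concentrates all that energy in one bin.
spectrum = np.abs(np.fft.rfft(broken - reference))
peak_hz = np.argmax(spectrum) * sr / len(t)    # the whine's frequency
```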
"Tenstorrent's
architecture is fundamentally different than existing paradigms. Working with
the NoC and larger SRAM that define the data movement fluidity, we unlocked 4x
gains, driving a structural shift in inference economics."
—
Ranjith, Senior AI Inference Performance Engineer, Smallest.ai
The
Performance in Numbers
Supporting
550 simultaneous voice calls (assuming zero idle time between requests):
• On NVIDIA L40S: 11 GPUs — approximately $100,000
• On Tenstorrent P100: 27 accelerators — approximately $27,000 — 3.6× lower cost
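A back-of-envelope check of the figures above, using the rounded totals as quoted:

```python
calls = 550                 # simultaneous calls, zero idle time (as above)
l40s_total = 100_000        # 11 × NVIDIA L40S, approximate total
tenstorrent_total = 27_000  # 27 × Tenstorrent P100, approximate total

ratio = l40s_total / tenstorrent_total   # ≈ 3.7 with these rounded totals
per_call_gpu = l40s_total / calls        # ≈ $182 of hardware per concurrent call
per_call_tt = tenstorrent_total / calls  # ≈ $49 per concurrent call
```

The rounded totals imply roughly 3.7×; the 3.6× headline figure presumably reflects unrounded pricing.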
Voice
AI at Population Scale: Finally Possible
In
a future where billions of people speak to voice AI every day, the economics of
how it runs matters as much as the quality of how it sounds. Until now, the
organisations with the most to gain from deploying voice AI at scale have faced
the steepest barrier to doing so.
Regulated
industries — financial services, healthcare, and telecoms companies that need
data to stay on their own infrastructure can now meet GDPR, HIPAA, and
sovereign data requirements without the hardware cost making it impractical.
Growth-stage
companies — Teams that couldn't previously justify the capital spend can now
access the same quality of voice infrastructure as the largest players in their
market.
High-volume
operations — Contact centres, voice agents, and multilingual support operations
can scale without infrastructure cost becoming the ceiling on what's possible.
"Voice
AI has always been held back by the assumption that you need expensive
infrastructure to do it well. What we've built with Tenstorrent proves
otherwise."
—
Akshat Mandloi, CTO, Smallest.ai
Lightning
V3 Is Already on the Horizon
Smallest.ai's
next model, Lightning V3, already outperforms OpenAI, Cartesia, and ElevenLabs
on key voice quality benchmarks. The same co-design methodology is now being
applied to bring V3 to Tenstorrent — with early optimisation work pointing to
gains that could surpass what V2 has achieved.
Note:
Lightning V3 is not part of this release. Lightning V2 is available on
Tenstorrent now, with usage-based pricing and no upfront commitments.
About Smallest.ai
Smallest.ai
is a research-first Voice AI company building proprietary speech models and
production-grade voice agents for regulated enterprises. The company develops
state-of-the-art speech-to-text, text-to-speech, and real-time voice systems,
enabling end-to-end automation of high-volume conversations across support,
collections, onboarding, and servicing, without relying on third-party APIs.
Designed for financial
services and other regulated industries, Smallest.ai
is SOC 2, GDPR, HIPAA, and PCI compliant, supports on-prem and private cloud
deployments, and operates reliably in multilingual environments. Its platform
is used in production by enterprises across banking, insurance, BPO, and
telecommunications in the US and India.
About
Tenstorrent
Tenstorrent is an AI
compute company led by CEO Jim Keller — architect of Apple A4/A5, AMD Zen, and
Tesla's Full Self-Driving chip. The company builds RISC-V-based AI processors
and systems for developers, enterprises, and sovereign infrastructure
worldwide. In addition to servers and workstations, Tenstorrent licenses its
Ascalon RISC-V CPU and Tensix AI cores to chip designers including Samsung and
LG. Backed by Bezos Expeditions, Samsung, LG Electronics, Hyundai Motor Group,
Fidelity, and others, Tenstorrent has raised over $1B and operates from Santa
Clara, Austin, Toronto, Belgrade, Tokyo, and Bangalore.
tenstorrent.com
