Chipstrat

Tokens Per Second Per Watt: A Useful Metric for Edge AI

A systems-level perspective on performance and energy efficiency

Austin Lyons
Nov 23, 2024

Over recent years, the NPU has emerged as a power-efficient solution for AI workloads. 

In our previous posts on NPUs (1, 2), we covered the performance measurement du jour: TOPS. However, TOPS falls short because it doesn’t account for power consumption, which determines battery life.

For small language models (SLMs) at the edge, performance efficiency takes precedence over raw performance. Raw performance is irrelevant if it drains the battery too quickly. Therefore, performance per watt, measured as TOPS/W, becomes the important metric: can the required TOPS fit within the power budget?
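To make the power-budget question concrete, here is a minimal sketch. The function names and the numbers in the usage note are hypothetical and chosen only for illustration; they do not describe any real chip.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency: trillions of operations per second delivered per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


def fits_power_budget(required_tops: float,
                      efficiency_tops_per_watt: float,
                      budget_watts: float) -> bool:
    """Check whether the required TOPS can be delivered inside the power budget.

    Assumes efficiency stays constant across the operating range, which is a
    simplification: real chips are usually less efficient at higher clocks.
    """
    if efficiency_tops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    watts_needed = required_tops / efficiency_tops_per_watt
    return watts_needed <= budget_watts
```

For example, a hypothetical NPU rated at 40 TOPS while drawing 8 W works out to 5 TOPS/W. A workload needing 10 TOPS would then draw about 2 W and fit a 3 W budget, while one needing 20 TOPS would draw about 4 W and would not.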

Yet, TOPS/W doesn’t capture the user experience. The chip might be efficient, but is it delivering the inference speed users require?

For edge SLMs, responsiveness and speed matter most to the user. Responsiveness is measured by time to first token (TTFT), while speed is captured by tokens per second (TPS). TTFT affects perceived snappiness, and TPS determines whether the rest of the answer arrives quickly enough for the user.
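As a rough illustration, both metrics can be computed from per-token timestamps. The sketch below makes two assumptions: TPS counts only the tokens generated after the first one (conventions vary), and the headline metric is simply TPS divided by average power draw. The names `GenerationMetrics` and `measure` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenerationMetrics:
    ttft_s: float        # time to first token, in seconds
    tps: float           # tokens per second after the first token
    tps_per_watt: float  # TPS divided by average power (assumed definition)


def measure(request_time_s: float,
            token_times_s: Sequence[float],
            avg_power_w: float) -> GenerationMetrics:
    """Derive TTFT, TPS, and TPS/W from token arrival timestamps.

    request_time_s: when the prompt was submitted.
    token_times_s:  arrival time of each generated token, in order.
    avg_power_w:    average power draw during generation, in watts.
    """
    if not token_times_s:
        raise ValueError("need at least one token timestamp")
    if avg_power_w <= 0:
        raise ValueError("avg_power_w must be positive")

    ttft = token_times_s[0] - request_time_s

    # Decode rate: tokens after the first, over the time they took to arrive.
    decode_time = token_times_s[-1] - token_times_s[0]
    decoded = len(token_times_s) - 1
    tps = decoded / decode_time if decode_time > 0 else 0.0

    return GenerationMetrics(ttft, tps, tps / avg_power_w)
```

With made-up timestamps, `measure(0.0, [0.5, 0.75, 1.0, 1.25, 1.5], avg_power_w=2.0)` gives a TTFT of 0.5 s, 4 tokens/s, and 2 tokens/s/W. Since a watt is a joule per second, that last figure is equivalent to 2 tokens per joule.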

An edg…

© 2025 Austin Lyons