Arm Brings v9 to IoT, GenAI to Edge Devices

Arm has unveiled a new Cortex-A CPU core designed to bring generative AI to edge devices. The Cortex-A320 is the first Arm v9 core for the IoT, and paired with Arm’s Ethos-U85 NPU, it will enable generative and agentic AI use cases in IoT devices, including models with more than one billion parameters.
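To put the billion-parameter figure in perspective, here is a rough back-of-the-envelope calculation (not from Arm) of the memory needed just to hold the weights of a 1B-parameter model at different numeric precisions; activations, KV cache, and runtime overhead are excluded:

```python
# Weight storage for a 1-billion-parameter model at common precisions.
# Illustrative only; real deployments add activation and runtime memory.
PARAMS = 1_000_000_000
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```

Even at BF16 the weights alone approach 2 GiB, which is why the larger addressable memory discussed below matters for this class of model.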

“Just a few years ago, edge AI workloads were much simpler than today, focused on basic noise reduction or anomaly detection,” said Paul Williamson, senior VP and general manager for the IoT line of business at Arm. “But now the workloads have become much more complex, and we’re trying to meet the demands of much more sophisticated use cases.”

These use cases include big models and AI agents, he said.

“This isn’t just an incremental step forward, it represents a fundamental shift in how we’re approaching edge computing and AI processing, and we believe it’s going to drive forward the edge AI revolution for years to come,” Williamson said.

Upgrading to the Arm v9 architecture has given the Cortex-A320 better AI performance and stronger security features than its predecessor, the Cortex-A35, which is based on Arm v8. New instructions boost GEMM (general matrix multiplication) performance by an order of magnitude, and scalar compute is 30% faster. SVE2 (Scalable Vector Extension 2) is included for vector processing; it combines Arm’s Neon vector extensions with SVE, the company’s SIMD (single instruction, multiple data) instruction set. Support has been added for AI-friendly data types, including BF16. Up to four Cortex-A320 cores can be configured in a cluster.
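As a sketch of why BF16 counts as AI-friendly: it keeps FP32’s full 8-bit exponent (and therefore its dynamic range) while truncating the mantissa to 7 bits, so converting amounts to dropping the low 16 bits of the FP32 bit pattern. A minimal pure-Python illustration (not Arm code):

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float as its 32-bit pattern, then keep the top 16 bits:
    # BF16 = 1 sign bit + 8 exponent bits + 7 mantissa bits (truncated FP32).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bf16_bits_to_fp32(b: int) -> float:
    # Restore a BF16 pattern to FP32 by padding the low 16 bits with zeros.
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]
```

Round-tripping 3.14159 through BF16 yields 3.140625: the value loses precision but keeps its magnitude, which is the trade-off neural-network weights tolerate well.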

Crucially, as part of the new platform, the new CPU core will get the ability to directly drive the Ethos-U85 NPU, a capability that was previously reserved for Cortex-M cores. The NPU, which supports common transformer operations, can now access a bigger memory space via the A320, which is necessary for large model inference.

The Arm Cortex-A320 will allow the Ethos-U85 to access a bigger memory address space than the Cortex-M85, crucial for running large language models (Source: Arm)

“Systems with better memory access performance are becoming necessary to perform more complex use cases,” Williamson said. “Cortex-A processors address this challenge as they’ve got intrinsic support for larger addressable memory than Cortex-M-based platforms, and they are more flexible at handling multiple tiers of memory access latency.”
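The gap Williamson describes is easy to quantify: a 32-bit Cortex-M address space tops out at 4 GiB, while AArch64 Cortex-A cores support much wider physical addressing. The 40-bit width below is a common Armv8/v9 configuration used here as an assumption for illustration, not a published A320 specification:

```python
# Addressable memory as a function of address width, in GiB.
def addressable_gib(bits: int) -> float:
    return 2**bits / 2**30

m_class = addressable_gib(32)  # 32-bit Cortex-M-style addressing: 4 GiB
a_class = addressable_gib(40)  # 40-bit physical addressing: 1024 GiB (1 TiB)
```

Against the ~2 GiB of BF16 weights a billion-parameter model needs, the 4 GiB ceiling of a 32-bit system leaves little headroom, while a wider Cortex-A address space does not constrain the model at all.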

Used together, the Cortex-A320 and Ethos-U85 are expected to deliver around an 8× performance uplift versus the Cortex-M85 driving the same NPU, according to Arm.

The Cortex-A320 can also take advantage of Arm v9’s security features. Pointer authentication and branch target identification mitigate jump- and return-oriented programming attacks, and Arm’s memory tagging extension makes it harder for hackers to exploit memory safety issues, Williamson added.

Software

As a Cortex-A CPU, the A320 can take advantage of Arm’s AI kernel libraries for Cortex-A, collectively called KleidiAI.

There are many use cases where it might be efficient to run AI workloads on the CPU even if the system has an NPU, Williamson said. His example was a camera system that uses the NPU for always-on image processing, but then takes images that are flagged as interesting and processes them on the CPU with a small LLM.

“[In that case], it might be more efficient to just run it straight on the CPU where you don’t have the overhead of offloading to a neural processor and changing the context,” he said.

For these situations, the A320 needs optimized CPU-side AI performance. KleidiAI was introduced last year for Cortex-A cores in the client computing space, and the A320 will bring it to the IoT.

One of the key barriers to edge AI adoption is software development and deployment complexity. Arm has ensured software compatibility across Cortex-A cores so that existing code can be used on the A320.

The A320 is compatible with Linux and Android out of the box, but it also supports common real-time operating systems, so code developed for an MCU flow can be migrated to a system with a bigger memory address space if required. In this way, the A320 offers a path to future-proofing today’s Cortex-M-based AI workloads.

“This gives [developers] access to AI models that perhaps would have been out of reach for a real time system in the past,” Williamson said. “I think you’ll see some interesting completely new configurations, people stretching the boundary of what would have previously been done in a microcontroller, but also giving Linux-based developers optimized performance.”

Cortex-A320-based products are already under development with customers, and Williamson expects to see the core in silicon next year.
