Cloud-Native Gets An Edge with COM-HPC Server Modules and Manycore Arm SoCs

PICMG executive member ADLINK Technology is redefining the performance per watt equation in computer-on-module technology with COM-HPC Server Type modules based on Ampere Computing’s “cloud-native” manycore Arm processors. ADLINK Technology’s business development manager for embedded modules, Richard Pinnow, and Joe Speed, Head of Edge at Ampere Computing, explain.

PICMG: What is a “cloud-native processor” and why does industry need it?

SPEED, AMPERE: Ampere’s founders are the people who created processors for the cloud business at Intel. When you look at cloud, the basic exercise is I take each core of a CPU and sell it to a different customer. This is a gross oversimplification, but you get the idea.

To do this you need privacy – you need to know that one customer can’t see into another customer’s business. You need what’s called “freedom from interference,” so customers can run their own workloads with predictably fast response times regardless of what activity others are doing on the same processor.

Everything is virtualized, so each core is virtualized running in its own OS instance or, if they bought a bunch of cores, their application is containerized running different parts of their application on different cores. You need to be able to do this in an energy-efficient way because it’s not just about having more cores-per-socket, it’s about having more cores-per-rack-per-data center using less power.

I started working with Ampere years ago when I was at ADLINK and for me it was all about robotics, industrial, software-defined vehicles, and autonomous driving. So, I can get a predictably good response time, right? For me that’s like deterministic, real-time, low jitter, right? Freedom from interference? I can have my autonomous driving perception pipeline feeding sensor fusion data to localization, path planning, and control algorithms and I can pin all of those to separate cores and even run different kinds of software on different bundles of cores and have this very deterministic real time behavior.

So for me taking cloud-native to the edge just belongs.

PICMG: What’s driving demand for cloud-native technology?

PINNOW, ADLINK: It’s definitely scalability, cost efficiency, and time-to-market or agility.

Customers often require, from the application point of view, the ability to scale, and these services need to be very adaptive. Cloud-native processing allows this scalability, especially when we’re talking about the emphasis of microservices on containers, which allows for very dynamic scaling no matter the underlying hardware. And that’s not just focused on the traditional server market – it applies to the edge devices as well.

In terms of cost efficiency, cloud-native technologies enable customers to pay only when resources are allocated, which can lead to significant cost savings. The flexibility to allocate and deallocate resources on demand. And the third pillar is time-to-market, so it’s very important to rapidly develop PoCs and so on, and cloud-native processing really makes the best use of practices like continuous integration and continuous deployment concepts and pipelines. This is getting more and more important for edge devices as well, as it enables faster development cycles and quicker time to market. So the embedded market is adapting the concepts driven by cloud-native processing.

SPEED, AMPERE: There are stats that say 75% of all data is created at the edge, so you have to move the compute to where the data is. If 75% of data is being generated at the edge, you can sure eat a lot of cores pretty quickly, right? And the thing that was really eye opening for a lot of the cloud providers was computer vision because that one is especially greedy in terms of compute resources and communication bandwidth. Backhauling video to the cloud for processing is an expensive proposition. And for a lot of these use cases, by the time it hits the cloud, runs your analysis, and the decision comes back, it’s too late.

And a lot of these use cases talk about “connected” devices, but as you know, we should think of these as “usually connected” or “occasionally connected” devices. If it’s doing a safety function then it has to always work whether the Internet connection is working or not.

We’re Arm-based and we believe in the Arm architecture. A lot of the success of Arm comes from mobile devices that are power constrained by definition. The performance of most Arm products kind of falls off where we begin, and then you get into overlapping with x86, which is hot and hungry compared to what we do on embedded and the edge. With the Ampere Altra we’ve done some recent benchmarks running Yolo V8 AI streams on our chip and it does 4x as many frames per second per watt compared to a top-of-the-line Xeon D processor, so you can support four times as many cameras on the same power budget.

It’s an architecture that’s extremely energy-efficient and then you can scale up to where you’ve got 32, 64, 96, 128, or even 192 cores with our newest product, with each core being very energy efficient and having freedom from interference. They are single-threaded cores and they’re running at a fixed frequency so there’s none of this frequency scaling. That’s the deterministic part so that you get this predictable, low-latency performance. And then as you start loading more cores, everything runs at a fixed, predictable speed.

PICMG: What embedded edge application would make use of 128 cores?

PINNOW, ADLINK: Embedded applications that require 128 cores would typically involve tasks that demand extremely high levels of parallel compute. Some examples are edge devices such as advanced robotics, autonomous vehicles, and industrial automation systems that really benefit from, for instance, real-time computer vision, natural language processing, machine learning, and so on.

To analyze and react to very complex environments rapidly and do all those tasks in parallel

For instance, an autonomous drone might need to process multiple video streams to perform object detection on incoming camera data, simultaneously manage flight controls, and make navigation decisions, all in real time. 128 cores enables you to assign cores to very specific tasks and do all this in parallel without the “bad neighbor effect” where one core or schedule is impacting the calculation of another task or application. A lot of people are using discrete GPU solutions on the market because it allows you to do a lot of tasks in parallel. Here, you basically have the same with 128 cores of unparalleled resolution, and you can select which core is doing what, when. It’s great.

SPEED, AMPERE: When I was the Field CTO at ADLINK we helped launch this thing called “Project Cassini.” Project Cassini is about cloud server virtualization for embedded systems. My friend Garish Shirasat, who was leading the software-defined vehicle efforts over at Arm, had an idea, “What if we put that in a car? It would be Cassini on wheels.” So we used Ampere’s processor to build the developer platform for this software-defined vehicle platform.

Autonomous driving is an obvious workload for that, but what’s happening is all these automakers are working on future silicon they’ll get in a few years but that doesn’t help them develop today. So we took all this work around Project Cassini and worked out how to make it functionally safe and how to put it into cars. What happened is we developed what’s now the Ampere Altra Developer Platform – a 32- to 128-core Arm workstation for developers of Scalable Open Architecture for Embedded Edge (SOAFEE) – which is a big software-defined vehicle program for automakers and automotive tech companies. The Ampere Altra Developer Platform by ADLINK is the reference platform for that.

Figure 1. ADLINK performed eight weeks of testing on the Ampere Altra COM-HPC developer kit, including thermal shock and vibration to MIL-SPEC and validation out to +85 ºC and -45 ºC using a fanless heatsink.

Alternatively, take application code that’s been written for Raspberry Pi and things of that ilk. Jeff Geerling and Patrick Kennedy of Serve the Home did a benchmarking of this telco AI edge server with an Ampere processor from our friends at Super Micro and benchmarked it against a Raspberry Pi cluster. One of our chips was equal to 100 Raspberry Pi 4s in performance, but the interesting thing is this system with redundant 800 W power supplies was still 22% more energy efficient than the Raspberry Pi 4. We have companies working with us where they need to move so many GB per W at 1 W per core. It’s kind of a brilliant fit for those things.

PICMG: The Ampere Altra is a COM-HPC Server Type module and currently the only ADLINK product that supports a “cloud-native” processor, correct? Why that particular PICMG specification and form factor?

PINNOW, ADLINK: Yes, at the moment, the Ampere offering is solely available on COM-HPC and it’s a perfect match for edge computing systems that prioritize energy efficiency and require a lot of scalability. It is outperforming other platforms easily by three times less energy consumption. And from the I/O point of view, the CPU is providing a lot of PCI Express interfaces, way higher memory capacities, and higher bandwidth interfaces so that there are no constraints to get all those signals down to the carrier without a significant loss in signal integrity.

But there’s still a lot of flexibility to interchange from one Ampere Altra COM-HPC module to another or even to an x86 in case it is needed. And not just the hardware is standardized, but firmware is as well. I’m talking, for instance, about the IPMI interface you use to remotely manage COM-HPC devices, regardless if you’re talking to Arm silicon or x86 silicon. The demand driving this market is that application software is getting more and more independent from the underlying architecture.

COM-HPC Ampere Altra | COM-HPC Server Type | COM | ADLINK

Figure 2. The ADLINK Technology, Inc. Ampere Altra COM-HPC Server Type module equips 64 PCIe Gen 4 lanes and six DDR4 channels.

PICMG: COM-HPC was explicitly intended not to not be x86-centric. But we haven’t seen many non-x86-based COM-HPC modules make it to market yet. Why do you think that is?

PINNOW, ADLINK: x86 solutions are and will be complemented by more and more Arm offerings. This is market driven. We cannot avoid this. And the Ampere Altra is a good example.

Our entry-level and super-high-end COMs are Arm based already, Today, right? Now we’re seeing this Ampere Altra being the most powerful COM solution we have at the moment. A very good question is how fast it will cover the traditionally x86-dominated mid-performance market. It depends how fast and reliably a customer can port their existing code to execute regardless of the underlying hardware, which is a key trend. But containers hypervisors and flexible application frameworks are enablers to support this journey. And we see this happening already today. So I think it’s just a matter of time until we see more and more Arm flavors in the mid performance as well.

For more information visit www.ipi.wiki/products/com-hpc-ampere-altra to find some of ADLINK’s Ampere-based products as well as carrier reference designs, schematics, Ethernet OCP cards and design files, and even bill of materials.

Richard Pinnow is business development manager for embedded modules at ADLINK.

Joe Speed is head of Edge at Ampere.

ADLINK Technology, Inc.
www.adlinktech.com/en

Ampere Computing
https://amperecomputing.com