by Pavlo Hilei

AI at the Edge: Accelerate Inference 20.2x With CFU

Real-time AI applications are a trend and a necessity, especially in edge computing environments.

Recognizing this necessity, SoftServe spearheaded in-depth research presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024. There, a presentation by SoftServe R&D Engineer Pavlo Hilei focused on the innovative integration of custom function units (CFUs) with RISC-V processors.

This integration is a promising way to enhance AI inference capabilities directly at the edge, where it matters most. Inference speed is crucial because it determines which AI use cases are practical in production.

This article, based on SoftServe’s research, shows how software optimization alone can speed up inference by more than 10 times, and how the CFU delivers an additional 20.2 times acceleration on top of that.

Empower Edge Computing With Advanced AI Tools

Edge computing is increasingly preferred over traditional cloud environments because it delivers data processing results immediately, at or near the source of data acquisition. This shift is critical in scenarios where real-time processing and data privacy are paramount, such as autonomous vehicle navigation and personalized healthcare monitoring systems.

Expected directions in AI applications:

Edge versus cloud

By processing data locally, edge computing dramatically cuts down on latency and bandwidth usage, which is crucial for seamless real-time applications.

Hardware accelerators

Traditional accelerators like GPUs and TPUs have set the stage, but edge computing now demands more specialized solutions.

Role of RISC-V and CFU

Using the open RISC-V standard along with CFUs allows developers to create custom instructions tailored to specific processing tasks. This boosts computational efficiency and performance.

Rigorous Experimental Framework

To substantiate claims and evaluate the practicality of these technological solutions, a detailed experimental framework was used. It incorporated both innovative software tools and powerful hardware platforms.

A CFU is a RISC-V ISA extension. It allows CPU developers to add custom RISC-V instructions that offload some computation to dedicated hardware. In effect, developers can reimplement the slowest parts of the code entirely in hardware, which can yield a large performance gain.
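To make the idea concrete, the following Python sketch models in software what a custom instruction might compute. It assumes a hypothetical instruction, here called cfu_mac4, that takes two 32-bit register operands, treats each as four packed signed 8-bit values, and returns their dot product. The instruction name, the packing layout, and the choice of a multiply-accumulate operation are illustrative assumptions, not details taken from SoftServe's research.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def unpack_int8x4(word):
    """Unpack a 32-bit word into four signed 8-bit integers."""
    return struct.unpack("<4b", struct.pack("<I", word & 0xFFFFFFFF))

def cfu_mac4(rs1, rs2):
    """Hypothetical custom instruction: dot product of four packed int8 lanes.

    Real hardware would do this in a single instruction; this is only a
    software model of the result it would return.
    """
    return sum(a * b for a, b in zip(unpack_int8x4(rs1), unpack_int8x4(rs2)))

def dot_scalar(xs, ws):
    """Baseline: one multiply-accumulate per loop iteration."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w
    return acc

def dot_with_cfu(xs, ws):
    """Same result, issuing one modeled CFU call per four elements."""
    assert len(xs) == len(ws) and len(xs) % 4 == 0
    acc = 0
    for i in range(0, len(xs), 4):
        acc += cfu_mac4(pack_int8x4(xs[i:i + 4]), pack_int8x4(ws[i:i + 4]))
    return acc

# Illustrative int8 activations and weights (made-up values).
activations = [12, -7, 100, -128, 5, 0, -1, 127]
weights = [3, 4, -2, 1, -5, 9, 8, -1]
assert dot_scalar(activations, weights) == dot_with_cfu(activations, weights)
```

In this model the inner loop runs a quarter as many iterations as the scalar baseline. The speedup on real hardware depends on the CFU design and the surrounding code.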

Experimental setup:

Software ecosystem

Included software tools like TensorFlow Lite Micro for machine learning (ML) and RISC-V GCC Compiler for software development.

Hardware infrastructure

Featured the Xilinx Arty A7-100T FPGA board to experiment with the VexRiscV soft core implemented in a LiteX SoC. VexRiscV supports the CFU extension, which enabled hardware acceleration.

Evaluation of AI Models for Edge Deployment

SoftServe’s research focused on automatic modulation recognition (AMR), which is vital for optimizing communication systems.

AMR is the task of classifying the type of modulation used in wireless communication. Signal modulation is the process of modifying a signal, usually a wireless one, so that it can carry information from one point to another.
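As a rough illustration of what a modulation scheme is, the sketch below maps bits to complex baseband symbols for two common schemes, BPSK and QPSK. These two schemes and the function names are chosen for illustration only; the source does not list which modulation classes were studied. An AMR model would receive noisy samples of such symbols and have to infer which scheme produced them.

```python
import math

def bpsk_modulate(bits):
    """BPSK: one bit per symbol, mapped to -1 or +1 on the real axis."""
    return [complex(2 * b - 1, 0) for b in bits]

def qpsk_modulate(bits):
    """QPSK: two bits per symbol, mapped to four unit-magnitude points."""
    assert len(bits) % 2 == 0
    scale = 1 / math.sqrt(2)
    return [
        complex((2 * bits[i] - 1) * scale, (2 * bits[i + 1] - 1) * scale)
        for i in range(0, len(bits), 2)
    ]

bits = [0, 1, 1, 1, 0, 0]
bpsk_symbols = bpsk_modulate(bits)  # six symbols, one per bit
qpsk_symbols = qpsk_modulate(bits)  # three symbols, one per bit pair
```

The same bit stream yields differently shaped symbol sequences under each scheme, and that difference is what a classifier learns to recognize.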

Applications of this solution include cognitive radio, wireless network planning, and electronic warfare. Electronic warfare in particular requires near real-time performance, and AMR is one of the many important challenges in modern, drone-era military operations.

SoftServe evaluated the architectures of two deep-learning models.

CNN model

Fine-tuned to achieve an optimal balance between speed and accuracy. This makes it suitable for immediate deployment.

Transformer model

Although it achieved higher accuracy, its slower processing time made it less suitable for edge applications. On average, it delivers 2% higher accuracy at the cost of 10 times slower inference, which highlights the trade-off between accuracy and speed.

Deep-dive analysis of model performance:

Performance analysis

The CNN model performed robustly across various noise levels, which indicates reliability in dynamic, real-world conditions.

Efficiency considerations

The slower processing time of the transformer model underscored the need for models that deliver both high accuracy and efficiency. For more detail, see SoftServe’s ICASSP 2024 presentation, "Deep Learning AMR Model Inference Acceleration with CFU for Edge Systems."

Innovative Results in AI Acceleration

The acceleration tests provided concrete evidence that CFUs can dramatically reduce AI inference times, turning theoretical advances into tangible outcomes.

Acceleration highlights:

Software optimization techniques

Through quantization, a technique that also applies to large language models (LLMs), and model-specific software optimizations, the inference time of the models was reduced by more than 10 times.
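One common form of quantization, shown here as a minimal sketch, maps floating-point weights to 8-bit integers with a single scale factor. The symmetric per-tensor scheme, the function names, and the sample weights below are assumptions made for illustration; the source does not specify which quantization scheme was applied.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to int8 using one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 values back to approximate floating-point values."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.003, 0.9, -0.55]  # made-up example weights
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Integer weights take a quarter of the memory of 32-bit floats and allow cheaper integer arithmetic, at the price of a small, bounded rounding error.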

Hardware acceleration

CFUs significantly decreased the computational load, which cut inference times from several seconds to milliseconds. The CFU provided an additional 20.2 times acceleration, for an overall acceleration of 208 times with software and hardware optimizations combined.
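The reported figures can be cross-checked with simple arithmetic, since speedups applied in sequence multiply. The sketch below takes the 20.2 times and 208 times values from the text and derives the software-only factor they imply. The 2-second baseline is a hypothetical value chosen only to illustrate the scale, not a measurement from the research.

```python
cfu_speedup = 20.2        # hardware-only factor reported in the text
combined_speedup = 208.0  # software and hardware combined, reported in the text

# Sequential speedups multiply, so the software-only factor is the ratio.
implied_software_speedup = combined_speedup / cfu_speedup

# Hypothetical baseline (not from the source): 2 seconds per inference.
baseline_ms = 2000.0
after_software_ms = baseline_ms / implied_software_speedup
after_cfu_ms = after_software_ms / cfu_speedup

print(f"implied software speedup: {implied_software_speedup:.1f}x")
print(f"latency: {baseline_ms:.0f} ms -> {after_software_ms:.0f} ms -> {after_cfu_ms:.1f} ms")
```

The implied software factor comes out at roughly 10.3, which is consistent with the "more than 10 times" reported for software optimization.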

Chart Your Course for Next-Gen AI Solutions

The exploration of CFU-enhanced RISC-V processors demonstrates the potential of this technology in edge AI applications and sets the stage for future innovations. Improvements to the CFU extension that allow the CFU to access CPU memory would open up further possibilities.

Regardless of how plentiful those future innovations may be, they will redefine how people interact with technology throughout their day-to-day lives.

Future innovations:

Broaden applications

Further research will aim to generalize these accelerators to support a diverse array of AI models.

Enhance efficiency

Ongoing efforts will focus on the optimization of design and functionality to achieve greater performance gains.

Conclusion

As SoftServe leads the charge to transform innovative ideas into real-world solutions that enhance and simplify the daily human experience, this new research shows how nimbly we keep up with technological advancements.

The integration of CFUs with RISC-V processors shows that the right software optimization techniques, combined with custom hardware, significantly reduce AI model inference time. This allows you to run more advanced models on the edge while reducing costs and enabling new business cases.