Untether AI introduces speedAI architecture

At this week’s Hot Chips 2022, Untether AI announced its second-generation at-memory computation architecture for AI workloads. The speedAI architecture delivers 2 PetaFlops of performance at 30 TFlops per watt.

It is designed to meet the neural network demands of AI in a variety of markets, including financial technology, smart city and retail, natural language processing, autonomous vehicles, and scientific applications. These demanding applications require increasing levels of accuracy to ensure safety and quality of results, said the company.

Untether AI’s second-generation speedAI architecture improves energy efficiency, throughput, accuracy, and scalability to a level that the company claims is unmatched by any other inference offering available today.

At-memory compute is significantly more energy efficient than traditional von Neumann architectures, said the company, with more TFlops performed for a given power envelope.

The speedAI architecture dramatically improves upon the first generation (runAI) by delivering 30 TFlops per watt. This energy efficiency is a product of the second-generation at-memory compute architecture, over 1,400 optimised RISC-V processors with custom instructions, energy-efficient dataflow, and the adoption of a new FP8 datatype, which quadruples efficiency compared to runAI.

The first member of the family, the speedAI240 device, provides 2 PetaFlops of FP8 performance and 1 PetaFlop of BF16 performance. This translates into industry-leading performance and efficiency on neural networks like BERT-base, which speedAI240 can run at over 750 queries per second per watt, 15x greater than the current state of the art from leading GPUs, said Untether AI.
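As a rough sanity check on the headline figures, the arithmetic below divides the quoted peak FP8 throughput by the quoted efficiency, and derives the GPU figure implied by the "15x" claim. It is only a sketch: it assumes the peak throughput and the peak efficiency apply at the same operating point, which the announcement does not state, and the variable and function names are illustrative.

```python
# Figures quoted in the announcement.
PEAK_FP8_FLOPS = 2e15               # 2 PetaFlops of FP8 performance
EFFICIENCY_FLOPS_PER_WATT = 30e12   # 30 TFlops per watt
BERT_QPS_PER_WATT = 750             # "over 750 queries per second per watt"
CLAIMED_GPU_ADVANTAGE = 15          # "15x greater" than leading GPUs


def implied_power_watts(peak_flops: float, flops_per_watt: float) -> float:
    """Power draw implied if peak throughput and peak efficiency coincide.

    That coincidence is an assumption made for illustration only.
    """
    return peak_flops / flops_per_watt


def implied_gpu_qps_per_watt(qps_per_watt: float, advantage: float) -> float:
    """GPU queries/s/W implied by the vendor's comparison."""
    return qps_per_watt / advantage


power = implied_power_watts(PEAK_FP8_FLOPS, EFFICIENCY_FLOPS_PER_WATT)
gpu_qps = implied_gpu_qps_per_watt(BERT_QPS_PER_WATT, CLAIMED_GPU_ADVANTAGE)

print(f"Implied power at peak: {power:.1f} W")
print(f"Implied GPU BERT-base efficiency: {gpu_qps:.0f} queries/s/W")
```

Under that assumption the device would draw on the order of tens of watts at peak, and the comparison implies a GPU baseline of roughly 50 queries per second per watt.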

Each memory bank of the speedAI architecture has 512 processing elements with direct attachment to dedicated SRAM. These processing elements support INT4, FP8, INT8, and BF16 datatypes, along with zero-detect circuitry for energy conservation and support for 2:1 structured sparsity. The processing elements are arranged in eight rows of 64, and each row has its own dedicated row controller and hardwired reduce functionality to allow flexibility in programming and efficient computation of transformer network functions such as Softmax and LayerNorm. The rows are managed by two RISC-V processors with over 20 custom instructions designed for inference acceleration. The flexibility of the memory bank allows it to adapt to a variety of neural network architectures, including convolutional, transformer, and recommendation networks as well as linear algebra models.
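To see why a per-row reduce operation matters for functions such as Softmax, the sketch below computes a numerically stable softmax over one 64-element row. It is plain Python written for illustration, not Untether AI's implementation; the function name and the mapping of one value per processing element are assumptions. The point is that two of the four steps are reductions across the whole row (a maximum and a sum), which is the kind of work a hardwired reduce could serve.

```python
import math

ROW_WIDTH = 64  # processing elements per row, as stated in the article


def row_softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of values.

    Hypothetical mapping: one value per processing element.
    """
    if len(row) != ROW_WIDTH:
        raise ValueError(f"expected {ROW_WIDTH} values, got {len(row)}")
    row_max = max(row)                           # reduction 1: max across the row
    exps = [math.exp(x - row_max) for x in row]  # elementwise, no communication
    total = sum(exps)                            # reduction 2: sum across the row
    return [e / total for e in exps]             # elementwise scale


probs = row_softmax([float(i) for i in range(ROW_WIDTH)])
print(f"sum of probabilities: {sum(probs):.6f}")
```

LayerNorm has the same shape: a mean and a variance are both row-wide reductions, followed by elementwise scaling.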

Two FP8 formats are claimed to provide the best mix of precision, range, and efficiency. A 4-bit mantissa version (FP8p for “precision”) and a 3-bit mantissa version (FP8r for “range”) were found to provide the best accuracy and throughput for inference across a variety of different networks. For both convolutional networks like ResNet-50 and transformer networks like BERT-Base, Untether AI’s implementation of FP8 results in less than 1/10th of one per cent of accuracy loss compared to using BF16 data types, with a four-fold increase in throughput and energy efficiency.
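The precision-versus-range trade-off between the two formats can be illustrated with a small calculation. The sketch below is built on assumptions that the article does not confirm: one sign bit, so that the 4-bit mantissa format has 3 exponent bits and the 3-bit mantissa format has 4; an IEEE-style exponent bias; and no exponent codes reserved for infinities or NaNs. Untether AI's actual encodings may differ, and the class and property names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniFloat:
    """An 8-bit float layout: 1 sign bit + exp_bits + man_bits (assumed)."""
    name: str
    exp_bits: int
    man_bits: int

    @property
    def bias(self) -> int:
        # IEEE-style bias; an assumption, not a published Untether AI value.
        return 2 ** (self.exp_bits - 1) - 1

    @property
    def relative_step(self) -> float:
        # Spacing between adjacent normal values, relative to their magnitude.
        return 2.0 ** -self.man_bits

    @property
    def max_value(self) -> float:
        # Largest finite value, assuming no codes are reserved for inf/NaN.
        top_exp = (2 ** self.exp_bits - 1) - self.bias
        return (2.0 - self.relative_step) * 2.0 ** top_exp

    @property
    def min_normal(self) -> float:
        # Smallest positive normal value.
        return 2.0 ** (1 - self.bias)


FP8P = MiniFloat("FP8p", exp_bits=3, man_bits=4)  # assumed layout
FP8R = MiniFloat("FP8r", exp_bits=4, man_bits=3)  # assumed layout

for fmt in (FP8P, FP8R):
    print(f"{fmt.name}: relative step {fmt.relative_step}, "
          f"normal range {fmt.min_normal} to {fmt.max_value}")
```

Under these assumptions the extra mantissa bit halves the relative rounding step, while the extra exponent bit widens the representable range by more than two orders of magnitude, which matches the "precision" and "range" naming.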

The speedAI240 device is designed to scale to large models. The memory architecture is multi-levelled, with 238 Mbytes of SRAM dedicated to the processing elements offering 1 Petabyte per second of memory bandwidth, four 1 Mbyte scratchpads, and two 64-bit wide ports of LPDDR5, providing up to 32 Gbytes of external DRAM. Host and chip-to-chip connectivity is provided by high-speed PCI Express Gen5 interfaces.

The imAIgine software development kit provides a path to running networks at high performance, with push-button quantisation, optimisation, physical allocation, and multi-chip partitioning. The imAIgine SDK also provides an extensive visualisation toolkit, cycle-accurate simulator, and a runtime API and is available now.

The speedAI devices will be offered as standalone chips and as M.2 and PCI Express form factor cards. Sampling is expected to begin in the first half of 2023.

http://www.untether.ai


Keysight moves Open RAN architect tests to the cloud

To accelerate deployment and increase flexibility, the Keysight Open Radio Access Network (Open RAN or O-RAN) Architect test suite, or KORA, is moving to cloud-based deployment. 

The KORA suite verifies the functionality and performance of an end-to-end O-RAN wireless infrastructure. A component of KORA is Keysight’s LoadCore 5G core (5GC) testing software, which is now available as a metered, pay-as-you-go model in AWS Marketplace. This will allow users to scale costs with usage, explained the company.

AWS Marketplace is a digital catalogue that enables customers to find, buy, deploy and manage third-party software, data and services to build solutions and run their businesses on Amazon Web Services (AWS).

The LoadCore software enables customers to perform network capacity tests, measure device data throughput and model a variety of end user behaviours and mobility scenarios. LoadCore is scalable and can verify that delivered connectivity service remains stable under various demanding conditions such as sudden spikes in network usage caused by disasters or other major events.

“As a cloud-based pay-as-you-go offering, users can purchase and provision KORA test solutions immediately to fit their testing needs,” said Kalyan Sundhar, vice president and general manager of 5G edge to core solutions at Keysight. “This is an exciting new delivery model that offers customers the flexibility and scalability to use Keysight’s test solutions in the environment they want, in the cloud,” he said.

The cloud-based deployment offers customers on-demand scalability to meet changing requirements with a wide range of configurations for different test environments, confirmed Keysight. There is an annual subscription option for customers with high usage of LoadCore for an extended period. LoadCore 5GC testing software with a flexible pay-as-you-go business model will be available to enterprise customers looking to verify their private 5G networks; start-ups that need flexibility with specific testing requirements; customers with shorter duration project needs, such as test houses; and organisations that want to purchase and test solutions primarily in the cloud.

Keysight Technologies delivers design and validation products that help accelerate innovation to connect and secure the world. Its software-driven insights and analytics bring tomorrow’s technology products to market faster across the development lifecycle, in design simulation, prototype validation, automated software testing, manufacturing analysis, and network performance optimisation and visibility in enterprise, service provider and cloud environments. Customers span the worldwide communications and industrial ecosystems, aerospace and defence, automotive, energy, semiconductor and general electronics markets.

http://www.keysight.com


Acromag expands OpenVPX carrier card offering with XMC host modules

VPX carriers introduced by Acromag route power and bus signals to two plug-in XMC mezzanine modules with a 16-lane Gen 3 PCIe interface. The VPX4840 and VPX4850 enable a range of FPGA, GPU, I/O and CPU combinations when interfacing XMC modules to a VPX computer system. 

The VPX4840 and VPX4850 feature two XMC slots with support for front or rear panel I/O. They are available with VITA 42, VITA 61, or VITA 88 connectors to route power and interface bus signals to the plug-in mezzanine modules. Both models support a choice of direct PCIe connection to the VPX backplane via the data or expansion plane. 

The XMC sites have a 16-lane PCIe Gen 3 interface enabling rapid data throughput. By inserting XMC mezzanine modules on the carrier, including XMC processor (prXMC) modules, developers can leverage hundreds of available function modules currently unavailable in a VPX platform, said Acromag.
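For a sense of scale, the theoretical ceiling of a 16-lane PCIe Gen 3 link can be worked out from the standard's line rate of 8 GT/s per lane and its 128b/130b encoding. The sketch below does that arithmetic; it ignores packet and protocol overhead, so real throughput on these carriers will be lower, and the function name is illustrative rather than anything from Acromag.

```python
GEN3_TRANSFERS_PER_SEC = 8e9       # PCIe Gen 3 line rate per lane (8 GT/s)
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding
LANES = 16                         # lane count quoted for each XMC site


def pcie_peak_gbytes_per_sec(lanes: int,
                             transfers_per_sec: float = GEN3_TRANSFERS_PER_SEC,
                             efficiency: float = ENCODING_EFFICIENCY) -> float:
    """Theoretical one-direction bandwidth in Gbytes/s, before protocol overhead."""
    payload_bits_per_sec = lanes * transfers_per_sec * efficiency
    return payload_bits_per_sec / 8 / 1e9


x16 = pcie_peak_gbytes_per_sec(LANES)
print(f"PCIe Gen 3 x{LANES}: about {x16:.2f} Gbytes/s per direction")
```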

“With two XMC sites, system integrators can combine FPGA, GPU, I/O, avionics, communication, and even prXMC modules to create custom computing boards in a single slot,” said Robert Greenfield, Acromag’s business development manager.

Air-cooled models with a 0 to +55 degrees C temperature range are available, as are versions with extended temperature ranges or conduction cooling support.

Designed and manufactured in the USA, the carriers are suitable for high-performance aerospace, defence, scientific research and industrial systems requiring high-speed I/O.

Founded in 1957, Acromag designs and manufactures hi-tech industrial electronics. The international corporation’s headquarters are near Detroit, Michigan and the company has a global network of sales representatives and distributors. 

Acromag offers embedded computing including general purpose I/O boards, single-board computers, FPGA modules, embedded computers, COM Express products, mezzanine modules, wiring accessories, and software. Industries served include military, aerospace, manufacturing, transportation, utilities, and scientific research laboratories.

http://www.acromag.com


SoMs are pin-compatible with Intel Agilex I- and F- FPGA SoCs

Systems on module (SoMs) that support Intel Agilex FPGA SoCs have been released by iWave. The iW-RainboW-G43 and iW-RainboW-G51 Agilex SoC-based SoMs are designed for data centre, networking and edge applications which can exploit the SoCs’ customised acceleration and connectivity characteristics.

According to iWave, they provide an improvement in performance with 40 per cent lower power consumption while delivering two times the fabric performance per watt. The SoCs integrate an Arm Cortex-A53 application processor.

The SoMs are available in a 120 x 90mm form factor and are claimed to be the first SoMs to be pin-compatible with the majority of the F-Series and the I-Series.

The Agilex SoM has up to 2.7M programmable logic elements for processing large amounts of data or complex algorithms. The SoM supports primary interfaces and components such as Gigabit Ethernet, a USB2.0 port, JTAG, UART, onboard DDR4, eMMC flash for storage, and high-speed transceivers.

Complementing the Agilex SoC on-chip resources, the SoM provides up to 64 FGT transceiver channels (up to 32G NRZ / 58G PAM4), up to eight FHT transceiver channels (up to 58G NRZ / 116G PAM4), on-SoM PTP and SyncE network synchronisers, SmartVID to adjust voltage according to temperature and performance requirements, and up to 138 LVDS/276 SE I/Os.

To enable quick prototyping and speed up development, iWave supports customers by providing reference designs in the form of development kits and software packages. 

http://www.iwavesystems.com 


About Smart Cities

This news story is brought to you by smartcitieselectronics.com, the specialist site dedicated to delivering information about what’s new in the Smart City Electronics industry, with daily news updates, new products and industry news.