GroqCard™ Accelerator

Written by:
Groq

For plug-and-play, low-latency, scalable performance, the GroqCard accelerator packages a single GroqChip™ processor into a standard PCIe Gen4 x16 form factor for hassle-free server integration. Featuring up to 11 RealScale™ chip-to-chip connections alongside an internal software-defined network, GroqCard enables near-linear multi-server and multi-rack scalability without the need for external switches.

Key Features

Fully deterministic processor provides predictable and repeatable performance with no run-to-run variation. 

230 MB of on-die memory delivers a large, globally shareable SRAM for high-bandwidth, low-latency access to model parameters without the need for external memory.
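As a rough illustration of what 230 MB means for model size, the sketch below counts how many weights fit in one chip's SRAM and how many chips a given model would need if its weights were held entirely on-die. It assumes "MB" means 10**6 bytes and considers weights only (no activations, instructions, or partitioning overhead), so the chip counts are lower bounds; the helper names and the 7-billion-parameter example are hypothetical.

```python
# Weights-only capacity estimate for GroqChip on-die SRAM.
# Assumption: "230 MB" means decimal megabytes (10**6 bytes).
SRAM_BYTES_PER_CHIP = 230 * 10**6

# Storage size per weight for two data types listed in the specifications.
BYTES_PER_PARAM = {"INT8": 1, "FP16": 2}


def params_per_chip(dtype: str) -> int:
    """Upper bound on how many weights of `dtype` fit in one chip's SRAM."""
    return SRAM_BYTES_PER_CHIP // BYTES_PER_PARAM[dtype]


def chips_for_model(num_params: int, dtype: str) -> int:
    """Smallest chip count whose combined SRAM could hold all weights.

    Ignores activations, instructions, and partitioning overhead, so a
    real deployment would need at least this many chips, possibly more.
    """
    total_bytes = num_params * BYTES_PER_PARAM[dtype]
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes // SRAM_BYTES_PER_CHIP)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {params_per_chip(dtype):,} weights per chip")
    # Hypothetical model size chosen for illustration only.
    print(chips_for_model(7 * 10**9, "FP16"), "chips for 7B FP16 weights")
```

Under these assumptions one chip holds about 230 million INT8 or 115 million FP16 weights, and the hypothetical 7-billion-parameter FP16 model (14 GB of weights) would need at least 61 chips.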

Up to 80 TB/s of on-die memory bandwidth facilitates the massive concurrency and data parallelism needed for bandwidth-sensitive applications.
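To put the bandwidth figure in perspective, this sketch computes how long it would take to read the entire 230 MB SRAM once at the 80 TB/s peak, and how many bytes that peak implies per clock cycle. It assumes decimal SI units and that the peak bandwidth applies at the 900 MHz clock quoted with the compute figures in the specifications; real workloads will not necessarily sustain the peak.

```python
# Assumptions: decimal SI units, and the 80 TB/s peak applies at the
# 900 MHz clock quoted with the compute figures.
SRAM_BYTES = 230 * 10**6            # 230 MB of on-die SRAM
PEAK_BW_BYTES_PER_S = 80 * 10**12   # up to 80 TB/s
CLOCK_HZ = 900 * 10**6              # 900 MHz


def full_sweep_seconds(sram_bytes: int = SRAM_BYTES,
                       bw: int = PEAK_BW_BYTES_PER_S) -> float:
    """Time to read every byte of SRAM once at peak bandwidth."""
    return sram_bytes / bw


def bytes_per_cycle(bw: int = PEAK_BW_BYTES_PER_S,
                    clock_hz: int = CLOCK_HZ) -> float:
    """Average bytes moved per clock cycle at peak bandwidth."""
    return bw / clock_hz


if __name__ == "__main__":
    print(f"Full SRAM sweep: {full_sweep_seconds() * 1e6:.3f} us")
    print(f"Bytes per cycle: {bytes_per_cycle():,.0f}")
```

At peak, the whole SRAM could be streamed in under 3 µs (about 2.9 µs), or roughly 89 KB per cycle at 900 MHz.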

Up to 11 RealScale™ chip-to-chip connectors enable near-linear multi-server and multi-rack scalability without the need for external switches.
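The sketch below is generic graph arithmetic showing what 11 links per chip allow, assuming each connector forms one point-to-point link to another chip. It does not describe how Groq actually wires cards within servers or racks. With 11 links, at most 12 chips can all be directly connected to each other; beyond that, some chip pairs necessarily communicate over more than one hop. The helper names are hypothetical.

```python
from itertools import combinations

# From the text: up to 11 RealScale chip-to-chip connectors per chip.
LINKS_PER_CHIP = 11


def max_fully_connected_group(links_per_chip: int = LINKS_PER_CHIP) -> int:
    """Largest group in which every chip has a direct link to every other."""
    # Each chip spends one link on each of the other chips in the group.
    return links_per_chip + 1


def links_in_full_mesh(num_chips: int) -> int:
    """Number of point-to-point links needed to connect every pair once."""
    return num_chips * (num_chips - 1) // 2


def fits_port_budget(edges, num_chips: int,
                     links_per_chip: int = LINKS_PER_CHIP) -> bool:
    """Check a proposed wiring: no self-loops, no chip over its port budget."""
    degree = [0] * num_chips
    for a, b in edges:
        if a == b:
            return False
        degree[a] += 1
        degree[b] += 1
    return all(d <= links_per_chip for d in degree)


if __name__ == "__main__":
    n = max_fully_connected_group()
    mesh = list(combinations(range(n), 2))
    print(n, "chips,", links_in_full_mesh(n), "links,", fits_port_budget(mesh, n))
```

A 12-chip all-to-all group uses 66 links and exactly exhausts every chip's 11 ports; a 13-chip all-to-all group would not fit.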

End-to-end on-chip protection improves uptime and reliability with error-correction code (ECC) protection throughout the entire GroqChip data path.

PCIe Gen4 x16 interface delivers up to 31.5 GB/s of bandwidth in each direction over an industry-standard link for fast device and network connections, all with a lightweight open-source driver and no CPU burden.
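The 31.5 GB/s figure follows directly from the PCIe 4.0 signalling parameters: 16 GT/s per lane, 128b/130b line encoding, and 16 lanes. The sketch below reproduces that arithmetic; it gives raw link bandwidth only and ignores packet headers and flow-control overhead, so achievable application throughput will be somewhat lower. The function name is illustrative.

```python
# PCIe 4.0 link parameters (PCI Express Base Specification 4.0):
# 16 GT/s per lane with 128b/130b line encoding.
TRANSFER_RATE_GT_S = 16.0
ENCODING_EFFICIENCY = 128 / 130
LANES = 16  # x16 link


def per_direction_gb_s(rate_gt_s: float = TRANSFER_RATE_GT_S,
                       lanes: int = LANES,
                       efficiency: float = ENCODING_EFFICIENCY) -> float:
    """Raw payload bandwidth in one direction, in GB/s (10**9 bytes/s).

    Each transfer carries one bit per lane; dividing by 8 converts bits to
    bytes. Packet headers and flow-control overhead are ignored.
    """
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    one_way = per_direction_gb_s()
    # PCIe links are full duplex, so both directions can run at once.
    print(f"{one_way:.1f} GB/s per direction, {2 * one_way:.1f} GB/s aggregate")
```

Because PCIe is full duplex, both directions can carry about 31.5 GB/s simultaneously, for roughly 63 GB/s aggregate.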

Specifications

Available through our partner, BittWare

Dual-width, full-height, ¾-length PCI Express Gen4 x16 adapter

Up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900 MHz
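Since both peak figures are quoted at 900 MHz, dividing by the clock gives the implied number of operations completed per cycle, which is a convenient way to compare the two data types. This is a minimal sketch using only the numbers above; it says nothing about how those operations are counted internally.

```python
# Peak figures from the specifications, both quoted at 900 MHz.
PEAK_INT8_OPS = 750e12    # 750 TOPS
PEAK_FP16_FLOPS = 188e12  # 188 TFLOPS
SPEC_CLOCK_HZ = 900e6


def ops_per_cycle(peak_ops_per_s: float,
                  clock_hz: float = SPEC_CLOCK_HZ) -> float:
    """Peak operations completed per clock cycle."""
    return peak_ops_per_s / clock_hz


if __name__ == "__main__":
    print(f"INT8: {ops_per_cycle(PEAK_INT8_OPS):,.0f} ops/cycle")
    print(f"FP16: {ops_per_cycle(PEAK_FP16_FLOPS):,.0f} FLOPs/cycle")
    print(f"INT8:FP16 peak ratio: {PEAK_INT8_OPS / PEAK_FP16_FLOPS:.2f}")
```

The INT8 peak works out to roughly 833,000 operations per cycle, about four times the FP16 rate.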

230 MB SRAM per chip

Up to 80 TB/s on-die memory bandwidth

Up to 11 RealScale™ chip-to-chip connectors

Supported data types:
  • MXM (matrix execution module): INT8, FP16
  • VXM (vector execution module): INT8, INT16, INT32, FP16, FP32

Power:
  • Max: 375W
  • TDP: 275W
  • Typical: 240W
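Combining the peak compute figures with the power figures gives a simple efficiency ratio. The sketch below divides 750 TOPS (INT8) and 188 TFLOPS (FP16) by each power level, reading the TDP entry as 275 W. Because peak throughput and power draw are not necessarily measured under the same workload, treat these numbers as optimistic upper bounds rather than measured efficiency.

```python
# Peak compute (tera-operations per second) from the specifications.
PEAK_TOPS_INT8 = 750
PEAK_TFLOPS_FP16 = 188

# Power levels from the specifications; TDP is assumed to be in watts.
POWER_W = {"max": 375, "tdp": 275, "typical": 240}


def efficiency(peak_tera_ops: float, watts: float) -> float:
    """Peak tera-operations per second per watt."""
    return peak_tera_ops / watts


if __name__ == "__main__":
    for label, watts in POWER_W.items():
        print(f"{label:>8}: "
              f"{efficiency(PEAK_TOPS_INT8, watts):.2f} TOPS/W (INT8), "
              f"{efficiency(PEAK_TFLOPS_FP16, watts):.2f} TFLOPS/W (FP16)")
```

At the maximum power level this gives 2.0 TOPS/W for INT8; at typical power it rises to about 3.1 TOPS/W.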
