Groq® Opens API Access to Real-time Inference, the Magic Behind Instant Responses from Generative AI Products

Customer and Partner aiXplain Implements Game-changing Groq Technology to Bring the World’s Fastest AI Language Processing for Consumer Electronics to Market

LAS VEGAS, CES® 2024, January 9, 2024 – Groq® demos “wow” at CES! The need for speed is paramount in consumer generative AI applications and only the Groq LPU™ Inference Engine generates 300 tokens per second per user on open-source large language models (LLMs), like Llama 2 70B from Meta AI. At that speed, the same number of words as Shakespeare’s Hamlet can be produced in under seven minutes, which is 75X faster than the average human can type.
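
As a rough check on the Hamlet comparison, the generation time can be estimated from the stated 300 tokens per second. This is a minimal sketch, not Groq's own calculation: the word count (about 30,000) and the tokens-per-word ratio (1.3) are assumptions chosen for illustration, and the function name is hypothetical.

```python
def generation_minutes(word_count: float, tokens_per_word: float, tokens_per_second: float) -> float:
    """Estimate the minutes needed to generate `word_count` words at a given token rate."""
    total_tokens = word_count * tokens_per_word
    return total_tokens / tokens_per_second / 60


# Only the 300 tokens/s figure comes from the release; the rest are assumptions.
HAMLET_WORDS = 30_000      # assumption: approximate length of Hamlet in words
TOKENS_PER_WORD = 1.3      # assumption: rough tokens-per-word ratio for English text
TOKENS_PER_SECOND = 300    # stated per-user throughput

minutes = generation_minutes(HAMLET_WORDS, TOKENS_PER_WORD, TOKENS_PER_SECOND)
print(f"Estimated generation time: {minutes:.1f} minutes")
```

Under these assumptions the estimate comes to roughly two minutes, which is consistent with the "under seven minutes" figure; a different word count or tokenizer ratio would shift the result.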

Real-time inference is the compute process of running data through a trained AI model to deliver instant results, giving AI applications a fluid end-user experience. With demand for it surging, Groq has raced to bring its technology to market.

  • Early access to the Groq API is available starting January 15, 2024, enabling approved users to experiment with Llama 2 70B, Mistral, and Falcon running on the Groq LPU Inference Engine.
  • The Groq LPU Inference Engine is already in use in leading chat agents, robotics, FinTech, and national labs for research and enterprise applications.
  • Groq partner and customer, aiXplain, uses the API in a multi-faceted program to take advantage of real-time inference across its portfolio of innovative products and services.
  • As of December 21, 2023, the general public can try it themselves via GroqChat, an alpha release of Meta AI’s foundational LLM running on the Groq LPU Inference Engine.

“Inference is the next big thing in AI,” said aiXplain CEO and Founder Hassan Sawaf. “We were searching for the right solution to bring several production-ready AI ideas to life, but the real-time inference demands of these products and services made that seem like an impossible task. Until we found Groq. Only the Groq LPU Inference Engine delivers the low-latency speed necessary to sustain user engagement beyond novelty and make these products successful for the long term.”

Groq and aiXplain will co-host a cocktail party on January 9 where Groq Founder and CEO Jonathan Ross will demonstrate how real-time inference is changing the trajectory for consumer electronics. Space is limited and registration is required. Please email if you would like to attend.

“What aiXplain is doing is nothing short of creating magic for their customers,” said Ross. “At Groq, we aim to create a sense of awe by accelerating generative AI applications to the point that they become immersive experiences. Thanks to the partnership between aiXplain and Groq, truly interactive engagement with AI is here, today.”

Groq API access will be generally available in Q2 2024. 

About Groq

Groq® is a generative AI solutions company and the creator of the LPU Inference Engine, the fastest language processing accelerator on the market. It is architected from the ground up to achieve low-latency, energy-efficient, and repeatable inference performance at scale. Customers rely on the Groq LPU Inference Engine as an end-to-end solution for running Large Language Models (LLMs) and other generative AI applications at 10x the speed. Jonathan Ross, inventor of the Google Tensor Processing Unit (TPU), founded Groq to enable an AI economy powered by human agency.

About aiXplain

Founded in 2020, aiXplain is the industry’s first end-to-end integrated platform for quick development and enterprise-grade deployment of AI projects and solutions. aiXplain’s no-code/low-code integrated development environment (IDE) enables users to develop, manage, benchmark, experiment, and deploy AI assets quickly and efficiently. Users can design their own AI pipeline and benchmark their model against other models, either using their own datasets or benefiting from available datasets—all to easily create and maintain AI systems.

In October 2023, aiXplain opened private access to its flagship product, Bel Esprit, for waitlisted users. Bel Esprit is a Generative AI chat agent that aims to simplify AI solution creation by interpreting instructions and responding in natural language. It selects AI models from aiXplain’s marketplace and integrates them into deployable solutions in real time. Bel Esprit also offers user-friendly explanations and connects users to AI specialists when needed for further customization.

aiXplain founder and CEO Hassan Sawaf has founded and led several machine learning organizations in small and large technology companies that are today leaders in their respective market segments. Teams he started and managed are at Meta, AWS, eBay, and Leidos.

Groq Media Contact

Reach out to our PR team.