Enterprise Access

GroqCloud™ provides the world's fastest AI inference for LLMs at scale

Enterprise API Solutions 

Do you need the fastest inference at data center scale? 
We have multiple tiered solutions to suit your commercial projects via our GroqCloud API.

If your needs go beyond self-serve developer access, we should chat.

Let's talk to make sure we can provide the right solution for you. Please fill out our short form to the right and a Groqster will reach out shortly to discuss your project.

Public Sector 
If you are in the Public Sector, we’d love to connect. Please follow the link below to fill out the form and we’ll set up a time to talk shortly. 

Developers, Startups, SMBs – No Need to Use the Form

Developer access is completely self-serve through GroqCloud. There you can obtain your API key and find our documentation and terms and conditions in the Developer Console. You can chat directly with the Groq team about your needs, feature requests, and more by joining our Discord community.

If you are currently using the OpenAI API, you only need to change three things to switch over to Groq, as sketched below:

  1. Groq API key
  2. Endpoint
  3. Model
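
Because the Groq API is OpenAI-compatible, the switch amounts to swapping those three values in your existing client; the rest of your code can stay as-is. Here is a minimal sketch using the openai Python client (v1+). The endpoint URL and model ID shown are the documented values at the time of writing, but check the Developer Console for current ones.

```python
# Minimal sketch: pointing an existing OpenAI-client integration at Groq.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],          # 1. your Groq API key
    base_url="https://api.groq.com/openai/v1",   # 2. the Groq endpoint
)

response = client.chat.completions.create(
    model="llama3-70b-8192",                     # 3. a Groq-hosted model
    messages=[{"role": "user", "content": "Hello, Groq!"}],
)
print(response.choices[0].message.content)
```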
 
For support, you can use our chat feature, found on the main Groq pages and under the Questions? link in the Developer Console.

Pricing

Groq powers leading open-source AI models.

Other models, such as Mistral and CodeLlama, are available for specific customer requests. Send us your inquiries here.

| Model | Current Speed | Price (per 1M tokens, input/output) |
|---|---|---|
| Llama 3 70B (8K context length) | ~330 tokens/s | $0.59 / $0.79 |
| Mixtral 8x7B SMoE (32K context length) | ~575 tokens/s | $0.24 / $0.24 |
| Llama 3 8B (8K context length) | ~1,250 tokens/s | $0.05 / $0.08 |
| Gemma 7B (8K context length) | ~950 tokens/s | $0.07 / $0.07 |
| Whisper Large V3 | ~210x speed factor | $0.03 per hour transcribed |
| Gemma 2 9B (8K context length) | ~500 tokens/s | $0.20 / $0.20 |
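
To see how the per-token pricing translates into request costs, here is a hypothetical back-of-the-envelope estimate in Python, using the Llama 3 70B rates from the table above; the token counts are purely illustrative.

```python
# Back-of-the-envelope cost estimate using the Llama 3 70B rates above:
# $0.59 per 1M input tokens and $0.79 per 1M output tokens.
INPUT_RATE_USD = 0.59 / 1_000_000   # dollars per input token
OUTPUT_RATE_USD = 0.79 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single request at the rates above."""
    return input_tokens * INPUT_RATE_USD + output_tokens * OUTPUT_RATE_USD

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${estimate_cost(2_000, 500):.6f}")  # $0.001575
```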

Fastest Inference, Period.

Groq has demonstrated 15x faster LLM inference performance on an ArtificialAnalysis.ai leaderboard compared to the top cloud-based providers. In this public benchmark, Mistral.ai's Mixtral 8x7B Instruct running on the Groq LPU™ Inference Engine outperformed all other cloud-based inference providers, with up to 15x faster output token throughput.