API Access

GroqCloud™ offers multiple levels of API access.

Developer access is completely self-serve through the Playground on GroqCloud, where you can obtain your API key and find our documentation and terms and conditions. Join our Discord community here.

If you are currently using the OpenAI API, you only need to change three things to switch over to Groq:

  1. Groq API key
  2. Endpoint 
  3. Model 
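As a minimal sketch of those three changes, the snippet below builds (but does not send) an OpenAI-style chat completion request using only the Python standard library. The endpoint path and model name shown are illustrative assumptions; confirm the current values in the GroqCloud Playground documentation.

```python
import json
import urllib.request

# The three things to change when moving from OpenAI to Groq
# (values below are illustrative -- check the Playground docs):
GROQ_API_KEY = "gsk_your_key_here"                                 # 1. Groq API key
GROQ_ENDPOINT = "https://api.groq.com/openai/v1/chat/completions"  # 2. Endpoint
GROQ_MODEL = "llama3-70b-8192"                                     # 3. Model

def build_groq_request(messages):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = json.dumps({"model": GROQ_MODEL, "messages": messages}).encode()
    return urllib.request.Request(
        GROQ_ENDPOINT,
        data=payload,
        headers={
            "Authorization": f"Bearer {GROQ_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_groq_request([{"role": "user", "content": "Hello, Groq!"}])
```

Because the request shape is the same as OpenAI's, existing client code typically only needs the key, base URL, and model name swapped.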

Enterprise Solutions

Do you need the fastest inference at data center scale? Let's talk to ensure we can provide the right solution for your needs. Please fill out this form and tell us a little about your project. After submitting, a Groqster will be in touch with you shortly.


Groq guarantees to beat any price per million tokens published by other providers for the equivalent listed models.
Other models, such as Mistral and CodeLlama, are available for specific customer requests. Send us your inquiries here.

| Model | Current Speed | Price per 1M Tokens (Input/Output) |
| --- | --- | --- |
| Llama 3 70B (8K Context Length) | ~280 tokens/s | $0.59/$0.79 |
| Mixtral 8x7B SMoE (32K Context Length) | ~480 tokens/s | $0.27/$0.27 |
| Llama 3 8B (8K Context Length) | ~870 tokens/s | $0.05/$0.10 |
| Gemma 7B (8K Context Length) | ~820 tokens/s | $0.10/$0.10 |
| Llama 2 70B (4K Context Length) | ~300 tokens/s | $0.64/$0.80 |
| Llama 2 7B (2K Context Length) | ~750 tokens/s | $0.10/$0.10 |
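Per-1M-token pricing makes cost estimation simple arithmetic. The helper below sketches the calculation, using the Llama 3 70B prices from the table above as an example (always check current pricing before budgeting):

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimate request cost in USD, given prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: Llama 3 70B at $0.59 input / $0.79 output per 1M tokens,
# for a request with 2,000 input tokens and 500 output tokens.
cost = estimate_cost(2_000, 500, 0.59, 0.79)
```

The same function works for any model in the table; only the two price arguments change.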

Fastest Inference, Period.

Groq has demonstrated up to 15x faster LLM inference than the top cloud-based providers on an ArtificialAnalysis.ai leaderboard.
In this public benchmark, Mistral.ai's Mixtral 8x7B Instruct running on the Groq LPU™ Inference Engine outperformed all other cloud-based inference providers, with up to 15x higher output token throughput.