HuggingFace TGI vs OpenAI API Endpoint Costs

For a single instance, it will be hard to beat API pricing: with an API you pay per request, only for the tokens you actually use, whereas when running TGI you pay for every hour the server is up, whether it is busy or idle.

Let’s say you are running Llama 3.1 8B on a g6.xlarge (1× NVIDIA L4); on-demand that will cost you approximately $600 a month. The same budget could buy you around 450M tokens of Anthropic Claude 3.5 Haiku.
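As a rough sketch, here is the break-even arithmetic behind that comparison. The hourly rate, token prices, and input/output mix below are assumptions for illustration only; plug in current AWS and Anthropic pricing for your region and workload:

```python
# Rough break-even estimate: self-hosted TGI on a g6.xlarge vs. a pay-per-token API.
# All prices are assumptions for illustration; check current AWS / Anthropic pricing.

GPU_HOURLY_USD = 0.80        # assumed on-demand rate for g6.xlarge (1x NVIDIA L4)
HOURS_PER_MONTH = 730        # average hours in a month

API_INPUT_PER_MTOK = 0.80    # assumed Claude 3.5 Haiku input price, USD per million tokens
API_OUTPUT_PER_MTOK = 4.00   # assumed output price, USD per million tokens
OUTPUT_SHARE = 0.2           # assumed fraction of traffic that is output tokens

monthly_server_cost = GPU_HOURLY_USD * HOURS_PER_MONTH
blended_price_per_mtok = (
    (1 - OUTPUT_SHARE) * API_INPUT_PER_MTOK + OUTPUT_SHARE * API_OUTPUT_PER_MTOK
)
breakeven_mtok = monthly_server_cost / blended_price_per_mtok

print(f"Monthly server cost: ${monthly_server_cost:,.0f}")
print(f"Blended API price:   ${blended_price_per_mtok:.2f} per million tokens")
print(f"Break-even volume:   ~{breakeven_mtok:,.0f}M tokens per month")
```

Below that monthly volume (or with bursty traffic that leaves the GPU idle most of the time), the API comes out cheaper; sustained throughput close to the server’s capacity is where self-hosting starts to pay off.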

Running your own LLM makes sense when you need either privacy or scale.
