HuggingFace TGI vs OpenAI API Endpoint Costs

For a single instance, it will be hard to beat API pricing: with an API you pay per request, only for the tokens you actually use, whereas when running TGI you pay for every hour the server is up, whether it is busy or idle.

Let’s say you are running Llama 3.1 8B on a g6.xlarge (1× NVIDIA L4); on-demand that will cost you approximately $600 a month. The same budget could buy you around 450M tokens of Anthropic Claude 3.5 Haiku.
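As a rough sketch, here is the break-even arithmetic behind that comparison. The hourly rate, token prices, and input/output mix below are assumptions for illustration only; plug in current AWS and Anthropic pricing for your region and workload:

```python
# Rough break-even estimate: self-hosted TGI on a g6.xlarge vs. a pay-per-token API.
# All prices are assumptions for illustration; check current AWS / Anthropic pricing.

GPU_HOURLY_USD = 0.80        # assumed on-demand rate for g6.xlarge (1x NVIDIA L4)
HOURS_PER_MONTH = 730        # average hours in a month

API_INPUT_PER_MTOK = 0.80    # assumed Claude 3.5 Haiku input price, USD per million tokens
API_OUTPUT_PER_MTOK = 4.00   # assumed output price, USD per million tokens
OUTPUT_SHARE = 0.2           # assumed fraction of traffic that is output tokens

monthly_server_cost = GPU_HOURLY_USD * HOURS_PER_MONTH
blended_price_per_mtok = (
    (1 - OUTPUT_SHARE) * API_INPUT_PER_MTOK + OUTPUT_SHARE * API_OUTPUT_PER_MTOK
)
breakeven_mtok = monthly_server_cost / blended_price_per_mtok

print(f"Monthly server cost: ${monthly_server_cost:,.0f}")
print(f"Blended API price:   ${blended_price_per_mtok:.2f} per million tokens")
print(f"Break-even volume:   ~{breakeven_mtok:,.0f}M tokens per month")
```

Below that monthly volume (or with bursty traffic that leaves the GPU idle most of the time), the API comes out cheaper; sustained throughput close to the server’s capacity is where self-hosting starts to pay off.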

Running your own LLM makes sense when you need either privacy or scale.
