HuggingFace TGI vs OpenAI API Endpoint Costs

An intern deployed our Discourse site on DigitalOcean and connected the Discourse AI plugin to OpenAI via an API endpoint. The site is working great. The intern suggested looking at HuggingFace TGI, and I’m trying to provide guidance to see whether they are on the right track. I believe they are proposing self-hosted HuggingFace TGI to reduce costs; however, when I look at the GPU costs of hosting, it seems expensive.

I could ask the intern to propose specific services and costs, but I’m trying to provide strategic guidance. The alternative is for the intern to continue testing OpenAI, Anthropic, and Gemini.

Is there any advice on what I should assign the intern?

The basic idea is to implement Discourse AI on a production deployment of Discourse and then ask the customer (the one funding the community) to pay an additional service fee to maintain the AI and promote the new features.

As far as intern assignments go, I could also have them look at the Hugging Face Inference API. Is it cheaper than the OpenAI API?

Is anyone using specific services from Google Cloud, AWS, or Azure to host TGI?

For AWS, for example, should they look at g4dn.xlarge or g5.xlarge?

For GCP, are T4 GPUs the recommended path?

Any advice on how they would calculate costs?


For a single instance, it will be hard to beat API pricing: with an API you pay per call, whereas with TGI you pay for every hour the server is running.

Let’s say you are running Llama 3.1 8B on a g6.xlarge; that will cost you approximately $600 a month. The same $600 would buy roughly 450M tokens of Anthropic’s Claude 3.5 Haiku.
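If it helps your intern with the cost question, here’s a back-of-the-envelope breakeven sketch. All the rates below are assumptions for illustration (check current AWS and Anthropic pricing before relying on them), but the method is what matters:

```python
# Breakeven: self-hosted TGI on a GPU instance vs. pay-per-token API.
# All prices are assumptions for illustration -- verify current AWS
# and Anthropic pricing before using the output for budgeting.

GPU_HOURLY_RATE = 0.80          # assumed on-demand $/hr for a g6.xlarge
HOURS_PER_MONTH = 730           # running 24/7
API_PRICE_PER_M_INPUT = 0.80    # assumed $/1M input tokens (Claude 3.5 Haiku)
API_PRICE_PER_M_OUTPUT = 4.00   # assumed $/1M output tokens
INPUT_SHARE = 0.8               # assume 80% of traffic is input tokens

gpu_monthly = GPU_HOURLY_RATE * HOURS_PER_MONTH
blended_price = (INPUT_SHARE * API_PRICE_PER_M_INPUT
                 + (1 - INPUT_SHARE) * API_PRICE_PER_M_OUTPUT)
breakeven_tokens_m = gpu_monthly / blended_price

print(f"Self-hosted GPU: ${gpu_monthly:,.0f}/month")
print(f"Blended API price: ${blended_price:.2f} per 1M tokens")
print(f"Breakeven: ~{breakeven_tokens_m:,.0f}M tokens/month")
```

With those assumed numbers you get roughly $584/month for the GPU and a breakeven around 400M tokens, which is where the ~450M figure above comes from. If the community isn’t pushing hundreds of millions of tokens a month, the API wins.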

Running your own LLM makes sense when you need either privacy or scale.


Thank you for your response. $600/month for Llama 3.1 8B on a g6.xlarge would be a reasonable cost, but as you graciously pointed out, the API would be cheaper, so we’ll likely go with OpenAI and the other API providers. What are the privacy concerns?

For experimenting with HuggingFace TGI, is there anything cheaper than $600/month that we could use for testing? For example, can the intern stop the GPU instance when they are not working? I’m trying to figure out what to recommend to them. I’m somewhat confused about the costs of GPU-enabled containers, and I don’t want to put the burden of the cost recommendation on the intern; if they make a mistake purchasing a container, they may feel bad.
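To answer my own question partially, here is the rough math on stopping the instance outside working hours. The hourly rate is an assumption, and note that stopped instances typically still accrue storage charges, which this ignores:

```python
# Rough savings from stopping a GPU instance outside working hours.
# The rate is an assumed on-demand price for illustration; stopped
# instances usually still pay for attached storage, ignored here.

HOURLY_RATE = 0.80            # assumed on-demand $/hr (e.g. g6.xlarge)
ALWAYS_ON_HOURS = 730         # 24/7 for a month
PART_TIME_HOURS = 8 * 22      # 8 hours/day, ~22 working days

always_on = HOURLY_RATE * ALWAYS_ON_HOURS
part_time = HOURLY_RATE * PART_TIME_HOURS

print(f"Always on: ${always_on:,.0f}/month")
print(f"Part time: ${part_time:,.0f}/month ({part_time / always_on:.0%} of 24/7)")
```

If that’s right, an 8-hours-a-weekday schedule brings a ~$584/month instance down to roughly $140/month, which feels much more acceptable for an experiment.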

What I’d like to do is buy them the resources and then have them test HuggingFace TGI on what I’ve purchased. They can then report back on any performance or output-quality differences.
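For the comparison itself, my understanding is that recent TGI versions expose an OpenAI-compatible chat endpoint, so the intern could reuse the same client code against both providers. A minimal sketch, assuming a TGI server on localhost:8080 (the host, port, and prompt are placeholders for whatever they deploy):

```python
# Query a local TGI server through the OpenAI Python client, using
# TGI's OpenAI-compatible /v1/chat/completions endpoint. Swapping
# base_url back to OpenAI lets the same code hit either provider.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed TGI host and port
    api_key="unused",                     # TGI ignores the key by default
)

response = client.chat.completions.create(
    model="tgi",  # TGI serves a single model; the name is not used for routing
    messages=[{"role": "user", "content": "Summarize this topic in one line."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```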
