Thanks. Do I understand correctly that backfill is when the vectorization happens? When switching between models, do the vectors need to be recalculated (are they “proprietary” to each model)? I assume yes.
It’d be useful to know how the cost of using the OpenAI API stacks up against investing in a GPU-powered server running an open-source solution. Is there a formula or some other way to estimate the number of tokens used? We’re only using the API to vectorize posts, not to calculate vector distances, right? So the number of tokens used depends on how much content we have, correct?
I assume that for both related topics and AI-powered search, all posts need to be vectorized only once, so I could calculate the total number of words in the posts table and derive the number of tokens needed from that. The same process would apply to the posts added daily. I’m neglecting the search phrases for now.
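For my own back-of-the-envelope math I was thinking of something along these lines. This is only a sketch under my assumptions: it uses tiktoken’s cl100k_base encoding (the one used by text-embedding-ada-002 / text-embedding-3), and the price-per-token value is a placeholder to be filled in from OpenAI’s pricing page, not a figure from this thread:

```python
# Rough estimate of embedding tokens/cost for a set of post bodies.
# Assumptions: cl100k_base tokenizer; PRICE_PER_1K_TOKENS is a placeholder.
import tiktoken

PRICE_PER_1K_TOKENS = 0.0  # USD; fill in from OpenAI's current pricing page

enc = tiktoken.get_encoding("cl100k_base")

def estimate_cost(posts: list[str]) -> tuple[int, float]:
    """Count tokens across all post bodies and convert to an API cost."""
    total_tokens = sum(len(enc.encode(raw)) for raw in posts)
    return total_tokens, total_tokens / 1000 * PRICE_PER_1K_TOKENS

# Without running the tokenizer, the usual rule of thumb is
# ~100 tokens per 75 English words, i.e. tokens ≈ words × 4/3.
def estimate_tokens_from_words(word_count: int) -> int:
    return round(word_count * 4 / 3)
```

The daily cost would then just be the same calculation applied to the posts created that day. Does that sound about right?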