Self-Hosting Embeddings for DiscourseAI

To save space, is it possible to use quantized embeddings? I’d like to use binary quantized embeddings to really cut down on storage. In my tests, they retained more than 90% of the full-precision performance while using 32x less storage!
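
For anyone unfamiliar with the idea, here is a minimal sketch of binary quantization, assuming float32 embeddings and NumPy. The function names (`binarize`, `hamming_distance`), the 1024-dimension size, and the random data are illustrative assumptions, not DiscourseAI code. Each dimension is reduced to its sign bit, so 32 bits per dimension become 1 bit, which is where the 32x figure comes from.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 dimensions per byte."""
    bits = embeddings > 0                # True where the component is positive
    return np.packbits(bits, axis=-1)    # uint8 array, dims / 8 bytes per vector

def hamming_distance(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and each packed corpus row."""
    xor = np.bitwise_xor(corpus, query)  # differing bits are set to 1
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

# Hypothetical data: 1000 random vectors with 1024 dimensions (assumed size).
rng = np.random.default_rng(0)
floats = rng.standard_normal((1000, 1024)).astype(np.float32)

packed = binarize(floats)
ratio = floats.nbytes // packed.nbytes   # float32 bytes vs. packed bytes

# Rank the corpus against the first vector; smaller distance means more similar.
distances = hamming_distance(packed[0], packed)
nearest = int(np.argmin(distances))
```

A common way to recover some of the lost accuracy is to fetch a larger candidate set with the binary vectors and then re-rank those candidates using the original float vectors, at the cost of keeping the full-precision copies available somewhere.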
