I’m using vLLM too. I’d also recommend the OpenChat 3.5 0106 model, a 7B-parameter model that performs very well.
I’m actually running it 4-bit quantized so that it runs faster.
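For reference, here’s a minimal sketch of loading a quantized OpenChat 0106 checkpoint with vLLM’s Python API. I’m assuming an AWQ-quantized repo such as `TheBloke/openchat-3.5-0106-AWQ` — swap in whichever quantized checkpoint you actually use.

```python
from vllm import LLM, SamplingParams

# Load a 4-bit AWQ-quantized OpenChat 3.5 0106 checkpoint
# (assumed repo name; substitute your own quantized model).
llm = LLM(
    model="TheBloke/openchat-3.5-0106-AWQ",
    quantization="awq",
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["GPT4 Correct User: Hello!<|end_of_turn|>GPT4 Correct Assistant:"], params)
print(outputs[0].outputs[0].text)
```

Note that the prompt above uses OpenChat’s chat template manually; if you serve it through vLLM’s OpenAI-compatible server instead, the template is applied for you.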