It would be really nice if you could add the models from the latest Qwen3.5 family, such as Qwen3.5-35B-A3B. They are fairly small and MoE-based, which keeps latency low, yet they are reported to perform better than gpt-oss-120b and even larger models. They also have built-in reasoning, which can be disabled when it isn't needed. These models could therefore outperform, and potentially even replace, the current gpt-oss-120b and gpt-oss-20b models.
Also, I'm desperately waiting for the Qwen-Embedding-8B model to be deployed, since as far as I know there are no decent production-grade, low-latency deployments of this model (or similar ones) from other cloud providers. Such a model is crucial for low-latency RAG systems like voice assistants.
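To illustrate the use case, here is a minimal sketch of the retrieval step such a deployment would enable, assuming the model were exposed behind an OpenAI-compatible /embeddings endpoint. The base URL, API key, and model name below are placeholders for whatever the actual deployment would use, not an existing API.

```python
# Hypothetical retrieval step of a low-latency RAG pipeline, assuming
# Qwen-Embedding-8B is served behind an OpenAI-compatible /embeddings endpoint.
import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                          # placeholder credential
)

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts and return L2-normalized vectors."""
    resp = client.embeddings.create(model="Qwen-Embedding-8B", input=texts)
    vecs = np.array([d.embedding for d in resp.data], dtype=np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Tiny in-memory document "index", just for illustration.
docs = [
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 5 business days.",
    "The voice assistant supports English and German.",
]
doc_vecs = embed(docs)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q_vec = embed([query])[0]
    scores = doc_vecs @ q_vec
    top = np.argsort(-scores)[:k]
    return [docs[i] for i in top]

print(retrieve("When can I get my money back?"))
```

For a voice assistant, this embed-and-rank round trip sits directly on the response path, which is exactly why a hosted, low-latency embedding model matters so much here.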