The “autocomplete” model component powers code completion suggestions and is typically a model in the 1–15B parameter range. These models run locally on your laptop or on a server, and have generally been trained with special templates such as fill-in-the-middle (FIM) for code infilling. Because developers expect a suggestion within roughly 500ms, a smaller model is usually required to meet the latency budget; however, models that are too small produce low-quality suggestions. The tab-autocomplete model is therefore chosen primarily to balance these two constraints. Examples of models used for code completion include Codex, CodeGemma, Code Llama, DeepSeek Coder Base, and StarCoder 2.
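To make the FIM idea concrete, here is a minimal sketch of how an infilling prompt is assembled. It assumes StarCoder-style sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); other model families use different token names, so check the specific model's documentation before reusing this.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt from the code before and
    after the cursor, using StarCoder-style sentinel tokens.

    The model is expected to generate the missing middle after the
    final <fim_middle> token.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


# Example: the cursor sits between the function signature and the
# closing return statement; the model fills in the body.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)
print(prompt)
```

The editor sends everything before the cursor as the prefix and everything after it as the suffix, so the model can condition on both sides rather than only the preceding text.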