This is a quick and dirty guide to setting up a local LLM. You’ll see how to run the Qwen2.5-Coder-3B-Instruct model on your local machine using vllm. You’ll then set up the CodeCompanion plugin in NeoVim for interacting with the model directly from your editor.

vllm Installation and Server Setup

Step one is to install the vllm CLI utility:

python -m venv local-llm
source local-llm/bin/activate
pip install vllm
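
To confirm the install worked, you can print the package version from inside the virtual environment (this assumes a recent vllm release, which exposes __version__):

python -c "import vllm; print(vllm.__version__)"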

The vllm tool will download the model and stand up a local server for it. Take note of what hardware you have available (RAM, CPU, GPU/VRAM) and then browse models at huggingface.co. This example kicks off a Qwen2.5-Coder-3B-Instruct model server:

vllm serve Qwen/Qwen2.5-Coder-3B-Instruct

The command will take a few minutes to run as it downloads the model and sets up the server. A successful run will look like this:

Starting vLLM API server 0 on http://0.0.0.0:8000
...
INFO: Started server process [5853]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 172.22.117.26:436999 - "GET /v1/models HTTP/1.1" 200 OK

The server is now ready to accept requests.
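
Before wiring up the editor, it’s worth confirming the server responds. One quick check is the models endpoint that shows up in the log output above (assuming the default port of 8000):

curl http://localhost:8000/v1/models

The response is a JSON object listing the models the server is currently serving.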

NeoVim Setup with CodeCompanion

CodeCompanion is one of many plugins meant to assist in integrating LLMs with NeoVim. CodeCompanion groups LLM configuration into adapters, and the plugin ships with default adapters for popular LLM providers. However, it doesn’t have an adapter for the Qwen model.

Below is a Lazy plugin configuration for CodeCompanion that adds an adapter for the Qwen model.

{
    "olimorris/code-companion.nvim",
    lazy = false,
    dependencies = {
        "nvim-lua/plenary.nvim",
        "nvim-treesitter/nvim-treesitter",
    },
    opts = {
        adapters = {
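            -- Custom adapter that points CodeCompanion at the local vllm server.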
            qwen = function()
                return require("codecompanion.adapters").extend("openai_compatible", {
                    env = {
                        url = "http://localhost:8000",
                        chat_url = "/v1/chat/completions",
                        models_endpoint = "/v1/models",
                    },
                })
            end,
        },
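        -- Use the local Qwen adapter for chat, inline, and command prompts.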
        strategies = {
            chat = {
                adapter = "qwen",
            },
            inline = {
                adapter = "qwen",
            },
            cmd = {
                adapter = "qwen",
            },
        },
    }
}

Since vllm exposes an OpenAI-compatible API, you can inherit from the openai_compatible adapter and extend it with the necessary configuration.
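
To see the API the adapter talks to, you can send a chat completion request by hand. Here’s a rough sketch with curl, assuming the server from earlier is still running on port 8000 and serving Qwen/Qwen2.5-Coder-3B-Instruct (the prompt is just an example):

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-Coder-3B-Instruct",
        "messages": [
            {"role": "user", "content": "Write a hello world program in Lua."}
        ]
    }'

CodeCompanion sends requests of the same shape to the chat_url configured in the adapter.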

The plugin requires the markdown and markdown_inline Tree-sitter parsers. Install them both with :TSInstall markdown markdown_inline (or from the shell, as shown below). You can also improve your experience by installing a number of additional plugins. See Additional Plugins for more information.
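
If you script your editor setup, the parsers can also be installed from a headless NeoVim session. This sketch assumes the nvim-treesitter master branch, which provides the synchronous TSInstallSync command:

nvim --headless -c "TSInstallSync markdown markdown_inline" -c "q"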

Further Reading

For CodeCompanion usage instructions, refer to the user guide. In general, CodeCompanion provides a chat interface for querying the LLM. There’s also support for inline prompting (that is, you highlight a section of code and ask the LLM for help).
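
As a starting point, the chat buffer opens with :CodeCompanionChat, and an inline prompt can be run against a visual selection with :CodeCompanion followed by your prompt (the prompt text here is just an illustration):

:CodeCompanionChat
:'<,'>CodeCompanion refactor this function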

For more information on vLLM (the library, not the CLI utility), check out this Red Hat article.