vLLM Client (Server Mode)

Create a model

from_vllm expects an OpenAI-compatible client that points to your vLLM server.

from openai import OpenAI
from gimkit import from_vllm

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = from_vllm(client, model_name="Qwen/Qwen2.5-7B-Instruct")
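Before constructing the client, it can help to confirm that the server is reachable. The sketch below assumes the vLLM server exposes the standard OpenAI-compatible GET /v1/models route under the same base URL. The helper names models_url and list_served_models are illustrative only and are not part of GIMKit.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str, api_key: str = "EMPTY", timeout: float = 5.0) -> list[str]:
    """Return the model ids the server reports (assumes an OpenAI-style response body)."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling list_served_models("http://localhost:8000/v1") should return a list that includes the name you pass as model_name. A connection error at this step most likely means no server is listening at that address.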

Prompt recommendation

For GIM-trained local models, keep use_gim_prompt=False. For models that were not GIM-trained, set use_gim_prompt=True, which adds an extra prompt layer.

Example query:

from gimkit import guide as g

query = f"""
Name: {g.person_name(name="name")}
Phone: {g.phone_number(name="phone")}
"""

# GIM-trained model path
result = model(query)

# Non-GIM-trained model path
result_non_gim = model(query, use_gim_prompt=True)
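The prompt recommendation reduces to one rule: use_gim_prompt is the inverse of whether the model was GIM-trained. A minimal sketch follows. Both call_kwargs and the is_gim_trained flag are hypothetical names you would maintain yourself, and GIMKit is not assumed to detect this automatically.

```python
def call_kwargs(is_gim_trained: bool) -> dict:
    """Pick the recommended prompt setting for a given model."""
    # GIM-trained models need no extra layer; other models get the GIM prompt.
    return {"use_gim_prompt": not is_gim_trained}


# Usage, with model and query defined as in the example query:
# result = model(query, **call_kwargs(is_gim_trained=False))
```

Keeping the flag in one place avoids scattering use_gim_prompt=True across call sites when you switch between models.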

Output types

output_type="cfg" (default)

With the vLLM backend, output_type defaults to "cfg": generation is constrained by a context-free grammar (CFG) for strong structure control.

result = model(query, output_type="cfg")

output_type="json"

result = model(query, output_type="json", use_gim_prompt=True)

Notes

  • GIMKit automatically adds stop="<|/GIM_RESPONSE|>" for safer termination.
  • You can still pass extra generation arguments via **inference_kwargs.
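To illustrate what a stop string does, the sketch below mimics the truncation on the client side. It is not GIMKit's or vLLM's implementation. truncate_at_stop is a hypothetical helper, and it assumes the common OpenAI-style behavior in which the stop string itself is excluded from the returned text.

```python
GIM_RESPONSE_END = "<|/GIM_RESPONSE|>"  # stop string quoted in the notes


def truncate_at_stop(text: str, stop: str = GIM_RESPONSE_END) -> str:
    """Return text up to, but not including, the first occurrence of stop."""
    head, _, _ = text.partition(stop)
    return head
```

Text without the stop string passes through unchanged, so the helper is safe to apply to any completion.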