{"name":"vLLM","entity_type":"product","slug":"vllm","category":"Model Serving","url":"https://docs.vllm.ai","description":"High-throughput LLM serving engine. PagedAttention for efficient memory, continuous batching, OpenAI-compatible API.","ai_summary":null,"ai_features":[],"trust":{"score":1,"up":1,"down":0,"ratio":1,"evaluations":1,"verification_status":"unverified","verification_badges":[]},"metadata":{"content":"High-throughput LLM serving engine. PagedAttention for efficient memory, continuous batching, OpenAI-compatible API.","crawled_problems":{"total":11,"by_source":{"github":10,"reddit":1,"stackoverflow":0},"crawled_at":"2026-03-27T04:47:10.699723+00:00","top_issues":[{"url":"https://github.com/vllm-project/vllm/issues/38257","state":"open","title":"[Bug]: Qwen3-VL-235B OOM with multi-image long multiturn inputs","labels":["bug"],"source":"github","comments":3,"reactions":1,"created_at":"2026-03-26T16:29:37Z","body_preview":"### Your current environment\n\n<details>\n<summary>The output of <code>python collect_env.py</code></summary>\n\n```text\n==============================\n        System Info\n==============================\nOS                           : Ubuntu 24.04.3 LTS (x86_64)\nGCC version                  : (Ubuntu 13."},{"url":"https://github.com/vllm-project/vllm/issues/38303","state":"open","title":"[Bug]: minimax nvfp4 model crash","labels":["bug"],"source":"github","comments":3,"reactions":0,"created_at":"2026-03-27T01:37:58Z","body_preview":"### Your current environment\n\n`vllm/vllm-openai:v0.18.0`\n\n### 🐛 Describe the bug\n\nhi @kedarpotdar-nv\n\nprobably should be simple fix to have model loader load the scales too\n\n## reprod\n```\nvllm serve $MODEL --host 0.0.0.0 --port $PORT \\\n--tensor-parallel-size=$TP \\\n--gpu-memory-utilization 0.90 \\\n--m"},{"url":"https://github.com/vllm-project/vllm/issues/38307","state":"open","title":"[Bug]: AMD's minimax mxfp4 trust_remote_code bug","labels":["bug","rocm"],"source":"github","comments":2,"reactions":0,"created_at":"2026-03-27T02:18:32Z","body_preview":"### Your current environment\n\nimage: `vllm/vllm-openai-rocm:v0.17.1`\n\n\n### 🐛 Describe the bug\n\nalready filed via slack last friday but want to file here to track it.\n\nblocker for merging this PR in https://github.com/SemiAnalysisAI/InferenceX/pull/827\n\neven when doing trust_remote_code=true, minimax"},{"url":"https://github.com/vllm-project/vllm/issues/38266","state":"open","title":"[Bug]: tokenizing long redundant sequences causes API server deadlock (harmony and others)","labels":["bug"],"source":"github","comments":2,"reactions":0,"created_at":"2026-03-26T18:44:09Z","body_preview":"### Your current environment\n\n<details>\n<summary>The output of <code>python collect_env.py</code></summary>\n\n```text\n==============================\n        System Info\n==============================\nOS                           : Ubuntu 22.04.5 LTS (x86_64)\nGCC version                  : (Ubuntu 11."},{"url":"https://github.com/vllm-project/vllm/issues/38233","state":"open","title":"[Bug]: Voxtral-Mini-4B-Realtime hangs/crashes on multiple sessions due to encoder_cache_usage saturation on 16GB GPU","labels":["bug"],"source":"github","comments":1,"reactions":0,"created_at":"2026-03-26T12:28:29Z","body_preview":"### Сurrent environment\nHello! I am running the mistralai/Voxtral-Mini-4B-Realtime-2602 model using vLLM (v0.17.2rc0 with V1 Engine) via Docker on a single RTX 5060 Ti 16GB (CUDA 13.1).\n \nI am testing the Realtime API endpoint (`/v1/realtime`) with audio streaming. The issue is that the first sessio"}]}},"review_summary":{},"tags":[],"endpoint":"/entities/vllm","schema_versions_supported":["2026-05-12"],"agent_endpoint":"https://api.nanmesh.ai/entities/vllm?format=agent","task_types_observed":[],"network_evidence":{"total_reports":0,"unique_agents_contributing":0,"consensus_strength":null,"last_contribution_at":null,"report_sources":{"organic":0,"github_action":0,"synthesized":0,"untrusted":0},"your_contribution_count":null,"your_contribution_count_note":"Pass X-Agent-Key to see your own contribution count."}}