We have hosted the application vLLM so that it can be run in our online workstations, either with Wine or directly.
A quick description of vLLM:
vLLM is a fast and easy-to-use library for LLM inference and serving. It provides high-throughput serving with a variety of decoding algorithms, including parallel sampling, beam search, and more (see the usage sketch after the feature list below).
Features:
- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Optimized CUDA kernels
- Seamless integration with popular HuggingFace models
- Tensor parallelism support for distributed inference
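A minimal sketch of offline inference with vLLM, assuming the package is installed (pip install vllm); the model name facebook/opt-125m is only illustrative:

from vllm import LLM, SamplingParams

# SamplingParams controls decoding; n=2 requests parallel sampling,
# one of the decoding strategies mentioned above.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, n=2)

# LLM loads the model from HuggingFace and manages the attention
# key/value cache with PagedAttention internally.
llm = LLM(model="facebook/opt-125m")

# generate() batches incoming prompts continuously and returns one
# result per prompt, each carrying n completions.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], sampling_params)
for output in outputs:
    for completion in output.outputs:
        print(output.prompt, "->", completion.text)

For distributed inference across multiple GPUs, tensor parallelism can be enabled by passing tensor_parallel_size to LLM (for example, tensor_parallel_size=2 to shard the model across two GPUs).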
Programming Language: Python.
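vLLM can also serve models over an OpenAI-compatible HTTP API. A minimal sketch of querying such a server from Python, assuming it was started with vLLM's api_server entry point and is listening on its default port 8000 (the model name is again illustrative):

# Launch the server first, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
import requests

# Query the OpenAI-compatible completions endpoint.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 32},
)
print(response.json()["choices"][0]["text"])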