vllm | A high-throughput and memory-efficient inference and serving engine for LLMs | amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
io.ray:streaming-api | ray streaming api | data-science, deep-learning, deployment, distributed, hyperparameter-optimization, hyperparameter-search, large-language-models, llm, llm-inference, llm-serving, machine-learning, optimization, parallel, python, pytorch, ray, reinforcement-learning, rllib, serving, tensorflow
io.ray:ray-serve | java for ray serve | data-science, deep-learning, deployment, distributed, hyperparameter-optimization, hyperparameter-search, large-language-models, llm, llm-inference, llm-serving, machine-learning, optimization, parallel, python, pytorch, ray, reinforcement-learning, rllib, serving, tensorflow
vllm-npu | A high-throughput and memory-efficient inference and serving engine for LLMs | amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
ant-ray | Ray provides a simple, universal API for building distributed applications. | ray, distributed, parallel, machine-learning, hyperparameter-tuning, reinforcement-learning, deep-learning, serving, python, data-science, deployment, hyperparameter-optimization, hyperparameter-search, large-language-models, llm, llm-inference, llm-serving, optimization, pytorch, reinforcement-learning, rllib, tensorflow
vllm-online | A high-throughput and memory-efficient inference and serving engine for LLMs | amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
secretflow-ray | Ray provides a simple, universal API for building distributed applications. | ray, distributed, parallel, machine-learning, hyperparameter-tuning, reinforcement-learning, deep-learning, serving, python, data-science, deployment, hyperparameter-optimization, hyperparameter-search, large-language-models, llm, llm-inference, llm-serving, optimization, pytorch, reinforcement-learning, rllib, tensorflow
ray | Ray provides a simple, universal API for building distributed applications. | ray, distributed, parallel, machine-learning, hyperparameter-tuning, reinforcement-learning, deep-learning, serving, python, data-science, deployment, hyperparameter-optimization, hyperparameter-search, large-language-models, llm, llm-inference, llm-serving, optimization, pytorch, reinforcement-learning, rllib, tensorflow
ray-cpp | A subpackage of Ray which provides the Ray C++ API. | ray, distributed, parallel, machine-learning, hyperparameter-tuning, reinforcement-learning, deep-learning, serving, python, data-science, deployment, hyperparameter-optimization, hyperparameter-search, large-language-models, llm, llm-inference, llm-serving, optimization, pytorch, reinforcement-learning, rllib, tensorflow
bentoml | BentoML: The easiest way to serve AI apps and models | BentoML, Compound AI Systems, LLMOps, MLOps, Model Deployment, Inference, Serving, ai-inference, deep-learning, generative-ai, inference-platform, llm, llm-inference, llm-serving, machine-learning, ml-engineering, model-inference-service, model-serving, multimodal, python