poe-api-wrapper: A simple, lightweight and efficient API wrapper for Poe.com
  Tags: python, poe, quora, chatgpt, claude, poe-api, api, chatbot, code-llama, dall-e, gemini, gpt-4, groq, llama, mistral, openai, palm2, qwen, reverse-engineering, stable-diffusion
node-llama-cpp: Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
  Tags: llama, llama-cpp, llama.cpp, bindings, ai, cmake, cmake-js, prebuilt-binaries, llm, gguf, metal, cuda, vulkan, grammar, embedding, rerank, reranking, json-grammar, json-schema-grammar, functions, function-calling, token-prediction, speculative-decoding, temperature, minP, topK, topP, seed, json-schema, raspberry-pi, self-hosted, local, catai, mistral, deepseek, qwen, qwq, typescript, lora, batching, gpu, nodejs
vllm-npu: A high-throughput and memory-efficient inference and serving engine for LLMs
  Tags: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
llmtuner: Easy-to-use LLM fine-tuning framework
  Tags: LLaMA, BLOOM, Falcon, LLM, ChatGPT, transformer, pytorch, deeplearning, agent, ai, chatglm, fine-tuning, gpt, instruction-tuning, language-model, large-language-models, llama3, lora, mistral, moe, peft, qlora, quantization, qwen, rlhf, transformers
zh-langchain: Chinese language processing library
  Tags: chatbot, chatchat, chatglm, chatgpt, embedding, faiss, fastchat, gpt, knowledge-base, langchain, langchain-chatglm, llama, llm, milvus, ollama, qwen, rag, retrieval-augmented-generation, streamlit, xinference
tilearn-infer: A high-throughput and memory-efficient inference and serving engine for LLMs
  Tags: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
vllm-online: A high-throughput and memory-efficient inference and serving engine for LLMs
  Tags: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
xfastertransformer-devel: Boost large language model inference performance on CPU platforms.
  Tags: LLM, chatglm, inference, intel, llama, model-serving, qwen, transformer, xeon
xinference: Model Serving Made Easy
  Tags: artificial-intelligence, chatglm, deployment, flan-t5, gemma, ggml, glm4, inference, llama, llama3, llamacpp, llm, machine-learning, mistral, openai-api, pytorch, qwen, vllm, whisper, wizardlm
nextai-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
  Tags: amd, cuda, deepseek, gpt, hpu, inference, inferentia, llama, llm, llm-serving, llmops, mlops, model-serving, pytorch, qwen, rocm, tpu, trainium, transformer, xpu
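Several of the serving engines listed above (the vllm-* variants and xinference, per their openai-api tag) expose an OpenAI-compatible HTTP endpoint, so one client can talk to any of them. A minimal standard-library sketch, assuming a server running at localhost:8000 and a placeholder model name:

```python
import json
import urllib.request


def build_chat_request(model: str, messages: list, temperature: float = 0.7) -> dict:
    """Build a payload for an OpenAI-compatible /v1/chat/completions endpoint."""
    return {"model": model, "messages": messages, "temperature": temperature}


def post_chat(url: str, payload: dict) -> dict:
    """POST the payload to a locally running server (vLLM, Xinference, ...)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request(
    "qwen2.5-7b-instruct",  # model name is a placeholder, depends on what you deployed
    [{"role": "user", "content": "Hello"}],
)
# Requires a running server, e.g. started with `vllm serve` or `xinference launch`:
# reply = post_chat("http://localhost:8000/v1/chat/completions", payload)
```

Because the request shape is shared, switching between these engines usually only means changing the base URL and model name.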