Green AI: Cut Inference Cost 80% with Quantisation, Distillation, Speculative Decoding (2026)
April 28, 2026 · 20 min read
Tags: Green AI, LLM inference cost, quantisation, GPTQ, AWQ, INT4 quantisation, INT8 quantisation, FP8, speculative decoding, EAGLE-2, Medusa, distillation, continuous batching, paged attention, vLLM, TGI, TensorRT-LLM, SGLang, prefix caching, KV cache, model routing, spot GPU, MIG, carbon-aware scheduling, inference cost optimisation

Satyam
AI and cloud architect. Helping teams build systems that scale to millions.