Tencent Hunyuan Releases HPC-Ops: A High Performance LLM Inference Operator Library
Tencent Hunyuan has open sourced HPC-Ops, a production grade operator library for large language model inference architecture devices. HPC-Ops focuses on low level CUDA kernels for core operators such as Attention, Grouped GEMM, and Fused […]
