These days, large language models can handle increasingly complex tasks, writing intricate code and engaging in sophisticated ...
Abstract: In this paper, a table lookup-based computing technique is proposed to perform convolutional neural network (CNN) inference without multiplication, and its FPGA implementation is ...
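The abstract above proposes replacing multiplications in CNN inference with table lookups. A minimal sketch of the general idea, assuming int8-quantized operands (the quantization scheme and names here are illustrative, not taken from the paper): every possible 8-bit product is precomputed once, so inference needs only indexing and addition.

```python
import numpy as np

def build_product_lut():
    # lut[a + 128, w + 128] = a * w for every signed 8-bit operand pair.
    ops = np.arange(-128, 128, dtype=np.int32)
    return np.multiply.outer(ops, ops)  # shape (256, 256)

LUT = build_product_lut()

def lut_dot(acts, weights):
    # Dot product with each multiply replaced by a table lookup;
    # accumulation remains a plain integer sum.
    a = acts.astype(np.int32) + 128
    w = weights.astype(np.int32) + 128
    return int(LUT[a, w].sum())

acts = np.array([3, -5, 7], dtype=np.int8)
wts  = np.array([2,  4, -1], dtype=np.int8)
assert lut_dot(acts, wts) == 3*2 + (-5)*4 + 7*(-1)  # == -21
```

On an FPGA the same table would sit in block RAM; the 64 KiB full table shown here is the naive upper bound, and real designs typically shrink it by exploiting symmetry or lower-bit quantization.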
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...
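For context on what an HGEMM kernel must compute, here is a hedged NumPy reference of the usual numeric contract (not CUDA-L2 itself, and not its kernels): half-precision inputs, float32 accumulation, half-precision output, as in `C = alpha * A @ B + beta * C`.

```python
import numpy as np

def hgemm_reference(A, B, C, alpha=1.0, beta=0.0):
    # fp16 inputs are widened to fp32 for accumulation, matching the
    # common tensor-core configuration; the result is cast back to fp16.
    acc = A.astype(np.float32) @ B.astype(np.float32)
    return (alpha * acc + beta * C.astype(np.float32)).astype(np.float16)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float16)
B = rng.standard_normal((8, 3)).astype(np.float16)
C = np.zeros((4, 3), dtype=np.float16)
out = hgemm_reference(A, B, C)
assert out.shape == (4, 3) and out.dtype == np.float16
```

An RL-driven optimizer like the one described would search over tilings, pipelining, and memory layouts of a CUDA kernel while checking its output against a reference of this form.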
Abstract: General sparse matrix-matrix multiplication (SpGEMM) is a fundamental computational method with wide-ranging applications in scientific simulations, machine learning, and image processing.
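As a reference point for what SpGEMM computes, here is a short sketch of Gustavson's row-wise formulation on CSR data, a common baseline rather than the paper's own algorithm; the function and variable names are illustrative.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    # Row-wise SpGEMM: row i of C is a sparse linear combination of the
    # rows of B selected by the nonzeros in row i of A, accumulated in a
    # hash map (the "sparse accumulator").
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        row = {}
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            j, v = a_idx[jj], a_val[jj]
            for kk in range(b_ptr[j], b_ptr[j + 1]):
                k = b_idx[kk]
                row[k] = row.get(k, 0.0) + v * b_val[kk]
        for k in sorted(row):          # emit row i in column order
            c_idx.append(k)
            c_val.append(row[k])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# A = 2x2 identity, B = [[0, 2], [3, 0]]; C = A @ B should equal B.
c_ptr, c_idx, c_val = spgemm_csr([0, 1, 2], [0, 1], [1.0, 1.0],
                                 [0, 1, 2], [1, 0], [2.0, 3.0], 2)
assert (c_ptr, c_idx, c_val) == ([0, 1, 2], [1, 0], [2.0, 3.0])
```

The irregular per-row work in the inner loops is exactly what makes SpGEMM hard to load-balance, which is why accumulator choice and row scheduling dominate the design space.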