Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview release described in the blog post below. We have decomposed the structure of the GEMM computation into deeper, structured primitives for loading data, computing predicate masks, streaming data at each level of the GEMM hierarchy, and updating the output matrix. CUTLASS 1.0 is described in the Doxygen documentation and our talk at the GPU Technology Conference 2018.

Matrix multiplication is a key computation within many scientific applications, particularly those in deep learning. Many operations in modern deep neural networks are either defined as matrix multiplications or can be cast as such. As an example, the NVIDIA cuDNN library implements convolutions for neural networks using various flavors of matrix multiplication, such as the classical formulation of direct convolution as a matrix product between image-to-column and filter datasets. Matrix multiplication is also the core routine when computing convolutions based on Fast Fourier Transforms (FFT) or the Winograd approach.

When constructing cuDNN, we began with our high-performance implementations of general matrix multiplication (GEMM) in the cuBLAS library, supplementing and tailoring them to efficiently compute convolution. Today, our ability to adapt these GEMM strategies and algorithms is critical to delivering the best performance for many different problems and applications within deep learning.

The flexible and efficient application of dense linear algebra is crucial within deep learning and the broader GPU computing ecosystem. With CUTLASS, we would like to give everyone the techniques and structures they need to develop new algorithms in CUDA C++ using high-performance GEMM constructs as building blocks.
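To make the image-to-column formulation concrete, here is a minimal NumPy sketch of casting a direct convolution as a single matrix product. This is an illustration only, not cuDNN or CUTLASS code: the function names `im2col` and `conv2d_via_gemm` are made up for this example, and it assumes a single channel, unit stride, and no padding.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll every (kh x kw) patch of a single-channel image into a row.

    x: (H, W) input image; returns an (out_h*out_w, kh*kw) patch matrix.
    """
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_via_gemm(x, k):
    """Valid 2-D cross-correlation expressed as one matrix product (GEMM)."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # The convolution reduces to: patch matrix @ flattened filter.
    return (im2col(x, kh, kw) @ k.ravel()).reshape(out_h, out_w)
```

With multiple filters, the flattened filter vector becomes a matrix with one column per filter, and the whole layer is a single large GEMM, which is exactly why high-performance GEMM kernels matter so much for convolution.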