cutlass

Here are 16 public repositories matching this topic...

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

gpu cuda pytorch cutlass

Updated May 28, 2025
C++

coderonion / awesome-cuda-and-hpc

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

Updated May 31, 2025

DD-DuDa / Cute-Learning

Examples of CUDA implementations using CUTLASS CuTe

gpu cuda cutlass

Updated Jul 1, 2025
Makefile

leimao / CUTLASS-Examples

CUTLASS and CuTe Examples

docker cuda cutlass

Updated Jan 4, 2025
Cuda

Bruce-Lee-LY / flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

gpu cuda inference nvidia cutlass mha multi-head-attention llm tensor-core large-language-model flash-attention flash-attention-2

Updated Feb 27, 2025
C++

YashasSamaga / ConvolutionBuildingBlocks

GEMM- and Winograd-based convolutions using CUTLASS

deep-learning cuda convolution cutlass

Updated Jul 15, 2020
Cuda

yester31 / Cutlass_EX

A study of CUTLASS

cmake cuda cpp17 cutlass linux-programming parallel-programming

Updated Nov 10, 2024
Cuda

Bruce-Lee-LY / cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

gpu cublas nvidia cutlass gemm cublaslt llm matrix-multiply tensor-core

Updated Sep 27, 2024
C++

sgl-project / whl

Kernel Library Wheel for SGLang

cuda cutlass sglang flashinfer

Updated Jul 3, 2025
HTML

qdLMF / LightGlue-with-FlashAttentionV2-TensorRT

A CUTLASS CuTe implementation of a headdim-64 FlashAttention v2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.

cuda transformer cutlass cute tensorrt feature-matching multihead-attention superpoint lightglue flash-attention flash-attention-2

Updated Mar 3, 2025
Cuda

cjmcv / ai-infra-notes

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

hpc gpu cuda inference simd cutlass heterogeneous-computing mlsys llm sglang

Updated Jun 29, 2025

digital-nomad-cheng / tvm_project_course

neural-network compiler cuda cutlass tensorrt tvm dl-compiler

Updated Nov 2, 2023
Python

Routhleck / blocksparse-pytorch-implement

A block-sparse implementation in PyTorch

python cuda pytorch matrix-multiplication cutlass blocksparse tilesparse

Updated May 13, 2023
C++

peterlau123 / Lolly

Lightweight, production-level open-source C++ library

c cpp cuda cutlass

Updated May 7, 2025
C++

prateekshukla1108 / cutlass3

Docs

cutlass

Updated May 14, 2025
HTML

jiaau / kernels

This repository showcases common optimization techniques for kernels.

kernel cpp hpc cuda cutlass cute

Updated Jun 9, 2025
Cuda
