#cublas

AI Reddit Apr 11, 2026 2 min read

Reddit Flags a Possible cuBLAS Regression on RTX 5090 Batched FP32 Workloads

An r/MachineLearning thread argues that cuBLAS may be choosing an inefficient kernel for batched FP32 matrix multiplication on the RTX 5090. The post stands out not only for the claimed slowdown but also because it includes reproducible benchmark tables, profiling notes, and linked repro material.

#cublas #rtx-5090 #cuda

AI Reddit Apr 11, 2026 2 min read

Reddit post flags a likely FP32 cuBLAS dispatch problem on RTX 5090

An r/MachineLearning post and linked benchmark writeup argue that batched FP32 SGEMM on the RTX 5090 is hitting an inefficient cuBLAS path, leaving much of the GPU idle.

#cuda #cublas #gpu