
Analysis of HPC Matrix Multiplication Performance Benchmarking

This post analyzes matrix multiplication performance on Intel Xeon CPUs and NVIDIA V100 GPUs. It compares implementations in C++, OpenMP, CUDA, MPI, and NVSHMEM, along with Python libraries such as NumPy and CuPy.

January 11, 2026 · 19 min · 3873 words · Tategoto Azarasi