Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU

Luc Buatois, Guillaume Caumon and Bruno Levy (2007)

in: Proc. 27th Gocad Meeting, Nancy

Abstract

A wide class of geometry processing and PDE resolution methods needs to solve a linear system in which the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs, with their ever-growing amount of parallel horsepower, makes them a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a general-purpose sparse linear solver based on the Conjugate Gradient algorithm that outperforms leading-edge CPU counterparts (MKL / ACML) by up to a factor of 5.5.
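
As a rough illustration of the two ingredients named in the abstract, the sketch below stores a matrix in block compressed row storage (BCRS) and runs an unpreconditioned Conjugate Gradient loop on top of it. It is a plain CPU sketch in Python, not the authors' GPU implementation: the 2x2 block size, the function names and the dense-to-BCRS helper are all assumptions made for the example. The per-block-row accumulators and the slice of x that is read once per block only hint at what register blocking achieves on real hardware.

```python
# Illustrative sketch only (not the paper's GPU code): BCRS storage with an
# assumed 2x2 block size, plus a plain Conjugate Gradient loop.

BS = 2  # block size; an assumption chosen for this example


def dense_to_bcrs(a):
    """Convert a dense square matrix (list of lists) into BCRS arrays."""
    n = len(a)
    assert n % BS == 0, "matrix size must be a multiple of the block size"
    nb = n // BS
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            # Flatten the BS x BS block in row-major order.
            blk = [a[bi * BS + i][bj * BS + j]
                   for i in range(BS) for j in range(BS)]
            if any(v != 0.0 for v in blk):  # keep only non-zero blocks
                col_idx.append(bj)
                blocks.append(blk)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, blocks


def bcrs_spmv(m, x):
    """Compute y = A x for a matrix A stored in BCRS form."""
    row_ptr, col_idx, blocks = m
    y = [0.0] * len(x)
    for bi in range(len(row_ptr) - 1):
        acc = [0.0] * BS  # accumulators for the rows of this block row
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            start = col_idx[k] * BS
            xj = x[start:start + BS]  # read once, reused by every block row
            blk = blocks[k]
            for i in range(BS):
                for j in range(BS):
                    acc[i] += blk[i * BS + j] * xj[j]
        y[bi * BS:(bi + 1) * BS] = acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(m, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A in BCRS form."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = bcrs_spmv(m, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Small symmetric positive definite test system.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
M = dense_to_bcrs(A)
solution = conjugate_gradient(M, [6.0, 12.0, 18.0, 19.0])
```

Storing one column index per block rather than per coefficient is what reduces index traffic in BCRS; the block size that pays off in practice depends on the matrix and the hardware.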

BibTeX Reference

@inproceedings{206_Buatois,
 abstract = { A wide class of geometry processing and PDE resolution methods needs to solve a linear system in which
the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs,
with their ever-growing amount of parallel horsepower, makes them a tempting resource for such numerical
computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving
direct access to the multithreaded computational resources and associated memory bandwidth of GPUs;
CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices.
However, existing GPU linear solvers are restricted to specific types of matrices or use non-optimal compressed
row storage strategies. By combining recent GPU programming techniques with supercomputing
strategies (namely block compressed row storage and register blocking), we implement a general-purpose
sparse linear solver based on the Conjugate Gradient algorithm that outperforms leading-edge CPU
counterparts (MKL / ACML) by up to a factor of 5.5. },
 author = { Buatois, Luc AND Caumon, Guillaume AND Levy, Bruno },
 booktitle = { Proc. 27th Gocad Meeting, Nancy },
 title = { Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU },
 year = { 2007 }
}
