Lessons learned from contrasting a BLAS kernel implementations
This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast CPU versus GPU implementation effort and complexity of an optimized BLAS routine, not considering performance. This work contributes with...
| Main author: | More, Andres |
|---|---|
| Format: | Conference object |
| Language: | English |
| Published: | 2013 |
| Subjects: | |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/31702 |
| Contributed by: | |

| id | I19-R120-10915-31702 |
|---|---|
| record_format | dspace |
| institution | Universidad Nacional de La Plata |
| institution_str | I-19 |
| repository_str | R-120 |
| collection | SEDICI (UNLP) |
| language | English |
| topic | Computer Science BLAS libraries SSPR kernel CPU architecture GPU architecture performance analysis performance measurement software optimization Software libraries Optimization PROCESSOR ARCHITECTURES Performance Analysis and Design Aids |
| spellingShingle | Computer Science BLAS libraries SSPR kernel CPU architecture GPU architecture performance analysis performance measurement software optimization Software libraries Optimization PROCESSOR ARCHITECTURES Performance Analysis and Design Aids More, Andres Lessons learned from contrasting a BLAS kernel implementations |
| topic_facet | Computer Science BLAS libraries SSPR kernel CPU architecture GPU architecture performance analysis performance measurement software optimization Software libraries Optimization PROCESSOR ARCHITECTURES Performance Analysis and Design Aids |
| description | This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast CPU versus GPU implementation effort and complexity of an optimized BLAS routine, not considering performance. This work contributes with a sample procedure to compare BLAS kernel implementations, how to start using GPU libraries and offloading, how to analyze their performance and the issues faced and how they were solved. |
| format | Conference object |
| author | More, Andres |
| author_facet | More, Andres |
| author_sort | More, Andres |
| title | Lessons learned from contrasting a BLAS kernel implementations |
| title_short | Lessons learned from contrasting a BLAS kernel implementations |
| title_full | Lessons learned from contrasting a BLAS kernel implementations |
| title_fullStr | Lessons learned from contrasting a BLAS kernel implementations |
| title_full_unstemmed | Lessons learned from contrasting a BLAS kernel implementations |
| title_sort | lessons learned from contrasting a blas kernel implementations |
| publishDate | 2013 |
| url | http://sedici.unlp.edu.ar/handle/10915/31702 |
| work_keys_str_mv | AT moreandres lessonslearnedfromcontrastingablaskernelimplementations |
| bdutipo_str | Repositories |
| _version_ | 1764820468706050049 |