Lars Karlsson
Assistant professor in parallel and multi-core computing
Department of Computing Science, Umeå University

Phone: +46-(0)90-786 70 24
E-mail: larsk@cs.umu.se

Dept. of Computing Science
Umeå University
SE-901 87 Umeå
Sweden

Research interests
- Matrix and tensor computations
- Efficient use of memory hierarchies
- Fine-grained and dynamic scheduling
- Multi-scale/hybrid parallelism

Current courses
- No active courses at present

Past courses (HT = autumn term, VT = spring term)
- Computer Organization and Architecture (HT12)
- Design and Analysis of Parallel Algorithms (VT13, VT12, VT11, VT10, VT09, VT08)
- Parallel Computer Systems (VT13)
- Matrix Computations and Applications (HT11, HT10, HT09, HT08, HT07)
- Programming Languages (Programspråk) (HT09, HT08)

Publications
[1] Optimally Packed Chains of Bulges in Multishift QR Algorithms. ACM Transactions on Mathematical Software, accepted 2013.
[2] Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms. Parallel Computing, accepted 2013.
[3] Parallel two-stage reduction to Hessenberg form on shared-memory architectures. Parallel Computing, volume 37, issue 12, December 2011, pages 771-782.
[4] Efficient reduction from block Hessenberg form to Hessenberg form using shared memory. In Applied Parallel and Scientific Computing (PARA 2010), Lecture Notes in Computer Science, volume 7134, pages 258-268, 2012.
[5] Computing Codimensions and Generic Canonical Forms for Generalized Matrix Products. Electronic Journal of Linear Algebra, volume 22, 2011, pages 277-309.
[6] Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion. ACM Transactions on Mathematical Software, volume 38, issue 3, April 2012, pages 17:1-17:32.
[7] Blocked and Scalable Matrix Computations: Packed Cholesky, In-Place Transposition, and Two-Sided Transformations. Licentiate thesis, Dept. of Computing Science, Umeå University, Sweden, 2009. Report UMINF 09.11, ISBN 978-91-7264-788-6.
[8] Blocked In-Place Transposition with Application to Storage Format Conversion. Technical Report UMINF 09.01, Dept. of Computing Science, Umeå University, Sweden, 2009.
[9] A Framework for Dynamic Node-Scheduling of Two-Sided Blocked Matrix Computations. In Proceedings of PARA 2008, 2009 (accepted).
[10] Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling. ACM Transactions on Mathematical Software, volume 36, issue 2, 2009, pages 11:1-11:25. Also published as Report UMINF 07.19 and IBM Research Report RC24342.
[11] Three Algorithms for Cholesky Factorization on Distributed Memory using Packed Storage. In Applied Parallel Computing: State of the Art in Scientific Computing (PARA 2006), Lecture Notes in Computer Science, volume 4699, pages 550-559, Springer, 2007.

Software

In-place matrix storage format conversion
- Source code (Fortran 95 using OpenMP)
- The underlying theory is presented in [6]
- Collaborators: Fred Gustavson and Bo Kågström
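In-place transposition is the basic operation behind the storage format conversion in [6] and [8]. As a minimal point of reference, and not the blocked, cache-efficient, parallel algorithm from those papers, the sketch below transposes a row-major m-by-n matrix held in a flat Python list using the textbook cycle-following method: entry k (for 0 < k < mn - 1) moves to position k·m mod (mn - 1). The function name `transpose_in_place` is hypothetical.

```python
def transpose_in_place(a, m, n):
    """Transpose an m-by-n row-major matrix stored in the flat list `a`, in place.

    Afterwards `a` holds the n-by-m transpose, also in row-major order.
    Entry k (0 < k < m*n - 1) moves to (k * m) % (m*n - 1); the first and
    last entries never move. Cycle leaders are detected by re-walking each
    cycle, so only O(1) extra storage is used, at the cost of extra index work.
    """
    mod = m * n - 1
    for start in range(1, mod):  # empty loop when m*n <= 2
        # `start` leads its cycle only if it is the smallest index on that cycle.
        k = (start * m) % mod
        while k > start:
            k = (k * m) % mod
        if k != start:
            continue  # this cycle was already rotated from a smaller index
        # Rotate values along the cycle: the value at position p moves to dest(p).
        carried = a[start]
        k = (start * m) % mod
        while k != start:
            a[k], carried = carried, a[k]
            k = (k * m) % mod
        a[start] = carried
    return a
```

For example, `transpose_in_place(list(range(6)), 2, 3)` turns the rows [0, 1, 2] and [3, 4, 5] into the flat transpose [0, 3, 1, 4, 2, 5]. Each element is moved exactly once, but the memory accesses jump across the whole array, which is the kind of poor locality that blocked formulations are designed to avoid.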

Codimensions and generic canonical forms for generalized matrix products
- Source code (Python)
- Detailed information and examples
- The underlying theory is presented in [5]
- Collaborators: Daniel Kressner and Bo Kågström

Distributed SBP Cholesky factorization
- Source code (C using MPI)
- The underlying theory is presented in [10]
- Collaborators: Fred Gustavson and Bo Kågström
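Packed storage of symmetric matrices recurs in the Cholesky work listed above ([10], [11]). Purely as a serial point of reference, the sketch below factors a matrix whose lower triangle is stored column by column in the conventional packed layout used by LAPACK's packed routines. It is not the SBP format or the distributed algorithms of [10], and the names `packed_index` and `packed_cholesky` are hypothetical.

```python
import math


def packed_index(i, j, n):
    """0-based position of entry (i, j), i >= j, in column-major lower packed storage."""
    # Column j starts after columns 0..j-1, which hold n, n-1, ..., n-j+1 entries.
    # j * (2n - j - 1) is always even, so the integer division is exact.
    return i + j * (2 * n - j - 1) // 2


def packed_cholesky(ap, n):
    """Overwrite `ap` (lower triangle of a symmetric positive definite n-by-n
    matrix, packed by columns) with its Cholesky factor L, so that A = L L^T.

    Unblocked, column-by-column variant: column j uses only columns 0..j-1
    of L, which are already final, so the overwrite is safe.
    """
    for j in range(n):
        d = ap[packed_index(j, j, n)] - sum(
            ap[packed_index(j, k, n)] ** 2 for k in range(j)
        )
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        ljj = math.sqrt(d)
        ap[packed_index(j, j, n)] = ljj
        for i in range(j + 1, n):
            s = ap[packed_index(i, j, n)] - sum(
                ap[packed_index(i, k, n)] * ap[packed_index(j, k, n)] for k in range(j)
            )
            ap[packed_index(i, j, n)] = s / ljj
    return ap
```

For example, `[4, 12, -16, 37, -43, 98]` packs the lower triangle of the matrix with rows (4, 12, -16), (12, 37, -43), (-16, -43, 98), and `packed_cholesky` returns `[2.0, 6.0, -8.0, 1.0, 5.0, 3.0]`, the packed lower factor. Packing halves the storage, but the column-by-column access pattern is not cache-friendly for large n, which is one motivation for blocked packed formats.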