Illinois ECE 498AL: Programming Massively Parallel Processors

By Wen-Mei W Hwu

University of Illinois at Urbana-Champaign

Category

Course

Published on

Spring 2009

Abstract

Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number of applications that traditionally used Application Specific Integrated Circuits (ASICs) are now implemented with concurrent processors in order to improve functionality and reduce engineering cost. The real challenge is to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals.

The aim of this course is to provide students with knowledge and hands-on experience in developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Today, NVIDIA processors already exhibit this capability, and processors from Intel, AMD, and IBM will begin to qualify as massively parallel in the next several years. Effectively programming these processors requires in-depth knowledge of parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors. The target audience of the course is students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for them.

We will be using NVIDIA processors and the CUDA programming tools in the lab section of the course. Many have reported success in performing non-graphics parallel computation as well as traditional graphics rendering computation on these processors. You will go through structured programming assignments before being turned loose on the final project. Each programming assignment will involve successively more sophisticated programming skills. The final project will be of your own design, with the requirement that the project must involve a demanding application such as mathematics- or physics-intensive simulation or other data-intensive computation, followed by some form of visualization and display of results.

This is a course in programming massively parallel processors for general computation. We are fortunate to have the support and presence of David Kirk, the Chief Scientist of NVIDIA and one of the main driving forces behind the new NVIDIA CUDA technology. Building on architecture knowledge from ECE 411, and general C programming knowledge, we will expose you to the tools and techniques you will need to attack a real-world application for the final project. The final projects will be supported by some real application groups at UIUC and around the country, such as biomedical imaging and physical simulation.

Course Website

Programming Massively Parallel Processors

Topics:

  • Introduction
  • GPU Computing and CUDA Programming Model Intro
  • CUDA Example and CUDA Threads
  • CUDA Threads Part 2 and API Details
  • CUDA Memory
  • CUDA Memory Example
  • GPU as Part of the PC Architecture
  • CUDA Threading Hardware
  • CUDA Memory Hardware
  • Control Flow in CUDA
  • Floating-Point Performance, Precision, and Accuracy
  • Parallel Programming Basics
  • Parallel Algorithm Basics
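
The early lectures (CUDA programming model, threads, and memory) build toward writing simple kernels in the lab. A minimal sketch of what such a first CUDA program might look like is below; the kernel name, array sizes, and launch configuration are illustrative assumptions, not taken from the course materials.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Each thread computes one element of the output vector.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard threads past the end of the array
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Device allocations and host-to-device copies.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);    // expect 3.0 for these inputs

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

The block/thread index arithmetic and the bounds check in the kernel are exactly the kinds of details the "CUDA Threads" and "Control Flow" lectures above address.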


Cite this work

Researchers should cite this work as follows:

  • Wen-Mei W Hwu (2009), "Illinois ECE 498AL: Programming Massively Parallel Processors," https://nanohub.org/resources/7225.

