C++ AMP, Part 2 of 2: Memory Layout and Support

with expert John Stratton

Course at a glance

Included in these subscriptions:

  • Dev & IT Pro Video
  • Dev & IT Pro Power Pack

Release date 8/5/2013
Level Advanced
Runtime 2h 22m
Closed captioning Included
Transcript Included
eBooks / courseware Included
Hands-on labs N/A
Sample code Included
Exams Included

Course description

In this course, you’ll learn how accelerator hardware is designed and how it is integrated into the system. With that foundation, we can discuss what you can expect from the system when you use various C++ AMP features. Specifically, we will talk about data transfers to and from the accelerator, memory layout and memory accesses from the accelerator, and thread execution and control flow on the accelerator. Then we’ll cover the support that Microsoft Visual Studio 2012 provides for C++ AMP.
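Memory layout and access patterns are easiest to see on a concrete case. The following is a small CPU-only sketch, not material taken from the course. It shows why a naive matrix transpose (the operation the outline uses in its transpose demo) reads memory contiguously but writes it with a large stride when the matrix is stored in row-major order, where the last index varies fastest, as in ordinary C++ arrays. The helper names `row_major_offset` and `transpose` are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a row-major matrix with `cols` columns.
// The last index varies fastest, so neighboring `col` values are adjacent in memory.
constexpr std::size_t row_major_offset(std::size_t row, std::size_t col,
                                       std::size_t cols) {
    return row * cols + col;
}

// Naive transpose of a rows x cols matrix into a cols x rows matrix.
// Stepping `c` reads `in` with stride 1 but writes `out` with stride `rows`,
// which is the uneven access pattern a transpose exposes.
std::vector<float> transpose(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);  // precondition: dimensions match the data
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Element (r, c) of `in` becomes element (c, r) of `out`.
            out[row_major_offset(c, r, rows)] = in[row_major_offset(r, c, cols)];
        }
    }
    return out;
}
```

For a 4 x 8 input, the read offsets of neighboring `c` values differ by 1, while the corresponding write offsets in the 8 x 4 output differ by 4. If neighboring accelerator threads are assigned neighboring values of the last index, the same asymmetry appears across threads rather than across loop iterations.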


This course assumes that you have a good understanding of core C++ concepts, including classes, objects, containers, and iterators. You should also be familiar with using Visual Studio 2012 for Visual C++ development, including compilation, testing, and debugging. Although it is not required, you may get more out of some parts of the course if you are familiar with multithreaded programming, Visual Studio 2012’s debugging capabilities for multiple threads, and basic computer architecture concepts.

Meet the expert

John Stratton, Ph.D., is a senior architect at Multicoreware Inc. and a visiting lecturer at the University of Illinois at Urbana-Champaign. John has been at the forefront of research and education in heterogeneous computing, reaching hundreds of students through the Virtual School of Computational Science and Engineering’s courses on heterogeneous computing and optimization for scientific applications. John writes papers and articles for leading academic conferences and journals as well as broad-reaching publications such as IEEE Computer. He is also an active participant and presenter at several industry and technology groups and events across the country.

Course outline

Memory Layout

Memory Layout Overview (25:47)
  • Introduction (00:48)
  • GPU Architecture Overview (08:45)
  • Minimum Scale of Parallelism (07:43)
  • Demo: Scale and Performance (03:20)
  • Demo: Benchmark Results (04:34)
  • Summary (00:35)
Memory Layout and SIMD (31:04)
  • Introduction (01:03)
  • Memory Layout and Accesses (06:37)
  • Good Access Patterns (00:51)
  • Demo: Transpose Operation (05:16)
  • Implicit SIMD Execution (03:57)
  • Divergent Penalties (03:07)
  • Demo: Divergence (04:32)
  • Demo: Divergence Problems (04:42)
  • Summary (00:55)
Data Transfers (17:42)
  • Introduction (00:46)
  • Host-Accelerator Data Transfers (03:30)
  • When Data Transfers Happen (03:02)
  • Demo: Data Transfers (05:48)
  • Demo: Array View (04:01)
  • Summary (00:33)

Support for C++ AMP

Windows Support (14:44)
  • Introduction (00:30)
  • C++ AMP Uses DirectCompute (03:31)
  • Demo: AMP Implementations (04:01)
  • Demo: Multiple Accelerators (05:16)
  • Summary (01:23)
Debugging (20:56)
  • Introduction (00:43)
  • C++ AMP Debugging (02:56)
  • Demo: Debugging C++ AMP (05:24)
  • Demo: Debugging Tools (07:17)
  • Demo: Freezing Threads (02:00)
  • Debugging Parallel Kernel Code (01:52)
  • Summary (00:41)
Tiling (32:32)
  • Introduction (00:52)
  • Tiled Extents and Indexes (01:21)
  • Tiled Accelerator Execution (04:58)
  • Demo: Tiled Extents (05:44)
  • Tiled Accelerator Execution (2) (06:58)
  • Demo: Tile Size (05:50)
  • Demo: Tile Variables (05:40)
  • Summary (01:05)