Jason Sanders, Edward Kandrot
The complete guide to developing high-performance applications with CUDA - written by CUDA development team members, and supported by NVIDIA.
* Breakthrough techniques for using the power of graphics processors to create high-performance general-purpose applications.
* Packed with realistic, C-based examples, from basic to advanced.
* Covers one of today's most highly anticipated new technologies for software development wherever performance is crucial: finance, design automation, science, simulation, graphics, and beyond.
NVIDIA graphics processors have immense computational power. With NVIDIA's breakthrough CUDA software platform, that power can be put to work in virtually any type of software development that requires exceptionally high performance, from finance to physics. Now, for the first time, two of NVIDIA's senior CUDA developers thoroughly introduce the platform, and show developers exactly how to make the most of it. CUDA C by Example is the first book on CUDA development for professional programmers - and the only book created with NVIDIA's direct involvement. Concise and practical, it focuses on presenting proven techniques and concrete example code for building high-performance parallelized CUDA programs with C. Programmers familiar with C will need no other skills or experience to get started - making high-performance programming more accessible than it's ever been before.
Michael Abrash explores the inner workings of all Intel-based PCs including the hot new Pentium. This is the only book available that provides practical and innovative "right-brain" approaches to writing fast PC software using C/C++ and assembly language. This book is packed with "from the trenches" programming secrets and features "undocumented" Pentium programming tips. Provides hundreds of optimized coding examples.
"OpenCL in Action blends the theory of parallel computing with the practical reality of building high-performance applications using OpenCL. It first guides you through the fundamental data structures in an intuitive manner. Then, it explains techniques for high-speed sorting, image processing, matrix operations, and fast Fourier transform. The book concludes with a deep look at the all-important subject of graphics acceleration. Numerous challenging examples give you different ways to experiment with working code."--Pub. desc.
Benedict Gaster, Lee Howes, David R. Kaeli
"Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include different types of hardware: Central Processing Units (CPUs), Digital Signal Processors (DSPs), Graphic Processing Units (GPUs) and Accelerated Processing Units (APUs). Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future."
The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization. The accompanying open source code (more than 25,000 lines of it, freely available at www.cudahandbook.com) is specifically intended to be reused and repurposed by developers. Designed to be both a comprehensive reference and a practical cookbook, the text is divided into the following three parts. Part I, Overview, gives high-level descriptions of the hardware and software that make CUDA possible. Part II, Details, provides thorough descriptions of every aspect of CUDA, including memory; streams and events; models of execution, including the dynamic parallelism feature, new with CUDA 5.0 and SM 3.5; the streaming multiprocessors, including descriptions of all features through SM 3.5; programming multiple GPUs; and texturing. The source code accompanying Part II is presented as reusable microbenchmarks and microdemos, designed to expose specific hardware characteristics or highlight specific use cases. Part III, Select Applications, details specific families of CUDA applications and key parallel algorithms, including streaming workloads, reduction, parallel prefix sum (scan), N-body, and image processing. These algorithms cover the full range of potential CUDA applications.