Multi-Core Architectures and Programming
- Lecture Notes, Study Materials and Important Questions with Answers
Subject: Multi-Core Architectures and Programming
An Introduction to Parallel Programming by Peter S Pacheco
Chapter 1 Why Parallel Computing
- Why Parallel Computing?
- Why We Need Ever-Increasing Performance
- Why We’re Building Parallel Systems
- Why We Need to Write Parallel Programs
- How Do We Write Parallel Programs?
- Concurrent, Parallel, Distributed
Chapter 2 Parallel Hardware and Parallel Software
- Parallel Hardware and Parallel Software
- Some Background: von Neumann Architecture, Processes, Multitasking, and Threads
- Modifications to the von Neumann Model
- Parallel Hardware
- Parallel Software
- Input and Output
- Performance of Parallel Programs
- Parallel Program Design with an Example
- Writing and Running Parallel Programs
- Assumptions in Parallel Programming
Chapter 3 Distributed Memory Programming with MPI
- Distributed-Memory Programming with MPI
- The Trapezoidal Rule in MPI
- Dealing with I/O
- Collective Communication
- MPI Derived Datatypes
- Performance Evaluation of MPI Programs
- A Parallel Sorting Algorithm
Chapter 4 Shared Memory Programming with Pthreads
- Shared-Memory Programming with Pthreads
- Processes, Threads, and Pthreads
- Pthreads - Hello, World Program
- Matrix-Vector Multiplication
- Critical Sections
- Busy-Waiting
- Mutexes
- Producer-Consumer Synchronization and Semaphores
- Barriers and Condition Variables
- Read-Write Locks
- Caches, Cache Coherence, and False Sharing
- Thread-Safety
Chapter 5 Shared Memory Programming with OpenMP
- Shared-Memory Programming with OpenMP
- The Trapezoidal Rule
- Scope of Variables
- The Reduction Clause
- The parallel for Directive
- More About Loops in OpenMP: Sorting
- Scheduling Loops
- Producers and Consumers
- Caches, Cache Coherence, and False Sharing
- Thread-Safety
Chapter 6 Parallel Program Development
- Parallel Program Development
- Two n-Body Solvers
- Parallelizing the basic solver using OpenMP
- Parallelizing the reduced solver using OpenMP
- Evaluating the OpenMP codes
- Parallelizing the solvers using Pthreads
- Parallelizing the basic solver using MPI
- Parallelizing the reduced solver using MPI
- Performance of the MPI solvers
- Tree Search
- Recursive depth-first search
- Nonrecursive depth-first search
- Data structures for the serial implementations
- Performance of the serial implementations
- Parallelizing tree search
- A static parallelization of tree search using Pthreads
- A dynamic parallelization of tree search using Pthreads
- Evaluating the Pthreads tree-search programs
- Parallelizing the tree-search programs using OpenMP
- Performance of the OpenMP implementations
- Implementation of tree search using MPI and static partitioning
- Implementation of tree search using MPI and dynamic partitioning
- Which API?
Multicore Application Programming: For Windows, Linux, and Oracle Solaris by Darryl Gove
Chapter 1 Hardware, Processes, and Threads
- Hardware, Processes, and Threads
- Examining the Insides of a Computer
- The Motivation for Multicore Processors
- Supporting Multiple Threads on a Single Chip
- Increasing Instruction Issue Rate with Pipelined Processor Cores
- Using Caches to Hold Recently Used Data
- Using Virtual Memory to Store Data
- Translating from Virtual Addresses to Physical Addresses
- The Characteristics of Multiprocessor Systems
- How Latency and Bandwidth Impact Performance
- The Translation of Source Code to Assembly Language
- The Performance of 32-Bit versus 64-Bit Code
- Ensuring the Correct Order of Memory Operations
- The Differences Between Processes and Threads
Chapter 2 Coding for Performance
- Coding for Performance
- Defining Performance
- Understanding Algorithmic Complexity
- Why Algorithmic Complexity Is Important
- Using Algorithmic Complexity with Care
- How Structure Impacts Performance
- Performance and Convenience Trade-Offs in Source Code and Build Structures
- Using Libraries to Structure Applications
- The Impact of Data Structures on Performance
- The Role of the Compiler
- The Two Types of Compiler Optimization
- Selecting Appropriate Compiler Options
- How Cross-File Optimization Can Be Used to Improve Performance
- Using Profile Feedback
- How Potential Pointer Aliasing Can Inhibit Compiler Optimizations
- Identifying Where Time Is Spent Using Profiling
- Commonly Available Profiling Tools
- How Not to Optimize
- Performance by Design
Chapter 3 Identifying Opportunities for Parallelism
- Identifying Opportunities for Parallelism
- Using Multiple Processes to Improve System Productivity
- Multiple Users Utilizing a Single System
- Improving Machine Efficiency Through Consolidation
- Using Containers to Isolate Applications Sharing a Single System
- Hosting Multiple Operating Systems Using Hypervisors
- Using Parallelism to Improve the Performance of a Single Task
- One Approach to Visualizing Parallel Applications
- How Parallelism Can Change the Choice of Algorithms
- Amdahl’s Law
- Determining the Maximum Practical Threads
- How Synchronization Costs Reduce Scaling
- Parallelization Patterns
- Data Parallelism Using SIMD Instructions
- Parallelization Using Processes or Threads
- Multiple Independent Tasks
- Multiple Loosely Coupled Tasks
- Multiple Copies of the Same Task
- Single Task Split Over Multiple Threads
- Using a Pipeline of Tasks to Work on a Single Item
- Division of Work into a Client and a Server
- Splitting Responsibility into a Producer and a Consumer
- Combining Parallelization Strategies
- How Dependencies Influence the Ability to Run Code in Parallel
- Antidependencies and Output Dependencies
- Using Speculation to Break Dependencies
- Critical Paths
- Identifying Parallelization Opportunities
Chapter 4 Synchronization and Data Sharing
- Synchronization and Data Sharing
- Data Races
- Using Tools to Detect Data Races
- Avoiding Data Races
- Synchronization Primitives
- Mutexes and Critical Regions
- Spin Locks
- Semaphores
- Readers-Writer Locks
- Barriers
- Atomic Operations and Lock-Free Code
- Deadlocks and Livelocks
- Communication Between Threads and Processes
- Storing Thread-Private Data
Chapter 5 Using POSIX Threads
- Using POSIX Threads
- Creating Threads
- Compiling Multithreaded Code
- Process Termination
- Sharing Data Between Threads
- Variables and Memory
- Multiprocess Programming
- Sockets
- Reentrant Code and Compiler Flags
Chapter 6 Windows Threading
- Windows Threading
- Creating Native Windows Threads
- Terminating Threads
- Creating and Resuming Suspended Threads
- Using Handles to Kernel Resources
- Methods of Synchronization and Resource Sharing
- An Example of Requiring Synchronization Between Threads
- Protecting Access to Code with Critical Sections
- Protecting Regions of Code with Mutexes
- Slim Reader/Writer Locks
- Signaling Event Completion to Other Threads or Processes
- Wide String Handling in Windows
- Creating Processes
- Sharing Memory Between Processes
- Inheriting Handles in Child Processes
- Naming Mutexes and Sharing Them Between Processes
- Communicating with Pipes
- Communicating Using Sockets
- Atomic Updates of Variables
- Allocating Thread-Local Storage
- Setting Thread Priority
Chapter 7 Using Automatic Parallelization and OpenMP
- Using Automatic Parallelization and OpenMP
- Using Automatic Parallelization to Produce a Parallel Application
- Identifying and Parallelizing Reductions
- Automatic Parallelization of Codes Containing Calls
- Assisting the Compiler in Automatically Parallelizing Code
- Using OpenMP to Produce a Parallel Application
- Using OpenMP to Parallelize Loops
- Runtime Behavior of an OpenMP Application
- Variable Scoping Inside OpenMP Parallel Regions
- Parallelizing Reductions Using OpenMP
- Accessing Private Data Outside the Parallel Region
- Improving Work Distribution Using Scheduling
- Using Parallel Sections to Perform Independent Work
- Nested Parallelism
- Using OpenMP for Dynamically Defined Parallel Tasks
- Keeping Data Private to Threads
- Controlling the OpenMP Runtime Environment
- Waiting for Work to Complete
- Restricting the Threads That Execute a Region of Code
- Ensuring That Code in a Parallel Region Is Executed in Order
- Collapsing Loops to Improve Workload Balance
- Enforcing Memory Consistency
- An Example of Parallelization
Chapter 8 Hand-Coded Synchronization and Sharing
- Hand-Coded Synchronization and Sharing
- Atomic Operations
- Using Compare and Swap Instructions to Form More Complex Atomic Operations
- Enforcing Memory Ordering to Ensure Correct Operation
- Compiler Support of Memory-Ordering Directives
- Reordering of Operations by the Compiler
- Volatile Variables
- Operating System–Provided Atomics
- Lockless Algorithms
- Dekker’s Algorithm
- Producer-Consumer with a Circular Buffer
- Scaling to Multiple Consumers or Producers
- Scaling the Producer-Consumer to Multiple Threads
- Modifying the Producer-Consumer Code to Use Atomics
- The ABA Problem
Chapter 9 Scaling with Multicore Processors
- Scaling with Multicore Processors
- Constraints to Application Scaling
- Hardware Constraints to Scaling
- Bandwidth Sharing Between Cores
- False Sharing
- Cache Conflict and Capacity
- Pipeline Resource Starvation
- Operating System Constraints to Scaling
- Multicore Processors and Scaling
Chapter 10 Other Parallelization Technologies
- Other Parallelization Technologies
- GPU-Based Computing
- Language Extensions
- Alternative Languages
- Clustering Technologies
- Transactional Memory
- Vectorization