Concurrent programming is notoriously difficult, even for experts. When logically independent requests share resources (dictionaries, buffer pools, database connections, and so forth), the programmer must orchestrate the sharing, which introduces new problems. These problems (data races, deadlocks, livelocks, and so forth) generally derive from the uncertainties that arise when concurrent tasks attempt to manipulate the same data objects in a program. They make the basic software development tasks of testing and debugging extremely difficult [...]
The combination of extra concepts, new failure modes, and testing difficulty should give every developer pause. Is this something you really want to bite off? Clearly, the answer is no! However, many will be forced into this swamp in order to deliver the necessary performance. Microsoft is actively developing solutions to some of the core problems, but high-productivity solution stacks are not yet available.
Current multicore chip architectures can increase the number of cores faster than they can increase memory bandwidth, so for most problems where the data set does not fit in memory, using the memory hierarchy well is an important concern. This imbalance gives rise to a style of programming called stream processing, where the focus is to stage blocks of data into the on-chip cache (or perhaps private memory) and then perform as many operations against that data as possible before displacing it with the next block. Those operations may be internally parallel, to use the multiple cores, or they may be pipelined in a data-flow style, but the key issue is to do as much work as possible on the data while it is in the cache.
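The staging pattern described above can be sketched in a few lines of Python. This is only an illustration of the control structure, not a performance demonstration: the names `BLOCK_SIZE`, `blocks`, `process_streamed`, and `process_unblocked` are hypothetical, the block size is an arbitrary stand-in for "small enough to fit in cache", and an interpreted language will not show the cache benefit that a compiled implementation might.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical block size standing in for "fits in the on-chip cache".
BLOCK_SIZE = 4096

Stage = Callable[[float], float]


def blocks(data: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `data`, each at most `size` elements long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def process_streamed(data: Sequence[float], stages: Sequence[Stage],
                     block_size: int = BLOCK_SIZE) -> List[float]:
    """Stream-processing order: stage one block, apply every operation
    to it, and only then move on to the next block."""
    out: List[float] = []
    for block in blocks(data, block_size):
        current = list(block)
        for stage in stages:
            current = [stage(x) for x in current]
        out.extend(current)
    return out


def process_unblocked(data: Sequence[float], stages: Sequence[Stage]) -> List[float]:
    """Contrasting order: each operation sweeps the whole data set, so a
    large data set is pulled through the memory hierarchy once per stage."""
    current = list(data)
    for stage in stages:
        current = [stage(x) for x in current]
    return current
```

Both functions compute the same result; they differ only in loop order. In the streamed version each block is touched by all stages while it is resident, which is the property the paragraph identifies as the key issue. The inner loop over stages is also where per-block parallelism or pipelining would be introduced.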