C++ is a superpower

  • C++ Garbage Collection

  • Memory Model

  • Atomics


C++0x working draft

1300 pages. Reading it will probably take a while :)


Scale up or scale out?

Listening to the questions from the audience and the conversations afterwards, one thing was clear: this is a complex challenge that even smart, experienced people struggle to do well. When moving toward developing software in a parallel environment, there is a lot to consider, and a lot of questions come up.

How do I train my developers? 
Can I reuse what I have or do I have to rewrite?
What tools should I use?


Actor Model Concurrency

When it comes to programming models, everyone’s favorite whipping boy is the model where access to shared memory is controlled using locks or their close relatives (e.g. semaphores, condition variables). To be sure, this approach is fraught with peril – race conditions, deadlock, livelock, thundering herds, indefinite postponement, lock or priority inversion, and the list just keeps going. The funny thing is that most of these don’t really go away with any of the programming models that are proposed as solutions (including the actor model). For example, software transactional memory is what all the Cool Kids talk about the most. It’s a good model, with many advantages over lock-based programming, but a program can still deadlock at a higher level even if it’s using STM to avoid deadlock at a lower level. There’s an old saying that a bad programmer can write Fortran in any language, and it’s equally true that a bad programmer can create deadlock in any programming model.


Intel 64 Architecture Memory Ordering

Intel 64 memory ordering guarantees that for each of the following memory-access instructions, the constituent memory operation appears to execute as a single memory access regardless of memory type:

  1. Instructions that read or write a single byte.

  2. Instructions that read or write a word (2 bytes) whose address is aligned on a 2-byte boundary.

  3. Instructions that read or write a doubleword (4 bytes) whose address is aligned on a 4-byte boundary.

  4. Instructions that read or write a quadword (8 bytes) whose address is aligned on an 8-byte boundary.

All locked instructions (the implicitly locked xchg instruction and other read-modify-write instructions with a lock prefix) are an indivisible and uninterruptible sequence of load(s) followed by store(s) regardless of memory type and alignment.
Other instructions may be implemented with multiple memory accesses. From a memory-ordering point of view, there are no guarantees regarding the relative order in which the constituent memory accesses are made. There is also no guarantee that the constituent operations of a store are executed in the same order as the constituent operations of a load.

Intel 64 memory ordering obeys the following principles:

  1. Loads are not reordered with other loads.

  2. Stores are not reordered with other stores.

  3. Stores are not reordered with older loads.

  4. Loads may be reordered with older stores to different locations but not with older
    stores to the same location.

  5. In a multiprocessor system, memory ordering obeys causality (memory ordering
    respects transitive visibility).

  6. In a multiprocessor system, stores to the same location have a total order.

  7. In a multiprocessor system, locked instructions have a total order.

  8. Loads and stores are not reordered with locked instructions.


Thread scheduling - priority boost

Each thread has a dynamic priority. This is the priority the scheduler uses to determine which thread to execute. Initially, a thread's dynamic priority is the same as its base priority. The system can boost and lower the dynamic priority, to ensure that it is responsive and that no threads are starved for processor time. The system does not boost the priority of threads with a base priority level between 16 and 31. Only threads with a base priority between 0 and 15 receive dynamic priority boosts.


Another situation causes the system to dynamically boost a thread's priority level. Imagine a priority 4 thread that is ready to run but cannot because a priority 8 thread is constantly schedulable. In this scenario, the priority 4 thread is being starved of CPU time. When the system detects that a thread has been starved of CPU time for about three to four seconds, it dynamically boosts the starving thread's priority to 15 and allows that thread to run for twice its time quantum. When the double time quantum expires, the thread's priority immediately returns to its base priority.

When the user works with windows of a process, that process is said to be the foreground process and all other processes are background processes. Certainly, a user would prefer the process that he or she is using to behave more responsively than the background processes. To improve the responsiveness of the foreground process, Windows tweaks the scheduling algorithm for threads in the foreground process. For Windows 2000, the system gives foreground process threads a larger time quantum than they would usually receive. This tweak is performed only if the foreground process is of the normal priority class. If it is of any other priority class, no tweaking is performed.

Programming Applications for Microsoft Windows, Jeffrey Richter
Chapter 7, Thread Scheduling, Priorities, and Affinities

Windows Thread Scheduling by priorities

The system treats all threads with the same priority as equal. The system assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread. For more information, see Context Switches.


The ReaderWriterGate Lock

What I'd like to do now is explain the idea behind the ReaderWriterGate, discuss a possible implementation, and offer a slightly new way of thinking about threading and thread synchronization in general. It is my hope that you will see places in your existing code where this kind of thinking can be applied so that with minimal re-architecting, you could incorporate some of these ideas to give your applications better performance and scalability.


Performance Profiling Without the Overhead


Reader Writer Lock

I'm hunting for a reader-writer lock that meets the following requirements:

  1. C++/Win32/32-bit.

  2. Reader reentrancy.
    If a thread holds a reader lock in one method and calls another method on the same thread, that method may take the reader lock again (in other words, a reader may enter the read section multiple times).

  3. Writer reentrancy.
    If a thread holds a writer lock in one method and calls another method on the same thread, that method may take the writer lock again (in other words, a writer may enter the write section multiple times).

  4. Hierarchy WR.
    If a thread holds a writer lock, it may take a reader lock.

  5. Hierarchy RW.
    If a thread holds a reader lock, it may take a writer lock (probably wishful thinking; I don't know whether this is achievable without deadlocks).

  6. Weakening of the writer lock.
    If a thread holds a writer lock, it may weaken it into a reader lock (not required, but nice to have).

  7. Preferences.
    Favors neither writers nor readers (if it must favor one, let it be writers).

  8. TryEnter.
    Try-enter style methods are welcome for both readers and writers.

  9. Timeouts.
    Wait timeouts are welcome for both readers and writers.

  10. Fairness.
    Fairness is not required, though I wouldn't complain if it were there.

  11. Starvation.
    Not welcome, though acceptable.

A bit of literature:


Added 2010-06-24:

Don't let a long-running operation take hostages


Low-Lock Techniques - Vance Morrison

The biggest conceptual difference between sequential and multithreaded programs is the way the programmer should view memory. In a sequential program, memory can be thought of as being stable unless the program is actively modifying it. For a multithreaded program, however, it is better to think of all memory as spinning (being changed by other threads) unless the programmer does something explicit to freeze (or stabilize) it.

Sequential consistency is an intuitive model and, as the previous example shows, some of the concepts of sequential programs can be applied to it. It is also the model that gets implemented naturally on single-processor machines so, frankly, it is the only memory model most programmers have practical experience with. Unfortunately, for a true multiprocessor machine, this model is too restrictive to be implemented efficiently by memory hardware, and no commercial multiprocessor machines conform to it.


I'm halfway through reading it; it's taking terribly long, but it's really worth it.

What Every Dev Must Know About Multithreaded Apps

A really exceptionally well-written article. Even if you don't have time to read it today, be sure to bookmark it for a rainy day.


A simple condition variable primitive


Almost user mode - unfair mutex - Optex

Build a Richer Thread Synchronization Lock

Reader/Writer Locks and the ResourceLock Library

System.Collections.Concurrent Namespace - .NET 4.0

Very interesting classes, especially the iterators. An iterator that allows thread-safe enumeration sounds intriguing.

And the blog of one of the people who, I suspect, work on implementing these iterators.

Enumerating Concurrent Collections
This breakdown in particular is nice:

  • Deleted items will always be seen

  • Deleted items will never be seen

  • Added items will always be seen if added at the end of the collection

  • Added items will always be seen if added wherever they are added

  • Added items will never be seen

  • Moved items will never be seen twice

  • Moved items will be seen twice, if moved to the end of the collection

  • Moved items will always be seen, even if moved to the beginning of the collection

  • No more than N items will be seen, where N is the original length of the collection



ABA problem




Single Reader, Single Writer Fixed-Size Lookaside Deque


Rules for use:

  • Push() can only be called from one thread, but it doesn't need to be the same thread as Pop().

  • Pop() can only be called from one thread, but it doesn't need to be the same thread as Push().

  • max_count must be a power of two

  • Don't Pop() more than Count()

  • Check IsFull() or Count() before a Push()

Attempts to fix the US derivatives market

An interesting post on Piotr Kuczyński's blog. Recommended.


C++0x - Uniform syntax for functions


Tomasz Kulig