>and each of the threads could 'step' at roughly the same rate:

Here is an idea to consider. If the goal is to even out the threads' progress (rows processed) by lengthening the runs of 'F's, simply bumping up the active thread's priority at 'S' and lowering it again at the last 'F' might work. Caveats: it is not portable, it will not cut the mutex acquisition/release overhead, and it cannot guarantee that all the 'F's for an 'S' will be done in one thread run.
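To make the priority idea concrete, here is a rough sketch using POSIX threads. It is only an illustration: `saved_sched`, `boost_at_start` and `restore_at_last_fetch` are names I made up, and raising the priority by one step within the current policy is an assumption about what "bumping" would mean. Note that on Linux the default `SCHED_OTHER` policy has a single static priority (0), so the boost is a no-op there unless the thread already runs under a real-time policy such as `SCHED_FIFO` or `SCHED_RR`, which normally needs privileges. That is the "not portable" caveat in practice.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper type: the calling thread's scheduling state before the boost. */
typedef struct {
    int policy;
    struct sched_param param;
    int valid;                  /* nonzero once the state was read successfully */
} saved_sched;

/* Call just before 'S'. Saves the current settings, then asks for one
 * priority step more within the same policy.
 * Returns 0 on success, -1 if the policy has no headroom (for example
 * SCHED_OTHER on Linux), or a positive error code such as EPERM. */
int boost_at_start(saved_sched *saved)
{
    assert(saved != NULL);
    saved->valid = 0;

    int rc = pthread_getschedparam(pthread_self(), &saved->policy, &saved->param);
    if (rc != 0)
        return rc;
    saved->valid = 1;

    int max = sched_get_priority_max(saved->policy);
    if (max == -1 || saved->param.sched_priority >= max)
        return -1;              /* nothing to raise; carry on at normal priority */

    struct sched_param boosted = saved->param;
    boosted.sched_priority += 1;
    return pthread_setschedparam(pthread_self(), saved->policy, &boosted);
}

/* Call after the last 'F'. Puts back whatever boost_at_start() saved.
 * Returns 0 on success (or if nothing was saved), else a positive error code. */
int restore_at_last_fetch(const saved_sched *saved)
{
    assert(saved != NULL);
    if (!saved->valid)
        return 0;
    return pthread_setschedparam(pthread_self(), saved->policy, &saved->param);
}
```

A worker would call `boost_at_start(&s)` right before its 'S' and `restore_at_last_fetch(&s)` after its last 'F', treating any nonzero return as "no boost, carry on". The mutex traffic is unchanged either way, and the scheduler can still preempt the thread in the middle of a run.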