Determinism vs. Throughput
As enterprise distributions continue to roll out their respective real-time offerings, the benefits (and drawbacks) of real-time are becoming available to the masses. The expectations for these distros are often a mixture of general-purpose computing performance, real-time determinism, and low latency. Unfortunately, meeting one of these often comes at the expense of the others. Deterministic behavior requires more overhead in the OS, including some lock serialization and additional context switching, which naturally degrades throughput. Two real-time Linux developers, Steven Rostedt of Red Hat and Gregory Haskins of Novell, have been actively working on two distinct approaches to minimizing the effects of determinism on throughput.
Steven's rwlock-multi patches [1] (2.6.24.X-rt8) attempt to reduce the bottleneck created by the conversion of the rwlock to an rtmutex in the PREEMPT_RT [2] tree. In order to provide Priority Inheritance [3] support, the rwlock was converted to an rtmutex, limiting it to one owner. This bottleneck hit some subsystems harder than others; networking was particularly affected. The rwlock-multi patch relieves the serialization by allowing a fixed (configurable) number of readers to hold the lock at any one time, which bounds the wait time for a higher-priority writer. As would be expected, if a high-priority writer is blocked on an rwlock, it will boost the priority of all the readers and prevent any new lower-priority readers from getting the lock (as soon as the existing readers release the lock, the writer gets it). This approach certainly increases the latency a high-priority writer may see when acquiring a contended lock, but the increase is bounded, and therefore still deterministic. The benefit is seen by all subsystems that leveraged the rwlock for additional throughput, bringing the rt kernel closer in performance to the mainline kernel while maintaining the necessary deterministic behavior.
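The admission rules described above can be illustrated with a small, single-threaded model. This is only a sketch of the policy, not the kernel implementation: the class name MultiReaderLock, its methods, and the convention that a larger number means a higher priority are all assumptions made for the example, and no real blocking or scheduling takes place.

```python
class MultiReaderLock:
    """Toy model of a reader-limited rwlock with priority boosting.

    Hypothetical illustration only: larger numbers mean higher priority,
    and at most one waiting writer is tracked.
    """

    def __init__(self, max_readers):
        self.max_readers = max_readers  # fixed cap on concurrent readers
        self.readers = {}               # reader task name -> effective priority
        self.writer = None              # task name of the writer holding the lock
        self.waiting_writer = None      # (task name, priority) of a blocked writer

    def try_read_lock(self, task, prio):
        """Return True if the reader is admitted, False if it would block."""
        if self.writer is not None:
            return False                # a writer owns the lock
        if len(self.readers) >= self.max_readers:
            return False                # reader cap reached
        if self.waiting_writer is not None and prio < self.waiting_writer[1]:
            return False                # lower-priority readers may not jump the writer
        self.readers[task] = prio
        return True

    def try_write_lock(self, task, prio):
        """Return True if the writer gets the lock, False if it would block."""
        if self.writer is None and not self.readers:
            self.writer = task
            return True
        if self.waiting_writer is None:
            self.waiting_writer = (task, prio)
        # Priority inheritance: every current reader runs at least at the
        # blocked writer's priority until it releases the lock.
        for reader in list(self.readers):
            self.readers[reader] = max(self.readers[reader], prio)
        return False

    def read_unlock(self, task):
        """Release a read hold; hand the lock to a waiting writer if possible."""
        del self.readers[task]
        if not self.readers and self.writer is None and self.waiting_writer:
            self.writer = self.waiting_writer[0]
            self.waiting_writer = None
```

Because the number of readers is capped, the writer waits for at most max_readers boosted critical sections, which is where the bounded (and therefore deterministic) worst case comes from.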
Greg's adaptive locks [4] patches aim to minimize the impact of converting spinlocks to rtmutexes in the PREEMPT_RT tree. This conversion was made to ensure the highest-priority task would get the lock, using a locking mechanism that supports priority inheritance and priority-ordered wait queues. The side effects included a lot more context switches, as tasks blocked on the lock rather than actively spinning. Because spinlocks were, by design, always intended to be held for very short periods of time, the adaptive locks patch modifies the rtmutex to busy-wait for a short, configurable period of time before blocking. Since the rtmutex is still being used, the patch maintains priority inheritance and preemption of the blocked task. Improvements of up to 500% were seen on certain I/O-intensive workloads, without negatively impacting the deterministic behavior of the PREEMPT_RT patch.
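The spin-then-block idea can be sketched in user space as follows. This is a rough analogy rather than the patch itself: the function name adaptive_acquire and the spin_seconds budget are assumptions made for the example, and Python threads do not model priority inheritance or kernel preemption.

```python
import threading
import time


def adaptive_acquire(lock, spin_seconds):
    """Acquire `lock`, busy-waiting briefly before falling back to blocking.

    Returns "spun" if the lock was taken during the busy-wait phase and
    "slept" if the caller had to block. `spin_seconds` is a hypothetical
    tuning knob for this sketch.
    """
    deadline = time.monotonic() + spin_seconds
    while True:
        # Non-blocking attempt: succeeds immediately if the lock is free.
        if lock.acquire(blocking=False):
            return "spun"
        if time.monotonic() >= deadline:
            break
    # Spin budget exhausted: block (and context switch) like a plain mutex.
    lock.acquire()
    return "slept"
```

A short usage example: an uncontended lock is taken without blocking, while a lock held for much longer than the spin budget forces the caller to sleep.

```python
lock = threading.Lock()
print(adaptive_acquire(lock, 0.01))          # lock is free, taken on the first try
threading.Timer(0.2, lock.release).start()   # holder releases after 200 ms
print(adaptive_acquire(lock, 0.01))          # 10 ms of spinning is not enough
lock.release()
```

When hold times are short, as they are meant to be for spinlocks, most acquisitions finish in the spin phase and the context switch is avoided.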
These sorts of patches go a long way towards bringing low-latency computing to the masses, as the costs for determinism lessen.
1. New read/write locks for PI and multiple readers
2. CONFIG_PREEMPT_RT Patch
3. Priority Inversion Avoidance for Enterprise Real-Time Linux Systems
4. adaptive-locks v3
Adaptive locks spin-time not configurable yet
I would like to point out that, while it would be nice to have the spin times in adaptive spinlocks configurable dynamically or, in the worst case, statically, the version of the infrastructure currently in mainline-rt does not place any restriction on the wait time. This is the area being worked on. So, at the moment, for every spinlock, the process spins until it obtains the lock. The reason this does not affect latencies is that the spin is completely preemptible: any higher-priority task can still preempt the spinning task. The cost is that lower-priority tasks cannot run while the higher-priority task spins.
