Programmatically limit CPU usage of certain processes
As some of my readers might know, I'm a committed KDE user. I love the freedom this desktop environment provides me with and its sheer versatility. Other people might think differently about that matter, but that's my opinion.
Even more of you might know that KDE has a semantic desktop implementation called Nepomuk and some might agree that, although it's a great thing in general, it has very often caused a lot of issues in the past. Especially the whole file indexing engine was very unstable and even unusable for many people, including me.
Now with KDE 4.10 the file indexer has undergone some major changes which made it pretty usable so I decided to switch it on again. It turned out that the first stage indexing works exceptionally well. It indexed about 60,000 files in my home directory in the blink of an eye.
Unfortunately, I had to realize that the second-level indexing does not work so well. I remember Virtuoso often eating up all my CPU in the past. Now Virtuoso keeps quiet, but nepomukindexer makes my workstation take off. It only starts indexing when my PC is idle, but for bigger files it pins the CPU at 100%, which is a pretty bad thing. There is already a bug report about nepomukindexer consuming too much CPU time on larger files, but I didn't want to wait for a fix.
Long story short: I thought of ways to automatically limit the CPU usage of certain processes (not necessarily only Nepomuk).
To accomplish this, there is actually a pretty neat tool out there called cpulimit, and it does exactly what I need: I give it a PID and a percentage value, and it limits the CPU time available to that process. Unfortunately, it only works on processes that are already running, and Nepomuk spawns a new process every time it starts indexing.
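Under the hood, cpulimit enforces the limit by alternately freezing and resuming the target with SIGSTOP and SIGCONT. Here is a stripped-down sketch of that idea in plain shell — not cpulimit's actual implementation; the 100 ms slice and the `throttle` name are my own:

```shell
# Toy version of cpulimit's core idea: freeze the target for ~90% of
# each 100 ms slice so it can only use roughly 10% of one CPU.
throttle() {
    pid=$1
    # Loop until the target process exits.
    while kill -0 "$pid" 2>/dev/null; do
        kill -STOP "$pid" 2>/dev/null || true   # freeze it...
        sleep 0.09                              # ...for ~90 ms
        kill -CONT "$pid" 2>/dev/null || true   # let it run again...
        sleep 0.01                              # ...for ~10 ms
    done
}
```

The real tool measures the process's actual CPU usage and adapts the stop/run ratio on the fly, which is why I use it instead of something like this.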
Therefore I needed a script that runs permanently in the background and automatically limits the CPU usage if necessary. After googling for some time I found a few, but they were more or less overkill for what I needed. So I decided to write my own. It's a dirty hack, but it works.
#!/bin/zsh

MAXIMUM_ALLOWED_CPU=10
LIMIT_THRESHOLD=80
INTERVAL=10
PROCESSES_TO_WATCH=(nepomukindexer)

while sleep $INTERVAL; do
    for process_name in $PROCESSES_TO_WATCH; do
        pid_list=($(pgrep $process_name))
        # Don't do anything if no running process found
        [ ${#pid_list[@]} -le 0 ] && continue
        for pid in $pid_list; do
            process_info=$(top -b -n 1 -p $pid | tail -n 1)
            # Prevent race condition: the process may have exited
            # between pgrep and top
            echo $process_info | grep -q "^\s*${pid} " || continue
            typeset -i cpu_usage=$(echo $process_info | sed 's/ \+/ /g' | sed 's/^ \+//' | cut -d ' ' -f 9)
            if [ $cpu_usage -gt $LIMIT_THRESHOLD ]; then
                echo "Limiting CPU usage of process $pid ($process_name)..."
                cpulimit -p $pid -l $MAXIMUM_ALLOWED_CPU &
            fi
        done
    done
done
I thought I'd share it here; maybe some of you find it useful too. The variable $PROCESSES_TO_WATCH is an array of all program names this script should watch. Once started, the script checks every $INTERVAL seconds for processes from that list that are busy on the CPU for more than $LIMIT_THRESHOLD percent of the time and limits them to $MAXIMUM_ALLOWED_CPU percent.
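To illustrate the parsing step: the script squeezes the whitespace in top's batch output and cuts out field 9, which is the %CPU column. A quick check with a made-up sample line (the PID and all values are invented):

```shell
# Hypothetical line as printed by `top -b -n 1 -p <pid>`; field 9 is %CPU.
line=' 1234 user      20   0  123456  7890  1234 R  97.3  0.4   0:42.00 nepomukindexer'
# Squeeze runs of spaces, strip the leading ones, take the ninth field.
cpu=$(echo "$line" | sed 's/ \+/ /g' | sed 's/^ \+//' | cut -d ' ' -f 9)
echo "$cpu"    # → 97.3
```

The `typeset -i` in the script then turns that value into an integer, so it can be compared with `-gt`.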
I start this script automatically when I log into KDE to bring Nepomuk to terms when it starts turning my PC into a central heating system again.
Update Mar 5th, 7:10 P.M. UTC+1:
I was asked why I don't use Control Groups, which are in fact the proper way of limiting resources for specific tasks on Linux. Well, the elegant answer would be that cgroups only work on Linux, but not on other *nix systems like, e.g., BSD. But okay, that doesn't really count since this is a Linux blog.
The real answer is that cpulimit is just dead-simple and every user can run the above script. Cgroups, however, require root privileges and probably some deeper understanding of how Linux processes work in general. The cpulimit script can easily be started automatically with your desktop session and all the stuff that comes with it is strictly local. The price is a larger overhead, but as long as you don't use it for too many processes and with too short sleep intervals, you shouldn't notice it. If you need a general solution for coordinating resource usage, though, you should of course use cgroups instead.
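For completeness, a rough sketch of what the cgroup variant would look like (cgroup v1, run as root; the group name `limited` and the mount point under /sys/fs/cgroup/cpu are assumptions based on common distro defaults):

```shell
# Create a group under the cpu controller and cap it at 10% of one CPU:
# with a 100 ms period, a 10 ms quota means 10%.
mkdir /sys/fs/cgroup/cpu/limited
echo 100000 > /sys/fs/cgroup/cpu/limited/cpu.cfs_period_us
echo 10000  > /sys/fs/cgroup/cpu/limited/cpu.cfs_quota_us
# Move the newest indexer process into the group (one PID per write).
pid=$(pgrep -n nepomukindexer)
echo "$pid" > /sys/fs/cgroup/cpu/limited/tasks
```

Note that the same caveat applies here: Nepomuk spawns a new indexer process each run, so the last step would still need a watcher loop.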
Update Mar 7th, 5:00 P.M. UTC+1:
I updated the script slightly to prevent a race condition that led to the script terminating.