Power efficiency problems with AMD CPUs on Linux 6.5.5
An unusually noticeable humming was coming from my home infrastructure machines during the past day. Usually these machines are basically silent since I disabled CPU boosting as part of noise and power optimisation. But not today.
Starting with some metrics, it became obvious that while the system load hadn’t changed over the past few days, the average CPU temperature had increased by 5°C across all machines, starting from Monday 00:00 UTC.
Monday 00:00 UTC is the time my system upgrades run. The best contender for a less efficient CPU without a change in system load is the kernel. And it was a kernel update: Fedora switched from kernel version 6.4.15 to 6.5.5. Along with this change came a new CPU frequency driver: amd-pstate-epp.
Obviously the idea of this driver was to make power usage more efficient, and maybe it does, if you didn’t optimise anything before.
Disabling CPU boosting
Up until the mentioned Monday, all my machines would run echo 0 > /sys/devices/system/cpu/cpufreq/boost 5 minutes after booting. This way all boot processes could use the full CPU boost, and afterwards the CPU would be limited to its base clock (3.3GHz/3.4GHz), making it not only nice and quiet, but also use way less power.
With these optimisations applied across 3 machines with a fairly moderate workload, the CPU’s power draw averaged 8.6 watts per machine over 28 days.
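How the 5-minute delay was implemented isn’t stated here, so the following is only a minimal sketch, assuming a plain shell script started as root at boot. The function name and the BOOST_FILE and BOOST_DELAY variables are made up for illustration; only the sysfs path and the 5 minutes come from the text.

```shell
#!/bin/sh
# Sketch: switch off CPU boost a while after boot.
# BOOST_FILE and BOOST_DELAY are illustrative names, overridable for testing.
BOOST_FILE="${BOOST_FILE:-/sys/devices/system/cpu/cpufreq/boost}"
BOOST_DELAY="${BOOST_DELAY:-300}"   # seconds; 5 minutes as described above

disable_boost_later() {
    # Let the boot processes use the full boost first
    sleep "$BOOST_DELAY"

    # The knob is only present with drivers that expose it
    if [ -w "$BOOST_FILE" ]; then
        echo 0 > "$BOOST_FILE"
    else
        echo "boost knob not found or not writable: $BOOST_FILE" >&2
        return 1
    fi
}
```

Calling disable_boost_later from whatever runs at boot on your system would then reproduce the behaviour described above. The write does not survive a reboot, which is why it has to be repeated on every boot.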
“Power saving” with the new AMD driver
With the new AMD driver, the “boost control knob” that I was using before was gone and the power profile changed. The new driver implements the Energy Performance Preference (EPP), through which you tell the CPU which power profile (performance, balance_performance, balance_power, power) each individual CPU core should use.
The idea behind EPP is that the CPU knows best when to boost and when to leave performance on the table. The default EPP was set to performance, so the CPU was boosting as much as needed. As a result, the CPU was happily boosting at least one core up to 4.4GHz almost all the time. The power draw for the period it was active was around 18.7 watts per machine.
But adjusting the EPP to balance_power and later power didn’t really change the behaviour from what I could tell, and my metrics continued to bounce around the same values as before. So at best it provided marginal power savings.
# Get available power profiles for CPU core 0
cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
# Set power profile for CPU core 0
echo "power" > /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
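The commands above only touch core 0. Since the preference is set per core, applying it everywhere means looping over all policy directories. A rough sketch, assuming every core exposes the same file layout; the function name and the CPUFREQ_ROOT variable are illustrative, not part of any tool:

```shell
#!/bin/sh
# Sketch: apply one EPP value to every cpufreq policy (one per CPU core).
# CPUFREQ_ROOT is an illustrative override so the loop can be tried on a scratch directory.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

set_epp_all() {
    pref="$1"
    for f in "$CPUFREQ_ROOT"/policy*/energy_performance_preference; do
        # Skip cores without the file (or an unmatched glob)
        [ -w "$f" ] || continue
        echo "$pref" > "$f"
    done
}
```

Running set_epp_all power as root would set the power profile on all cores in one go.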
Bringing back the “boost control knob”
Thankfully AMD implemented another mode for their CPU frequency driver. The default is active, which is the mode using EPP. But there is also a mode called passive, which brings control back to kernel-land and along with that the regular controls like the “boost control knob”.
In order to bring back the previous behaviour, one just needs to switch the driver status to passive and disable boosting again:
# Switch the amd-pstate driver to passive mode
echo "passive" > /sys/devices/system/cpu/amd_pstate/status
# Disable CPU boost again
echo 0 > /sys/devices/system/cpu/cpufreq/boost
Within a few seconds my machines were silent again and power draw was back to the old level.
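For repeated use, the two steps can be wrapped with a few checks so that a missing file is reported instead of failing silently. This is just a sketch: the function name and the CPU_ROOT variable are invented for illustration, and it assumes the boost file is present once the driver runs in passive mode, as observed above.

```shell
#!/bin/sh
# Sketch: switch amd-pstate to passive mode, then disable boost, with basic checks.
# CPU_ROOT is an illustrative override so the steps can be tried on a scratch directory.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

restore_boost_knob() {
    status="$CPU_ROOT/amd_pstate/status"
    boost="$CPU_ROOT/cpufreq/boost"

    [ -w "$status" ] || { echo "amd_pstate status not available" >&2; return 1; }
    echo passive > "$status"

    # Assumption from the text: the boost knob is back in passive mode
    [ -w "$boost" ] || { echo "boost knob not available" >&2; return 1; }
    echo 0 > "$boost"

    # Read both values back to confirm what the kernel accepted
    echo "amd_pstate: $(cat "$status"), boost: $(cat "$boost")"
}
```

Like the original commands, this needs root and only lasts until the next reboot. The kernel documentation also describes an amd_pstate=passive boot parameter for selecting the mode at boot, which I have not tested here.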
Conclusion
If you run an AMD machine and you don’t have compute-heavy workloads that need every GHz they can get, it might be worth looking into disabling CPU boosting. It has a quite significant impact on power draw.
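To put a number on that impact, here is a small back-of-the-envelope calculation using the two averages quoted earlier (8.6 W with boost disabled, 18.7 W with the new default). The 28-day figure for 18.7 W is an extrapolation, since that draw was only measured for the short period the new driver was active.

```shell
#!/bin/sh
# kwh WATTS DAYS: energy in kWh for a constant average draw
kwh() {
    awk -v w="$1" -v d="$2" 'BEGIN { printf "%.2f\n", w * 24 * d / 1000 }'
}

# ratio A B: A divided by B
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

kwh 8.6 28       # boost disabled, measured average       -> 5.78
kwh 18.7 28      # assumption: 18.7 W sustained for 28 days -> 12.57
ratio 18.7 8.6   # relative increase in power draw         -> 2.17
```

So per machine, the CPU drew a bit more than twice as much power with the new default.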
And if you wonder how these journeys go, this article has been accompanied by a story on Mastodon.
PS: While investigating, I always thought about this happening in a data centre, where your power costs double overnight due to a kernel update. Must be a fun day.