1. Symptoms

Since the dawn of time, the way Linux synchronizes to disk the data written to memory by processes (aka background writeback) has sucked. When Linux writes all that data in the background, it should have little impact on foreground activity. That's the definition of background activity... But for as long as anyone can remember, heavy buffered writers have not behaved like that. For instance, if you do something like $ dd if=/dev/zero of=foo bs=1M count=10k, or try to copy files to USB storage, and then try to start a browser or any other large app, it basically won't start before the buffered writeback is done, and your desktop, or command shell, feels unresponsive. These problems happen because heavy writes (the kind of write activity caused by background writeback) fill up the block layer, and other IO requests have to wait a long time to be serviced (for more details, see the LWN article).
Source: Linux 4.10 release notes

2. Current situation

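I ran into this problem often before the community patches landed, but did not pay much attention to it at the time. More recently, on an Ubuntu desktop with many Chrome and Firefox tabs open, pages became very slow to open, the GUI turned gray and then black, and Thunderbird reported that its scripts had stopped running.
With a VirtualBox virtual machine running and taking up 20% of system memory, the problem got even worse.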

I skimmed the patch series, got the basic idea behind it, and built a 4.12.4 kernel. The system is clearly much smoother than it was before the patches.

But the problem has not gone away.

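Today, while compiling the Swift source, I could still open Chrome and browse at first. After a while, though, the mouse stopped moving and I could not even close a tab. All I could do was wait until the build eased off (or something else changed) before I finally got back into the system. top showed a load average of 15, with many firefox, chrome and other processes in D state (note that the machine has only 4 CPUs in total). This time it looks as if the build left the writeback of every other program blocked.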
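To reproduce the top observation without an interactive tool, the D-state processes can be listed straight from /proc. This is only a minimal sketch: the helper names parse_state, parse_comm and blocked_tasks are made up for illustration, and it inspects processes only, not individual threads, so its count will not match the load average exactly (the load average counts every runnable or uninterruptible task).

```python
import os


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so everything after the LAST ')' is parsed.
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def parse_comm(stat_line):
    """Return the command name from a /proc/<pid>/stat line."""
    return stat_line[stat_line.find("(") + 1:stat_line.rfind(")")]


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        path = os.path.join(proc_root, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_state(line) == "D":
            found.append((int(entry), parse_comm(line)))
    return found


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("1-minute load average:", f.read().split()[0])
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it a few times during a heavy build gives a rough picture of which programs are stuck waiting on IO.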

So I decided to take another look at this patch, understand it properly, and try to tune or improve it.

First I will look at the tunables it exposes, then walk through the patch step by step and optimize it.

3. Process