Patches contributed by Eötvös Loránd University
commit aea25401c3347d9f3a64ebdc81043be246a9f631
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:46 2007 +0200
sched: document nice levels
Document the design thinking behind nice levels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/Documentation/sched-nice-design.txt b/Documentation/sched-nice-design.txt
new file mode 100644
index 000000000000..e2bae5a577e3
--- /dev/null
+++ b/Documentation/sched-nice-design.txt
@@ -0,0 +1,108 @@
+This document explains the thinking about the revamped and streamlined
+nice-levels implementation in the new Linux scheduler.
+
+Nice levels were always pretty weak under Linux and people continuously
+pestered us to make nice +19 tasks use up much less CPU time.
+
+Unfortunately that was not that easy to implement under the old
+scheduler (otherwise we'd have done it long ago), because nice level
+support was historically coupled to timeslice length, and timeslice
+units were driven by the HZ tick, so the smallest timeslice was 1/HZ.
+
+In the O(1) scheduler (in 2003) we changed negative nice levels to be
+much stronger than they were before in 2.4 (and people were happy about
+that change), and we also intentionally calibrated the linear timeslice
+rule so that nice +19 level would be _exactly_ 1 jiffy. To better
+understand it, the timeslice graph went like this (cheesy ASCII art
+alert!):
+
+
+                  A
+            \     | [timeslice length]
+             \    |
+              \   |
+               \  |
+                \ |
+                 \|___100msecs
+                  |^ . _
+                  |      ^ . _
+                  |            ^ . _
+ -*----------------------------------*-----> [nice level]
+ -20              |                +19
+                  |
+                  |
+
+So if someone really wanted to renice tasks, +19 would give a much
+bigger hit than the normal linear rule would. (The solution of
+changing the ABI to extend priorities was discarded early on.)
+
+This approach worked to some degree for some time, but later on with
+HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage that
+we felt to be a bit excessive. Excessive _not_ because it's too small a
+CPU utilization, but because it causes too frequent (once per
+millisec) rescheduling. (and would thus thrash the cache, etc. Remember,
+this was long ago when hardware was weaker and caches were smaller, and
+people were running number crunching apps at nice +19.)
+
+So for HZ=1000 we changed nice +19 to 5 msecs, because that felt like the
+right minimal granularity - and this translates to 5% CPU utilization.
+But the fundamental HZ-sensitive property for nice +19 still remained,
+and we never got a single complaint about nice +19 being too _weak_ in
+terms of CPU utilization; we only got complaints about it (still) being
+too _strong_ :-)
+
+To sum it up: we always wanted to make nice levels more consistent, but
+within the constraints of HZ and jiffies and their nasty design-level
+coupling to timeslices and granularity it was not really viable.
+
+The second (less frequent but still periodically occurring) complaint
+about Linux's nice level support was its asymmetry around the origin
+(which you can see demonstrated in the picture above), or more
+accurately: the fact that nice level behavior depended on the _absolute_
+nice level as well, while the nice API itself is fundamentally
+"relative":
+
+ int nice(int inc);
+
+ asmlinkage long sys_nice(int increment)
+
+(the first one is the glibc API, the second one is the syscall API.)
+Note that the 'inc' is relative to the current nice level. Tools like
+bash's "nice" command mirror this relative API.
+
+With the old scheduler, if you for example started a niced task with +1
+and another task with +2, the CPU split between the two tasks would
+depend on the nice level of the parent shell - if it was at nice -10 the
+CPU split was different than if it was at +5 or +10.
+
+A third complaint against Linux's nice level support was that negative
+nice levels were not 'punchy enough', so lots of people had to resort to
+running audio (and other multimedia) apps under RT priorities such as
+SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation
+proof, and a buggy SCHED_FIFO app can also lock up the system for good.
+
+The new scheduler in v2.6.23 addresses all three types of complaints:
+
+To address the first complaint (of nice +19 being not "punchy"
+enough), the scheduler was decoupled from 'time slice' and HZ concepts
+(and granularity was made a separate concept from nice levels) and thus
+it was possible to implement better and more consistent nice +19
+support: with the new scheduler nice +19 tasks get a HZ-independent
+1.5%, instead of the variable 3%-5%-9% range they got in the old
+scheduler.
+
+To address the second complaint (of nice levels not being consistent),
+the new scheduler makes nice(1) have the same CPU utilization effect on
+tasks, regardless of their absolute nice levels. So on the new
+scheduler, running a nice +10 and a nice +11 task has the same CPU
+utilization "split" between them as running a nice -5 and a nice -4
+task. (one will get 55% of the CPU, the other 45%.) That is why nice
+levels were changed to be "multiplicative" (or exponential) - that way
+it does not matter which nice level you start out from, the 'relative
+result' will always be the same.
+
+The third complaint (of negative nice levels not being "punchy" enough
+and forcing audio apps to run under the more dangerous SCHED_FIFO
+scheduling policy) is addressed by the new scheduler almost
+automatically: stronger negative nice levels are an automatic
+side-effect of the recalibrated dynamic range of nice levels.
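
To make the "multiplicative" rule above concrete: if each nice level
corresponds to a roughly constant weight ratio, the CPU split between two
tasks depends only on the difference of their nice levels, never on the
absolute values. The user-space sketch below demonstrates this; the fixed
~1.25 ratio per level is an assumption made for illustration only (the
scheduler takes its per-level weights from a precomputed table,
prio_to_weight[] in kernel/sched.c at the time), so the printed split comes
out near the 55%/45% figure quoted above rather than exactly at it.

    /* nice_split.c - illustration only, not kernel code */
    #include <math.h>
    #include <stdio.h>

    /* Assumed weight model: each nice level is worth a factor of ~1.25. */
    static double nice_weight(int nice)
    {
            return pow(1.25, -nice);
    }

    int main(void)
    {
            int pairs[][2] = { { -5, -4 }, { 0, 1 }, { 10, 11 } };
            unsigned int i;

            for (i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++) {
                    double wa = nice_weight(pairs[i][0]);
                    double wb = nice_weight(pairs[i][1]);

                    /* The split is the same for every pair one level apart. */
                    printf("nice %+3d vs %+3d: %2.0f%% / %2.0f%%\n",
                           pairs[i][0], pairs[i][1],
                           100.0 * wa / (wa + wb), 100.0 * wb / (wa + wb));
            }
            return 0;
    }

(Compile with e.g. "gcc nice_split.c -lm"; every pair prints the same
~56%/44% split, regardless of the absolute nice levels.)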
commit fd8bb43e27bbba1b6d49552c3d588cf741dd44af
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:46 2007 +0200
sched: delta_exec accounting fix
small delta_exec accounting fix: increase delta_exec and increase
sum_exec_runtime even if the task is not on the runqueue anymore.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 037b8245e533..16511e9e5528 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -287,15 +287,15 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, u64 now)
struct load_weight *lw = &cfs_rq->load;
unsigned long load = lw->weight;
- if (unlikely(!load))
- return;
-
delta_exec = curr->delta_exec;
schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max));
curr->sum_exec_runtime += delta_exec;
cfs_rq->exec_clock += delta_exec;
+ if (unlikely(!load))
+ return;
+
delta_fair = calc_delta_fair(delta_exec, lw);
delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);
commit c5dcfe72aa8d26e924cccca9725a9f7be0d4ab01
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:46 2007 +0200
sched: clean up delta_mine
cleanup: delta_mine is an unsigned value.
no code impact:
text data bss dec hex filename
27823 2726 16 30565 7765 sched.o.before
27823 2726 16 30565 7765 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index edcb4b542bca..037b8245e533 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -283,8 +283,7 @@ add_wait_runtime(struct cfs_rq *cfs_rq, struct sched_entity *se, long delta)
static inline void
__update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, u64 now)
{
- unsigned long delta, delta_exec, delta_fair;
- long delta_mine;
+ unsigned long delta, delta_exec, delta_fair, delta_mine;
struct load_weight *lw = &cfs_rq->load;
unsigned long load = lw->weight;
commit 8e717b194ce3f3ac9e6acc63f66fe274cdf9cde1
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:46 2007 +0200
sched: schedule() speedup
speed up schedule(): share the 'now' parameter that deactivate_task()
was calculating internally.
( this also fixes the small accounting window between the deactivate
call and the pick_next_task() call. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched.c b/kernel/sched.c
index 0112f63ad376..49f5b281c561 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -940,10 +940,9 @@ static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
/*
* deactivate_task - remove a task from the runqueue.
*/
-static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep)
+static void
+deactivate_task(struct rq *rq, struct task_struct *p, int sleep, u64 now)
{
- u64 now = rq_clock(rq);
-
if (p->state == TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible++;
@@ -2122,7 +2121,7 @@ void sched_exec(void)
static void pull_task(struct rq *src_rq, struct task_struct *p,
struct rq *this_rq, int this_cpu)
{
- deactivate_task(src_rq, p, 0);
+ deactivate_task(src_rq, p, 0, rq_clock(src_rq));
set_task_cpu(p, this_cpu);
activate_task(this_rq, p, 0);
/*
@@ -3446,13 +3445,14 @@ asmlinkage void __sched schedule(void)
spin_lock_irq(&rq->lock);
clear_tsk_need_resched(prev);
+ now = __rq_clock(rq);
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
unlikely(signal_pending(prev)))) {
prev->state = TASK_RUNNING;
} else {
- deactivate_task(rq, prev, 1);
+ deactivate_task(rq, prev, 1, now);
}
switch_count = &prev->nvcsw;
}
@@ -3460,7 +3460,6 @@ asmlinkage void __sched schedule(void)
if (unlikely(!rq->nr_running))
idle_balance(cpu, rq);
- now = __rq_clock(rq);
prev->sched_class->put_prev_task(rq, prev, now);
next = pick_next_task(rq, prev, now);
@@ -4220,7 +4219,7 @@ int sched_setscheduler(struct task_struct *p, int policy,
}
on_rq = p->se.on_rq;
if (on_rq)
- deactivate_task(rq, p, 0);
+ deactivate_task(rq, p, 0, rq_clock(rq));
oldprio = p->prio;
__setscheduler(rq, p, policy, param->sched_priority);
if (on_rq) {
@@ -4973,7 +4972,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
on_rq = p->se.on_rq;
if (on_rq)
- deactivate_task(rq_src, p, 0);
+ deactivate_task(rq_src, p, 0, rq_clock(rq_src));
set_task_cpu(p, dest_cpu);
if (on_rq) {
activate_task(rq_dest, p, 0);
@@ -5387,7 +5386,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
rq = task_rq_lock(rq->idle, &flags);
- deactivate_task(rq, rq->idle, 0);
+ deactivate_task(rq, rq->idle, 0, rq_clock(rq));
rq->idle->static_prio = MAX_PRIO;
__setscheduler(rq, rq->idle, SCHED_NORMAL, 0);
rq->idle->sched_class = &idle_sched_class;
@@ -6626,7 +6625,7 @@ void normalize_rt_tasks(void)
on_rq = p->se.on_rq;
if (on_rq)
- deactivate_task(task_rq(p), p, 0);
+ deactivate_task(task_rq(p), p, 0, rq_clock(task_rq(p)));
__setscheduler(rq, p, SCHED_NORMAL, 0);
if (on_rq) {
activate_task(task_rq(p), p, 0);
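
The win here is subtle: before the change, deactivate_task() read the
runqueue clock internally and schedule() read it again later for
put_prev_task()/pick_next_task(), so the interval between the two reads was
charged to nobody. Reading the clock once and handing the same timestamp to
every step closes that window. A trivial user-space sketch of the general
pattern follows; all names in it are made up for the example and are not
kernel interfaces.

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t clock_ns(void)              /* stand-in for rq_clock() */
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    static void charge_prev(uint64_t now)
    {
            printf("prev charged up to %llu\n", (unsigned long long)now);
    }

    static void start_next(uint64_t now)
    {
            printf("next starts at     %llu\n", (unsigned long long)now);
    }

    int main(void)
    {
            /*
             * Old pattern: charge_prev(clock_ns()); ... start_next(clock_ns());
             * leaves the time between the two reads unaccounted.  New pattern:
             * read the clock once and pass the same timestamp down.
             */
            uint64_t now = clock_ns();

            charge_prev(now);
            start_next(now);
            return 0;
    }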
commit 7bfd0485871df01764ca89d5679f128d870aef1a
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:46 2007 +0200
sched: uninline rq_clock()
uninline rq_clock() to save 263 bytes of code:
text data bss dec hex filename
39561 3642 24 43227 a8db sched.o.before
39298 3642 24 42964 a7d4 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched.c b/kernel/sched.c
index 50c3587b06cb..0112f63ad376 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -353,7 +353,7 @@ static unsigned long long __rq_clock(struct rq *rq)
return clock;
}
-static inline unsigned long long rq_clock(struct rq *rq)
+static unsigned long long rq_clock(struct rq *rq)
{
int this_cpu = smp_processor_id();
commit f1a438d813d416fa9f4be4e6dbd10b54c5938d89
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:45 2007 +0200
sched: reorder update_cpu_load(rq) with the ->task_tick() call
Peter Williams suggested flipping the order of update_cpu_load(rq) and
the ->task_tick() call. This is a NOP for the current scheduler (the
two functions are independent of each other), but ->task_tick() might
create some state for update_cpu_load() in the future (or in PlugSched).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched.c b/kernel/sched.c
index 72bb9483d949..4680f52974e3 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3298,9 +3298,9 @@ void scheduler_tick(void)
struct task_struct *curr = rq->curr;
spin_lock(&rq->lock);
+ update_cpu_load(rq);
if (curr != rq->idle) /* FIXME: needed? */
curr->sched_class->task_tick(rq, curr);
- update_cpu_load(rq);
spin_unlock(&rq->lock);
#ifdef CONFIG_SMP
commit 0915c4e89d311948b67cdd4c183a2efbcafcc9f9
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 9 11:16:45 2007 +0200
sched: batch sleeper bonus
batch up the sleeper bonus sum a bit more. Anything below
sched-granularity is too small to make a practical difference
anyway.
This optimization reduces the math in high-frequency scheduling
scenarios.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 6f579ff5a9bc..9f401588d509 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -300,7 +300,7 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, u64 now)
delta_fair = calc_delta_fair(delta_exec, lw);
delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);
- if (cfs_rq->sleeper_bonus > sysctl_sched_stat_granularity) {
+ if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) {
delta = calc_delta_mine(cfs_rq->sleeper_bonus,
curr->load.weight, lw);
if (unlikely(delta > cfs_rq->sleeper_bonus))
commit 5845b677cf7f64a0f104609e1dfe02a439f69f71
Author: Ingo Molnar <mingo@elte.hu>
Date: Tue Jul 31 19:07:02 2007 -0500
atl1: use spin_trylock_irqsave()
use the simpler spin_trylock_irqsave() API to get the adapter lock.
[ this is also a fix for -rt where adapter->lock is a sleeping lock. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index 56f6389a300e..3c1984ecf36c 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -1704,10 +1704,8 @@ static int atl1_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
}
}
- local_irq_save(flags);
- if (!spin_trylock(&adapter->lock)) {
+ if (!spin_trylock_irqsave(&adapter->lock, flags)) {
/* Can't get lock - tell upper layer to requeue */
- local_irq_restore(flags);
dev_printk(KERN_DEBUG, &adapter->pdev->dev, "tx locked\n");
return NETDEV_TX_LOCKED;
}
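
For readers unfamiliar with the API: spin_trylock_irqsave(lock, flags)
returns non-zero when the lock was taken (with interrupts saved and
disabled) and zero otherwise, so the failure path no longer needs a
separate local_irq_restore(). Below is a pattern-only sketch, not the atl1
driver code; the struct and function names are made up for illustration.

    #include <linux/netdevice.h>
    #include <linux/spinlock.h>

    struct my_priv {
            spinlock_t lock;
    };

    static int my_try_xmit(struct my_priv *priv)
    {
            unsigned long flags;

            if (!spin_trylock_irqsave(&priv->lock, flags))
                    return NETDEV_TX_LOCKED;   /* ask the stack to requeue */

            /* ... hand the frame to the hardware ... */

            spin_unlock_irqrestore(&priv->lock, flags);
            return NETDEV_TX_OK;
    }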
commit 94c18227d1e3f02de5b345bd3cd5c960214dc9c8
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 2 17:41:40 2007 +0200
[PATCH] sched: reduce task_struct size
more task_struct size reduction, by moving the debugging/instrumentation
fields under CONFIG_SCHEDSTATS:
(i386, nodebug):
size
----
pre-CFS 1328
CFS 1472
CFS+patch 1376
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c9e0c2a6a950..17249fae5014 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -904,23 +904,28 @@ struct sched_entity {
struct rb_node run_node;
unsigned int on_rq;
+ u64 exec_start;
+ u64 sum_exec_runtime;
u64 wait_start_fair;
+ u64 sleep_start_fair;
+
+#ifdef CONFIG_SCHEDSTATS
u64 wait_start;
- u64 exec_start;
+ u64 wait_max;
+ s64 sum_wait_runtime;
+
u64 sleep_start;
- u64 sleep_start_fair;
- u64 block_start;
u64 sleep_max;
+ s64 sum_sleep_runtime;
+
+ u64 block_start;
u64 block_max;
u64 exec_max;
- u64 wait_max;
- u64 last_ran;
- u64 sum_exec_runtime;
- s64 sum_wait_runtime;
- s64 sum_sleep_runtime;
unsigned long wait_runtime_overruns;
unsigned long wait_runtime_underruns;
+#endif
+
#ifdef CONFIG_FAIR_GROUP_SCHED
struct sched_entity *parent;
/* rq on which this entity is (to be) queued: */
commit 6cfb0d5d06bea2b8791f32145eae539d524e5f6c
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Aug 2 17:41:40 2007 +0200
[PATCH] sched: reduce debug code
move the rest of the debugging/instrumentation code under
CONFIG_SCHEDSTATS too. This reduces code size and speeds the code up:
text data bss dec hex filename
33044 4122 28 37194 914a sched.o.before
32708 4122 28 36858 8ffa sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/sched.c b/kernel/sched.c
index a9d374061a46..72bb9483d949 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -983,18 +983,21 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
u64 clock_offset, fair_clock_offset;
clock_offset = old_rq->clock - new_rq->clock;
- fair_clock_offset = old_rq->cfs.fair_clock -
- new_rq->cfs.fair_clock;
- if (p->se.wait_start)
- p->se.wait_start -= clock_offset;
+ fair_clock_offset = old_rq->cfs.fair_clock - new_rq->cfs.fair_clock;
+
if (p->se.wait_start_fair)
p->se.wait_start_fair -= fair_clock_offset;
+ if (p->se.sleep_start_fair)
+ p->se.sleep_start_fair -= fair_clock_offset;
+
+#ifdef CONFIG_SCHEDSTATS
+ if (p->se.wait_start)
+ p->se.wait_start -= clock_offset;
if (p->se.sleep_start)
p->se.sleep_start -= clock_offset;
if (p->se.block_start)
p->se.block_start -= clock_offset;
- if (p->se.sleep_start_fair)
- p->se.sleep_start_fair -= fair_clock_offset;
+#endif
__set_task_cpu(p, new_cpu);
}
@@ -1555,17 +1558,19 @@ int fastcall wake_up_state(struct task_struct *p, unsigned int state)
static void __sched_fork(struct task_struct *p)
{
p->se.wait_start_fair = 0;
- p->se.wait_start = 0;
p->se.exec_start = 0;
p->se.sum_exec_runtime = 0;
p->se.delta_exec = 0;
p->se.delta_fair_run = 0;
p->se.delta_fair_sleep = 0;
p->se.wait_runtime = 0;
+ p->se.sleep_start_fair = 0;
+
+#ifdef CONFIG_SCHEDSTATS
+ p->se.wait_start = 0;
p->se.sum_wait_runtime = 0;
p->se.sum_sleep_runtime = 0;
p->se.sleep_start = 0;
- p->se.sleep_start_fair = 0;
p->se.block_start = 0;
p->se.sleep_max = 0;
p->se.block_max = 0;
@@ -1573,6 +1578,7 @@ static void __sched_fork(struct task_struct *p)
p->se.wait_max = 0;
p->se.wait_runtime_overruns = 0;
p->se.wait_runtime_underruns = 0;
+#endif
INIT_LIST_HEAD(&p->run_list);
p->se.on_rq = 0;
@@ -6579,12 +6585,14 @@ void normalize_rt_tasks(void)
do_each_thread(g, p) {
p->se.fair_key = 0;
p->se.wait_runtime = 0;
+ p->se.exec_start = 0;
p->se.wait_start_fair = 0;
+ p->se.sleep_start_fair = 0;
+#ifdef CONFIG_SCHEDSTATS
p->se.wait_start = 0;
- p->se.exec_start = 0;
p->se.sleep_start = 0;
- p->se.sleep_start_fair = 0;
p->se.block_start = 0;
+#endif
task_rq(p)->cfs.fair_clock = 0;
task_rq(p)->clock = 0;
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 0eca442b7792..1c61e5315ad2 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -44,11 +44,16 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p, u64 now)
(long long)p->se.wait_runtime,
(long long)(p->nvcsw + p->nivcsw),
p->prio,
+#ifdef CONFIG_SCHEDSTATS
(long long)p->se.sum_exec_runtime,
(long long)p->se.sum_wait_runtime,
(long long)p->se.sum_sleep_runtime,
(long long)p->se.wait_runtime_overruns,
- (long long)p->se.wait_runtime_underruns);
+ (long long)p->se.wait_runtime_underruns
+#else
+ 0LL, 0LL, 0LL, 0LL, 0LL
+#endif
+ );
}
static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu, u64 now)
@@ -171,7 +176,7 @@ static int sched_debug_show(struct seq_file *m, void *v)
u64 now = ktime_to_ns(ktime_get());
int cpu;
- SEQ_printf(m, "Sched Debug Version: v0.05, %s %.*s\n",
+ SEQ_printf(m, "Sched Debug Version: v0.05-v20, %s %.*s\n",
init_utsname()->release,
(int)strcspn(init_utsname()->version, " "),
init_utsname()->version);
@@ -235,21 +240,24 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
#define P(F) \
SEQ_printf(m, "%-25s:%20Ld\n", #F, (long long)p->F)
- P(se.wait_start);
+ P(se.wait_runtime);
P(se.wait_start_fair);
P(se.exec_start);
- P(se.sleep_start);
P(se.sleep_start_fair);
+ P(se.sum_exec_runtime);
+
+#ifdef CONFIG_SCHEDSTATS
+ P(se.wait_start);
+ P(se.sleep_start);
P(se.block_start);
P(se.sleep_max);
P(se.block_max);
P(se.exec_max);
P(se.wait_max);
- P(se.wait_runtime);
P(se.wait_runtime_overruns);
P(se.wait_runtime_underruns);
P(se.sum_wait_runtime);
- P(se.sum_exec_runtime);
+#endif
SEQ_printf(m, "%-25s:%20Ld\n",
"nr_switches", (long long)(p->nvcsw + p->nivcsw));
P(se.load.weight);
@@ -269,7 +277,9 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
void proc_sched_set_task(struct task_struct *p)
{
+#ifdef CONFIG_SCHEDSTATS
p->se.sleep_max = p->se.block_max = p->se.exec_max = p->se.wait_max = 0;
p->se.wait_runtime_overruns = p->se.wait_runtime_underruns = 0;
+#endif
p->se.sum_exec_runtime = 0;
}
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5bf7285ad02c..6f579ff5a9bc 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -349,7 +349,7 @@ static inline void
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
{
se->wait_start_fair = cfs_rq->fair_clock;
- se->wait_start = now;
+ schedstat_set(se->wait_start, now);
}
/*
@@ -447,7 +447,7 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
}
se->wait_start_fair = 0;
- se->wait_start = 0;
+ schedstat_set(se->wait_start, 0);
}
static inline void
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index ade20dc422f1..002fcf8d3f64 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -18,8 +18,8 @@ static inline void update_curr_rt(struct rq *rq, u64 now)
delta_exec = now - curr->se.exec_start;
if (unlikely((s64)delta_exec < 0))
delta_exec = 0;
- if (unlikely(delta_exec > curr->se.exec_max))
- curr->se.exec_max = delta_exec;
+
+ schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec));
curr->se.sum_exec_runtime += delta_exec;
curr->se.exec_start = now;
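
The schedstat_set() conversions above are what let the instrumentation
compile away completely: when CONFIG_SCHEDSTATS is disabled the macro
expands to nothing, so neither the store nor the now-#ifdef'ed-out field is
referenced. A minimal sketch of such a helper follows - the in-tree
definition lives in the scheduler statistics code and may differ in detail.

    #ifdef CONFIG_SCHEDSTATS
    # define schedstat_set(var, val)        do { var = (val); } while (0)
    #else
    # define schedstat_set(var, val)        do { } while (0)
    #endif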