Patches contributed by Eötvös Lorand University
commit e7bc62b6b3aeaa8849f8383e0cfb7ca6c003adc6
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Dec 4 20:13:45 2008 +0100
performance counters: documentation
Add more documentation about performance counters.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/Documentation/perf-counters.txt b/Documentation/perf-counters.txt
new file mode 100644
index 000000000000..19033a0bb526
--- /dev/null
+++ b/Documentation/perf-counters.txt
@@ -0,0 +1,104 @@
+
+Performance Counters for Linux
+------------------------------
+
+Performance counters are special hardware registers available on most modern
+CPUs. These registers count the occurrence of certain types of hardware
+events - such as instructions executed, cache misses suffered, or branches
+mispredicted - without slowing down the kernel or applications. These
+registers can also trigger interrupts when a threshold number of events
+has passed - and can thus be used to profile the code that runs on that CPU.
+
+The Linux Performance Counter subsystem provides an abstraction of these
+hardware capabilities. It provides per task and per CPU counters, and
+it provides event capabilities on top of those.
+
+Performance counters are accessed via special file descriptors.
+There's one file descriptor per virtual counter used.
+
+The special file descriptor is opened via the perf_counter_open()
+system call:
+
+ int
+ perf_counter_open(u32 hw_event_type,
+ u32 hw_event_period,
+ u32 record_type,
+ pid_t pid,
+ int cpu);
+
+The syscall returns the new fd. The fd can be used via the normal
+VFS system calls: read() can be used to read the counter, fcntl()
+can be used to set the blocking mode, etc.
+
+Multiple counters can be kept open at a time, and the counters
+can be poll()ed.
+
+When creating a new counter fd, 'hw_event_type' is one of:
+
+ enum hw_event_types {
+ PERF_COUNT_CYCLES,
+ PERF_COUNT_INSTRUCTIONS,
+ PERF_COUNT_CACHE_REFERENCES,
+ PERF_COUNT_CACHE_MISSES,
+ PERF_COUNT_BRANCH_INSTRUCTIONS,
+ PERF_COUNT_BRANCH_MISSES,
+ };
+
+These are standardized types of events that work uniformly on all CPUs
+that implement Performance Counters support under Linux. If a CPU is
+not able to count branch misses, then the system call will return
+-EINVAL.
+
+[ Note: more hw_event_types are supported as well, but they are CPU
+ specific and are enumerated via /sys on a per CPU basis. Raw hw event
+ types can be passed in as negative numbers. For example, to count
+ "External bus cycles while bus lock signal asserted" events on Intel
+ Core CPUs, pass in a -0x4064 event type value. ]
+
+The parameter 'hw_event_period' is the number of events before waking up
+a read() that is blocked on a counter fd. A zero value means a non-blocking
+counter.
+
+'record_type' is the type of data that a read() will provide for the
+counter, and it can be one of:
+
+ enum perf_record_type {
+ PERF_RECORD_SIMPLE,
+ PERF_RECORD_IRQ,
+ };
+
+A "simple" counter is one that counts hardware events and allows
+them to be read out into a u64 count value. (read() returns 8 on
+a successful read of a simple counter.)
+
+An "irq" counter is one that also provides IRQ context information:
+the IP of the interrupted context. In this case read() will return
+the 8-byte counter value, plus the Instruction Pointer address of the
+interrupted context.
+
+The 'pid' parameter allows the counter to be specific to a task:
+
+ pid == 0: if the pid parameter is zero, the counter is attached to the
+ current task.
+
+ pid > 0: the counter is attached to a specific task (if the current task
+ has sufficient privilege to do so).
+
+ pid < 0: all tasks are counted (per cpu counters)
+
+The 'cpu' parameter allows a counter to be made specific to a full
+CPU:
+
+ cpu >= 0: the counter is restricted to a specific CPU
+ cpu == -1: the counter counts on all CPUs
+
+Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.
+
+A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts
+events of that task and 'follows' that task to whatever CPU the task
+gets scheduled to. Per task counters can be created by any user, for
+their own tasks.
+
+A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
+all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+
commit b5aa97e83bcc31a96374d18f5452d53909a16c90
Merge: 218d11a8b071 4217458dafaa 5b3eec0c8003
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Dec 8 15:46:30 2008 +0100
Merge branches 'x86/signal' and 'x86/irq' into perfcounters/core
Merge these pending x86 tree changes into the perfcounters tree
to avoid conflicts.
diff --cc arch/x86/kernel/Makefile
index b62a7667828e,ef28c210ebf8,943fe6026c64..3d4346a73a8f
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@@@ -11,7 -11,6 -11,8 +11,8 @@@@ ifdef CONFIG_FUNCTION_TRACE
CFLAGS_REMOVE_tsc.o = -pg
CFLAGS_REMOVE_rtc.o = -pg
CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
+ CFLAGS_REMOVE_ftrace.o = -pg
++ CFLAGS_REMOVE_early_printk.o = -pg
endif
#
commit aa9c9b8c584a42a094202b7e0f63497e888f86a7
Merge: 87f7606591ae 218d11a8b071
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Dec 8 15:07:49 2008 +0100
Merge branch 'linus' into x86/quirks
commit 4d117c5c6b00254e51c61ff5b506ccaba21a5a03
Merge: 6c415b9234a8 43714539eab4
Author: Ingo Molnar <mingo@elte.hu>
Date: Mon Dec 8 13:52:00 2008 +0100
Merge branch 'sched/urgent' into sched/core
commit 970987beb9c99ca806edc464518d411cc399fb4d
Merge: faec2ec505d3 1fd8f2a3f9a9 feaf3848a813
Author: Ingo Molnar <mingo@elte.hu>
Date: Fri Dec 5 14:45:22 2008 +0100
Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core
diff --cc kernel/trace/trace.c
index 1bd9574404e5,1ca74c0cee6a,d86e3252f300..ea38652d631c
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@@@ -1165,97 -1165,86 -884,12 +1165,97 @@@@ function_trace_call_preempt_only(unsign
trace_function(tr, data, ip, parent_ip, flags, pc);
atomic_dec(&data->disabled);
- if (resched)
- preempt_enable_no_resched_notrace();
- else
- preempt_enable_notrace();
+ ftrace_preempt_enable(resched);
}
+static void
+function_trace_call(unsigned long ip, unsigned long parent_ip)
+{
+ struct trace_array *tr = &global_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ long disabled;
+ int cpu;
+ int pc;
+
+ if (unlikely(!ftrace_function_enabled))
+ return;
+
+ /*
+ * Need to use raw, since this must be called before the
+ * recursive protection is performed.
+ */
+ local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ data = tr->data[cpu];
+ disabled = atomic_inc_return(&data->disabled);
+
+ if (likely(disabled == 1)) {
+ pc = preempt_count();
+ trace_function(tr, data, ip, parent_ip, flags, pc);
+ }
+
+ atomic_dec(&data->disabled);
+ local_irq_restore(flags);
+}
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+int trace_graph_entry(struct ftrace_graph_ent *trace)
+{
+ struct trace_array *tr = &global_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ long disabled;
+ int cpu;
+ int pc;
+
++ if (!ftrace_trace_task(current))
++ return 0;
++
++ if (!ftrace_graph_addr(trace->func))
++ return 0;
++
+ local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ data = tr->data[cpu];
+ disabled = atomic_inc_return(&data->disabled);
+ if (likely(disabled == 1)) {
+ pc = preempt_count();
+ __trace_graph_entry(tr, data, trace, flags, pc);
+ }
++ /* Only do the atomic if it is not already set */
++ if (!test_tsk_trace_graph(current))
++ set_tsk_trace_graph(current);
+ atomic_dec(&data->disabled);
+ local_irq_restore(flags);
+
+ return 1;
+}
+
+void trace_graph_return(struct ftrace_graph_ret *trace)
+{
+ struct trace_array *tr = &global_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ long disabled;
+ int cpu;
+ int pc;
+
+ local_irq_save(flags);
+ cpu = raw_smp_processor_id();
+ data = tr->data[cpu];
+ disabled = atomic_inc_return(&data->disabled);
+ if (likely(disabled == 1)) {
+ pc = preempt_count();
+ __trace_graph_return(tr, data, trace, flags, pc);
+ }
++ if (!trace->depth)
++ clear_tsk_trace_graph(current);
+ atomic_dec(&data->disabled);
+ local_irq_restore(flags);
+}
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
static struct ftrace_ops trace_ops __read_mostly =
{
.func = function_trace_call,
diff --cc kernel/trace/trace.h
index b4b7b735184d,fce98898205a,8465ad052707..a71bbe0a3631
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@@@ -502,59 -504,17 -396,6 +504,59 @@@@ trace_vprintk(unsigned long ip, int dep
extern unsigned long trace_flags;
+/* Standard output formatting function used for function return traces */
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+extern enum print_line_t print_graph_function(struct trace_iterator *iter);
++
++#ifdef CONFIG_DYNAMIC_FTRACE
++/* TODO: make this variable */
++#define FTRACE_GRAPH_MAX_FUNCS 32
++extern int ftrace_graph_count;
++extern unsigned long ftrace_graph_funcs[FTRACE_GRAPH_MAX_FUNCS];
++
++static inline int ftrace_graph_addr(unsigned long addr)
++{
++ int i;
++
++ if (!ftrace_graph_count || test_tsk_trace_graph(current))
++ return 1;
++
++ for (i = 0; i < ftrace_graph_count; i++) {
++ if (addr == ftrace_graph_funcs[i])
++ return 1;
++ }
++
++ return 0;
++}
+#else
++static inline int ftrace_trace_addr(unsigned long addr)
++{
++ return 1;
++}
++static inline int ftrace_graph_addr(unsigned long addr)
++{
++ return 1;
++}
++#endif /* CONFIG_DYNAMIC_FTRACE */
++
++#else /* CONFIG_FUNCTION_GRAPH_TRACER */
+static inline enum print_line_t
+print_graph_function(struct trace_iterator *iter)
+{
+ return TRACE_TYPE_UNHANDLED;
+}
- #endif
++#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
++
++extern struct pid *ftrace_pid_trace;
++
++static inline int ftrace_trace_task(struct task_struct *task)
++{
++ if (ftrace_pid_trace)
++ return 1;
++
++ return test_tsk_trace_trace(task);
++}
+
/*
* trace_iterator_flags is an enumeration that defines bit
* positions into trace_flags that controls the output.
commit c0515566f3117c44b0572559bcc3cb00899b0910
Merge: 4385cecf1f58 061e41fdb504
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Dec 4 11:05:26 2008 +0100
Merge commit 'v2.6.28-rc7' into x86/cleanups
commit 6b2539302bee8e88c99e3c7d80c16a04dbe5e2ad
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Dec 4 09:18:28 2008 +0100
tracing: fix typo and missing inline function
Impact: fix build bugs
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8b81b4d727bd..b4b7b735184d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -529,7 +529,11 @@ static inline int ftrace_graph_addr(unsigned long addr)
#else
static inline int ftrace_trace_addr(unsigned long addr)
{
- return 1
+ return 1;
+}
+static inline int ftrace_graph_addr(unsigned long addr)
+{
+ return 1;
}
#endif /* CONFIG_DYNAMIC_FTRACE */
commit b29144c317fb748dae6d72c0f88eda9d43165b8d
Merge: b8307db2477f e8e1abe92fd7 764f3b95131a
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Dec 4 09:07:44 2008 +0100
Merge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing/core
commit b8307db2477f9c551e54e0c7b643ea349a3349cd
Merge: f0461d0146ee 061e41fdb504
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Dec 4 09:07:19 2008 +0100
Merge commit 'v2.6.28-rc7' into tracing/core
commit cb9c34e6d090d376b77becaa5d29a65dec7f4272
Merge: 470c66239ef0 061e41fdb504
Author: Ingo Molnar <mingo@elte.hu>
Date: Thu Dec 4 08:52:14 2008 +0100
Merge commit 'v2.6.28-rc7' into core/locking