Patches contributed by Eötvös Loránd University


commit e4233dec749a3519069d9390561b5636a75c7579
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 26 00:56:55 2007 -0800

    [PATCH] ACPI: fix cpufreq regression
    
    Recently cpufreq support on my laptop (Lenovo T60) broke completely: when
    it's plugged into AC it would never go higher than 1 GHz - neither 1.3 GHz
    nor 1.83 GHz is possible - no matter which governor (userspace, speed or
    ondemand) is used.
    
    After some cpufreq debugging I tracked the regression back to the following
    (totally correct) bug-fix commit:
    
       commit 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e
       Author: Dave Jones <davej@redhat.com>
       Date:   Wed Nov 22 20:42:01 2006 -0500
    
        [PATCH] Correct bound checking from the value returned from _PPC method.
    
    This bugfix, which makes other laptops work, made a previously hidden
    (BIOS) bug visible on my laptop.
    
    The bug is the following: if the _PPC (Performance Present Capabilities)
    optional ACPI object is queried /after/ bootup then the BIOS reports an
    incorrect value of '2'.
    
    My laptop (Lenovo T60) has the following performance states supported:
    
       0: 1833000
       1: 1333000
       2: 1000000
    
    Per ACPI specification, a _PPC value of '0' means that all 3 performance
    states are usable.  A _PPC value of '1' means states 1 ..  2 are usable, a
    value of '2' means only state '2' (slowest) is usable.
    
    Now, the _PPC object is optional, and it also comes with notification.
    Furthermore, when a CPU object is initialized, the _PPC object is
    initialized as well.  So the following evaluation of the _PPC object is
    superfluous:
    
     [<c028ba5f>] acpi_processor_get_platform_limit+0xa1/0xaf
     [<c028c040>] acpi_processor_register_performance+0x3b9/0x3ef
     [<c0111a85>] acpi_cpufreq_cpu_init+0xb7/0x596
     [<c03dab74>] cpufreq_add_dev+0x160/0x4a8
     [<c02bed90>] sysdev_driver_register+0x5a/0xa0
     [<c03d9c4c>] cpufreq_register_driver+0xb4/0x176
     [<c068ac08>] acpi_cpufreq_init+0xe5/0xeb
     [<c010056e>] init+0x14f/0x3dd
    
    And this is the point where my laptop's BIOS returns the incorrect value of
    '2'.  Note that it has not sent any notification event, so the value is
    probably not really intentional (possibly spurious), and Windows likely
    doesn't query it after bootup either.  Maybe the value is kept at '2'
    normally, and is only set to the real value when a true asynchronous event
    (such as AC plug event, battery switch, etc.) occurs.
    
    So I /think/ this is a grey area of the ACPI spec: per the letter of the
    spec the _PPC value only changes when notified, so there's no reason to
    query it after the system has booted up.  So in my opinion the best (and
    most compatible) strategy would be to do the change below, and to not
    evaluate the _PPC object in the acpi_processor_get_performance_info() call,
    but only evaluate it if _PPC is present during CPU object init, or if it's
    notified during an asynchronous event.  This change is more permissive than
    the previous logic, so it definitely shouldn't break any existing system.
    
    This also happens to fix my laptop, which is merrily chugging along at
    1.83 GHz now. Yay!
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Cc: Dave Jones <davej@redhat.com>
    Acked-by: Len Brown <lenb@kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 5207f9e4b443..cbb6f0814ce2 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -322,10 +322,6 @@ static int acpi_processor_get_performance_info(struct acpi_processor *pr)
 	if (result)
 		return result;
 
-	result = acpi_processor_get_platform_limit(pr);
-	if (result)
-		return result;
-
 	return 0;
 }
 
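
The _PPC rule described in the commit message above can be sketched in
plain C.  This is only an illustration under stated assumptions: the table
name t60_states_khz and the helper ppc_max_freq_khz() are hypothetical and
are not part of drivers/acpi; only the three frequencies (taken to be in
kHz) come from the commit message.

```c
#include <stddef.h>

/* Performance states from the commit message, fastest first, in kHz. */
const unsigned int t60_states_khz[3] = { 1833000, 1333000, 1000000 };

/*
 * Hypothetical helper (not a kernel function): a _PPC value of n means
 * that states n .. count-1 are usable, so the fastest usable state is
 * states[n].  Returns 0 if the _PPC value is out of range.
 */
unsigned int ppc_max_freq_khz(const unsigned int *states, size_t count,
			      unsigned int ppc)
{
	if (ppc >= count)
		return 0;
	return states[ppc];
}
```

With the spurious value of 2 reported by the BIOS, the helper yields
1000000 kHz, which matches the 1 GHz ceiling seen before the fix.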

commit 1b5180b65122666a36a1a232b7b9b38b21a9dcdd
Author: Ingo Molnar <mingo@elte.hu>
Date:   Tue Jan 23 10:45:50 2007 +0100

    [PATCH] notifiers: fix blocking_notifier_call_chain() scalability
    
    While lock-profiling the -rt kernel I noticed weird contention during
    mmap-intense workloads, and the tracer showed the following gem, in one
    of our MM hotpaths:
    
     threaded-2771  1....   65us : sys_munmap (sysenter_do_call)
     threaded-2771  1....   66us : profile_munmap (sys_munmap)
     threaded-2771  1....   66us : blocking_notifier_call_chain (profile_munmap)
     threaded-2771  1....   66us : rt_down_read (blocking_notifier_call_chain)
    
    Ouch! A global rw-semaphore taken in one of the most performance-
    sensitive codepaths of the kernel.  And I don't even have oprofile
    enabled! All distro kernels have CONFIG_PROFILING enabled, so this
    scalability problem affects the majority of Linux users.
    
    The fix is to enhance blocking_notifier_call_chain() to only take the
    lock if there appears to be work on the call-chain.
    
    With this patch applied I get a nicely saturated system, and much higher
    munmap performance, on SMP systems.
    
    And as a bonus this also fixes a similar scalability bottleneck in the
    thread-exit codepath: profile_task_exit() ...
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/kernel/sys.c b/kernel/sys.c
index c7675c1bfdf2..6e2101dec0fc 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -323,11 +323,18 @@ EXPORT_SYMBOL_GPL(blocking_notifier_chain_unregister);
 int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 		unsigned long val, void *v)
 {
-	int ret;
+	int ret = NOTIFY_DONE;
 
-	down_read(&nh->rwsem);
-	ret = notifier_call_chain(&nh->head, val, v);
-	up_read(&nh->rwsem);
+	/*
+	 * We check the head outside the lock, but if this access is
+	 * racy then it does not matter what the result of the test
+	 * is, we re-check the list after having taken the lock anyway:
+	 */
+	if (rcu_dereference(nh->head)) {
+		down_read(&nh->rwsem);
+		ret = notifier_call_chain(&nh->head, val, v);
+		up_read(&nh->rwsem);
+	}
 	return ret;
 }
 
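
The "peek at the head, then lock only if there is work" idea from the
patch above can be modelled in user space.  The sketch below is an
assumption-laden, single-threaded model, not the kernel code: all struct
and function names are made up for illustration, a C11 atomic load stands
in for rcu_dereference(), and a plain counter (read_locks) stands in for
the rw-semaphore so that it is visible whether the lock would have been
taken.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NOTIFY_DONE 0	/* same numeric value as the kernel's NOTIFY_DONE */

struct notifier {
	int (*fn)(unsigned long val, void *data);
	struct notifier *next;
};

struct notifier_head {
	_Atomic(struct notifier *) head;
	unsigned long read_locks;	/* stand-in: counts read-lock acquisitions */
};

void notifier_head_init(struct notifier_head *nh)
{
	atomic_init(&nh->head, NULL);
	nh->read_locks = 0;
}

/* Registration helper for the model: push at the front of the chain. */
void notifier_register(struct notifier_head *nh, struct notifier *n)
{
	n->next = atomic_load(&nh->head);
	atomic_store(&nh->head, n);
}

int notifier_call(struct notifier_head *nh, unsigned long val, void *data)
{
	int ret = NOTIFY_DONE;
	struct notifier *n;

	/* Fast path: an empty chain means no work, so skip the lock. */
	if (atomic_load_explicit(&nh->head, memory_order_acquire) == NULL)
		return ret;

	nh->read_locks++;	/* real code would take the read lock here */
	/* Re-read the head "under the lock"; the unlocked peek is only a hint. */
	for (n = atomic_load(&nh->head); n != NULL; n = n->next)
		ret = n->fn(val, data);
	/* ... and real code would release the read lock here. */
	return ret;
}

/* Example callback: records the value it was called with. */
int record_val(unsigned long val, void *data)
{
	*(unsigned long *)data = val;
	return 1;
}
```

In this model a call on an empty chain returns NOTIFY_DONE and leaves
read_locks at zero, which is the contention-free case the patch is after.
Priority ordering and NOTIFY_STOP handling are deliberately left out.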

commit 0dbe5a111382fd1320ff4b1d889e5b8c41290619
Author: Ingo Molnar <mingo@elte.hu>
Date:   Mon Jan 22 20:40:36 2007 -0800

    [PATCH] paravirt: mark the paravirt_ops export internal
    
    The paravirt subsystem is still in flux so all exports from it are
    definitely internal use only.  The APIs around this /will/ change.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Cc: Andi Kleen <ak@suse.de>
    Cc: Zachary Amsden <zach@vmware.com>
    Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
    Acked-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/arch/i386/kernel/paravirt.c b/arch/i386/kernel/paravirt.c
index 3dceab5828f1..e55fd05da0f5 100644
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -566,4 +566,11 @@ struct paravirt_ops paravirt_ops = {
 	.irq_enable_sysexit = native_irq_enable_sysexit,
 	.iret = native_iret,
 };
-EXPORT_SYMBOL(paravirt_ops);
+
+/*
+ * NOTE: CONFIG_PARAVIRT is experimental and the paravirt_ops
+ * semantics are subject to change. Hence we only do this
+ * internal-only export of this, until it gets sorted out and
+ * all lowlevel CPU ops used by modules are separately exported.
+ */
+EXPORT_SYMBOL_GPL(paravirt_ops);

commit 07031e14c1127fc7e1a5b98dfcc59f434e025104
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Jan 10 23:15:38 2007 -0800

    [PATCH] KVM: add VM-exit profiling
    
    This adds the profile=kvm boot option, which enables KVM to profile VM
    exits.
    
    Use: "readprofile -m ./System.map | sort -n" to see the resulting
    output:
    
       [...]
       18246 serial_out                               148.3415
       18945 native_flush_tlb                         378.9000
       23618 serial_in                                212.7748
       29279 __spin_unlock_irq                        622.9574
       43447 native_apic_write                        2068.9048
       52702 enable_8259A_irq                         742.2817
       54250 vgacon_scroll                             89.3740
       67394 ide_inb                                  6126.7273
       79514 copy_page_range                           98.1654
       84868 do_wp_page                                86.6000
      140266 pit_read                                 783.6089
      151436 ide_outb                                 25239.3333
      152668 native_io_delay                          21809.7143
      174783 mask_and_ack_8259A                       783.7803
      362404 native_set_pte_at                        36240.4000
     1688747 total                                      0.5009
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Avi Kivity <avi@qumranet.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index ccc06b1b91b5..714f6a7841cd 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
+#include <linux/profile.h>
 #include <asm/desc.h>
 
 #include "kvm_svm.h"
@@ -1558,6 +1559,13 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	reload_tss(vcpu);
 
+	/*
+	 * Profile KVM exit RIPs:
+	 */
+	if (unlikely(prof_on == KVM_PROFILING))
+		profile_hit(KVM_PROFILING,
+			(void *)(unsigned long)vcpu->svm->vmcb->save.rip);
+
 	stgi();
 
 	kvm_reput_irq(vcpu);
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index d4701cb4c654..ce219e3f557f 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -21,6 +21,7 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/profile.h>
 #include <asm/io.h>
 #include <asm/desc.h>
 
@@ -1859,6 +1860,12 @@ static int vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 #endif
 
+	/*
+	 * Profile KVM exit RIPs:
+	 */
+	if (unlikely(prof_on == KVM_PROFILING))
+		profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP));
+
 	kvm_run->exit_type = 0;
 	if (fail) {
 		kvm_run->exit_type = KVM_EXIT_TYPE_FAIL_ENTRY;
diff --git a/include/linux/profile.h b/include/linux/profile.h
index 5670b340c4ef..eec48f5f9348 100644
--- a/include/linux/profile.h
+++ b/include/linux/profile.h
@@ -15,6 +15,7 @@ extern int prof_on __read_mostly;
 #define CPU_PROFILING	1
 #define SCHED_PROFILING	2
 #define SLEEP_PROFILING	3
+#define KVM_PROFILING	4
 
 struct proc_dir_entry;
 struct pt_regs;
diff --git a/kernel/profile.c b/kernel/profile.c
index 11550b2290b6..a6574a18514e 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -40,7 +40,10 @@ int (*timer_hook)(struct pt_regs *) __read_mostly;
 
 static atomic_t *prof_buffer;
 static unsigned long prof_len, prof_shift;
+
 int prof_on __read_mostly;
+EXPORT_SYMBOL_GPL(prof_on);
+
 static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
@@ -52,6 +55,7 @@ static int __init profile_setup(char * str)
 {
 	static char __initdata schedstr[] = "schedule";
 	static char __initdata sleepstr[] = "sleep";
+	static char __initdata kvmstr[] = "kvm";
 	int par;
 
 	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
@@ -72,6 +76,15 @@ static int __init profile_setup(char * str)
 		printk(KERN_INFO
 			"kernel schedule profiling enabled (shift: %ld)\n",
 			prof_shift);
+	} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
+		prof_on = KVM_PROFILING;
+		if (str[strlen(kvmstr)] == ',')
+			str += strlen(kvmstr) + 1;
+		if (get_option(&str, &par))
+			prof_shift = par;
+		printk(KERN_INFO
+			"kernel KVM profiling enabled (shift: %ld)\n",
+			prof_shift);
 	} else if (get_option(&str, &par)) {
 		prof_shift = par;
 		prof_on = CPU_PROFILING;
@@ -318,6 +331,7 @@ void profile_hits(int type, void *__pc, unsigned int nr_hits)
 	local_irq_restore(flags);
 	put_cpu();
 }
+EXPORT_SYMBOL_GPL(profile_hits);
 
 static int __devinit profile_cpu_callback(struct notifier_block *info,
 					unsigned long action, void *__cpu)
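
The option parsing added to profile_setup() above can be approximated in
user space.  This is a minimal sketch, assuming strtol() as a stand-in
for the kernel's get_option(); the names prof_mode, prof_config and
parse_profile_option() are hypothetical.  The enum values mirror the
*_PROFILING defines in include/linux/profile.h as shown in the diff.

```c
#include <stdlib.h>
#include <string.h>

enum prof_mode {
	PROF_OFF = 0,
	PROF_CPU = 1,
	PROF_SCHED = 2,
	PROF_SLEEP = 3,
	PROF_KVM = 4
};

struct prof_config {
	enum prof_mode mode;
	long shift;
};

/* Parse a leading decimal number; returns 1 on success, 0 otherwise. */
static int parse_shift(const char *s, long *out)
{
	char *end;
	long v = strtol(s, &end, 10);

	if (end == s)
		return 0;
	*out = v;
	return 1;
}

/* Accepts "sleep[,N]", "schedule[,N]", "kvm[,N]" or a bare "N". */
struct prof_config parse_profile_option(const char *str)
{
	static const struct {
		const char *name;
		enum prof_mode mode;
	} modes[] = {
		{ "sleep", PROF_SLEEP },
		{ "schedule", PROF_SCHED },
		{ "kvm", PROF_KVM },
	};
	struct prof_config cfg = { PROF_OFF, 0 };
	size_t i;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		size_t len = strlen(modes[i].name);

		if (strncmp(str, modes[i].name, len) == 0) {
			cfg.mode = modes[i].mode;
			if (str[len] == ',')
				str += len + 1;
			parse_shift(str, &cfg.shift);	/* shift is optional */
			return cfg;
		}
	}
	/* A bare number selects plain CPU profiling with that shift. */
	if (parse_shift(str, &cfg.shift))
		cfg.mode = PROF_CPU;
	return cfg;
}
```

For example, parse_profile_option("kvm,2") selects PROF_KVM with a shift
of 2, corresponding to booting with profile=kvm,2.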

commit 68a99f6d37aa65e848e09ec6ea52848e93bd5de2
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 5 16:36:59 2007 -0800

    [PATCH] KVM: Simplify mmu_alloc_roots()
    
    Small optimization/cleanup:
    
        page == page_header(page->page_hpa)
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Avi Kivity <avi@qumranet.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 4118e504e647..c6f972914f08 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -820,9 +820,9 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
 		hpa_t root = vcpu->mmu.root_hpa;
 
 		ASSERT(!VALID_PAGE(root));
-		root = kvm_mmu_get_page(vcpu, root_gfn, 0,
-					PT64_ROOT_LEVEL, 0, NULL)->page_hpa;
-		page = page_header(root);
+		page = kvm_mmu_get_page(vcpu, root_gfn, 0,
+					PT64_ROOT_LEVEL, 0, NULL);
+		root = page->page_hpa;
 		++page->root_count;
 		vcpu->mmu.root_hpa = root;
 		return;
@@ -836,10 +836,10 @@ static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
 			root_gfn = vcpu->pdptrs[i] >> PAGE_SHIFT;
 		else if (vcpu->mmu.root_level == 0)
 			root_gfn = 0;
-		root = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
+		page = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
 					PT32_ROOT_LEVEL, !is_paging(vcpu),
-					NULL)->page_hpa;
-		page = page_header(root);
+					NULL);
+		root = page->page_hpa;
 		++page->root_count;
 		vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK;
 	}

commit d21225ee2b6fa9f7669526927f2e0bedebd90940
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 5 16:36:59 2007 -0800

    [PATCH] KVM: Make loading cr3 more robust
    
    Prevent the guest's loading of a corrupt cr3 (pointing at no guest physical
    page) from crashing the host.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Avi Kivity <avi@qumranet.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0675d3e51692..67c1154960f0 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -463,7 +463,19 @@ void set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	vcpu->cr3 = cr3;
 	spin_lock(&vcpu->kvm->lock);
-	vcpu->mmu.new_cr3(vcpu);
+	/*
+	 * Does the new cr3 value map to physical memory? (Note, we
+	 * catch an invalid cr3 even in real-mode, because it would
+	 * cause trouble later on when we turn on paging anyway.)
+	 *
+	 * A real CPU would silently accept an invalid cr3 and would
+	 * attempt to use it - with largely undefined (and often hard
+	 * to debug) behavior on the guest side.
+	 */
+	if (unlikely(!gfn_to_memslot(vcpu->kvm, cr3 >> PAGE_SHIFT)))
+		inject_gp(vcpu);
+	else
+		vcpu->mmu.new_cr3(vcpu);
 	spin_unlock(&vcpu->kvm->lock);
 }
 EXPORT_SYMBOL_GPL(set_cr3);
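
The check added above boils down to "does the page frame that cr3 points
at fall inside any guest memory slot?".  The following user-space sketch
shows that lookup under stated assumptions: the memslot layout, the helper
names find_memslot() and cr3_is_backed(), and the 4 KiB page size are
illustrative and are not KVM's actual data structures.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct memslot {
	unsigned long base_gfn;	/* first guest frame number in the slot */
	unsigned long npages;	/* number of pages in the slot */
};

/* Return the slot containing gfn, or NULL if no slot covers it. */
const struct memslot *find_memslot(const struct memslot *slots, size_t count,
				   unsigned long gfn)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	}
	return NULL;
}

/* Nonzero if the page that cr3 points at is backed by guest memory. */
int cr3_is_backed(const struct memslot *slots, size_t count,
		  unsigned long cr3)
{
	return find_memslot(slots, count, cr3 >> SKETCH_PAGE_SHIFT) != NULL;
}
```

A caller in the spirit of the patch would inject a fault into the guest
when cr3_is_backed() returns zero, and only otherwise switch page tables.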

commit 7f7417d67ea6c1538469e3ea005484e807642c0a
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 5 16:36:57 2007 -0800

    [PATCH] KVM: Avoid oom on cr3 switch
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Avi Kivity <avi@qumranet.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 6e7381bc2218..4118e504e647 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -905,6 +905,8 @@ static void paging_new_cr3(struct kvm_vcpu *vcpu)
 {
 	pgprintk("%s: cr3 %lx\n", __FUNCTION__, vcpu->cr3);
 	mmu_free_roots(vcpu);
+	if (unlikely(vcpu->kvm->n_free_mmu_pages < KVM_MIN_FREE_MMU_PAGES))
+		kvm_mmu_free_some_pages(vcpu);
 	mmu_alloc_roots(vcpu);
 	kvm_mmu_flush_tlb(vcpu);
 	kvm_arch_ops->set_cr3(vcpu, vcpu->mmu.root_hpa);

commit a75acf850ca80136a4f845cf9a7cd26e7465c1f4
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 5 16:36:29 2007 -0800

    [PATCH] profiling: fix sched profiling typo
    
    Fix sched profiling typo, introduced by the sleep profiling patch.  This
    bug caused profile=sched to not work.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/kernel/profile.c b/kernel/profile.c
index fb5e03d57e9d..11550b2290b6 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -63,7 +63,7 @@ static int __init profile_setup(char * str)
 		printk(KERN_INFO
 			"kernel sleep profiling enabled (shift: %ld)\n",
 			prof_shift);
-	} else if (!strncmp(str, sleepstr, strlen(sleepstr))) {
+	} else if (!strncmp(str, schedstr, strlen(schedstr))) {
 		prof_on = SCHED_PROFILING;
 		if (str[strlen(schedstr)] == ',')
 			str += strlen(schedstr) + 1;

commit d3b2c33860d4acdfe3ac29b40b03e655eb8d1e2c
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 5 16:36:23 2007 -0800

    [PATCH] KVM: Use raw_smp_processor_id() instead of smp_processor_id() where applicable
    
    Signed-off-by: Avi Kivity <avi@qumranet.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index fbab07af6574..2d204fd45972 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -116,7 +116,7 @@ static void vmcs_clear(struct vmcs *vmcs)
 static void __vcpu_clear(void *arg)
 {
 	struct kvm_vcpu *vcpu = arg;
-	int cpu = smp_processor_id();
+	int cpu = raw_smp_processor_id();
 
 	if (vcpu->cpu == cpu)
 		vmcs_clear(vcpu->vmcs);
@@ -541,7 +541,7 @@ static struct vmcs *alloc_vmcs_cpu(int cpu)
 
 static struct vmcs *alloc_vmcs(void)
 {
-	return alloc_vmcs_cpu(smp_processor_id());
+	return alloc_vmcs_cpu(raw_smp_processor_id());
 }
 
 static void free_vmcs(struct vmcs *vmcs)

commit 965b58a550b6f84815cb555e6abb953e863f1610
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jan 5 16:36:23 2007 -0800

    [PATCH] KVM: Fix GFP_KERNEL alloc in atomic section bug
    
    KVM does kmalloc() in an atomic section while having preemption disabled via
    vcpu_load().  Fix this by moving the ->*_msr setup from the vcpu_setup method
    to the vcpu_create method.
    
    (This is also a small speedup for setting up a vcpu, which can in theory be
    more frequent than the vcpu_create method).
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Avi Kivity <avi@qumranet.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index d0a2c2d5342a..fbab07af6574 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1094,14 +1094,6 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
 	rdmsrl(MSR_IA32_SYSENTER_EIP, a);
 	vmcs_writel(HOST_IA32_SYSENTER_EIP, a);   /* 22.2.3 */
 
-	ret = -ENOMEM;
-	vcpu->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!vcpu->guest_msrs)
-		goto out;
-	vcpu->host_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!vcpu->host_msrs)
-		goto out_free_guest_msrs;
-
 	for (i = 0; i < NR_VMX_MSR; ++i) {
 		u32 index = vmx_msr_index[i];
 		u32 data_low, data_high;
@@ -1155,8 +1147,6 @@ static int vmx_vcpu_setup(struct kvm_vcpu *vcpu)
 
 	return 0;
 
-out_free_guest_msrs:
-	kfree(vcpu->guest_msrs);
 out:
 	return ret;
 }
@@ -1906,13 +1896,33 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vmcs *vmcs;
 
+	vcpu->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!vcpu->guest_msrs)
+		return -ENOMEM;
+
+	vcpu->host_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!vcpu->host_msrs)
+		goto out_free_guest_msrs;
+
 	vmcs = alloc_vmcs();
 	if (!vmcs)
-		return -ENOMEM;
+		goto out_free_msrs;
+
 	vmcs_clear(vmcs);
 	vcpu->vmcs = vmcs;
 	vcpu->launched = 0;
+
 	return 0;
+
+out_free_msrs:
+	kfree(vcpu->host_msrs);
+	vcpu->host_msrs = NULL;
+
+out_free_guest_msrs:
+	kfree(vcpu->guest_msrs);
+	vcpu->guest_msrs = NULL;
+
+	return -ENOMEM;
 }
 
 static struct kvm_arch_ops vmx_arch_ops = {
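
The staged allocation with goto-based unwinding used in vmx_create_vcpu()
above is a general C idiom.  Here is a small user-space sketch of it, with
assumptions called out: malloc()/free() replace kmalloc()/kfree(), the
struct and function names are hypothetical, and budget_alloc() is a test
allocator added only so that the failure paths can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

struct vcpu_bufs {
	void *guest_msrs;
	void *host_msrs;
	void *vmcs;
};

/* Test allocator: succeeds alloc_budget more times, then fails. */
int alloc_budget = 3;

void *budget_alloc(size_t size)
{
	if (alloc_budget <= 0)
		return NULL;
	alloc_budget--;
	return malloc(size);
}

/* Allocate all three buffers, or none of them. */
int vcpu_bufs_create(struct vcpu_bufs *v, size_t size)
{
	v->guest_msrs = budget_alloc(size);
	if (!v->guest_msrs)
		return -ENOMEM;

	v->host_msrs = budget_alloc(size);
	if (!v->host_msrs)
		goto out_free_guest_msrs;

	v->vmcs = budget_alloc(size);
	if (!v->vmcs)
		goto out_free_msrs;

	return 0;

	/* Unwind in reverse order of allocation. */
out_free_msrs:
	free(v->host_msrs);
	v->host_msrs = NULL;
out_free_guest_msrs:
	free(v->guest_msrs);
	v->guest_msrs = NULL;
	return -ENOMEM;
}

void vcpu_bufs_destroy(struct vcpu_bufs *v)
{
	free(v->vmcs);
	free(v->host_msrs);
	free(v->guest_msrs);
	v->vmcs = v->host_msrs = v->guest_msrs = NULL;
}
```

Resetting the pointers to NULL on the error path, as the patch does, keeps
a later teardown from freeing the same buffer twice.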