commit 05580ec65bc245f55bb19ca11ccf8bd4aa1a8bc1 Author: Ben Hutchings Date: Sun Jan 7 01:46:55 2018 +0000 Linux 3.2.98 commit b0b7893b1cde5b0121a7d9a619426a93f3026e27 Author: Kees Cook Date: Wed Jan 3 10:18:01 2018 -0800 KPTI: Report when enabled Make sure dmesg reports when KPTI is enabled. Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings commit 108ec7a7d73e90db280f14581ccddc6f9c928bff Author: Kees Cook Date: Thu Jan 4 01:14:24 2018 +0000 KPTI: Rename to PAGE_TABLE_ISOLATION This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION. Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.2] Signed-off-by: Ben Hutchings commit 99273214e17a9772fe7929b7a71fb4a07b22b666 Author: Borislav Petkov Date: Mon Dec 25 13:57:16 2017 +0100 x86/kaiser: Move feature detection up ... before the first use of kaiser_enabled as otherwise funky things happen: about to get started... (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000] (XEN) Pagetable walk from ffff88022a449090: (XEN) L4[0x110] = 0000000229e0e067 0000000000001e0e (XEN) L3[0x008] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08 entry.o#create_bounce_frame+0x135/0x14d (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.9.1_02-3.21 x86_64 debug=n Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e033:[] (XEN) RFLAGS: 0000000000000286 EM: 1 CONTEXT: pv guest (d0v0) Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings commit ab77eb13c3020f3e2c738dee83d1fc91919072e3 Author: Jiri Kosina Date: Tue Jan 2 14:19:49 2018 +0100 kaiser: disabled on Xen PV Kaiser cannot be used on paravirtualized MMUs (namely reading and writing CR3). This does not work with KAISER as the CR3 switch from and to user space PGD would require to map the whole XEN_PV machinery into both. 
More importantly, enabling KAISER on Xen PV doesn't make too much sense, as PV guests use distinct %cr3 values for kernel and user already. Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.2: use xen_pv_domain()] Signed-off-by: Ben Hutchings commit bd99918270ac4c710ef2aede774a5647ad7e50e8 Author: Borislav Petkov Date: Tue Jan 2 14:19:49 2018 +0100 x86/kaiser: Reenable PARAVIRT Now that the required bits have been addressed, reenable PARAVIRT. Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings commit fb8063468de58fbd9502928ace3231908168fb42 Author: Thomas Gleixner Date: Mon Dec 4 15:07:30 2017 +0100 x86/paravirt: Dont patch flush_tlb_single commit a035795499ca1c2bd1928808d1a156eda1420383 upstream. native_flush_tlb_single() will be changed with the upcoming PAGE_TABLE_ISOLATION feature. This requires to have more code in there than INVLPG. Remove the paravirt patching for it. Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Reviewed-by: Juergen Gross Acked-by: Peter Zijlstra Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Linus Torvalds Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Cc: michael.schwarz@iaik.tugraz.at Cc: moritz.lipp@iaik.tugraz.at Cc: richard.fellner@student.tugraz.at Link: https://lkml.kernel.org/r/20171204150606.828111617@linutronix.de Signed-off-by: Ingo Molnar [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings commit bba226006469ef51f310f21b4ef0c6ec833bb98b Author: Hugh Dickins Date: Sat Nov 4 18:43:06 2017 -0700 kaiser: kaiser_flush_tlb_on_return_to_user() check PCID Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID check, instead of each caller doing it inline first: nobody needs to optimize for the noPCID case, it's clearer this way, and better suits later changes. Replace those no-op X86_CR3_PCID_KERN_FLUSH lines by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes. (cherry picked from Change-Id: I9b528ed9d7c1ae4a3b4738c2894ee1740b6fb0b9) Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 87cd625be9e9568f362aaf12756f84b180836f4c Author: Hugh Dickins Date: Sat Nov 4 18:23:24 2017 -0700 kaiser: asm/tlbflush.h handle noPGE at lower level I found asm/tlbflush.h too twisty, and think it safer not to avoid __native_flush_tlb_global_irq_disabled() in the kaiser_enabled case, but instead let it handle kaiser_enabled along with cr3: it can just use __native_flush_tlb() for that, no harm in re-disabling preemption. (This is not the same change as Kirill and Dave have suggested for upstream, flipping PGE in cr4: that's neat, but needs a cpu_has_pge check; cr3 is enough for kaiser, and thought to be cheaper than cr4.) Also delete the X86_FEATURE_INVPCID invpcid_flush_all_nonglobals() preference from __native_flush_tlb(): unlike the invpcid_flush_all() preference in __native_flush_tlb_global(), it's not seen in upstream 4.14, and was recently reported to be surprisingly slow. 
(cherry picked from Change-Id: I0da819a797ff46bca6590040b6480178dff6ba1e) Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit d5b72db20d3c252172f5bb4ea862a1a47e53c488 Author: Hugh Dickins Date: Tue Oct 3 20:49:04 2017 -0700 kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush Now that we're playing the ALTERNATIVE game, use that more efficient method: instead of user-mapping an extra page, and reading an extra cacheline each time for x86_cr3_pcid_noflush. Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax) is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs; but the one line with $63 in looks clearer, so let's stick with that. Worried about what happens with an ALTERNATIVE between the jump and jump label in another ALTERNATIVE? I was, but have checked the combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64, and it does a good job. (cherry picked from Change-Id: I46d06167615aa8d628eed9972125ab2faca93f05) Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 95fdffcc2c581654627e79553b45180da58793a7 Author: Borislav Petkov Date: Tue Jan 2 14:19:48 2018 +0100 x86/kaiser: Check boottime cmdline params AMD (and possibly other vendors) are not affected by the leak KAISER is protecting against. Keep the "nopti" for traditional reasons and add pti= like upstream. Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings commit e424b40fd8e8905af4362b053cb9c976c2e69385 Author: Borislav Petkov Date: Tue Jan 2 14:19:48 2018 +0100 x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling Concentrate it in arch/x86/mm/kaiser.c and use the upstream string "nopti". 
Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ben Hutchings commit 7adbf80a06a4e3162a7a09a957db289802b7c7a2 Author: Tom Lendacky Date: Mon Jul 17 16:10:33 2017 -0500 x86/boot: Add early cmdline parsing for options with arguments commit e505371dd83963caae1a37ead9524e8d997341be upstream. Add a cmdline_find_option() function to look for cmdline options that take arguments. The argument is returned in a supplied buffer and the argument length (regardless of whether it fits in the supplied buffer) is returned, with -1 indicating not found. Signed-off-by: Tom Lendacky Reviewed-by: Thomas Gleixner Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Brijesh Singh Cc: Dave Young Cc: Dmitry Vyukov Cc: Jonathan Corbet Cc: Konrad Rzeszutek Wilk Cc: Larry Woodman Cc: Linus Torvalds Cc: Matt Fleming Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Rik van Riel Cc: Toshimitsu Kani Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings commit fc3f34609512946f0c640cc3d66b9dadd2ecfb5c Author: Dave Hansen Date: Tue Dec 22 14:52:43 2015 -0800 x86/boot: Pass in size to early cmdline parsing commit 8c0517759a1a100a8b83134cf3c7f254774aaeba upstream. We will use this in a few patches to implement tests for early parsing. Signed-off-by: Dave Hansen [ Aligned args properly. ] Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. 
Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: fenghua.yu@intel.com Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20151222225243.5CC47EB6@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings commit a606461eb3e0ad6de87f20b03cd9df26b6f997ee Author: Dave Hansen Date: Tue Dec 22 14:52:41 2015 -0800 x86/boot: Simplify early command line parsing commit 4de07ea481361b08fe13735004dafae862482d38 upstream. __cmdline_find_option_bool() tries to account for both NULL-terminated and non-NULL-terminated strings. It keeps 'pos' to look for the end of the buffer and also looks for '!c' in a bunch of places to look for NULL termination. But, it also calls strlen(). You can't call strlen on a non-NULL-terminated string. If !strlen(cmdline), then cmdline[0]=='\0'. In that case, we will go in to the while() loop, set c='\0', hit st_wordstart, notice !c, and will immediately return 0. So, remove the strlen(). It is unnecessary and unsafe. Signed-off-by: Dave Hansen Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: fenghua.yu@intel.com Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20151222225241.15365E43@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings commit 8b704b1f32a509d7254581a9109a4f7dda9669c1 Author: Dave Hansen Date: Tue Dec 22 14:52:39 2015 -0800 x86/boot: Fix early command-line parsing when partial word matches commit abcdc1c694fa4055323cbec1cde4c2cb6b68398c upstream. cmdline_find_option_bool() keeps track of position in two strings: 1. the command-line 2. the option we are searching for in the command-line We plow through each character in the command-line one at a time, always moving forward. We move forward in the option ('opptr') when we match characters in 'cmdline'. We reset the 'opptr' only when we go in to the 'st_wordstart' state.
But, if we fail to match an option because we see a space (state=st_wordcmp, *opptr='\0',c=' '), we set state='st_wordskip' and 'break', moving to the next character. But, that move to the next character is the one *after* the ' '. This means that we will miss a 'st_wordstart' state. For instance, if we have

    cmdline = "foo fool";

and are searching for "fool", we have:

            "fool"
    opptr = ----^

            "foo fool"
    c = --------^

We see that 'l' != ' ', set state=st_wordskip, break, and then move 'c', so:

            "foo fool"
    c = ---------^

and are still in state=st_wordskip. We will stay in wordskip until we have skipped "fool", thus missing the option we were looking for. This *only* happens when you have a partially-matching word followed by a matching one. To fix this, we always fall *into* the 'st_wordskip' state when we set it.

Signed-off-by: Dave Hansen Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: fenghua.yu@intel.com Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20151222225239.8E1DCA58@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings commit 8f14962f8692669ba6bd4812a7e21b7fe4acd22e Author: Dave Hansen Date: Tue Dec 22 14:52:38 2015 -0800 x86/boot: Fix early command-line parsing when matching at end commit 02afeaae9843733a39cd9b11053748b2d1dc5ae7 upstream. The x86 early command line parsing in cmdline_find_option_bool() is buggy. If it matches a specified 'option' all the way to the end of the command-line, it will consider it a match. For instance,

    cmdline = "foo";
    cmdline_find_option_bool(cmdline, "fool");

will return 1. This is particularly annoying since we have actual FPU options like "noxsave" and "noxsaves". So, command-line "foo bar noxsave" will match *BOTH* a "noxsave" and "noxsaves". (This turns out not to be an actual problem because "noxsave" implies "noxsaves", but it's still confusing.)
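As an illustration of the word-boundary behaviour that these two parsing fixes aim for, here is a minimal user-space sketch. It is not the kernel's cmdline_find_option_bool(): the names find_option_bool(), is_space(), check_examples() and MAX_CMDLINE are assumptions made for this example only, and the three states merely mirror the st_wordstart/st_wordcmp/st_wordskip states named in the commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer limit for this sketch (stands in for COMMAND_LINE_SIZE). */
#define MAX_CMDLINE 2048

/* Hypothetical helper: what counts as a word separator in this sketch. */
static int is_space(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Return the 1-based position of the first word in 'cmdline' that is
 * exactly 'option', or 0 if there is none.  At most 'max_size' bytes
 * of 'cmdline' are examined.
 */
int find_option_bool(const char *cmdline, size_t max_size, const char *option)
{
	enum { WORDSTART, WORDCMP, WORDSKIP } state = WORDSTART;
	const char *opptr = option;
	size_t pos = 0;
	int wstart = 0;

	while (pos < max_size) {
		char c = cmdline[pos++];

		switch (state) {
		case WORDSTART:
			if (!c)
				return 0;
			if (is_space(c))
				break;
			/* A new word begins: restart the comparison. */
			state = WORDCMP;
			opptr = option;
			wstart = (int)pos;
			/* fall through */
		case WORDCMP:
			if (!*opptr) {
				/* Whole option consumed: a match only at a word end. */
				if (!c || is_space(c))
					return wstart;
			} else if (!c) {
				/* Command line ended before the option did. */
				return 0;
			} else if (c == *opptr++) {
				break;
			}
			state = WORDSKIP;
			/* fall through, so a separator seen now is not lost */
		case WORDSKIP:
			if (!c)
				return 0;
			if (is_space(c))
				state = WORDSTART;
			break;
		}
	}
	return 0;	/* ran off the end of the buffer */
}

/* The cases discussed in the two commit messages. */
void check_examples(void)
{
	assert(find_option_bool("foo", MAX_CMDLINE, "fool") == 0);
	assert(find_option_bool("foo fool", MAX_CMDLINE, "fool") == 5);
	assert(find_option_bool("foo bar noxsave", MAX_CMDLINE, "noxsave") == 9);
	assert(find_option_bool("foo bar noxsave", MAX_CMDLINE, "noxsaves") == 0);
}
```

Falling through into the skip state is what lets the separator that ended a partial match also start the next word, and requiring a separator or terminator after the last option character is what rejects "foo" as a match for "fool".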
To fix this, we simplify the code and stop tracking 'len'. 'len' was trying to indicate either the NULL terminator *OR* the end of a non-NULL-terminated command line at 'COMMAND_LINE_SIZE'. But, each of the three states is *already* checking 'cmdline' for a NULL terminator. We _only_ need to check if we have overrun 'COMMAND_LINE_SIZE', and that we can do without keeping 'len' around. Also add some comments to clarify what is going on. Signed-off-by: Dave Hansen Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: fenghua.yu@intel.com Cc: yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/20151222225238.9AEB560C@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings commit cbc1d40044e0104bb4be32d6f9336bdc58a3d0ff Author: Borislav Petkov Date: Mon May 19 20:59:16 2014 +0200 x86, boot: Carve out early cmdline parsing function commit 1b1ded57a4f2f4420b4de7c395d1b841d8b3c41a upstream. Carve out early cmdline parsing function into .../lib/cmdline.c so it can be used by early code in the kernel proper as well. Adapted from arch/x86/boot/cmdline.c. Signed-off-by: Borislav Petkov Link: http://lkml.kernel.org/r/1400525957-11525-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings commit c1d23a9da91c884a7cb13a5c238ac279f3635c58 Author: Hugh Dickins Date: Sun Sep 24 16:59:49 2017 -0700 kaiser: add "nokaiser" boot option, using ALTERNATIVE Added "nokaiser" boot option: an early param like "noinvpcid". Most places now check int kaiser_enabled (#defined 0 when not CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S and entry_64_compat.S are using the ALTERNATIVE technique, which patches in the preferred instructions at runtime.
That technique is tied to x86 cpu features, so X86_FEATURE_KAISER fabricated ("" in its comment so "kaiser" not magicked into /proc/cpuinfo). Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that, but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when nokaiser like when !CONFIG_KAISER, but not setting either when kaiser - neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL won't get set in some obscure corner, or something add PGE into CR4. By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled, all page table setup which uses pte_pfn() masks it out of the ptes. It's slightly shameful that the same declaration versus definition of kaiser_enabled appears in not one, not two, but in three header files (asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h). I felt safer that way, than with #including any of those in any of the others; and did not feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes them all, so we shall hear about it if they get out of synch. Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER from kaiser.c; removed the unused native_get_normal_pgd(); removed the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some comments. But more interestingly, set CR4.PSE in secondary_startup_64: the manual is clear that it does not matter whether it's 0 or 1 when 4-level-pts are enabled, but I was distracted to find cr4 different on BSP and auxiliaries - BSP alone was adding PSE, in init_memory_mapping(). (cherry picked from Change-Id: I8e5bec716944444359cbd19f6729311eff943e9a) Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 2bf370c889e55dca0f673dd1ee688b242f2eb881 Author: Borislav Petkov Date: Sat Jan 10 20:34:07 2015 +0100 x86/alternatives: Use optimized NOPs for padding commit 4fd4b6e5537cec5b56db0b22546dd439ebb26830 upstream. Alternatives allow now for an empty old instruction. In this case we go and pad the space with NOPs at assembly time. 
However, there are the optimal, longer NOPs which should be used. Do that at patching time by adding alt_instr.padlen-sized NOPs at the old instruction address. Cc: Andy Lutomirski Signed-off-by: Borislav Petkov Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 0b01b7c297106bcbd52bb4e65109f41a21ac9bef Author: Borislav Petkov Date: Mon Jan 5 13:48:41 2015 +0100 x86/alternatives: Make JMPs more robust commit 48c7a2509f9e237d8465399d9cdfe487d3212a23 upstream. Up until now we had to pay attention to relative JMPs in alternatives about how their relative offset gets computed so that the jump target is still correct. Or, as it is the case for near CALLs (opcode e8), we still have to go and readjust the offset at patching time. What is more, the static_cpu_has_safe() facility had to forcefully generate 5-byte JMPs since we couldn't rely on the compiler to generate properly sized ones so we had to force the longest ones. Worse than that, sometimes it would generate a replacement JMP which is longer than the original one, thus overwriting the beginning of the next instruction at patching time. So, in order to alleviate all that and make using JMPs more straight-forward we go and pad the original instruction in an alternative block with NOPs at build time, should the replacement(s) be longer. This way, alternatives users shouldn't pay special attention so that original and replacement instruction sizes are fine but the assembler would simply add padding where needed and not do anything otherwise. As a second aspect, we go and recompute JMPs at patching time so that we can try to make 5-byte JMPs into two-byte ones if possible. If not, we still have to recompute the offsets as the replacement JMP gets put far away in the .altinstr_replacement section leading to a wrong offset if copied verbatim. 
For example, on a locally generated kernel image

    old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
    __switch_to:
     ffffffff810014bd: eb 21 jmp ffffffff810014e0
    repl insn: size: 5
    ffffffff81d0b23c: e9 b1 62 2f ff jmpq ffffffff810014f2

gets corrected to a 2-byte JMP:

    apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
    alt_insn: e9 b1 62 2f ff
    recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
    converted to: eb 33 90 90 90

and a 5-byte JMP:

    old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
    __switch_to:
     ffffffff81001516: eb 30 jmp ffffffff81001548
    repl insn: size: 5
    ffffffff81d0b241: e9 10 63 2f ff jmpq ffffffff81001556

gets shortened into a two-byte one:

    apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
    alt_insn: e9 10 63 2f ff
    recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
    converted to: eb 3e 90 90 90

... and so on. This leads to a net win of around 40ish replacements * 3 bytes savings =~ 120 bytes of I$ on an AMD guest which means some savings of precious instruction cache bandwidth. The padding to the shorter 2-byte JMPs are single-byte NOPs which on smart microarchitectures means discarding NOPs at decode time and thus freeing up execution bandwidth. Signed-off-by: Borislav Petkov Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit bb583273750706226c144831be9d961e0e867d57 Author: Borislav Petkov Date: Sat Dec 27 10:41:52 2014 +0100 x86/alternatives: Add instruction padding commit 4332195c5615bf748624094ce4ff6797e475024d upstream. Up until now we have always paid attention to make sure the length of the new instruction replacing the old one is at least less or equal to the length of the old instruction.
If the new instruction is longer, at the time it replaces the old instruction it will overwrite the beginning of the next instruction in the kernel image and cause your pants to catch fire. So instead of having to pay attention, teach the alternatives framework to pad shorter old instructions with NOPs at buildtime - but only in the case when len(old instruction(s)) < len(new instruction(s)) and add nothing in the >= case. (In that case we do add_nops() when patching). This way the alternatives user shouldn't have to care about instruction sizes and simply use the macros. Add asm ALTERNATIVE* flavor macros too, while at it. Also, we need to save the pad length in a separate struct alt_instr member for NOP optimization and the way to do that reliably is to carry the pad length instead of trying to detect whether we're looking at single-byte NOPs or at pathological instruction offsets like e9 90 90 90 90, for example, which is a valid instruction. Thanks to Michael Matz for the great help with toolchain questions. Signed-off-by: Borislav Petkov Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit e72cf2122f5e4f5e9c7afcc816fba4c34938cdf5 Author: Borislav Petkov Date: Tue Dec 30 20:27:09 2014 +0100 x86/alternatives: Cleanup DPRINTK macro commit db477a3386dee183130916d6bbf21f5828b0b2e2 upstream. Make it pass __func__ implicitly. Also, dump info about each replacing we're doing. Fixup comments and style while at it. Signed-off-by: Borislav Petkov Signed-off-by: Hugh Dickins [bwh: Update one more use of DPRINTK() that was removed upstream] Signed-off-by: Ben Hutchings commit c03f7fe82cc62ebeb81851247897b36d5bbba61d Author: Hugh Dickins Date: Sun Dec 17 19:53:01 2017 -0800 kaiser: alloc_ldt_struct() use get_zeroed_page() Change the 3.2.96 and 3.18.72 alloc_ldt_struct() to allocate its entries with get_zeroed_page(), as 4.3 onwards does since f454b4788613 ("x86/ldt: Fix small LDT allocation for Xen"). 
This then matches the free_page() I had misported in __free_ldt_struct(), and fixes the "BUG: Bad page state in process ldt_gdt_32 ... flags: 0x80(slab)" reported by Kees Cook and Jiri Kosina, and analysed by Jiri. Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 026a90cb90fc41b70c275bd240bbe54dfb46497d Author: Hugh Dickins Date: Sun Dec 17 19:29:01 2017 -0800 kaiser: user_map __kprobes_text too In 3.2 (and earlier, and up to 3.15) Kaiser needs to user_map the __kprobes_text as well as the __entry_text: entry_64.S places some vital functions there, so without this you very soon triple-fault. Many thanks to Jiri Kosina for pointing me in this direction. Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 0cee3c94208ae76fc10459cf9398c63dc6956dad Author: Andrea Arcangeli Date: Tue Dec 5 21:15:07 2017 +0100 x86/mm/kaiser: re-enable vsyscalls To avoid breaking the kernel ABI. Signed-off-by: Andrea Arcangeli [Hugh Dickins: Backported to 3.2: - Leave out the PVCLOCK_FIXMAP user mapping, which does not apply to this tree - For safety added vsyscall_pgprot, and a BUG_ON if _PAGE_USER outside of FIXMAP.] Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit a4f588df14fb393b1c8f37c997dbab95afc2eb54 Author: Hugh Dickins Date: Mon Dec 11 17:59:50 2017 -0800 KAISER: Kernel Address Isolation This patch introduces our implementation of KAISER (Kernel Address Isolation to have Side-channels Efficiently Removed), a kernel isolation technique to close hardware side channels on kernel address information. More information about the original patch can be found at: https://github.com/IAIK/KAISER http://marc.info/?l=linux-kernel&m=149390087310405&w=2 Daniel Gruss Richard Fellner Michael Schwarz That original was then developed further by Dave Hansen Hugh Dickins then others after this snapshot. 
This combined patch for 3.2.96 was derived from hughd's patches below for 3.18.72, in 2017-12-04's kaiser-3.18.72.tar; except for the last, which was sent in 2017-12-09's nokaiser-3.18.72.tar. They have been combined in order to minimize the effort of rebasing: most of the patches in the 3.18.72 series were small fixes and cleanups and enhancements to three large patches. About the only new work in this backport is a simple reimplementation of kaiser_remove_mapping(): since mm/pageattr.c changed a lot between 3.2 and 3.18, and the mods there for Kaiser never seemed necessary.

    KAISER: Kernel Address Isolation
    kaiser: merged update
    kaiser: do not set _PAGE_NX on pgd_none
    kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
    kaiser: fix build and FIXME in alloc_ldt_struct()
    kaiser: KAISER depends on SMP
    kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
    kaiser: fix perf crashes
    kaiser: ENOMEM if kaiser_pagetable_walk() NULL
    kaiser: tidied up asm/kaiser.h somewhat
    kaiser: tidied up kaiser_add/remove_mapping slightly
    kaiser: kaiser_remove_mapping() move along the pgd
    kaiser: align addition to x86/mm/Makefile
    kaiser: cleanups while trying for gold link
    kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
    kaiser: delete KAISER_REAL_SWITCH option
    kaiser: vmstat show NR_KAISERTABLE as nr_overhead
    kaiser: enhanced by kernel and user PCIDs
    kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
    kaiser: PCID 0 for kernel and 128 for user
    kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
    kaiser: paranoid_entry pass cr3 need to paranoid_exit
    kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
    kaiser: fix unlikely error in alloc_ldt_struct()
    kaiser: drop is_atomic arg to kaiser_pagetable_walk()

Signed-off-by: Hugh Dickins
[bwh:
 - Fixed the #undef in arch/x86/boot/compressed/misc.h
 - Add missing #include in arch/x86/mm/kaiser.c]
Signed-off-by: Ben Hutchings commit add19752eb782379610a01ea0d8cfce83cf071b0 Author: Andy Lutomirski Date: Sun Oct 8 21:53:05 2017 -0700 x86/mm/64: Fix
reboot interaction with CR4.PCIDE commit 924c6b900cfdf376b07bccfd80e62b21914f8a5a upstream. Trying to reboot via real mode fails with PCID on: long mode cannot be exited while CR4.PCIDE is set. (No, I have no idea why, but the SDM and actual CPUs are in agreement here.) The result is a GPF and a hang instead of a reboot. I didn't catch this in testing because neither my computer nor my VM reboots this way. I can trigger it with reboot=bios, though. Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems") Reported-and-tested-by: Steven Rostedt (VMware) Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 92f128b6ac894a4ad9e0d24de1d9e6919494111c Author: Andy Lutomirski Date: Thu Jun 29 08:53:21 2017 -0700 x86/mm: Enable CR4.PCIDE on supported systems commit 660da7c9228f685b2ebe664f9fd69aaddcc420b5 upstream. We can use PCID if the CPU has PCID and PGE and we're not on Xen. By itself, this has no effect. A followup patch will start using PCID. 
Signed-off-by: Andy Lutomirski Reviewed-by: Nadav Amit Reviewed-by: Boris Ostrovsky Reviewed-by: Thomas Gleixner Cc: Andrew Morton Cc: Arjan van de Ven Cc: Borislav Petkov Cc: Dave Hansen Cc: Juergen Gross Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Rik van Riel Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar [Hugh Dickins: Backported to 3.2: - arch/x86/xen/enlighten_pv.c (not in this tree) - arch/x86/xen/enlighten.c (patched instead of that)] Signed-off-by: Hugh Dickins [Borislav Petkov: Fix bad backport to disable PCID on Xen] Signed-off-by: Ben Hutchings commit 902a5c3e2b4a68f070f1627d921d919a919f984a Author: Andy Lutomirski Date: Thu Jun 29 08:53:20 2017 -0700 x86/mm: Add the 'nopcid' boot option to turn off PCID commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream. The parameter is only present on x86_64 systems to save a few bytes, as PCID is always disabled on x86_32. Signed-off-by: Andy Lutomirski Reviewed-by: Nadav Amit Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Andrew Morton Cc: Arjan van de Ven Cc: Borislav Petkov Cc: Dave Hansen Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Rik van Riel Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar [Hugh Dickins: Backported to 3.2: - Documentation/admin-guide/kernel-parameters.txt (not in this tree) - Documentation/kernel-parameters.txt (patched instead of that)] Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 716812f5dd34fa75eb431c4584fe4fa8ad357d74 Author: Andy Lutomirski Date: Thu Jun 29 08:53:19 2017 -0700 x86/mm: Disable PCID on 32-bit kernels commit cba4671af7550e008f7a7835f06df0763825bf3e upstream. 32-bit kernels on new hardware will see PCID in CPUID, but PCID can only be used in 64-bit mode. 
Rather than making all PCID code conditional, just disable the feature on 32-bit builds. Signed-off-by: Andy Lutomirski Reviewed-by: Nadav Amit Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Andrew Morton Cc: Arjan van de Ven Cc: Borislav Petkov Cc: Dave Hansen Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Rik van Riel Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit ebcd6aa6fb360b3e8eff5e1668f16ecffe7a327c Author: Andy Lutomirski Date: Sun May 28 10:00:14 2017 -0700 x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code commit ce4a4e565f5264909a18c733b864c3f74467f69e upstream. The UP asm/tlbflush.h generates somewhat nicer code than the SMP version. Aside from that, it's fallen quite a bit behind the SMP code:

 - flush_tlb_mm_range() didn't flush individual pages if the range was small.

 - The lazy TLB code was much weaker. This usually wouldn't matter, but, if a kernel thread flushed its lazy "active_mm" more than once (due to reclaim or similar), it wouldn't be unlazied and would instead pointlessly flush repeatedly.

 - Tracepoints were missing.

Aside from that, simply having the UP code around was a maintenance burden, since it meant that any change to the TLB flush code had to make sure not to break it. Simplify everything by deleting the UP code.
Signed-off-by: Andy Lutomirski Cc: Andrew Morton Cc: Arjan van de Ven Cc: Borislav Petkov Cc: Dave Hansen Cc: Linus Torvalds Cc: Mel Gorman Cc: Michal Hocko Cc: Nadav Amit Cc: Nadav Amit Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar [Hugh Dickins: Backported to 3.2] Signed-off-by: Hugh Dickins [bwh: Fix allnoconfig build failure due to direct use of 'apic' in flush_tlb_others_ipi()] Signed-off-by: Ben Hutchings commit b92b3fa0a7f0af493428584e8fea490e13b7cd04 Author: Andy Lutomirski Date: Fri Jun 9 11:49:15 2017 -0700 sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off() commit 252d2a4117bc181b287eeddf848863788da733ae upstream. idle_task_exit() can be called with IRQs enabled on x86 and therefore should use switch_mm(), not switch_mm_irqs_off(). This doesn't seem to cause any problems right now, but it will confuse my upcoming TLB flush changes. Nonetheless, I think it should be backported because it's trivial. There won't be any meaningful performance impact because idle_task_exit() is only used when offlining a CPU. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: stable@vger.kernel.org Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Hugh Dickins Signed-off-by: Ben Hutchings commit 0b13037d5eb4860f85e9cf37d5c12fdb5d5c7b1d Author: Andy Lutomirski Date: Tue Apr 26 09:39:09 2016 -0700 x86/mm, sched/core: Turn off IRQs in switch_mm() commit 078194f8e9fe3cf54c8fd8bded48a1db5bd8eb8a upstream. Potential races between switch_mm() and TLB-flush or LDT-flush IPIs could be very messy. AFAICT the code is currently okay, whether by accident or by careful design, but enabling PCID will make it considerably more complicated and will no longer be obviously safe.
    Fix it with a big hammer: run switch_mm() with IRQs off.

    To avoid a performance hit in the scheduler, we take advantage of
    our knowledge that the scheduler already has IRQs disabled when it
    calls switch_mm().

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit 9ecb055a23a1b4bad7c3cbd82d6343e866f7aa14
Author: Andy Lutomirski
Date:   Tue Apr 26 09:39:08 2016 -0700

    x86/mm, sched/core: Uninline switch_mm()

    commit 69c0319aabba45bcf33178916a2f06967b4adede upstream.

    It's fairly large and it has quite a few callers. This may also
    help untangle some headers down the road.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    [Hugh Dickins: Backported to 3.2]
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit 338042fd974ff9413689747593a22e0ae1ec9415
Author: Andy Lutomirski
Date:   Tue Apr 26 09:39:07 2016 -0700

    x86/mm: Build arch/x86/mm/tlb.c even on !SMP

    commit e1074888c326038340a1ada9129d679e661f2ea6 upstream.

    Currently all of the functions that live in tlb.c are inlined on
    !SMP builds. One can debate whether this is a good idea (in many
    respects the code in tlb.c is better than the inlined UP code).

    Regardless, I want to add code that needs to be built on UP and SMP
    kernels and relates to tlb flushing, so arrange for tlb.c to be
    compiled unconditionally.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit 45f6717d45e509f4d1ebfbcd12ab52ae736b95be
Author: Andy Lutomirski
Date:   Tue Apr 26 09:39:06 2016 -0700

    sched/core: Add switch_mm_irqs_off() and use it in the scheduler

    commit f98db6013c557c216da5038d9c52045be55cd039 upstream.

    By default, this is the same thing as switch_mm().

    x86 will override it as an optimization.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit 7c125fbe719ca8a40b4766774e001f71f858fa35
Author: Ingo Molnar
Date:   Thu Apr 28 11:39:12 2016 +0200

    mm/mmu_context, sched/core: Fix mmu_context.h assumption

    commit 8efd755ac2fe262d4c8d5c9bbe054bb67dae93da upstream.

    Some architectures (such as Alpha) rely on include/linux/sched.h
    definitions in their mmu_context.h files.

    So include sched.h before mmu_context.h.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: linux-kernel@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit bb63857c7ee4568857c4d04982d6d9a4776ed09b
Author: Andy Lutomirski
Date:   Fri Jan 29 11:42:59 2016 -0800

    x86/mm: If INVPCID is available, use it to flush global mappings

    commit d8bced79af1db6734f66b42064cc773cada2ce99 upstream.

    On my Skylake laptop, INVPCID function 2 (flush absolutely
    everything) takes about 376ns, whereas saving flags, twiddling
    CR4.PGE to flush global mappings, and restoring flags takes about
    539ns.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit f59e58350ccabb76f2d82028acc6586f8ee81eff
Author: Andy Lutomirski
Date:   Fri Jan 29 11:42:58 2016 -0800

    x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID

    commit d12a72b844a49d4162f24cefdab30bed3f86730e upstream.

    This adds a chicken bit to turn off INVPCID in case something goes
    wrong. It's an early_param() because we do TLB flushes before we
    parse __setup() parameters.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit 6f498371e64cd1d0bc04ed66867ef04c80950e1a
Author: Borislav Petkov
Date:   Wed Feb 10 15:51:16 2016 +0100

    x86/mm: Fix INVPCID asm constraint

    commit e2c7698cd61f11d4077fdb28148b2d31b82ac848 upstream.

    So we want to specify the dependency on both @pcid and @addr so that
    the compiler doesn't reorder accesses to them *before* the TLB flush.
    But for that to work, we need to express this properly in the inline
    asm and deref the whole desc array, not the pointer to it. See
    clwb() for an example.

    This fixes the build error on 32-bit:

      arch/x86/include/asm/tlbflush.h: In function ‘__invpcid’:
      arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable

    which gcc4.7 caught but 5.x didn't. Which is strange. :-\

    Signed-off-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Michael Matz
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit 5a2615f13d1f643f53d66bf166bcaad42d3a28a3
Author: Andy Lutomirski
Date:   Fri Jan 29 11:42:57 2016 -0800

    x86/mm: Add INVPCID helpers

    commit 060a402a1ddb551455ee410de2eadd3349f2801b upstream.

    This adds helpers for each of the four currently-specified INVPCID
    modes.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Andrew Morton
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Ben Hutchings

commit a55c7b6ed442564e3793e20420cf7fdf96ae9ebf
Author: H. Peter Anvin
Date:   Tue Feb 21 17:25:50 2012 -0800

    x86, cpufeature: Add CPU features from Intel document 319433-012A

    commit 513c4ec6e4759aa33c90af0658b82eb4d2027871 upstream.

    Add CPU features from the Intel Architecture Instruction Set
    Extensions Programming Reference version 012A (Feb 2012), document
    number 319433-012A.

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ben Hutchings
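
Illustrative note on the INVPCID commits above ("x86/mm: Add INVPCID helpers" and "x86/mm: Fix INVPCID asm constraint"). The log does not include the patches themselves, so the following is only a userspace sketch of what "one helper per mode" and "deref the whole desc array" could look like. It assumes the Intel SDM layout of the INVPCID operand (a 128-bit in-memory descriptor, with the invalidation type passed in a register). INVPCID is privileged, so the sketch records the request instead of executing the instruction. Every `model_*` name and `struct invpcid_req` is hypothetical and not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* The four INVPCID invalidation types, numbered as in the Intel SDM. */
enum invpcid_type {
    INVPCID_TYPE_INDIV_ADDR      = 0, /* one address within one PCID */
    INVPCID_TYPE_SINGLE_CTXT     = 1, /* one PCID, non-global entries */
    INVPCID_TYPE_ALL_INCL_GLOBAL = 2, /* everything, including globals */
    INVPCID_TYPE_ALL_NON_GLOBAL  = 3, /* all PCIDs, non-global entries */
};

/*
 * 128-bit descriptor read by the instruction: the PCID occupies the low
 * bits of d[0], the linear address occupies d[1].  Passing the whole
 * struct (not a pointer to it) as a memory operand is what tells the
 * compiler that both fields are read by the asm statement.
 */
struct invpcid_desc {
    uint64_t d[2];
};

static_assert(sizeof(struct invpcid_desc) == 16,
              "INVPCID descriptor must be 128 bits");

/* Hypothetical record of one request; stands in for the privileged asm. */
struct invpcid_req {
    struct invpcid_desc desc;
    enum invpcid_type type;
};

/* Common worker: build the descriptor and pair it with the type. */
static struct invpcid_req model_invpcid(uint64_t pcid, uint64_t addr,
                                        enum invpcid_type type)
{
    struct invpcid_req req = { .desc = { { pcid, addr } }, .type = type };
    return req;
}

/* One thin wrapper per mode; unused descriptor fields are zero. */
static struct invpcid_req model_flush_one(uint64_t pcid, uint64_t addr)
{
    return model_invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR);
}

static struct invpcid_req model_flush_single_context(uint64_t pcid)
{
    return model_invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT);
}

static struct invpcid_req model_flush_all(void)
{
    return model_invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL);
}

static struct invpcid_req model_flush_all_nonglobals(void)
{
    return model_invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
}
```

In this model, `model_flush_all()` corresponds to the "INVPCID function 2 (flush absolutely everything)" mentioned in the global-mappings commit. A real kernel helper would replace the returned record with an inline asm statement taking the descriptor as an "m" input and the type in a register.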