8212828: (process) Change the Process launch mechanism default on Linux to be posix_spawn
authorstuefe
Fri, 08 Feb 2019 08:49:32 +0100
changeset 53722 f037d1a2e899
parent 53721 f35a8aaabcb9
child 53723 f31dad87f661
8212828: (process) Change the Process launch mechanism default on Linux to be posix_spawn Reviewed-by: rriggs, martin
src/java.base/unix/classes/java/lang/ProcessImpl.java
src/java.base/unix/native/libjava/ProcessImpl_md.c
--- a/src/java.base/unix/classes/java/lang/ProcessImpl.java	Mon Feb 11 13:23:20 2019 -0800
+++ b/src/java.base/unix/classes/java/lang/ProcessImpl.java	Fri Feb 08 08:49:32 2019 +0100
@@ -89,7 +89,7 @@
 
     private static enum Platform {
 
-        LINUX(LaunchMechanism.VFORK, LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
+        LINUX(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.VFORK, LaunchMechanism.FORK),
 
         BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
 
@@ -106,27 +106,6 @@
                 EnumSet.copyOf(Arrays.asList(launchMechanisms));
         }
 
-        @SuppressWarnings("fallthrough")
-        private String helperPath(String javahome, String osArch) {
-            switch (this) {
-                case SOLARIS:
-                    // fall through...
-                case LINUX:
-                case AIX:
-                case BSD:
-                    return javahome + "/lib/jspawnhelper";
-
-                default:
-                    throw new AssertionError("Unsupported platform: " + this);
-            }
-        }
-
-        String helperPath() {
-            Properties props = GetPropertyAction.privilegedGetProperties();
-            return helperPath(StaticProperty.javaHome(),
-                              props.getProperty("os.arch"));
-        }
-
         LaunchMechanism launchMechanism() {
             return AccessController.doPrivileged(
                 (PrivilegedAction<LaunchMechanism>) () -> {
@@ -169,7 +148,7 @@
 
     private static final Platform platform = Platform.get();
     private static final LaunchMechanism launchMechanism = platform.launchMechanism();
-    private static final byte[] helperpath = toCString(platform.helperPath());
+    private static final byte[] helperpath = toCString(StaticProperty.javaHome() + "/lib/jspawnhelper");
 
     private static byte[] toCString(String s) {
         if (s == null)
--- a/src/java.base/unix/native/libjava/ProcessImpl_md.c	Mon Feb 11 13:23:20 2019 -0800
+++ b/src/java.base/unix/native/libjava/ProcessImpl_md.c	Fri Feb 08 08:49:32 2019 +0100
@@ -49,56 +49,139 @@
 #include "childproc.h"
 
 /*
- * There are 4 possible strategies we might use to "fork":
+ *
+ * When starting a child on Unix, we need to do three things:
+ * - fork off
+ * - in the child process, do some pre-exec work: duping/closing file
+ *   descriptors to set up stdio-redirection, setting environment variables,
+ *   changing paths...
+ * - then exec(2) the target binary
+ *
+ * There are three ways to fork off:
+ *
+ * A) fork(2). Portable and safe (no side effects) but may fail with ENOMEM on
+ *    all Unices when invoked from a VM with a high memory footprint. On Unices
+ *    with strict no-overcommit policy this problem is most visible.
  *
- * - fork(2).  Very portable and reliable but subject to
- *   failure due to overcommit (see the documentation on
- *   /proc/sys/vm/overcommit_memory in Linux proc(5)).
- *   This is the ancient problem of spurious failure whenever a large
- *   process starts a small subprocess.
+ *    This is because forking the VM will first create a child process with
+ *    theoretically the same memory footprint as the parent - even if you plan
+ *    to follow up with exec'ing a tiny binary. In reality techniques like
+ *    copy-on-write etc mitigate the problem somewhat but we still run the risk
+ *    of hitting system limits.
+ *
+ *    For a Linux centric description of this problem, see the documentation on
+ *    /proc/sys/vm/overcommit_memory in Linux proc(5).
+ *
+ * B) vfork(2): Portable and fast but very unsafe. It bypasses the memory
+ *    problems related to fork(2) by starting the child in the memory image of
+ *    the parent. Things that can go wrong include:
+ *    - Programming errors in the child process before the exec(2) call may
+ *      trash memory in the parent process, most commonly the stack of the
+ *      thread invoking vfork.
+ *    - Signals received by the child before the exec(2) call may be at best
+ *      misdirected to the parent, at worst immediately kill child and parent.
  *
- * - vfork().  Using this is scary because all relevant man pages
- *   contain dire warnings, e.g. Linux vfork(2).  But at least it's
- *   documented in the glibc docs and is standardized by XPG4.
- *   http://www.opengroup.org/onlinepubs/000095399/functions/vfork.html
- *   On Linux, one might think that vfork() would be implemented using
- *   the clone system call with flag CLONE_VFORK, but in fact vfork is
- *   a separate system call (which is a good sign, suggesting that
- *   vfork will continue to be supported at least on Linux).
- *   Another good sign is that glibc implements posix_spawn using
- *   vfork whenever possible.  Note that we cannot use posix_spawn
- *   ourselves because there's no reliable way to close all inherited
- *   file descriptors.
+ *    This is mitigated by very strict rules about what one is allowed to do in
+ *    the child process between vfork(2) and exec(2), which is basically nothing.
+ *    However, we always broke this rule by doing the pre-exec work between
+ *    vfork(2) and exec(2).
+ *
+ *    Also note that vfork(2) has been deprecated by the OpenGroup, presumably
+ *    because of its many dangers.
+ *
+ * C) clone(2): This is a Linux specific call which gives the caller fine
+ *    grained control about how exactly the process fork is executed. It is
+ *    powerful, but Linux-specific.
+ *
+ * Aside from these three possibilities there is a forth option:  posix_spawn(3).
+ * Where fork/vfork/clone all fork off the process and leave pre-exec work and
+ * calling exec(2) to the user, posix_spawn(3) offers the user fork+exec-like
+ * functionality in one package, similar to CreateProcess() on Windows.
+ *
+ * It is not a system call in itself, but usually a wrapper implemented within
+ * the libc in terms of one of (fork|vfork|clone)+exec - so whether or not it
+ * has advantages over calling the naked (fork|vfork|clone) functions depends
+ * on how posix_spawn(3) is implemented.
+ *
+ * Note that when using posix_spawn(3), we exec twice: first a tiny binary called
+ * the jspawnhelper, then in the jspawnhelper we do the pre-exec work and exec a
+ * second time, this time the target binary (similar to the "exec-twice-technique"
+ * described in http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055333.html).
+ *
+ * This is a JDK-specific implementation detail which just happens to be
+ * implemented for jdk.lang.Process.launchMechanism=POSIX_SPAWN.
+ *
+ * --- Linux-specific ---
  *
- * - clone() with flags CLONE_VM but not CLONE_THREAD.  clone() is
- *   Linux-specific, but this ought to work - at least the glibc
- *   sources contain code to handle different combinations of CLONE_VM
- *   and CLONE_THREAD.  However, when this was implemented, it
- *   appeared to fail on 32-bit i386 (but not 64-bit x86_64) Linux with
- *   the simple program
- *     Runtime.getRuntime().exec("/bin/true").waitFor();
- *   with:
- *     #  Internal Error (os_linux_x86.cpp:683), pid=19940, tid=2934639536
- *     #  Error: pthread_getattr_np failed with errno = 3 (ESRCH)
- *   We believe this is a glibc bug, reported here:
- *     http://sources.redhat.com/bugzilla/show_bug.cgi?id=10311
- *   but the glibc maintainers closed it as WONTFIX.
+ * How does glibc implement posix_spawn?
+ * (see: sysdeps/posix/spawni.c for glibc < 2.24,
+ *       sysdeps/unix/sysv/linux/spawni.c for glibc >= 2.24):
+ *
+ * 1) Before glibc 2.4 (released 2006), posix_spawn(3) used just fork(2)/exec(2).
+ *    This would be bad for the JDK since we would risk the known memory issues with
+ *    fork(2). But since this only affects glibc variants which have long been
+ *    phased out by modern distributions, this is irrelevant.
+ *
+ * 2) Between glibc 2.4 and glibc 2.23, posix_spawn uses either fork(2) or
+ *    vfork(2) depending on how exactly the user called posix_spawn(3):
+ *
+ * <quote>
+ *       The child process is created using vfork(2) instead of fork(2) when
+ *       either of the following is true:
+ *
+ *       * the spawn-flags element of the attributes object pointed to by
+ *          attrp contains the GNU-specific flag POSIX_SPAWN_USEVFORK; or
+ *
+ *       * file_actions is NULL and the spawn-flags element of the attributes
+ *          object pointed to by attrp does not contain
+ *          POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF,
+ *          POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER,
+ *          POSIX_SPAWN_SETPGROUP, or POSIX_SPAWN_RESETIDS.
+ * </quote>
+ *
+ * Due to the way the JDK calls posix_spawn(3), it would therefore call vfork(2).
+ * So we would avoid the fork(2) memory problems. However, there still remains the
+ * risk associated with vfork(2). But it is smaller than were we to call vfork(2)
+ * directly since we use the jspawnhelper, moving all pre-exec work off to after
+ * the first exec, thereby reducing the vulnerable time window.
+ *
+ * 3) Since glibc >= 2.24, glibc uses clone+exec:
  *
- * - posix_spawn(). While posix_spawn() is a fairly elaborate and
- *   complicated system call, it can't quite do everything that the old
- *   fork()/exec() combination can do, so the only feasible way to do
- *   this, is to use posix_spawn to launch a new helper executable
- *   "jprochelper", which in turn execs the target (after cleaning
- *   up file-descriptors etc.) The end result is the same as before,
- *   a child process linked to the parent in the same way, but it
- *   avoids the problem of duplicating the parent (VM) process
- *   address space temporarily, before launching the target command.
+ *    new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
+ *                     CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
+ *
+ * This is even better than (2):
+ *
+ * CLONE_VM means we run in the parent's memory image, as with (2)
+ * CLONE_VFORK means parent waits until we exec, as with (2)
+ *
+ * However, error possibilities are further reduced since:
+ * - posix_spawn(3) passes a separate stack for the child to run on, eliminating
+ *   the danger of trashing the forking thread's stack in the parent process.
+ * - posix_spawn(3) takes care to temporarily block all incoming signals to the
+ *   child process until the first exec(2) has been called,
  *
- * Based on the above analysis, we are currently using vfork() on
- * Linux and posix_spawn() on other Unix systems.
+ * TL;DR
+ * Calling posix_spawn(3) for glibc
+ * (2) < 2.24 is not perfect but still better than using plain vfork(2), since
+ *     the chance of an error happening is greatly reduced
+ * (3) >= 2.24 is the best option - portable, fast and as safe as possible.
+ *
+ * ---
+ *
+ * How does muslc implement posix_spawn?
+ *
+ * They always did use the clone (.. CLONE_VM | CLONE_VFORK ...)
+ * technique. So we are safe to use posix_spawn() here regardless of muslc
+ * version.
+ *
+ * </Linux-specific>
+ *
+ *
+ * Based on the above analysis, we are currently defaulting to posix_spawn()
+ * on all Unices including Linux.
  */
 
-
 static void
 setSIGCHLDHandler(JNIEnv *env)
 {