--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/nashorn/docs/DEVELOPER_README Fri Dec 21 16:36:24 2012 -0400
@@ -0,0 +1,445 @@
+This document describes system properties that are used for internal
+debugging and instrumentation purposes, along with the system loggers,
+which are used for the same thing.
+
+This document is intended as a developer resource, and it is not
+needed as Nashorn documentation for normal usage. Flags and system
+properties described herein are subject to change without notice.
+
+=====================================
+1. System properties used internally
+=====================================
+
+This documentation of the system property flags assume that the
+default value of the flag is false, unless otherwise specified.
+
+
+SYSTEM PROPERTY: -Dnashorn.callsiteaccess.debug
+
+See the description of the access logger below. This flag is
+equivalent to enabling the access logger with "info" level.
+
+
+SYSTEM PROPERTY: -Dnashorn.compiler.ints.disable
+
+This flag prevents ints and longs (non double values) from being used
+for any primitive representation in the lowered IR. This is default
+false, i.e Lower will attempt to use integer variables as long as it
+can. For example, var x = 17 would try to use x as an integer, unless
+other operations occur later that require coercion to wider type, for
+example x *= 17.1;
+
+
+SYSTEM PROPERTY: -Dnashorn.compiler.intarithmetic
+
+Arithmetic operations in Nashorn (except bitwise ones) typically
+coerce the operands to doubles (as per the JavaScript spec). To switch
+this off and remain in integer mode, for example for "var x = a&b; var
+y = c&d; var z = x*y;", use this flag. This will force the
+multiplication of variables that are ints to be done with the IMUL
+bytecode and the result "z" to become an int.
+
+WARNING: Note that is is experimental only to ensure that type support
+exists for all primitive types. The generated code is unsound. This
+will be the case until we do optimizations based on it. There is a CR
+in Nashorn to do better range analysis, and ensure that this is only
+done where the operation can't overflow into a wider type. Currently
+no overflow checking is done, so at the moment, until range analysis
+has been completed, this option is turned off.
+
+We've experimented by using int arithmetic for everything and putting
+overflow checks afterwards, which would recompute the operation with
+the correct precision, but have yet to find a configuration where this
+is faster than just using doubles directly, even if the int operation
+does not overflow. Getting access to a JVM intrinsic that does branch
+on overflow would probably alleviate this.
+
+There is also a problem with this optimistic approach if the symbol
+happens to reside in a local variable slot in the bytecode, as those
+are strongly typed. Then we would need to split large sections of
+control flow, so this is probably not the right way to go, while range
+analysis is. There is a large difference between integer bytecode
+without overflow checks and double bytecode. The former is
+significantly faster.
+
+
+SYSTEM PROPERTY: -Dnashorn.codegen.debug, -Dnashorn.codegen.debug.trace=<x>
+
+See the description of the codegen logger below.
+
+
+SYSTEM_PROPERTY: -Dnashorn.fields.debug
+
+See the description on the fields logger below.
+
+
+SYSTEM PROPERTY: -Dnashorn.fields.dual
+
+When this property is true, Nashorn will attempt to use primitive
+fields for AccessorProperties (currently just AccessorProperties, not
+spill properties). Memory footprint for script objects will increase,
+as we need to maintain both a primitive field (a long) as well as an
+Object field for the property value. Ints are represented as the 32
+low bits of the long fields. Doubles are represented as the
+doubleToLongBits of their value. This way a single field can be used
+for all primitive types. Packing and unpacking doubles to their bit
+representation is intrinsified by the JVM and extremely fast.
+
+While dual fields in theory runs significantly faster than Object
+fields due to reduction of boxing and memory allocation overhead,
+there is still work to be done to make this a general purpose
+solution. Research is ongoing.
+
+In the future, this might complement or be replaced by experimental
+feature sun.misc.TaggedArray, which has been discussed on the mlvm
+mailing list. TaggedArrays are basically a way to share data space
+between primitives and references, and have the GC understand this.
+
+As long as only primitive values are written to the fields and enough
+type information exists to make sure that any reads don't have to be
+uselessly boxed and unboxed, this is significantly faster than the
+standard "Objects only" approach that currently is the default. See
+test/examples/dual-fields-micro.js for an example that runs twice as
+fast with dual fields as without them. Here, the compiler, can
+determine that we are dealing with numbers only throughout the entire
+property life span of the properties involved.
+
+If a "real" object (not a boxed primitive) is written to a field that
+has a primitive representation, its callsite is relinked and an Object
+field is used forevermore for that particular field in that
+PropertyMap and its children, even if primitives are later assigned to
+it.
+
+As the amount of compile time type information is very small in a
+dynamic language like JavaScript, it is frequently the case that
+something has to be treated as an object, because we don't know any
+better. In reality though, it is often a boxed primitive is stored to
+an AccessorProperty. The fastest way to handle this soundly is to use
+a callsite typecheck and avoid blowing the field up to an Object. We
+never revert object fields to primitives. Ping-pong:ing back and forth
+between primitive representation and Object representation would cause
+fatal performance overhead, so this is not an option.
+
+For a general application the dual fields approach is still slower
+than objects only fields in some places, about the same in most cases,
+and significantly faster in very few. This is due the program using
+primitives, but we still can't prove it. For example "local_var a =
+call(); field = a;" may very well write a double to the field, but the
+compiler dare not guess a double type if field is a local variable,
+due to bytecode variables being strongly typed and later non
+interchangeable. To get around this, the entire method would have to
+be replaced and a continuation retained to restart from. We believe
+that the next steps we should go through are instead:
+
+1) Implement method specialization based on callsite, as it's quite
+frequently the case that numbers are passed around, but currently our
+function nodes just have object types visible to the compiler. For
+example "var b = 17; func(a,b,17)" is an example where two parameters
+can be specialized, but the main version of func might also be called
+from another callsite with func(x,y,"string").
+
+2) This requires lazy jitting as the functions have to be specialized
+per callsite.
+
+Even though "function square(x) { return x*x }" might look like a
+trivial function that can always only take doubles, this is not
+true. Someone might have overridden the valueOf for x so that the
+toNumber coercion has side effects. To fulfil JavaScript semantics,
+the coercion has to run twice for both terms of the multiplication
+even if they are the same object. This means that call site
+specialization is necessary, not parameter specialization on the form
+"function square(x) { var xd = (double)x; return xd*xd; }", as one
+might first think.
+
+Generating a method specialization for any variant of a function that
+we can determine by types at compile time is a combinatorial explosion
+of byte code (try it e.g. on all the variants of am3 in the Octane
+benchmark crypto.js). Thus, this needs to be lazy
+
+3) Possibly optimistic callsite writes, something on the form
+
+x = y; //x is a field known to be a primitive. y is only an object as
+far as we can tell
+
+turns into
+
+try {
+ x = (int)y;
+} catch (X is not an integer field right now | ClassCastException e) {
+ x = y;
+}
+
+Mini POC shows that this is the key to a lot of dual field performance
+in seemingly trivial micros where one unknown object, in reality
+actually a primitive, foils it for us. Very common pattern. Once we
+are "all primitives", dual fields runs a lot faster than Object fields
+only.
+
+We still have to deal with objects vs primitives for local bytecode
+slots, possibly through code copying and versioning.
+
+
+SYSTEM PROPERTY: -Dnashorn.compiler.symbol.trace=<x>
+
+When this property is set, creation and manipulation of any symbol
+named "x" will show information about when the compiler changes its
+type assumption, bytecode local variable slot assignment and other
+data. This is useful if, for example, a symbol shows up as an Object,
+when you believe it should be a primitive. Usually there is an
+explanation for this, for example that it exists in the global scope
+and type analysis has to be more conservative. In that case, the stack
+trace upon type change to object will usually tell us why.
+
+
+SYSTEM PROPERTY: nashorn.lexer.xmlliterals
+
+If this property it set, it means that the Lexer should attempt to
+parse XML literals, which would otherwise generate syntax
+errors. Warning: there are currently no unit tests for this
+functionality.
+
+XML literals, when this is enabled, end up as standard LiteralNodes in
+the IR.
+
+
+SYSTEM_PROPERTY: nashorn.debug
+
+If this property is set to true, Nashorn runs in Debug mode. Debug
+mode is slightly slower, as for example statistics counters are enabled
+during the run. Debug mode makes available a NativeDebug instance
+called "Debug" in the global space that can be used to print property
+maps and layout for script objects, as well as a "dumpCounters" method
+that will print the current values of the previously mentioned stats
+counters.
+
+These functions currently exists for Debug:
+
+"map" - print(Debug.map(x)) will dump the PropertyMap for object x to
+stdout (currently there also exist functions called "embedX", where X
+is a value from 0 to 3, that will dump the contents of the embed pool
+for the first spill properties in any script object and "spill", that
+will dump the contents of the growing spill pool of spill properties
+in any script object. This is of course subject to change without
+notice, should we change the script object layout.
+
+"methodHandle" - this method returns the method handle that is used
+for invoking a particular script function.
+
+"identical" - this method compares two script objects for reference
+equality. It is a == Java comparison
+
+"dumpCounters" - will dump the debug counters' current values to
+stdout.
+
+Currently we count number of ScriptObjects in the system, number of
+Scope objects in the system, number of ScriptObject listeners added,
+removed and dead (without references).
+
+We also count number of ScriptFunctions, ScriptFunction invocations
+and ScriptFunction allocations.
+
+Furthermore we count PropertyMap statistics: how many property maps
+exist, how many times were property maps cloned, how many times did
+the property map history cache hit, prevent new allocations, how many
+prototype invalidations were done, how many time the property map
+proto cache hit.
+
+Finally we count callsite misses on a per callsite bases, which occur
+when a callsite has to be relinked, due to a previous assumption of
+object layout being invalidated.
+
+
+SYSTEM PROPERTY: nashorn.methodhandles.debug,
+nashorn.methodhandles.debug=create
+
+If this property is enabled, each MethodHandle related call that uses
+the java.lang.invoke package gets its MethodHandle intercepted and an
+instrumentation printout of arguments and return value appended to
+it. This shows exactly which method handles are executed and from
+where. (Also MethodTypes and SwitchPoints). This can be augmented with
+more information, for example, instance count, by subclassing or
+further extending the TraceMethodHandleFactory implementation in
+MethodHandleFactory.java.
+
+If the property is specialized with "=create" as its option,
+instrumentation will be shown for method handles upon creation time
+rather than at runtime usage.
+
+
+SYSTEM PROPERTY: nashorn.methodhandles.debug.stacktrace
+
+This does the same as nashorn.methodhandles.debug, but when enabled
+also dumps the stack trace for every instrumented method handle
+operation. Warning: This is enormously verbose, but provides a pretty
+decent "grep:able" picture of where the calls are coming from.
+
+See the description of the codegen logger below for a more verbose
+description of this option
+
+
+SYSTEM PROPERTY: nashorn.scriptfunction.specialization.disable
+
+There are several "fast path" implementations of constructors and
+functions in the NativeObject classes that, in their original form,
+take a variable amount of arguments. Said functions are also declared
+to take Object parameters in their original form, as this is what the
+JavaScript specification mandates.
+
+However, we often know quite a lot more at a callsite of one of these
+functions. For example, Math.min is called with a fixed number (2) of
+integer arguments. The overhead of boxing these ints to Objects and
+folding them into an Object array for the generic varargs Math.min
+function is an order of magnitude slower than calling a specialized
+implementation of Math.min that takes two integers. Specialized
+functions and constructors are identified by the tag
+@SpecializedFunction and @SpecializedConstructor in the Nashorn
+code. The linker will link in the most appropriate (narrowest types,
+right number of types and least number of arguments) specialization if
+specializations are available.
+
+Every ScriptFunction may carry specializations that the linker can
+choose from. This framework will likely be extended for user defined
+functions. The compiler can often infer enough parameter type info
+from callsites for in order to generate simpler versions with less
+generic Object types. This feature depends on future lazy jitting, as
+there tend to be many calls to user defined functions, some where the
+callsite can be specialized, some where we mostly see object
+parameters even at the callsite.
+
+If this system property is set to true, the linker will not attempt to
+use any specialized function or constructor for native objects, but
+just call the generic one.
+
+
+SYSTEM PROPERTY: nashorn.tcs.miss.samplePercent=<x>
+
+When running with the trace callsite option (-tcs), Nashorn will count
+and instrument any callsite misses that require relinking. As the
+number of relinks is large and usually produces a lot of output, this
+system property can be used to constrain the percentage of misses that
+should be logged. Typically this is set to 1 or 5 (percent). 1% is the
+default value.
+
+
+SYSTEM_PROPERTY: nashorn.profilefile=<filename>
+
+When running with the profile callsite options (-pcs), Nashorn will
+dump profiling data for all callsites to stderr as a shutdown hook. To
+instead redirect this to a file, specify the path to the file using
+this system property.
+
+
+===============
+2. The loggers.
+===============
+
+The Nashorn loggers can be used to print per-module or per-subsystem
+debug information with different levels of verbosity. The loggers for
+a given subsystem are available are enabled by using
+
+--log=<systemname>[:<level>]
+
+on the command line.
+
+Here <systemname> identifies the name of the subsystem to be logged
+and the optional colon and level argument is a standard
+java.util.logging.Level name (severe, warning, info, config, fine,
+finer, finest). If the level is left out for a particular subsystem,
+it defaults to "info". Any log message logged as the level or a level
+that is more important will be output to stderr by the logger.
+
+Several loggers can be enabled by a single command line option, by
+putting a comma after each subsystem/level tuple (or each subsystem if
+level is unspecified). The --log option can also be given multiple
+times on the same command line, with the same effect.
+
+For example: --log=codegen,fields:finest is equivalent to
+--log=codegen:info --log=fields:finest
+
+The subsystems that currently support logging are:
+
+
+* compiler
+
+The compiler is in charge of turning source code and function nodes
+into byte code, and installs the classes into a class loader
+controlled from the Context. Log messages are, for example, about
+things like new compile units being allocated. The compiler has global
+settings that all the tiers of codegen (e.g. Lower and CodeGenerator)
+use.
+
+
+* codegen
+
+The code generator is the emitter stage of the code pipeline, and
+turns the lowest tier of a FunctionNode into bytecode. Codegen logging
+shows byte codes as they are being emitted, line number information
+and jumps. It also shows the contents of the bytecode stack prior to
+each instruction being emitted. This is a good debugging aid. For
+example:
+
+[codegen] #41 line:2 (f)_afc824e
+[codegen] #42 load symbol x slot=2
+[codegen] #43 {1:O} load int 0
+[codegen] #44 {2:I O} dynamic_runtime_call GT:ZOI_I args=2 returnType=boolean
+[codegen] #45 signature (Ljava/lang/Object;I)Z
+[codegen] #46 {1:Z} ifeq ternary_false_5402fe28
+[codegen] #47 load symbol x slot=2
+[codegen] #48 {1:O} goto ternary_exit_107c1f2f
+[codegen] #49 ternary_false_5402fe28
+[codegen] #50 load symbol x slot=2
+[codegen] #51 {1:O} convert object -> double
+[codegen] #52 {1:D} neg
+[codegen] #53 {1:D} convert double -> object
+[codegen] #54 {1:O} ternary_exit_107c1f2f
+[codegen] #55 {1:O} return object
+
+shows a ternary node being generated for the sequence "return x > 0 ?
+x : -x"
+
+The first number on the log line is a unique monotonically increasing
+emission id per bytecode. There is no guarantee this is the same id
+between runs. depending on non deterministic code
+execution/compilation, but for small applications it usually is. If
+the system variable -Dnashorn.codegen.debug.trace=<x> is set, where x
+is a bytecode emission id, a stack trace will be shown as the
+particular bytecode is about to be emitted. This can be a quick way to
+determine where it comes from without attaching the debugger. "Who
+generated that neg?"
+
+The --log=codegen option is equivalent to setting the system variable
+"nashorn.codegen.debug" to true.
+
+
+* lower
+
+The lowering annotates a FunctionNode with symbols for each identifier
+and transforms high level constructs into lower level ones, that the
+CodeGenerator consumes.
+
+Lower logging typically outputs things like post pass actions,
+insertions of casts because symbol types have been changed and type
+specialization information. Currently very little info is generated by
+this logger. This will probably change.
+
+
+* access
+
+The --log=access option is equivalent to setting the system variable
+"nashorn.callsiteaccess.debug" to true. There are several levels of
+the access logger, usually the default level "info" is enough
+
+It is very simple to create your own logger. Use the DebugLogger class
+and give the subsystem name as a constructor argument.
+
+
+* fields
+
+The --log=fields option (at info level) is equivalent to setting the
+system variable "nashorn.fields.debug" to true. At the info level it
+will only show info about type assumptions that were invalidated. If
+the level is set to finest, it will also trace every AccessorProperty
+getter and setter in the program, show arguments, return values
+etc. It will also show the internal representation of respective field
+(Object in the normal case, unless running with the dual field
+representation)
\ No newline at end of file