jdk/src/java.base/share/specs/serialization/protocol.md
changeset 45534 f0b3d467215e
parent 45473 03c5450b6e4a
parent 45533 e6707cd51e28
child 45535 4b19310ae4ee
45473:03c5450b6e4a 45534:f0b3d467215e
     1 ---
       
     2 # Copyright (c) 2005, 2017, Oracle and/or its affiliates. All rights reserved.
       
     3 # DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
       
     4 #
       
     5 # This code is free software; you can redistribute it and/or modify it
       
     6 # under the terms of the GNU General Public License version 2 only, as
       
     7 # published by the Free Software Foundation.
       
     8 #
       
     9 # This code is distributed in the hope that it will be useful, but WITHOUT
       
    10 # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       
    11 # FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       
    12 # version 2 for more details (a copy is included in the LICENSE file that
       
    13 # accompanied this code).
       
    14 #
       
    15 # You should have received a copy of the GNU General Public License version
       
    16 # 2 along with this work; if not, write to the Free Software Foundation,
       
    17 # Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
       
    18 #
       
    19 # Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
       
    20 # or visit www.oracle.com if you need additional information or have any
       
    21 # questions.
       
    22 
       
    23 include-before: '[CONTENTS](index.html) | [PREV](version.html) | [NEXT](security.html)'
       
    24 include-after: '[CONTENTS](index.html) | [PREV](version.html) | [NEXT](security.html)'
       
    25 
       
    26 title: 'Java Object Serialization Specification: 6 - Object Serialization Stream Protocol'
       
    27 ---
       
    28 
       
    29 -   [Overview](#overview)
       
    30 -   [Stream Elements](#stream-elements)
       
    31 -   [Stream Protocol Versions](#stream-protocol-versions)
       
    32 -   [Grammar for the Stream Format](#grammar-for-the-stream-format)
       
    33 -   [Example](#example)
       
    34 
       
    35 -------------------------------------------------------------------------------
       
    36 
       
    37 ## 6.1 Overview
       
    38 
       
    39 The stream format satisfies the following design goals:
       
    40 
       
    41 -   Is compact and is structured for efficient reading.
       
    42 -   Allows skipping through the stream using only the knowledge of the
       
    43     structure and format of the stream. Does not require invoking any per class
       
    44     code.
       
    45 -   Requires only stream access to the data.
       
    46 
       
    47 ## 6.2 Stream Elements
       
    48 
       
    49 A basic structure is needed to represent objects in a stream. Each attribute of
       
    50 the object needs to be represented: its classes, its fields, and data written
       
    51 and later read by class-specific methods. The representation of objects in the
       
    52 stream can be described with a grammar. There are special representations for
       
    53 null objects, new objects, classes, arrays, strings, and back references to any
       
    54 object already in the stream. Each object written to the stream is assigned a
       
    55 handle that is used to refer back to the object. Handles are assigned
       
    56 sequentially starting from 0x7E0000. The handles restart at 0x7E0000 when the
       
    57 stream is reset.
       
    58 
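The handle behaviour described above can be observed with a small experiment. The following is a minimal sketch, not part of the specification; the class name `HandleDemo` and its helper methods are hypothetical names chosen for illustration. It writes the same `String` twice, resets the stream, and writes it once more.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo class; only the java.io calls come from the platform API.
final class HandleDemo {
    /** Writes the same String twice, resets the stream, then writes it again. */
    static byte[] writeTwiceThenReset() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            String s = "hi";
            out.writeObject(s);   // new string, assigned the first handle (0x7E0000)
            out.writeObject(s);   // written as a back reference to that handle
            out.reset();          // the handle table is discarded
            out.writeObject(s);   // written in full again
        }
        return bytes.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(writeTwiceThenReset()));
    }
}
```

After the four header bytes, the output should show the string (`74 00 02 68 69`), a back reference to handle 0x7E0000 (`71 00 7e 00 00`), the reset marker (`79`), and then the string written in full a second time.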
       
    59 A class object is represented by the following:
       
    60 
       
    61 -   Its `ObjectStreamClass` object.
       
    62 
       
    63 An `ObjectStreamClass` object for a Class that is not a dynamic proxy class is
       
    64 represented by the following:
       
    65 
       
    66 -   The Stream Unique Identifier (SUID) of compatible classes.
       
    67 
       
    68 -   A set of flags indicating various properties of the class, such as whether
       
    69     the class defines a `writeObject` method, and whether the class is
       
    70     serializable, externalizable, or an enum type
       
    71 
       
    72 -   The number of serializable fields
       
    73 
       
    74 -   The array of fields of the class that are serialized by the default
       
     75     mechanism. For arrays and object fields, the type of the field is included as
       
    76     a string which must be in "field descriptor" format (e.g.,
       
    77     "`Ljava/lang/Object;`") as specified in The Java Virtual Machine
       
    78     Specification.
       
    79 
       
    80 -   Optional block-data records or objects written by the `annotateClass`
       
    81     method
       
    82 
       
    83 -   The `ObjectStreamClass` of its supertype (null if the superclass is not
       
    84     serializable)
       
    85 
       
    86 An `ObjectStreamClass` object for a dynamic proxy class is represented by the
       
    87 following:
       
    88 
       
    89 -   The number of interfaces that the dynamic proxy class implements
       
    90 
       
    91 -   The names of all of the interfaces implemented by the dynamic proxy class,
       
    92     listed in the order that they are returned by invoking the `getInterfaces`
       
    93     method on the Class object.
       
    94 
       
    95 -   Optional block-data records or objects written by the `annotateProxyClass`
       
    96     method.
       
    97 
       
     98 -   The `ObjectStreamClass` of its supertype, `java.lang.reflect.Proxy`.
       
    99 
       
   100 The representation of `String` objects consists of length information followed
       
   101 by the contents of the string encoded in modified UTF-8. The modified UTF-8
       
   102 encoding is the same as used in the Java Virtual Machine and in the
       
   103 `java.io.DataInput` and `DataOutput` interfaces; it differs from standard UTF-8
       
   104 in the representation of supplementary characters and of the null character.
       
   105 The form of the length information depends on the length of the string in
       
   106 modified UTF-8 encoding. If the modified UTF-8 encoding of the given `String`
       
   107 is less than 65536 bytes in length, the length is written as 2 bytes
       
   108 representing an unsigned 16-bit integer. Starting with the Java 2 platform,
       
   109 Standard Edition, v1.3, if the length of the string in modified UTF-8 encoding
       
   110 is 65536 bytes or more, the length is written in 8 bytes representing a signed
       
   111 64-bit integer. The typecode preceding the `String` in the serialization stream
       
   112 indicates which format was used to write the `String`.
       
   113 
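The choice between the two length forms can be sketched as follows. This is an illustration under stated assumptions, not the implementation used by `ObjectOutputStream`: the class `StringLengthForm` and its methods are hypothetical, and the per-character byte counts follow the modified UTF-8 rules documented for `java.io.DataInput`.

```java
import java.io.ObjectStreamConstants;

// Hypothetical helper; the names are illustrative only.
final class StringLengthForm {
    /** Number of bytes the string occupies in modified UTF-8. */
    static long modifiedUtf8Length(String s) {
        long length = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                length += 1;   // ASCII, except the null character
            } else if (c <= 0x07FF) {
                length += 2;   // includes the null character (c == 0)
            } else {
                length += 3;   // each surrogate of a pair is encoded on its own
            }
        }
        return length;
    }

    /** True if the 8-byte length form is required. */
    static boolean needsLongForm(String s) {
        return modifiedUtf8Length(s) >= 65536;
    }

    /** Typecode that would precede the string in the stream. */
    static byte typecode(String s) {
        return needsLongForm(s)
                ? ObjectStreamConstants.TC_LONGSTRING
                : ObjectStreamConstants.TC_STRING;
    }
}
```

Note that a supplementary character, which Java stores as a surrogate pair, occupies six bytes in this encoding rather than the four bytes of standard UTF-8.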
       
   114 Arrays are represented by the following:
       
   115 
       
   116 -   Their `ObjectStreamClass` object.
       
   117 
       
   118 -   The number of elements.
       
   119 
       
   120 -   The sequence of values. The type of the values is implicit in the type of
       
    121     the array. For example, the values of a byte array are of type `byte`.
       
   122 
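As a rough illustration of the array layout listed above, the sketch below serializes a three-element byte array and inspects the typecode and the trailing bytes. The class `ArrayLayout` and its helpers are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
final class ArrayLayout {
    /** Serializes a single object into a fresh stream. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Returns the last n bytes of the serialized form. */
    static byte[] tail(byte[] stream, int n) {
        return Arrays.copyOfRange(stream, stream.length - n, stream.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = serialize(new byte[] {1, 2, 3});
        // stream[4] is the typecode that follows the 4-byte stream header.
        System.out.printf("typecode: 0x%02x%n", stream[4] & 0xff);
        // The element count (a 4-byte int) followed by the three values.
        System.out.println(Arrays.toString(tail(stream, 7)));
    }
}
```

The class descriptor for `byte[]` sits between the typecode and the element count; the values themselves carry no per-element type information.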
       
   123 Enum constants are represented by the following:
       
   124 
       
   125 -   The `ObjectStreamClass` object of the constant's base enum type.
       
   126 
       
   127 -   The constant's name string.
       
   128 
       
   129 New objects in the stream are represented by the following:
       
   130 
       
   131 -   The most derived class of the object.
       
   132 
       
   133 -   Data for each serializable class of the object, with the highest superclass
       
   134     first. For each class the stream contains the following:
       
   135 
       
    136     -   The serializable fields. See [Section 1.5, "Defining Serializable Fields
       
   137         for a
       
   138         Class"](serial-arch.html#defining-serializable-fields-for-a-class).
       
   139 
       
   140     -   If the class has `writeObject`/`readObject` methods, there may be
       
   141         optional objects and/or block-data records of primitive types written
       
   142         by the `writeObject` method followed by an `endBlockData` code.
       
   143 
       
   144 All primitive data written by classes is buffered and wrapped in block-data
       
    145 records, regardless of whether the data is written to the stream within a `writeObject`
       
   146 method or written directly to the stream from outside a `writeObject` method.
       
   147 This data can only be read by the corresponding `readObject` methods or be read
       
   148 directly from the stream. Objects written by the `writeObject` method terminate
       
   149 any previous block-data record and are written either as regular objects or
       
   150 null or back references, as appropriate. The block-data records allow error
       
   151 recovery to discard any optional data. When called from within a class, the
       
   152 stream can discard any data or objects until the `endBlockData`.
       
   153 
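The wrapping of primitive data in block-data records can be seen by writing an `int` directly to the stream, outside any `writeObject` method. This is a minimal sketch; the class name `BlockDataDemo` and its method are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical demo class; the names are illustrative only.
final class BlockDataDemo {
    /** Writes one int directly to the stream and returns the resulting bytes. */
    static byte[] writeIntDirectly(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(value);  // primitive data, buffered into a block-data record
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (byte b : writeIntDirectly(42)) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```

For the value 42, the bytes after the stream header should be `77 04 00 00 00 2a`: the `TC_BLOCKDATA` typecode, a one-byte size of 4, and the four bytes of the `int`.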
       
   154 ## 6.3 Stream Protocol Versions
       
   155 
       
   156 It was necessary to make a change to the serialization stream format in JDK 1.2
       
   157 that is not backwards compatible to all minor releases of JDK 1.1. To provide
       
   158 for cases where backwards compatibility is required, a capability has been
       
   159 added to indicate what `PROTOCOL_VERSION` to use when writing a serialization
       
   160 stream. The method `ObjectOutputStream.useProtocolVersion` takes as a parameter
       
   161 the protocol version to use to write the serialization stream.
       
   162 
       
   163 The Stream Protocol Versions are as follows:
       
   164 
       
   165 -   `ObjectStreamConstants.PROTOCOL_VERSION_1`: Indicates the initial stream
       
   166     format.
       
   167 
       
   168 -   `ObjectStreamConstants.PROTOCOL_VERSION_2`: Indicates the new external data
       
   169     format. Primitive data is written in block data mode and is terminated with
       
   170     `TC_ENDBLOCKDATA`.
       
   171 
       
   172     Block data boundaries have been standardized. Primitive data written in
       
    173     block data mode is normalized to not exceed 1024-byte chunks. The benefit
       
   174     of this change was to tighten the specification of serialized data format
       
   175     within the stream. This change is fully backward and forward compatible.
       
   176 
       
   177 JDK 1.2 defaults to writing `PROTOCOL_VERSION_2`.
       
   178 
       
   179 JDK 1.1 defaults to writing `PROTOCOL_VERSION_1`.
       
   180 
       
   181 JDK 1.1.7 and greater can read both versions.
       
   182 
       
   183 Releases prior to JDK 1.1.7 can only read `PROTOCOL_VERSION_1`.
       
   184 
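Selecting a protocol version might look like the following sketch. The class `ProtocolVersionDemo` and its methods are hypothetical; `useProtocolVersion` and the two constants are the platform API named above. The sketch also shows that the version has to be chosen before any object is serialized, since `useProtocolVersion` throws `IllegalStateException` otherwise.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.OutputStream;

// Hypothetical demo class; the names are illustrative only.
final class ProtocolVersionDemo {
    /** Opens a stream that writes PROTOCOL_VERSION_1 instead of the default. */
    static ObjectOutputStream openVersion1(OutputStream sink) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(sink);
        out.useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1);
        return out;
    }

    /** Returns true if the version can no longer be changed after a write. */
    static boolean rejectsLateChange() throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        out.writeObject("already written");
        try {
            out.useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        openVersion1(new ByteArrayOutputStream()).close();
        System.out.println("late change rejected: " + rejectsLateChange());
    }
}
```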
       
   185 ## 6.4 Grammar for the Stream Format
       
   186 
       
   187 The table below contains the grammar for the stream format. Nonterminal symbols
       
    188 are shown in italics. Terminal symbols are shown in a *fixed width font*. Definitions of
       
   189 nonterminals are followed by a ":". The definition is followed by one or more
       
   190 alternatives, each on a separate line. The following table describes the
       
   191 notation:
       
   192 
       
   193   -------------  --------------------------------------------------------------
       
   194   **Notation**   **Meaning**
       
   195   -------------  --------------------------------------------------------------
       
   196   (*datatype*)   This token has the data type specified, such as byte.
       
   197 
       
   198   *token*\[n\]   A predefined number of occurrences of the token, that is an
       
   199                  array.
       
   200 
       
   201   *x0001*        A literal value expressed in hexadecimal. The number of hex
       
   202                  digits reflects the size of the value.
       
   203 
       
    204   <*xxx*>        A value read from the stream used to indicate the length of an
       
   205                  array.
       
   206   -------------  --------------------------------------------------------------
       
   207 
       
   208 Note that the symbol (utf) is used to designate a string written using 2-byte
       
   209 length information, and (long-utf) is used to designate a string written using
       
   210 8-byte length information. For details, refer to [Section 6.2, "Stream
       
   211 Elements"](#stream-elements).
       
   212 
       
   213 ### 6.4.1 Rules of the Grammar
       
   214 
       
    215 A serialized stream is represented by any stream satisfying the *stream* rule.
       
   216 
       
   217 ```
       
   218 stream:
       
   219   magic version contents
       
   220 
       
   221 contents:
       
   222   content
       
   223   contents content
       
   224 
       
   225 content:
       
   226   object
       
   227   blockdata
       
   228 
       
   229 object:
       
   230   newObject
       
   231   newClass
       
   232   newArray
       
   233   newString
       
   234   newEnum
       
   235   newClassDesc
       
   236   prevObject
       
   237   nullReference
       
   238   exception
       
   239   TC_RESET
       
   240 
       
   241 newClass:
       
   242   TC_CLASS classDesc newHandle
       
   243 
       
   244 classDesc:
       
   245   newClassDesc
       
   246   nullReference
       
   247   (ClassDesc)prevObject      // an object required to be of type ClassDesc
       
   248 
       
   249 superClassDesc:
       
   250   classDesc
       
   251 
       
   252 newClassDesc:
       
   253   TC_CLASSDESC className serialVersionUID newHandle classDescInfo
       
   254   TC_PROXYCLASSDESC newHandle proxyClassDescInfo
       
   255 
       
   256 classDescInfo:
       
   257   classDescFlags fields classAnnotation superClassDesc
       
   258 
       
   259 className:
       
   260   (utf)
       
   261 
       
   262 serialVersionUID:
       
   263   (long)
       
   264 
       
   265 classDescFlags:
       
   266   (byte)                  // Defined in Terminal Symbols and Constants
       
   267 
       
   268 proxyClassDescInfo:
       
   269   (int)<count> proxyInterfaceName[count] classAnnotation
       
   270       superClassDesc
       
   271 
       
   272 proxyInterfaceName:
       
   273   (utf)
       
   274 
       
   275 fields:
       
   276   (short)<count> fieldDesc[count]
       
   277 
       
   278 fieldDesc:
       
   279   primitiveDesc
       
   280   objectDesc
       
   281 
       
   282 primitiveDesc:
       
   283   prim_typecode fieldName
       
   284 
       
   285 objectDesc:
       
   286   obj_typecode fieldName className1
       
   287 
       
   288 fieldName:
       
   289   (utf)
       
   290 
       
   291 className1:
       
   292   (String)object             // String containing the field's type,
       
   293                              // in field descriptor format
       
   294 
       
   295 classAnnotation:
       
   296   endBlockData
       
   297   contents endBlockData      // contents written by annotateClass
       
   298 
       
   299 prim_typecode:
       
   300   'B'       // byte
       
   301   'C'       // char
       
   302   'D'       // double
       
   303   'F'       // float
       
   304   'I'       // integer
       
   305   'J'       // long
       
   306   'S'       // short
       
   307   'Z'       // boolean
       
   308 
       
   309 obj_typecode:
       
   310   '['       // array
       
   311   'L'       // object
       
   312 
       
   313 newArray:
       
   314   TC_ARRAY classDesc newHandle (int)<size> values[size]
       
   315 
       
   316 newObject:
       
   317   TC_OBJECT classDesc newHandle classdata[]  // data for each class
       
   318 
       
   319 classdata:
       
    320   nowrclass                 // SC_SERIALIZABLE & classDescFlags &&
       
    321                             // !(SC_WRITE_METHOD & classDescFlags)
       
    322   wrclass objectAnnotation  // SC_SERIALIZABLE & classDescFlags &&
       
    323                             // SC_WRITE_METHOD & classDescFlags
       
    324   externalContents          // SC_EXTERNALIZABLE & classDescFlags &&
       
    325                             // !(SC_BLOCK_DATA & classDescFlags)
       
    326   objectAnnotation          // SC_EXTERNALIZABLE & classDescFlags &&
       
    327                             // SC_BLOCK_DATA & classDescFlags
       
   328 
       
   329 nowrclass:
       
   330   values                    // fields in order of class descriptor
       
   331 
       
   332 wrclass:
       
   333   nowrclass
       
   334 
       
   335 objectAnnotation:
       
   336   endBlockData
       
   337   contents endBlockData     // contents written by writeObject
       
   338                             // or writeExternal PROTOCOL_VERSION_2.
       
   339 
       
   340 blockdata:
       
   341   blockdatashort
       
   342   blockdatalong
       
   343 
       
   344 blockdatashort:
       
   345   TC_BLOCKDATA (unsigned byte)<size> (byte)[size]
       
   346 
       
   347 blockdatalong:
       
   348   TC_BLOCKDATALONG (int)<size> (byte)[size]
       
   349 
       
   350 endBlockData:
       
   351   TC_ENDBLOCKDATA
       
   352 
       
   353 externalContent:         // Only parseable by readExternal
       
   354   (bytes)                // primitive data
       
    355   object
       
   356 
       
   357 externalContents:         // externalContent written by
       
   358   externalContent         // writeExternal in PROTOCOL_VERSION_1.
       
   359   externalContents externalContent
       
   360 
       
   361 newString:
       
   362   TC_STRING newHandle (utf)
       
   363   TC_LONGSTRING newHandle (long-utf)
       
   364 
       
   365 newEnum:
       
   366   TC_ENUM classDesc newHandle enumConstantName
       
   367 
       
   368 enumConstantName:
       
   369   (String)object
       
   370 
       
   371 prevObject:
       
   372   TC_REFERENCE (int)handle
       
   373 
       
   374 nullReference:
       
   375   TC_NULL
       
   376 
       
   377 exception:
       
   378   TC_EXCEPTION reset (Throwable)object reset
       
   379 
       
   380 magic:
       
   381   STREAM_MAGIC
       
   382 
       
   383 version:
       
   384   STREAM_VERSION
       
   385 
       
   386 values:          // The size and types are described by the
       
   387                  // classDesc for the current object
       
   388 
       
   389 newHandle:       // The next number in sequence is assigned
       
   390                  // to the object being serialized or deserialized
       
   391 
       
   392 reset:           // The set of known objects is discarded
       
   393                  // so the objects of the exception do not
       
   394                  // overlap with the previously sent objects
       
   395                  // or with objects that may be sent after
       
   396                  // the exception
       
   397 ```
       
   398 
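To make the grammar concrete, the sketch below walks the top-level *contents* of a stream using only the structure given by the rules, without invoking any class code. It is deliberately partial: the class `StreamWalker` is hypothetical, and it recognizes only the *blockdata*, *newString*, *prevObject*, *nullReference*, and `TC_RESET` alternatives. Any other typecode is reported as outside the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.StreamCorruptedException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical partial walker; covers only a subset of the grammar.
final class StreamWalker {
    /** Names the top-level elements of a stream, skipping their payloads. */
    static List<String> walk(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        // stream: magic version contents
        if (in.readShort() != ObjectStreamConstants.STREAM_MAGIC
                || in.readShort() != ObjectStreamConstants.STREAM_VERSION) {
            throw new StreamCorruptedException("bad stream header");
        }
        List<String> elements = new ArrayList<>();
        int typecode;
        while ((typecode = in.read()) != -1) {
            switch ((byte) typecode) {
                case ObjectStreamConstants.TC_BLOCKDATA:      // (unsigned byte)<size>
                    skip(in, in.readUnsignedByte());
                    elements.add("blockdatashort");
                    break;
                case ObjectStreamConstants.TC_BLOCKDATALONG:  // (int)<size>
                    skip(in, in.readInt());
                    elements.add("blockdatalong");
                    break;
                case ObjectStreamConstants.TC_STRING:         // (utf): 2-byte length
                    skip(in, in.readUnsignedShort());
                    elements.add("newString");
                    break;
                case ObjectStreamConstants.TC_LONGSTRING:     // (long-utf): 8-byte length
                    skip(in, in.readLong());
                    elements.add("newString");
                    break;
                case ObjectStreamConstants.TC_REFERENCE:      // (int)handle
                    in.readInt();
                    elements.add("prevObject");
                    break;
                case ObjectStreamConstants.TC_NULL:
                    elements.add("nullReference");
                    break;
                case ObjectStreamConstants.TC_RESET:
                    elements.add("TC_RESET");
                    break;
                default:
                    throw new StreamCorruptedException(
                            String.format("typecode 0x%02x is outside this sketch", typecode));
            }
        }
        return elements;
    }

    /** Skips exactly n bytes or fails if the stream ends early. */
    private static void skip(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            int step = in.skipBytes((int) Math.min(n, Integer.MAX_VALUE));
            if (step <= 0) {
                throw new EOFException("truncated stream");
            }
            n -= step;
        }
    }

    /** Builds a small stream that uses only the alternatives handled above. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            String s = "hi";
            out.writeInt(42);       // blockdata
            out.writeObject(s);     // newString
            out.writeObject(s);     // prevObject
            out.writeObject(null);  // nullReference
            out.reset();            // TC_RESET
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(walk(sample()));
    }
}
```

A complete walker would also need the *newObject*, *newClass*, *newArray*, *newEnum*, *newClassDesc*, and *exception* rules, which require tracking class descriptors in order to know the size of *values*.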
       
   399 ### 6.4.2 Terminal Symbols and Constants
       
   400 
       
   401 The following symbols in `java.io.ObjectStreamConstants` define the terminal
       
   402 and constant values expected in a stream.
       
   403 
       
   404 ```
       
   405 final static short STREAM_MAGIC = (short)0xaced;
       
   406 final static short STREAM_VERSION = 5;
       
   407 final static byte TC_NULL = (byte)0x70;
       
   408 final static byte TC_REFERENCE = (byte)0x71;
       
   409 final static byte TC_CLASSDESC = (byte)0x72;
       
   410 final static byte TC_OBJECT = (byte)0x73;
       
   411 final static byte TC_STRING = (byte)0x74;
       
   412 final static byte TC_ARRAY = (byte)0x75;
       
   413 final static byte TC_CLASS = (byte)0x76;
       
   414 final static byte TC_BLOCKDATA = (byte)0x77;
       
   415 final static byte TC_ENDBLOCKDATA = (byte)0x78;
       
   416 final static byte TC_RESET = (byte)0x79;
       
   417 final static byte TC_BLOCKDATALONG = (byte)0x7A;
       
   418 final static byte TC_EXCEPTION = (byte)0x7B;
       
    419 final static byte TC_LONGSTRING = (byte)0x7C;
       
    420 final static byte TC_PROXYCLASSDESC = (byte)0x7D;
       
    421 final static byte TC_ENUM = (byte)0x7E;
       
    422 final static int baseWireHandle = 0x7E0000;
       
   423 ```
       
   424 
       
   425 The flag byte *classDescFlags* may include values of
       
   426 
       
   427 ```
       
   428 final static byte SC_WRITE_METHOD = 0x01; //if SC_SERIALIZABLE
       
   429 final static byte SC_BLOCK_DATA = 0x08;    //if SC_EXTERNALIZABLE
       
   430 final static byte SC_SERIALIZABLE = 0x02;
       
   431 final static byte SC_EXTERNALIZABLE = 0x04;
       
   432 final static byte SC_ENUM = 0x10;
       
   433 ```
       
   434 
       
   435 The flag `SC_WRITE_METHOD` is set if the Serializable class writing the stream
       
   436 had a `writeObject` method that may have written additional data to the stream.
       
   437 In this case a `TC_ENDBLOCKDATA` marker is always expected to terminate the
       
   438 data for that class.
       
   439 
       
    440 The flag `SC_BLOCK_DATA` is set if the `Externalizable` class is written into
       
    441 the stream using `PROTOCOL_VERSION_2`. By default, this is the protocol used to
       
    442 write `Externalizable` objects into the stream in JDK 1.2. JDK 1.1 writes
       
    443 `PROTOCOL_VERSION_1`.
       
   444 
       
    445 The flag `SC_SERIALIZABLE` is set if the class that wrote the stream extended
       
    446 `java.io.Serializable` but not `java.io.Externalizable`. The class reading the
       
    447 stream must also extend `java.io.Serializable`, and the default serialization
       
    448 mechanism is to be used.
       
   449 
       
    450 The flag `SC_EXTERNALIZABLE` is set if the class that wrote the stream extended
       
    451 `java.io.Externalizable`. The class reading the data must also extend
       
    452 `Externalizable`, and the data is written and read using its `writeExternal`
       
    453 and `readExternal` methods.
       
   454 
       
   455 The flag `SC_ENUM` is set if the class that wrote the stream was an enum type.
       
   456 The receiver's corresponding class must also be an enum type. Data for
       
   457 constants of the enum type will be written and read as described in [Section
       
   458 1.12, "Serialization of Enum
       
   459 Constants"](serial-arch.html#serialization-of-enum-constants).
       
   460 
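Decoding a *classDescFlags* byte into the flag names described above could be sketched as follows. The class `ClassDescFlags` and its `decode` method are hypothetical helpers; the constants are those of `java.io.ObjectStreamConstants`.

```java
import java.io.ObjectStreamConstants;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative only.
final class ClassDescFlags {
    /** Lists the names of the flags set in a classDescFlags byte. */
    static List<String> decode(byte flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ObjectStreamConstants.SC_WRITE_METHOD) != 0) {
            names.add("SC_WRITE_METHOD");
        }
        if ((flags & ObjectStreamConstants.SC_SERIALIZABLE) != 0) {
            names.add("SC_SERIALIZABLE");
        }
        if ((flags & ObjectStreamConstants.SC_EXTERNALIZABLE) != 0) {
            names.add("SC_EXTERNALIZABLE");
        }
        if ((flags & ObjectStreamConstants.SC_BLOCK_DATA) != 0) {
            names.add("SC_BLOCK_DATA");
        }
        if ((flags & ObjectStreamConstants.SC_ENUM) != 0) {
            names.add("SC_ENUM");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0x02));
        System.out.println(decode((byte) 0x0c));
    }
}
```

For instance, a flag byte of `0x02` denotes a plain `Serializable` class without a `writeObject` method, and `0x0c` denotes an `Externalizable` class written in block data mode.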
       
   461 #### Example
       
   462 
       
   463 Consider the case of an original class and two instances in a linked list:
       
   464 
       
   465 ```
       
   466 class List implements java.io.Serializable {
       
   467     int value;
       
   468     List next;
       
   469     public static void main(String[] args) {
       
   470         try {
       
   471             List list1 = new List();
       
   472             List list2 = new List();
       
   473             list1.value = 17;
       
   474             list1.next = list2;
       
   475             list2.value = 19;
       
   476             list2.next = null;
       
   477 
       
   478             ByteArrayOutputStream o = new ByteArrayOutputStream();
       
   479             ObjectOutputStream out = new ObjectOutputStream(o);
       
   480             out.writeObject(list1);
       
   481             out.writeObject(list2);
       
   482             out.flush();
       
   483             ...
       
   484         } catch (Exception ex) {
       
   485             ex.printStackTrace();
       
   486         }
       
   487     }
       
   488 }
       
   489 ```
       
   490 
       
   491 The resulting stream contains:
       
   492 
       
   493 ```
       
   494     00: ac ed 00 05 73 72 00 04 4c 69 73 74 69 c8 8a 15 >....sr..Listi...<
       
    495     10: 40 16 ae 68 02 00 02 49 00 05 76 61 6c 75 65 4c >@..h...I..valueL<
       
   496     20: 00 04 6e 65 78 74 74 00 06 4c 4c 69 73 74 3b 78 >..nextt..LList;x<
       
   497     30: 70 00 00 00 11 73 71 00 7e 00 00 00 00 00 13 70 >p....sq.~......p<
       
   498     40: 71 00 7e 00 03                                  >q.~..<
       
   499 ```
       
   500 
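The example can be reproduced with a self-contained program along the following lines. This is a sketch under assumptions: the `ListDump` class and its `dump` helper are hypothetical additions, and the elided part of the original `main` method is not reconstructed. The eight `serialVersionUID` bytes are computed from the class definition, so they are printed but not relied upon here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical driver class; List mirrors the class in the example above.
final class ListDump {
    /** Serializes the two linked List instances from the example. */
    static byte[] serializeExample() throws IOException {
        List list1 = new List();
        List list2 = new List();
        list1.value = 17;
        list1.next = list2;
        list2.value = 19;
        list2.next = null;

        ByteArrayOutputStream o = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(o)) {
            out.writeObject(list1);  // also writes list2, reached through next
            out.writeObject(list2);  // already in the stream: a back reference
        }
        return o.toByteArray();
    }

    /** Formats the bytes as rows of 16 hex values prefixed by the row offset. */
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < data.length; row += 16) {
            sb.append(String.format("%02x:", row));
            for (int i = row; i < Math.min(row + 16, data.length); i++) {
                sb.append(String.format(" %02x", data[i] & 0xff));
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(dump(serializeExample()));
    }
}

class List implements Serializable {
    int value;
    List next;
}
```

The second `writeObject` call contributes only the final five bytes, `71 00 7e 00 03`: a `TC_REFERENCE` to the fourth handle, after those assigned to the class descriptor, the field type string, and `list1`.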
       
   501 -------------------------------------------------------------------------------
       
   502 
       
   503 *[Copyright](../../../legal/SMICopyright.html) &copy; 2005, 2017, Oracle
       
   504 and/or its affiliates. All rights reserved.*