relpipe-data/principles.xml
branch v_0
changeset 188 5b0fab48d59e
parent 150 7d7d4e1f293f
child 204 58c40f213028
187:c952261978e8 188:5b0fab48d59e
   122 			The <m:name/> data format should be concise – the data should be represented by a reasonably small number of bytes.
   122 			The <m:name/> data format should be concise – the data should be represented by a reasonably small number of bytes.
   123 			The format should support large amounts of small values and also sparse data (structures with many NULL/missing values) without wasting too much space.
   123 			The format should support large amounts of small values and also sparse data (structures with many NULL/missing values) without wasting too much space.
   124 			The data that are not written don't need to be compressed and thus have the best compression ratio.
   124 			The data that are not written don't need to be compressed and thus have the best compression ratio.
   125 		</p>
   125 		</p>
   126 		
   126 		
       
   127 		<h2>Streaming</h2>
       
   128 		
       
   129 		<p>
       
   130 			Relational tools should process streams of data and should hold only the necessary data in memory,
       
   131 			i.e. the tool should produce the output (the first record) as soon as possible while still reading the input (following records).
       
   132 			Thus the memory usage does not depend on the volume of processed data.
       
   133 		</p>
       
   134 		
       
   135 		<p>
       
   136 			However, there are cases where such streaming is not feasible, e.g. if we need to compute some statistics or the column widths while printing a table in the terminal.
       
   137 			In such a situation, we must read the whole relation and only then generate the output.
       
   138 			But we should still be able to stream at the relation level, i.e. if there are several relations, we hold only one of them in memory at a time.
       
   139 		</p>
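The two streaming levels described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the in-memory model (a relation as a name, a list of attribute names and an iterable of records) and the function names `filter_records` and `format_tables` are assumptions made only for this example. The first function streams record by record; the second must buffer a whole relation to compute column widths, but never holds more than one relation at a time.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical model for illustration: (relation name, attribute names, records).
Record = Tuple[str, ...]
Relation = Tuple[str, List[str], Iterable[Record]]


def filter_records(records: Iterable[Record], needle: str) -> Iterator[Record]:
    """Record-level streaming: emit each matching record as soon as it is read."""
    for record in records:
        if any(needle in value for value in record):
            yield record  # nothing is buffered; memory use is independent of input size


def format_tables(relations: Iterable[Relation]) -> Iterator[str]:
    """Relation-level streaming: buffer one relation, print it, then move on."""
    for name, attributes, records in relations:
        # Column widths depend on every record, so this relation must be read completely.
        rows = [tuple(attributes)] + [tuple(record) for record in records]
        widths = [max(len(row[i]) for row in rows) for i in range(len(attributes))]
        yield name + ":"
        for row in rows:
            yield " | ".join(value.ljust(width) for value, width in zip(row, widths))
        # The buffer is rebuilt for the next relation, so only one is held at a time.
```

Because both functions are generators, a consumer can print each line as it arrives instead of waiting for the whole input to be processed.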
       
   140 		
       
   141 		<p>
       
   142 			This rule is important not only from the performance point of view but also for user experience.
       
   143 			The user should see the output as soon as possible, i.e. long-running processes should produce results continuously instead of flushing everything at the end.
       
   144 			This is also good for debugging and <em>looking inside the things</em>. 
       
   145 		</p>
       
   146 		
   127 		<h2>Unambiguity</h2>
   147 		<h2>Unambiguity</h2>
   128 		
   148 		
   129 		<p>
   149 		<p>
   130 			There should be only one way to represent a single value.
   150 			There should be only one way to represent a single value.
   131 			For example the booleans can be written as <code>00</code> (false) or <code>01</code> (true) and every other value (<code>02..FF</code>) should be invalid/unsupported.
   151 			For example the booleans can be written as <code>00</code> (false) or <code>01</code> (true) and every other value (<code>02..FF</code>) should be invalid/unsupported.