relpipe-data/principles.xml
branch v_0
changeset 148 d51787006954
parent 147 c004a45502b3
child 150 7d7d4e1f293f
147:c004a45502b3 148:d51787006954
     5 	<nadpis>Principles</nadpis>
     5 	<nadpis>Principles</nadpis>
     6 	<perex>Basic ideas, principles and rules behind the Relational pipes</perex>
     6 	<perex>Basic ideas, principles and rules behind the Relational pipes</perex>
     7 	<pořadí>12</pořadí>
     7 	<pořadí>12</pořadí>
     8 
     8 
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
    10 		<p>
    10 		
    11 			The world is relational!
    11 		<h2>Sane software</h2>
    12 		</p>
    12 		<p>
       
    13 			<m:name/> (both the specification and the reference implementation) should be developed according to the <a href="https://sane-software.globalcode.info/">Sane software manifesto</a> (not yet published).
       
    14 			Many of the principles mentioned below are part of <em>being sane</em>.
       
    15 		</p>
       
    16 		
       
    17 		<h2>Free software and open specification</h2>
       
    18 		
       
    19 		<p>
       
    20 			<m:name/> is and always will be <a href="https://www.gnu.org/philosophy/free-sw.html">free software</a> and the specification of the format, tools and libraries will be open.
       
    21 			It must not be impaired by software patents or other similar restrictions.
       
    22 			In our country, we do not accept the existence of patents at all.
       
    23 		</p>
       
    24 		
       
    25 		<h2>Divide and conquer</h2>
       
    26 		<p>
       
    27 			Each program should do one thing and do it well. We should separate these three tasks:
       
    28 		</p>
       
    29 		
       
    30 		<ul>
       
    31 			<li>data acquisition / creation</li>
       
    32 			<li>data transformation</li>
       
    33 			<li>data presentation</li>
       
    34 		</ul>
       
    35 		
       
    36 		<p>
       
    37 			A single program should not combine two or more of these tasks, or should at least offer a mode that performs only one of them.
       
    38 			Thus we should be able to combine various programs together and get various presentations of the same data regardless of the presentation features of the program that created the data.
       
    39 			We should be able to add another transformation on the path between the data origin and the data destination, for example to filter out some unwanted data or to modify or enhance the values.
       
    40 			Or we should be able to generate some mock/testing data and pass it through the original pipeline (sequence of transformations and the output filter) instead of the live data.
       
    41 			We should be free in how we combine the tools together.
       
    42 			We should even be able to build pipelines that were not anticipated by the authors of the particular tools we use.
       
    43 		</p>
       
    44 		
       
    45 		<p>
       
    46 			Authors should focus on their task only (e.g. <em>interaction with the kernel and capturing the inotify events</em>) and should not have to bother with the presentation of the captured data.
       
    47 			There might be many output formats that make sense (CSV, XML, table, YAML, \0 separated values etc.),
       
    48 			but we should keep it <abbr title="Don't repeat yourself">DRY</abbr> and not implement every format in every tool.
       
    49 			It would be a waste of time and also a source of errors, because when we develop an additional format only <em>by the way</em> (it is not our core business), we would probably do it wrong.
       
    50 		</p>
       
    51 		
       
    52 		
       
    53 		<h2>Inputs, outputs and transformations as reusable libraries</h2>
       
    54 		
       
    55 		<p>
       
    56 			Parts of the <m:name/> implementation might be used as a library instead of as a filter in a pipeline.
       
    57 			This is not a primary purpose of our software, but sometimes it might be useful.
       
    58 			In such a scenario, the data are never serialized in the <m:name/> format but flow through a single process and its method/function calls.
       
    59 			For instance, if we need a tabular or CSV output in our program, we could adopt the code from the <m:name/> implementation as a library and call it internally without generating data in the <m:name/> format.
       
    60 			This might bring some performance benefits.
       
    61 		</p>
       
    62 		
       
    63 		<p>
       
    64 			This is not a recommended approach, but should be possible.
       
    65 		</p>
       
    66 		
       
    67 		<p>
       
    68 			However, in any case, we should also provide an option of producing <em>raw</em> data in the <m:name/> format and allow others to convert it to any other format according to their needs.
       
    69 		</p>
       
    70 		
       
    71 		<h2>Specification-first, contract-first</h2>
       
    72 		
       
    73 		<p>
       
    74 			The starting point for any developer should be the <m:a href="specification">specification</m:a> that defines the contract and the interface between the system components.
       
    75 			It should cover the data format and also the tools (inputs, transformers and outputs).
       
    76 			The specification must be verified by creating a reference implementation in at least one programming language.
       
    77 		</p>
       
    78 		
       
    79 		<h2>Small code footprint and modular design</h2>
       
    80 		
       
    81 		<p>
       
    82 			The length of the program measured in source lines of code (SLOC) should be as small as possible.
       
    83 			Of course, the goal is not putting multiple statements on a single line.
       
    84 			We should avoid unnecessary complexity (see <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">Cyclomatic complexity</a>; SLOC, however, are easier to count and also give quite relevant information).
       
    85 		</p>
       
    86 		
       
    87 		<p>
       
    88 			Modular design allows users to include (download, compile, run) only the portions of software they need.
       
    89 			If the user needs e.g. regular expressions and XML output to be happy, they should not be forced to also include the code for CSV, YAML, JSON and PDF.
       
    90 		</p>
       
    91 		
       
    92 		<p>
       
    93 			Sane software is minimalistic in this way, which means that it is easy to audit, debug or modify.
       
    94 			Looking for a bug (or even a backdoor) or looking for the place to add a new feature
       
    95 			is much easier in software that has hundreds or thousands of SLOC than in software consisting of hundreds of thousands or even millions of SLOC.
       
    96 		</p>
       
    97 		
       
    98 		<p>
       
    99 			A developer who wants to generate relational data (or consume it on the other side) should need to include only a few hundred SLOC.
       
   100 			This is the amount of code that could be read through in an hour or two.
       
   101 			<!--
       
   102 			Thus implementing the relational output in an existing program should be a matter of a few hours.
       
   103 			-->
       
   104 		</p>
       
   105 		
       
   106 		
       
   107 		<h2>Sane dependencies</h2>
       
   108 		
       
   109 		<p>
       
   110 			The libraries and the tools should not depend on any libraries other than the standard library of the given programming language.
       
   111 			In the best case, of course.
       
   112 			This might be in conflict with the previous rule, and then the question is which is the lesser harm.
       
   113 			It definitely makes no sense to write e.g. XML or YAML parser ourselves as a part of our tool.
       
   114 			Using a high-quality and well-tested library is the only sane option.
       
   115 			But what about XML output? We can develop a reliable XML generator in a few lines of code because we can implement only the subset of the standard that we need.
       
   116 			Writing such code is much more sane than including some bulky library that has several orders of magnitude more lines of code than our program.
       
   117 		</p>
       
   118 		
       
   119 		<h2>Concise data serialization</h2>
       
   120 		
       
   121 		<p>
       
   122 			The <m:name/> data format should be concise: the data should be represented by a reasonably small number of bytes.
       
   123 			The format should support large amounts of small values and also sparse data (structures with many NULL/missing values) without wasting too much space.
       
   124 			The data that are not written don't need to be compressed and thus have the best compression ratio.
       
   125 		</p>
       
   126 		
       
   127 		<h2>Unambiguity</h2>
       
   128 		
       
   129 		<p>
       
   130 			There should be only one way to represent a single value.
       
   131 			For example, booleans can be written only as <code>00</code> (false) or <code>01</code> (true), and every other value (<code>02..FF</code>) should be invalid/unsupported.
       
   132 			Exceptions might occur if there are relevant reasons, but they should be rare.
       
   133 		</p>
       
   134 		
       
   135 		
       
   136 		<h2>Multiple files concatenation</h2>
       
   137 		
       
   138 		<p>
       
   139 			It should be possible to concatenate multiple files or streams of relational data as easily as we can concatenate multiple text files
       
   140 			(given that such text files have the same character encoding, have no BOM at the beginning and have a newline at the end).
       
   141 			If we can do:
       
   142 		</p>
       
   143 		
       
   144 		<m:pre jazyk="bash"><![CDATA[
       
   145 (cat file-1.txt; echo "some additional middle data"; cat file-2.txt) | wc -l
       
   146 ]]></m:pre>
       
   147 		
       
   148 		<p>
       
   149 			We should also be able to do:
       
   150 		</p>
       
   151 		
       
   152 		<m:pre jazyk="bash"><![CDATA[
       
   153 (cat file-1.rp; relpipe-in-fstab; cat file-2.rp) | relpipe-out-xml
       
   154 ]]></m:pre>
       
   155 
       
   156 		<p>
       
   157 			Also, it should be possible to append (<code>&gt;&gt;</code>) new records to the last relation without modifying the already written data.
       
   158 		</p>
       
   159 		
       
   160 		<h2>Work primarily with STDIO</h2>
       
   161 		
       
   162 		<p>
       
   163 			The tools should work primarily and by default with the standard input and standard output (STDIN and STDOUT).
       
   164 			Reading/writing from/to files or network should be (if present) a secondary and optional scenario.
       
   165 		</p>
       
   166 		
       
   167 		<p>
       
   168 			Standard error output (STDERR) should be used for errors/warnings/logs. By default, nothing should be written to it if everything goes well.
       
   169 		</p>
       
   170 		
       
   171 		<h2>Tools might be TTY-aware</h2>
       
   172 		
       
   173 		<p>
       
   174 			The input and output tools processing relational data might adapt their behaviour depending on whether their input or output, respectively, is a terminal (TTY).
       
   175 		</p>
       
   176 		<p>
       
   177 			If the output is a TTY, it means that the output is displayed to the user, 
       
   178 			so the tool might e.g. colorize its output or do some other human-friendly formatting – 
       
   179 			which makes no sense if the output is directed to a file or piped to another program.
       
   180 			Example:
       
   181 		</p>
       
   182 		
       
   183 		<m:pre jazyk="bash"><![CDATA[
       
   184 # This would print a table with fancy colors using ANSI sequences:
       
   185 relpipe-in-fstab | relpipe-out-tabular
       
   186 			
       
   187 # This would store the same table in a file but without any colors:
       
   188 relpipe-in-fstab | relpipe-out-tabular > table.txt]]></m:pre>
       
   189 		
       
   190 		<p>
       
   191 			If the input is a TTY, it means that the user is typing the values.
       
   192 			In such a situation, the tool might accept another input format (text, human-friendly) or use some default file location instead.
       
   193 			Example:
       
   194 		</p>
       
   195 		
       
   196 		<m:pre jazyk="bash"><![CDATA[
       
   197 # This would read the /etc/fstab (which is the default location):
       
   198 relpipe-in-fstab | relpipe-out-tabular
       
   199 
       
   200 # These would read the /etc/mtab instead:
       
   201 cat /etc/mtab | relpipe-in-fstab | relpipe-out-tabular
       
   202 relpipe-in-fstab < /etc/mtab | relpipe-out-tabular]]></m:pre>
       
   203 
       
   204 		<p>
       
   205 			However, the behaviour should change only in a visual and predictable manner.
       
   206 			It should not e.g. switch from XML to YAML.
       
   207 		</p>
       
   208 		
       
   209 		<h2>Use --long-options</h2>
       
   210 		
       
   211 		<p>
       
   212 			Tools should accept arguments (if any) as <code>--long-options</code>.
       
   213 			When looking at a script, it should be clear – at first sight – what it does.
       
   214 			This would not be the case if cryptic short options like <code>-a -x -Z</code> were used.
       
   215 			In order to save our keyboards, there are features like <em>Bash completion</em>.
       
   216 		</p>
       
   217 		
       
   218 		
       
   219 		<h2>Fail-fast, be strict</h2>
       
   220 		
       
   221 		<p>
       
   222 			Because the relational data will be created by machines instead of being manually typed by erring humans,
       
   223 			we should fail fast on an error. We should be strict and require valid inputs only.
       
   224 			Any error should be revealed as soon as possible and fixed.
       
   225 		</p>
       
   226 		
       
   227 		<p>
       
   228 			There might be tools or options for recovering corrupted data (caused e.g. by a failing HDD, a faulty network or buggy software).
       
   229 			But the recovery mode is not the default one.
       
   230 		</p>
       
   231 		
       
   232 		<p>
       
   233 			We demand reliable systems – not random and accidental behaviour caused by software guessing <em>What might these bytes probably mean?</em>
       
   234 		</p>
       
   235 		
       
   236 		
       
   237 		
       
   238 		
       
   239 		
       
   240 		<h2></h2>
       
   241 		<h2></h2>
       
   242 		<h2></h2>
       
   243 		<h2></h2>
       
   244 		
    13 	</text>
   245 	</text>
    14 
   246 
    15 </stránka>
   247 </stránka>