<nadpis>Principles</nadpis>
<perex>Basic ideas, principles and rules behind the Relational pipes</perex>
<pořadí>12</pořadí>

<text xmlns="http://www.w3.org/1999/xhtml">
<text xmlns="http://www.w3.org/1999/xhtml">
<p>
The world is relational!
</p>

<h2>Sane software</h2>
<p>
<m:name/> (both the specification and the reference implementation) should be developed according to the <a href="https://sane-software.globalcode.info/">Sane software manifesto</a> (not yet published).
Many of the principles mentioned below are part of <em>being sane</em>.
</p>
|
<h2>Free software and open specification</h2>

<p>
<m:name/> is and always will be <a href="https://www.gnu.org/philosophy/free-sw.html">free software</a>, and the specification of the format, tools and libraries will be open.
It must not be impaired by software patents or other similar restrictions.
In our country, we do not accept the existence of patents at all.
</p>
|
<h2>Divide and conquer</h2>
<p>
Each program should do one thing and do it well. We should separate these three tasks:
</p>

<ul>
<li>data acquisition / creation</li>
<li>data transformation</li>
<li>data presentation</li>
</ul>

<p>
A single program should not combine two or more of these tasks, or it should at least offer a mode in which it does only one of them.
Thus we should be able to combine various programs and get various presentations of the same data, regardless of the presentation features of the program that created the data.
We should be able to add another transformation on the path between the data origin and the data destination, for example to filter out unwanted data or to modify or enhance the values.
Or we should be able to generate some mock/testing data and pass it through the original pipeline (the sequence of transformations and the output filter) instead of the live data.
We should be free in how we combine the tools.
We should even be able to build pipelines that were not anticipated by the authors of the particular tools we used.
</p>
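
<p>
The separation of the three tasks can be sketched in plain shell.
This is only an illustration under stated assumptions: the function names are hypothetical and tab-separated text merely stands in for the relational data, so none of this is part of <m:name/> itself.
</p>

<m:pre jazyk="bash"><![CDATA[
# Data creation: mock records (device, filesystem type), no presentation logic.
generate_mock_data() {
	printf '%s\t%s\n' /dev/sda1 ext4 /dev/sdb1 xfs tmpfs tmpfs
}

# Data transformation: keep only the records that describe real devices.
keep_real_devices() {
	grep '^/dev/'
}

# Data presentation: the only stage that knows how the output should look.
present_as_sentences() {
	awk -F '\t' '{ printf "%s is formatted as %s\n", $1, $2 }'
}

generate_mock_data | keep_real_devices | present_as_sentences
]]></m:pre>

<p>
Any stage can be replaced (e.g. live data instead of the mock generator, or another presentation) without touching the other two.
</p>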
|
<p>
Authors should focus on their task only – e.g. <em>interaction with the kernel and capturing the inotify events</em> – and should not have to care about the presentation of the captured data.
There might be many output formats that make sense (CSV, XML, table, YAML, \0 separated values etc.),
but we should keep it <abbr title="Don't repeat yourself">DRY</abbr> and not implement every format in every tool.
It would be a waste of time and also a source of errors: when we develop an additional format (which is not our core business) only <em>as a side task</em>, we will probably do it wrong.
</p>
|

<h2>Inputs, outputs and transformations as reusable libraries</h2>

<p>
Parts of the <m:name/> implementation might be used as a library instead of as a filter in a pipeline.
This is not a primary purpose of our software, but sometimes it might be useful.
In such a scenario, the data are never serialized in the <m:name/> format but flow through a single process and its method/function calls.
For instance, if we need tabular or CSV output in our program, we could adopt the code from the <m:name/> implementation as a library and call it internally without generating data in the <m:name/> format.
This might bring some performance benefits.
</p>

<p>
This is not a recommended approach, but it should be possible.
</p>

<p>
However, in any case, we should also provide an option for producing <em>raw</em> data in the <m:name/> format and allow others to convert it to any other format according to their needs.
</p>
|
<h2>Specification-first, contract-first</h2>

<p>
The starting point for any developer should be the <m:a href="specification">specification</m:a> that defines the contract and the interface between the system components.
It should cover the data format and also the tools (inputs, transformers and outputs).
The specification must be verified by creating a reference implementation in at least one programming language.
</p>
|
<h2>Small code footprint and modular design</h2>

<p>
The length of the program measured in source lines of code (SLOC) should be as small as possible.
Of course, the goal is not to put multiple statements on a single line.
We should avoid unnecessary complexity (see <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">Cyclomatic complexity</a> – but SLOC are easier to count and also give quite relevant information).
</p>

<p>
Modular design allows users to include (download, compile, run) only the portions of software they need.
If the user needs e.g. regular expressions and XML output to be happy, they should not be forced to also include the code for CSV, YAML, JSON and PDF.
</p>

<p>
Sane software is minimalistic in this way, which means that it is easy to audit, debug or modify.
Looking for a bug (or even a backdoor) or looking for the place where a new feature should be added
is much easier in software that has hundreds or thousands of SLOC than in software consisting of hundreds of thousands or even millions of SLOC.
</p>

<p>
A developer who wants to generate (or, on the other side, consume) relational data should need to include only a few hundred SLOC.
This is an amount of code that can be read through in an hour or two.
<!--
Thus implementing the relational output to an existing program should be a matter of a few hours.
-->
</p>
|

<h2>Sane dependencies</h2>

<p>
In the best case, the libraries and the tools should not depend on any libraries other than the standard library of the given programming language.
This might be in conflict with the previous rule, and then the question is which is the lesser harm.
It definitely makes no sense to write e.g. an XML or YAML parser ourselves as a part of our tool.
Using a high-quality and well-tested library is the only sane option.
But what about XML output? We can develop a reliable XML generator in a few lines of code because we can implement only the subset of the standard that we need.
Writing such code is much more sane than including some bulky library that has several orders of magnitude more lines of code than our program.
</p>
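
<p>
To illustrate how small such an output-only subset can be, here is a minimal sketch in shell.
It is not the code of any <m:name/> tool: the function names and the <code>values</code>/<code>value</code> element names are hypothetical,
and it assumes that the input is line-oriented text without control characters that XML forbids.
</p>

<m:pre jazyk="bash"><![CDATA[
# Escape the characters that are significant in XML text content.
# The ampersand must be handled first, otherwise the other entities would be escaped twice.
xml_escape() {
	sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Wrap each input line in a <value> element inside a single <values> root.
lines_to_xml() {
	printf '<values>\n'
	xml_escape | while IFS= read -r line; do
		printf '\t<value>%s</value>\n' "$line"
	done
	printf '</values>\n'
}

printf 'a<b & c\n' | lines_to_xml
]]></m:pre>

<p>
A parser for the same format would be a very different task, which is why a well-tested library remains the sane option there.
</p>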
|
<h2>Concise data serialization</h2>

<p>
The <m:name/> data format should be concise – the data should be represented by a reasonably small number of bytes.
The format should support large amounts of small values and also sparse data (structures with many NULL/missing values) without wasting too much space.
The data that are not written don't need to be compressed and thus have the best compression ratio.
</p>
|
<h2>Unambiguity</h2>

<p>
There should be only one way to represent a single value.
For example, booleans can be written as <code>00</code> (false) or <code>01</code> (true), and every other value (<code>02..FF</code>) should be invalid/unsupported.
Exceptions might occur if there are relevant reasons, but they should be rare.
</p>
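
<p>
The boolean rule can be sketched as a strict decoder.
The function name is hypothetical and the byte is passed as two hexadecimal digits only for readability; a real implementation would work with the binary value.
</p>

<m:pre jazyk="bash"><![CDATA[
# Accept exactly one encoding per value; everything else is an error.
decode_boolean() {
	case "$1" in
		00) printf 'false\n' ;;
		01) printf 'true\n' ;;
		*)
			printf 'invalid boolean byte: %s\n' "$1" >&2
			return 1
			;;
	esac
}

decode_boolean 01
]]></m:pre>

<p>
A lenient decoder that treated every non-zero byte as true would allow 255 different representations of the same value.
</p>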
|

<h2>Multiple files concatenation</h2>

<p>
It should be possible to concatenate multiple files or streams of relational data as easily as we can concatenate multiple text files
(given that such text files have the same character encoding, have no BOM at the beginning and have a newline at the end).
If we can do:
</p>

<m:pre jazyk="bash"><![CDATA[
(cat file-1.txt; echo "some additional middle data"; cat file-2.txt) | wc -l
]]></m:pre>

<p>
We should also be able to do:
</p>

<m:pre jazyk="bash"><![CDATA[
(cat file-1.rp; relpipe-in-fstab; cat file-2.rp) | relpipe-out-xml
]]></m:pre>

<p>
Also, it should be possible to append (<code>>></code>) new records to the last relation without modifying the already written data.
</p>
|
<h2>Work primarily with STDIO</h2>

<p>
The tools should work primarily and by default with the standard input and standard output (STDIN and STDOUT).
Reading/writing from/to files or the network should be (if present) a secondary and optional scenario.
</p>

<p>
Standard error output (STDERR) should be used for errors/warnings/logs. By default, a tool should not write anything to it if everything goes well.
</p>
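
<p>
A minimal sketch of a filter that follows this convention is shown below.
The function name is hypothetical and plain text lines stand in for the relational data.
</p>

<m:pre jazyk="bash"><![CDATA[
# Data flow from STDIN to STDOUT, diagnostics go to STDERR.
# The body runs in a subshell, so its variables do not leak into the caller.
drop_empty_lines() (
	dropped=0
	while IFS= read -r line; do
		if [ -n "$line" ]; then
			printf '%s\n' "$line"
		else
			dropped=$((dropped + 1))
		fi
	done
	# Silent when everything goes well; warnings never mix with the data.
	if [ "$dropped" -gt 0 ]; then
		printf 'warning: dropped %d empty line(s)\n' "$dropped" >&2
	fi
)

# Files are attached by the shell, not by the filter itself:
#   drop_empty_lines < input.txt > output.txt 2> warnings.log
]]></m:pre>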
|
<h2>Tools might be TTY-aware</h2>

<p>
The input and output tools processing relational data might adapt their behaviour depending on whether their input or output, respectively, is a terminal (TTY).
</p>
<p>
If the output is a TTY, it means that the output is displayed to the user,
so the tool might e.g. colorize its output or do some other human-friendly formatting –
which makes no sense if the output is directed to a file or piped to another program.
Example:
</p>

<m:pre jazyk="bash"><![CDATA[
# This would print a table with fancy colors using ANSI sequences:
relpipe-in-fstab | relpipe-out-tabular

# This would store the same table in a file but without any colors:
relpipe-in-fstab | relpipe-out-tabular > table.txt]]></m:pre>

<p>
If the input is a TTY, it means that the user is typing the values.
In such a situation, the tool might accept another input format (textual, human-friendly) or use some default file location instead.
Example:
</p>

<m:pre jazyk="bash"><![CDATA[
# This would read the /etc/fstab (which is the default location):
relpipe-in-fstab | relpipe-out-tabular

# These would read the /etc/mtab instead:
cat /etc/mtab | relpipe-in-fstab | relpipe-out-tabular
relpipe-in-fstab < /etc/mtab | relpipe-out-tabular]]></m:pre>

<p>
However, the behaviour should be modified only in a visual and predictable manner.
It should not e.g. switch from XML to YAML.
</p>
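
<p>
In a shell script, the TTY check itself is a single test on the file descriptor.
The following sketch uses hypothetical function names and is not taken from any <m:name/> tool; it changes only the decoration, never the content.
</p>

<m:pre jazyk="bash"><![CDATA[
# True if the standard output (file descriptor 1) is a terminal.
output_is_tty() {
	[ -t 1 ]
}

# Print a heading in bold on a terminal and as plain text otherwise.
print_heading() {
	if output_is_tty; then
		printf '\033[1m%s\033[0m\n' "$1"
	else
		printf '%s\n' "$1"
	fi
}

print_heading 'Mounted filesystems'
]]></m:pre>

<p>
The same test with descriptor 0 (<code>[ -t 0 ]</code>) tells whether the input comes from a terminal.
</p>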
|
<h2>Use --long-options</h2>

<p>
Tools should accept arguments (if any) as <code>--long-options</code>.
When looking at a script, it should be clear – at first sight – what it does.
That would not be the case if cryptic short options like <code>-a -x -Z</code> were used.
In order to save our keyboards, there are features like <em>Bash completion</em>.
</p>
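
<p>
Parsing long options does not require any library. A minimal sketch follows;
the options <code>--separator</code> and <code>--verbose</code> are hypothetical names chosen for this illustration and are not options of any existing tool.
</p>

<m:pre jazyk="bash"><![CDATA[
# Parse the arguments into the variables "separator" and "verbose".
# Unknown or incomplete options are rejected instead of being silently ignored.
parse_options() {
	separator=','
	verbose=false
	while [ "$#" -gt 0 ]; do
		case "$1" in
			--separator)
				if [ "$#" -lt 2 ]; then
					printf 'missing value for --separator\n' >&2
					return 2
				fi
				separator=$2
				shift 2
				;;
			--verbose)
				verbose=true
				shift
				;;
			*)
				printf 'unknown option: %s\n' "$1" >&2
				return 2
				;;
		esac
	done
}

parse_options --separator ';' --verbose
]]></m:pre>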
|

<h2>Fail-fast, be strict</h2>

<p>
Because the relational data will be created by machines instead of being manually typed by erring humans,
we should fail fast on an error. We should be strict and require valid inputs only.
Any error should be revealed as soon as possible and fixed.
</p>

<p>
There might be tools or options for recovering corrupted data (caused e.g. by a failing HDD, a faulty network or buggy software).
But the recovery mode is not the default one.
</p>

<p>
We demand reliable systems – not random and accidental behaviour caused by software guessing <em>What might these bytes mean?</em>
</p>
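
<p>
A strict reader can be sketched as follows.
The function name and the two-field, tab-separated record layout are assumptions made for this illustration; they do not describe the <m:name/> format.
</p>

<m:pre jazyk="bash"><![CDATA[
# Pass valid records through and stop at the first malformed one.
# No guessing, no skipping: the error is reported on STDERR and the exit status is non-zero.
validate_records() {
	awk -F '\t' '
		NF != 2 {
			printf "line %d: expected 2 fields, found %d\n", NR, NF > "/dev/stderr"
			exit 1
		}
		{ print }
	'
}

printf 'a\tb\n' | validate_records
]]></m:pre>

<p>
A recovery tool could use the same check to skip or repair the broken records, but only when the user explicitly asks for it.
</p>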
|
</text>
</text>

</stránka>
</stránka>