author | František Kučera <franta-hg@frantovo.cz> |
Mon, 27 Jul 2020 17:51:53 +0200 | |
branch | v_0 |
changeset 310 | aeda3cb4528d |
parent 231 | ea49ee7a73c9 |
child 321 | e32e2e308de4 |
permissions | -rw-r--r-- |
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

	<nadpis>Principles</nadpis>
	<perex>Basic ideas, principles and rules behind the Relational pipes</perex>
	<pořadí>12</pořadí>

	<text xmlns="http://www.w3.org/1999/xhtml">

		<h2>Sane software</h2>
		<p>
			<m:name/> (both the specification and the reference implementation) should be developed according to the <a href="https://sane-software.globalcode.info/">Sane software manifesto</a> (draft).
			Many of the principles mentioned below are part of <em>being sane</em>.
		</p>

		<h2>Free software and open specification</h2>

		<p>
			<m:name/> is and always will be <a href="https://www.gnu.org/philosophy/free-sw.html">free software</a>, and the specification of the format, tools and libraries will be open.
			It must not be impaired by software patents or other similar restrictions.
			In our country, we do not accept the existence of patents at all.
		</p>

		<h2>Divide and conquer</h2>
		<p>
			Each program should do one thing and do it well. We should separate these three tasks:
		</p>

		<ul>
			<li>data acquisition / creation</li>
			<li>data transformation</li>
			<li>data presentation</li>
		</ul>

		<p>
			A single program should not combine two or more of these tasks, or it should at least offer a mode in which it performs only one of them.
			Thus we should be able to combine various programs and get various presentations of the same data, regardless of the presentation features of the program that created the data.
			We should be able to add another transformation on the path between the data origin and the data destination, for example to filter out unwanted data or to modify or enhance the values.
			We should also be able to generate mock/testing data and pass it through the original pipeline (the sequence of transformations and the output filter) instead of the live data.
			We should be free in how we combine the tools together.
			We should even be able to build pipelines that were not anticipated by the authors of the particular tools we use.
		</p>

		<p>
			Authors should focus on their task only – e.g. <em>interaction with the Kernel and capturing the inotify events</em> – and should not have to worry about the presentation of the captured data.
			There might be many output formats that make sense (CSV, XML, table, YAML, \0 separated values etc.),
			but we should keep it <abbr title="Don't repeat yourself">DRY</abbr> and not implement every format in every tool.
			It would be a waste of time and also a source of errors, because an additional format that is not our core business and is implemented only <em>by the way</em> would probably be done wrong.
		</p>
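		<p>
			The separation of acquisition, transformation and presentation described above can be sketched with standard POSIX tools alone.
			This is only an illustration under stated assumptions: the function names <code>acquire</code>, <code>transform</code> and <code>present</code> are invented here, the data are hard-coded, and tab-separated text merely stands in for a real data format (it is not the <m:name/> format).
		</p>

```shell
#!/bin/sh
# Hypothetical illustration only: tab-separated text stands in for a real format.

# Data acquisition: emits "name TAB size" records (hard-coded mock data).
acquire() {
	printf 'alpha\t10\n'
	printf 'beta\t250\n'
	printf 'gamma\t75\n'
}

# Data transformation: keeps only records whose size is at least the given limit.
transform() {
	awk -F '\t' -v limit="$1" '$2 >= limit'
}

# Data presentation: renders each record as a human-readable line.
present() {
	awk -F '\t' '{ printf "%s has size %s\n", $1, $2 }'
}

# Each stage can be replaced or extended without touching the others.
acquire | transform 50 | present
```

		<p>
			Because each stage only reads and writes a stream, another transformation or a different presentation could be swapped in without changing the acquisition step.
		</p>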


		<h2>Inputs, outputs and transformations as reusable libraries</h2>

		<p>
			Parts of the <m:name/> implementation might be used as a library instead of as a filter in a pipeline.
			This is not a primary purpose of our software, but sometimes it might be useful.
			In such a scenario, the data are never serialized in the <m:name/> format but flow through a single process and its method/function calls.
			For instance, if we need a tabular or CSV output in our program, we could adopt the code from the <m:name/> implementation as a library and call it internally without generating data in the <m:name/> format.
			This might bring some performance benefits.
		</p>

		<p>
			This is not a recommended approach, but it should be possible.
		</p>

		<p>
			However, in any case, we should also provide an option to produce <em>raw</em> data in the <m:name/> format and allow others to convert it to any other format according to their needs.
		</p>

		<h2>Specification-first, contract-first</h2>

		<p>
			The starting point for any developer should be the <m:a href="specification">specification</m:a> that defines the contract and the interface between the system components.
			It should cover the data format and also the tools (inputs, transformers and outputs).
			The specification must be verified by creating a reference implementation in at least one programming language.
		</p>

		<h2>Small code footprint and modular design</h2>

		<p>
			The length of the program measured in source lines of code (SLOC) should be as small as possible.
			Of course, the goal is not to put multiple statements on a single line.
			We should avoid unnecessary complexity (see <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">Cyclomatic complexity</a> – but SLOC are easier to count and also give quite relevant information).
		</p>

		<p>
			Modular design allows users to include (download, compile, run) only the portions of software they need.
			If the user needs e.g. only regular expressions and XML output to be happy, he should not be forced to also include the code for CSV, YAML, JSON and PDF.
		</p>

		<p>
			Sane software is minimalistic in this way, which means that it is easy to audit, debug or modify.
			Looking for a bug (or even a backdoor) or looking for the place where a new feature should be added
			is much easier in software that has hundreds or thousands of SLOC than in software consisting of hundreds of thousands or even millions of SLOC.
		</p>

		<p>
			A developer who wants to generate relational data (or consume them on the other side) should need to include only a few hundred SLOC.
			This is an amount of code that can be read through in an hour or two.
			<!--
			Thus implementing the relational output to an existing program should be matter of few hours.
			-->
		</p>

		<h2>Optional complexity</h2>

		<p>
			We are not scared of things like XML, SQL, RDF, Java or even C++ and we do not hate them.
			There are use cases where their complexity is reasonable and makes sense.
			On the other hand, there are many scenarios where such complexity is not necessary or is even harmful.
			This leads us to the conclusion: <em>the complexity must be optional</em>.
		</p>
		<p>
			Thus the <m:name/> data format is independent of the above-mentioned <em>complex</em> technologies
			and our implementation is divided into many separate modules (tools),
			so the user can download, compile and run only the parts he really needs.
		</p>
		<p>
			<m:name/> can serve as a <em>bridge</em> between the <em>complex world</em> and the <em>simple world</em>.
		</p>


		<h2>Sane dependencies</h2>

		<p>
			In the best case, the libraries and the tools should not depend on any libraries other than the standard library of the given programming language.
			This might be in conflict with the small code footprint rule above, and then the question is which is the lesser harm.
			It definitely makes no sense to write e.g. an XML or YAML parser ourselves as a part of our tool.
			Using a high-quality and well-tested library is the only sane option.
			But what about XML output? We can develop a reliable XML generator in a few lines of code because we can implement only the subset of the standard that we need.
			Writing such code is much more sane than including some bulky library that has several orders of magnitude more lines of code than our program.
		</p>

		<h2>Concise data serialization</h2>

		<p>
			The <m:name/> data format should be concise – the data should be represented by a reasonably small number of bytes.
			The format should support large amounts of small values and also sparse data (structures with many NULL/missing values) without wasting too much space.
			Data that are never written do not need to be compressed and thus have the best compression ratio.
		</p>

		<h2>Streaming</h2>

		<p>
			Relational tools should process streams of data and should hold only the necessary data in memory,
			i.e. a tool should produce its output (the first record) as soon as possible, while still reading the input (the following records).
			Thus the memory usage does not depend on the volume of processed data.
		</p>

		<p>
			However, there are cases where such streaming is not feasible, e.g. if we need to compute some statistics or the column widths while printing a table in the terminal.
			In such a situation, we must read the whole relation and only then generate the output.
			But we should still be able to do streaming at the level of relations, i.e. if there are several relations, we always hold only one of them in memory.
		</p>

		<p>
			This rule is important not only from the performance point of view but also for the user experience.
			The user should see the output as soon as possible, i.e. long-running processes should produce results continuously instead of flushing everything at the end.
			This is also good for debugging and <em>looking inside the things</em>.
		</p>
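		<p>
			The difference between the streaming and the non-streaming case can be sketched in plain POSIX shell.
			The function names <code>number_records</code> and <code>max_width</code> are invented for this illustration, and one text line stands in for one record (this is not the <m:name/> format).
		</p>

```shell
#!/bin/sh
# Hypothetical illustration: one record per line, not the real relational format.

# Streaming transformation: each record is written as soon as it is read,
# so only the current line and a counter are held in memory.
number_records() {
	count=0
	while IFS= read -r record; do
		count=$((count + 1))
		printf '%d\t%s\n' "$count" "$record"
	done
}

# Non-streaming case: the widest value is known only after the whole input was read,
# so the result can be printed only at the end.
max_width() {
	awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

printf 'a\nbb\nccc\n' | number_records
printf 'a\nbb\nccc\n' | max_width
```

		<p>
			The first function can be placed in front of a slow or endless producer and still print records immediately, while the second must wait for the end of its input.
		</p>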

		<h2>Unambiguity</h2>

		<p>
			There should be only one way to represent a single value.
			For example, a boolean can be written as <code>00</code> (false) or <code>01</code> (true), and every other value (<code>02..FF</code>) should be invalid/unsupported.
			Exceptions might occur if there are relevant reasons, but they should be rare.
		</p>
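		<p>
			The boolean rule above can be sketched as a small shell function.
			The name <code>decode_boolean</code> is invented for this illustration, and the byte is assumed to be passed as two upper-case hexadecimal digits; the real encoding is defined by the specification, not by this sketch.
		</p>

```shell
#!/bin/sh
# Hypothetical helper: takes one byte written as two hexadecimal digits.
# Exactly one representation is accepted for each value; everything else is rejected.
decode_boolean() {
	case "$1" in
		00) echo false ;;
		01) echo true ;;
		*) return 1 ;;
	esac
}

decode_boolean 00
decode_boolean 01
decode_boolean 02 || echo "rejected"
```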


		<h2>Multiple files concatenation</h2>

		<p>
			It should be possible to concatenate multiple files or streams of relational data as easily as we can concatenate multiple text files
			(given that such text files have the same character encoding, have no BOM at the beginning and have a newline at the end).
			If we can do:
		</p>

		<m:pre jazyk="bash"><![CDATA[
(cat file-1.txt; echo "some additional middle data"; cat file-2.txt) | wc -l
]]></m:pre>

		<p>
			We should also be able to do:
		</p>

		<m:pre jazyk="bash"><![CDATA[
(cat file-1.rp; relpipe-in-fstab; cat file-2.rp) | relpipe-out-xml
]]></m:pre>

		<p>
			Also, it should be possible to append (<code>>></code>) new records to the last relation without modifying the already written data.
		</p>

		<h2>Work primarily with STDIO</h2>

		<p>
			The tools should work primarily and by default with the standard input and standard output (STDIN and STDOUT).
			Reading/writing from/to files or network should be (if present) a secondary and optional scenario.
		</p>

		<p>
			Standard error output (STDERR) should be used for errors/warnings/logs. By default, nothing should be written there if everything goes well.
		</p>
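		<p>
			A minimal sketch of a filter that follows this rule, written in POSIX shell.
			The function <code>to_upper</code> and its <code>--verbose</code> option are invented for this illustration and are not part of any <m:name/> tool.
		</p>

```shell
#!/bin/sh
# Hypothetical filter: upper-cases its standard input and writes it to standard output.
# Diagnostics go to standard error, and only when --verbose is requested,
# so a successful run is silent on STDERR by default.
to_upper() {
	verbose=false
	if [ "${1:-}" = "--verbose" ]; then
		verbose=true
	fi
	tr '[:lower:]' '[:upper:]'
	if [ "$verbose" = true ]; then
		echo "to_upper: finished" >> /dev/stderr
	fi
}

echo "relational pipes" | to_upper
```

		<p>
			Because the data and the diagnostics use separate streams, the output can be piped to another tool without log messages leaking into the data.
		</p>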

		<h2>Tools might be TTY-aware</h2>

		<p>
			The input and output tools processing relational data might adapt their behaviour depending on whether their input or output, respectively, is a terminal (TTY).
		</p>
		<p>
			If the output is a TTY, it means that the output is displayed to the user,
			so the tool might e.g. colorize its output or do some other human-friendly formatting –
			which makes no sense if the output is directed to a file or piped to another program.
			Example:
		</p>

		<m:pre jazyk="bash"><![CDATA[
# This would print a table with fancy colors using ANSI sequences:
relpipe-in-fstab | relpipe-out-tabular

# This would store the same table in a file but without any colors:
relpipe-in-fstab | relpipe-out-tabular > table.txt]]></m:pre>

		<p>
			If the input is a TTY, it means that the user is typing the values.
			In such a situation, the tool might accept another (textual, human-friendly) input format or use some default file location instead.
			Example:
		</p>

		<m:pre jazyk="bash"><![CDATA[
# This would read the /etc/fstab (which is the default location):
relpipe-in-fstab | relpipe-out-tabular

# These would read the /etc/mtab instead:
cat /etc/mtab | relpipe-in-fstab | relpipe-out-tabular
relpipe-in-fstab < /etc/mtab | relpipe-out-tabular]]></m:pre>

		<p>
			However, the behaviour should change only in a visual and predictable manner.
			It should not e.g. switch from XML to YAML.
		</p>
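		<p>
			How a tool might detect a terminal can be sketched with the standard <code>test -t</code> check in POSIX shell.
			The function <code>print_heading</code> is invented for this illustration; it does not describe how the <m:name/> tools are implemented.
		</p>

```shell
#!/bin/sh
# Hypothetical output helper: colorizes a heading only if standard output is a terminal.
print_heading() {
	if [ -t 1 ]; then
		# File descriptor 1 is a TTY: use ANSI sequences (bold green, then reset).
		printf '\033[1;32m%s\033[0m\n' "$1"
	else
		# Output goes to a file or a pipe: plain text only.
		printf '%s\n' "$1"
	fi
}

print_heading "filesystems"
```

		<p>
			Only the decoration changes here; the text itself is the same in both branches, which keeps the behaviour predictable.
		</p>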

		<h2>Use --long-options</h2>

		<p>
			Tools should accept arguments (if any) as <code>--long-options</code>.
			When looking at a script, it should be clear – at first sight – what it does.
			That would not be the case if cryptic short options like <code>-a -x -Z</code> were used.
			To spare our keyboards, there are features like <em>Bash completion</em>.
		</p>
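		<p>
			A minimal sketch of parsing long options in POSIX shell.
			The function <code>parse_options</code> and the options <code>--relation</code> and <code>--limit</code> are invented for this illustration and do not belong to any real <m:name/> tool.
		</p>

```shell
#!/bin/sh
# Hypothetical option parser; the option names are invented for this illustration.
parse_options() {
	relation=""
	limit="10"
	while [ "$#" -gt 0 ]; do
		case "$1" in
			--relation | --limit)
				# Every option requires exactly one value.
				if [ "$#" -lt 2 ]; then
					echo "missing value for $1" >> /dev/stderr
					return 1
				fi
				if [ "$1" = "--relation" ]; then relation="$2"; else limit="$2"; fi
				shift 2
				;;
			*)
				# Unknown (including short) options are rejected instead of guessed.
				echo "unknown option: $1" >> /dev/stderr
				return 1
				;;
		esac
	done
	echo "relation=$relation limit=$limit"
}

parse_options --relation filesystems --limit 5
```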


		<h2>Be exact and reliable</h2>

		<p>
			<m:name/> should convey data without corrupting or arbitrarily modifying them.
			Implementation details (e.g. how values are encoded in the stream) should affect neither the transferred data nor the user.
		</p>

		<h2>Fail-fast, be strict</h2>

		<p>
			Because the relational data will be created by machines instead of being manually typed by erring humans,
			we should fail fast on an error. We should be strict and require valid inputs only.
			Any error should be revealed as soon as possible and fixed.
		</p>

		<p>
			There might be tools or options for recovering corrupted data (caused e.g. by a failing HDD, a faulty network or buggy software).
			But the recovery mode is not the default one.
		</p>

		<p>
			We demand reliable systems – not random and accidental behaviour caused by software guessing <em>What might these bytes probably mean?</em>
		</p>
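		<p>
			A strict, fail-fast reader can be sketched with POSIX shell and awk.
			The function <code>read_strict</code> and its record layout (a name, a tab and a non-negative integer) are assumptions made for this illustration; they are not the <m:name/> format.
		</p>

```shell
#!/bin/sh
# Hypothetical strict reader: every record must be exactly "name TAB integer".
# The first invalid record stops processing with a non-zero exit status;
# nothing is guessed or silently repaired.
read_strict() {
	awk -F '\t' '
		NF != 2 || $2 !~ /^[0-9]+$/ {
			printf "invalid record on line %d\n", NR >> "/dev/stderr"
			exit 1
		}
		{ print }
	'
}

printf 'alpha\t10\nbeta\tten\n' | read_strict || echo "stopped at the first invalid record"
```

		<p>
			A separate recovery tool could be more tolerant, but it would have to be requested explicitly rather than being the default.
		</p>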

	</text>

</stránka>