author | František Kučera <franta-hg@frantovo.cz> |
Tue, 04 Dec 2018 22:34:19 +0100 | |
branch | v_0 |
changeset 181 | 72cc1a9dbfca |
parent 148 | d51787006954 |
child 183 | 82897ccc01ce |
permissions | -rw-r--r-- |
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

<nadpis>Relational pipes</nadpis>
<perex>Official homepage of Relational pipes.</perex>
<pořadí>10</pořadí>

<text xmlns="http://www.w3.org/1999/xhtml">
<p>
One of the great parts of the <m:unix/>
<m:podČarou><m:unix tvar="vysvětlivka"/></m:podČarou>
culture is the invention<m:podČarou>which is attributed to Doug McIlroy, see <a href="http://www.catb.org/~esr/writings/taoup/html/ch07s02.html#plumbing">The Art of Unix Programming: Pipes, Redirection, and Filters</a></m:podČarou>
of <em>pipes</em> and the idea<m:podČarou>see <a href="http://www.catb.org/~esr/writings/taoup/html/ch01s06.html">The Art of Unix Programming: Basics of the Unix Philosophy</a></m:podČarou>
that <em>one program should do one thing and do it well</em>.
</p>

<p>
Each running program (process) has one input stream (called standard input or STDIN), one output stream (called standard output or STDOUT) and one additional output stream for logging, errors and warnings (STDERR).
We can connect programs and pass the STDOUT of the first one to the STDIN of the second one (and so on) using pipes.
</p>
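<p>
The connection of two processes described above can be sketched in a few lines of Python using the standard <code>subprocess</code> module.
This is only an illustration: the function name <code>pipe_two</code> and the sample commands are our assumptions, and the STDERR of both processes is simply left attached to the parent process.
</p>

```python
import subprocess


def pipe_two(first_cmd, second_cmd):
    """Run the equivalent of `first_cmd | second_cmd` and return the final STDOUT as text.

    pipe_two is a hypothetical helper name used only for this illustration.
    """
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    # The STDOUT of the first process becomes the STDIN of the second one.
    second = subprocess.Popen(second_cmd, stdin=first.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the first process is notified if the second exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output.decode()


if __name__ == "__main__":
    # Equivalent of the shell pipeline: printf '...' | grep tmpfs
    sample = "tmpfs /tmp\\next4 /\\ntmpfs /run\\n"
    print(pipe_two(["printf", sample], ["grep", "tmpfs"]), end="")
```

<p>
A shell does the same job when we write <code>first | second</code>; the sketch only makes the plumbing visible.
</p>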
|
<p>
A classic pipeline example (<m:a href="classic-example">explained</m:a>):
</p>

<m:classic-example/>

<!--
<m:diagram orientace="vodorovně">
	node[shape=box];

	cat [label="cat /etc/fstab"];
	dd [];
	grep [label="grep tmpfs"];
	log [label="/tmp/dd.log"];

	cat -> dd [label="STDOUT → STDIN"];
	dd -> grep [label="STDOUT → STDIN"];
	dd -> log [label="STDERR → file"];
</m:diagram>
-->

<p>
According to this principle we can build complex and powerful programs (pipelines) by composing several simple, single-purpose and reusable programs.
Such single-purpose programs (often called <em>filters</em>) are much easier to create, test and optimize, and their authors don't have to worry about the complexity of the final pipeline.
They don't even have to know how their programs will be used by others in the future.
This is a great design principle that brings us flexibility, reusability, efficiency and reliability.
Whatever our role (author of a filter, builder of a pipeline etc.), we can always focus on our own task and do it well.<m:podČarou>see <a href="http://wiki.apidesign.org/wiki/Cluelessness">cluelessness</a> by Jaroslav Tulach in his <em>Practical API Design. Confessions of a Java Framework Architect</em></m:podČarou>
And we can collaborate with others even if we don't know about them and don't know that we are collaborating.
Now think about putting this together with the free software ideas... How very!
</p>
|
<!--
<m:diagram orientace="vodorovně">
	compound=true;
	node[shape=box];

	subgraph cluster_in {
		label = "Inputs:";
		cli;
		fstab;
	}

	subgraph cluster_tr {
		label = "Transformations:";
		grep;
		sed;
	}

	subgraph cluster_out {
		label = "Outputs:";
		xml;
		tabular;
		gui;
	}

	cli -> grep [ltail=cluster_in, lhead=cluster_tr];
	grep -> xml [ltail=cluster_tr, lhead=cluster_out];
	// cli -> xml [ltail=cluster_in, lhead=cluster_out];

</m:diagram>
-->


<p>
But the question is how the data passed through pipes should be formatted and structured.
There is a wide spectrum of options, from simple unstructured text files (just arrays of lines)
through various <abbr title="delimiter-separated values e.g. CSV separated by commas">DSV</abbr>
to formats like XML (YAML, JSON, ASN.1, Diameter, S-expressions etc.).
Simpler formats look tempting but have many problems and limitations (see the Pitfalls section in the <m:a href="classic-example">Classic pipeline example</m:a>).
On the other hand, the advanced formats are capable of representing arbitrary object tree structures or even arbitrary graphs.
They offer unlimited possibilities – and this is their strength and weakness at the same time.
</p>

<!--
<blockquote>Everything should be made as simple as possible, but not simpler.</blockquote>
-->

<p>
It is not about the shape of the brackets, apostrophes, quotes or text vs. binary.
It is not a technical question – it lies in the semantic layer and in the human brain.
Generic formats and their <em>arbitrary object trees/graphs</em> are (for humans, not for computers) difficult to understand and work with
– compared to simpler structures like arrays, maps or matrices.
</p>

<p>
This is the reason why we have chosen the relational model as our logical model.
This model dates back to 1969<m:podČarou>invented and described by Edgar F. Codd,
see <em>Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks, Research Report, IBM</em> from 1969
and <em>A Relational Model of Data for Large Shared Data Banks</em> from 1970,
see also <a href="https://en.wikipedia.org/wiki/Relational_model">Relational model</a>
</m:podČarou>
and over the decades it has proven its quality and viability.
This logical model is powerful enough to describe almost any data and – at the same time – it is still simple and easy for humans to understand.
</p>

<p>
Thus the <m:name/> are streams containing zero or more relations.
Each relation has a name, one or more attributes and zero or more records (tuples).
Each attribute has a name and a data-type.
Records contain attribute values.
We can imagine this stream as a sequence of tables (but the table is only one of many possible visual representations of such relational data).
</p>
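<p>
To make this logical model more tangible, here is a minimal Python sketch of a stream holding one relation.
It illustrates the model only, not the actual data format or any API of the reference implementation:
the class names, the type labels such as <code>"string"</code> and the helper <code>as_table</code> are all our assumptions.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str  # hypothetical type label, e.g. "string" or "integer"


@dataclass
class Relation:
    name: str
    attributes: list
    records: list = field(default_factory=list)

    def __post_init__(self):
        # Each relation has one or more attributes.
        if not self.attributes:
            raise ValueError("a relation needs one or more attributes")

    def add_record(self, *values):
        # A record (tuple) carries exactly one value per attribute.
        if len(values) != len(self.attributes):
            raise ValueError("a record needs one value per attribute")
        self.records.append(tuple(values))


def as_table(relation):
    """Render one relation as plain text; a table is only one possible view."""
    header = " | ".join(a.name for a in relation.attributes)
    rows = [" | ".join(str(v) for v in record) for record in relation.records]
    return "\n".join([relation.name, header] + rows)


# A stream is a sequence of zero or more relations.
fstab = Relation("fstab", [Attribute("device", "string"), Attribute("mount_point", "string")])
fstab.add_record("tmpfs", "/tmp")
stream = [fstab]

if __name__ == "__main__":
    for relation in stream:
        print(as_table(relation))
```

<p>
An empty list is a valid stream in this sketch, and a relation without records is valid too; only a relation without attributes is rejected.
</p>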

<h2>What are <m:name/>?</h2>

<p>
<m:name/> are an open <em>data format</em> designed for streaming structured data between two processes.
Simultaneously with the format specification, we are also developing a <em>reference implementation</em> (libraries and tools) as free software.
Although we believe in the specification-first (or contract-first) approach, we always check whether the theoretical concepts are feasible and whether they can be reasonably and reliably implemented.
So before publishing any new specification or its version, we will verify it by creating a reference implementation in at least one programming language.
</p>
<p>
More generally, <m:name/> are a philosophical continuation of the classic <m:unix/> pipelines and the relational model.
</p>


<h2>What are <m:name/> not?</h2>

<p>
<m:name/> respect the existing ecosystem and are an improvement or supplement rather than a replacement.
So the <m:name/> are not a:
</p>

<ul>
<li>Shell – we use existing shells (e.g. GNU Bash), work with any shell and even without a shell (e.g. as a stream format passed through a network or stored in a file).</li>
<li>Terminal emulator – same as with shells, we use existing terminals and we can use <m:name/> also outside any terminal; if we interact with the terminal, we use standard means like Unicode, ANSI escape sequences etc.</li>
<li>IDE – we can use standard <m:unix/> tools as an IDE (GNU Screen, Make etc.) or any other IDE.</li>
<li>Programming language – <m:name/> are a language-independent data format and can be produced or consumed in any programming language.</li>
<li>Query language – although some of our tools do queries, filtering or transformations, we are not inventing a new query language – instead, we use existing languages like SQL, XPath or regular expressions.</li>
<!--<li>Text editor – </li>-->
<li>Database system, DBMS – we focus on stream processing rather than data storage, although sometimes it makes sense to redirect data to a file and continue with the processing later.</li>
</ul>


<h2>Project status</h2>

<p>
The main ideas and the roadmap are quite clear, but many things will change (including the format internals and interfaces of the libraries and tools).
Because we understand how important API and ABI stability is, we are not ready to publish version 1.0 yet.
</p>
<p>
On the other hand, the already published tools (tagged as v0.x in the v_0 branch) should work quite well (should compile, should run, should not segfault often, should not wipe your hard drive or kill your cat),
so they might be useful for someone who likes our ideas and who is prepared to update their own programs and scripts when a new version is ready.
</p>


</text>

</stránka>