author | František Kučera <franta-hg@frantovo.cz> |
Thu, 06 Dec 2018 21:16:49 +0100 | |
branch | v_0 |
changeset 189 | 47907749817f |
parent 172 | 793aedbbe1c3 |
child 318 | 137f63652fa2 |
permissions | -rw-r--r-- |
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

	<nadpis>Classic pipeline example</nadpis>
	<perex>An explained example of a classic pipeline</perex>

	<text xmlns="http://www.w3.org/1999/xhtml">
		<p>
			Assume that we have a text file containing a list of animals and their properties:
		</p>

		<m:pre jazyk="text" src="animals.txt"/>

		<p>
			We can pass this file through a pipeline:
		</p>

		<m:classic-example/>

		<p>
			The individual steps of the pipeline are separated by the <code>|</code> (pipe) symbol.
			In the first step, we just read the file and print it on STDOUT.<m:podČarou>Of course, this is a <a href="http://porkmail.org/era/unix/award.html" title="Useless Use of Cat">UUoC</a>, but in examples, keeping the steps in their natural order makes them easier to read than a &lt; file redirection.</m:podČarou>
			In the second step, we keep only the dogs and get:
		</p>

		<pre><![CDATA[big yellow dog
small white dog]]></pre>

		<p>
			In the third step, we select the second <em>field</em> (fields are separated by spaces) and get the colors of our dogs:
		</p>

		<pre><![CDATA[yellow
white]]></pre>

		<p>
			In the fourth step, we translate the values to uppercase and get:
		</p>

		<pre><![CDATA[YELLOW
WHITE]]></pre>

		<p>
			So we have a list of the colors of our dogs, printed in capital letters.
			If we had several dogs of the same color, we could remove the duplicates simply by adding <code>| sort -u</code> to the pipeline (after the <code>cut</code> step).
		</p>
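
		<p>
			For illustration, the whole pipeline including the optional <code>sort -u</code> step might look like the sketch below.
			The exact commands are an assumption derived from the steps described above, and the sample rows written to a temporary file are made up for this example (they are not the real content of <code>animals.txt</code>).
		</p>

```shell
# Hypothetical sample data: one animal per line, fields separated by single spaces.
animals=$(mktemp)
printf '%s\n' \
	'big yellow dog' \
	'small white dog' \
	'large yellow dog' \
	'small black cat' > "$animals"

# read the file | keep the dogs | take the 2nd field | drop duplicates | uppercase
dog_colors=$(cat "$animals" | grep dog | cut -d ' ' -f 2 | sort -u | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$dog_colors"
rm -f "$animals"
```

		<p>
			With this sample data, the two yellow dogs collapse into a single YELLOW line, next to WHITE.
		</p>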

		<h2>The great parts</h2>

		<p>
			The authors of the <code>cat</code>, <code>grep</code>, <code>cut</code> and <code>tr</code> programs don't have to know anything about cats<m:podČarou>N.B. the cat in the command name is a different cat from the one in our text file.</m:podČarou> and dogs, or about our business domain.
			They can focus on their own tasks: reading files, filtering by regular expressions, extracting substrings and converting text. And they do it well, without being distracted by any animals.
		</p>

		<p>
			And we don't have to know anything about low-level programming in the C language, or compile anything.
			We simply build a pipeline in a shell (e.g. GNU Bash) from existing programs and focus on our business logic.
			And we do it well, without being distracted by any low-level issues.
		</p>

		<p>
			Each program used in the pipeline can be written in a different programming language, and they will still work together.
			Tools written in C, C++, Java, Lisp, Perl, Python, Rust or any other language can be combined.
			Thus the most suitable language can be used for each task.
		</p>

		<p>
			The pipeline is reusable, which is a big advantage over ad-hoc operations done in CLI or GUI tools.
			We can feed different data (with the same structure, of course) into the pipeline and get the desired result.
		</p>
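
		<p>
			One possible way to make the reuse explicit is to wrap the pipeline in a shell function.
			This is only a sketch: the function name <code>colors_of</code> and the sample rows are hypothetical, and the commands are assumed from the steps described earlier.
		</p>

```shell
# colors_of SPECIES [FILE...]: a hypothetical wrapper around the pipeline.
# Reads the named files, or STDIN when no file is given.
colors_of() {
	species=$1
	shift
	cat "$@" | grep "$species" | cut -d ' ' -f 2 | tr '[:lower:]' '[:upper:]'
}

# The same pipeline, asked two different questions about made-up data:
dogs=$(printf '%s\n' 'big yellow dog' 'small black cat' | colors_of dog)
cats=$(printf '%s\n' 'big yellow dog' 'small black cat' | colors_of cat)

printf '%s\n' "$dogs" "$cats"
```

		<p>
			Any input with the same three-field structure can be passed to the function, either on STDIN or as file names.
		</p>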

		<p>
			Individual steps in the pipeline can be added, removed or replaced.
			We can also debug the pipeline and check what each step produces (e.g. use <code>tee</code> to copy the intermediate output to a file, or execute just the first few steps of the pipeline).
		</p>
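
		<p>
			The debugging approach with <code>tee</code> could be sketched as follows.
			The file names <code>step2.txt</code> and <code>step3.txt</code>, the scratch directory and the sample rows are assumptions made for this example.
		</p>

```shell
# Hypothetical sample data and a scratch directory for the intermediate outputs.
debug_dir=$(mktemp -d)
printf '%s\n' 'big yellow dog' 'small white dog' 'small black cat' > "$debug_dir/animals.txt"

# tee copies each intermediate stream to a file and passes it on unchanged.
result=$(cat "$debug_dir/animals.txt" \
	| grep dog | tee "$debug_dir/step2.txt" \
	| cut -d ' ' -f 2 | tee "$debug_dir/step3.txt" \
	| tr '[:lower:]' '[:upper:]')

printf '%s\n' "$result"      # final output of the pipeline
cat "$debug_dir/step2.txt"   # what the grep step produced
cat "$debug_dir/step3.txt"   # what the cut step produced
```

		<p>
			The final result is the same as without <code>tee</code>; the two extra files only record what flowed between the steps.
		</p>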

		<h2>The pitfalls</h2>

		<p>
			This simple example looks quite flawless,
			but it is actually very brittle.
		</p>

		<p>
			What if we have a very big cat that can be described by this line in our file?
		</p>

		<pre>dog-sized red cat</pre>

		<p>The second step of the pipeline (<code>grep</code>) will include this record, and the final result will be:</p>

		<pre><![CDATA[RED
YELLOW
WHITE]]></pre>

		<p>This is a really unexpected and unwanted result. We don't have a RED dog; it got there just by accident. The same would happen if we had a monkey of a <em>doggish</em> color.</p>

		<p>
			This problem is caused by the fact that <code>grep dog</code> selects lines containing the string <em>dog</em> regardless of its position (first, second or third field).
			Sometimes we could avoid such problems with a slightly more complicated regular expression and/or by using Perl, but our pipeline would no longer be as simple and legible as before.
		</p>
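
		<p>
			As an illustration of such a more complicated regular expression, the sketch below anchors the pattern to the end of the line.
			It assumes that the species is always the last field; the sample rows are made up, with the problematic cat placed first.
		</p>

```shell
# Made-up sample data including the problematic record.
animals=$(mktemp)
printf '%s\n' 'dog-sized red cat' 'big yellow dog' 'small white dog' > "$animals"

# Naive filter: matches "dog" anywhere on the line, so the cat slips through.
naive=$(cat "$animals" | grep dog | cut -d ' ' -f 2 | tr '[:lower:]' '[:upper:]')

# Anchored filter: "dog" must be the whole last field (space before, end of line after).
strict=$(cat "$animals" | grep ' dog$' | cut -d ' ' -f 2 | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$naive"    # RED, YELLOW, WHITE
printf '%s\n' "$strict"   # YELLOW, WHITE
rm -f "$animals"
```

		<p>
			The anchored pattern is already harder to read than a plain <code>grep dog</code>, and it would break again if the species were moved to another field.
		</p>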

		<p>
			What if we have a turtle that has a lighter color than the other turtles?
		</p>

		<pre>small light green turtle</pre>

		<p>
			If we run <code>grep turtle</code>, the filtering works well in this case, but our pipeline fails in the third step, where <code>cut</code> selects only <em>light</em> (instead of <em>light green</em>).
			The final result will be:
		</p>

		<pre><![CDATA[GREEN
LIGHT]]></pre>

		<p>
			This is definitely wrong, because the second turtle is not LIGHT, it is LIGHT GREEN.
			This problem is caused by the fact that we don't have well-defined separators between fields.
			Sometimes we could avoid such problems by restrictions or assumptions, e.g. <em>the color must not contain a space character</em> (we could replace spaces with hyphens).
			Or we could use some other field delimiter, e.g. <code>;</code>, <code>|</code> or <code>,</code>. But we still would not be able to use that character in the field values.
			So we would have to invent some kind of escaping (e.g. <code>\;</code> is not a separator but a part of the field value)
			or add quotes or apostrophes (which still requires escaping: what if we have e.g. a name field containing an apostrophe?).
			And parsing such input with classic tools and regular expressions is not easy, and sometimes not even possible.
		</p>
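
		<p>
			The limits of a different delimiter can be seen in a small sketch.
			The two records and the <code>size;color;species</code> layout are invented for this example; the backslash escaping is just a convention we made up, which <code>cut</code> knows nothing about.
		</p>

```shell
# Made-up records using ";" as the field delimiter (size;color;species).
plain='small;light green;turtle'
# Here the color itself contains the delimiter, escaped with a backslash.
escaped='small;green\;brown;turtle'

color_plain=$(printf '%s\n' "$plain" | cut -d ';' -f 2)
color_escaped=$(printf '%s\n' "$escaped" | cut -d ';' -f 2)

printf '%s\n' "$color_plain"     # prints: light green
printf '%s\n' "$color_escaped"   # prints: green\  (cut splits at the escaped delimiter too)
```

		<p>
			The new delimiter fixes the color containing a space, but the escaped value is still cut in the middle.
		</p>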

		<p>
			There are also other problems, such as character encoding, missing metadata (e.g. field names and types), joining multiple files (Is there always a new-line character at the end of the file? Is there some nasty BOM at the beginning of the file?)
			or passing several types of data in a single stream (we have a list of animals, but we could also have e.g. a list of foods or a list of our staff, where each list has different fields).
		</p>

	</text>

</stránka>