relpipe-data/classic-example.xml
author František Kučera <franta-hg@frantovo.cz>
Tue, 04 Dec 2018 22:34:19 +0100
branch v_0
changeset 181 72cc1a9dbfca
parent 172 793aedbbe1c3
child 318 137f63652fa2
permissions -rw-r--r--
footer: link to GNU / Free software
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Classic pipeline example</nadpis>
	<perex>Explained example of classic pipeline</perex>

	<text xmlns="http://www.w3.org/1999/xhtml">
		<p>
			Assume that we have a text file containing a list of animals and their properties:
		</p>
		
		<m:pre jazyk="text" src="animals.txt"/>
		
		<p>
			We can pass this file through a pipeline:
		</p>
		
		<m:classic-example/>
		
		<p>
			The individual steps of the pipeline are separated by the <code>|</code> pipe symbol.
			In the first step, we just read the file and print it to STDOUT.<m:podČarou>Of course, this is a <a href="http://porkmail.org/era/unix/award.html" title="Useless Use of Cat">UUoC</a>, but in examples the left-to-right order is easier to read than &lt; file redirection.</m:podČarou>
			In the second step, we keep only the dogs and get:
		</p>
		
		<pre><![CDATA[big yellow dog
small white dog]]></pre>

		<p>
			In the third step, we select the second <em>field</em> (fields are separated by spaces) and get the colors of our dogs:
		</p>
		
		<pre><![CDATA[yellow
white]]></pre>

		<p>
			In the fourth step, we translate the values to uppercase and get:
		</p>
		
		<pre><![CDATA[YELLOW
WHITE]]></pre>

		<p>
			So we have a list of the colors of our dogs printed in capital letters.
			If we had several dogs of the same color, we could avoid duplicates simply by adding <code>| sort -u</code> to the pipeline (after the <code>cut</code> step).
		</p>
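		<p>
			The whole walk-through can be reproduced with the minimal sketch below. The content of <code>animals.txt</code> is made-up sample data (the real file may differ), the temporary directory only keeps the sketch self-contained, and the <code>tiny white dog</code> record is a hypothetical addition that shows the effect of <code>sort -u</code>.
		</p>

```shell
# Made-up sample data; the real animals.txt may contain other records.
dir=$(mktemp -d)
printf '%s\n' \
	'large white cat' \
	'medium black cat' \
	'big yellow dog' \
	'small white dog' \
	'medium green turtle' > "$dir/animals.txt"

# Read the file, keep the dogs, select the second field, convert to uppercase.
colors=$(cat "$dir/animals.txt" | grep dog | cut -d ' ' -f 2 | tr 'a-z' 'A-Z')
printf '%s\n' "$colors"

# Hypothetical extra record: a second white dog.
printf '%s\n' 'tiny white dog' >> "$dir/animals.txt"

# sort -u placed after the cut step removes the duplicate color.
unique=$(cat "$dir/animals.txt" | grep dog | cut -d ' ' -f 2 | sort -u | tr 'a-z' 'A-Z')
printf '%s\n' "$unique"
```

		<p>
			Note that <code>sort -u</code> also reorders the values, so the output no longer follows the order of the records in the file.
		</p>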

		<h2>The great parts</h2>
		
		<p>
			The authors of the <code>cat</code>, <code>grep</code>, <code>cut</code> and <code>tr</code> programs don't have to know anything about cats<m:podČarou>N.B. the cat in the command name is a different cat than the one in our text file.</m:podČarou>, dogs or our business domain.
			They can focus on their own tasks: reading files, filtering by regular expressions, extracting substrings and converting text. And they do it well without being distracted by any animals.
		</p>
		
		<p>
			And we don't have to know anything about low-level programming in the C language or compile anything.
			We simply build a pipeline in a shell (e.g. GNU Bash) from existing programs and focus on our business logic.
			And we do it well without being distracted by any low-level issues.
		</p>
		
		<p>
			Each program used in the pipeline can be written in a different programming language and they will still work together.
			Tools written in C, C++, Java, Lisp, Perl, Python, Rust or any other language can be combined.
			Thus the most suitable language can be used for each task.
		</p>
		
		<p>
			The pipeline is reusable, which is a big advantage compared to ad-hoc operations done in CLI or GUI tools.
			Thus we can feed different data (with the same structure, of course) into the pipeline and get the desired result.
		</p>
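		<p>
			One way to make this reuse explicit is to wrap the filtering steps in a shell function and feed it different inputs. This is only a sketch: the function name <code>dog_colors</code> and both data sets are hypothetical.
		</p>

```shell
# dog_colors is a hypothetical name; it wraps the steps that follow the initial read.
dog_colors() {
	grep dog | cut -d ' ' -f 2 | tr 'a-z' 'A-Z'
}

# Two made-up data sets with the same structure (size, color, species).
first=$(printf '%s\n' 'big yellow dog' 'small white dog' 'large white cat' | dog_colors)
second=$(printf '%s\n' 'huge black dog' 'small green turtle' | dog_colors)

printf '%s\n' "$first" "$second"
```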
		
		<p>
			Individual steps in the pipeline can be added, removed or exchanged.
			And we can also debug the pipeline and check what each step produces (e.g. use <code>tee</code> to copy the intermediate output to a file, or execute only the first few steps of the pipeline).
		</p>
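		<p>
			A minimal sketch of the <code>tee</code> approach follows. The sample records and the file names <code>after-grep.txt</code> and <code>after-cut.txt</code> are assumptions chosen for illustration.
		</p>

```shell
# Made-up sample data in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'big yellow dog' 'small white cat' 'small white dog' > "$dir/animals.txt"

# tee copies what flows through the pipe into a file and passes it on unchanged.
result=$(cat "$dir/animals.txt" \
	| grep dog | tee "$dir/after-grep.txt" \
	| cut -d ' ' -f 2 | tee "$dir/after-cut.txt" \
	| tr 'a-z' 'A-Z')

printf '%s\n' "$result"

# Inspect the intermediate output of each step.
cat "$dir/after-grep.txt"
cat "$dir/after-cut.txt"
```

		<p>
			Executing only the first steps works the same way: drop the trailing commands and look at what is printed.
		</p>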
		
		<h2>The pitfalls</h2>
		
		<p>
			This simple example looks quite flawless.
			But actually it is very brittle.
		</p>
		
		<p>
			What if we have a very big cat that can be described by this line in our file?
		</p>
		
		<pre>dog-sized red cat</pre>
		
		<p>In the second step of the pipeline (<code>grep</code>) we will include this record and the final result will be:</p>
		
		<pre><![CDATA[RED
YELLOW
WHITE]]></pre>

		<p>This is an unexpected and unwanted result. We don't have a RED dog; this is just an accident. The same would happen if we had a monkey of a <em>doggish</em> color.</p>
		
		<p>
			This problem is caused by the fact that <code>grep dog</code> matches lines containing the word <em>dog</em> regardless of its position (first, second or third field).
			Sometimes we could avoid such problems with a slightly more complicated regular expression and/or by using Perl, but our pipeline wouldn't be as simple and legible as before.
		</p>
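		<p>
			As an illustration, one possible "more complicated" regular expression anchors the match to the end of the line. The sketch assumes that the species is always the last field, and the sample records (including their order) are made up.
		</p>

```shell
# Made-up sample data including the problematic record.
animals='big yellow dog
dog-sized red cat
small white dog'

# Naive filter: matches the word dog anywhere on the line.
naive=$(printf '%s\n' "$animals" | grep dog | cut -d ' ' -f 2 | tr 'a-z' 'A-Z')

# Anchored filter: the line must end with a space followed by dog.
strict=$(printf '%s\n' "$animals" | grep ' dog$' | cut -d ' ' -f 2 | tr 'a-z' 'A-Z')

printf 'naive:\n%s\n' "$naive"
printf 'strict:\n%s\n' "$strict"
```

		<p>
			The anchored variant no longer reports the red cat, but it is already less readable and it still depends on the position of the field.
		</p>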
		
		<p>
			What if we have a turtle that has a lighter color than the other turtles?
		</p>
		
		<pre>small light green turtle</pre>
		
		<p>
			If we use <code>grep turtle</code>, the filter will work well in this case, but our pipeline will fail in the third step, where <code>cut</code> will select only <em>light</em> (instead of <em>light green</em>).
			And the final result will be:
		</p>
		
		<pre><![CDATA[GREEN
LIGHT]]></pre>
		
		<p>
			This is definitely wrong because the second turtle is not LIGHT, it is LIGHT GREEN.
			This problem is caused by the fact that we don't have well-defined separators between fields.
			Sometimes we could avoid such problems by restrictions or assumptions, e.g. <em>the color must not contain a space character</em> (we could replace spaces with hyphens).
			Or we could use some other field delimiter, e.g. <code>;</code>, <code>|</code> or <code>,</code>. But we still would not be able to use that character in the field values.
			So we must invent some kind of escaping (e.g. <code>\;</code> is not a separator but a part of the field value)
			or add some quotes/apostrophes (which still requires escaping, because what if we have e.g. a name field containing an apostrophe?).
			And parsing such input with classic tools and regular expressions is not easy and sometimes not even possible.
		</p>
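		<p>
			The delimiter problem can be observed directly with <code>cut</code>. In the sketch below, the semicolon-separated format and the <code>green\;ish</code> value are hypothetical; they only show that changing the delimiter moves the problem instead of solving it.
		</p>

```shell
# With spaces as separators, the color of the light green turtle is cut in half.
by_space=$(printf '%s\n' 'small light green turtle' | cut -d ' ' -f 2)

# With a semicolon as the separator (an assumed alternative format), the color survives.
by_semicolon=$(printf '%s\n' 'small;light green;turtle' | cut -d ';' -f 2)

# But cut knows nothing about escaping: a value containing \; is split anyway.
escaped=$(printf '%s\n' 'small;green\;ish;turtle' | cut -d ';' -f 2)

printf '%s\n' "$by_space" "$by_semicolon" "$escaped"
```

		<p>
			The last value comes out truncated at the escaped semicolon, which is why an escaping convention needs a parser that understands it.
		</p>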
		
		<p>
			There are also other problems like character encoding, missing metadata (e.g. field names and types), joining multiple files (Is there always a new-line character at the end of the file? Or is there some nasty BOM at the beginning of the file?)
			or passing several types of data in a single stream (we have a list of animals and we could also have e.g. a list of foods or a list of our staff, where each list has different fields).
		</p>

	</text>

</stránka>