relpipe-data/classic-example.xml
author František Kučera <franta-hg@frantovo.cz>
Tue, 27 Nov 2018 20:07:39 +0100
branch v_0
changeset 151 5697a01db388
parent 146 8c2e2dbee5cc
child 166 d8be473e0d86
permissions -rw-r--r--
roadmap
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

	<nadpis>Classic pipeline example</nadpis>
	<perex>An annotated example of a classic pipeline</perex>

	<text xmlns="http://www.w3.org/1999/xhtml">
		<p>
			Assume that we have a text file containing a list of animals and their properties:
		</p>

		<m:pre src="animals.txt"/>

		<p>
			We can pass this file through a pipeline:
		</p>

		<m:classic-example/>

		<p>
			The individual steps of the pipeline are separated by the | (pipe) symbol.
			In the first step, we just read the file and print it to STDOUT.<m:podČarou>Of course, this is a <a href="http://porkmail.org/era/unix/award.html" title="Useless Use of Cat">UUoC</a>, but in examples the natural left-to-right order is easier to read than a &lt; file redirection.</m:podČarou>
			In the second step, we keep only the dogs and get:
		</p>

		<pre><![CDATA[big yellow dog
small white dog]]></pre>

		<p>
			In the third step, we select the second <em>field</em> (fields are separated by spaces) and get the colors of our dogs:
		</p>

		<pre><![CDATA[yellow
white]]></pre>

		<p>
			In the fourth step, we translate the values to uppercase and get:
		</p>

		<pre><![CDATA[YELLOW
WHITE]]></pre>

		<p>
			So we have a list of the colors of our dogs printed in capital letters.
			If we had several dogs of the same color, we could avoid duplicates simply by adding <code>| sort -u</code> to the pipeline (after the <code>cut</code> step).
		</p>
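
		<p>
			The four steps described above can be sketched as a single command.
			This is only a sketch: the sample records printed by the hypothetical <code>animals</code> function stand in for the file (only the two dog lines come from the outputs above),
			and the exact arguments of <code>grep</code>, <code>cut</code> and <code>tr</code> are inferred from the step descriptions, so they may differ from the original pipeline.
		</p>

```shell
# Hypothetical stand-in for the input file; the cat line is an assumed extra record.
animals() {
	printf '%s\n' 'big yellow dog' 'small white dog' 'small black cat'
}

# Steps 2-4: keep the dogs, select the second field, convert to uppercase.
dog_colors() {
	animals | grep dog | cut -d ' ' -f 2 | tr a-z A-Z
}

# The same pipeline with de-duplication added after the cut step.
unique_dog_colors() {
	animals | grep dog | cut -d ' ' -f 2 | sort -u | tr a-z A-Z
}

dog_colors          # prints YELLOW and WHITE, one per line
unique_dog_colors   # same colors, sorted and without duplicates
```

		<p>
			Each stage reads only from its standard input and writes only to its standard output, which is what allows a step to be added or removed without touching the others.
		</p>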

		<h2>The great parts</h2>

		<p>
			The authors of the <code>cat</code>, <code>grep</code>, <code>cut</code> and <code>tr</code> programs don't have to know anything about cats<m:podČarou>N.B. the cat in the command name is a different cat from the one in our text file</m:podČarou>, dogs or our business domain.
			They can focus on their own tasks: reading files, filtering by regular expressions, extracting substrings and converting text. And they do it well without being distracted by any animals.
		</p>

		<p>
			And we don't have to know anything about low-level programming in the C language or compile anything.
			We simply build a pipeline in a shell (e.g. GNU Bash) from existing programs and focus on our business logic.
			And we do it well without being distracted by any low-level issues.
		</p>

		<p>
			Each program used in the pipeline can be written in a different programming language and they will still work together.
			Tools written in C, C++, Java, Lisp, Perl, Python, Rust or any other language can be combined.
			Thus the most suitable language can be used for each task.
		</p>

		<h2>The pitfalls</h2>

		<p>
			This simple example looks quite flawless.
			But actually it is very brittle.
		</p>

		<p>
			What if we have a very big cat that is described by this line in our file?
		</p>

		<pre>dog-sized red cat</pre>

		<p>The second step of the pipeline (<code>grep</code>) will include this record, and the final result will be:</p>

		<pre><![CDATA[RED
YELLOW
WHITE]]></pre>

		<p>This is a really unexpected and unwanted result. We don't have a RED dog; it appears here just by accident. The same would happen if we had a monkey of a <em>doggish</em> color.</p>

		<p>
			This problem is caused by the fact that <code>grep dog</code> selects lines containing the string <em>dog</em> regardless of its position (first, second or third field).
			Sometimes we could avoid such problems by using a slightly more complicated regular expression and/or Perl, but our pipeline wouldn't be as simple and legible as before.
		</p>
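
		<p>
			The accidental match can be reproduced with a few assumed sample records.
			The sketch below also shows one possible workaround that compares only the third field; it uses <code>awk</code>, which is a substitute chosen for this illustration and not the regular expression or Perl solution mentioned above.
			It assumes that the species is always the third space-separated field.
		</p>

```shell
# Hypothetical sample data including the problematic record from the text.
animals() {
	printf '%s\n' 'dog-sized red cat' 'big yellow dog' 'small white dog'
}

# Naive filter: matches "dog" anywhere on the line, so the cat slips through.
naive() {
	animals | grep dog | cut -d ' ' -f 2 | tr a-z A-Z
}

# Field-aware filter: keep only the records whose third field is exactly "dog".
by_field() {
	animals | awk '$3 == "dog" { print $2 }' | tr a-z A-Z
}

naive      # prints RED, YELLOW and WHITE, one per line
by_field   # prints YELLOW and WHITE, one per line
```

		<p>
			The workaround is already less legible than the original filter, and it still relies on every field being a single word.
		</p>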

		<p>
			What if we have a turtle that has a lighter color than other turtles?
		</p>

		<pre>small light green turtle</pre>

		<p>
			If we run <code>grep turtle</code>, the filter itself works well in this case, but our pipeline fails in the third step, where <code>cut</code> selects only <em>light</em> (instead of <em>light green</em>).
			And the final result will be:
		</p>

		<pre><![CDATA[GREEN
LIGHT]]></pre>

		<p>
			This is definitely wrong because the second turtle is not LIGHT, it is LIGHT GREEN.
			This problem is caused by the fact that we don't have well-defined separators between fields.
			Sometimes we could avoid such problems by restrictions or assumptions, e.g. <em>the color must not contain a space character</em> (we could replace spaces with hyphens).
			Or we could use some other field delimiter, e.g. ; or | or ,. But we still would not be able to use such a character in the field values.
			So we must invent some kind of escaping (e.g. <code>\;</code> is not a separator but a part of the field value)
			or add some quotes/apostrophes (which still requires escaping, because what if we have e.g. a name field containing an apostrophe?).
			And parsing such inputs with classic tools and regular expressions is not easy and sometimes not even possible.
		</p>
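
		<p>
			The delimiter problem can be sketched with three assumed records: the space-separated one loses half of its color,
			a semicolon-separated variant (a hypothetical alternative format) keeps it, and a value that itself contains an escaped semicolon is split again, because <code>cut</code> knows nothing about escaping.
		</p>

```shell
# Space as the delimiter: the two-word color is split and only "light" is selected.
spaces() {
	printf '%s\n' 'small light green turtle' | cut -d ' ' -f 2 | tr a-z A-Z
}

# Semicolon as the delimiter (assumed alternative format): the whole color survives.
semicolons() {
	printf '%s\n' 'small;light green;turtle' | cut -d ';' -f 2 | tr a-z A-Z
}

# A value containing an escaped semicolon: cut ignores the backslash
# and splits inside the value anyway.
escaped() {
	printf '%s\n' 'small;green\;brown;turtle' | cut -d ';' -f 2
}

spaces       # prints LIGHT
semicolons   # prints LIGHT GREEN
escaped      # prints green\
```

		<p>
			Changing the delimiter only moves the problem: whichever character is chosen, sooner or later it appears in the data.
		</p>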

		<p>
			There are also other problems like character encoding, missing metadata (e.g. field names and types), joining multiple files (Is there always a new-line character at the end of the file? Or is there some nasty BOM at the beginning of the file?)
			or passing several types of data in a single stream (we have a list of animals and we could also have e.g. a list of foods or a list of our staff, where each list has different fields).
		</p>
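
		<p>
			Two of these problems are easy to reproduce. In this sketch the records are assumed and are printed by hypothetical helper functions instead of being read from real files:
			the first function imitates joining two files where the first one has no new-line character at its end, the second one imitates a file starting with a UTF-8 BOM.
		</p>

```shell
# Two hypothetical records concatenated as if they came from two files;
# the first one lacks the trailing new-line character.
joined() {
	printf 'big yellow dog'
	printf 'small white dog\n'
}

# A hypothetical record preceded by a UTF-8 byte order mark (bytes EF BB BF).
with_bom() {
	printf '\357\273\277dog\n'
}

joined | wc -l                       # counts 1 line instead of 2
joined | cut -d ' ' -f 3             # prints dogsmall
with_bom | grep -c '^dog' || true    # prints 0: the BOM hides the start of the line
```

		<p>
			In both cases the tools report no error; the records are silently merged or missed.
		</p>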

	</text>

</stránka>