relpipe-data/examples-guile-projections.xml
branch v_0
changeset 246 fde0cd94fde6
child 316 d7ae02390fac
245:4919c8098008 246:fde0cd94fde6
       
     1 <stránka
       
     2 	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
       
     3 	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
       
     4 	
       
     5 	<nadpis>Doing projections with Guile</nadpis>
       
      6 	<perex>modifying attribute values and adding or removing attributes</perex>
       
     7 	<m:pořadí-příkladu>01500</m:pořadí-příkladu>
       
     8 
       
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
       
    10 		
       
    11 		<p>
       
    12 			The <code>relpipe-tr-guile</code> can not only filter records,
       
    13 			but can also modify them and even modify the structure of the relation – add or remove attributes.
       
    14 			
       
    15 		</p>
       
    16 		
       
    17 		<h2>Sample data</h2>
       
    18 		
       
     19 		<p>We have a CSV file:</p>
       
    20 		
       
    21 		<m:pre jazyk="text" src="examples/guile-1.csv"/>
       
    22 		
       
    23 		<p>and we convert it to a relation called <code>n</code>:</p>
       
    24 		
       
    25 		<m:pre jazyk="bash"><![CDATA[cat guile-1.csv \
       
    26 	| relpipe-in-csv n id integer name string a integer b integer c integer \
       
    27 	| relpipe-out-tabular]]></m:pre>
       
    28 		
       
    29 		<p>which printed as a table looks like this:</p>
       
    30 
       
    31 		<m:pre jazyk="text"><![CDATA[n:
       
    32  ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
       
    33  │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
       
    34  ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
       
    35  │            1 │ first         │           1 │           2 │           3 │
       
    36  │            2 │ second        │           2 │          10 │        1024 │
       
    37  │            3 │ third         │           4 │           4 │          16 │
       
    38  ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
       
    39 Record count: 3]]></m:pre>
       
    40 
       
    41 		<p>
       
     42 			Because it is annoying to write the same code again and again, we will create a shell function and (re)use it later:
       
    43 		</p>
       
    44 
       
    45 		<m:pre jazyk="bash"><![CDATA[sample-data() {
       
    46 	cat guile-1.csv \
       
    47 		| relpipe-in-csv n id integer name string a integer b integer c integer;
       
    48 }]]></m:pre>
       
    49 
       
    50 		<p>
       
    51 			Another option is storing the relational data in a file and then reading this file.
       
     52 			Files are a better option if the transformation is costly and we do not need live/fresh data.
       
    53 		</p>
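		<p>
			The file-based option mentioned above can be sketched as follows.
			This is only an illustration: the helper names <code>save_relation</code> and <code>load_relation</code>
			and the file name <code>n.rp</code> are assumptions, not part of the original example.
			The relational stream is treated here as an opaque byte stream that is written to disk once and replayed later:
		</p>

```shell
# Hypothetical helpers; the default file name n.rp is an assumption.

# Store whatever arrives on standard input (e.g. the output of sample-data) in a file.
save_relation() {
	cat > "${1:-n.rp}"
}

# Replay the stored stream on standard output for any downstream relpipe tool.
load_relation() {
	cat "${1:-n.rp}"
}

# Usage (needs the relpipe tools and the sample-data function):
#   sample-data | save_relation n.rp
#   load_relation n.rp | relpipe-out-tabular
```

		<p>
			With this approach the CSV is parsed only once, and every later pipeline starts from the stored relation.
		</p>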
       
    54 
       
    55 		<h2>Modifying attribute values</h2>
       
    56 
       
    57 		<p>
       
     58 			Then we can modify such a relation using Guile – e.g. we can make the <code>name</code> uppercase and increase <code>id</code> by 1000:
       
    59 		</p>
       
    60 		
       
    61 		<m:pre jazyk="bash"><![CDATA[sample-data \
       
    62 	| relpipe-tr-guile \
       
    63 		--relation n \
       
    64 		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
       
    65 	| relpipe-out-tabular]]></m:pre>
       
    66 	
       
    67 		<p>So we have:</p>
       
    68 
       
    69 		<m:pre jazyk="text"><![CDATA[n:
       
    70  ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
       
    71  │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
       
    72  ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
       
    73  │         1001 │ FIRST         │           1 │           2 │           3 │
       
    74  │         1002 │ SECOND        │           2 │          10 │        1024 │
       
    75  │         1003 │ THIRD         │           4 │           4 │          16 │
       
    76  ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
       
    77 Record count: 3]]></m:pre>
       
    78 
       
    79 
       
    80 		<h2>Removing attributes</h2>
       
    81 	
       
    82 		<p>
       
     83 			The relation on the output might have a different structure than the relation on the input.
       
    84 			We can keep only some of the original attributes:
       
    85 		</p>
       
    86 	
       
    87 		<m:pre jazyk="bash"><![CDATA[sample-data \
       
    88 	| relpipe-tr-guile \
       
    89 		--relation n \
       
    90 		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
       
    91 		--output-attribute 'id' integer \
       
    92 		--output-attribute 'name' string \
       
    93 	| relpipe-out-tabular]]></m:pre>
       
    94 	
       
     95 		<p>and we get:</p>
       
    96 	
       
    97 		<m:pre jazyk="text"><![CDATA[n:
       
    98  ╭──────────────┬───────────────╮
       
    99  │ id (integer) │ name (string) │
       
   100  ├──────────────┼───────────────┤
       
   101  │         1001 │ FIRST         │
       
   102  │         1002 │ SECOND        │
       
   103  │         1003 │ THIRD         │
       
   104  ╰──────────────┴───────────────╯
       
   105 Record count: 3]]></m:pre>
       
   106 
       
   107 		<h2>Adding attributes</h2>
       
   108 		
       
   109 		<p>
       
   110 			If we do not want to completely redefine the structure of the relation,
       
   111 			we can keep all original attributes and just add definitions of some others:			
       
   112 		</p>
       
   113 		
       
   114 		<m:pre jazyk="bash"><![CDATA[sample-data \
       
   115 	| relpipe-tr-guile \
       
   116 		--relation n \
       
   117 		--for-each '(define $sum (+ $a $b $c) )' \
       
   118 		--output-attribute 'sum' integer \
       
   119 		--input-attributes-prepend \
       
   120 	| relpipe-out-tabular]]></m:pre>
       
   121 
       
   122 		<p>so we have a completely new attribute containing the sum of <code>a</code>, <code>b</code> and <code>c</code>:</p>
       
   123 
       
   124 		<m:pre jazyk="text"><![CDATA[n:
       
   125  ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────┬───────────────╮
       
   126  │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ sum (integer) │
       
   127  ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┼───────────────┤
       
   128  │            1 │ first         │           1 │           2 │           3 │             6 │
       
   129  │            2 │ second        │           2 │          10 │        1024 │          1036 │
       
   130  │            3 │ third         │           4 │           4 │          16 │            24 │
       
   131  ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────┴───────────────╯
       
   132 Record count: 3]]></m:pre>
       
   133 
       
   134 		<p>
       
   135 			We can change the attribute order by using <code>--input-attributes-append</code>
       
   136 			instead of <code>--input-attributes-prepend</code>.
       
   137 		</p>
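		<p>
			A sketch of that variant: only the last option differs from the command used for <code>sum</code> above.
			The helper name <code>sum_with_append</code> is hypothetical, and it expects the relational stream on its standard input.
			Judging by the option name, the <code>sum</code> attribute should then come before the original attributes;
			that ordering is an assumption here and is best checked by printing the result with <code>relpipe-out-tabular</code>:
		</p>

```shell
# Hypothetical helper: adds the sum attribute, using --input-attributes-append
# instead of --input-attributes-prepend.
# Expects the relational stream of relation n on standard input.
sum_with_append() {
	relpipe-tr-guile \
		--relation n \
		--for-each '(define $sum (+ $a $b $c) )' \
		--output-attribute 'sum' integer \
		--input-attributes-append
}

# Usage (needs the relpipe tools and the sample-data function):
#   sample-data | sum_with_append | relpipe-out-tabular
```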
       
   138 		
       
   139 		<h2>Changing the attribute type</h2>
       
   140 		
       
   141 		<p>
       
   142 			Each attribute has a data type (integer, string…).
       
    143 			We can change the type too. Of course, we then have to change the values as well, because we cannot put e.g. a string value into an integer attribute.
       
   144 		</p>
       
   145 		
       
   146 		<m:pre jazyk="bash"><![CDATA[sample-data \
       
   147 	| relpipe-tr-guile \
       
   148 		--relation n \
       
   149 		--for-each '(define $id (string-upcase $name) )' \
       
   150 		--output-attribute 'id' string \
       
   151 		--output-attribute 'a' integer \
       
   152 		--output-attribute 'b' integer \
       
   153 		--output-attribute 'c' integer \
       
   154 	| relpipe-out-tabular]]></m:pre>
       
   155 	
       
   156 		<p>
       
    157 			The code above changed the type of the <code>id</code> attribute from integer to string
       
   158 			and put uppercase <code>name</code> into it:
       
   159 		</p>
       
   160 	
       
   161 		<m:pre jazyk="text"><![CDATA[n:
       
   162  ╭─────────────┬─────────────┬─────────────┬─────────────╮
       
   163  │ id (string) │ a (integer) │ b (integer) │ c (integer) │
       
   164  ├─────────────┼─────────────┼─────────────┼─────────────┤
       
   165  │ FIRST       │           1 │           2 │           3 │
       
   166  │ SECOND      │           2 │          10 │        1024 │
       
   167  │ THIRD       │           4 │           4 │          16 │
       
   168  ╰─────────────┴─────────────┴─────────────┴─────────────╯
       
   169 Record count: 3]]></m:pre>
       
   170 
       
   171 
       
   172 		<h2>Projection and restriction</h2>
       
   173 		
       
   174 		<p>
       
   175 			We can do projection and restriction at the same time, during the same transformation:
       
   176 		</p>
       
   177 		
       
   178 		<m:pre jazyk="bash"><![CDATA[sample-data \
       
   179 	| relpipe-tr-guile \
       
   180 		--relation n \
       
   181 		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
       
   182 		--output-attribute 'id' integer \
       
   183 		--output-attribute 'name' string \
       
   184 		--where '(= $c (* $a $b) )' \
       
   185 	| relpipe-out-tabular]]></m:pre>
       
   186 
       
    187 		<p>and we get:</p>
       
   188 
       
    189 		<m:pre jazyk="text"><![CDATA[n:
       
   190  ╭──────────────┬───────────────╮
       
   191  │ id (integer) │ name (string) │
       
   192  ├──────────────┼───────────────┤
       
   193  │         1003 │ THIRD         │
       
   194  ╰──────────────┴───────────────╯
       
   195 Record count: 1]]></m:pre>
       
   196 
       
   197 		<p>
       
   198 			And if we use <code>expt</code> instead of <code>*</code>, we will get SECOND instead of THIRD.
       
   199 		</p>
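		<p>
			For the sample data, the <code>expt</code> condition compares <code>c</code> with <code>a</code> raised to the power of <code>b</code>:
			1^2 = 1 (not 3), 2^10 = 1024 and 4^4 = 256 (not 16), so only the second record passes.
			A sketch of that variant follows; the helper name <code>project_where_power</code> is hypothetical
			and it expects the relational stream on its standard input:
		</p>

```shell
# Hypothetical helper: same projection as above, but the restriction
# uses expt (c = a^b) instead of * (c = a*b).
# Expects the relational stream of relation n on standard input.
project_where_power() {
	relpipe-tr-guile \
		--relation n \
		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
		--output-attribute 'id' integer \
		--output-attribute 'name' string \
		--where '(= $c (expt $a $b) )'
}

# Usage (needs the relpipe tools and the sample-data function):
#   sample-data | project_where_power | relpipe-out-tabular
```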
       
   200 		
       
   201 		<p>The example above has its SQL equivalent:</p>
       
   202 		
       
   203 		<m:pre jazyk="sql"><![CDATA[SELECT
       
   204 	id + 1000 AS id,
       
   205 	upper(name) AS name
       
   206 FROM n
       
   207 WHERE c = (a * b);]]></m:pre>
       
   208 
       
   209 		<p>
       
   210 			The difference is that <m:name/> do not require data to be stored anywhere,
       
   211 			because we (by default) process streams on the fly.
       
    212 			Thus one process can generate data, a second one can transform them and a third one can convert them to some output format.
       
    213 			All processes run at the same time, without the need to hold all the data in memory at once.
       
   214 		</p>
       
   215 		
       
   216 	</text>
       
   217 
       
   218 </stránka>