<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

	<nadpis>Doing projections with Guile</nadpis>
	<perex>modifying attribute values and adding new attributes or removing them</perex>
	<m:pořadí-příkladu>01500</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">

		<p>
			The <code>relpipe-tr-guile</code> tool can not only filter records
			but also modify them, and it can even change the structure of the relation by adding or removing attributes.
		</p>

		<h2>Sample data</h2>

		<p>We have a CSV file:</p>

		<m:pre jazyk="text" src="examples/guile-1.csv"/>

		<p>and we convert it to a relation called <code>n</code>:</p>

		<m:pre jazyk="bash"><![CDATA[cat guile-1.csv \
	| relpipe-in-csv n id integer name string a integer b integer c integer \
	| relpipe-out-tabular]]></m:pre>

		<p>which, printed as a table, looks like this:</p>

		<m:pre jazyk="text"><![CDATA[n:
╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
│ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
│            1 │ first         │           1 │           2 │           3 │
│            2 │ second        │           2 │          10 │        1024 │
│            3 │ third         │           4 │           4 │          16 │
╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3]]></m:pre>

		<p>
			Because it is annoying to write the same code again and again, we will create a shell function and (re)use it later:
		</p>

		<m:pre jazyk="bash"><![CDATA[sample-data() {
	cat guile-1.csv \
		| relpipe-in-csv n id integer name string a integer b integer c integer;
}]]></m:pre>

		<p>
			Another option is to store the relational data in a file and then read this file.
			A file is the better option if the transformation is costly and we do not need live (fresh) data.
		</p>
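
		<p>
			A minimal sketch of the file-based variant, assuming that the output of <code>relpipe-in-csv</code>
			can simply be redirected to a file and replayed later with <code>cat</code>.
			The file name <code>n.rp</code> and both function names are hypothetical, chosen only for this illustration:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical cache file; any writable path would do.
cache_file="n.rp"

# Convert the CSV once and store the relational data in the cache file.
cache_sample_data() {
	cat guile-1.csv \
		| relpipe-in-csv n id integer name string a integer b integer c integer \
		> "$cache_file"
}

# Emit the stored relation; convert the CSV first if the cache file is missing.
cached_sample_data() {
	[ -f "$cache_file" ] || cache_sample_data
	cat "$cache_file"
}

# Usage: cached_sample_data | relpipe-out-tabular]]></m:pre>

		<p>
			In this sketch the cache is never refreshed automatically: when the CSV changes, the cache file has to be deleted by hand.
		</p>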

		<h2>Modifying attribute values</h2>

		<p>
			Then we can modify such a relation using Guile, e.g. we can make the <code>name</code> uppercase and increase <code>id</code> by 1000:
		</p>

		<m:pre jazyk="bash"><![CDATA[sample-data \
	| relpipe-tr-guile \
		--relation n \
		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
	| relpipe-out-tabular]]></m:pre>

		<p>So we have:</p>

		<m:pre jazyk="text"><![CDATA[n:
╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
│ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
│         1001 │ FIRST         │           1 │           2 │           3 │
│         1002 │ SECOND        │           2 │          10 │        1024 │
│         1003 │ THIRD         │           4 │           4 │          16 │
╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3]]></m:pre>


		<h2>Removing attributes</h2>

		<p>
			The relation on the output may have a different structure than the relation on the input.
			We can keep only some of the original attributes:
		</p>

		<m:pre jazyk="bash"><![CDATA[sample-data \
	| relpipe-tr-guile \
		--relation n \
		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
		--output-attribute 'id' integer \
		--output-attribute 'name' string \
	| relpipe-out-tabular]]></m:pre>

		<p>and have:</p>

		<m:pre jazyk="text"><![CDATA[n:
╭──────────────┬───────────────╮
│ id (integer) │ name (string) │
├──────────────┼───────────────┤
│         1001 │ FIRST         │
│         1002 │ SECOND        │
│         1003 │ THIRD         │
╰──────────────┴───────────────╯
Record count: 3]]></m:pre>

		<h2>Adding attributes</h2>

		<p>
			If we do not want to completely redefine the structure of the relation,
			we can keep all the original attributes and just define some additional ones:
		</p>

		<m:pre jazyk="bash"><![CDATA[sample-data \
	| relpipe-tr-guile \
		--relation n \
		--for-each '(define $sum (+ $a $b $c) )' \
		--output-attribute 'sum' integer \
		--input-attributes-prepend \
	| relpipe-out-tabular]]></m:pre>

		<p>so we have a completely new attribute containing the sum of <code>a</code>, <code>b</code> and <code>c</code>:</p>

		<m:pre jazyk="text"><![CDATA[n:
╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────┬───────────────╮
│ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ sum (integer) │
├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┼───────────────┤
│            1 │ first         │           1 │           2 │           3 │             6 │
│            2 │ second        │           2 │          10 │        1024 │          1036 │
│            3 │ third         │           4 │           4 │          16 │            24 │
╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────┴───────────────╯
Record count: 3]]></m:pre>

		<p>
			We can change the attribute order by using <code>--input-attributes-append</code>
			instead of <code>--input-attributes-prepend</code>.
		</p>
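
		<p>
			A sketch of the appending variant, wrapped in a hypothetical function <code>sum_first</code> so that it can be reused.
			It presumably puts <code>sum</code> in front of the original attributes; the exact output is not shown here because it has not been verified:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: same transformation as above, only the ordering option differs.
sum_first() {
	cat guile-1.csv \
		| relpipe-in-csv n id integer name string a integer b integer c integer \
		| relpipe-tr-guile \
			--relation n \
			--for-each '(define $sum (+ $a $b $c) )' \
			--output-attribute 'sum' integer \
			--input-attributes-append
}

# Usage: sum_first | relpipe-out-tabular]]></m:pre>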

		<h2>Changing the attribute type</h2>

		<p>
			Each attribute has a data type (integer, string…) and we can change it.
			Of course, we then also have to modify the data, because we cannot put e.g. a string value into an integer attribute.
		</p>

		<m:pre jazyk="bash"><![CDATA[sample-data \
	| relpipe-tr-guile \
		--relation n \
		--for-each '(define $id (string-upcase $name) )' \
		--output-attribute 'id' string \
		--output-attribute 'a' integer \
		--output-attribute 'b' integer \
		--output-attribute 'c' integer \
	| relpipe-out-tabular]]></m:pre>

		<p>
			The code above changed the type of the <code>id</code> attribute from integer to string
			and put the uppercase <code>name</code> into it:
		</p>

		<m:pre jazyk="text"><![CDATA[n:
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ id (string) │ a (integer) │ b (integer) │ c (integer) │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ FIRST       │           1 │           2 │           3 │
│ SECOND      │           2 │          10 │        1024 │
│ THIRD       │           4 │           4 │          16 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3]]></m:pre>


		<h2>Projection and restriction</h2>

		<p>
			We can do projection and restriction at the same time, during the same transformation:
		</p>

		<m:pre jazyk="bash"><![CDATA[sample-data \
	| relpipe-tr-guile \
		--relation n \
		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
		--output-attribute 'id' integer \
		--output-attribute 'name' string \
		--where '(= $c (* $a $b) )' \
	| relpipe-out-tabular]]></m:pre>

		<p>and have:</p>

		<m:pre jazyk="text"><![CDATA[n:
╭──────────────┬───────────────╮
│ id (integer) │ name (string) │
├──────────────┼───────────────┤
│         1003 │ THIRD         │
╰──────────────┴───────────────╯
Record count: 1]]></m:pre>

		<p>
			If we use <code>expt</code> instead of <code>*</code>, we get SECOND instead of THIRD, because 2 to the power of 10 is 1024.
		</p>
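
		<p>
			The <code>expt</code> variant could look like the sketch below; only the <code>--where</code> expression differs from the command above.
			The function name <code>power_rows</code> is hypothetical:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: keep only the records where c equals a raised to the power of b.
power_rows() {
	cat guile-1.csv \
		| relpipe-in-csv n id integer name string a integer b integer c integer \
		| relpipe-tr-guile \
			--relation n \
			--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
			--output-attribute 'id' integer \
			--output-attribute 'name' string \
			--where '(= $c (expt $a $b) )'
}

# Usage: power_rows | relpipe-out-tabular]]></m:pre>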

		<p>The example above has its SQL equivalent:</p>

		<m:pre jazyk="sql"><![CDATA[SELECT
	id + 1000 AS id,
	upper(name) AS name
FROM n
WHERE c = (a * b);]]></m:pre>

		<p>
			The difference is that <m:name/> do not require the data to be stored anywhere,
			because (by default) we process streams on the fly.
			One process can thus generate the data, a second one can transform it and a third one can convert it to some output format.
			All the processes run at the same time and none of them needs to cache the whole data set.
		</p>
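
		<p>
			The on-the-fly behaviour can be observed with plain shell functions standing in for the real tools.
			In the sketch below, <code>slow_source</code> and <code>upcase_lines</code> are hypothetical stand-ins
			and work on plain text lines rather than on relational data:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical stand-in for a slow generator: emits one line per second.
slow_source() {
	for i in 1 2 3; do
		echo "record $i"
		sleep 1
	done
}

# Hypothetical stand-in for a transformation: converts each line as soon as it arrives.
upcase_lines() {
	while IFS= read -r line; do
		printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]'
	done
}

# Usage: slow_source | upcase_lines]]></m:pre>

		<p>
			When run in a terminal, the pipeline prints one uppercase line roughly every second rather than all three lines at the end,
			which shows that both stages run concurrently and no stage waits for the complete input.
		</p>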

	</text>

</stránka>