guile: Doing projections with Guile v_0
authorFrantišek Kučera <franta-hg@frantovo.cz>
Thu, 07 Feb 2019 13:08:29 +0100
branchv_0
changeset 246 fde0cd94fde6
parent 245 4919c8098008
child 247 087b8621fb3e
guile: Doing projections with Guile
relpipe-data/examples-guile-projections.xml
relpipe-data/examples/guile-1.csv
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-guile-projections.xml	Thu Feb 07 13:08:29 2019 +0100
@@ -0,0 +1,218 @@
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	
+	<nadpis>Doing projections with Guile</nadpis>
+	<perex>modifying attribute values and adding new attributes or removing them</perex>
+	<m:pořadí-příkladu>01500</m:pořadí-příkladu>
+
+	<text xmlns="http://www.w3.org/1999/xhtml">
+		
+		<p>
+			The <code>relpipe-tr-guile</code> can not only filter records,
+			but can also modify them and even modify the structure of the relation – add or remove attributes.
+			
+		</p>
+		
+		<h2>Sample data</h2>
+		
+		<p>We have some CSV file:</p>
+		
+		<m:pre jazyk="text" src="examples/guile-1.csv"/>
+		
+		<p>and we convert it to a relation called <code>n</code>:</p>
+		
+		<m:pre jazyk="bash"><![CDATA[cat guile-1.csv \
+	| relpipe-in-csv n id integer name string a integer b integer c integer \
+	| relpipe-out-tabular]]></m:pre>
+		
+		<p>which printed as a table looks like this:</p>
+
+		<m:pre jazyk="text"><![CDATA[n:
+ ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
+ │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
+ ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
+ │            1 │ first         │           1 │           2 │           3 │
+ │            2 │ second        │           2 │          10 │        1024 │
+ │            3 │ third         │           4 │           4 │          16 │
+ ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
+Record count: 3]]></m:pre>
+
+		<p>
+			Because it is annoying to write some code again and again, we will create a shell function and (re)use it later:
+		</p>
+
+		<m:pre jazyk="bash"><![CDATA[sample-data() {
+	cat guile-1.csv \
+		| relpipe-in-csv n id integer name string a integer b integer c integer;
+}]]></m:pre>
+
+		<p>
+			Another option is storing the relational data in a file and then reading this file.
+			Files are better option, if the transformation is costly and we do not need live/fresh data.
+		</p>
+
+		<h2>Modifying attribute values</h2>
+
+		<p>
+			Then, we can modify such relation using Guile – e.g. we can make the <code>name</code> uppercase and increase <code>id</code> by 1000:
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[sample-data \
+	| relpipe-tr-guile \
+		--relation n \
+		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
+	| relpipe-out-tabular]]></m:pre>
+	
+		<p>So we have:</p>
+
+		<m:pre jazyk="text"><![CDATA[n:
+ ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
+ │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
+ ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
+ │         1001 │ FIRST         │           1 │           2 │           3 │
+ │         1002 │ SECOND        │           2 │          10 │        1024 │
+ │         1003 │ THIRD         │           4 │           4 │          16 │
+ ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
+Record count: 3]]></m:pre>
+
+
+		<h2>Removing attributes</h2>
+	
+		<p>
+			The relation on the output might have different structure that the relation on the input.
+			We can keep only some of the original attributes:
+		</p>
+	
+		<m:pre jazyk="bash"><![CDATA[sample-data \
+	| relpipe-tr-guile \
+		--relation n \
+		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
+		--output-attribute 'id' integer \
+		--output-attribute 'name' string \
+	| relpipe-out-tabular]]></m:pre>
+	
+		<p>and have:</p>
+	
+		<m:pre jazyk="text"><![CDATA[n:
+ ╭──────────────┬───────────────╮
+ │ id (integer) │ name (string) │
+ ├──────────────┼───────────────┤
+ │         1001 │ FIRST         │
+ │         1002 │ SECOND        │
+ │         1003 │ THIRD         │
+ ╰──────────────┴───────────────╯
+Record count: 3]]></m:pre>
+
+		<h2>Adding attributes</h2>
+		
+		<p>
+			If we do not want to completely redefine the structure of the relation,
+			we can keep all original attributes and just add definitions of some others:			
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[sample-data \
+	| relpipe-tr-guile \
+		--relation n \
+		--for-each '(define $sum (+ $a $b $c) )' \
+		--output-attribute 'sum' integer \
+		--input-attributes-prepend \
+	| relpipe-out-tabular]]></m:pre>
+
+		<p>so we have a completely new attribute containing the sum of <code>a</code>, <code>b</code> and <code>c</code>:</p>
+
+		<m:pre jazyk="text"><![CDATA[n:
+ ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────┬───────────────╮
+ │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ sum (integer) │
+ ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┼───────────────┤
+ │            1 │ first         │           1 │           2 │           3 │             6 │
+ │            2 │ second        │           2 │          10 │        1024 │          1036 │
+ │            3 │ third         │           4 │           4 │          16 │            24 │
+ ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────┴───────────────╯
+Record count: 3]]></m:pre>
+
+		<p>
+			We can change the attribute order by using <code>--input-attributes-append</code>
+			instead of <code>--input-attributes-prepend</code>.
+		</p>
+		
+		<h2>Changing the attribute type</h2>
+		
+		<p>
+			Each attribute has a data type (integer, string…).
+			And we can change the type. Of course we have to modify the data, because we can not put e.g. string value into an integer attribute.
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[sample-data \
+	| relpipe-tr-guile \
+		--relation n \
+		--for-each '(define $id (string-upcase $name) )' \
+		--output-attribute 'id' string \
+		--output-attribute 'a' integer \
+		--output-attribute 'b' integer \
+		--output-attribute 'c' integer \
+	| relpipe-out-tabular]]></m:pre>
+	
+		<p>
+			The code above changed the type of <code>id</code> attribute from integer to string
+			and put uppercase <code>name</code> into it:
+		</p>
+	
+		<m:pre jazyk="text"><![CDATA[n:
+ ╭─────────────┬─────────────┬─────────────┬─────────────╮
+ │ id (string) │ a (integer) │ b (integer) │ c (integer) │
+ ├─────────────┼─────────────┼─────────────┼─────────────┤
+ │ FIRST       │           1 │           2 │           3 │
+ │ SECOND      │           2 │          10 │        1024 │
+ │ THIRD       │           4 │           4 │          16 │
+ ╰─────────────┴─────────────┴─────────────┴─────────────╯
+Record count: 3]]></m:pre>
+
+
+		<h2>Projection and restriction</h2>
+		
+		<p>
+			We can do projection and restriction at the same time, during the same transformation:
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[sample-data \
+	| relpipe-tr-guile \
+		--relation n \
+		--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
+		--output-attribute 'id' integer \
+		--output-attribute 'name' string \
+		--where '(= $c (* $a $b) )' \
+	| relpipe-out-tabular]]></m:pre>
+
+		<p>and have:</p>
+
+		<m:pre jazyk="bash"><![CDATA[n:
+ ╭──────────────┬───────────────╮
+ │ id (integer) │ name (string) │
+ ├──────────────┼───────────────┤
+ │         1003 │ THIRD         │
+ ╰──────────────┴───────────────╯
+Record count: 1]]></m:pre>
+
+		<p>
+			And if we use <code>expt</code> instead of <code>*</code>, we will get SECOND instead of THIRD.
+		</p>
+		
+		<p>The example above has its SQL equivalent:</p>
+		
+		<m:pre jazyk="sql"><![CDATA[SELECT
+	id + 1000 AS id,
+	upper(name) AS name
+FROM n
+WHERE c = (a * b);]]></m:pre>
+
+		<p>
+			The difference is that <m:name/> do not require data to be stored anywhere,
+			because we (by default) process streams on the fly.
+			Thus one process can generate data, second one can transform them and the third one can convert them to some output format.
+			All processes are running at the same time and without need to cache all data at once.
+		</p>
+		
+	</text>
+
+</stránka>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/guile-1.csv	Thu Feb 07 13:08:29 2019 +0100
@@ -0,0 +1,3 @@
+1,first,1,2,3
+2,second,2,10,1024
+3,third,4,4,16