# HG changeset patch # User František Kučera <franta-hg@frantovo.cz> # Date 1549541309 -3600 # Node ID fde0cd94fde67853ebb07ec5873f40422ce313fc # Parent 4919c8098008a9935e4777312b38aee2b20b2313 guile: Doing projections with Guile diff -r 4919c8098008 -r fde0cd94fde6 relpipe-data/examples-guile-projections.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples-guile-projections.xml Thu Feb 07 13:08:29 2019 +0100 @@ -0,0 +1,218 @@ +<stránka + xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana" + xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro"> + + <nadpis>Doing projections with Guile</nadpis> + <perex>modifying attribute values and adding new attributes or removing them</perex> + <m:pořadí-příkladu>01500</m:pořadí-příkladu> + + <text xmlns="http://www.w3.org/1999/xhtml"> + + <p> + The <code>relpipe-tr-guile</code> can not only filter records, + but can also modify them and even modify the structure of the relation – add or remove attributes. + + </p> + + <h2>Sample data</h2> + + <p>We have some CSV file:</p> + + <m:pre jazyk="text" src="examples/guile-1.csv"/> + + <p>and we convert it to a relation called <code>n</code>:</p> + + <m:pre jazyk="bash"><![CDATA[cat guile-1.csv \ + | relpipe-in-csv n id integer name string a integer b integer c integer \ + | relpipe-out-tabular]]></m:pre> + + <p>which printed as a table looks like this:</p> + + <m:pre jazyk="text"><![CDATA[n: + ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮ + │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ + ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤ + │ 1 │ first │ 1 │ 2 │ 3 │ + │ 2 │ second │ 2 │ 10 │ 1024 │ + │ 3 │ third │ 4 │ 4 │ 16 │ + ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯ +Record count: 3]]></m:pre> + + <p> + Because it is annoying to write some code again and again, we will create a shell function and (re)use it later: + </p> + + <m:pre jazyk="bash"><![CDATA[sample-data() { + cat guile-1.csv \ + | relpipe-in-csv n id integer name string a integer b integer c integer; +}]]></m:pre> + + <p> + Another option is storing the relational data in a file and then reading this file. + Files are better option, if the transformation is costly and we do not need live/fresh data. + </p> + + <h2>Modifying attribute values</h2> + + <p> + Then, we can modify such relation using Guile – e.g. we can make the <code>name</code> uppercase and increase <code>id</code> by 1000: + </p> + + <m:pre jazyk="bash"><![CDATA[sample-data \ + | relpipe-tr-guile \ + --relation n \ + --for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \ + | relpipe-out-tabular]]></m:pre> + + <p>So we have:</p> + + <m:pre jazyk="text"><![CDATA[n: + ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮ + │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ + ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤ + │ 1001 │ FIRST │ 1 │ 2 │ 3 │ + │ 1002 │ SECOND │ 2 │ 10 │ 1024 │ + │ 1003 │ THIRD │ 4 │ 4 │ 16 │ + ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯ +Record count: 3]]></m:pre> + + + <h2>Removing attributes</h2> + + <p> + The relation on the output might have different structure that the relation on the input. + We can keep only some of the original attributes: + </p> + + <m:pre jazyk="bash"><![CDATA[sample-data \ + | relpipe-tr-guile \ + --relation n \ + --for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \ + --output-attribute 'id' integer \ + --output-attribute 'name' string \ + | relpipe-out-tabular]]></m:pre> + + <p>and have:</p> + + <m:pre jazyk="text"><![CDATA[n: + ╭──────────────┬───────────────╮ + │ id (integer) │ name (string) │ + ├──────────────┼───────────────┤ + │ 1001 │ FIRST │ + │ 1002 │ SECOND │ + │ 1003 │ THIRD │ + ╰──────────────┴───────────────╯ +Record count: 3]]></m:pre> + + <h2>Adding attributes</h2> + + <p> + If we do not want to completely redefine the structure of the relation, + we can keep all original attributes and just add definitions of some others: + </p> + + <m:pre jazyk="bash"><![CDATA[sample-data \ + | relpipe-tr-guile \ + --relation n \ + --for-each '(define $sum (+ $a $b $c) )' \ + --output-attribute 'sum' integer \ + --input-attributes-prepend \ + | relpipe-out-tabular]]></m:pre> + + <p>so we have a completely new attribute containing the sum of <code>a</code>, <code>b</code> and <code>c</code>:</p> + + <m:pre jazyk="text"><![CDATA[n: + ╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────┬───────────────╮ + │ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ sum (integer) │ + ├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┼───────────────┤ + │ 1 │ first │ 1 │ 2 │ 3 │ 6 │ + │ 2 │ second │ 2 │ 10 │ 1024 │ 1036 │ + │ 3 │ third │ 4 │ 4 │ 16 │ 24 │ + ╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────┴───────────────╯ +Record count: 3]]></m:pre> + + <p> + We can change the attribute order by using <code>--input-attributes-append</code> + instead of <code>--input-attributes-prepend</code>. + </p> + + <h2>Changing the attribute type</h2> + + <p> + Each attribute has a data type (integer, string…). + And we can change the type. Of course we have to modify the data, because we can not put e.g. string value into an integer attribute. + </p> + + <m:pre jazyk="bash"><![CDATA[sample-data \ + | relpipe-tr-guile \ + --relation n \ + --for-each '(define $id (string-upcase $name) )' \ + --output-attribute 'id' string \ + --output-attribute 'a' integer \ + --output-attribute 'b' integer \ + --output-attribute 'c' integer \ + | relpipe-out-tabular]]></m:pre> + + <p> + The code above changed the type of <code>id</code> attribute from integer to string + and put uppercase <code>name</code> into it: + </p> + + <m:pre jazyk="text"><![CDATA[n: + ╭─────────────┬─────────────┬─────────────┬─────────────╮ + │ id (string) │ a (integer) │ b (integer) │ c (integer) │ + ├─────────────┼─────────────┼─────────────┼─────────────┤ + │ FIRST │ 1 │ 2 │ 3 │ + │ SECOND │ 2 │ 10 │ 1024 │ + │ THIRD │ 4 │ 4 │ 16 │ + ╰─────────────┴─────────────┴─────────────┴─────────────╯ +Record count: 3]]></m:pre> + + + <h2>Projection and restriction</h2> + + <p> + We can do projection and restriction at the same time, during the same transformation: + </p> + + <m:pre jazyk="bash"><![CDATA[sample-data \ + | relpipe-tr-guile \ + --relation n \ + --for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \ + --output-attribute 'id' integer \ + --output-attribute 'name' string \ + --where '(= $c (* $a $b) )' \ + | relpipe-out-tabular]]></m:pre> + + <p>and have:</p> + + <m:pre jazyk="bash"><![CDATA[n: + ╭──────────────┬───────────────╮ + │ id (integer) │ name (string) │ + ├──────────────┼───────────────┤ + │ 1003 │ THIRD │ + ╰──────────────┴───────────────╯ +Record count: 1]]></m:pre> + + <p> + And if we use <code>expt</code> instead of <code>*</code>, we will get SECOND instead of THIRD. + </p> + + <p>The example above has its SQL equivalent:</p> + + <m:pre jazyk="sql"><![CDATA[SELECT + id + 1000 AS id, + upper(name) AS name +FROM n +WHERE c = (a * b);]]></m:pre> + + <p> + The difference is that <m:name/> do not require data to be stored anywhere, + because we (by default) process streams on the fly. + Thus one process can generate data, second one can transform them and the third one can convert them to some output format. + All processes are running at the same time and without need to cache all data at once. + </p> + + </text> + +</stránka> diff -r 4919c8098008 -r fde0cd94fde6 relpipe-data/examples/guile-1.csv --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/relpipe-data/examples/guile-1.csv Thu Feb 07 13:08:29 2019 +0100 @@ -0,0 +1,3 @@ +1,first,1,2,3 +2,second,2,10,1024 +3,third,4,4,16