# HG changeset patch
# User František Kučera
# Date 1549541309 -3600
# Node ID fde0cd94fde67853ebb07ec5873f40422ce313fc
# Parent 4919c8098008a9935e4777312b38aee2b20b2313
guile: Doing projections with Guile

diff -r 4919c8098008 -r fde0cd94fde6 relpipe-data/examples-guile-projections.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-guile-projections.xml Thu Feb 07 13:08:29 2019 +0100
@@ -0,0 +1,218 @@
+
+
+ Doing projections with Guile
+ modifying attribute values and adding new attributes or removing them
+ 01500
+
+
+
+

+ The relpipe-tr-guile tool can not only filter records, + but can also modify them and even change the structure of the relation by adding or removing attributes. + +

+ +

Sample data

+ +

We have a CSV file:

+ + + +

and we convert it to a relation called n:

+ + + +

which printed as a table looks like this:

+ + + +

+ Because it is annoying to write the same code again and again, we will create a shell function and (re)use it later: +

+ + + +

+ Another option is to store the relational data in a file and then read this file. + Files are a better option if the transformation is costly and we do not need live/fresh data. +

+ +

Modifying attribute values

+ +

+ Then we can modify such a relation using Guile, e.g. we can make the name uppercase and increase the id by 1000: +

+ + + +

So we have:

+ + + + +
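To make the semantics of this step concrete, here is a minimal Python sketch of the same per-record modification. It is not how relpipe-tr-guile is invoked; it only mimics the effect on the sample data. The attribute names id, name, a, b and c are taken from the text, and their mapping to the CSV columns is an assumption.

```python
# Sample rows mirroring guile-1.csv (column-to-attribute mapping is assumed).
rows = [
    {"id": 1, "name": "first", "a": 1, "b": 2, "c": 3},
    {"id": 2, "name": "second", "a": 2, "b": 10, "c": 1024},
    {"id": 3, "name": "third", "a": 4, "b": 4, "c": 16},
]


def modify(record):
    """Return a copy with an uppercase name and the id increased by 1000."""
    changed = dict(record)
    changed["name"] = record["name"].upper()
    changed["id"] = record["id"] + 1000
    return changed


modified = [modify(r) for r in rows]
for r in modified:
    print(r["id"], r["name"], r["a"], r["b"], r["c"])
```

The structure of the relation stays the same here; only the values of two attributes change.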

Removing attributes

+ +

+ The relation on the output might have a different structure than the relation on the input. + We can keep only some of the original attributes: +

+ + + +

and have:

+ + + +
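Dropping attributes is a plain projection. The Python sketch below shows the idea on the sample data; which attributes the original example keeps is not visible here, so the choice of id and name is purely illustrative, and the helper name project is hypothetical.

```python
# Sample rows mirroring guile-1.csv (column-to-attribute mapping is assumed).
rows = [
    {"id": 1, "name": "first", "a": 1, "b": 2, "c": 3},
    {"id": 2, "name": "second", "a": 2, "b": 10, "c": 1024},
    {"id": 3, "name": "third", "a": 4, "b": 4, "c": 16},
]


def project(record, keep):
    """Return a new record that contains only the attributes listed in keep."""
    return {attribute: record[attribute] for attribute in keep}


# Illustrative choice: keep only id and name, drop a, b and c.
projected = [project(r, ["id", "name"]) for r in rows]
for r in projected:
    print(r)
```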

Adding attributes

+ +

+ If we do not want to completely redefine the structure of the relation, + we can keep all original attributes and just add definitions of some others: +

+ + + +

so we have a completely new attribute containing the sum of a, b and c:

+ + + +

+ We can change the attribute order by using --input-attributes-append + instead of --input-attributes-prepend. +
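The effect of adding a computed attribute, and of the two possible attribute orders, can be sketched in Python as follows. The attribute name sum is hypothetical, and the sketch assumes that prepending the input attributes places them before the new one while appending places them after it.

```python
# Sample rows mirroring guile-1.csv (column-to-attribute mapping is assumed).
rows = [
    {"id": 1, "name": "first", "a": 1, "b": 2, "c": 3},
    {"id": 2, "name": "second", "a": 2, "b": 10, "c": 1024},
    {"id": 3, "name": "third", "a": 4, "b": 4, "c": 16},
]


def add_sum(record, originals_first=True):
    """Add a new attribute holding a + b + c, keeping all original attributes.

    originals_first=True puts the original attributes before the new one,
    originals_first=False puts them after it (dicts keep insertion order).
    """
    new = {"sum": record["a"] + record["b"] + record["c"]}
    return {**record, **new} if originals_first else {**new, **record}


with_sum = [add_sum(r) for r in rows]
sum_first = [add_sum(r, originals_first=False) for r in rows]
print(list(with_sum[0].keys()))
print(list(sum_first[0].keys()))
```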

+ +

Changing the attribute type

+ +

+ Each attribute has a data type (integer, string…). + We can change that type, but then we also have to modify the data, because we cannot put e.g. a string value into an integer attribute. +

+ + + +

+ The code above changed the type of the id attribute from integer to string + and put the uppercase name into it: +

+ + + + +
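A small Python sketch of the same idea: the id attribute stops being an integer and receives a string value instead. Python does not declare attribute types, so the type change shows up only in the values; the helper name retype_id is hypothetical.

```python
# Sample rows mirroring guile-1.csv (column-to-attribute mapping is assumed).
rows = [
    {"id": 1, "name": "first", "a": 1, "b": 2, "c": 3},
    {"id": 2, "name": "second", "a": 2, "b": 10, "c": 1024},
    {"id": 3, "name": "third", "a": 4, "b": 4, "c": 16},
]


def retype_id(record):
    """Return a copy whose id is a string holding the uppercase name."""
    changed = dict(record)
    changed["id"] = record["name"].upper()
    return changed


retyped = [retype_id(r) for r in rows]
for r in retyped:
    print(type(r["id"]).__name__, r["id"], r["name"])
```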

Projection and restriction

+ +

+ We can do projection and restriction at the same time, during the same transformation: +

+ + + +

and have:

+ + + +

+ And if we use expt instead of *, we will get SECOND instead of THIRD. +
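The following Python sketch combines a restriction and a projection in one pass. The exact condition of the original example is not shown here; the sketch assumes it compares the result of an operation on a and b with c, which matches the sample data and the results stated above. The helper name select_names is hypothetical.

```python
import operator

# Sample rows mirroring guile-1.csv (column-to-attribute mapping is assumed).
rows = [
    {"id": 1, "name": "first", "a": 1, "b": 2, "c": 3},
    {"id": 2, "name": "second", "a": 2, "b": 10, "c": 1024},
    {"id": 3, "name": "third", "a": 4, "b": 4, "c": 16},
]


def select_names(records, op):
    """Keep records where op(a, b) equals c and output only the uppercase name."""
    return [r["name"].upper() for r in records if op(r["a"], r["b"]) == r["c"]]


by_product = select_names(rows, operator.mul)  # multiplication, like *
by_power = select_names(rows, operator.pow)    # exponentiation, like expt
print(by_product)
print(by_power)
```

With multiplication only 4 * 4 = 16 matches, and with exponentiation only 2 to the power of 10 = 1024 matches.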

+ +

The example above has its SQL equivalent:

+ + + +

+ The difference is that we do not require the data to be stored anywhere, + because we (by default) process streams on the fly. + Thus one process can generate data, a second one can transform it and a third one can convert it to some output format. + All processes are running at the same time and without the need to cache all the data at once. +
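The on-the-fly flow described above can be imitated with Python generators. This is only an analogy: in the real pipeline the stages are separate processes connected by pipes, whereas the generators here run inside one process. The stage names generate, transform and format_lines are hypothetical.

```python
import csv
import io

CSV_TEXT = "1,first,1,2,3\n2,second,2,10,1024\n3,third,4,4,16\n"


def generate(text):
    """First stage: parse CSV and yield one record at a time."""
    for id_, name, a, b, c in csv.reader(io.StringIO(text)):
        yield {"id": int(id_), "name": name, "a": int(a), "b": int(b), "c": int(c)}


def transform(records):
    """Second stage: modify each record as it arrives."""
    for r in records:
        yield {**r, "name": r["name"].upper()}


def format_lines(records):
    """Third stage: convert each record to an output line."""
    for r in records:
        yield "{id}\t{name}".format(**r)


pipeline = format_lines(transform(generate(CSV_TEXT)))
first_line = next(pipeline)  # only the first record has been read so far
rest = list(pipeline)        # the remaining records flow through one by one
print(first_line)
print("\n".join(rest))
```

No stage holds the whole data set; each record passes through all three stages before the next one is read.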

+ +
+ +
diff -r 4919c8098008 -r fde0cd94fde6 relpipe-data/examples/guile-1.csv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/guile-1.csv Thu Feb 07 13:08:29 2019 +0100
@@ -0,0 +1,3 @@
+1,first,1,2,3
+2,second,2,10,1024
+3,third,4,4,16