examples: Aggregating data with Guile v_0
author František Kučera <franta-hg@frantovo.cz>
Thu, 07 Feb 2019 16:24:21 +0100
branch v_0
changeset 248 e76ca9f7d6cb
parent 247 087b8621fb3e
child 249 ce8a4be95632
examples: Aggregating data with Guile
relpipe-data/examples-guile-aggregations.xml
relpipe-data/examples/guile-file-count-size-sum.sh
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-guile-aggregations.xml	Thu Feb 07 16:24:21 2019 +0100
@@ -0,0 +1,62 @@
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	
+	<nadpis>Aggregating data with Guile</nadpis>
+	<perex>counting records and computing a sum</perex>
+	<m:pořadí-příkladu>01700</m:pořadí-příkladu>
+
+	<text xmlns="http://www.w3.org/1999/xhtml">
+		
+		<p>
+			In <code>relpipe-tr-guile</code> we can generate new records, not only modify records from the input.
+			There is a <code>--has-more-records</code> option which, if it evaluates to true, says: “read one more record from the Guile context and call me again”.
+			We can also suppress all original records with <code>--where '#f'</code>.
+			And we can change the structure of the relation (see previous examples).
+			Thus we can iterate through a relation but completely replace its structure and content.
+		</p>
+		
+		<p>
+			What is it good for? We can do aggregations: we can count records, or compute a sum, maximum, minimum or average value, etc.
+		</p>
+		
+		<m:pre jazyk="bash" src="examples/guile-file-count-size-sum.sh"/>
+		
+		<p>Usage example:</p>
+		
+		<m:pre jazyk="text"><![CDATA[$ ./guile-file-count-size-sum.sh /usr/share/icons/oxygen/
+filesystem:
+ ╭─────────────────┬───────────────╮
+ │ count (integer) │ sum (integer) │
+ ├─────────────────┼───────────────┤
+ │            6260 │      31091700 │
+ ╰─────────────────┴───────────────╯
+Record count: 1]]></m:pre>
+
+		<p>
+			In SQL, the same result can be achieved by:
+		</p>
+
+		<m:pre jazyk="sql"><![CDATA[SELECT
+	count(*) AS count,
+	sum(size) AS sum
+FROM filesystem;]]></m:pre>
+
+		<p>
+			This should be possible with <code>relpipe-tr-sql</code> in later versions.
+			SQL is much more declarative and in many cases a better tool.
+			In SQL we describe “what the result should look like” instead of “how the result should be produced step by step”.
+		</p>
+		
+		<p>
+			One day, there might also be a translator that parses SQL code and generates Guile code,
+			so we could have the advantages of both worlds:
+			a) the concise and declarative syntax of SQL and
+			b) streaming, which means no need to put all the data in RAM or on disk.
+		</p>
+
+
+		
+	</text>
+
+</stránka>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/guile-file-count-size-sum.sh	Thu Feb 07 16:24:21 2019 +0100
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+# argument: directory path
+# prints file count and sum of file sizes
+
+find "$1" -type f -print0 \
+	| relpipe-in-filesystem \
+		--file path \
+		--file size \
+	| relpipe-tr-guile \
+		--relation 'f.*' \
+		--output-attribute 'count' integer \
+		--output-attribute 'sum'   integer \
+		--before-records '
+			(define $sum   0)
+			(define $count 0)
+			(define return-sum #f)' \
+		--for-each '
+			(set! $sum   (+ $sum   $size) )
+			(set! $count (+ $count 1    ) )' \
+		--where '#f' \
+		--after-records '(set! return-sum #t)' \
+		--has-more-records '
+			(if return-sum
+				(begin (set! return-sum #f) #t)
+				#f
+			)' \
+	| relpipe-out-tabular
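
The script above follows a general streaming pattern: initialize accumulators before the first record, update them for each record while emitting nothing, and emit a single summary record at the end. As a rough illustration of that pattern only, here is a minimal plain-bash sketch that does not use relpipe at all. The function name `count_and_sum` and its input format (one integer size per line on standard input) are assumptions made for this example; they are not part of relpipe or of this changeset.

```shell
#!/bin/bash

# Hypothetical helper (not part of relpipe): reads one integer per line
# from standard input and prints "count sum" as a single output line.
count_and_sum() {
	# corresponds to --before-records: initialize the accumulators
	local count=0
	local sum=0
	local size

	# corresponds to --for-each: update the accumulators for every record;
	# nothing is printed inside the loop, which mirrors --where '#f'
	while read -r size; do
		sum=$((sum + size))
		count=$((count + 1))
	done

	# corresponds to --after-records and --has-more-records:
	# emit exactly one summary record once the input is exhausted
	printf '%s %s\n' "$count" "$sum"
}
```

For instance, `printf '10\n20\n30\n' | count_and_sum` prints `3 60`. With GNU find, the sizes could be supplied by `find "$1" -type f -printf '%s\n'`. Like the Guile version, this keeps only the two accumulators in memory regardless of the number of input records.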