relpipe-data/examples-awk-filtering.xml
branchv_0
changeset 258 2868d772c27e
parent 245 4919c8098008
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-awk-filtering.xml	Tue May 28 21:18:20 2019 +0200
@@ -0,0 +1,108 @@
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	
+	<nadpis>Complex filtering with AWK</nadpis>
+	<perex>filtering records with AND, OR and functions</perex>
+	<m:pořadí-příkladu>02100</m:pořadí-příkladu>
+
+	<text xmlns="http://www.w3.org/1999/xhtml">
+
+		<p>
+			If we need more complex filtering than <code>relpipe-tr-grep</code> can offer, we can write an AWK transformation.
+			Then we can use AND and OR operators and functions like regular expression matching or numerical formulas.
+		</p>
+		
+		<p>
+			The tool <code>relpipe-tr-awk</code> calls a real AWK program (usually GNU AWK) installed on our system and passes the data of the given relation to it.
+			Thus we can use any AWK feature in our pipeline while processing relational data.
+			Relational attributes are mapped to AWK variables, so we can reference them by their names instead of mere field numbers.
+		</p>
+		
+		<p>
+			The <code>--for-each</code> option is used for both filtering (instead of <code>--where</code>) 
+			and arbitrary code execution (for data modifications, adding records, computations or intentional side effects).
+			In AWK, filtering conditions are surrounded by <code>(…)</code> and actions by <code>{…}</code>.
+			Both can be combined, and multiple expressions can be separated by a semicolon (<code>;</code>).
+			The <code>record()</code> function should be called instead of the AWK <code>print</code> statement (which should never be used directly).
+			Calling <code>record()</code> is not necessary when only filtering is done (and no data are modified).
+		</p>
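+		
+		<p>
+			The condition and action structure is ordinary AWK, so it can be tried outside of a relational pipeline.
+			The following is only a minimal sketch: the <code>sample_files</code> function and its data are made up for illustration,
+			fields are addressed as <code>$1</code> and <code>$2</code> because no attribute names are mapped here,
+			and <code>print</code> stands in for <code>record()</code>, which exists only inside <code>relpipe-tr-awk</code>.
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[# Hypothetical sample data: "name size" pairs (not produced by relpipe)
+sample_files() {
+	printf '%s\n' 'relpipe-tr-awk.cpp 2880' 'Makefile 120' 'CLIParser.h 5264'
+}
+
+# Condition only: matching lines pass through unchanged
+sample_files | awk '($2 > 2000)'
+
+# Condition and action: the size is doubled before the line is emitted
+sample_files | awk '($2 > 2000) { $2 = $2 * 2; print }']]></m:pre>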
+		
+		<h2>Filtering numbers</h2>
+		
+		<p>With AWK we can filter records using standard numeric operators like <code>==</code>, <code>&lt;</code>, <code>&gt;</code>, <code>&gt;=</code> etc.</p>
+		
+		<m:pre jazyk="bash"><![CDATA[find -print0 | relpipe-in-filesystem \
+	| relpipe-tr-awk \
+		--relation '.*' \
+			--for-each '(size > 2000)' \
+	| relpipe-out-tabular]]></m:pre>
+	
+		<p>and e.g. list only the files whose size is greater than 2000:</p>
+		
+		<pre><![CDATA[filesystem:
+ ╭──────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
+ │ path        (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
+ ├──────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
+ │ ./relpipe-tr-awk.cpp │ f             │           2880 │ hacker         │ hacker         │
+ │ ./CLIParser.h        │ f             │           5264 │ hacker         │ hacker         │
+ │ ./AwkHandler.h       │ f             │          17382 │ hacker         │ hacker         │
+ ╰──────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
+Record count: 3]]></pre>
+
+
+		<h2>Filtering strings</h2>
+		
+		<p>String values can be matched against a regular expression:</p>
+		
+		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
+	| relpipe-tr-awk \
+		--relation '.*' \
+			--for-each '(mount_point ~ /cdrom/)' \
+	| relpipe-out-tabular]]></m:pre>
+	
+		<p>e.g. <code>fstab</code> records having <code>cdrom</code> in the <code>mount_point</code>:</p>
+	
+		<pre><![CDATA[fstab:
+ ╭─────────────────┬─────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
+ │ scheme (string) │ device (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
+ ├─────────────────┼─────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
+ │                 │ /dev/sr0        │ /media/cdrom0        │ udf,iso9660   │ user,noauto      │              0 │              0 │
+ ╰─────────────────┴─────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
+Record count: 1]]></pre>
+
+		<p>Case-insensitive search can be switched on by adding:</p>
+		
+		<pre>--define IGNORECASE integer 1</pre>
+		
+		<h2>AND and OR</h2>
+		
+		<p>We can combine multiple conditions using <code>||</code> and <code>&amp;&amp;</code> logical operators:</p>
+		
+		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
+	| relpipe-tr-awk \
+		--relation '.*' \
+			--for-each '(type == "btrfs" || pass == 1)' \
+	| relpipe-out-tabular]]></m:pre>
+	
+		<p>and build arbitrarily complex filters:</p>
+	
+		<pre><![CDATA[fstab:
+ ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
+ │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options                      (string) │ dump (integer) │ pass (integer) │
+ ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
+ │ UUID            │ 29758270-fd25-4a6c-a7bb-9a18302816af │ /                    │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              1 │
+ │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime                              │              0 │              2 │
+ ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
+Record count: 2]]></pre>
+
+		<p>Nested <code>(…)</code> work as expected.</p>
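+
+		<p>
+			The effect of nesting can be tried in plain AWK, outside of a relational pipeline.
+			This is only a sketch: the <code>sample_fstab</code> function and its "type pass" data are made up for illustration,
+			and the values are addressed as <code>$1</code> and <code>$2</code> because no attribute names are mapped here.
+		</p>
+
+		<m:pre jazyk="bash"><![CDATA[# Hypothetical sample data: "type pass" pairs (not read from a real fstab)
+sample_fstab() {
+	printf '%s\n' 'ext4 1' 'btrfs 2' 'btrfs 0' 'swap 0'
+}
+
+# Nested: (btrfs OR ext4) AND pass >= 1
+sample_fstab | awk '(($1 == "btrfs" || $1 == "ext4") && $2 >= 1)'
+
+# Not nested: && binds more tightly than ||, so this means btrfs OR (ext4 AND pass >= 1)
+sample_fstab | awk '($1 == "btrfs" || $1 == "ext4" && $2 >= 1)']]></m:pre>
+
+		<p>
+			The nested variant passes only the <code>ext4 1</code> and <code>btrfs 2</code> lines;
+			without the inner parentheses, the <code>btrfs 0</code> line passes as well.
+		</p>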
+
+		<p>
+			And AWK can do much more – it offers plenty of functions and language constructs that we can use in our transformations.
+			Comprehensive documentation can be found here: <a href="https://www.gnu.org/software/gawk/manual/">Gawk: Effective AWK Programming</a>.
+		</p>
+		
+	</text>
+
+</stránka>