--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-awk-filtering.xml Tue May 28 21:18:20 2019 +0200
@@ -0,0 +1,108 @@
+<stránka
+ xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+ xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+
+ <nadpis>Complex filtering with AWK</nadpis>
+ <perex>filtering records with AND, OR and functions</perex>
+ <m:pořadí-příkladu>02100</m:pořadí-příkladu>
+
+ <text xmlns="http://www.w3.org/1999/xhtml">
+
+ <p>
+ If we need more complex filtering than <code>relpipe-tr-grep</code> can offer, we can write an AWK transformation.
+ Then we can use AND and OR operators and functions like regular expression matching or numerical formulas.
+ </p>
+
+ <p>
+ The tool <code>relpipe-tr-awk</code> calls a real AWK program (usually GNU AWK) installed on our system and passes the data of the given relation to it.
+ Thus we can use any AWK feature in our pipeline while processing relational data.
+ Relational attributes are mapped to AWK variables, so we can reference them by their names instead of mere field numbers.
+ </p>
+
+ <p>
+ The <code>--for-each</code> option is used for both filtering (instead of <code>--where</code>)
+ and arbitrary code execution (for data modifications, adding records, computations or intentional side effects).
+ In AWK, filtering conditions are surrounded by <code>(…)</code> and actions by <code>{…}</code>.
+ Both can be combined, and multiple expressions can be separated by a semicolon (<code>;</code>).
+ The <code>record()</code> function should be called instead of AWK <code>print</code> (which should never be used directly).
+ Calling <code>record()</code> is not necessary when we only filter records and do not modify the data.
+ </p>
+
+ <h2>Filtering numbers</h2>
+
+ <p>With AWK we can filter records using standard numeric operators like <code>==</code>, <code>&lt;</code>, <code>&gt;</code>, <code>&gt;=</code> etc.</p>
+
+ <m:pre jazyk="bash"><![CDATA[find -print0 | relpipe-in-filesystem \
+ | relpipe-tr-awk \
+ --relation '.*' \
+ --for-each '(size > 2000)' \
+ | relpipe-out-tabular]]></m:pre>
+
+ <p>and e.g. list only the files whose size is greater than 2000:</p>
+
+ <pre><![CDATA[filesystem:
+ ╭──────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
+ │ path (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
+ ├──────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
+ │ ./relpipe-tr-awk.cpp │ f │ 2880 │ hacker │ hacker │
+ │ ./CLIParser.h │ f │ 5264 │ hacker │ hacker │
+ │ ./AwkHandler.h │ f │ 17382 │ hacker │ hacker │
+ ╰──────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
+Record count: 3]]></pre>
+
+
+ <h2>Filtering strings</h2>
+
+ <p>String values can be matched against a regular expression:</p>
+
+ <m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
+ | relpipe-tr-awk \
+ --relation '.*' \
+ --for-each '(mount_point ~ /cdrom/)' \
+ | relpipe-out-tabular]]></m:pre>
+
+ <p>e.g. to select <code>fstab</code> records that have <code>cdrom</code> in the <code>mount_point</code>:</p>
+
+ <pre><![CDATA[fstab:
+ ╭─────────────────┬─────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
+ │ scheme (string) │ device (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
+ ├─────────────────┼─────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
+ │ │ /dev/sr0 │ /media/cdrom0 │ udf,iso9660 │ user,noauto │ 0 │ 0 │
+ ╰─────────────────┴─────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
+Record count: 1]]></pre>
+
+ <p>Case-insensitive search can be switched on by setting the GNU AWK variable <code>IGNORECASE</code>:</p>
+
+ <pre>--define IGNORECASE integer 1</pre>
+
+ <h2>AND and OR</h2>
+
+ <p>We can combine multiple conditions using the <code>||</code> and <code>&amp;&amp;</code> logical operators:</p>
+
+ <m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
+ | relpipe-tr-awk \
+ --relation '.*' \
+ --for-each '(type == "btrfs" || pass == 1)' \
+ | relpipe-out-tabular]]></m:pre>
+
+ <p>and build arbitrarily complex filters:</p>
+
+ <pre><![CDATA[fstab:
+ ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
+ │ scheme (string) │ device (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
+ ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
+ │ UUID │ 29758270-fd25-4a6c-a7bb-9a18302816af │ / │ ext4 │ relatime,user_xattr,errors=remount-ro │ 0 │ 1 │
+ │ UUID │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home │ btrfs │ relatime │ 0 │ 2 │
+ ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
+Record count: 2]]></pre>
+
+ <p>Nested parentheses <code>(…)</code> work as expected, so conditions can be grouped.</p>
+
+ <p>
+ And AWK can do much more – it offers plenty of functions and language constructs that we can use in our transformations.
+ Comprehensive documentation can be found here: <a href="https://www.gnu.org/software/gawk/manual/">Gawk: Effective AWK Programming</a>.
+ </p>
+
+ </text>
+
+</stránka>