relpipe-data/examples-awk-filtering.xml
author František Kučera <franta-hg@frantovo.cz>
Tue, 28 May 2019 21:18:20 +0200
branch v_0
changeset 258 2868d772c27e
parent 245 relpipe-data/examples-guile-filtering.xml@4919c8098008
permissions -rw-r--r--
Release v0.12 – AWK
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Complex filtering with AWK</nadpis>
	<perex>filtering records with AND, OR and functions</perex>
	<m:pořadí-příkladu>02100</m:pořadí-příkladu>
	<text xmlns="http://www.w3.org/1999/xhtml">
		<p>
			If we need more complex filtering than <code>relpipe-tr-grep</code> can offer, we can write an AWK transformation.
			Then we can use AND and OR operators and functions like regular expression matching or numerical formulas.
		</p>
		
		<p>
			The tool <code>relpipe-tr-awk</code> calls a real AWK program (usually GNU AWK) installed on our system and passes the data of the given relation to it.
			Thus we can use any AWK feature in our pipeline while processing relational data.
			Relational attributes are mapped to AWK variables, so we can reference them by their names instead of mere field numbers.
		</p>
		
		<p>
			The <code>--for-each</code> option is used for both filtering (instead of <code>--where</code>) 
			and arbitrary code execution (for data modifications, adding records, computations or intentional side effects).
			In AWK, filtering conditions are surrounded by <code>(…)</code> and actions by <code>{…}</code>.
			Both can be combined, and multiple expressions can be separated by a semicolon (<code>;</code>).
			The <code>record()</code> function should be called instead of AWK <code>print</code> (which should never be used directly).
			Calling <code>record()</code> is not necessary when only filtering is done (and there are no data modifications).
		</p>
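		
		<p>
			The interplay of conditions and actions can also be tried in plain AWK, outside of <code>relpipe-tr-awk</code>.
			The following is only a sketch under stated assumptions: the sample data and the helper functions are made up for illustration,
			the assignments <code>name = $1; size = $2</code> merely imitate the mapping of attributes to variables,
			and <code>print</code> stands in for <code>record()</code>, which is available only inside <code>relpipe-tr-awk</code>.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Hypothetical sample data: one record per line, name and size separated by a space.
sample() { printf '%s\n' 'a.txt 100' 'b.txt 3000' 'c.txt 5000'; }

# Condition only: records that match pass through unchanged.
filter_only() {
	sample | awk '{ name = $1; size = $2 } (size > 2000)'
}

# Condition combined with an action: the action runs only for matching records.
filter_with_action() {
	sample | awk '{ name = $1; size = $2 } (size > 2000) { print name, size / 1000 }'
}

filter_only
filter_with_action]]></m:pre>
		
		<p>
			The first call should list the two larger files as they are; the second one should print their names with the size divided by 1000.
		</p>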
		
		<h2>Filtering numbers</h2>
		
		<p>With AWK we can filter records using standard numeric operators like <code>==</code>, <code>&lt;</code>, <code>&gt;</code>, <code>&gt;=</code> etc.</p>
		
		<m:pre jazyk="bash"><![CDATA[find -print0 | relpipe-in-filesystem \
	| relpipe-tr-awk \
		--relation '.*' \
			--for-each '(size > 2000)' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>and, for example, list only files larger than 2000 bytes:</p>
		
		<pre><![CDATA[filesystem:
 ╭──────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
 │ path        (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
 ├──────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
 │ ./relpipe-tr-awk.cpp │ f             │           2880 │ hacker         │ hacker         │
 │ ./CLIParser.h        │ f             │           5264 │ hacker         │ hacker         │
 │ ./AwkHandler.h       │ f             │          17382 │ hacker         │ hacker         │
 ╰──────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 3]]></pre>
		<h2>Filtering strings</h2>
		
		<p>String values can be matched against a regular expression:</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-awk \
		--relation '.*' \
			--for-each '(mount_point ~ /cdrom/)' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>for example, <code>fstab</code> records with <code>cdrom</code> in the <code>mount_point</code>:</p>
	
		<pre><![CDATA[fstab:
 ╭─────────────────┬─────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
 │ scheme (string) │ device (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
 ├─────────────────┼─────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
 │                 │ /dev/sr0        │ /media/cdrom0        │ udf,iso9660   │ user,noauto      │              0 │              0 │
 ╰─────────────────┴─────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
Record count: 1]]></pre>
		<p>Case-insensitive search can be switched on by adding:</p>
		
		<pre>--define IGNORECASE integer 1</pre>
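		
		<p>
			<code>IGNORECASE</code> is a GNU AWK feature; other AWK implementations treat it as an ordinary variable.
			A portable alternative is to lower-case the value with the standard <code>tolower()</code> function before matching.
			A minimal plain-AWK sketch, assuming made-up sample data, a hypothetical helper function and a variable assignment that imitates the <code>mount_point</code> attribute:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Hypothetical sample data: one mount point per line.
mount_points() { printf '%s\n' '/' '/home' '/media/CDROM0'; }

# Lower-case the value first, then match it against a lower-case pattern.
find_cdrom() {
	mount_points | awk '{ mount_point = $1 } (tolower(mount_point) ~ /cdrom/)'
}

find_cdrom]]></m:pre>
		
		<p>
			The same condition, <code>(tolower(mount_point) ~ /cdrom/)</code>, can presumably be passed to <code>--for-each</code> as well, because the code is evaluated by AWK.
		</p>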
		
		<h2>AND and OR</h2>
		
		<p>We can combine multiple conditions using <code>||</code> and <code>&amp;&amp;</code> logical operators:</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-awk \
		--relation '.*' \
			--for-each '(type == "btrfs" || pass == 1)' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>and build arbitrarily complex filters:</p>
	
		<pre><![CDATA[fstab:
 ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
 │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options                      (string) │ dump (integer) │ pass (integer) │
 ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
 │ UUID            │ 29758270-fd25-4a6c-a7bb-9a18302816af │ /                    │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              1 │
 │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime                              │              0 │              2 │
 ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
Record count: 2]]></pre>
		<p>Nested <code>(…)</code> work as expected.</p>
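		
		<p>
			A minimal plain-AWK sketch of such nesting follows. The three sample records reuse values from the outputs above,
			while the helper functions and the variable assignments (imitating the attribute mapping of <code>relpipe-tr-awk</code>) are assumptions made for illustration:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Sample data: mount_point, type and pass separated by spaces.
fstab_sample() {
	printf '%s\n' '/ ext4 1' '/home btrfs 2' '/media/cdrom0 udf,iso9660 0'
}

# The inner parentheses group the OR, so that the AND applies to its result.
nested_filter() {
	fstab_sample | awk '
		{ mount_point = $1; type = $2; pass = $3 }
		((type == "btrfs" || type == "ext4") && pass > 1)'
}

nested_filter]]></m:pre>
		
		<p>
			Without the inner parentheses, <code>&amp;&amp;</code> would bind tighter than <code>||</code> and the condition would match every <code>btrfs</code> record regardless of <code>pass</code>.
		</p>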
		<p>
			And AWK can do much more – it offers plenty of functions and language constructs that we can use in our transformations.
			Comprehensive documentation can be found here: <a href="https://www.gnu.org/software/gawk/manual/">Gawk: Effective AWK Programming</a>.
		</p>
		
	</text>
</stránka>