relpipe-data/examples-awk-filtering.xml
branch v_0
changeset 258 2868d772c27e
parent 245 4919c8098008
257:a39066264509 258:2868d772c27e
       
     1 <stránka
       
     2 	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
       
     3 	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
       
     4 	
       
     5 	<nadpis>Complex filtering with AWK</nadpis>
       
     6 	<perex>filtering records with AND, OR and functions</perex>
       
     7 	<m:pořadí-příkladu>02100</m:pořadí-příkladu>
       
     8 
       
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
       
    10 
       
    11 		<p>
       
    12 			If we need more complex filtering than <code>relpipe-tr-grep</code> can offer, we can write an AWK transformation.
       
    13 			Then we can use AND and OR operators and functions like regular expression matching or numerical formulas.
       
    14 		</p>
       
    15 		
       
    16 		<p>
       
     17 			The tool <code>relpipe-tr-awk</code> calls a real AWK program (usually GNU AWK) installed on our system and passes the data of the given relation to it.
       
    18 			Thus we can use any AWK feature in our pipeline while processing relational data.
       
    19 			Relational attributes are mapped to AWK variables, so we can reference them by their names instead of mere field numbers.
       
    20 		</p>
       
    21 		
       
    22 		<p>
       
    23 			The <code>--for-each</code> option is used for both filtering (instead of <code>--where</code>) 
       
    24 			and arbitrary code execution (for data modifications, adding records, computations or intentional side effects).
       
    25 			In AWK, filtering conditions are surrounded by <code>(…)</code> and actions by <code>{…}</code>.
       
     26 			Both can be combined, and multiple expressions can be separated by a semicolon (<code>;</code>).
       
    27 			The <code>record()</code> function should be called instead of AWK <code>print</code> (which should never be used directly).
       
     28 			Calling <code>record()</code> is not necessary when we only filter records and do not modify the data.
       
    29 		</p>
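		<p>
			As a rough illustration in plain AWK (outside of relpipe), a condition and an action can be combined as sketched below.
			The sample data and the variable names <code>name</code> and <code>size</code> are made up for this sketch.
			Plain AWK addresses fields by number, so the first rule copies them into named variables by hand,
			and <code>print</code> only stands in for the <code>record()</code> call that we would use inside <code>relpipe-tr-awk</code>.
		</p>

```shell
# Hypothetical sample data: a name and a size separated by a space.
sample='a.txt 100
b.txt 3000
c.txt 2500'

# First rule: give the numbered fields names (done by hand here, only for this sketch).
# Second rule: a (condition) followed by an {action}; statements are separated by semicolons.
result=$(printf '%s\n' "$sample" | awk '
	{ name = $1; size = $2 + 0 }
	(size > 2000) { size = size / 1000; print name, size }
')

# Show the records that passed the condition, with the modified size.
printf '%s\n' "$result"
```

		<p>
			Only the two records with a size above 2000 reach the action, which divides the size by 1000 before printing.
		</p>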
       
    30 		
       
    31 		<h2>Filtering numbers</h2>
       
    32 		
       
     33 		<p>With AWK we can filter records using standard numeric operators such as <code>==</code>, <code>&lt;</code>, <code>&gt;</code> or <code>&gt;=</code>.</p>
       
    34 		
       
    35 		<m:pre jazyk="bash"><![CDATA[find -print0 | relpipe-in-filesystem \
       
    36 	| relpipe-tr-awk \
       
    37 		--relation '.*' \
       
    38 			--for-each '(size > 2000)' \
       
    39 	| relpipe-out-tabular]]></m:pre>
       
    40 	
       
     41 		<p>This lists, for example, only the files whose size is greater than 2000:</p>
       
    42 		
       
    43 		<pre><![CDATA[filesystem:
       
    44  ╭──────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
       
    45  │ path        (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
       
    46  ├──────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
       
    47  │ ./relpipe-tr-awk.cpp │ f             │           2880 │ hacker         │ hacker         │
       
    48  │ ./CLIParser.h        │ f             │           5264 │ hacker         │ hacker         │
       
    49  │ ./AwkHandler.h       │ f             │          17382 │ hacker         │ hacker         │
       
    50  ╰──────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
       
    51 Record count: 3]]></pre>
       
    52 
       
    53 
       
    54 		<h2>Filtering strings</h2>
       
    55 		
       
     56 		<p>String values can be matched against a regular expression:</p>
       
    57 		
       
    58 		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
       
    59 	| relpipe-tr-awk \
       
    60 		--relation '.*' \
       
    61 			--for-each '(mount_point ~ /cdrom/)' \
       
    62 	| relpipe-out-tabular]]></m:pre>
       
    63 	
       
     64 		<p>For example, this selects the <code>fstab</code> records that have <code>cdrom</code> in their <code>mount_point</code>:</p>
       
    65 	
       
    66 		<pre><![CDATA[fstab:
       
    67  ╭─────────────────┬─────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
       
    68  │ scheme (string) │ device (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
       
    69  ├─────────────────┼─────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
       
    70  │                 │ /dev/sr0        │ /media/cdrom0        │ udf,iso9660   │ user,noauto      │              0 │              0 │
       
    71  ╰─────────────────┴─────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
       
    72 Record count: 1]]></pre>
       
    73 
       
    74 		<p>Case-insensitive search can be switched on by adding:</p>
       
    75 		
       
    76 		<pre>--define IGNORECASE integer 1</pre>
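		<p>
			<code>IGNORECASE</code> is a GNU AWK variable, so it can be expected to work only when GNU AWK is the interpreter in use.
			A minimal plain-AWK sketch of a portable alternative is to lowercase the value with <code>tolower()</code> before matching.
			The sample mount points are hypothetical, and the first rule only imitates the named attribute.
		</p>

```shell
# Hypothetical mount points, one per line, in mixed case.
sample='/media/CDROM0
/home
/mnt/cdrom1'

# Lowercase the value before matching, so the regular expression itself can stay lowercase.
# A condition without an action prints the matching input line.
result=$(printf '%s\n' "$sample" | awk '
	{ mount_point = $1 }
	(tolower(mount_point) ~ /cdrom/)
')

printf '%s\n' "$result"
```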
       
    77 		
       
    78 		<h2>AND and OR</h2>
       
    79 		
       
    80 		<p>We can combine multiple conditions using <code>||</code> and <code>&amp;&amp;</code> logical operators:</p>
       
    81 		
       
    82 		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
       
    83 	| relpipe-tr-awk \
       
    84 		--relation '.*' \
       
    85 			--for-each '(type == "btrfs" || pass == 1)' \
       
    86 	| relpipe-out-tabular]]></m:pre>
       
    87 	
       
     88 		<p>This way we can build arbitrarily complex filters:</p>
       
    89 	
       
    90 		<pre><![CDATA[fstab:
       
    91  ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
       
    92  │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options                      (string) │ dump (integer) │ pass (integer) │
       
    93  ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
       
    94  │ UUID            │ 29758270-fd25-4a6c-a7bb-9a18302816af │ /                    │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              1 │
       
    95  │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime                              │              0 │              2 │
       
    96  ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
       
    97 Record count: 2]]></pre>
       
    98 
       
     99 		<p>Nested parentheses <code>(…)</code> work as expected and can be used to group conditions.</p>
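		<p>
			The effect of grouping can be sketched in plain AWK (outside of relpipe) with a negation <code>!</code> applied either to a whole group or to a single comparison.
			The sample data and the helper function <code>filter</code> are hypothetical and exist only in this sketch.
		</p>

```shell
# Hypothetical sample data: file system type and pass number.
sample='ext4 1
btrfs 2
xfs 2
btrfs 1'

# Hypothetical helper: names the fields, then applies the condition given as the first argument.
filter() {
	printf '%s\n' "$sample" | awk '{ type = $1; pass = $2 + 0 }
'"$1"
}

# The negation covers the whole nested group.
grouped=$(filter '(!(type == "btrfs" || pass == 1))')

# The negation covers only the first comparison.
ungrouped=$(filter '(!(type == "btrfs") || pass == 1)')

printf 'grouped:\n%s\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"
```

		<p>
			The first filter keeps only the <code>xfs</code> record, while the second one drops only the <code>btrfs</code> record with <code>pass</code> equal to 2.
		</p>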
       
   100 
       
   101 		<p>
       
   102 			And AWK can do much more – it offers plenty of functions and language constructs that we can use in our transformations.
       
    103 			Comprehensive documentation can be found here: <a href="https://www.gnu.org/software/gawk/manual/">Gawk: Effective AWK Programming</a>.
       
   104 		</p>
       
   105 		
       
   106 	</text>
       
   107 
       
   108 </stránka>