<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">

	<nadpis>Complex filtering with AWK</nadpis>
	<perex>filtering records with AND, OR and functions</perex>
	<m:pořadí-příkladu>02100</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">

		<p>
			If we need more complex filtering than <code>relpipe-tr-grep</code> can offer, we can write an AWK transformation.
			Then we can use the AND and OR operators, regular expression matching, numerical formulas and AWK's built-in functions.
		</p>

		<p>
			The tool <code>relpipe-tr-awk</code> calls a real AWK program (usually GNU AWK) installed on our system and passes the data of the given relation to it.
			Thus we can use any AWK feature in our pipeline while processing relational data.
			Relational attributes are mapped to AWK variables, so we can reference them by their names instead of mere field numbers.
		</p>
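		<p>
			The idea of addressing attributes by name can be illustrated in plain AWK.
			This is only a sketch of the concept, not the way <code>relpipe-tr-awk</code> actually maps attributes:
			the hypothetical shell function <code>named_fields</code> reads made-up whitespace-separated <code>path type size</code> lines,
			copies the positional fields into named variables and then evaluates the condition passed as its first argument.
		</p>

```shell
# Hypothetical helper (not part of relpipe): evaluate an AWK condition ($1)
# against named variables instead of the positional fields $1, $2, $3.
# The condition is pasted into the AWK program text as-is.
named_fields() {
	awk "{ path = \$1; type = \$2; size = \$3 + 0 } ($1) { print path }"
}

# Made-up input: path, type and size separated by spaces.
printf '%s\n' './a.txt f 100' './b.txt f 5000' './dir d 0' \
	| named_fields 'size > 2000'
# prints: ./b.txt
```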

		<p>
			The <code>--for-each</code> option is used both for filtering (instead of <code>--where</code>)
			and for arbitrary code execution (data modifications, adding records, computations or intentional side effects).
			In AWK, filtering conditions are surrounded by <code>(…)</code> and actions by <code>{…}</code>.
			Both can be combined, and multiple expressions can be separated by a <code>;</code> semicolon.
			The <code>record()</code> function should be called instead of the AWK <code>print</code> statement (which should never be used directly).
			Calling <code>record()</code> is not necessary when we only filter records and do not modify any data.
		</p>
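		<p>
			The condition/action structure comes from AWK itself.
			A minimal plain-AWK sketch on made-up data (the function name <code>sample_sizes</code> is ours):
			a condition alone selects records, a condition followed by <code>{…}</code> runs the action only for matching records,
			and statements inside the action are separated by <code>;</code>.
			Plain AWK writes its output with <code>print</code>; inside <code>--for-each</code> the <code>record()</code> function is used instead.
		</p>

```shell
# Made-up records: name and size.
sample_sizes() {
	printf '%s\n' 'alpha 100' 'beta 2500' 'gamma 4000'
}

# Condition only: AWK's default action prints every matching record.
sample_sizes | awk '($2 > 2000)'
# prints: beta 2500
#         gamma 4000

# Condition plus action: two statements separated by ";" run for each match.
sample_sizes | awk '($2 > 2000) { kb = $2 / 1000; print $1, kb }'
# prints: beta 2.5
#         gamma 4
```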

		<h2>Filtering numbers</h2>

		<p>With AWK we can filter records using the standard numeric comparison operators like <code>==</code>, <code>&lt;</code>, <code>&gt;</code>, <code>&gt;=</code> etc.</p>

		<m:pre jazyk="bash"><![CDATA[find -print0 | relpipe-in-filesystem \
	| relpipe-tr-awk \
		--relation '.*' \
		--for-each '(size > 2000)' \
	| relpipe-out-tabular]]></m:pre>

		<p>and, for example, list only the files larger than 2000 bytes:</p>

		<pre><![CDATA[filesystem:
╭──────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
│ path (string)        │ type (string) │ size (integer) │ owner (string) │ group (string) │
├──────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
│ ./relpipe-tr-awk.cpp │ f             │           2880 │ hacker         │ hacker         │
│ ./CLIParser.h        │ f             │           5264 │ hacker         │ hacker         │
│ ./AwkHandler.h       │ f             │          17382 │ hacker         │ hacker         │
╰──────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 3]]></pre>
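		<p>
			AWK chooses between numeric and string comparison according to its operands, which is worth keeping in mind when filtering numbers.
			A plain-AWK sketch on made-up input, independent of how <code>relpipe-tr-awk</code> types its variables:
			fields that look like numbers are compared numerically with a numeric constant,
			while concatenating an empty string forces a string comparison with character-by-character ordering.
		</p>

```shell
# Numeric comparison: the fields look like numbers and 50 is a number.
numeric_filter() {
	printf '%s\n' 9 10 100 | awk '($1 > 50)'
}

# String comparison: ($1 "") is a string and "50" is a string constant,
# so "9" > "50" holds because the character 9 sorts after 5.
string_filter() {
	printf '%s\n' 9 10 100 | awk '($1 "" > "50")'
}

numeric_filter   # prints: 100
string_filter    # prints: 9
```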

		<h2>Filtering strings</h2>

		<p>String values can be matched against a regular expression:</p>

		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-awk \
		--relation '.*' \
		--for-each '(mount_point ~ /cdrom/)' \
	| relpipe-out-tabular]]></m:pre>

		<p>and, for example, get the <code>fstab</code> records that have <code>cdrom</code> in their <code>mount_point</code>:</p>

		<pre><![CDATA[fstab:
╭─────────────────┬─────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
│ scheme (string) │ device (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
├─────────────────┼─────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
│                 │ /dev/sr0        │ /media/cdrom0        │ udf,iso9660   │ user,noauto      │              0 │              0 │
╰─────────────────┴─────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
Record count: 1]]></pre>
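		<p>
			The same regular expression operators can be tried out in plain AWK.
			A sketch on made-up <code>fstab</code>-like data (the helper names are hypothetical):
			<code>~</code> keeps values matching an extended regular expression, <code>!~</code> keeps the rest,
			and the expression may also come from a variable instead of a <code>/…/</code> literal.
		</p>

```shell
# Made-up fstab-like sample: device, mount point, file system type.
mounts() {
	printf '%s\n' \
		'/dev/sda1 / ext4' \
		'/dev/sr0 /media/cdrom0 udf,iso9660' \
		'/dev/sdb1 /media/usb vfat'
}

# Devices whose mount point matches the regular expression given in $1.
devices_matching() {
	mounts | awk -v re="$1" '($2 ~ re) { print $1 }'
}

# Devices whose mount point does not match it (the !~ operator).
devices_not_matching() {
	mounts | awk -v re="$1" '($2 !~ re) { print $1 }'
}

devices_matching cdrom            # prints: /dev/sr0
devices_not_matching '^/media/'   # prints: /dev/sda1
```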

		<p>Case-insensitive matching can be switched on by defining the GNU AWK variable <code>IGNORECASE</code>:</p>

		<pre>--define IGNORECASE integer 1</pre>
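		<p>
			<code>IGNORECASE</code> has this effect only in GNU AWK; other AWK implementations treat it as an ordinary variable.
			A plain-AWK sketch on made-up data shows a portable alternative, lower-casing the value with <code>tolower()</code> before matching,
			and runs the <code>IGNORECASE</code> variant only if a <code>gawk</code> binary is found on the <code>PATH</code>.
		</p>

```shell
# Made-up values differing only in letter case.
labels() {
	printf '%s\n' 'CDROM' 'cdrom' 'Usb'
}

# Portable: lower-case the value first (tolower() is in POSIX awk).
portable_ci() {
	labels | awk '(tolower($1) ~ /cdrom/)'
}

# GNU AWK only: a non-zero IGNORECASE makes ~ case-insensitive.
gawk_ci() {
	labels | gawk -v IGNORECASE=1 '($1 ~ /cdrom/)'
}

portable_ci   # prints: CDROM and cdrom (one per line)
if command -v gawk >/dev/null; then
	gawk_ci   # same output as portable_ci
fi
```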

		<h2>AND and OR</h2>

		<p>We can combine multiple conditions using the <code>||</code> (OR) and <code>&amp;&amp;</code> (AND) logical operators and build arbitrarily complex filters:</p>

		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-awk \
		--relation '.*' \
		--for-each '(type == "btrfs" || pass == 1)' \
	| relpipe-out-tabular]]></m:pre>

		<p>The result contains the records that satisfy at least one of the conditions:</p>

		<pre><![CDATA[fstab:
╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
│ scheme (string) │ device (string)                      │ mount_point (string) │ type (string) │ options (string)                      │ dump (integer) │ pass (integer) │
├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
│ UUID            │ 29758270-fd25-4a6c-a7bb-9a18302816af │ /                    │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              1 │
│ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime                              │              0 │              2 │
╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
Record count: 2]]></pre>
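		<p>
			For comparison, the same logical operators in plain AWK, on made-up <code>fstab</code>-like lines holding mount point, type and pass
			(the helper names are hypothetical): with <code>||</code> one true condition is enough, with <code>&amp;&amp;</code> both must hold.
		</p>

```shell
# Made-up fstab-like sample: mount point, file system type, pass.
fstab_sample() {
	printf '%s\n' \
		'/ ext4 1' \
		'/home btrfs 2' \
		'/boot ext4 2' \
		'/media/cdrom0 udf,iso9660 0'
}

# OR: either condition is enough.
btrfs_or_pass1() {
	fstab_sample | awk '($2 == "btrfs" || $3 == 1) { print $1 }'
}

# AND: both conditions must hold.
ext4_and_pass2() {
	fstab_sample | awk '($2 == "ext4" && $3 == 2) { print $1 }'
}

btrfs_or_pass1   # prints: / and /home (one per line)
ext4_and_pass2   # prints: /boot
```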

		<p>Conditions can be grouped with nested parentheses <code>(…)</code>, which work as expected.</p>
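		<p>
			Grouping matters because <code>&amp;&amp;</code> binds tighter than <code>||</code> in AWK.
			A plain-AWK sketch with three made-up 0/1 columns <code>a b c</code>:
			without parentheses the condition means <code>a || (b &amp;&amp; c)</code>, with explicit grouping it becomes <code>(a || b) &amp;&amp; c</code>.
		</p>

```shell
# Three made-up flags per line: a b c (1 = true, 0 = false).
rows() {
	printf '%s\n' '1 0 0' '0 1 0' '0 1 1'
}

# && binds tighter than ||, so this means: a || (b && c)
implicit_grouping() {
	rows | awk '($1 == 1 || $2 == 1 && $3 == 1)'
}

# Explicit grouping changes the meaning: (a || b) && c
explicit_grouping() {
	rows | awk '(($1 == 1 || $2 == 1) && $3 == 1)'
}

implicit_grouping   # prints: 1 0 0 and 0 1 1 (one per line)
explicit_grouping   # prints: 0 1 1
```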

		<p>
			And AWK can do much more – it offers plenty of functions and language constructs that we can use in our transformations.
			Comprehensive documentation can be found here: <a href="https://www.gnu.org/software/gawk/manual/">Gawk: Effective AWK Programming</a>.
		</p>
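		<p>
			As a small taste, a plain-AWK sketch on made-up <code>path size</code> lines (the helper names are hypothetical)
			that uses the built-in <code>length()</code> function and an arithmetic formula directly in conditions
			and formats the computed value with <code>printf</code>.
		</p>

```shell
# Made-up sample: path and size in bytes.
files() {
	printf '%s\n' './a.c 1000' './very-long-name.h 9000' './b.txt 5000'
}

# Built-in string function in a condition: paths longer than 10 characters.
long_paths() {
	files | awk '(length($1) > 10) { print $1 }'
}

# Numeric formula in the condition and in the output: size in KiB, one decimal.
over_4kib() {
	files | awk '($2 / 1024 > 4) { printf "%s %.1f\n", $1, $2 / 1024 }'
}

long_paths   # prints: ./very-long-name.h
over_4kib    # prints: ./very-long-name.h 8.8 and ./b.txt 4.9 (one per line)
```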

	</text>

</stránka>