relpipe-data/examples-guile-filtering.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 00:43:11 +0100
branch v_0
changeset 329 5bc2bb8b7946
parent 316 d7ae02390fac
permissions -rw-r--r--
Release v0.18

<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Complex filtering with Scheme</nadpis>
	<perex>filtering records with AND, OR and functions</perex>
	<m:pořadí-příkladu>01400</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			For simple filtering, we can use <code>relpipe-tr-grep</code>.
			But what if we need to write a complex query that contains AND and OR operators?
			What if we need, for example, to compare numbers – not only match text against regular expressions?
			There is a tool capable of doing this and much more: <code>relpipe-tr-scheme</code>!
		</p>
		
		<p>
			<a href="https://www.gnu.org/software/guile/">Guile</a> is the GNU implementation of the Scheme language (something like Lisp and also full of parentheses).
			The <code>relpipe-tr-scheme</code> reference implementation uses GNU Guile as a library: it puts the data into the Scheme context, evaluates Scheme expressions, then reads the data back from the Scheme context and generates relational output from it.
			The good news is that it is not necessary to know Lisp/Scheme to use this tool. For the first steps, it can be used just as a query language – like SQL, just a bit Polish.
		</p>
		
		<h2>Filtering numbers</h2>
		
		<p>
			We are looking for „satanistic“ icons in our filesystem – those with a size of exactly 666 bytes.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[find /usr/share/icons/ -type f -print0 \
	| relpipe-in-filesystem \
	| relpipe-tr-scheme --relation 'files.*' --where '(= $size 666)' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>Well, well… here we are:</p>
		
		<m:pre jazyk="text"><![CDATA[filesystem:
 ╭───────────────────────────────────────────────────────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
 │ path                                                         (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
 ├───────────────────────────────────────────────────────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
 │ /usr/share/icons/elementary-xfce/actions/24/tab-new.png               │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/elementary-xfce/apps/16/clock.png                    │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/elementary-xfce/mimes/22/x-office-spreadsheet.png    │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/Tango/22x22/apps/office-calendar.png                 │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/Tango/16x16/actions/process-stop.png                 │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/breeze/actions/24/align-vertical-center.svg          │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/breeze/devices/22/camera-photo.svg                   │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/oxygen/base/48x48/actions/tab-detach.png             │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/oxygen/base/32x32/actions/insert-horizontal-rule.png │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/breeze-dark/actions/24/align-vertical-center.svg     │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/breeze-dark/devices/22/camera-photo.svg              │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/gnome/22x22/status/weather-overcast.png              │ f             │            666 │ root           │ root           │
 │ /usr/share/icons/gnome/16x16/actions/go-home.png                      │ f             │            666 │ root           │ root           │
 ╰───────────────────────────────────────────────────────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 13]]></m:pre>

		<p>The <code>--relation 'files.*'</code> option takes a regular expression that says which relations should be processed in Scheme (here it matches the <code>filesystem</code> relation) – others are passed through unchanged.</p>
		
		<p>
			The <code>--where '(= $size 666)'</code> is our condition. 
			The Polish<m:podČarou>see <a href="https://en.wikipedia.org/wiki/Polish_notation">Polish notation</a></m:podČarou> thing means that we write <code>= $size 666</code> instead of <code>$size = 666</code>.
			It seems a bit weird but it makes sense – the <code>=</code> is a function that compares two numbers and returns a boolean value – 
			so we just call this function and pass <code>$size</code> and <code>666</code> to it as arguments.
			And because it is a function, there are <code>(</code>parentheses<code>)</code>.
		</p>
		
		<p>
			Relational attributes are mapped to Scheme variables with the same name, just prefixed with <code>$</code>
			(we considered the <code>
				<abbr title="Bitcoin">₿</abbr>
			</code> symbol, but <code>$</code> still seemed to be more common on keyboards in 2019).
			While a relational attribute name is an arbitrary string, Scheme variable names have some limitations, so not all attributes can be mapped – those with spaces and some special characters are currently unsupported (this will be fixed in later versions by some kind of encoding/escaping).
		</p>
		
		<p>
			We can also look for 
			<code>--where '(&gt; $size 100)'</code> which means „size is greater than 100“
			or
			<code>--where '(&lt; $size 100)'</code> which means „size is smaller than 100“.
			The <code>&gt;=</code> and <code>&lt;=</code> also work as expected.
		</p>
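		
		<p>
			Scheme comparison functions accept more than two arguments, so a range can be checked with a single call.
			A minimal sketch, assuming <code>$size</code> is bound as a number as in the example above (the bounds 100 and 200 are arbitrary illustration values):
		</p>
		
		<m:pre jazyk="bash"><![CDATA[find /usr/share/icons/ -type f -print0 \
	| relpipe-in-filesystem \
	| relpipe-tr-scheme --relation 'files.*' --where '(< 100 $size 200)' \
	| relpipe-out-tabular]]></m:pre>
		
		<p>This should keep only the files whose size is greater than 100 and smaller than 200 bytes.</p>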
		
		<h2>Filtering strings</h2>
		
		<p>
			Scheme is a strongly typed language, so we have to use the proper functions/operators for each type.
			For strings, we use the <code>string=</code> function instead of <code>=</code>:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-scheme --relation 'fstab' --where '(string= $type "btrfs")' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>The Btrfs filesystems in our <code>fstab</code>:</p>

		<m:pre jazyk="text"><![CDATA[fstab:
 ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
 │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
 ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
 │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime         │              0 │              2 │
 ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
Record count: 1]]></m:pre>

		<p>
			There is also <code>string-prefix?</code> which evaluates whether the first string is a prefix of the second string:
		</p>

		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-scheme --relation 'fstab' --where '(string-prefix? "/mnt" $mount_point)' \
	| relpipe-out-tabular]]></m:pre>
		
		<p>So we can find filesystems mounted somewhere under <code>/mnt</code>:</p>

		<m:pre jazyk="text"><![CDATA[fstab:
 ╭─────────────────┬───────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
 │ scheme (string) │ device       (string) │ mount_point (string) │ type (string) │ options                      (string) │ dump (integer) │ pass (integer) │
 ├─────────────────┼───────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
 │                 │ /dev/sde              │ /mnt/data            │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              2 │
 │                 │ /dev/mapper/sdf_crypt │ /mnt/private         │ xfs           │ relatime                              │              0 │              2 │
 ╰─────────────────┴───────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
Record count: 2]]></m:pre>

		<p>
			There are many more functions, which can be found in the <a href="https://www.gnu.org/software/guile/manual/guile.html">Guile documentation</a>
			– such as case-insensitive variants (e.g. <code>string-ci=</code>) or regular expression search (<code>string-match</code>).
		</p>
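
		<p>
			As a sketch of the case-insensitive variant, the following condition should match the filesystem type regardless of letter case.
			It assumes the same <code>fstab</code> relation as above; the upper-case <code>"BTRFS"</code> literal is only an illustration:
		</p>

		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-scheme --relation 'fstab' --where '(string-ci= $type "BTRFS")' \
	| relpipe-out-tabular]]></m:pre>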


		<h2>AND and OR</h2>
		
		<p>
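			Like in SQL, we can join multiple conditions together with the logical operators AND and OR.
			In Scheme, these operators are written in the same <code>(</code>fashion<code>)</code> as function calls (strictly speaking, <code>and</code> and <code>or</code> are special forms that stop evaluating as soon as the result is known).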
		</p>
		
		<p>
			So we can e.g. look for icons that are „satanistic“ or „Orwellian“:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[find /usr/share/icons/ -type f -print0 \
	| relpipe-in-filesystem --file path --file size \
	| relpipe-tr-scheme --relation 'files.*' --where '(or (= $size 666) (= $size 1984) )' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>Files with sizes 666 bytes or 1984 bytes:</p>

		<m:pre jazyk="text"><![CDATA[filesystem:
 ╭───────────────────────────────────────────────────────────────────────┬────────────────╮
 │ path                                                         (string) │ size (integer) │
 ├───────────────────────────────────────────────────────────────────────┼────────────────┤
 │ /usr/share/icons/elementary-xfce/actions/48/mail-mark-important.png   │           1984 │
 │ /usr/share/icons/elementary-xfce/actions/24/tab-new.png               │            666 │
 │ /usr/share/icons/elementary-xfce/apps/16/clock.png                    │            666 │
 │ /usr/share/icons/elementary-xfce/mimes/22/x-office-spreadsheet.png    │            666 │
 │ /usr/share/icons/Humanity-Dark/status/22/krb-no-valid-ticket.svg      │           1984 │
 │ /usr/share/icons/Tango/22x22/apps/office-calendar.png                 │            666 │
 │ /usr/share/icons/Tango/16x16/actions/process-stop.png                 │            666 │
 │ /usr/share/icons/breeze/actions/24/align-vertical-center.svg          │            666 │
 │ /usr/share/icons/breeze/devices/22/camera-photo.svg                   │            666 │
 │ /usr/share/icons/oxygen/base/48x48/actions/tab-detach.png             │            666 │
 │ /usr/share/icons/oxygen/base/32x32/actions/insert-horizontal-rule.png │            666 │
 │ /usr/share/icons/Humanity/status/22/krb-no-valid-ticket.svg           │           1984 │
 │ /usr/share/icons/breeze-dark/actions/24/align-vertical-center.svg     │            666 │
 │ /usr/share/icons/breeze-dark/devices/22/camera-photo.svg              │            666 │
 │ /usr/share/icons/gnome/48x48/status/user-busy.png                     │           1984 │
 │ /usr/share/icons/gnome/22x22/status/weather-overcast.png              │            666 │
 │ /usr/share/icons/gnome/16x16/actions/go-home.png                      │            666 │
 ╰───────────────────────────────────────────────────────────────────────┴────────────────╯
Record count: 17]]></m:pre>

		<p>Or we can look for icons that are in SVG format and (at the same time) Orwellian:</p>
		
		<m:pre jazyk="bash"><![CDATA[find /usr/share/icons/ -type f -print0 \
	| relpipe-in-filesystem --file path --file size \
	| relpipe-tr-scheme \
		--relation 'files.*' \
		--where '(and (string-suffix? ".svg" $path) (= $size 1984) )' \
	| relpipe-out-tabular]]></m:pre>
	
		<p>This is quite rare; we have only two such icons:</p>

		<m:pre jazyk="text"><![CDATA[filesystem:
 ╭──────────────────────────────────────────────────────────────────┬────────────────╮
 │ path                                                    (string) │ size (integer) │
 ├──────────────────────────────────────────────────────────────────┼────────────────┤
 │ /usr/share/icons/Humanity-Dark/status/22/krb-no-valid-ticket.svg │           1984 │
 │ /usr/share/icons/Humanity/status/22/krb-no-valid-ticket.svg      │           1984 │
 ╰──────────────────────────────────────────────────────────────────┴────────────────╯
Record count: 2]]></m:pre>

		<p>
			We can nest ANDs and ORs and other functions as deeply as we need and build even very complex queries.
			Parentheses nesting is fun, isn't it?
		</p>
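
		<p>
			For instance, an OR can be nested inside an AND.
			This is only a sketch built from the functions used above; it should select the SVG icons whose size is either 666 or 1984 bytes:
		</p>

		<m:pre jazyk="bash"><![CDATA[find /usr/share/icons/ -type f -print0 \
	| relpipe-in-filesystem --file path --file size \
	| relpipe-tr-scheme \
		--relation 'files.*' \
		--where '(and (string-suffix? ".svg" $path) (or (= $size 666) (= $size 1984)))' \
	| relpipe-out-tabular]]></m:pre>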


		
		
		
	</text>

</stránka>