relpipe/relpipe-web: relpipe-data/examples-grep-cut-fstab.xml@d39cfc926f95


<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Doing projection and restriction using cut and grep</nadpis>
	<perex>SELECT mount_point FROM fstab WHERE type IN ('btrfs', 'xfs')</perex>
	<m:pořadí-příkladu>01000</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			When reading classic pipelines that involve the <code>grep</code> and <code>cut</code> commands,
			we may notice a similarity to simple SQL queries such as:
		</p>
		
		<m:pre jazyk="SQL">SELECT "some", "cut", "fields" FROM stdin WHERE grep_matches(whole_line);</m:pre>
		
		<p>
			And that is true: <code>grep</code> does restriction<m:podČarou>
				<a href="https://en.wikipedia.org/wiki/Selection_(relational_algebra)">selecting</a> only certain records from the original relation according to their match with given conditions</m:podČarou>
			and <code>cut</code> does projection<m:podČarou>a limited subset of what <a href="https://en.wikipedia.org/wiki/Projection_(relational_algebra)">projection</a> means</m:podČarou>.
			Now we can do these relational operations using our relational tools called <code>relpipe-tr-grep</code> and <code>relpipe-tr-cut</code>.
		</p>
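		<p>
			For comparison, the classic line-oriented approach can be sketched with standard tools.
			This is only an illustration: the sample data are invented, use a simplified tab-separated layout,
			and the function names <code>sample_fstab</code> and <code>classic_mount_points</code> are hypothetical.
			Note that <code>grep</code> matches the whole line, so the pattern has to encode the position of the field itself:
		</p>

```bash
#!/usr/bin/env bash
# Hypothetical sample data in a simplified, tab-separated layout:
# device, mount_point, type (a real fstab is whitespace-separated
# and has more fields).
sample_fstab() {
	printf '%s\t%s\t%s\n' \
		/dev/sda1 /     btrfs \
		/dev/sda2 /home xfs \
		/dev/sda3 /boot ext4
}

# Restriction: keep only lines whose last field is btrfs or xfs.
# Projection: keep only the second field (the mount point).
classic_mount_points() {
	sample_fstab \
		| grep -E $'\t(btrfs|xfs)$' \
		| cut -f 2
}

classic_mount_points
```

		<p>
			With this sample, the sketch prints <code>/</code> and <code>/home</code>.
		</p>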
		
		<p>
			Assume that we need only the <code>mount_point</code> field from our <code>fstab</code> for records whose <code>type</code> is <code>btrfs</code> or <code>xfs</code>,
			and that we want to do something (a shell script block) with these directory paths.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-grep 'fstab' 'type' '^(btrfs|xfs)$' \
	| relpipe-tr-cut 'fstab' 'mount_point' \
	| relpipe-out-nullbyte \
	| while read -r -d '' m; do
		echo "$m";
	done]]></m:pre>
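		<p>
			When a value is matched against several alternatives, the anchors should apply to the whole alternation,
			because <code>|</code> has the lowest precedence in extended regular expressions.
			A quick check with <code>grep -E</code> shows the difference between the grouped and the ungrouped form.
			The sample values and the function names are hypothetical, and the sketch assumes nothing about the relational tools themselves:
		</p>

```bash
#!/usr/bin/env bash
# Hypothetical values of a "type" attribute, one per line.
sample_types() {
	printf '%s\n' btrfs xfs btrfs-old not-xfs ext4
}

# Ungrouped: parsed as (^btrfs) OR (xfs$), so partial matches slip through.
ungrouped() { sample_types | grep -E '^btrfs|xfs$'; }

# Grouped: both anchors apply to each alternative.
grouped() { sample_types | grep -E '^(btrfs|xfs)$'; }

echo "ungrouped:"; ungrouped
echo "grouped:"; grouped
```

		<p>
			The ungrouped pattern also accepts <code>btrfs-old</code> and <code>not-xfs</code>, while the grouped one accepts only <code>btrfs</code> and <code>xfs</code>.
		</p>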
	
		<p>
			The <code>relpipe-tr-cut</code> tool has a syntax similar to its <em>grep</em> and <em>sed</em> siblings and also uses the power of regular expressions.
			In this case it modifies the <code>fstab</code> relation on the fly and drops all of its attributes except <code>mount_point</code>.
		</p>
		
		<p>
			Then we pass the data to the Bash <code>while</code> loop.
			In such a simple scenario (just <code>echo</code>), we could use <code>xargs</code> as in the examples above,
			but with this syntax we can write a whole block of shell commands for each record/value and perform more complex actions with them.
		</p>
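		<p>
			The loop pattern itself can be tried without any relational data.
			The following sketch feeds hypothetical NUL-terminated paths (one of them containing a space) into the same kind of loop;
			the function names are made up and the <code>IFS=</code> prefix is an addition that keeps leading and trailing whitespace intact:
		</p>

```bash
#!/usr/bin/env bash
# Hypothetical mount points; the second one contains a space.
sample_paths() {
	printf '%s\0' / '/mnt/my data' /home
}

# Process each NUL-terminated value with a multi-command block.
describe_paths() {
	sample_paths | while IFS= read -r -d '' m; do
		length="${#m}"
		echo "path=[$m] length=$length"
	done
}

describe_paths
```

		<p>
			Because the values are separated by null bytes rather than by whitespace or newlines, the path with the space arrives in the loop as a single value.
		</p>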
		
		<h2>More projections with relpipe-tr-cut</h2>
		
		<p>
			Assume that we have a simple relation containing numbers:
		</p>
	
		<m:pre jazyk="bash"><![CDATA[seq 0 8 \
	| tr \\n \\0 \
	| relpipe-in-cli generate-from-stdin numbers 3 a integer b integer c integer \
	> numbers.rp]]></m:pre>

		<p>and a second one containing letters:</p>

		<m:pre jazyk="bash"><![CDATA[relpipe-in-cli generate letters 2 a string b string A B C D > letters.rp]]></m:pre>
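		<p>
			How a flat list of nine values fills three records of three attributes can be previewed with standard tools.
			This is only an analogy built on <code>xargs</code>, not the output of any relational tool, and the function name <code>preview_records</code> is hypothetical:
		</p>

```bash
#!/usr/bin/env bash
# Preview how a flat stream of nine values is split into records
# of three values each (an illustration only, not relpipe output).
preview_records() {
	seq 0 8 \
		| tr '\n' '\0' \
		| xargs -0 -n 3 echo
}

preview_records
```

		<p>
			Each printed line corresponds to one record: <code>0 1 2</code>, <code>3 4 5</code> and <code>6 7 8</code>.
		</p>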

		<p>We saved them into two files and now combine them into a single file. We will work with them as if they were a single stream of relations:</p>
		
		<m:pre jazyk="bash"><![CDATA[cat numbers.rp letters.rp > both.rp;
cat both.rp | relpipe-out-tabular]]></m:pre>
		
		<p>This prints:</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────┬─────────────╮
 │ a (integer) │ b (integer) │ c (integer) │
 ├─────────────┼─────────────┼─────────────┤
 │           0 │           1 │           2 │
 │           3 │           4 │           5 │
 │           6 │           7 │           8 │
 ╰─────────────┴─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────┬─────────────╮
 │ a  (string) │ b  (string) │
 ├─────────────┼─────────────┤
 │ A           │ B           │
 │ C           │ D           │
 ╰─────────────┴─────────────╯
Record count: 2]]></pre>

		<p>We can drop the <code>a</code> attribute from the <code>numbers</code> relation:</p>
		
		<m:pre jazyk="bash">cat both.rp | relpipe-tr-cut 'numbers' 'b|c' | relpipe-out-tabular</m:pre>
		
		<p>and leave the <code>letters</code> relation unaffected:</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────╮
 │ b (integer) │ c (integer) │
 ├─────────────┼─────────────┤
 │           1 │           2 │
 │           4 │           5 │
 │           7 │           8 │
 ╰─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────┬─────────────╮
 │ a  (string) │ b  (string) │
 ├─────────────┼─────────────┤
 │ A           │ B           │
 │ C           │ D           │
 ╰─────────────┴─────────────╯
Record count: 2]]></pre>

		<p>Or we can remove <code>a</code> from both relations, or more precisely, keep only those attributes whose names match the <code>'b|c'</code> regex:</p>

		<m:pre jazyk="bash">cat both.rp | relpipe-tr-cut '.*' 'b|c' | relpipe-out-tabular</m:pre>
		
		<p>Instead of <code>'.*'</code> we could use <code>'numbers|letters'</code>, which gives the same result in this case:</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────╮
 │ b (integer) │ c (integer) │
 ├─────────────┼─────────────┤
 │           1 │           2 │
 │           4 │           5 │
 │           7 │           8 │
 ╰─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────╮
 │ b  (string) │
 ├─────────────┤
 │ B           │
 │ D           │
 ╰─────────────╯
Record count: 2]]></pre>

		<p>So far, we have only been reducing the attributes. But we can also duplicate them or change their order:</p>
		
		<m:pre jazyk="bash">cat both.rp | relpipe-tr-cut 'numbers' 'b|a|c' 'b' 'a' 'a' | relpipe-out-tabular</m:pre>
		
		<p>
			N.B. the order of alternatives inside <code>'b|a|c'</code> does not matter: a single regex preserves the original order of the attributes it matches.
			However, if we use multiple regexes to specify the attributes, both their order and their count matter:
		</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────┬─────────────┬─────────────┬─────────────┬─────────────╮
 │ a (integer) │ b (integer) │ c (integer) │ b (integer) │ a (integer) │ a (integer) │
 ├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
 │           0 │           1 │           2 │           1 │           0 │           0 │
 │           3 │           4 │           5 │           4 │           3 │           3 │
 │           6 │           7 │           8 │           7 │           6 │           6 │
 ╰─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────┬─────────────╮
 │ a  (string) │ b  (string) │
 ├─────────────┼─────────────┤
 │ A           │ B           │
 │ C           │ D           │
 ╰─────────────┴─────────────╯
Record count: 2]]></pre>
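		<p>
			The ordering rule can be captured in a small model.
			The following sketch is a simplification written for this explanation, not the implementation of <code>relpipe-tr-cut</code>:
			the function <code>select_attributes</code> is hypothetical and assumes that each pattern must match the whole attribute name.
		</p>

```bash
#!/usr/bin/env bash
# Simplified model of the attribute selection (an assumption, not the
# tool's source): every pattern is matched against the whole attribute
# name and the patterns are processed in the order given.
# Usage: select_attributes "a b c" PATTERN...
select_attributes() {
	local attributes="$1" pattern attribute re
	local result=()
	shift
	for pattern in "$@"; do
		re="^(${pattern})\$"
		# Within one pattern, the original attribute order is kept.
		for attribute in $attributes; do
			if [[ $attribute =~ $re ]]; then
				result+=("$attribute")
			fi
		done
	done
	echo "${result[*]}"
}

select_attributes "a b c" 'b|c'
select_attributes "a b c" 'b|a|c' 'b' 'a' 'a'
```

		<p>
			The first call yields <code>b c</code> and the second one <code>a b c b a a</code>, which corresponds to the attribute lists shown in the tables above.
		</p>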

		<p>
			The <code>letters</code> relation stays rock steady and <code>relpipe-tr-cut 'numbers'</code> does not affect it in any way.
		</p>
		
	</text>

</stránka>
author	František Kučera <franta-hg@frantovo.cz>
	Wed, 31 Jul 2019 16:01:34 +0200
branch	v_0
changeset 264	d39cfc926f95