<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Examples</nadpis>
	<perex>Usage examples of Relational pipes tools</perex>
	<pořadí>40</pořadí>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		
		<p>
			All examples were tested in <a href="https://www.gnu.org/software/bash/">GNU Bash</a>.
			But they should also work in other shells.
		</p>
		
		<h2>relpipe-in-cli: Hello World!</h2>
		
		<p>
			Let's start with an obligatory Hello World example.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-cli generate "relation_from_cli" 3 \
	"a" "integer" \
	"b" "string" \
	"c" "boolean" \
	"1" "Hello" "true" \
	"2" "World!" "false"]]></m:pre>
	
		<p>
			This command generates relational data.
			To see the data, we need to convert it to some other format.
			For now, we will use the "tabular" format and pipe the relational data to <code>relpipe-out-tabular</code>.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-cli generate "relation_from_cli" 3 \
		"a" "integer" \
		"b" "string" \
		"c" "boolean" \
		"1" "Hello" "true" \
		"2" "World!" "false" \
	| relpipe-out-tabular]]></m:pre>
	
		<p>Output:</p>

		<pre><![CDATA[relation_from_cli:
 ╭─────────────┬────────────┬─────────────╮
 │ a (integer) │ b (string) │ c (boolean) │
 ├─────────────┼────────────┼─────────────┤
 │           1 │ Hello      │        true │
 │           2 │ World!     │       false │
 ╰─────────────┴────────────┴─────────────╯
Record count: 2
]]></pre>

		<p>
			As we can see above, the syntax is simple. We specify the name of the relation, the number of attributes,
			and then their definitions (names and types),
			followed by the data.
		</p>

		<p>
			A single stream may contain multiple relations:
		</p>		
		
		<m:pre jazyk="bash"><![CDATA[(relpipe-in-cli generate a 1 x string hello; \
 relpipe-in-cli generate b 1 y string world) \
	| relpipe-out-tabular]]></m:pre>
			
		<p>
			Thus we can combine various commands or files and pass the result to a single relational output filter (<code>relpipe-out-tabular</code> in this case) and get:
		</p>
		
		<pre><![CDATA[a:
 ╭────────────╮
 │ x (string) │
 ├────────────┤
 │ hello      │
 ╰────────────╯
Record count: 1
b:
 ╭────────────╮
 │ y (string) │
 ├────────────┤
 │ world      │
 ╰────────────╯
Record count: 1]]></pre>
		
		<h2>relpipe-in-cli: STDIN</h2>
		
		<p>
			The number of <abbr title="Command-line interface">CLI</abbr> arguments is limited and they are passed at once to the process.
			So there is an option to pass the values from STDIN instead of CLI arguments.
			Values on STDIN are expected to be separated by the null-byte.
			We can generate such data e.g. using <code>echo</code> and <code>tr</code> (or using <code>printf</code> or other commands):
		</p>
		
		<m:pre jazyk="bash"><![CDATA[echo -e "1\nHello\ntrue\n2\nWorld\nfalse" \
	| tr \\n \\0 \
	| relpipe-in-cli generate-from-stdin relation_from_stdin 3 \
		a integer \
		b string \
		c boolean \
	| relpipe-out-tabular]]></m:pre>
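
		<p>
			The <code>printf</code> variant mentioned above might look like the following sketch.
			The helper name <code>values_nul</code> is only illustrative (it is not one of the relpipe tools).
			Unlike the <code>echo</code> and <code>tr</code> combination, it also preserves values that contain line breaks:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: print every argument followed by a null byte.
# printf reuses the format string for each remaining argument.
values_nul() { printf '%s\0' "$@"; }

# Inspect the result: show the null bytes as line breaks.
values_nul 1 Hello true 2 World false | tr '\0' '\n'

# Assumed usage with the tools from this page (same arguments as above):
# values_nul 1 Hello true 2 World false \
#	| relpipe-in-cli generate-from-stdin relation_from_stdin 3 a integer b string c boolean \
#	| relpipe-out-tabular]]></m:pre>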

		<p>
			The output is the same as above, except for the relation name and the missing exclamation mark.
			We can use this approach to convert various formats to relational data.
			A lot of data already exists in the form of null-separated values, e.g. the process arguments:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[cat /proc/$(pidof mc)/cmdline \
	| relpipe-in-cli generate-from-stdin mc_args 1 a string \
	| relpipe-out-tabular
]]></m:pre>
	
		<p>If we have <code>mc /etc/ /tmp/</code> running in some other terminal, the output will be:</p>
		
		<pre><![CDATA[mc_args:
 ╭────────────╮
 │ a (string) │
 ├────────────┤
 │ mc         │
 │ /etc/      │
 │ /tmp/      │
 ╰────────────╯
Record count: 3]]></pre>

		<p>
			The <code>find</code> command can also produce data separated by the null-byte:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[find /etc/ -name '*ssh*_*' -print0 \
	| relpipe-in-cli generate-from-stdin files 1 file_name string \
	| relpipe-out-tabular]]></m:pre>
	
		<p>Will display something like this:</p>
		
		<pre><![CDATA[files:
 ╭───────────────────────────────────╮
 │ file_name                (string) │
 ├───────────────────────────────────┤
 │ /etc/ssh/ssh_host_ecdsa_key       │
 │ /etc/ssh/sshd_config              │
 │ /etc/ssh/ssh_host_ed25519_key.pub │
 │ /etc/ssh/ssh_host_ecdsa_key.pub   │
 │ /etc/ssh/ssh_host_rsa_key         │
 │ /etc/ssh/ssh_config               │
 │ /etc/ssh/ssh_host_ed25519_key     │
 │ /etc/ssh/ssh_import_id            │
 │ /etc/ssh/ssh_host_rsa_key.pub     │
 ╰───────────────────────────────────╯
Record count: 9]]></pre>
		
		
		<h2>relpipe-in-fstab</h2>
		
		<p>
			Using the <code>relpipe-in-fstab</code> command, we can convert the <code>/etc/fstab</code> or <code>/etc/mtab</code> to relational data
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab | relpipe-out-tabular]]></m:pre>
		
		<p>
			and see them as a nice table:
		</p>
		
		<pre><![CDATA[fstab:
 ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬───────────────────────────────────────┬────────────────┬────────────────╮
 │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options                      (string) │ dump (integer) │ pass (integer) │
 ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼───────────────────────────────────────┼────────────────┼────────────────┤
 │ UUID            │ 29758270-fd25-4a6c-a7bb-9a18302816af │ /                    │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              1 │
 │                 │ /dev/sr0                             │ /media/cdrom0        │ udf,iso9660   │ user,noauto                           │              0 │              0 │
 │                 │ /dev/sde                             │ /mnt/data            │ ext4          │ relatime,user_xattr,errors=remount-ro │              0 │              2 │
 │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime                              │              0 │              2 │
 │                 │ /dev/mapper/sdf_crypt                │ /mnt/private         │ xfs           │ relatime                              │              0 │              2 │
 ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴───────────────────────────────────────┴────────────────┴────────────────╯
Record count: 5]]></pre>

		<p>We can do the same with a remote <code>fstab</code> or <code>mtab</code>, just by adding <code>ssh</code> to the pipeline:</p>

		<m:pre jazyk="bash"><![CDATA[ssh example.com cat /etc/mtab | relpipe-in-fstab | relpipe-out-tabular]]></m:pre>
		
		<p>
			The <code>cat</code> runs remotely. The <code>relpipe-in-fstab</code> and <code>relpipe-out-tabular</code> run on our machine.
		</p>
		
		<p>
			n.b. <code>relpipe-in-fstab</code> reads <code>/etc/fstab</code> if executed on a TTY. Otherwise, it reads STDIN.
		</p>
		
		<h2>relpipe-out-xml</h2>
		
		<p>
			Relational data can be converted to various formats and one of them is XML.
			This is a good option for further processing e.g. using XSLT transformation or passing the XML data to some other tool.
			Just use <code>relpipe-out-xml</code> instead of <code>relpipe-out-tabular</code> and the rest of the pipeline remains unchanged:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[ssh example.com cat /etc/mtab | relpipe-in-fstab | relpipe-out-xml]]></m:pre>
		
		<p>
			Will produce XML like this:
		</p>
		
		<m:pre jazyk="xml" src="examples/relpipe-out-fstab.xml"/>

		<p>
			Thanks to XSLT, this XML can be easily converted e.g. to an XHTML table (<code>table|tr|td</code>) or other format.
			Someone can convert such data to a (La)TeX table.
		</p>
		
		<p>
			n.b. the format is not final and will change in future versions (XML namespace, more metadata etc.).
		</p>
		
		
		<h2>relpipe-tr-validator</h2>
		
		<p>
			Just a passthrough command, so these pipelines should produce the same hash:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[
relpipe-in-fstab | relpipe-tr-validator | sha512sum
relpipe-in-fstab | sha512sum]]></m:pre>

		<p>
			This tool can be used for testing whether a file contains valid relational data:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[
if relpipe-tr-validator < "some-file.rp" &> /dev/null; then
	echo "valid relational data";
else
	echo "garbage";
fi]]></m:pre>
		
		<p>or as a one-liner:</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-tr-validator < "some-file.rp" &> /dev/null && echo "ok" || echo "error"]]></m:pre>
		
		<p>
			If an error is found, it is reported on STDERR. So just omit the <code>&amp;</code> in order to see the error message.
		</p>
		
		
		<h2>/etc/fstab formatting using -in-fstab, -out-nullbyte, xargs and Perl</h2>
		
		<p>
			As we have seen before, we can convert <code>/etc/fstab</code> (or <code>mtab</code>)
			to e.g. an XML or a nice and colorful table using <m:name/>.
			But we can also convert these data back to the <code>fstab</code> format. And do it with proper indentation/padding.
			Fstab has a simple format where values are separated by one or more whitespace characters.
			But without proper indentation, these files look a bit obfuscated and hard to read (however, they are valid).
		</p>
		
		<m:pre jazyk="text" src="examples/relpipe-out-fstab.txt"/>
		
		<p>
			So let's build a pipeline that reformats the <code>fstab</code> and makes it more readable.
		</p>
			
		<m:pre jazyk="bash">relpipe-in-fstab | relpipe-out-fstab &gt; reformatted-fstab.txt</m:pre>
			
		<p>
			We can hack together a script called <code>relpipe-out-fstab</code> that accepts relational data and produces <code>fstab</code> data.
			Later this will probably be implemented as a regular tool, but for now, it is just an example of an ad-hoc shell script:
		</p>
		
		<m:pre jazyk="bash" src="examples/relpipe-out-fstab.sh" odkaz="ano"/>
		
		<p>
			In the first part, we prepend a single record (<code>relpipe-in-cli</code>) before the data coming from STDIN (<code>cat</code>).
			Then, we use <code>relpipe-out-nullbyte</code> to convert relational data to values separated by a null-byte.
			This command processes only attribute values (skips relation and attribute names).
			Then we use <code>xargs</code> to read the null-separated values and execute a Perl command for each record (we pass it the same number of arguments as we have attributes: <code>--max-args=7</code>).
			Perl does the actual formatting: adds padding and does some minor tuning (merges two attributes and replaces empty values with <em>none</em>).
		</p>
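
		<p>
			The <code>xargs</code> step can be tried in isolation.
			The following is only a sketch with made-up values: it uses three attributes per record instead of seven
			and the external <code>printf</code> command instead of Perl.
			The function name <code>pad_records</code> is illustrative and <code>-n</code> is the short form of <code>--max-args</code>:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: read null-separated values from STDIN
# and call printf once per record (three values each), padding the columns.
pad_records() { xargs -0 -n 3 printf '%-12s %-8s %s\n'; }

# Made-up sample data: two records with device, mount point and type.
printf '%s\0' /dev/sda1 / ext4 /dev/sdb1 /home btrfs | pad_records]]></m:pre>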
		
		<p>This is the formatted version of the <code>fstab</code> above:</p>
		
		<m:pre jazyk="text" src="examples/relpipe-out-fstab.formatted.txt"/>
		
		<p>
			Using the following command, we can verify that the files differ only in comments and whitespace:
		</p>
		
		<pre>relpipe-in-fstab | relpipe-out-fstab | diff -w /etc/fstab -</pre>

		<p>
			Another check (should print the same hashes):
		</p>
		
		<pre><![CDATA[relpipe-in-fstab | sha512sum 
relpipe-in-fstab | relpipe-out-fstab | relpipe-in-fstab | sha512sum]]></pre>
		
		<p>
			A regular implementation of <code>relpipe-out-fstab</code> will probably keep the comments
			(it also needs one more attribute and a small change in <code>relpipe-in-fstab</code>).
		</p>
		
		<p>
			For mere <code>fstab</code> reformatting, this approach is somewhat over-engineered.
			We could skip the whole relational thing and just do something like this:
		</p>
		
		<m:pre jazyk="bash">cat /etc/fstab | grep -v '^#' | sed -E 's/\s+/\n/g' | tr \\n \\0 | xargs -0 -n7 ...</m:pre>
		
		<p>
			plus prepend the comment (or do everything in Perl).
			But this example is intended as a demonstration of how we can
			1) prepend some additional data before the data from STDIN
			2) use <m:name/> and traditional tools like <code>xargs</code> or <code>perl</code> together.
			And BTW we have implemented a (simple but working) <em>relpipe output filter</em> – and did it without any serious programming, just by putting some existing commands together :-)
		</p>
		
		<blockquote>
			<p>
				There is more Unix-nature in one line of shell script than there is in ten thousand lines of C.
				<m:podČarou>see <a href="http://www.catb.org/~esr/writings/unix-koans/ten-thousand.html">Master Foo and the Ten Thousand Lines</a></m:podČarou>
			</p>
		</blockquote>
		
		<h2>Writing an output filter in Bash</h2>
		
		<p>
			In the previous example we created an output filter in Perl.
			We converted a relation to values separated by <code>\0</code> and then passed it through <code>xargs</code> to a Perl <em>one-liner</em> (or a <em>multi-liner</em> in this case).
			But we can write such an output filter in pure Bash without <code>xargs</code> and <code>perl</code>.
			Of course, it is still limited to a single relation (or it can process multiple relations of the same type and do something like an implicit <code>UNION ALL</code>).
		</p>
		
		<p>
			We will define a function that will help us with reading the <code>\0</code>-separated values and putting them into shell variables:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[read_nullbyte() { for v in "$@"; do export "$v"; read -r -d '' "$v"; done }]]></m:pre>
		
		<!--
			This version will not require the last \0:
				read_zero() { for v in "$@"; do export "$v"; read -r -d '' "$v" || [ ! -z "${!v}" ]; done }
			at least in case when the last value is not missing.
			Other values might be null/missing: \0\0 is OK.
		-->
		
		<p>
			Currently, there is no known way to do this without a custom function (just with the <code>read</code> built-in command of Bash and its parameters).
			But it is just a single-line function, so not a big deal.
		</p>
		
		<p>
			Then we just read the values, put them in shell variables and process them in a loop with a block of shell code:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-out-nullbyte \
	| while read_nullbyte scheme device mount_point fs_type options dump pass; do
		echo "Device ${scheme:+$scheme=}$device is mounted" \
		     "at $mount_point and contains $fs_type.";
	done]]></m:pre>
	
		<p>
			Which will print:
		</p>
		
		<pre><![CDATA[Device UUID=29758270-fd25-4a6c-a7bb-9a18302816af is mounted at / and contains ext4.
Device /dev/sr0 is mounted at /media/cdrom0 and contains udf,iso9660.
Device /dev/sde is mounted at /mnt/data and contains ext4.
Device UUID=a2b5f230-a795-4f6f-a39b-9b57686c86d5 is mounted at /home and contains btrfs.
Device /dev/mapper/sdf_crypt is mounted at /mnt/private and contains xfs.]]></pre>

		<p>
			Using this method, we can convert any single relation to any format (preferably a text one, but <code>printf</code> can also produce binary data).
			This is good for ad-hoc conversions and single-relation data.
			More powerful tools can be written in C++ and other languages like Java, Python, Guile etc. (when particular libraries are available).
		</p>
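
		<p>
			The reading loop can be tried without any relpipe tools by feeding it hand-made null-separated values.
			This is a minimal sketch with made-up data and only three values per record;
			the function name <code>describe_mounts</code> is illustrative and Bash is required because of <code>read -d</code>:
		</p>

		<m:pre jazyk="bash"><![CDATA[# The helper from above, repeated to keep the sketch self-contained.
read_nullbyte() { for v in "$@"; do export "$v"; read -r -d '' "$v"; done }

# Hypothetical filter: expects three values per record (scheme, device, mount_point).
describe_mounts() {
	while read_nullbyte scheme device mount_point; do
		echo "${scheme:+$scheme=}$device -> $mount_point"
	done
}

# Made-up sample data; the empty string stands for a missing scheme.
printf '%s\0' UUID 1234-abcd / '' /dev/sr0 /media/cdrom0 | describe_mounts]]></m:pre>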
		
		<h2>Rename VG in /etc/fstab using relpipe-tr-sed</h2>
		
		<p>
			Assume that we have an <code>/etc/fstab</code> with many lines defining the mount-points (directories) of particular devices (disks) and we are using LVM.
			If we rename a volume group (VG), we have to change all of them. The lines look like this one:
		</p>
		
		<pre>/dev/alpha/photos    /mnt/photos/    btrfs    noauto,noatime,nodiratime    0  0</pre>
		
		<p>
			We want to change all lines from <code>alpha</code> to <code>beta</code> (the new VG name).
			This can be done by the power of regular expressions<m:podČarou>see <a href="https://en.wikibooks.org/wiki/Regular_Expressions/Simple_Regular_Expressions">Regular Expressions</a> at Wikibooks</m:podČarou> and this pipeline:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-sed 'fstab' 'device' '^/dev/alpha/' '/dev/beta/' \
	| relpipe-out-fstab]]></m:pre>
	
		<p>
			The <code>relpipe-tr-sed</code> tool works only with the given relation (<code>fstab</code>) and the given attribute (<code>device</code>)
			and it leaves other relations and attributes in the stream untouched.
			So it does not replace strings in unwanted places (if there are any random matches).
		</p>
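
		<p>
			For comparison, a line-oriented <code>sed</code> sees only text and has no notion of attributes.
			The following sketch uses a made-up line in which the VG name also occurs in the mount point; the function names are illustrative.
			Anchoring the pattern helps, but it relies on the textual layout of the line rather than on attribute names:
		</p>

		<m:pre jazyk="bash"><![CDATA[# A made-up fstab line in which the VG name also occurs in the mount point.
line='/dev/alpha/photos    /mnt/alpha/photos/    btrfs    noauto    0  0'

# Plain text replacement: every occurrence is rewritten, including the mount point.
rename_everywhere() { sed 's/alpha/beta/g'; }

# Anchored replacement: only a device at the start of the line is rewritten.
rename_device() { sed 's|^/dev/alpha/|/dev/beta/|'; }

echo "$line" | rename_everywhere
echo "$line" | rename_device]]></m:pre>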
		
		<p>
			Even the relation names and attribute names are specified as a regular expression, so we can (purposefully) modify multiple relations or attributes.
			For example, we can put zeros in both the <code>dump</code> and <code>pass</code> attributes:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab | relpipe-tr-sed 'fstab' 'dump|pass' '.+' '0' | relpipe-out-fstab]]></m:pre>
		
		<p>
			n.b. the data types must be respected: we cannot e.g. put <code>abc</code> in the <code>pass</code> attribute because it is declared as <code>integer</code>.
		</p>
		
		<h2>Using relpipe-tr-sed with groups and backreferences</h2>
		
		<p>
			This tool also supports regex groups and backreferences. Thus we can use parts of the matched string in our replacement string:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-cli generate r 1 a string "some string xxx_123 some zzz_456 other" \
	| relpipe-tr-sed 'r' 'a' '([a-z]{3})_([0-9]+)' '$2:$1' \
	| relpipe-out-tabular]]></m:pre>
		
		<p>Which would convert this:</p>
		<pre><![CDATA[r:
 ╭────────────────────────────────────────╮
 │ a                             (string) │
 ├────────────────────────────────────────┤
 │ some string xxx_123 some zzz_456 other │
 ╰────────────────────────────────────────╯
Record count: 1]]></pre>
		
		<p>into this:</p>
		<pre><![CDATA[r:
 ╭────────────────────────────────────────╮
 │ a                             (string) │
 ├────────────────────────────────────────┤
 │ some string 123:xxx some 456:zzz other │
 ╰────────────────────────────────────────╯
Record count: 1]]></pre>

		<p>
			If there were any other relations or attributes in the stream, they would be unaffected by this transformation,
			because we specified <code>'r' 'a'</code> instead of some wider regular expression that would match more relations or attributes.
		</p>
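
		<p>
			Readers familiar with the classic <code>sed</code> may compare this with the same substitution on plain text.
			This is only a sketch for comparison (the function name <code>swap_parts</code> is illustrative):
			in <code>sed -E</code> the backreferences are written as <code>\1</code> and <code>\2</code> and the <code>g</code> flag requests replacement of all matches:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical helper: swap the letters and the digits around the underscore.
swap_parts() { sed -E 's/([a-z]{3})_([0-9]+)/\2:\1/g'; }

echo "some string xxx_123 some zzz_456 other" | swap_parts
# prints: some string 123:xxx some 456:zzz other]]></m:pre>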
		
		<h2>Filter /etc/fstab using relpipe-tr-grep</h2>
		
		<p>
			If we are interested only in certain records in some relation, we can filter it using <code>relpipe-tr-grep</code>.
			If we want to list e.g. only Btrfs and XFS file systems from our <code>fstab</code> (see above), we will run:
		</p>
		
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab | relpipe-tr-grep 'fstab' 'type' 'btrfs|xfs' | relpipe-out-tabular]]></m:pre>
				
		<p>and we will get the following filtered result:</p>
		<pre><![CDATA[fstab:
 ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬──────────────────┬────────────────┬────────────────╮
 │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options (string) │ dump (integer) │ pass (integer) │
 ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼──────────────────┼────────────────┼────────────────┤
 │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime         │              0 │              2 │
 │                 │ /dev/mapper/sdf_crypt                │ /mnt/private         │ xfs           │ relatime         │              0 │              2 │
 ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴──────────────────┴────────────────┴────────────────╯
Record count: 2]]></pre>

		<p>
			Command arguments are similar to <code>relpipe-tr-sed</code>.
			Everything is a regular expression.
			Only relations matching the regex will be filtered, others will flow through the pipeline unmodified.
			If the attribute regex matches multiple attribute names, filtering will be done with logical OR,
			i.e. the record is included if at least one of those attributes matches the search regex.
		</p>
		
		<p>
			If we need an exact match of the whole attribute value, we have to use something like <code>'^(btrfs|xfs)$'</code>,
			otherwise a mere substring match is enough to include the record.
		</p>
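
		<p>
			The interplay of anchors and alternation can be checked with plain <code>grep -E</code>.
			This is only a sketch on made-up values with illustrative function names;
			it assumes that the regular expression dialect used by <code>relpipe-tr-grep</code> gives alternation the same (lowest) precedence as POSIX extended regular expressions do.
			Without a group, each anchor applies to only one of the alternatives:
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical helpers (not relpipe tools), using grep -E on plain lines.
match_loose() { grep -E '^btrfs|xfs$'; }    # "starts with btrfs" OR "ends with xfs"
match_exact() { grep -E '^(btrfs|xfs)$'; }  # the whole line is btrfs or xfs

# Made-up sample values, one per line.
sample() { printf '%s\n' btrfs xfs btrfs-old my-xfs ext4; }

sample | match_loose   # prints btrfs, xfs, btrfs-old and my-xfs
sample | match_exact   # prints only btrfs and xfs]]></m:pre>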
		
		<h2>SELECT mount_point FROM fstab WHERE type IN ('btrfs', 'xfs')</h2>
		
		<p>
			While reading classic pipelines involving the <code>grep</code> and <code>cut</code> commands,
			we may notice that they resemble simple SQL queries like:
		</p>
		
		<m:pre jazyk="SQL">SELECT "some", "cut", "fields" FROM stdin WHERE grep_matches(whole_line);</m:pre>
		
		<p>
			And that is true: <code>grep</code> does restriction<m:podČarou>
				<a href="https://en.wikipedia.org/wiki/Selection_(relational_algebra)">selecting</a> only certain records from the original relation according to their match with given conditions</m:podČarou>
			and <code>cut</code> does projection<m:podČarou>limited subset of what <a href="https://en.wikipedia.org/wiki/Projection_(relational_algebra)">projection</a> means</m:podČarou>.
			Now we can do these relational operations using our relational tools called <code>relpipe-tr-grep</code> and <code>relpipe-tr-cut</code>.
		</p>
		
		<p>
			Assume that we need only <code>mount_point</code> fields from our <code>fstab</code> where <code>type</code> is <code>btrfs</code> or <code>xfs</code>
			and we want to do something (a shell script block) with these directory paths.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-tr-grep 'fstab' 'type' '^(btrfs|xfs)$' \
	| relpipe-tr-cut 'fstab' 'mount_point' \
	| relpipe-out-nullbyte \
	| while read -r -d '' m; do
		echo "$m";
	done]]></m:pre>
	
		<p>
			The <code>relpipe-tr-cut</code> tool has a syntax similar to its <em>grep</em> and <em>sed</em> siblings and also uses the power of regular expressions.
			In this case it modifies the <code>fstab</code> relation on the fly and drops all its attributes except the <code>mount_point</code> one.
		</p>
		
		<p>
			Then we pass the data to the Bash <code>while</code> loop.
			In such a simple scenario (just <code>echo</code>), we could use <code>xargs</code> as in the examples above,
			but with this syntax, we can write a whole block of shell commands for each record/value and do more complex actions with them.
		</p>
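
		<p>
			Such a block with more than one command might look like the following sketch.
			It only checks whether each path is an existing directory; the function name <code>check_dirs</code> and the sample paths are made up
			and the input is generated by <code>printf</code> so that the loop can be tried without the relpipe tools (Bash is required because of <code>read -d</code>):
		</p>

		<m:pre jazyk="bash"><![CDATA[# Hypothetical filter: run several commands for every null-separated path on STDIN.
check_dirs() {
	while read -r -d '' m; do
		if [ -d "$m" ]; then
			echo "OK      $m"
		else
			echo "MISSING $m"
		fi
	done
}

# Made-up input; in the pipeline above, relpipe-out-nullbyte would provide it.
printf '%s\0' / /nonexistent-example-dir | check_dirs]]></m:pre>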
		
		<h2>More projections with relpipe-tr-cut</h2>
		
		<p>
			Assume that we have a simple relation containing numbers:
		</p>
	
		<m:pre jazyk="bash"><![CDATA[seq 0 8 \
	| tr \\n \\0 \
	| relpipe-in-cli generate-from-stdin numbers 3 a integer b integer c integer \
	> numbers.rp]]></m:pre>

		<p>and a second one containing letters:</p>

		<m:pre jazyk="bash"><![CDATA[relpipe-in-cli generate letters 2 a string b string A B C D > letters.rp]]></m:pre>

		<p>We saved them into two files. Now we combine them into a single file and work with them as if they were a single stream of relations:</p>
		
		<m:pre jazyk="bash"><![CDATA[cat numbers.rp letters.rp > both.rp;
cat both.rp | relpipe-out-tabular]]></m:pre>
		
		<p>Will print:</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────┬─────────────╮
 │ a (integer) │ b (integer) │ c (integer) │
 ├─────────────┼─────────────┼─────────────┤
 │           0 │           1 │           2 │
 │           3 │           4 │           5 │
 │           6 │           7 │           8 │
 ╰─────────────┴─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────┬─────────────╮
 │ a  (string) │ b  (string) │
 ├─────────────┼─────────────┤
 │ A           │ B           │
 │ C           │ D           │
 ╰─────────────┴─────────────╯
Record count: 2]]></pre>

		<p>We can drop the <code>a</code> attribute from the <code>numbers</code> relation:</p>
		
		<m:pre jazyk="bash">cat both.rp | relpipe-tr-cut 'numbers' 'b|c' | relpipe-out-tabular</m:pre>
		
		<p>and leave the <code>letters</code> relation unaffected:</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────╮
 │ b (integer) │ c (integer) │
 ├─────────────┼─────────────┤
 │           1 │           2 │
 │           4 │           5 │
 │           7 │           8 │
 ╰─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────┬─────────────╮
 │ a  (string) │ b  (string) │
 ├─────────────┼─────────────┤
 │ A           │ B           │
 │ C           │ D           │
 ╰─────────────┴─────────────╯
Record count: 2]]></pre>

		<p>Or we can remove <code>a</code> from both relations, i.e. keep only the attributes whose names match the <code>'b|c'</code> regex:</p>

		<m:pre jazyk="bash">cat both.rp | relpipe-tr-cut '.*' 'b|c' | relpipe-out-tabular</m:pre>
		
		<p>Instead of <code>'.*'</code> we could use <code>'numbers|letters'</code> and in this case it will give the same result:</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────╮
 │ b (integer) │ c (integer) │
 ├─────────────┼─────────────┤
 │           1 │           2 │
 │           4 │           5 │
 │           7 │           8 │
 ╰─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────╮
 │ b  (string) │
 ├─────────────┤
 │ B           │
 │ D           │
 ╰─────────────╯
Record count: 2]]></pre>

		<p>So far, we have only been reducing the attributes. But we can also duplicate them or change their order:</p>
		
		<m:pre jazyk="bash">cat both.rp | relpipe-tr-cut 'numbers' 'b|a|c' 'b' 'a' 'a' | relpipe-out-tabular</m:pre>
		
		<p>
			n.b. the order in <code>'b|a|c'</code> does not matter and if such regex matches, it preserves the original order of the attributes;
			but if we use multiple regexes to specify attributes, their order and count matter:
		</p>
		
		<pre><![CDATA[numbers:
 ╭─────────────┬─────────────┬─────────────┬─────────────┬─────────────┬─────────────╮
 │ a (integer) │ b (integer) │ c (integer) │ b (integer) │ a (integer) │ a (integer) │
 ├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
 │           0 │           1 │           2 │           1 │           0 │           0 │
 │           3 │           4 │           5 │           4 │           3 │           3 │
 │           6 │           7 │           8 │           7 │           6 │           6 │
 ╰─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3
letters:
 ╭─────────────┬─────────────╮
 │ a  (string) │ b  (string) │
 ├─────────────┼─────────────┤
 │ A           │ B           │
 │ C           │ D           │
 ╰─────────────┴─────────────╯
Record count: 2]]></pre>

		<p>
			The <code>letters</code> relation stays rock steady and <code>relpipe-tr-cut 'numbers'</code> does not affect it in any way.
		</p>
		
		
		<h2>Read an Atom feed using XQuery and relpipe-in-xml</h2>
		
		<p>
			Atom Syndication Format is a standard for publishing web feeds, a.k.a. web syndication.
			These feeds are usually consumed by a <em>feed reader</em> that aggregates news from many websites and displays them in a uniform format.
			The Atom feed is an XML with a list of recent news containing their titles, URLs and short annotations.
			It also contains some metadata (website author, title etc.).
		</p>
		<p>
			Using this simple XQuery<m:podČarou>see <a href="https://en.wikibooks.org/wiki/XQuery">XQuery</a> at Wikibooks</m:podČarou>
			<em>FLWOR Expression</em>
			we convert the Atom feed into the XML serialization of relational data:
		</p>
		
		<m:pre jazyk="xq" src="examples/atom.xq" odkaz="ano"/>
		
		<p>
			This is a similar operation to <a href="https://www.postgresql.org/docs/current/functions-xml.html">xmltable</a> used in SQL databases.
			It converts an XML tree structure to the relational form.
			In our case, the output is still XML, but in a format that can be read by <code>relpipe-in-xml</code>.
			All put together in a single shell script:
		</p>
		
		<m:pre jazyk="bash" src="examples/atom.sh"/>
		
		<p>Will generate a table with web news:</p>
		
		<m:pre jazyk="text" src="examples/atom.txt"/>
		
		<p>
			For frequent usage we can create a script or function called <code>relpipe-in-atom</code>
			that reads Atom XML on STDIN and generates relational data on STDOUT.
			And then do any of these:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[wget … | relpipe-in-atom | relpipe-out-tabular
wget … | relpipe-in-atom | relpipe-out-csv
wget … | relpipe-in-atom | relpipe-out-gui
wget … | relpipe-in-atom | relpipe-out-nullbyte | while read_nullbyte published title url; do echo "$title"; done
wget … | relpipe-in-atom | relpipe-out-csv | csv2rec | …
]]></m:pre>

		<p>
			There are several implementations of XQuery.
			<a href="http://galax.sourceforge.net/">Galax</a> is one of them. 
			<a href="http://xqilla.sourceforge.net/">XQilla</a> or
			<a href="http://basex.org/basex/xquery/">BaseX</a> are others (and support newer versions of the standard).
			There are also XSLT processors like <a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a>.
			BaseX can be used instead of Galax – we just replace
			<code>galax-run -context-item /dev/stdin</code> with <code>basex -i /dev/stdin</code>.
		</p>
		
		<p>
			Reading Atom feeds in a terminal might not be the best way to get news from a website,
			but this simple example teaches us how to convert arbitrary XML to relational data.
			And of course, we can generate multiple relations from a single XML using a single XQuery script.
			XQuery can also be used for operations like JOIN or UNION and for filtering and other transformations
			as will be shown in further examples.
		</p>
		
		<h2>Read files metadata using relpipe-in-filesystem</h2>
		
		<p>
			Our filesystems contain valuable information and using proper tools we can extract it.
			Using <code>relpipe-in-filesystem</code> we can gather metadata of our files and process them in a relational way.
			This tool does not traverse our filesystem (remember the rule: <em>do one thing and do it well</em>),
			instead, it eats a list of file paths separated by <code>\0</code>.
			It is typically used together with the <code>find</code> command, but we can also create such a list by hand using e.g. the <code>printf</code> command or <code>tr \\n \\0</code>.
		</p>
		
		<m:pre jazyk="bash">find /etc/ssh/ -print0 | relpipe-in-filesystem | relpipe-out-tabular</m:pre>
		
		<p>
			In the basic scenario, it behaves like <code>ls -l</code>, just more modular and machine-readable:
		</p>
		
		<pre><![CDATA[filesystem:
 ╭───────────────────────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
 │ path                     (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
 ├───────────────────────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
 │ /etc/ssh/                         │ d             │              0 │ root           │ root           │
 │ /etc/ssh/moduli                   │ f             │         553122 │ root           │ root           │
 │ /etc/ssh/ssh_host_ecdsa_key       │ f             │            227 │ root           │ root           │
 │ /etc/ssh/sshd_config              │ f             │           3262 │ root           │ root           │
 │ /etc/ssh/ssh_host_ed25519_key.pub │ f             │             91 │ root           │ root           │
 │ /etc/ssh/ssh_host_ecdsa_key.pub   │ f             │            171 │ root           │ root           │
 │ /etc/ssh/ssh_host_rsa_key         │ f             │           1679 │ root           │ root           │
 │ /etc/ssh/ssh_config               │ f             │           1580 │ root           │ root           │
 │ /etc/ssh/ssh_host_ed25519_key     │ f             │            399 │ root           │ root           │
 │ /etc/ssh/ssh_import_id            │ f             │            338 │ root           │ root           │
 │ /etc/ssh/ssh_host_rsa_key.pub     │ f             │            391 │ root           │ root           │
 ╰───────────────────────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 11]]></pre>

		<p>
			We can specify desired attributes and also their aliases:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[find /etc/ssh/ -print0 \
	| relpipe-in-filesystem \
		--file path --as artefact \
		--file size \
		--file owner --as dear_owner \
	| relpipe-out-tabular]]></m:pre>
	
		<p>And we will get a subset with renamed attributes:</p>
	
		<pre><![CDATA[filesystem:
 ╭───────────────────────────────────┬────────────────┬─────────────────────╮
 │ artefact                 (string) │ size (integer) │ dear_owner (string) │
 ├───────────────────────────────────┼────────────────┼─────────────────────┤
 │ /etc/ssh/                         │              0 │ root                │
 │ /etc/ssh/moduli                   │         553122 │ root                │
 │ /etc/ssh/ssh_host_ecdsa_key       │            227 │ root                │
 │ /etc/ssh/sshd_config              │           3262 │ root                │
 │ /etc/ssh/ssh_host_ed25519_key.pub │             91 │ root                │
 │ /etc/ssh/ssh_host_ecdsa_key.pub   │            171 │ root                │
 │ /etc/ssh/ssh_host_rsa_key         │           1679 │ root                │
 │ /etc/ssh/ssh_config               │           1580 │ root                │
 │ /etc/ssh/ssh_host_ed25519_key     │            399 │ root                │
 │ /etc/ssh/ssh_import_id            │            338 │ root                │
 │ /etc/ssh/ssh_host_rsa_key.pub     │            391 │ root                │
 ╰───────────────────────────────────┴────────────────┴─────────────────────╯
Record count: 11]]></pre>

		<p>
			We can also choose which path format best fits our needs:
		</p>


		<m:pre jazyk="bash"><![CDATA[find ../../etc/ssh/ -print0 \
	| relpipe-in-filesystem \
		--file path \
		--file path_absolute \
		--file path_canonical \
		--file name \
	| relpipe-out-tabular]]></m:pre>
	
		<p>The <code>path</code> attribute contains exactly the value that was given on input. The other formats are derived from it:</p>
	
		<pre><![CDATA[filesystem:
 ╭────────────────────────────────────────┬───────────────────────────────────────────────────┬───────────────────────────────────┬──────────────────────────╮
 │ path                          (string) │ path_absolute                            (string) │ path_canonical           (string) │ name            (string) │
 ├────────────────────────────────────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼──────────────────────────┤
 │ ../../etc/ssh/                         │ /home/hack/../../etc/ssh/                         │ /etc/ssh                          │                          │
 │ ../../etc/ssh/moduli                   │ /home/hack/../../etc/ssh/moduli                   │ /etc/ssh/moduli                   │ moduli                   │
 │ ../../etc/ssh/ssh_host_ecdsa_key       │ /home/hack/../../etc/ssh/ssh_host_ecdsa_key       │ /etc/ssh/ssh_host_ecdsa_key       │ ssh_host_ecdsa_key       │
 │ ../../etc/ssh/sshd_config              │ /home/hack/../../etc/ssh/sshd_config              │ /etc/ssh/sshd_config              │ sshd_config              │
 │ ../../etc/ssh/ssh_host_ed25519_key.pub │ /home/hack/../../etc/ssh/ssh_host_ed25519_key.pub │ /etc/ssh/ssh_host_ed25519_key.pub │ ssh_host_ed25519_key.pub │
 │ ../../etc/ssh/ssh_host_ecdsa_key.pub   │ /home/hack/../../etc/ssh/ssh_host_ecdsa_key.pub   │ /etc/ssh/ssh_host_ecdsa_key.pub   │ ssh_host_ecdsa_key.pub   │
 │ ../../etc/ssh/ssh_host_rsa_key         │ /home/hack/../../etc/ssh/ssh_host_rsa_key         │ /etc/ssh/ssh_host_rsa_key         │ ssh_host_rsa_key         │
 │ ../../etc/ssh/ssh_config               │ /home/hack/../../etc/ssh/ssh_config               │ /etc/ssh/ssh_config               │ ssh_config               │
 │ ../../etc/ssh/ssh_host_ed25519_key     │ /home/hack/../../etc/ssh/ssh_host_ed25519_key     │ /etc/ssh/ssh_host_ed25519_key     │ ssh_host_ed25519_key     │
 │ ../../etc/ssh/ssh_import_id            │ /home/hack/../../etc/ssh/ssh_import_id            │ /etc/ssh/ssh_import_id            │ ssh_import_id            │
 │ ../../etc/ssh/ssh_host_rsa_key.pub     │ /home/hack/../../etc/ssh/ssh_host_rsa_key.pub     │ /etc/ssh/ssh_host_rsa_key.pub     │ ssh_host_rsa_key.pub     │
 ╰────────────────────────────────────────┴───────────────────────────────────────────────────┴───────────────────────────────────┴──────────────────────────╯
Record count: 11]]></pre>

		<p>
			We can also <em>select</em> symlink targets or their types.
			If a file is missing or inaccessible due to permissions, only its <code>path</code> is printed.
		</p>
		
		<p>
			Tip: if we are looking for files in the current directory and want to omit the leading <code>./</code>, we can call <code>find -printf '%P\0'</code> instead of <code>find -print0</code>.
		</p>
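		
		<p>
			The difference can be tried on a throwaway directory.
			This is only a sketch: the directory and file names are arbitrary, and <code>-printf</code> is an extension of GNU find, so it may be missing in other implementations.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[
```shell
# Throwaway directory with two files (the names are arbitrary).
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"

# Run in a subshell so that the cd does not affect the calling shell.
(
	cd "$demo" || exit 1
	echo "find -print0:"
	find . -type f -print0 | sort -z | tr '\0' '\n'      # ./a.txt and ./b.txt
	echo "find -printf %P:"
	find . -type f -printf '%P\0' | sort -z | tr '\0' '\n'   # a.txt and b.txt
)
```
]]></m:pre>
		
		<p>
			Note that the sketch restricts the search with <code>-type f</code>: for the starting directory itself, <code>%P</code> expands to an empty string.
		</p>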
		
		
		<h2>Using relpipe-in-filesystem to read extended attributes</h2>
		
		<p>
			Extended attributes (xattr) are additional <em>key=value</em> pairs that can be attached to our files.
			They are not stored inside the files, but on the filesystem.
			Thus they are independent of any particular file format (which might not support metadata)
			and we can use them e.g. for tagging, cataloguing or adding some notes to our files.
			Some tools like GNU Wget use extended attributes to store metadata like the original URL from which the file was downloaded.
		</p>
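		
		<p>
			For illustration, such attributes can be attached and read back using the <code>setfattr</code> and <code>getfattr</code> commands from the <code>attr</code> package.
			This is a sketch under several assumptions: the helper names <code>tag_file</code> and <code>read_tag</code> are made up, the filesystem must support user extended attributes,
			and the <code>user.</code> namespace prefix used on Linux is added here explicitly.
		</p>
		
		<m:pre jazyk="bash"><![CDATA[
```shell
# Hypothetical helpers built on setfattr/getfattr (attr package).

# usage: tag_file FILE KEY VALUE
# Stores VALUE in the extended attribute user.KEY of FILE.
tag_file() {
	setfattr -n "user.$2" -v "$3" "$1"
}

# usage: read_tag FILE KEY
# Prints the raw value of the extended attribute user.KEY of FILE.
read_tag() {
	getfattr --only-values -n "user.$2" "$1"
}

# Usage:
#   tag_file photo.jpg note "holiday 2018"
#   read_tag photo.jpg note
```
]]></m:pre>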
		
		<m:pre jazyk="bash"><![CDATA[wget --recursive --level=1 https://relational-pipes.globalcode.info/
find -type f -printf '%P\0' \
	| relpipe-in-filesystem --file path --file size --xattr xdg.origin.url  \
	| relpipe-out-tabular
]]></m:pre>

		<p>And now we know where the files on our disk came from:</p>

		<pre><![CDATA[filesystem:
 ╭───────────────────────────┬────────────────┬────────────────────────────────────────────────────────────────────╮
 │ path             (string) │ size (integer) │ xdg.origin.url                                            (string) │
 ├───────────────────────────┼────────────────┼────────────────────────────────────────────────────────────────────┤
 │ index.html                │          12159 │ https://relational-pipes.globalcode.info/v_0/                      │
 │ v_0/atom.xml              │           4613 │ https://relational-pipes.globalcode.info/v_0/atom.xml              │
 │ v_0/rss.xml               │           4926 │ https://relational-pipes.globalcode.info/v_0/rss.xml               │
 │ v_0/js/skript.js          │           2126 │ https://relational-pipes.globalcode.info/v_0/js/skript.js          │
 │ v_0/css/styl.css          │           2988 │ https://relational-pipes.globalcode.info/v_0/css/styl.css          │
 │ v_0/css/relpipe.css       │           1095 │ https://relational-pipes.globalcode.info/v_0/css/relpipe.css       │
 │ v_0/css/syntaxe.css       │           3584 │ https://relational-pipes.globalcode.info/v_0/css/syntaxe.css       │
 │ v_0/index.xhtml           │          12159 │ https://relational-pipes.globalcode.info/v_0/index.xhtml           │
 │ v_0/grafika/logo.png      │           3298 │ https://relational-pipes.globalcode.info/v_0/grafika/logo.png      │
 │ v_0/principles.xhtml      │          17171 │ https://relational-pipes.globalcode.info/v_0/principles.xhtml      │
 │ v_0/roadmap.xhtml         │          11097 │ https://relational-pipes.globalcode.info/v_0/roadmap.xhtml         │
 │ v_0/faq.xhtml             │          11080 │ https://relational-pipes.globalcode.info/v_0/faq.xhtml             │
 │ v_0/specification.xhtml   │          12983 │ https://relational-pipes.globalcode.info/v_0/specification.xhtml   │
 │ v_0/implementation.xhtml  │          10810 │ https://relational-pipes.globalcode.info/v_0/implementation.xhtml  │
 │ v_0/examples.xhtml        │          76958 │ https://relational-pipes.globalcode.info/v_0/examples.xhtml        │
 │ v_0/license.xhtml         │          65580 │ https://relational-pipes.globalcode.info/v_0/license.xhtml         │
 │ v_0/screenshots.xhtml     │           5708 │ https://relational-pipes.globalcode.info/v_0/screenshots.xhtml     │
 │ v_0/download.xhtml        │           5204 │ https://relational-pipes.globalcode.info/v_0/download.xhtml        │
 │ v_0/contact.xhtml         │           4940 │ https://relational-pipes.globalcode.info/v_0/contact.xhtml         │
 │ v_0/classic-example.xhtml │           9539 │ https://relational-pipes.globalcode.info/v_0/classic-example.xhtml │
 ╰───────────────────────────┴────────────────┴────────────────────────────────────────────────────────────────────╯
Record count: 20]]></pre>

		<p>
			If we like the BeOS/Haiku style, we can create empty files with some attributes attached, use our filesystem as a simple database
			and query it using relational tools.
			It will lack indexing, but for basic scenarios like an <em>address book</em> it will be fast enough
			and we can feel a bit of the BeOS/Haiku atmosphere in our contemporary GNU/Linux systems.
			But be careful: some editors delete and recreate files when saving them, which destroys the xattrs.
			Tools like <code>rsync</code> or <code>tar</code> with the <code>--xattrs</code> option will back up our attributes safely.
		</p>

		
	</text>

</stránka>
author	František Kučera <franta-hg@frantovo.cz>
	Fri, 18 Jan 2019 21:34:58 +0100
branch	v_0
changeset 241	f71d300205b7
parent 240	d81c623de788
child 244	d4f401b5f90c
permissions	-rw-r--r--