relpipe-data/examples-recfile.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 00:43:11 +0100
branch v_0
changeset 329 5bc2bb8b7946
parent 256 822ffd23d679
permissions -rw-r--r--
Release v0.18

<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Integrating Relational pipes with GNU Recutils</nadpis>
	<perex>using the recfile format as input and output, plus filtering</perex>
	<m:pořadí-příkladu>01900</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			Recfile is the native format of <a href="https://www.gnu.org/software/recutils/">GNU Recutils</a>.
			Recfiles are text files that contain records of various types.
			They are human-editable and serve as simple databases.
			<m:name/> have supported input and output in this format since v0.11.
		</p>
		
		
		<p>
			We can convert any relational data to the recfile format by using <code>relpipe-out-recfile</code> – e.g. our <code>fstab</code> will look like this:
		</p>

		<m:pre jazyk="text" src="examples/relpipe-out-fstab.rec.txt"/>
		
		<p>
			Then we can edit this data (e.g. in GNU Emacs, which has a mode for this format) or store it in a version control system like Mercurial or Git.
			Because it is a text format (like XML, which is also supported and good for this purpose),
			we can efficiently track changes in data across versions, do <code>diff</code> or (with some care) even <code>patch</code>.
			And we can use the whole GNU Recutils toolchain while working with such data.
		</p>
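		
		<p>
			As a minimal sketch of such change tracking, assuming only the standard <code>mktemp</code>, <code>sed</code> and <code>diff</code> tools:
			the file names and the edited mount option below are hypothetical, and in practice the first version would come from <code>relpipe-out-recfile</code>.
			Because each attribute sits on its own line, a change of a single value shows up as a single changed line:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Work in a throw-away directory.
workdir="$(mktemp -d)"

# Version 1: part of a record taken from the fstab example.
cat > "$workdir/fstab-v1.rec" <<'EOF'
mount_point: /home
type: btrfs
options: relatime
EOF

# Version 2: a hypothetical edit, only the mount options changed.
sed 's/^options: relatime$/options: noatime/' \
	"$workdir/fstab-v1.rec" > "$workdir/fstab-v2.rec"

# diff exits with status 1 when the files differ, which is expected here.
changes="$(diff -u "$workdir/fstab-v1.rec" "$workdir/fstab-v2.rec" || true)"
printf '%s\n' "$changes"

rm -r "$workdir"]]></m:pre>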
		
		<p>
			Obligatory example of filtering our <code>fstab</code>:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab | relpipe-out-recfile | recsel -e "type = 'btrfs' || type = 'xfs'"]]></m:pre>
		
		<p>This will give us a recfile:</p>
		
		<m:pre jazyk="text"><![CDATA[scheme: UUID
device: a2b5f230-a795-4f6f-a39b-9b57686c86d5
mount_point: /home
type: btrfs
options: relatime
dump: 0
pass: 2

scheme:
device: /dev/mapper/sdf_crypt
mount_point: /mnt/private
type: xfs
options: relatime
dump: 0
pass: 2]]></m:pre>
		
		<p>And we can convert it back to the relational format using <code>relpipe-in-recfile</code>:</p>		
		<m:pre jazyk="bash"><![CDATA[relpipe-in-fstab \
	| relpipe-out-recfile \
	| recsel -e "type = 'btrfs' || type = 'xfs'" \
	| relpipe-in-recfile \
	| relpipe-out-tabular]]></m:pre>
		
		<p>and print it as a table in our terminal:</p>		
		<m:pre jazyk="text"><![CDATA[recfile:
 ╭─────────────────┬──────────────────────────────────────┬──────────────────────┬───────────────┬──────────────────┬───────────────┬───────────────╮
 │ scheme (string) │ device                      (string) │ mount_point (string) │ type (string) │ options (string) │ dump (string) │ pass (string) │
 ├─────────────────┼──────────────────────────────────────┼──────────────────────┼───────────────┼──────────────────┼───────────────┼───────────────┤
 │ UUID            │ a2b5f230-a795-4f6f-a39b-9b57686c86d5 │ /home                │ btrfs         │ relatime         │ 0             │ 2             │
 │                 │ /dev/mapper/sdf_crypt                │ /mnt/private         │ xfs           │ relatime         │ 0             │ 2             │
 ╰─────────────────┴──────────────────────────────────────┴──────────────────────┴───────────────┴──────────────────┴───────────────┴───────────────╯
Record count: 2]]></m:pre>
		
		<p>
			Note: in v0.11, the conversion to recfiles and back is not 100% lossless (unlike XML)
			because <m:name/> support only three data types (string, unsigned integer and boolean) in this version;
			this will be improved in later releases (more data types are planned before v1.0).
		</p>
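		
		<p>
			To make the structure of the format more concrete: records are separated by blank lines and each field is a <code>name: value</code> line.
			The following sketch approximates the selection from the filtering example above using only POSIX <code>awk</code>.
			The function names and the shortened sample data are made up for illustration,
			and the script ignores recfile features such as comments, record descriptors and multi-line values,
			so <code>recsel</code> remains the right tool for real data:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Sample data: three records in recfile syntax (shortened, partly hypothetical).
sample_recfile() {
cat <<'EOF'
mount_point: /home
type: btrfs

mount_point: /boot
type: ext4

mount_point: /mnt/private
type: xfs
EOF
}

# Print the records whose "type" field is btrfs or xfs.
# RS = "" switches awk to paragraph mode (records separated by blank lines),
# FS = "\n" turns every "name: value" line into one awk field.
select_by_type() {
	awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
	{
		for (i = 1; i <= NF; i++)
			if ($i == "type: btrfs" || $i == "type: xfs") { print; break }
	}'
}

sample_recfile | select_by_type]]></m:pre>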
		
		<p>
			Because some web browsers or tools can store the original URL in extended attributes while downloading a file,
			we can use <code>recsel</code> to find files downloaded from some particular domain:
		</p>
		<m:pre jazyk="bash"><![CDATA[find -print0 | relpipe-in-filesystem \
	--file path \
	--file size \
	--file type \
	--xattr xdg.origin.url --as url \
	| relpipe-out-recfile \
	| recsel -e 'url ~ "^https?://([^/]*\.)?archive\.org/"']]></m:pre>
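		
		<p>
			Before running such a filter over a whole directory tree, we can check what the regular expression accepts.
			A small sketch, assuming that <code>grep -E</code> treats this particular pattern the same way as the <code>~</code> operator of <code>recsel</code>
			(both work with extended regular expressions); the URLs are hypothetical test data:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Hypothetical URLs, one per line.
sample_urls() {
cat <<'EOF'
https://archive.org/details/example
http://web.archive.org/web/2020/
https://notarchive.org/details/example
https://example.com/archive.org/
EOF
}

# The same regular expression as in the recsel call above.
sample_urls | grep -E '^https?://([^/]*\.)?archive\.org/']]></m:pre>
		
		<p>
			Only the first two URLs are printed: the optional group <code>([^/]*\.)?</code> allows subdomains like <code>web.archive.org</code>,
			but rejects look-alike domains and URLs that merely contain <code>archive.org</code> in their path.
		</p>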
		
		<p>
			<m:name/> can also be used together with <a href="https://sql-dk.globalcode.info/">SQL-DK</a> (in the 2019-03-05 development version)
			to pipe data from big relational databases like PostgreSQL or MariaDB to other formats like recfiles.
			Having a script:
		</p>
				
		<m:pre jazyk="bash" src="examples/sql-dk_pg_1.sh" odkaz="ano"/>
		
		<p>
			We can convert result sets from any SQL queries to the relational format and then work with such data without a connection to the original database.
			Thus we can cache (<em>materialize</em>) the results locally in a file and use them even offline.
			Or we can run the SQL query each time and have fresh data:
		</p>
		
		<m:pre jazyk="text"><![CDATA[sql-dk_pg_1.sh | relpipe-out-recfile]]></m:pre>
		
		<p>This will result in:</p>
		<m:pre jazyk="text" src="examples/sql-dk_pg_1.rec.txt"/>
		
		<p>Or we can view the data in the classic tabular way using <code>relpipe-out-tabular</code>:</p>
		<m:pre jazyk="text" src="examples/sql-dk_pg_1.tabular.txt"/>
		
		<p>
			Materialized (or fresh) data from the database can be further transformed 
			using <code>relpipe-tr-*</code> commands like grep, sed, cut, guile, 
			or (through the recfile conversion) by the <code>recsel</code> command from GNU Recutils.
		</p>
		
		<p>
			The <code>relpipe-in-recfile</code> command helps with converting recfiles to various formats like XHTML,
			with pretty-printing, or with xargs-like processing
			(using <code>relpipe-out-nullbyte</code> and regular <code>xargs</code> or <code>read_nullbyte</code> function
			as described in the <m:a href="examples-out-bash">Writing an output filter in Bash</m:a> example).
			Thus we can have data-driven Bash scripts based on our recfiles.
		</p>
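		
		<p>
			A minimal sketch of the xargs-like processing, assuming that <code>relpipe-out-nullbyte</code> writes every attribute value terminated by a null byte.
			To stay self-contained, the <code>sample_values</code> function (a hypothetical stand-in) emits such a stream
			for a relation with two attributes instead of calling the real commands:
		</p>
		
		<m:pre jazyk="bash"><![CDATA[# Stand-in for: … | relpipe-in-recfile | relpipe-out-nullbyte
# (assumption: two attributes per record, each value terminated by \0)
sample_values() {
	printf '%s\0' /home btrfs /mnt/private xfs
}

# Consume two values (one record) per invocation of the inner printf.
sample_values | xargs -0 -n 2 printf 'mount point %s has type %s\n']]></m:pre>
		
		<p>
			The number passed to <code>-n</code> has to match the number of attributes in the relation, otherwise values from neighbouring records get mixed.
		</p>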

		
	</text>

</stránka>