author František Kučera <>
Tue, 28 May 2019 21:52:15 +0200

	<nadpis>Reading an Atom feed using XQuery</nadpis>
	<perex>converting arbitrary XML into relational data using XQuery</perex>

	<text xmlns="">
			Atom Syndication Format is a standard for publishing web feeds, a.k.a. web syndication.
			These feeds are usually consumed by a <em>feed reader</em> that aggregates news from many websites and displays them in a uniform format.
			An Atom feed is an XML document with a list of recent news items, each with its title, URL and a short annotation.
			It also contains some metadata (website author, title etc.).
			Using this simple XQuery<m:podČarou>see <a href="">XQuery</a> at Wikibooks</m:podČarou>
			<em>FLWOR Expression</em>
			we convert the Atom feed into the XML serialization of relational data:
		<m:pre jazyk="xq" src="examples/atom.xq" odkaz="ano"/>
			This operation is similar to <a href="">xmltable</a> used in SQL databases.
			It converts an XML tree structure to the relational form.
			In our case, the output is still XML, but in a format that can be read by <code>relpipe-in-xml</code>.
			All put together in a single shell script:
		<m:pre jazyk="bash" src="examples/"/>
		<p>The script generates a table with web news:</p>
		<m:pre jazyk="text" src="examples/atom.txt"/>
		<p>Or with <code>relpipe-out-recfile</code>, we will get output in the recfile format (<a href="">GNU Recutils</a>), like this:</p>
		<m:pre jazyk="text" src="examples/atom.rec"/>
			For frequent use, we can create a script or function called <code>relpipe-in-atom</code>
			that reads Atom XML on STDIN and generates relational data on STDOUT.
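		<p>Such a function might look like the following minimal sketch. It is not taken from the example script above: it assumes that the XQuery file <code>atom.xq</code> lies in the current directory and that <code>galax-run</code> accepts the query file as its last argument. The function is named with underscores only to stay portable across POSIX shells; saved as a script, it could carry the name <code>relpipe-in-atom</code>.</p>

```shell
# Sketch of a relpipe-in-atom wrapper (hypothetical, see assumptions above).
# Reads Atom XML on STDIN and writes relational data to STDOUT.
relpipe_in_atom() {
	# Run the XQuery script against STDIN as the context item,
	# then convert the resulting XML serialization to relational data.
	galax-run -context-item /dev/stdin atom.xq | relpipe-in-xml
}
```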
			And then do any of these:
		<m:pre jazyk="bash"><![CDATA[wget … | relpipe-in-atom | relpipe-out-tabular
wget … | relpipe-in-atom | relpipe-out-csv
wget … | relpipe-in-atom | relpipe-out-gui
wget … | relpipe-in-atom | relpipe-out-nullbyte | while read_nullbyte published title url; do echo "$title"; done
wget … | relpipe-in-atom | relpipe-out-recfile]]></m:pre>

			There are several implementations of XQuery.
			<a href="">Galax</a> is one of them.
			<a href="">XQilla</a> and
			<a href="">BaseX</a> are others (and support newer versions of the standard).
			There are also XSLT processors like <a href="">xsltproc</a>.
			BaseX can be used instead of Galax – we just replace
			<code>galax-run -context-item /dev/stdin</code> with <code>basex -i /dev/stdin</code>.
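		<p>Because the two processors differ only in the command prefix, a wrapper could pick whichever one is installed. The following is only a sketch: the function name <code>xquery_stdin</code> is hypothetical, and it assumes that both <code>galax-run</code> and <code>basex</code> take the query file as their last argument.</p>

```shell
# Hypothetical helper: run an XQuery file against STDIN using
# whichever supported processor is found on the PATH.
xquery_stdin() {
	if command -v galax-run >/dev/null 2>&1; then
		galax-run -context-item /dev/stdin "$1"
	elif command -v basex >/dev/null 2>&1; then
		basex -i /dev/stdin "$1"
	else
		echo "no supported XQuery processor found" >&2
		return 1
	fi
}
```

		<p>It could then be called as <code>xquery_stdin atom.xq | relpipe-in-xml</code>.</p>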
			Reading Atom feeds in a terminal might not be the best way to get news from a website,
			but this simple example teaches us how to convert arbitrary XML to relational data.
			And of course, we can generate multiple relations from a single XML using a single XQuery script.
			XQuery can also be used for operations like JOIN or UNION and for filtering and other transformations
			as will be shown in further examples.
