examples: Read an Atom feed using XQuery and relpipe-in-xml v_0
author František Kučera <franta-hg@frantovo.cz>
Fri, 11 Jan 2019 22:39:30 +0100
branch v_0
changeset 232 c7d05cf04b76
parent 231 ea49ee7a73c9
child 233 a8029bd1c87a
examples: Read an Atom feed using XQuery and relpipe-in-xml
relpipe-data/examples.xml
relpipe-data/examples/atom.sh
relpipe-data/examples/atom.txt
relpipe-data/examples/atom.xq
--- a/relpipe-data/examples.xml	Fri Jan 11 22:39:05 2019 +0100
+++ b/relpipe-data/examples.xml	Fri Jan 11 22:39:30 2019 +0100
@@ -687,6 +687,67 @@
 		</p>
 		
 		
+		<h2>Read an Atom feed using XQuery and relpipe-in-xml</h2>
+		
+		<p>
+			Atom Syndication Format is a standard for publishing web feeds, a.k.a. web syndication.
+			These feeds are usually consumed by a <em>feed reader</em> that aggregates news from many websites and displays them in a uniform format.
+			An Atom feed is an XML document with a list of recent news items containing their titles, URLs and short annotations.
+			It also contains some metadata (website author, title etc.).
+		</p>
+		<p>
+			Using this simple XQuery<m:podČarou>see <a href="https://en.wikibooks.org/wiki/XQuery">XQuery</a> at Wikibooks</m:podČarou>
+			<em>FLWOR Expression</em>
+			we convert the Atom feed into the XML serialization of relational data:
+		</p>
+		
+		<m:pre jazyk="xq" src="examples/atom.xq" odkaz="ano"/>
+		
+		<p>
+			This operation is similar to <a href="https://www.postgresql.org/docs/current/functions-xml.html">xmltable</a> used in SQL databases.
+			It converts an XML tree structure to the relational form.
+			In our case, the output is still XML, but in a format that can be read by <code>relpipe-in-xml</code>.
+			Everything put together in a single shell script:
+		</p>
+		
+		<m:pre jazyk="bash" src="examples/atom.sh"/>
+		
+		<p>The script generates a table with web news:</p>
+		
+		<m:pre jazyk="text" src="examples/atom.txt"/>
+		
+		<p>
+			For frequent usage we can create a script or function called <code>relpipe-in-atom</code>
+			that reads Atom XML on STDIN and generates relational data on STDOUT.
+			Then we can do any of these:
+		</p>
+		
+		<m:pre jazyk="bash"><![CDATA[wget … | relpipe-in-atom | relpipe-out-tabular
+wget … | relpipe-in-atom | relpipe-out-csv
+wget … | relpipe-in-atom | relpipe-out-gui
+wget … | relpipe-in-atom | relpipe-out-nullbyte | while read_nullbyte published title url; do echo "$title"; done
+wget … | relpipe-in-atom | relpipe-out-csv | csv2rec | …
+]]></m:pre>
+
+		<p>
+			There are several implementations of XQuery.
+			<a href="http://galax.sourceforge.net/">Galax</a> is one of them. 
+			<a href="http://xqilla.sourceforge.net/">XQilla</a> and
+			<a href="http://basex.org/basex/xquery/">BaseX</a> are others (and support newer versions of the standard).
+			There are also XSLT processors like <a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a>.
+			BaseX can be used instead of Galax – we just replace
+			<code>galax-run -context-item /dev/stdin</code> with <code>basex -i /dev/stdin</code>.
+		</p>
+		
+		<p>
+			Reading Atom feeds in a terminal might not be the best way to get news from a website,
+			but this simple example teaches us how to convert arbitrary XML to relational data.
+			And of course, we can generate multiple relations from a single XML document using a single XQuery script.
+			XQuery can also be used for operations like JOIN or UNION and for filtering and other transformations,
+			as will be shown in further examples.
+		</p>
+		
+		
 	</text>
 
 </stránka>
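The `while read_nullbyte published title url` line in the documentation text above relies on a helper that this changeset does not define. A minimal sketch of such a helper follows. It assumes that `relpipe-out-nullbyte` terminates every attribute value with a NUL byte; the implementation and the internal variable name `_rn_name` are illustrative assumptions, not necessarily what the project ships.

```shell
#!/bin/bash

# Hypothetical helper (not part of this changeset): reads one NUL-terminated
# value from STDIN into each variable named in the arguments.
# Returns non-zero when the input ends, which terminates a `while` loop.
read_nullbyte() {
	local IFS=          # empty IFS: keep leading and trailing whitespace
	local _rn_name
	for _rn_name in "$@"; do
		read -r -d '' "$_rn_name" || return 1
	done
}
```

One call consumes as many values as there are variable names, so a relation with three attributes needs exactly three names per iteration, as in the documentation's example.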
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/atom.sh	Fri Jan 11 22:39:30 2019 +0100
@@ -0,0 +1,9 @@
+#!/bin/bash
+
+get_atom() {
+	wget --quiet --output-document - https://blog.frantovo.cz/agregace/c/
+	# wget --quiet --output-document - https://blog.frantovo.cz/agregace/k/
+	# cat atom.xml
+}
+
+get_atom | galax-run -context-item /dev/stdin atom.xq | relpipe-in-xml | relpipe-out-tabular
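The middle of the pipeline above can be wrapped into the `relpipe-in-atom` function that the documentation text mentions. This is only a sketch under stated assumptions: `atom.xq` is expected in the current directory unless the hypothetical variable `RELPIPE_ATOM_XQ` (introduced here for illustration, not part of relpipe) points elsewhere, and the shell is bash outside POSIX mode, since the function name contains hyphens.

```shell
#!/bin/bash

# Hypothetical wrapper (not part of this changeset):
# reads Atom XML on STDIN and writes relational data on STDOUT.
# RELPIPE_ATOM_XQ is an assumed, illustrative variable; default is ./atom.xq
relpipe-in-atom() {
	galax-run -context-item /dev/stdin "${RELPIPE_ATOM_XQ:-atom.xq}" | relpipe-in-xml
}
```

With this function defined, the last line of the script could read `get_atom | relpipe-in-atom | relpipe-out-tabular`, and the output stage can be swapped for any other `relpipe-out-*` tool.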
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/atom.txt	Fri Jan 11 22:39:30 2019 +0100
@@ -0,0 +1,26 @@
+atom:
+ ╭──────────────────────┬───────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
+ │ published   (string) │ title                                                        (string) │ url                                                                                                                                       (string) │
+ ├──────────────────────┼───────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+ │ 2018-12-24T13:37:24Z │ GNU Bash: Vánoční tipy                                                │ https://blog.frantovo.cz/c/370/GNU%20Bash%3A%20V%C3%A1no%C4%8Dn%C3%AD%20tipy                                                                       │
+ │ 2018-08-04T23:23:00Z │ HiFive1 – deska s otevřeným čipem RISC-V                              │ https://blog.frantovo.cz/c/368/HiFive1%20%E2%80%93%20deska%20s%C2%A0otev%C5%99en%C3%BDm%20%C4%8Dipem%20RISC-V                                      │
+ │ 2018-06-30T13:37:08Z │ The Things Network – LoRaWAN – IoT                                    │ https://blog.frantovo.cz/c/366/The%20Things%20Network%20%E2%80%93%20LoRaWAN%20%E2%80%93%C2%A0IoT                                                   │
+ │ 2018-03-31T19:48:00Z │ Roland Rubix44 – externí zvuková karta                                │ https://blog.frantovo.cz/c/365/Roland%20Rubix44%20%E2%80%93%20extern%C3%AD%20zvukov%C3%A1%20karta                                                  │
+ │ 2017-11-25T20:26:49Z │ Přepisování parametrů příkazové řádky                                 │ https://blog.frantovo.cz/c/362/P%C5%99episov%C3%A1n%C3%AD%20parametr%C5%AF%20p%C5%99%C3%ADkazov%C3%A9%20%C5%99%C3%A1dky                            │
+ │ 2017-07-01T22:16:00Z │ Java a záludnost ternárního operátoru                                 │ https://blog.frantovo.cz/c/359/Java%20a%C2%A0z%C3%A1ludnost%20tern%C3%A1rn%C3%ADho%20oper%C3%A1toru                                                │
+ │ 2017-06-11T19:05:13Z │ Paralelní port jako generátor signálu                                 │ https://blog.frantovo.cz/c/358/Paraleln%C3%AD%20port%20jako%20gener%C3%A1tor%20sign%C3%A1lu                                                        │
+ │ 2016-12-28T22:50:00Z │ Herní ovladače počátku 90. let                                        │ https://blog.frantovo.cz/c/356/Hern%C3%AD%20ovlada%C4%8De%20po%C4%8D%C3%A1tku%2090.%20let                                                          │
+ │ 2016-11-12T20:16:00Z │ GPIO v Raspberry Pi jako soubory                                      │ https://blog.frantovo.cz/c/355/GPIO%20v%C2%A0Raspberry%20Pi%20jako%20soubory                                                                       │
+ │ 2016-02-29T23:45:00Z │ Nakupujeme v zahraničí po Internetu                                   │ https://blog.frantovo.cz/c/353/Nakupujeme%20v%C2%A0zahrani%C4%8D%C3%AD%20po%20Internetu                                                            │
+ │ 2015-12-24T17:25:41Z │ Malajsie: Kuala Lumpur a hackerspacy                                  │ https://blog.frantovo.cz/c/354/Malajsie%3A%20Kuala%20Lumpur%20a%C2%A0hackerspacy                                                                   │
+ │ 2015-10-04T12:25:07Z │ Opravujeme chyby v softwaru: inotify-tools                            │ https://blog.frantovo.cz/c/352/Opravujeme%20chyby%20v%C2%A0softwaru%3A%20inotify-tools                                                             │
+ │ 2015-09-30T23:10:01Z │ CLOC: počítáme řádky kódu                                             │ https://blog.frantovo.cz/c/351/CLOC%3A%20po%C4%8D%C3%ADt%C3%A1me%20%C5%99%C3%A1dky%20k%C3%B3du                                                     │
+ │ 2015-06-20T20:03:31Z │ binfmt_misc: spouštíme javovské programy podobně jako nativní binárky │ https://blog.frantovo.cz/c/349/binfmt_misc%3A%20spou%C5%A1t%C3%ADme%20javovsk%C3%A9%20programy%20podobn%C4%9B%20jako%20nativn%C3%AD%20bin%C3%A1rky │
+ │ 2015-06-13T22:56:58Z │ Přepisujeme soukromé proměnné v Javě pomocí reflexe                   │ https://blog.frantovo.cz/c/348/P%C5%99episujeme%20soukrom%C3%A9%20prom%C4%9Bnn%C3%A9%20v%C2%A0Jav%C4%9B%20pomoc%C3%AD%20reflexe                    │
+ │ 2015-04-04T16:47:44Z │ Jak jsem si (ne)koupil notebook                                       │ https://blog.frantovo.cz/c/341/Jak%20jsem%20si%20%28ne%29koupil%20notebook                                                                         │
+ │ 2015-02-15T18:55:35Z │ Těžíme akumulátory 18650                                              │ https://blog.frantovo.cz/c/340/T%C4%9B%C5%BE%C3%ADme%20akumul%C3%A1tory%2018650                                                                    │
+ │ 2015-01-17T23:23:00Z │ Java 8: Stream API                                                    │ https://blog.frantovo.cz/c/339/Java%208%3A%20Stream%20API                                                                                          │
+ │ 2015-01-04T01:47:49Z │ JXD S7800B: kapesní herní konsole                                     │ https://blog.frantovo.cz/c/338/JXD%20S7800B%3A%20kapesn%C3%AD%20hern%C3%AD%20konsole                                                               │
+ │ 2014-12-28T14:31:07Z │ Vánoční hvězda – 3D                                                   │ https://blog.frantovo.cz/c/337/V%C3%A1no%C4%8Dn%C3%AD%20hv%C4%9Bzda%20%E2%80%93%203D                                                               │
+ ╰──────────────────────┴───────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+Record count: 20
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/atom.xq	Fri Jan 11 22:39:30 2019 +0100
@@ -0,0 +1,26 @@
+xquery version "1.0";
+
+declare namespace relpipe="tag:globalcode.info,2018:relpipe";
+declare namespace atom="http://www.w3.org/2005/Atom";
+
+<relpipe xmlns="tag:globalcode.info,2018:relpipe">
+	<relation>
+		<name>atom</name>
+		<attributes-metadata>
+			<attribute-metadata name="published" type="string"/>
+			<attribute-metadata name="title" type="string"/>
+			<attribute-metadata name="url" type="string"/>
+		</attributes-metadata>
+
+		{
+			for $e in /atom:feed/atom:entry
+			order by $e/atom:published descending
+			return
+				<record>
+					<attribute>{$e/atom:published/text()}</attribute>
+					<attribute>{$e/atom:title/text()}</attribute>
+					<attribute>{string($e/atom:link/@href)}</attribute>
+				</record>
+		}
+	</relation>
+</relpipe>