examples: relpipe-in-xmltable v_0
author František Kučera <franta-hg@frantovo.cz>
Thu, 25 Jul 2019 22:16:12 +0200
branch v_0
changeset 263 8bf13358a50a
parent 262 846510a73535
child 264 d39cfc926f95
examples: relpipe-in-xmltable
relpipe-data/download.xml
relpipe-data/examples-in-xmltable-atom.xml
relpipe-data/examples-in-xmltable-libvirt.xml
relpipe-data/examples/atom-xmltable.sh
relpipe-data/examples/atom-xmltable.txt
relpipe-data/examples/atom-xmltable.xml
relpipe-data/implementation.xml
--- a/relpipe-data/download.xml	Wed Jul 24 14:18:42 2019 +0200
+++ b/relpipe-data/download.xml	Thu Jul 25 22:16:12 2019 +0200
@@ -20,6 +20,7 @@
 hg clone https://hg.globalcode.info/relpipe/relpipe-in-fstab.cpp;
 hg clone https://hg.globalcode.info/relpipe/relpipe-in-recfile.cpp;
 hg clone https://hg.globalcode.info/relpipe/relpipe-in-xml.cpp;
+hg clone https://hg.globalcode.info/relpipe/relpipe-in-xmltable.cpp;
 hg clone https://hg.globalcode.info/relpipe/relpipe-lib-cli.cpp;
 hg clone https://hg.globalcode.info/relpipe/relpipe-lib-protocol.cpp;
 hg clone https://hg.globalcode.info/relpipe/relpipe-lib-reader.cpp;
@@ -60,6 +61,7 @@
 			<li>2019-02-20: <m:a href="release-v0.10">v0.10</m:a></li>
 			<li>2019-04-08: <m:a href="release-v0.11">v0.11</m:a></li>
 			<li>2019-05-28: <m:a href="release-v0.12">v0.12</m:a></li>
+			<li>2019-07-30: <m:a href="release-v0.13">v0.13</m:a></li>
 		</ul>
 		
 	</text>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-in-xmltable-atom.xml	Thu Jul 25 22:16:12 2019 +0200
@@ -0,0 +1,45 @@
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	
+	<nadpis>Reading an Atom feed using XMLTable</nadpis>
+	<perex>converting arbitrary XML into relational data using XMLTable</perex>
+	<m:pořadí-příkladu>02800</m:pořadí-příkladu>
+
+	<text xmlns="http://www.w3.org/1999/xhtml">
+		
+		<p>
+			In this example we will achieve the same result as in the <m:a href="examples-xquery-atom">previous one with XQuery</m:a>,
+			but we will use a different tool – <code>relpipe-in-xmltable</code>.
+			This approach differs from the XQuery one in several aspects:
+		</p>
+		
+		<ul>
+			<li>no need to write a script/program – we just write a one-liner, i.e. call one command with parameters</li>
+			<li>no need for an external tool like Galax, BaseX or XQilla</li>
+			<li>no serialization to an intermediate XML and its immediate deserialization – the XML input is parsed in the same process that outputs the relational data</li>
+			<li>simpler (but less powerful) tool – we only write XPath expressions: one selecting the records and one or more selecting the attributes of each relation</li>
+		</ul>
+		
+		<p>This is the (shortened) structure of our XML input:</p>
+
+		<m:pre jazyk="xml" src="examples/atom-xmltable.xml"/>
+
+		<p>This pipeline will download the XML data and transform it into two relations:</p>
+
+		<m:pre jazyk="bash" src="examples/atom-xmltable.sh"/>
+		
+		<p>The first one contains individual entries and the second one contains the common header:</p>
+		
+		<m:pre jazyk="text" src="examples/atom-xmltable.txt"/>
+		
+		<p>
+			This example shows how to work with namespaces and how to generate multiple relations from a single XML input.
+			It also shows that the name of a relation does not have to be a literal but can be derived from the input document (see the <code>--name-is-xpath</code> option).
+		</p>
+		
+		<p>If we add the feed <code>id</code> as an attribute to the entries relation, we can aggregate entries from various sources and still be able to JOIN them with their metadata.</p>
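		<p>
			A minimal sketch of this idea follows. It assumes that relpipe streams produced by separate <code>relpipe-in-xmltable</code> invocations can simply be concatenated into one stream;
			the function names <code>atom_entries</code> and <code>aggregate_feeds</code>, the relation name <code>entry</code> and its <code>feed_id</code> attribute are hypothetical.
			The feed ID is picked by the absolute XPath expression <code>/a:feed/a:id</code>, so it is available in every entry record.
		</p>

```shell
#!/bin/bash

# Hypothetical helper: convert one Atom feed (read from stdin) into an "entry"
# relation whose records also carry the ID of the feed they come from.
atom_entries() {
	relpipe-in-xmltable \
		--namespace "a" "http://www.w3.org/2005/Atom" \
		--relation "entry" \
			--records "//a:entry" \
			--attribute "feed_id" string "/a:feed/a:id" \
			--attribute "published" string "a:published" \
			--attribute "title" string "a:title" \
			--attribute "url" string "a:link/@href"
}

# Hypothetical helper: download each feed given as an argument and convert it;
# the per-feed outputs are written one after another to stdout
# (assumption: concatenated relpipe streams form a valid stream).
aggregate_feeds() {
	local url
	for url in "$@"; do
		wget --quiet --output-document - "$url" | atom_entries
	done
}
```

		<p>
			It might be called e.g. as <code>aggregate_feeds "https://blog.frantovo.cz/agregace/c/?p=10" "https://blog.frantovo.cz/agregace/k/" | relpipe-out-tabular</code>
			(both URLs appear in the script above); the <code>metadata</code> relation could be generated for each feed in the same loop.
		</p>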
+		
+	</text>
+
+</stránka>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples-in-xmltable-libvirt.xml	Thu Jul 25 22:16:12 2019 +0200
@@ -0,0 +1,139 @@
+<stránka
+	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
+	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
+	
+	<nadpis>Reading Libvirt XML files using XMLTable</nadpis>
+	<perex>converting arbitrary XML into one or more relations</perex>
+	<m:pořadí-příkladu>02700</m:pořadí-příkladu>
+
+	<text xmlns="http://www.w3.org/1999/xhtml">
+		
+		<p>
+			<a href="https://libvirt.org/">Libvirt</a> is a popular API/tool for managing virtual machines (KVM/Qemu, LXC etc.) and it stores its configuration in XML files.
+			Thanks to the tool <code>relpipe-in-xmltable</code> we can get an aggregated overview of our virtual machines.
+			This tool does a similar job to the <a href="https://www.postgresql.org/docs/current/functions-xml.html">xmltable</a> function known from SQL.
+			It uses the <a href="https://www.w3.org/TR/xpath/all/">XPath</a> language for selecting parts of the input XML – one XPath expression points to the record nodes
+			and one or more XPath expressions point to the attribute nodes/values relative to the particular record node.
+			Our tool is able to produce one or more relations from a single XML input.
+			The input is parsed at once and converted to a DOM in memory, i.e. there is no streaming. Thus processing of huge XML files requires an appropriate amount of RAM.
+			On the other hand, our expressions can access the whole XML document and pick values not only from the currently processed record node.
+		</p>
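		<p>
			Both kinds of expressions can be combined in a single relation. In the following sketch, the function name <code>disk_owners</code>, the relation name <code>disk_owner</code> and its attribute names are hypothetical:
			the records are the <code>disk</code> elements of the libvirt domain XML, <code>@device</code> is evaluated relative to each of them,
			and the machine name is picked once by the relative path <code>../../name</code> and once by the absolute path <code>/domain/name</code>; according to the XPath semantics both should yield the same value.
		</p>

```shell
#!/bin/bash

# Hypothetical example: one record per disk of the libvirt domain read from stdin.
disk_owners() {
	relpipe-in-xmltable \
		--relation "disk_owner" \
			--records "/domain/devices/disk" \
			--attribute "device" string "@device" \
			--attribute "machine_relative" string "../../name" \
			--attribute "machine_absolute" string "/domain/name"
	# "../../name": disk, up to devices, up to domain, then its name child
	# "/domain/name": the same element addressed from the document root
}
```

		<p>It might be used e.g. as <code>cat /etc/libvirt/qemu/relpipe-1.xml | disk_owners | relpipe-out-tabular</code>.</p>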
+		
+		
+		<p>These XML config files contain a lot of information describing the given virtual machine:</p>
+		
+		
+		<m:pre jazyk="xml"><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
+<domain type="kvm">
+	<name>relpipe-1</name>
+	<uuid>36d1e8b2-97e9-40cb-9fc2-306ebf989282</uuid>
+	<memory unit="KiB">1048576</memory>
+	<currentMemory unit="KiB">1048576</currentMemory>
+	<vcpu placement="static">2</vcpu>
+	<os>
+		<type arch="x86_64" machine="pc-i440fx-bionic">hvm</type>
+		<boot dev="hd"/>
+	</os>
+	<features>
+		<acpi/>
+		<apic/>
+		<vmport state="off"/>
+	</features>
+	<cpu mode="custom" match="exact" check="partial">
+		<model fallback="allow">Opteron_G5</model>
+	</cpu>
+	<clock offset="utc">
+		<timer name="rtc" tickpolicy="catchup"/>
+		<timer name="pit" tickpolicy="delay"/>
+		<timer name="hpet" present="no"/>
+	</clock>
+	…
+	<devices>
+		<emulator>/usr/bin/kvm-spice</emulator>
+		<disk type='file' device='disk'>
+			<driver name='qemu' type='qcow2'/>
+			<source file='/mnt/kvm-image/relpipe-1.qcow2'/>
+			<target dev='vda' bus='virtio'/>
+			<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
+		</disk>
+		<disk type='file' device='cdrom'>
+			<driver name='qemu' type='raw'/>
+			<target dev='hda' bus='ide'/>
+			<readonly/>
+			<address type='drive' controller='0' bus='0' target='0' unit='0'/>
+		</disk>
+		…
+		<interface type='bridge'>
+			<mac address='52:54:e9:f2:f6:bb'/>
+			<source bridge='br0'/>
+			<model type='virtio'/>
+			<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+		</interface>
+		…
+	</devices>
+</domain>]]></m:pre>
+	
+	
+		<p>If we are interested only in certain parts, we can pick them using a command like this one:</p>
+		
+		<m:pre jazyk="bash"><![CDATA[cat /etc/libvirt/qemu/relpipe-1.xml \
+	| relpipe-in-xmltable \
+		--relation "machine" \
+			--records "/domain" \
+			--attribute "uuid" string "uuid" \
+			--attribute "name" string "name" \
+			--attribute "memory_size" integer "memory" \
+			--attribute "memory_unit" string "memory/@unit" \
+			--attribute "cpu_count" integer "vcpu" \
+		--relation "storage" \
+			--records "/domain/devices/disk" \
+			--attribute "machine" string "/domain/uuid" \
+			--attribute "type" string "@device" \
+			--attribute "format" string "driver/@type" \
+			--attribute "source" string "source/@file" \
+			--attribute "target_dev" string "target/@dev" \
+			--attribute "target_type" string "target/@bus" \
+		--relation "network_interface" \
+			--records "/domain/devices/interface" \
+			--attribute "machine" string "/domain/uuid" \
+			--attribute "mac" string "mac/@address" \
+	| relpipe-out-tabular]]></m:pre>
+
+		
+		<p>The result consists of three relations:</p>
+		
+		<pre><![CDATA[machine:
+ ╭──────────────────────────────────────┬───────────────┬───────────────────────┬──────────────────────┬─────────────────────╮
+ │ uuid                        (string) │ name (string) │ memory_size (integer) │ memory_unit (string) │ cpu_count (integer) │
+ ├──────────────────────────────────────┼───────────────┼───────────────────────┼──────────────────────┼─────────────────────┤
+ │ 36d1e8b2-97e9-40cb-9fc2-306ebf989282 │ relpipe-1     │               1048576 │ KiB                  │                   2 │
+ ╰──────────────────────────────────────┴───────────────┴───────────────────────┴──────────────────────┴─────────────────────╯
+Record count: 1
+
+storage:
+ ╭──────────────────────────────────────┬───────────────┬─────────────────┬─────────────────────────────────┬─────────────────────┬──────────────────────╮
+ │ machine                     (string) │ type (string) │ format (string) │ source                 (string) │ target_dev (string) │ target_type (string) │
+ ├──────────────────────────────────────┼───────────────┼─────────────────┼─────────────────────────────────┼─────────────────────┼──────────────────────┤
+ │ 36d1e8b2-97e9-40cb-9fc2-306ebf989282 │ disk          │ qcow2           │ /mnt/kvm-image/relpipe-1.qcow2  │ vda                 │ virtio               │
+ │ 36d1e8b2-97e9-40cb-9fc2-306ebf989282 │ cdrom         │ raw             │                                 │ hda                 │ ide                  │
+ ╰──────────────────────────────────────┴───────────────┴─────────────────┴─────────────────────────────────┴─────────────────────┴──────────────────────╯
+Record count: 2
+
+network_interface:
+ ╭──────────────────────────────────────┬───────────────────╮
+ │ machine                     (string) │ mac      (string) │
+ ├──────────────────────────────────────┼───────────────────┤
+ │ 36d1e8b2-97e9-40cb-9fc2-306ebf989282 │ 52:54:e9:f2:f6:bb │
+ ╰──────────────────────────────────────┴───────────────────╯
+Record count: 1]]></pre>
+
+		<p>
+			Each record contains the ID (UUID) of the machine, so if we collect data from several VMs,
+			we can JOIN the related records together or compute some aggregations or statistics.
+			If we are sure that the <code>name</code> field is unique, we can use it as the key instead of the UUID.
+		</p>
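		<p>
			To collect the data from several VMs, we might run the same conversion for every domain definition and concatenate the outputs.
			This is only a sketch: the function names <code>libvirt_machine</code> and <code>libvirt_machines</code> are hypothetical, <code>/etc/libvirt/qemu</code> is assumed to be the directory with the domain XML files,
			and it is assumed that relpipe streams can be concatenated like this (resulting in one <code>machine</code> relation per file).
		</p>

```shell
#!/bin/bash

# Hypothetical helper: convert one libvirt domain XML (read from stdin)
# into the "machine" relation; the UUID serves as the key for later JOINs.
libvirt_machine() {
	relpipe-in-xmltable \
		--relation "machine" \
			--records "/domain" \
			--attribute "uuid" string "uuid" \
			--attribute "name" string "name" \
			--attribute "memory_size" integer "memory" \
			--attribute "memory_unit" string "memory/@unit" \
			--attribute "cpu_count" integer "vcpu"
}

# Hypothetical helper: process every *.xml file in the given directory
# (default: /etc/libvirt/qemu) and write the outputs one after another.
libvirt_machines() {
	local dir="${1:-/etc/libvirt/qemu}"
	local file
	for file in "$dir"/*.xml; do
		[ -e "$file" ] || continue   # the glob did not match any file
		cat "$file" | libvirt_machine
	done
}
```

		<p>
			Then e.g. <code>libvirt_machines | relpipe-out-tabular</code> should print the machines one relation per file;
			the <code>storage</code> and <code>network_interface</code> relations could be added to <code>libvirt_machine</code> in the same way as in the command above.
		</p>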
+		
+		
+	</text>
+
+</stránka>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/atom-xmltable.sh	Thu Jul 25 22:16:12 2019 +0200
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+get_atom() {
+	wget --quiet --output-document - "https://blog.frantovo.cz/agregace/c/?p=10"
+	# wget --quiet --output-document - "https://blog.frantovo.cz/agregace/k/"
+	# cat atom.xml
+}
+
+get_atom \
+	| relpipe-in-xmltable \
+		--namespace "a" "http://www.w3.org/2005/Atom" \
+		--relation "/a:feed/a:title" --name-is-xpath \
+			--records "//a:entry" \
+			--attribute "published" string "a:published" \
+			--attribute "title" string "a:title" \
+			--attribute "url" string "a:link/@href" \
+		--relation "metadata" \
+			--records "." \
+			--attribute "id" string "a:id" \
+			--attribute "subtitle" string "a:subtitle" \
+	| relpipe-out-tabular
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/atom-xmltable.txt	Thu Jul 25 22:16:12 2019 +0200
@@ -0,0 +1,24 @@
+Frantovo.cz – články:
+ ╭──────────────────────┬──────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
+ │ published   (string) │ title                                       (string) │ url                                                                                                                 (string) │
+ ├──────────────────────┼──────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+ │ 2019-07-12T08:12:00Z │ OpenPOWER – Blackbird – první spuštění               │ https://blog.frantovo.cz/c/375/OpenPOWER%20%E2%80%93%20Blackbird%20%E2%80%93%20prvn%C3%AD%20spu%C5%A1t%C4%9Bn%C3%AD          │
+ │ 2019-07-11T17:17:24Z │ Siemens Nixdorf: ComfoDesk (1990)                    │ https://blog.frantovo.cz/c/374/Siemens%20Nixdorf%3A%20ComfoDesk%20%281990%29                                                 │
+ │ 2019-07-04T18:12:02Z │ Opravujeme myš: výměna spínačů                       │ https://blog.frantovo.cz/c/373/Opravujeme%20my%C5%A1%3A%20v%C3%BDm%C4%9Bna%20sp%C3%ADna%C4%8D%C5%AF                          │
+ │ 2019-06-04T16:32:08Z │ Java a unixové doménové sokety, FD, systemd a xinetd │ https://blog.frantovo.cz/c/372/Java%20a%C2%A0unixov%C3%A9%20dom%C3%A9nov%C3%A9%20sokety%2C%20FD%2C%20systemd%20a%C2%A0xinetd │
+ │ 2019-04-26T19:48:00Z │ Zálohujeme internet: Zdrojové kódy                   │ https://blog.frantovo.cz/c/371/Z%C3%A1lohujeme%20internet%3A%20Zdrojov%C3%A9%20k%C3%B3dy                                     │
+ │ 2018-12-24T13:37:24Z │ GNU Bash: Vánoční tipy                               │ https://blog.frantovo.cz/c/370/GNU%20Bash%3A%20V%C3%A1no%C4%8Dn%C3%AD%20tipy                                                 │
+ │ 2018-08-04T23:23:00Z │ HiFive1 – deska s otevřeným čipem RISC-V             │ https://blog.frantovo.cz/c/368/HiFive1%20%E2%80%93%20deska%20s%C2%A0otev%C5%99en%C3%BDm%20%C4%8Dipem%20RISC-V                │
+ │ 2018-06-30T13:37:08Z │ The Things Network – LoRaWAN – IoT                   │ https://blog.frantovo.cz/c/366/The%20Things%20Network%20%E2%80%93%20LoRaWAN%20%E2%80%93%C2%A0IoT                             │
+ │ 2018-03-31T19:48:00Z │ Roland Rubix44 – externí zvuková karta               │ https://blog.frantovo.cz/c/365/Roland%20Rubix44%20%E2%80%93%20extern%C3%AD%20zvukov%C3%A1%20karta                            │
+ │ 2017-11-25T20:26:49Z │ Přepisování parametrů příkazové řádky                │ https://blog.frantovo.cz/c/362/P%C5%99episov%C3%A1n%C3%AD%20parametr%C5%AF%20p%C5%99%C3%ADkazov%C3%A9%20%C5%99%C3%A1dky      │
+ ╰──────────────────────┴──────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+Record count: 10
+
+metadata:
+ ╭───────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────╮
+ │ id                                   (string) │ subtitle                                                                     (string) │
+ ├───────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────┤
+ │ urn:uuid:d61ea960-3b36-11e3-bdb9-3085a98fdb88 │ Blog nejen o svobodném softwaru, GNU/Linuxu, Javě, XML, politice, ekonomii, filosofii │
+ ╰───────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────╯
+Record count: 1
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/relpipe-data/examples/atom-xmltable.xml	Thu Jul 25 22:16:12 2019 +0200
@@ -0,0 +1,69 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<atom:feed xmlns="http://www.w3.org/1999/xhtml" xmlns:atom="http://www.w3.org/2005/Atom">
+	<atom:title>Frantovo.cz – články</atom:title>
+	<atom:subtitle>Blog nejen o svobodném softwaru, GNU/Linuxu, Javě, XML, politice, ekonomii, filosofii</atom:subtitle>
+	<atom:id>urn:uuid:d61ea960-3b36-11e3-bdb9-3085a98fdb88</atom:id>
+	<atom:link rel="self" href="https://blog.frantovo.cz//agregace/c/"/>
+	<atom:link href="https://blog.frantovo.cz"/>
+	<atom:updated>2019-07-12T08:12:00Z</atom:updated>
+	<atom:author>
+		<atom:name>František Kučera</atom:name>
+	</atom:author>
+	<atom:entry>
+		<atom:title>OpenPOWER – Blackbird – první spuštění</atom:title>
+		<atom:link href="https://blog.frantovo.cz/c/375/OpenPOWER%20%E2%80%93%20Blackbird%20%E2%80%93%20prvn%C3%AD%20spu%C5%A1t%C4%9Bn%C3%AD"/>
+		<atom:id>https://blog.frantovo.cz/c/375/</atom:id>
+		<atom:updated>2019-07-12T08:12:00Z</atom:updated>
+		<atom:published>2019-07-12T08:12:00Z</atom:published>
+		<atom:summary type="xhtml">
+			<div>
+				<div>
+					<p>Po včerejším <em>retro</em> článku tu máme <em>návrat do budoucnosti</em> – ze které se ale naštěstí už stává současnost. Zatím jsem udělal jen pár fotek… Ohledně motivace a smyslu doporučuji si přečíst <a href="/c/368/HiFive1%20%E2%80%93%20deska%20s%C2%A0otev%C5%99en%C3%BDm%20%C4%8Dipem%20RISC-V">HiFive1 – deska s otevřeným čipem RISC-V</a>.</p>
+					<p class="obrázek">
+						<a href="/s/1423/IMG_2329.JPG">
+							<img src="/s/1424/nahled_IMG_2329.JPG" alt="OpenPOWER CPU a deska Blackbird" title="OpenPOWER CPU a deska Blackbird"/>
+						</a>
+					</p>
+				</div>
+			</div>
+		</atom:summary>
+	</atom:entry>
+	<atom:entry>
+		<atom:title>Siemens Nixdorf: ComfoDesk (1990)</atom:title>
+		<atom:link href="https://blog.frantovo.cz/c/374/Siemens%20Nixdorf%3A%20ComfoDesk%20%281990%29"/>
+		<atom:id>https://blog.frantovo.cz/c/374/</atom:id>
+		<atom:updated>2019-07-11T20:08:59Z</atom:updated>
+		<atom:published>2019-07-11T17:17:24Z</atom:published>
+		<atom:summary type="xhtml">
+			<div>
+				<div>
+					<p>Objevil jsem doma staré diskety se softwarem ComfoDesk od firmy Siemens Nixdorf. Zajímalo mne, jestli budou ještě fungovat a jestli je ten software tak hrozný jako tehdy (vzpomínky na to nejsou moc dobré – nikdo u nás s tím neuměl pracovat a nakonec jsme museli přeinstalovat systém, abychom se ComfoDesku zbavili). Navíc je na internetu minimum zmínek o tomto softwaru, takže jsem se rozhodl ho alespoň trochu zdokumentovat pro příští generace.</p>
+					<p class="obrázek">
+						<a href="/s/1435/comfodesk-disketa-2-IMG_2461.jpeg">
+							<img src="/s/1436/nahled_comfodesk-disketa-2-IMG_2461.jpeg" alt="Siemens Nixdorf: ComfoDesk – diskety" title="Siemens Nixdorf: ComfoDesk – diskety"/>
+						</a>
+					</p>
+				</div>
+			</div>
+		</atom:summary>
+	</atom:entry>
+	<atom:entry>
+		<atom:title>Opravujeme myš: výměna spínačů</atom:title>
+		<atom:link href="https://blog.frantovo.cz/c/373/Opravujeme%20my%C5%A1%3A%20v%C3%BDm%C4%9Bna%20sp%C3%ADna%C4%8D%C5%AF"/>
+		<atom:id>https://blog.frantovo.cz/c/373/</atom:id>
+		<atom:updated>2019-07-04T22:39:39Z</atom:updated>
+		<atom:published>2019-07-04T18:12:02Z</atom:published>
+		<atom:summary type="xhtml">
+			<div>
+				<div>
+					<p>Po letech dobré služby mne začaly zlobit myši. A protože nerad vyhazuji věci a byly to kvalitní kousky hardwaru, pustil jsem se do opravy.</p>
+					<p class="obrázek">
+						<a href="/s/1388/logitech_trackman_marble_IMG_2189.JPG">
+							<img src="/s/1389/nahled_logitech_trackman_marble_IMG_2189.JPG" alt="myš resp. trackball Logitech TrackMan Marble (T-BC21)" title="myš resp. trackball Logitech TrackMan Marble (T-BC21)"/>
+						</a>
+					</p>
+				</div>
+			</div>
+		</atom:summary>
+	</atom:entry>
+</atom:feed>
--- a/relpipe-data/implementation.xml	Wed Jul 24 14:18:42 2019 +0200
+++ b/relpipe-data/implementation.xml	Thu Jul 25 22:16:12 2019 +0200
@@ -20,6 +20,7 @@
 			relpipe-in-fstab.cpp	executable	input	c++	GNU GPLv3+
 			relpipe-in-recfile.cpp	executable	input	c++	GNU GPLv3+
 			relpipe-in-xml.cpp	executable	input	c++	GNU GPLv3+
+			relpipe-in-xmltable.cpp	executable	input	c++	GNU GPLv3+
 			relpipe-lib-cli.cpp	library	header-only	c++	GNU GPLv3+
 			relpipe-lib-protocol.cpp	library	header-only	c++	GNU LGPLv3+ or GPLv2+
 			relpipe-lib-reader.cpp	library	shared	c++	GNU LGPLv3+ or GPLv2+