relpipe-data/examples-in-xmltable-tr-sql-xhtml-table.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 01:21:22 +0100
branchv_0
changeset 330 70e7eb578cfa
parent 268 1b8576c9640c
permissions -rw-r--r--
Added tag relpipe-v0.18 for changeset 5bc2bb8b7946
268
1b8576c9640c examples: XHTML table processing in SQL
František Kučera <franta-hg@frantovo.cz>
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Processing data from an XHTML page using XMLTable and SQL</nadpis>
	<perex>reading a web table and computing some statistics</perex>
	<m:pořadí-příkladu>03000</m:pořadí-příkladu>
	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			Sometimes there are interesting data in a semi-structured form on a website.
			We can read such data and process them as relations using the XMLTable input and, for example, an SQL transformation.
			This example shows how to read the list of available Relpipe implementations,
			filter the commands (executables) and compute statistics, so we can see how many input filters, output filters and transformations we have:
		</p>
		
		<m:pre jazyk="bash" src="examples/xhtml-table-sql-statistics.sh"/>
		<p>This script will generate a relation:</p>
		<m:pre jazyk="text" src="examples/xhtml-table-sql-statistics.txt"/>
		
		<p>
			Using these tools, we can build, for example, an automatic system that watches a website and notifies us about changes.
			In SQL, we can use the EXCEPT operation to compare the current data with an older snapshot and SELECT only the new or changed records.
		</p>
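
		<p>
			The EXCEPT idea can be sketched outside of the pipeline as follows.
			This is only an illustration in Python with its built-in SQLite module, not the script used above;
			the table names <code>snapshot_new</code> and <code>snapshot_old</code> and the sample rows are hypothetical.
		</p>

		<m:pre jazyk="python"><![CDATA[
```python
import sqlite3

# Hypothetical snapshots of the same web table taken at two different times.
# Each record is a (name, type) pair; the values are made up for illustration.
older = [
    ("relpipe-in-csv", "input"),
    ("relpipe-in-xml", "output"),
    ("relpipe-out-tabular", "output"),
]
current = [
    ("relpipe-in-csv", "input"),
    ("relpipe-in-xml", "input"),
    ("relpipe-out-tabular", "output"),
    ("relpipe-tr-sql", "transformation"),
]


def new_or_changed(current_rows, older_rows):
    """Return records that are in the current snapshot but not in the older one."""
    connection = sqlite3.connect(":memory:")
    try:
        for table in ("snapshot_new", "snapshot_old"):
            connection.execute(f"CREATE TABLE {table} (name TEXT, type TEXT)")
        connection.executemany("INSERT INTO snapshot_new VALUES (?, ?)", current_rows)
        connection.executemany("INSERT INTO snapshot_old VALUES (?, ?)", older_rows)
        # EXCEPT keeps rows of the first SELECT that do not occur in the second one,
        # so both brand-new records and records with a changed value survive.
        query = (
            "SELECT name, type FROM snapshot_new "
            "EXCEPT SELECT name, type FROM snapshot_old "
            "ORDER BY name"
        )
        return connection.execute(query).fetchall()
    finally:
        connection.close()


changes = new_or_changed(current, older)
print(changes)
```
]]></m:pre>

		<p>
			Unchanged records are dropped; a record whose type changed and a record that appeared for the first time are both reported.
			Swapping the two SELECTs would list the removed (or previous) records instead.
		</p>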
		
		<p>
			There are also some caveats:
		</p>
		
		<p>
			What if the table structure changes? 
			First, we must say that parsing a web page (which is a presentation form, not designed for machine processing) is always suboptimal and hackish.
			The proper way is to arrange a machine-readable format for data exchange (e.g. XML with a well-defined schema).
			But if we do not have this option and must parse some web page, we can make the parsing more robust in two ways:
		</p>
		
		<ul>
			<li>modify the <code>--records</code> XPath expression so that it selects the table with the exact number of columns and the proper column names instead of selecting the first table,</li>
			<li>use XQuery, which is much more powerful than XMLTable and can even generate dynamic relations with attributes derived from the content of the XHTML table, so if new columns are added, we will automatically get new attributes.</li>
		</ul>
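
		<p>
			Both ideas (picking the table by its header rather than by its position, and deriving the attributes from the header cells)
			can be illustrated by the following sketch. It is written in Python instead of XPath or XQuery, so it is an analogy, not the Relpipe way of doing it;
			the sample page, the <code>read_table</code> function and the column names are assumptions made up for this example.
		</p>

		<m:pre jazyk="python"><![CDATA[
```python
import xml.etree.ElementTree as ET

XHTML_URI = "http://www.w3.org/1999/xhtml"
NAMESPACES = {"h": XHTML_URI}

# Hypothetical page: an unrelated table comes first, the table we want is second.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table><thead><tr><th>Menu</th></tr></thead>
<tbody><tr><td>Home</td></tr></tbody></table>
<table><thead><tr><th>Name</th><th>Type</th></tr></thead>
<tbody>
<tr><td>relpipe-in-csv</td><td>input</td></tr>
<tr><td>relpipe-tr-sql</td><td>transformation</td></tr>
</tbody></table>
</body></html>"""


def cell_texts(row, cell_name):
    """Return the trimmed text of all th or td cells of one table row."""
    cells = row.findall(f"h:{cell_name}", NAMESPACES)
    return ["".join(cell.itertext()).strip() for cell in cells]


def read_table(page, required_columns):
    """Find the first table whose header contains all required columns.

    Returns (header, records); the record keys are taken from the header,
    so columns added to the page later show up as new attributes.
    """
    root = ET.fromstring(page)
    for table in root.iter(f"{{{XHTML_URI}}}table"):
        header_row = table.find("h:thead/h:tr", NAMESPACES)
        if header_row is None:
            continue  # a table without a header cannot be identified
        header = cell_texts(header_row, "th")
        if not set(required_columns).issubset(header):
            continue  # some other table, e.g. navigation or layout
        rows = table.findall("h:tbody/h:tr", NAMESPACES)
        records = [dict(zip(header, cell_texts(row, "td"))) for row in rows]
        return header, records
    raise LookupError("no table with the required columns was found")


header, records = read_table(PAGE, ["Name", "Type"])
print(header)
print(records)
```
]]></m:pre>

		<p>
			If the expected table disappears or its columns are renamed, the sketch fails loudly instead of silently reading a wrong table,
			which is usually what we want in an automated pipeline.
		</p>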
		
		<p>
			What if the web page is invalid? Unfortunately, the current web is full of invalid and faulty documents that cannot be easily parsed.
			In such a case, we can pass the stream through the <code>tidy</code> tool, which fixes the errors, and then pass it to <code>relpipe-in-xmltable</code>.
			It is just one additional step in our pipeline.
		</p>
		
	</text>
</stránka>