relpipe-data/examples-csv-sql-join.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 00:43:11 +0100
branchv_0
changeset 329 5bc2bb8b7946
permissions -rw-r--r--
Release v0.18
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
329
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     1
<stránka
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     2
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     3
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     4
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     5
	<nadpis>Running SQL JOINs on multiple CSV files</nadpis>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     6
	<perex>query a collection of (not only) CSV files using SQL</perex>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     7
	<m:pořadí-příkladu>05100</m:pořadí-příkladu>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     8
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
     9
	<text xmlns="http://www.w3.org/1999/xhtml">
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    10
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    11
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    12
			CSV (<m:a href="4180" typ="rfc">RFC 4180</m:a>) is quite good solution when we want to store or share relational data in a simple text format –
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    13
			both, human-readable and well supported by many existing applications and libraries.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    14
			We have even ready-to-use GUI editors, so called spreadsheets e.g. LibreOffice Calc.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    15
			(on the other hand, such simple formats have usually some drawbacks…)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    16
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    17
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    18
			In this example, we will show how to query a set of CSV files like it was a relational database.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    19
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    20
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    21
		<p>Suppose we have a CSV file describing our network interfaces:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    22
		<m:pre jazyk="text"><![CDATA[address,name
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    23
00:00:00:00:00:00,lo
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    24
00:D0:D8:00:26:00,eth0
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    25
00:01:02:01:33:70,eth1]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    26
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    27
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    28
		<p>and another CSV file with IP addresses assigned to them:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    29
		<m:pre jazyk="text"><![CDATA[address,mask,version,interface
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    30
127.0.0.1,8,4,lo
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    31
::1,128,6,lo
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    32
192.168.1.2,24,4,eth0
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    33
192.168.1.8,24,4,eth0
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    34
10.21.12.24,24,4,eth0
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    35
75.748.86.91,95,4,eth1
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    36
23.75.345.200,95,4,eth1
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    37
2a01:430:2e::cafe:babe,64,6,eth1]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    38
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    39
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    40
		<h2>Loading a CSV file and running basic queries</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    41
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    42
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    43
			Simplest task is to parse the file and print it as a table in our terminal or convert it to another format (XML, Recfile, ODS, YAML, XHTML, ASN.1 etc.)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    44
			We can also add <code>relpipe-tr-sql</code> in the middle of our pipeline and run some SQL queries –
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    45
			transform data on-the-fly and send the query result to the <code>relpipe-out-tabular</code> (or other output filter) in place of the original data.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    46
			For now, we will filter just the IPv6 addresses:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    47
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    48
		<m:pre jazyk="bash"><![CDATA[cat ip.csv \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    49
	| relpipe-in-csv --relation 'ip' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    50
	| relpipe-tr-sql \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    51
		--relation 'ipv6' "SELECT * FROM ip WHERE version = 6" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    52
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    53
		<p>and get them printed:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    54
		<m:pre jazyk="text"><![CDATA[ipv6:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    55
 ╭────────────────────────┬───────────────┬──────────────────┬────────────────────╮
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    56
 │ address       (string) │ mask (string) │ version (string) │ interface (string) │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    57
 ├────────────────────────┼───────────────┼──────────────────┼────────────────────┤
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    58
 │ ::1                    │ 128           │ 6                │ lo                 │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    59
 │ 2a01:430:2e::cafe:babe │ 64            │ 6                │ eth1               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    60
 ╰────────────────────────┴───────────────┴──────────────────┴────────────────────╯
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    61
Record count: 2]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    62
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    63
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    64
			It is alo possible to run several queries at once
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    65
			and thanks to the <m:name/> format, the result sets are not mixed together, their boundaries are retained and everything is safely passed to the next stage of the pipeline:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    66
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    67
		<m:pre jazyk="bash"><![CDATA[cat ip.csv \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    68
	| relpipe-in-csv --relation 'ip' \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    69
	| relpipe-tr-sql \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    70
		--relation 'ipv4' "SELECT * FROM ip WHERE version = 4" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    71
		--relation 'ipv6' "SELECT * FROM ip WHERE version = 6" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    72
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    73
		<p>resulting in two nice tables:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    74
		<m:pre jazyk="text"><![CDATA[ipv4:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    75
 ╭──────────────────┬───────────────┬──────────────────┬────────────────────╮
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    76
 │ address (string) │ mask (string) │ version (string) │ interface (string) │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    77
 ├──────────────────┼───────────────┼──────────────────┼────────────────────┤
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    78
 │ 127.0.0.1        │ 8             │ 4                │ lo                 │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    79
 │ 192.168.1.2      │ 24            │ 4                │ eth0               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    80
 │ 192.168.1.8      │ 24            │ 4                │ eth0               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    81
 │ 10.21.12.24      │ 24            │ 4                │ eth0               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    82
 │ 75.748.86.91     │ 95            │ 4                │ eth1               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    83
 │ 23.75.345.200    │ 95            │ 4                │ eth1               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    84
 ╰──────────────────┴───────────────┴──────────────────┴────────────────────╯
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    85
Record count: 6
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    86
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    87
ipv6:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    88
 ╭────────────────────────┬───────────────┬──────────────────┬────────────────────╮
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    89
 │ address       (string) │ mask (string) │ version (string) │ interface (string) │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    90
 ├────────────────────────┼───────────────┼──────────────────┼────────────────────┤
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    91
 │ ::1                    │ 128           │ 6                │ lo                 │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    92
 │ 2a01:430:2e::cafe:babe │ 64            │ 6                │ eth1               │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    93
 ╰────────────────────────┴───────────────┴──────────────────┴────────────────────╯
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    94
Record count: 2]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    95
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    96
		<h2>Using parametrized queries to avoid SQL injection</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    97
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    98
			When <code>"4"</code> and <code>"6"</code> are not fixed values, we should not glue them to the query string like <code>version = $version</code>,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
    99
			because it is a dangerous practice that may lead to SQL injection.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   100
			We have parametrized queries for such tasks:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   101
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   102
		<m:pre jazyk="bash"><![CDATA[--relation 'ipv6' "SELECT * FROM ip WHERE version = ?" --parameter "6"]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   103
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   104
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   105
		<h2>Running SQL JOINs, UNIONs etc. on multiple CSV files</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   106
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   107
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   108
			To load multiple CSV files into our <i>in-memory database</i>, we just concatenate the relational streams
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   109
			using the means of our shell – the semicolons and parenthesis:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   110
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   111
		<m:pre jazyk="bash"><![CDATA[(relpipe-in-csv --relation 'ip' < ip.csv; relpipe-in-csv --relation 'nic' < nic.csv) \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   112
	| relpipe-tr-sql \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   113
		--relation 'ip_nic' "SELECT * FROM ip JOIN nic ON nic.name = ip.interface" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   114
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   115
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   116
		<p>Generic version that loads all <code>*.csv</code> files:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   117
		<m:pre jazyk="bash"><![CDATA[for csv in *.csv; do relpipe-in-csv --relation "$(basename "$csv" .csv)" < "$csv"; done \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   118
	| relpipe-tr-sql \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   119
		--relation 'ip_nic' "SELECT * FROM ip JOIN nic ON nic.name = ip.interface" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   120
	| relpipe-out-tabular]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   121
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   122
		<p>Then we can JOIN data from multiple CSV files or do UNIONs, INTERSECTions etc.</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   123
		<m:pre jazyk="text"><![CDATA[ip_nic:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   124
 ╭────────────────────────┬───────────────┬──────────────────┬────────────────────┬───────────────────┬───────────────╮
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   125
 │ address       (string) │ mask (string) │ version (string) │ interface (string) │ address  (string) │ name (string) │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   126
 ├────────────────────────┼───────────────┼──────────────────┼────────────────────┼───────────────────┼───────────────┤
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   127
 │ 127.0.0.1              │ 8             │ 4                │ lo                 │ 00:00:00:00:00:00 │ lo            │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   128
 │ ::1                    │ 128           │ 6                │ lo                 │ 00:00:00:00:00:00 │ lo            │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   129
 │ 192.168.1.2            │ 24            │ 4                │ eth0               │ 00:D0:D8:00:26:00 │ eth0          │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   130
 │ 192.168.1.8            │ 24            │ 4                │ eth0               │ 00:D0:D8:00:26:00 │ eth0          │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   131
 │ 10.21.12.24            │ 24            │ 4                │ eth0               │ 00:D0:D8:00:26:00 │ eth0          │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   132
 │ 75.748.86.91           │ 95            │ 4                │ eth1               │ 00:01:02:01:33:70 │ eth1          │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   133
 │ 23.75.345.200          │ 95            │ 4                │ eth1               │ 00:01:02:01:33:70 │ eth1          │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   134
 │ 2a01:430:2e::cafe:babe │ 64            │ 6                │ eth1               │ 00:01:02:01:33:70 │ eth1          │
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   135
 ╰────────────────────────┴───────────────┴──────────────────┴────────────────────┴───────────────────┴───────────────╯
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   136
Record count: 8]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   137
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   138
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   139
		<h2>Leveraging shell functions</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   140
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   141
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   142
			Good practice is to wrap common code blocks into functions and thus make them reusable.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   143
			In shell, the function still works with input and output streams and we can use them when building our pipelines.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   144
			Shell functions can be seen as named reusable parts of a pipeline.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   145
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   146
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   147
		<m:pre jazyk="bash"><![CDATA[csv2relation()  { for file; do relpipe-in-csv --relation "$(basename "$file" .csv)" < "$file"; done }
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   148
do_query()      { relpipe-tr-sql --relation 'ip_nic' "SELECT * FROM ip JOIN nic ON nic.name = ip.interface"; }
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   149
format_result() { [[ -t 1 ]] && relpipe-out-tabular || cat; }
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   150
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   151
csv2relation *.csv | do_query | format_result]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   152
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   153
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   154
			The <code>format_result()</code> function checks whether the STDOUT is a terminal or not.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   155
			and when printing to the terminal, it generates a table.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   156
			When writing to a regular file or STDIN of another process, it passes through original relational data.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   157
			Thus <code>./our-script.sh</code> will print a nice table in the terminal, while <code>./our-script.sh > data.rp</code> will create a file containing machine-readable data
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   158
			and <code>./our-script.sh | relpipe-out-xhtml > report.xhtml</code> will create an XHTML report and <code>./our-script.sh | relpipe-out-gui</code> will show a GUI window full of tables and maybe also charts.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   159
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   160
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   161
		<m:img src="img/csv-sql-gui-ip-address-counts.png"/>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   162
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   163
		<m:pre jazyk="sql"><![CDATA[SELECT
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   164
	nic.name || ' IPv' || ip.version AS label,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   165
	nic.name AS interface,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   166
	ip.version AS ip_version,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   167
	count(*) AS address_count
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   168
FROM nic 
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   169
	LEFT JOIN ip ON (ip.interface = nic.name)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   170
GROUP BY nic.name, ip.version
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   171
ORDER BY count(*) DESC]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   172
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   173
		
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   174
		<h2>Makefile version</h2>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   175
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   176
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   177
			Shell scripts are not the only way to structure and organize our pipelines or generally our data-processing code.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   178
			We can also use Make (the tool intended mainly for building sofware), write a <i>Makefile</i> and organize our code around some temporary files and other targets instead of functions.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   179
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   180
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   181
		<m:pre jazyk="Makefile"><![CDATA[all: print_summary
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   182
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   183
.PHONY: clean print_summary run_commands
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   184
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   185
clean:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   186
	rm -rf *.rp
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   187
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   188
%.rp: %.csv
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   189
	relpipe-in-csv --relation "$(basename $(<))" < $(<) > $(@)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   190
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   191
define SQL_IP_NIC
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   192
	SELECT
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   193
		ip.address AS ip_address,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   194
		nic.name AS interface,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   195
		nic.address AS mac_address
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   196
	FROM ip
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   197
		JOIN nic ON (nic.name = ip.interface)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   198
endef
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   199
export SQL_IP_NIC
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   200
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   201
define SQL_COUNT_VERSIONS
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   202
	SELECT
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   203
		interface,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   204
		count(CASE WHEN version=4 THEN 1 ELSE NULL END) AS ipv4_count,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   205
		count(CASE WHEN version=6 THEN 1 ELSE NULL END) AS ipv6_count
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   206
	FROM ip
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   207
	GROUP BY interface
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   208
	ORDER BY interface
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   209
endef
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   210
export SQL_COUNT_VERSIONS
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   211
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   212
# Longer SQL queries are better kept in separate .sql files,
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   213
# because we can enjoy syntax highlighting and other support in our editors.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   214
# Then we use it like this: --relation "ip_nic" "$$(cat ip_nic.sql)"
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   215
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   216
summary.rp: nic.rp ip.rp
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   217
	cat $(^) \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   218
		| relpipe-tr-sql \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   219
			--relation "ip_nic" "$$SQL_IP_NIC" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   220
			--relation "counts" "$$SQL_COUNT_VERSIONS" \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   221
		> $(@)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   222
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   223
print_summary: summary.rp
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   224
	cat $(<) | relpipe-out-tabular
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   225
]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   226
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   227
	<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   228
		We can even combine advantages of Make and Bash together (without calling or including Bash scripts from Make)
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   229
		and have reusable shell functions available in the Makefile:
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   230
	</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   231
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   232
<m:pre jazyk="text"><![CDATA[
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   233
SHELL=bash
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   234
BASH_FUNC_read_nullbyte%%=() { local IFS=; for v in "$$@"; do export "$$v"; read -r -d '' "$$v"; done }
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   235
export BASH_FUNC_read_nullbyte%%]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   236
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   237
	<p>usage example:</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   238
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   239
<m:pre jazyk="Makefile"><![CDATA[
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   240
run_commands: summary.rp
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   241
	cat $(<) \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   242
		| relpipe-tr-cut --relation 'ip_nic' --invert-match relation true \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   243
		| relpipe-out-nullbyte \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   244
		| while read_nullbyte ip_address interface mac_address; do\
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   245
			echo "network interface $$interface ($$mac_address) has IP address $$ip_address"; \
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   246
		done;
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   247
]]></m:pre>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   248
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   249
		<p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   250
			Both approaches – the shell script and the Makefile – have pros and cons.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   251
			With Makefile, we usually create some temporary files containing intermediate results.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   252
			That avoids streaming. But on the other hand, we process (parse, transform, filter, format etc.) only data that have been changed.
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   253
		</p>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   254
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   255
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   256
	</text>
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   257
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
parents:
diff changeset
   258
</stránka>