relpipe-data/examples-csv-data-types.xml
author František Kučera <franta-hg@frantovo.cz>
Mon, 21 Feb 2022 01:21:22 +0100
branch v_0
changeset 330 70e7eb578cfa
parent 329 5bc2bb8b7946
permissions -rw-r--r--
Added tag relpipe-v0.18 for changeset 5bc2bb8b7946
5bc2bb8b7946 Release v0.18
František Kučera <franta-hg@frantovo.cz>
<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>CSV and data types</nadpis>
	<perex>declare or recognize integers and booleans in a typeless format</perex>
	<m:pořadí-příkladu>04800</m:pořadí-příkladu>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			CSV (<m:a href="4180" typ="rfc">RFC 4180</m:a>) is quite a good solution when we want to store or share relational data in a simple text format that is
			both human-readable and well supported by many existing applications and libraries.
			We even have ready-to-use GUI editors, so-called spreadsheets (e.g. LibreOffice Calc).
			However, such simple formats usually have some drawbacks.
			CSV may contain only a single relation (<i>table</i>, <i>sheet</i>). This is not a big issue: we can use several files.
			A more serious problem is the absence of data types: in CSV, everything is just a text string.
			Thus it was impossible to have a lossless conversion to CSV and back.
		</p>
		
		<m:pre jazyk="text"><![CDATA[$ find license/ -print0 | relpipe-in-filesystem | relpipe-out-tabular
filesystem:
 ╭─────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
 │ path   (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
 ├─────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
 │ license/        │ d             │              0 │ hacker         │ hacker         │
 │ license/gpl.txt │ f             │          35147 │ hacker         │ hacker         │
 ╰─────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 2]]></m:pre>

		<p>Data types are missing in CSV by default:</p>
		<m:pre jazyk="text"><![CDATA[$ find license/ -print0 | relpipe-in-filesystem | relpipe-out-csv
"path","type","size","owner","group"
"license/","d","0","hacker","hacker"
"license/gpl.txt","f","35147","hacker","hacker"]]></m:pre>
		
		<p>The <code>size</code> attribute was an integer and now it is a mere string:</p>
		<m:pre jazyk="text"><![CDATA[$ find license/ -print0 | relpipe-in-filesystem | relpipe-out-csv | relpipe-in-csv | relpipe-out-tabular
csv:
 ╭─────────────────┬───────────────┬───────────────┬────────────────┬────────────────╮
 │ path   (string) │ type (string) │ size (string) │ owner (string) │ group (string) │
 ├─────────────────┼───────────────┼───────────────┼────────────────┼────────────────┤
 │ license/        │ d             │ 0             │ hacker         │ hacker         │
 │ license/gpl.txt │ f             │ 35147         │ hacker         │ hacker         │
 ╰─────────────────┴───────────────┴───────────────┴────────────────┴────────────────╯
Record count: 2]]></m:pre>

		
		<h2>Declare data types in the CSV header</h2>
		
		<p>
			Since <m:name/> <m:a href="release-v0.18">v0.18</m:a> we can encode the data types (currently strings, integers and booleans) in the CSV header and then recover them while reading.
			Such “CSV with data types” is valid CSV according to the RFC specification and can be viewed or edited in any CSV-capable software.
		</p>
		
		<p>
			The attribute name and the data type are separated by the <code>::</code> symbol, e.g. <code>name::string,age::integer,member::boolean</code>.
			Attribute names may contain <code>::</code> (unlike the data type names).
		</p>
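		
		<p>
			As a minimal sketch, such a header can be written by hand.
			The sample records below are made up and the <code>true</code>/<code>false</code> spelling of the boolean values is an assumption;
			the <code>relpipe-in-csv</code> step is run only if the tool is installed:
		</p>
		
		<m:pre jazyk="text"><![CDATA[# hand-written CSV whose header declares the data types (sample data is made up)
typed_csv='name::string,age::integer,member::boolean
Alice,30,true
Bob,25,false'

# the header is an ordinary CSV record, so plain text tools still work with it
printf '%s\n' "$typed_csv" | head -n 1

# with relpipe installed, the declared types should be recovered while reading
if command -v relpipe-in-csv >/dev/null; then
	printf '%s\n' "$typed_csv" | relpipe-in-csv | relpipe-out-tabular
fi]]></m:pre>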
		
		<p>The data type declarations may be added simply by hand or automatically using <code>relpipe-out-csv</code>.</p>
		
		<m:pre jazyk="text"><![CDATA[$ find license/ -print0 | relpipe-in-filesystem | relpipe-out-csv --write-types true
"path::string","type::string","size::integer","owner::string","group::string"
"license/","d","0","hacker","hacker"
"license/gpl.txt","f","35147","hacker","hacker"]]></m:pre>

		<p>The <code>relpipe-out-csv</code> + <code>relpipe-in-csv</code> round trip no longer degrades the data:</p>
		<m:pre jazyk="text"><![CDATA[$ find license/ -print0 | relpipe-in-filesystem | relpipe-out-csv --write-types true | relpipe-in-csv | relpipe-out-tabular
csv:
 ╭─────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
 │ path   (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
 ├─────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
 │ license/        │ d             │              0 │ hacker         │ hacker         │
 │ license/gpl.txt │ f             │          35147 │ hacker         │ hacker         │
 ╰─────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 2]]></m:pre>


		<p>
			So we can put e.g. a CSV editor between them while storing and versioning the data in a different format (like XML or Recfile).
			Such a workflow can be effectively managed by <code>make</code>:
			<code>make edit</code> converts the versioned data to CSV and launches the editor,
			<code>make commit</code> converts the data back from CSV and commits them to Mercurial, Git or another version control system (VCS).
		</p>
		
		<p>
			Why put data into a VCS in a format other than CSV?
			Formats like XML or Recfile may have each attribute on a separate line, which leads to more readable diffs.
			At a glance we can see which attributes have been changed.
			In CSV, by contrast, we see just one long changed line, and even with better tools we need to count the commas to know which attribute it was.
		</p>
		
		<p>
			The <code>relpipe-out-csv</code> tool generates data types only when explicitly asked to: <code>--write-types true</code>.
			The <code>relpipe-in-csv</code> tool automatically looks for these type declarations:
			if all attributes have valid type declarations, they are used; otherwise they are considered to be a part of the attribute name.
			This behavior can be disabled by <code>--read-types false</code> (<code>true</code> will require valid type declarations).
		</p>
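		
		<p>
			The rule above can be sketched in plain shell.
			This is only an illustration of the described behavior, not the actual <code>relpipe-in-csv</code> code:
			the <code>parse_header</code> function is hypothetical and splitting at the last <code>::</code> is our assumption,
			which follows from attribute names being allowed to contain <code>::</code>.
		</p>
		
		<m:pre jazyk="text"><![CDATA[# Hypothetical sketch (not relpipe code): interpret the cells of a CSV header.
# Prints one "name<TAB>type" line per attribute.
parse_header() {
	local cell typed=true
	# the types are used only if every cell ends with a valid "::type" suffix
	for cell in "$@"; do
		case "$cell" in
			*::string|*::integer|*::boolean) ;;
			*) typed=false ;;
		esac
	done
	for cell in "$@"; do
		if "$typed"; then
			# split at the last "::" because attribute names may contain "::"
			printf '%s\t%s\n' "${cell%::*}" "${cell##*::}"
		else
			# otherwise the whole cell is the attribute name and the type is string
			printf '%s\tstring\n' "$cell"
		fi
	done
}

parse_header 'path::string' 'a::b::integer'
parse_header 'path::string' 'size']]></m:pre>
		
		<p>
			The first call prints the names <code>path</code> and <code>a::b</code> with their declared types.
			In the second call one declaration is missing, so both cells are kept as whole attribute names of the string type.
		</p>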
		
		
		<h2>Recognize data types using relpipe-tr-infertypes</h2>
		
		<p>
			Sometimes we may also want to infer data types from the values automatically, without any explicit declaration.
			Then we put the <code>relpipe-tr-infertypes</code> tool in our pipeline.
			It buffers whole relations and checks all values of each attribute.
			If they are all integers or all booleans, the attribute is converted to that type.
		</p>

		<m:pre jazyk="text"><![CDATA[$ find license/ -print0 | relpipe-in-filesystem | relpipe-out-csv | relpipe-in-csv | relpipe-tr-infertypes | relpipe-out-tabular
csv:
 ╭─────────────────┬───────────────┬────────────────┬────────────────┬────────────────╮
 │ path   (string) │ type (string) │ size (integer) │ owner (string) │ group (string) │
 ├─────────────────┼───────────────┼────────────────┼────────────────┼────────────────┤
 │ license/        │ d             │              0 │ hacker         │ hacker         │
 │ license/gpl.txt │ f             │          35147 │ hacker         │ hacker         │
 ╰─────────────────┴───────────────┴────────────────┴────────────────┴────────────────╯
Record count: 2]]></m:pre>
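		
		<p>
			The check done for each attribute can be illustrated by a small shell function.
			It is a hypothetical sketch, not the real implementation:
			the exact literals that <code>relpipe-tr-infertypes</code> accepts as integers or booleans are assumed here
			and values containing line breaks are not handled.
		</p>
		
		<m:pre jazyk="text"><![CDATA[# Hypothetical sketch: guess the type of one attribute from all its values.
infer_type() {
	# every value must match, so we look for a counterexample using "grep -v"
	if ! printf '%s\n' "$@" | grep -Eqv '^-?[0-9]+$'; then
		echo integer
	elif ! printf '%s\n' "$@" | grep -Eqv '^(true|false)$'; then
		echo boolean
	else
		echo string
	fi
}

infer_type 0 35147      # values of the "size" attribute
infer_type true false
infer_type d f          # values of the "type" attribute]]></m:pre>
		
		<p>
			All values must be known before the type can be decided, which is why whole relations have to be buffered.
		</p>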

		<p>
			Buffering whole relations is inefficient and contradicts streaming; however, it is sometimes useful and convenient for small data coming from external sources.
			We can e.g. download a data set from the network, pipe it through <code>relpipe-in-csv</code> + <code>relpipe-tr-infertypes</code> and improve the data quality a bit.
		</p>
		
		<p>
			We may apply the type inference only to certain relations: <code>--relation "my_relation"</code>
			or choose a different mode: <code>--mode data</code> or <code>metadata</code> or <code>auto</code>.
			The <code>data</code> mode is described above.
			In the <code>metadata</code> mode, <code>relpipe-tr-infertypes</code> works similarly to <code>relpipe-in-csv --read-types true</code>.
			The <code>auto</code> mode checks for the metadata in attribute names first and, if none is found, falls back to the <code>data</code> mode.
			This tool works with any relational data regardless of their original format or source (not only with CSV).
		</p>
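		
		<p>
			For example, the <code>metadata</code> mode might be tried like this.
			It is a sketch with made-up sample data that uses only the options mentioned above;
			the pipeline is run only if the tools are installed:
		</p>
		
		<m:pre jazyk="text"><![CDATA[# made-up sample data with type declarations in the header
typed_csv='id::integer,active::boolean
1,true
2,false'

if command -v relpipe-tr-infertypes >/dev/null; then
	# keep the "::type" suffixes in the attribute names while reading the CSV
	# and let relpipe-tr-infertypes interpret them in the next step
	printf '%s\n' "$typed_csv" \
		| relpipe-in-csv --read-types false \
		| relpipe-tr-infertypes --mode metadata \
		| relpipe-out-tabular
fi]]></m:pre>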

		
		<h2>No header? Specify types as CLI parameters</h2>
		
		<p>
			Some CSV files contain just data; they have no header line with the column names.
			Then we specify the attribute names and data types as CLI parameters of <code>relpipe-in-csv</code>:
		</p>

		<m:pre jazyk="text"><![CDATA[$ echo -e "a,b,c\nA,B,C" \
	| relpipe-in-csv \
		--relation 'just_data' \
			--attribute 'x' string \
			--attribute 'y' string \
			--attribute 'z' string \
	| relpipe-out-tabular

just_data:
 ╭────────────┬────────────┬────────────╮
 │ x (string) │ y (string) │ z (string) │
 ├────────────┼────────────┼────────────┤
 │ a          │ b          │ c          │
 │ A          │ B          │ C          │
 ╰────────────┴────────────┴────────────╯
Record count: 2]]></m:pre>

		<p>
			We may also skip an existing header line with <code>tail -n +2</code> and force our own names and types.
			However, this will not work if there are multiline values in the header (which is not common);
			in such cases we should use some <code>relpipe-tr-*</code> tool to rewrite the names or types
			(these tools work with relational data instead of plain text).
		</p>
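		
		<p>
			A minimal sketch of this approach, with made-up sample data and hypothetical relation and attribute names;
			the <code>relpipe-in-csv</code> step is run only if the tool is installed:
		</p>
		
		<m:pre jazyk="text"><![CDATA[# made-up CSV with a single-line header that we want to replace
csv_with_header='first,second
1,2
3,4'

# "tail -n +2" starts the output at the second line, i.e. drops the header
data_only="$(printf '%s\n' "$csv_with_header" | tail -n +2)"
printf '%s\n' "$data_only"

if command -v relpipe-in-csv >/dev/null; then
	# force our own relation name, attribute names and types
	printf '%s\n' "$data_only" \
		| relpipe-in-csv \
			--relation 'renamed' \
				--attribute 'x' integer \
				--attribute 'y' integer \
		| relpipe-out-tabular
fi]]></m:pre>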
		
	</text>

</stránka>