relpipe-data/index.xml
author František Kučera <franta-hg@frantovo.cz>
Sun, 25 Nov 2018 19:58:06 +0100
branch v_0
changeset 144 ee7e96151673
classic pipeline example

<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>Relational pipes</nadpis>
	<perex>Official homepage of Relational pipes.</perex>
	<pořadí>10</pořadí>

	<text xmlns="http://www.w3.org/1999/xhtml">
		<p>
			One of the great parts of the <m:unix/>
			<m:podČarou> 
				<m:unix tvar="vysvětlivka"/> 
			</m:podČarou> culture is the invention<m:podČarou>which is attributed to Doug McIlroy, see <a href="http://www.catb.org/~esr/writings/taoup/html/ch07s02.html#plumbing">The Art of Unix Programming: Pipes, Redirection, and Filters</a></m:podČarou>
			of <em>pipes</em> and the idea<m:podČarou>see <a href="http://www.catb.org/~esr/writings/taoup/html/ch01s06.html">The Art of Unix Programming: Basics of the Unix Philosophy</a></m:podČarou> 
			that <em>one program should do one thing and do it well</em>.
		</p>
		
		<p>
			Each running program (process) has one input stream, called standard input (STDIN), one output stream, called standard output (STDOUT), and one additional output stream for errors, warnings and logging, called standard error (STDERR).
			Using pipes, we can connect programs by passing the STDOUT of the first one to the STDIN of the second one, and so on.
		</p>
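		<p>
			To make the roles of the three streams concrete, here is a minimal sketch of a filter. It is written in Python only for illustration, and the file name <code>filter.py</code> and the functions <code>filter_lines</code> and <code>main</code> are hypothetical.
			The filter reads lines from STDIN and writes the lines containing a given substring to STDOUT.
			It reports a short summary on STDERR, so the diagnostics never mix with the data passed to the next program.
		</p>
		
		<pre>
import sys


def filter_lines(lines, needle):
    """Yield only the lines that contain the substring `needle`."""
    for line in lines:
        if needle in line:
            yield line


def main(argv, stdin, stdout, stderr):
    # Hypothetical CLI: the first argument is the substring to look for;
    # without an argument, every line passes through unchanged.
    needle = argv[1] if len(argv) > 1 else ""
    count = 0
    # Data is read from STDIN line by line, as produced by the previous program.
    for line in filter_lines(stdin, needle):
        # Results go to STDOUT, i.e. to the next program in the pipeline.
        stdout.write(line)
        count += 1
    # Diagnostics go to STDERR, so they do not pollute the data stream.
    stderr.write(f"matched {count} line(s)\n")


if __name__ == "__main__":
    main(sys.argv, sys.stdin, sys.stdout, sys.stderr)
</pre>
		
		<p>
			Such a filter could be placed in a pipeline, e.g. <code>cat /etc/fstab | python3 filter.py tmpfs 2>/tmp/filter.log</code>.
			Here STDOUT flows to the terminal (or to the next program), and STDERR is redirected to a log file.
			Passing the streams to <code>main</code> as parameters keeps the filter easy to test without a real pipe.
		</p>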
		
		<p>
			A classic pipeline example (<m:a href="classic-example">explained</m:a>):
		</p>
		
		<m:classic-example/>

		<!--		
		<m:diagram orientace="vodorovně">
			node[shape=box];
			
			cat  [label="cat /etc/fstab"];
			dd   [];
			grep [label="grep tmpfs"];
			log  [label="/tmp/dd.log"];
			
			cat -> dd  [label="STDOUT → STDIN"];
			dd -> grep [label="STDOUT → STDIN"];
			dd -> log  [label="STDERR → file"];
		</m:diagram>
		-->
		
		<p>
			Following this principle, we can build complex and powerful programs (pipelines) by composing several simple, single-purpose and reusable programs.
			Such single-purpose programs (often called <em>filters</em>) are much easier to create, test and optimize, and their authors don't have to worry about the complexity of the final pipeline.
			They don't even have to know how their programs will be used by others in the future.
			This is a great design principle that brings flexibility, reusability, efficiency and reliability.
			Whatever our role (author of a filter, builder of a pipeline etc.), we can always focus on our own task and do it well.
			And we can collaborate with others even if we don't know about them and don't know that we are collaborating.
			Now imagine combining this with the ideas of free software... How very!
		</p>
		
		<!--
		<m:diagram orientace="vodorovně">
			compound=true;
			node[shape=box];
			
			subgraph cluster_in {
			label = "Inputs:";
			cli;
			fstab;
			}
			
			subgraph cluster_tr {
			label = "Transformations:";
			grep;
			sed;
			}
			
			subgraph cluster_out {
			label = "Outputs:";
			xml;
			tabular;
			gui;
			}
			
			cli -> grep  [ltail=cluster_in, lhead=cluster_tr];
			grep -> xml [ltail=cluster_tr, lhead=cluster_out];
			// cli -> xml [ltail=cluster_in, lhead=cluster_out];
			
		</m:diagram>
		-->
		
		
		<p>What should flow through the pipes: bytes, text, or structured data such as XML, YAML, JSON or ASN.1?</p>
		
		<p>Basic rules of the <m:name/> format:</p>
		
		<ul>
			<li>a stream contains zero or more relations</li>
			<li>a relation has a name</li>
			<li>a relation has one or more attributes</li>
			<li>a relation contains zero or more records</li>
		</ul>
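		<p>
			The rules above can be expressed as a small in-memory data model.
			The following Python sketch is only an illustration and does not describe the actual serialization format or the API of the reference implementation.
			The class names, the <code>data_type</code> field and the check that every record holds exactly one value per attribute are our own assumptions.
		</p>
		
		<pre>
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    # Hypothetical type names such as "string"; not taken from the specification.
    data_type: str


@dataclass
class Relation:
    name: str  # a relation has a name
    attributes: list[Attribute]  # a relation has one or more attributes
    records: list[tuple] = field(default_factory=list)  # zero or more records

    def __post_init__(self):
        if not self.attributes:
            raise ValueError("a relation must have at least one attribute")

    def add_record(self, values):
        # Assumption: every record holds exactly one value per attribute.
        if len(values) != len(self.attributes):
            raise ValueError("the number of values must match the number of attributes")
        self.records.append(tuple(values))


@dataclass
class Stream:
    # A stream contains zero or more relations.
    relations: list[Relation] = field(default_factory=list)
</pre>
		
		<p>
			For example, <code>Stream([Relation("fstab", [Attribute("device", "string")])])</code> describes a stream with a single relation that has one attribute and no records yet.
			Keeping the attribute list separate from the records mirrors the rules: the structure of a relation is described once, and each record carries only the values.
		</p>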
		
		
		<h2>What are <m:name/>?</h2>
		
		<p>
			<m:name/> are an open <em>data format</em> designed for streaming structured data between two processes. 
			Alongside the format specification, we are also developing a <em>reference implementation</em> (libraries and tools) as free software.
			Although we believe in the specification-first (or contract-first) approach, we always check whether the theoretical concepts are feasible and whether they can be implemented reasonably and reliably.
			So before publishing any new specification or a new version of it, we verify it by creating a reference implementation in at least one programming language.
		</p>
		<p>
			More generally, <m:name/> are a philosophical continuation of the classic <m:unix/> pipelines and the relational model.
		</p>
		
		
		<h2>What are <m:name/> not?</h2>
			
		<p>
			<m:name/> respect the existing ecosystem and are an improvement or supplement rather than a replacement.
			So <m:name/> are not a:
		</p>
		
		<ul>
			<li>Shell – we use existing shells (e.g. GNU Bash), work with any shell and even without a shell (e.g. as a stream format passed through a network or stored in a file).</li>
			<li>Terminal emulator – as with shells, we use existing terminals, and <m:name/> can also be used outside any terminal; when we interact with a terminal, we use standard means such as Unicode, ANSI escape sequences etc.</li>
			<li>IDE – we can use standard <m:unix/> tools (GNU Screen, Make etc.) as an IDE, or any other IDE.</li>
			<li>Programming language – <m:name/> are a language-independent data format and can be produced or consumed in any programming language.</li>
			<li>Query language – although some of our tools perform queries, filtering or transformations, we do not invent a new query language; instead, we use existing languages like SQL, XPath or regular expressions.</li>
			<!--<li>Text editor – </li>-->
			<li>Database system, DBMS – we focus on stream processing rather than on data storage, although it sometimes makes sense to redirect data to a file and continue processing later.</li>
		</ul>
		
		
		<h2>Project status</h2>
		
		<p>
			The main ideas and the roadmap are quite clear, but many things will change (including the format internals and interfaces of the libraries and tools).
			Because we know how important API and ABI stability is, we are not ready to publish version 1.0 yet.
		</p>
		<p>
			On the other hand, the already published tools (tagged as v0.x in the v_0 branch) should work quite well (they should compile, should run, should not segfault often, should not wipe your hard drive or kill your cat),
			so they might be useful for anyone who likes our ideas and is prepared to update their own programs and scripts when the new version is ready.
		</p>

		
	</text>

</stránka>