relpipe/relpipe-web: relpipe-data/faq.xml@2fc56dd7f003


<stránka
	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
	
	<nadpis>FAQ</nadpis>
	<perex>Frequently asked questions</perex>
	<pořadí>16</pořadí>

	<text xmlns="http://www.w3.org/1999/xhtml">
		
		<p>
			<strong>When will the stable version be released?</strong>
			<br/>
			We don't know – there is no exact date.
			<m:name/> are something that should have been released about twenty years ago, but real work started in 2018.
			So it does not make a big difference whether the release happens this month or the next one.
			We understand the <em>release early, release often</em> rule,
			but it suits application software better than standards and APIs.
			Of course, we expect some evolution after the v1.0.0 release, but we need to stabilize and verify many things before the release in order to be able to maintain backward compatibility in the future.
		</p>
		
		<p>
			<strong>How can I help you?</strong>		
			<br/>
			<ul>
				<li>Suggest more examples of how <m:name/> can be used, especially how YOU would like to use it.</li>
				<li>We are looking for illustrations that would supplement our documentation and website.</li>
				<li>
					As an author of a program that generates or consumes some data, you could add relational input and output to your program.
					But please note that we do not have v1.0 yet, so these features should be marked as experimental.
					The API might/will change.
					Another (and maybe better for now) option is to add input/output of values separated by a null byte (<code>\0</code>).
					This "API" will be supported for sure and the data are simply the attribute values. There are no record separators (we know the number of attributes, so they are not needed).
					The disadvantages of this approach are that the stream can contain only a single relation and that the metadata are not embedded in the stream and must be passed separately.
				</li>
				<li>Review our source code and suggest improvements and fixes. Constructive criticism is always welcome. This is one of the reasons why we publish our programs as free software.</li>
				<li>Native speakers could suggest improvements and corrections to our English texts.</li>
			</ul>
		</p>
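		<p>
			The null-byte-separated variant described above can be sketched in a few lines of Python.
			This is only an illustration and not part of any <m:name/> implementation: the function names <code>encode_records</code> and <code>decode_records</code> and the sample data are hypothetical.
			The sketch also assumes that every value (including the last one) is followed by <code>\0</code> and that the attribute count is passed separately.
		</p>
		<pre>
```python
def encode_records(records, attribute_count):
    """Flatten records into attribute values, each followed by a null byte."""
    chunks = []
    for record in records:
        if len(record) != attribute_count:
            raise ValueError("record has a wrong number of attributes")
        for value in record:
            data = value.encode("utf-8")
            if b"\0" in data:
                raise ValueError("attribute value must not contain a null byte")
            # Assumption: a trailing null byte terminates every value.
            chunks.append(data + b"\0")
    return b"".join(chunks)


def decode_records(stream, attribute_count):
    """Rebuild records from the flat stream using the known attribute count."""
    if not stream:
        return []
    if not stream.endswith(b"\0"):
        raise ValueError("stream must end with a null byte")
    values = [chunk.decode("utf-8") for chunk in stream[:-1].split(b"\0")]
    if len(values) % attribute_count != 0:
        raise ValueError("the last record is incomplete")
    # There are no record separators: every attribute_count values form a record.
    return [
        values[start:start + attribute_count]
        for start in range(0, len(values), attribute_count)
    ]


# Hypothetical sample: a single relation with two attributes (device, type).
records = [["/dev/sda1", "ext4"], ["/dev/sdb1", "xfs"]]
encoded = encode_records(records, 2)
decoded = decode_records(encoded, 2)
```
		</pre>
		<p>
			Because the stream carries no record separators and no metadata, the reader can rebuild the records only when it is told the attribute count in advance.
		</p>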
		
		<p>
			<strong>Why do you speak about <em>relations</em> instead of <em>tables</em>?</strong>
			<br/>
			This terminology might be unfamiliar to some, but <em>relations</em> and <em>attributes</em> symbolize
			that we focus on the substance of the data. Pure data are conveyed through the pipelines
			and the presentation of such data is only the last step.
			The data might be presented/visualized in many different forms,
			and tables (consisting of rows and columns) are only one of many possible options.
		</p>
		
		<m:tabulka>
			Relational	SQL	alternative terms
			relation	table
			attribute	column	field
			record	row	tuple
		</m:tabulka>
		
		<p>
			<strong>What about duplicate records?</strong>
			<br/>
			In the relational model, the records must be unique.
			In <m:name/> there is no central authority that would prevent you from appending duplicate records to the relational stream.
			It means that at some points in the relational pipeline, data may occur that do not fit the rules of the relational model.
			Deduplication is generally not done on the output side of particular steps, but is postponed and done on the input side of steps where uniqueness is important (e.g. JOIN or UNION).
			You should not put duplicate records in the relational stream, but you can.
			Duplicates can also occur after some transformations like <code>relpipe-tr-cut</code> (e.g. if you choose only the <code>dump</code> or <code>type</code> attributes from your <code>fstab</code> and omit the primary/unique key field).
			Such data are not considered invalid, but should be processed as if there were no duplicates (if uniqueness is important for the particular step)
			or should be passed through if that is not in conflict with the goal of the given step (e.g. calling an <code>uppercase()</code> function on some field or doing UNION ALL).
			Each tool must document how it handles duplicate records.
		</p>
		
		<p>
			There are two reasons for this <em>transient tolerance of duplicate records</em>.
			1) Performance: guaranteeing uniqueness at every moment would negate streaming and would require holding the whole relation in memory and always sorting the records.
			2) Modularity: otherwise many tasks would have to be done by a single bulky tool that does everything, e.g. if you want to cut only the <code>type</code> field from your <code>fstab</code> and then compute statistics on how many times particular filesystems are used.
		</p>
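		<p>
			The handling of duplicates described above can be illustrated with a small Python sketch.
			It is not how the <m:name/> tools are implemented: the functions <code>cut</code>, <code>count_occurrences</code> and <code>union</code> and the sample <code>fstab</code> data are hypothetical, and records are modelled as plain tuples.
			The cutting step passes duplicates through, the counting step depends on them, and the union step deduplicates on its input side.
		</p>
		<pre>
```python
from collections import Counter


def cut(records, attribute_names, keep):
    """Keep only the chosen attributes; duplicates may appear and are passed on."""
    indexes = [attribute_names.index(name) for name in keep]
    for record in records:
        yield tuple(record[i] for i in indexes)


def count_occurrences(records):
    """A step that relies on duplicates: count how often each record occurs."""
    return Counter(records)


def union(left, right):
    """A step where uniqueness matters: deduplicate while reading the inputs."""
    seen = set()
    for source in (left, right):
        for record in source:
            # Assumption: records are hashable tuples of attribute values.
            if record not in seen:
                seen.add(record)
                yield record


# Hypothetical fstab-like relation; names and values are made up for illustration.
attribute_names = ["device", "mount_point", "type"]
fstab = [
    ("/dev/sda1", "/", "ext4"),
    ("/dev/sda2", "/home", "ext4"),
    ("/dev/sdb1", "/data", "xfs"),
]

types = list(cut(fstab, attribute_names, ["type"]))  # contains ("ext4",) twice
statistics = count_occurrences(types)                # duplicates are the point here
merged = list(union(types, [("xfs",), ("btrfs",)]))  # duplicates dropped on input
```
		</pre>
		<p>
			If the cutting step removed duplicates itself, the counting step could no longer tell how many times each filesystem type is used, and both tasks would have to live in one tool.
		</p>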
		
		<!--
		<p>
			<strong>?</strong>		
			<br/>
			...
		</p>
		
		<p>
			<strong>Why not build on XML? It has been a standard since 1998 and there are many tools and libraries for it.</strong>
			<br/>
			XML is a great and mature (meta)format and its ecosystem is respectable and inspiring.
			But XML does not conform to our <m:a href="principles">principles</m:a>, especially the ability to concatenate multiple files/streams and to append new records to an already existing relation.
			XML is also not concise. 
			And the implementation of the XML parser in various environments would be <em>a bit more complex</em>.
		</p>
		<p>
			We prefer XML as an input and output format and look forward to cooperation with the XML ecosystem (XSD, XPath, XSLT, XQuery etc.).
			Such steps might be at the beginning, at the end, or even in the middle of the relational pipeline.
		</p>
		
		<p>
			<strong>?</strong>		
			<br/>
			...
		</p>
		-->
		
		<p>
			<strong>Why C++?</strong>		
			<br/>
			Firstly, <m:name/> are a specification of a data format and as such are not bound to any programming language.
			This specification is totally language- and platform-independent.
		</p>
		<p>
			The ideal/perfect language does not exist and our implementations will be written in various languages.
			We started our prototype and first real implementations in C++ for several reasons:
		</p>
		<ul>
			<li>It is mature and widespread: GCC runs almost everywhere and other compilers/toolchains are also available.</li>
			<li>Programs written in C++ start immediately, which is very important for CLI tools.</li>
			<li>It can be seamlessly mixed with C and its libraries, and it is good for interaction with the operating system.</li>
			<li>Modern C++ is quite a good language.</li>
			<li>We are not C++ gurus and C++ is not our first-choice language, so the fact that we are able to do the implementation in C++ proves that the specification is simple enough to be reasonably implemented by an average software engineer in any other language :-)</li>
		</ul>
		
		<p>Implementations in other languages will follow. Java is the next one, then probably Perl, Python, Rust, Go, PHP etc. (depending on community involvement).</p>
		
		<p>			
			<strong>Have you seen <a href="https://xkcd.com/927/">XKCD 927</a>?</strong>		
			<br/>
			Yes. And we liked it so much that we followed their instructions and created <m:name/>.
		</p>
			
		<p>
			<strong>Are <m:name/> compatible with cloud, IoT, SPA/PWA, AI, blockchain and mobile-first? Should our DevOps use it in our serverless hipster fintech app with strong focus on SEO, UX and machine learning?</strong>
			<br/>
			Go @#$%&amp; yourself. We are pretty old school hackers and we enjoy our green screen terminals!<br/>
			Of course, you can use <m:name/> anywhere if it makes sense for you.
			<m:name/> are designed to be generic enough, i.e. not specific to any industry (banking, telecommunications, embedded etc.) or platform.
			Data in this format are very concise, so they can be used even on very small devices.
			Their native data structure is a relation (table), but they can also handle tree-structured data (i.e. any data).
			They are designed for streaming rather than for storage (but under some circumstances it is also meaningful to use them for storage).
		</p>
		
		<p>
			<strong>What about your hobbies?</strong>
			<br/>
			It is a bit of a personal question, but I can reveal that I collect signed photos of Ally Sheedy, Winona Ryder and Richard Stallman.
		</p>
		
	</text>

</stránka>
author	František Kučera <franta-hg@frantovo.cz>
	Fri, 11 Jan 2019 23:08:47 +0100