relpipe-data/examples-tr-sql-odbc.xml
branchv_0
changeset 297 192b0059a6c4
child 300 b9bd0f06b4a1
296:418e11eb6fea 297:192b0059a6c4
       
     1 <stránka
       
     2 	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
       
     3 	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
       
     4 	
       
     5 	<nadpis>Accessing SQLite, PostgreSQL and MySQL through ODBC</nadpis>
       
     6 	<perex>use various DBMS for SQL transformations or data access</perex>
       
     7 	<m:pořadí-příkladu>04200</m:pořadí-příkladu>
       
     8 
       
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
       
    10 		
       
    11 		<p>
       
    12 			Since <m:a href="release-v0.16">v0.16</m:a> the <code>relpipe-tr-sql</code> module
       
    13 			uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system).
       
    14 			Our program depends only on the generic API and the driver for particular DBMS is loaded dynamically depending on the configuration.
       
    15 		</p>
       
    16 		
       
    17 		<blockquote>
       
    18 			<p>
       
    19 				ODBC (Open Database Connectivity) is an industry standard and provides an API for accessing a DBMS.
       
    20 				In the late 1980s, several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG)
       
    21 				and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in the early 1990s.
       
    22 				ODBC is available on many operating systems and there are at least two free software implementations:
       
    23 				<a href="http://www.unixodbc.org/">unixODBC</a> and <a href="http://www.iodbc.org/">iODBC</a>.
       
    24 			</p>
       
    25 		</blockquote>
       
    26 		
       
    27 		<p>For more information see the <m:a href="release-v0.16">v0.16 release notes</m:a>.</p>
       
    28 		
       
    29 		<h2>General concepts and configuration</h2>
       
    30 		
       
    31 		<p>
       
    32 			<strong>ODBC</strong>:
       
    33 			the API consisting of C functions; see the files <code>sql.h</code> and <code>sqlext.h</code> e.g. in unixODBC.
       
    34 		</p>
       
    35 		<p>
       
    36 			<strong>Database driver</strong>:
       
    37 			a shared library (an <code>.so</code> file) 
       
    38 			that implements the API and connects to a particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.);
       
    39 			it is usually provided by the authors of the given DBMS, sometimes written by a third party
       
    40 		</p>
       
    41 		<p>
       
    42 			<strong>Client</strong>:
       
    43 			a program that calls the API in order to access a database; our <code>relpipe-tr-sql</code> is a client
       
    44 		</p>
       
    45 		<p>
       
    46 			<strong>Data Source Name (DSN)</strong>:
       
    47 			the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters
       
    48 			(like server name, user name, password etc.) can then be looked up in the configuration
       
    49 		</p>
       
    50 		<p>
       
    51 			<strong>Connection string</strong>:
       
    52 			a text string consisting of serialized parameters needed for connecting
       
    53 			– we can specify all parameters ad-hoc in the connection string without creating any permanent configuration;
       
    54 			a connection string can also refer to a DSN and add or override some parameters
       
    55 		</p>
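       		<p>
       			As an illustration only: a connection string is a semicolon-separated list of <code>key=value</code> pairs.
       			The first line below is the ad-hoc SQLite string used later on this page;
       			the second one assumes a configured DSN named <code>postgresql-temp</code> and overrides just its <code>Database</code> parameter
       			(the database name <code>relpipe</code> is a hypothetical value).
       		</p>
       		<pre>Driver=SQLite3;Database=file:MyDatabase.sqlite
       DSN=postgresql-temp;Database=relpipe</pre>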
       
    56 
       
    57 		<p>
       
    58 			There is some global configuration in the <code>/etc</code> directory.
       
    59 			In <code>/etc/odbcinst.ini</code> we can find a list of ODBC drivers.
       
    60 			Thanks to it, we can refer to a driver by its name (e.g. <code>SQLite3</code>)
       
    61 			instead of the path to the shared library (e.g. <code>/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so</code>).
       
    62 			In <code>/etc/odbc.ini</code> we can find a list of global data sources (shared by all users of the given computer).
       
    63 			It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords,
       
    64 			but we can provide here just a <i>template</i> with public parameters like server name, port etc.
       
    65 			and each user supplies their own user name and password in the connection string or in a personal configuration file.
       
    66 		</p>
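       		<p>
       			A rough sketch of how these two global files might look.
       			The driver name <code>SQLite3</code>, the library path and the PostgreSQL keys are taken from this page;
       			the <code>Description</code> text, the DSN <code>postgresql-template</code> and the server name <code>db.example.com</code>
       			are illustrative assumptions, not values from any real installation.
       		</p>
       		<m:pre jazyk="ini"><![CDATA[# /etc/odbcinst.ini: registers a driver under a short name
       [SQLite3]
       Description=SQLite3 ODBC driver (illustrative text)
       Driver=/usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so
       
       # /etc/odbc.ini: public template without any credentials
       [postgresql-template]
       Driver=PostgreSQL Unicode
       Servername=db.example.com
       Port=5432
       Database=postgres]]></m:pre>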
       
    67 		
       
    68 		<p>
       
    69 			The <code>~/.odbc.ini</code> file contains the personal configuration of the given user.
       
    70 			It usually holds data sources including their passwords.
       
    71 			Thus this file must be readable only by that user (<code>chmod 600 ~/.odbc.ini</code>).
       
    72 			Providing passwords in connection strings passed as CLI arguments is not good practice for security reasons:
       
    73 			by default the command line is stored in the shell history and it is also visible to other users of the same machine in the list of running processes.
       
    74 		</p>
       
    75 		
       
    76 		<p>
       
    77 			The section name – in the <code>[]</code> brackets – is the DSN.
       
    78 			Then there are parameters in the form of <code>key=value</code>, one per line.
       
    79 		</p>
       
    80 		
       
    81 		
       
    82 		<h2>CLI options</h2>
       
    83 		
       
    84 		<p>
       
    85 			Both <code>relpipe-tr-sql</code> and <code>relpipe-in-sql</code> support these relevant CLI options:
       
    86 		</p>
       
    87 		
       
    88 		<ul>
       
    89 			<li>
       
    90 				<code>--list-data-sources</code>:
       
    91 				lists available (configured) data sources in relational format (so we pipe the output to an output filter, e.g. <code>relpipe-out-tabular</code>)
       
    92 			</li>
       
    93 			<li>
       
    94 				<code>--data-source-name</code>:
       
    95 				specifies the DSN of a configured data source
       
    96 			</li>
       
    97 			<li>
       
    98 				<code>--data-source-string</code>:
       
    99 				specifies the connection string for an ad-hoc connection that needs no configuration
       
   100 			</li>
       
   101 		</ul>
       
   102 		
       
   103 		<pre><![CDATA[$ relpipe-tr-sql --list-data-sources | relpipe-out-tabular 
       
   104 data_source:
       
   105  ╭───────────────┬──────────────────────╮
       
   106  │ name (string) │ description (string) │
       
   107  ├───────────────┼──────────────────────┤
       
   108  │ sqlite-memory │ SQLite3              │
       
   109  │ relpipe       │ PostgreSQL Unicode   │
       
   110  ╰───────────────┴──────────────────────╯
       
   111 Record count: 2]]></pre>
       
   112 
       
   113 		<p>
       
   114 			Because the output of this command is relational, we can further process it in our relational pipelines.
       
   115 			This output is also used by the Bash completion to suggest the DSN.
       
   116 		</p>
       
   117 		
       
   118 		<p>
       
   119 			If neither <code>--data-source-name</code> nor <code>--data-source-string</code> option is provided,
       
   120 			a temporary in-memory SQLite database is used by default.
       
   121 		</p>
       
   122 		
       
   123 		<h2>SQLite</h2>
       
   124 		
       
   125 		<p>In Debian GNU/Linux and similar distributions we can install the <a href="https://sqlite.org/">SQLite</a> ODBC driver with this command:</p>
       
   126 		
       
   127 		<pre>apt install libsqliteodbc</pre>
       
   128 		
       
   129 		<p>This also installs the SQLite library, which is all we need (because SQLite is a <i>serverless and self-contained</i> database).</p>
       
   130 		
       
   131 		<p>
       
   132 			Then we can use the default in-memory temporary database or specify the connection string ad-hoc, 
       
   133 			<m:a href="examples-in-sql-selecting-existing-database">access existing SQLite databases</m:a>
       
   134 			or <m:a href="examples-in-filesystem-tr-sql-indexing">create new ones</m:a> – e.g. this command:
       
   135 		</p>
       
   136 		
       
   137 		<pre>… | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'</pre>
       
   138 		
       
   139 		<p>will create the <code>MyDatabase.sqlite</code> file and fill it with relations that came from STDIN.</p>
       
   140 		
       
   141 		<p>For frequently used databases it is convenient to configure a data source in <code>~/.odbc.ini</code>:</p>
       
   142 		
       
   143 		<m:pre jazyk="ini"><![CDATA[[MyDatabase]
       
   144 Driver=SQLite3
       
   145 Database=file:/home/hacker/MyDatabase.sqlite]]></m:pre>
       
   146 
       
   147 		<p>
       
   148 			and then connect to it simply using <code>--data-source-name MyDatabase</code>
       
   149 			(both the option and the name will be suggested by Bash-completion).
       
   150 		</p>
       
   151 		
       
   152 		<p>
       
   153 			The <a href="http://www.ch-werner.de/sqliteodbc/html/index.html">SQLite ODBC driver</a> supports several parameters that are described in its documentation.
       
   154 			One of them is <code>LoadExt</code> that loads SQLite extensions:
       
   155 		</p>
       
   156 		
       
   157 		<m:pre jazyk="ini"><![CDATA[LoadExt=/home/hacker/libdemo.so]]></m:pre>
       
   158 		
       
   159 		<p>
       
   160 			So we can write our own SQLite extension with custom functions or other features 
       
   161 			(<a href="https://blog.frantovo.cz/c/383/Komplexita%20softwaru%3A%20%C5%98e%C5%A1en%C3%AD%20a%C2%A0prevence#toc_sqlite">example</a>)
       
   162 			or choose an existing one and load it into SQLite connected through ODBC.
       
   163 		</p>
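       		<p>
       			In a configured data source the same parameter would presumably be just one more line.
       			A minimal sketch that combines the <code>MyDatabase</code> data source with the extension path shown above
       			(both paths are the illustrative ones used on this page):
       		</p>
       		<m:pre jazyk="ini"><![CDATA[[MyDatabase]
       Driver=SQLite3
       Database=file:/home/hacker/MyDatabase.sqlite
       LoadExt=/home/hacker/libdemo.so]]></m:pre>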
       
   164 
       
   165 		
       
   166 		<h2>PostgreSQL</h2>
       
   167 		
       
   168 		<p>In Debian GNU/Linux and similar distributions we can install the <a href="https://www.postgresql.org/">PostgreSQL</a> ODBC driver with this command:</p>
       
   169 		
       
   170 		<pre>apt install odbc-postgresql</pre>
       
   171 		
       
   172 		<p>
       
   173 			PostgreSQL is a very powerful DBMS (probably the most advanced free software relational database system)
       
   174 			and utilizes the client-server architecture.
       
   175 			This means that we also need a server (it can also be installed through <code>apt</code>, like the driver).
       
   176 		</p>
       
   177 		
       
   178 		<p>
       
   179 			Once we have a server – remote or local – we need to create a user (role).
       
   180 			For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default,
       
   181 			which means that all relations we create are lost at the end of the session (when the <code>relpipe-tr-sql</code> command finishes),
       
   182 			thus it behaves very similarly to the SQLite in-memory database.
       
   183 		</p>
       
   184 		
       
   185 		<m:pre jazyk="sql"><![CDATA[CREATE USER relpipe WITH PASSWORD 'someSecretPassword';
       
   186 ALTER ROLE relpipe SET search_path TO 'pg_temp';]]></m:pre>
       
   187 
       
   188 		<p>
       
   189 			And then we <a href="https://odbc.postgresql.org/docs/config.html">configure</a> the ODBC data source:
       
   190 		</p>
       
   191 
       
   192 		<m:pre jazyk="ini"><![CDATA[[postgresql-temp]
       
   193 Driver=PostgreSQL Unicode
       
   194 Database=postgres
       
   195 Servername=localhost
       
   196 Port=5432
       
   197 Username=relpipe
       
   198 Password=someSecretPassword]]></m:pre>
       
   199 
       
   200 		<p>
       
   201 			Now we can use advanced PostgreSQL features for transforming data in our pipelines.
       
   202 			We can also configure a DSN for another database that contains some useful data and other database objects, 
       
   203 			call existing business functions installed in such database, load data to or from this DB etc.
       
   204 		</p>
       
   205 		
       
   206 		
       
   207 		<h2>MySQL</h2>
       
   208 		
       
   209 		<p>
       
   210 			If the <code>libmyodbc</code> package is missing in our distribution,
       
   211 			the ODBC driver for <a href="https://dev.mysql.com/downloads/connector/odbc/">MySQL</a> can be downloaded from their website.
       
   212 			We can get a binary package (<code>.deb</code>, <code>.rpm</code> etc.) or source code.
       
   213 			If we are compiling from sources, we do something like this:
       
   214 		</p>
       
   215 		
       
   216 		<m:pre jazyk="bash"><![CDATA[cd mysql-connector-odbc-*-src/
       
   217 mkdir build
       
   218 cd build
       
   219 cmake ../ -DWITH_UNIXODBC=1
       
   220 make]]></m:pre>
       
   221 
       
   222 		<p>
       
   223 			We should use the driver in the same or similar version as the MySQL client library installed on our system.
       
   224 			For example, an 8.x driver will not work with a 5.x library.
       
   225 			Successful compilation results in <code>libmyodbc*.so</code> files.
       
   226 		</p>
       
   227 		
       
   228 		<p>
       
   229 			Like PostgreSQL, MySQL uses the client-server architecture,
       
   230 			so we need a server where we create a database and some user account.
       
   231 			As root through the <code>mysql mysql</code> command we execute:
       
   232 		</p>
       
   233 		
       
   234 		<m:pre jazyk="sql"><![CDATA[CREATE DATABASE relpipe CHARACTER SET = utf8;
       
   235 CREATE USER 'relpipe'@'localhost' IDENTIFIED BY 'someSecretPassword';
       
   236 GRANT ALL PRIVILEGES ON relpipe.* TO 'relpipe'@'localhost';
       
   237 FLUSH PRIVILEGES;]]></m:pre>
       
   238 
       
   239 		<p>As a normal user we add a new data source to our <code>~/.odbc.ini</code> file:</p>
       
   240 
       
   241 		<m:pre jazyk="ini"><![CDATA[[mysql-relpipe-localhost]
       
   242 Driver=/home/hacker/src/mysql/build/lib/libmyodbc5w.so
       
   243 Server=localhost
       
   244 Port=3306
       
   245 Socket=/var/run/mysqld/mysqld.sock
       
   246 User=relpipe
       
   247 Password=someSecretPassword
       
   248 Database=relpipe
       
   249 InitStmt=SET SQL_MODE=ANSI_QUOTES;
       
   250 Charset=utf8]]></m:pre>
       
   251 
       
   252 		<p>
       
   253 			Note that we have compiled the ODBC driver in our home directory
       
   254 			and, even without installing it anywhere or registering it in the <code>/etc/odbcinst.ini</code> file,
       
   255 			we can simply refer to the <code>.so</code> file from our <code>~/.odbc.ini</code>.
       
   256 		</p>
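       		<p>
       			If we later prefer to refer to the driver by name, as with <code>SQLite3</code>,
       			an entry along these lines in <code>/etc/odbcinst.ini</code> should register it.
       			The driver name <code>MySQL</code> and the <code>Description</code> text are assumptions chosen for this sketch;
       			the path is the build location used above.
       		</p>
       		<m:pre jazyk="ini"><![CDATA[[MySQL]
       Description=MySQL ODBC driver (illustrative text)
       Driver=/home/hacker/src/mysql/build/lib/libmyodbc5w.so]]></m:pre>
       		<p>
       			A data source could then say <code>Driver=MySQL</code> instead of giving the full path.
       		</p>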
       
   257 
       
   258 		<p>
       
   259 			If we set <code>Server=localhost</code>, the client-server communication does not go through TCP/IP
       
   260 			but rather through the unix domain socket specified in the <code>Socket</code> field.
       
   261 			If we set <code>Server=127.0.0.1</code> or some remote IP address or domain name, the communication goes through TCP/IP on the given port.
       
   262 		</p>
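       		<p>
       			For comparison, a TCP/IP variant of the same data source might look like this.
       			The DSN <code>mysql-relpipe-tcp</code> is a hypothetical name, the <code>Socket</code> line is left out because it is not used,
       			and the sketch assumes that the server accepts the <code>relpipe</code> account for connections coming from this address.
       		</p>
       		<m:pre jazyk="ini"><![CDATA[[mysql-relpipe-tcp]
       Driver=/home/hacker/src/mysql/build/lib/libmyodbc5w.so
       Server=127.0.0.1
       Port=3306
       User=relpipe
       Password=someSecretPassword
       Database=relpipe
       InitStmt=SET SQL_MODE=ANSI_QUOTES;
       Charset=utf8]]></m:pre>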
       
   263 		
       
   264 		<p>
       
   265 			The <code>SET SQL_MODE=ANSI_QUOTES;</code> init statement is important,
       
   266 			because it tells the MySQL server that it should support standard SQL "quoted" identifiers
       
   267 			instead of that `weird` MySQL style.
       
   268 			We use the standard SQL while creating the tables.
       
   269 		</p>
       
   270 		
       
   271 		<p>
       
   272 			There are many other parameters, quite well 
       
   273 			<a href="https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-connection-parameters.html">documented</a>.
       
   274 		</p>
       
   275 		
       
   276 		<p>
       
   277 			Now we can use MySQL as the <i>SQL engine</i> for transformations in our pipelines
       
   278 			and we can also access existing MySQL databases,
       
   279 			load data to and from them
       
   280 			or call functions and procedures installed on the server. 
       
   281 		</p>
       
   282 		
       
   283 		
       
   284 	</text>
       
   285 
       
   286 </stránka>