relpipe-data/examples-in-filesystem-tr-sql-indexing.xml
branch v_0
changeset 281 0b6b1781a0a5
child 297 192b0059a6c4
280:eccf2de78284 281:0b6b1781a0a5
       
     1 <stránka
       
     2 	xmlns="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/strana"
       
     3 	xmlns:m="https://trac.frantovo.cz/xml-web-generator/wiki/xmlns/makro">
       
     4 	
       
     5 	<nadpis>Indexing and searching the filesystem</nadpis>
       
     6 	<perex>build an index of the filesystem and search it faster or offline using SQL</perex>
       
     7 	<m:pořadí-příkladu>03500</m:pořadí-příkladu>
       
     8 
       
     9 	<text xmlns="http://www.w3.org/1999/xhtml">
       
    10 		
       
    11 		<p>
       
    12 			Thanks to <code>relpipe-in-filesystem</code>, we can collect file metadata (or even the file contents)
       
    13 			and store them for later use in an index file.
       
    14 			Such an index is useful for faster access and for offline work (e.g. we can index an optical disc or an external or network HDD).
       
    15 		</p>
       
    16 		
       
    17 		<p>
       
    18 			We can simply pipe the relational data into a file and use this file as the index.
       
    19 			Or we can use some other format. In this example, we will use an SQLite file as the index.
       
    20 		</p>
       
    21 		
       
    22 		<p>
       
    23 			The first step is to collect the file metadata. We will index just a subset of our filesystem,
       
    24 			the <code>/bin/</code> and <code>/usr/bin/</code> directories:
       
    25 		</p>
       
    26 		
       
    27 		<m:pre jazyk="bash"><![CDATA[find /bin/ /usr/bin/ -print0 \
       
    28 	| relpipe-in-filesystem --relation "program" \
       
    29 	| relpipe-tr-sql --file bin.sqlite --file-keep true]]></m:pre>
       
    30 	
       
    31 		<p>
       
    32 			This index allows us to run fast searches and various analyses.
       
    33 			For example, we can find the 20 largest binaries:
       
    34 		</p>
       
    35 		
       
    36 		<m:pre jazyk="bash"><![CDATA[relpipe-in-sql \
       
    37 	--file bin.sqlite \
       
    38 	--relation "largest" \
       
    39 		"SELECT path, size FROM program WHERE type = 'f' ORDER BY size DESC LIMIT 20" \
       
    40 	| relpipe-out-tabular]]></m:pre>
       
    41 	
       
    42 		<p>The result:</p>
       
    43 		
       
    44 		<m:pre jazyk="text"><![CDATA[largest:
       
    45  ╭──────────────────────────────┬───────────────╮
       
    46  │ path                (string) │ size (string) │
       
    47  ├──────────────────────────────┼───────────────┤
       
    48  │ /usr/bin/blender             │ 76975440      │
       
    49  │ /usr/bin/blenderplayer       │ 32199344      │
       
    50  │ /usr/bin/mscore              │ 24252992      │
       
    51  │ /usr/bin/mysql_embedded      │ 23004600      │
       
    52  │ /usr/bin/node                │ 18369616      │
       
    53  │ /usr/bin/galax-parse         │ 18365264      │
       
    54  │ /usr/bin/galax-run           │ 18360496      │
       
    55  │ /usr/bin/clementine          │ 16818328      │
       
    56  │ /usr/bin/emacs25-nox         │ 15055112      │
       
    57  │ /usr/bin/doxygen             │ 14924104      │
       
    58  │ /usr/bin/rosegarden          │ 14416952      │
       
    59  │ /usr/bin/snap                │ 13472520      │
       
    60  │ /usr/bin/audacity            │ 13257064      │
       
    61  │ /usr/bin/pgadmin3            │ 13098800      │
       
    62  │ /usr/bin/qemu-system-aarch64 │ 12564688      │
       
    63  │ /usr/bin/qemu-system-arm     │ 12370192      │
       
    64  │ /usr/bin/qemu-system-ppc64   │ 12280864      │
       
    65  │ /usr/bin/qemu-system-ppc     │ 11738208      │
       
    66  │ /usr/bin/qemu-system-x86_64  │ 11658464      │
       
    67  │ /usr/bin/qemu-system-i386    │ 11623776      │
       
    68  ╰──────────────────────────────┴───────────────╯
       
    69 Record count: 20]]></m:pre>
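		<p>
			For comparison, the same question can be answered without the index by scanning the live filesystem.
			The following is only a sketch: the <code>largest_files</code> function is a hypothetical helper (not part of Relational pipes),
			it assumes GNU <code>find</code> (for the <code>-printf</code> option) and paths without newline characters.
			Unlike the SQL query, it needs the filesystem to be mounted and walks it again on every run.
		</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not a relpipe tool):
# list the N largest regular files under the given directories,
# one "size<TAB>path" record per line, largest first.
largest_files() {
	local limit="$1"
	shift
	# -type f corresponds to the "type = 'f'" condition in the SQL query.
	# GNU find prints the size in bytes (%s) and the path (%p).
	find "$@" -type f -printf '%s\t%p\n' \
		| sort -rn \
		| head -n "$limit"
}

# Usage: the 20 largest files in the directories indexed above.
largest_files 20 /bin/ /usr/bin/
```

		<p>
			The index in <code>bin.sqlite</code> gives the same kind of answer from a single file, even when the original disk is not available.
		</p>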
       
    70 
       
    71 		<p>
       
    72 			We can also collect additional metadata and append it to our index file.
       
    73 			In this example, we get lists of dynamically linked libraries using the <code>ldd</code> tool
       
    74 			for each binary and store the lists in our index:
       
    75 		</p>
       
    76 
       
    77 		<m:pre jazyk="bash"><![CDATA[relpipe-in-sql \
       
    78 		--file bin.sqlite \
       
    79 		--relation bin "SELECT path FROM program WHERE type = 'f'" \
       
    80 	| relpipe-out-nullbyte \
       
    81 	| while read_nullbyte f; do 
       
    82 		ldd "$f" | perl -ne 'if (/ => (.*) \(/) { print "$ENV{f},$1\n"; }';
       
    83 	  done \
       
    84 	| relpipe-in-csv \
       
    85 		"dependency" \
       
    86 			"program" string \
       
    87 			"library" string \
       
    88 	| relpipe-tr-sql --file bin.sqlite]]></m:pre>
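		<p>
			The loop above calls <code>read_nullbyte</code>, which is not a standard shell command and is not defined in this example.
			A minimal sketch of such a helper follows, assuming Bash and assuming that the helper should export the variables it fills:
			the <code>perl</code> filter reads the program path through <code>$ENV{f}</code>, so the variable has to be visible in the environment of child processes.
		</p>

```shell
#!/usr/bin/env bash
# Sketch of the helper (an assumption about its behavior, see above):
# read NUL-terminated values from standard input into the named variables
# and export them, so that child processes can see them.
read_nullbyte() {
	local IFS=        # an empty IFS keeps leading and trailing whitespace
	local _name
	for _name in "$@"; do
		export "$_name"
		# -d '' makes the NUL byte the record delimiter (Bash-specific)
		read -r -d '' "$_name" || return 1
	done
}

# Usage: each record is visible to the child process through its environment.
printf 'first\0second value\0' \
	| while read_nullbyte f; do
		sh -c 'printf "%s\n" "$f"'
	done
```

		<p>
			The function returns a non-zero status at the end of the input, which terminates the <code>while</code> loop.
		</p>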
       
    89 	
       
    90 		<p>And then we can run a “popularity contest” and find the 20 most frequently used libraries:</p>
       
    91 		
       
    92 		<m:pre jazyk="bash"><![CDATA[relpipe-in-sql \
       
    93 	--file bin.sqlite \
       
    94 	--relation "popular_libraries" "
       
    95 		SELECT 
       
    96 			d.library, 
       
    97 			count(*) AS count 
       
    98 		FROM dependency AS d 
       
    99 		JOIN program AS p ON (d.program = p.path) 
       
   100 		GROUP BY library
       
   101 		ORDER BY count DESC
       
   102 		LIMIT 20" \
       
   103 	| relpipe-out-tabular]]></m:pre>
       
   104 		
       
   105 		<p>Well, well… here we are:</p>
       
   106 		
       
   107 		
       
   108 		<m:pre jazyk="text"><![CDATA[popular_libraries:
       
   109  ╭────────────────────────────────────────────┬────────────────╮
       
   110  │ library                           (string) │ count (string) │
       
   111  ├────────────────────────────────────────────┼────────────────┤
       
   112  │ /lib/x86_64-linux-gnu/libc.so.6            │ 2508           │
       
   113  │ /lib/x86_64-linux-gnu/libpthread.so.0      │ 1487           │
       
   114  │ /lib/x86_64-linux-gnu/libdl.so.2           │ 1364           │
       
   115  │ /lib/x86_64-linux-gnu/libm.so.6            │ 1271           │
       
   116  │ /lib/x86_64-linux-gnu/librt.so.1           │ 1057           │
       
   117  │ /lib/x86_64-linux-gnu/libz.so.1            │ 1019           │
       
   118  │ /lib/x86_64-linux-gnu/libgcc_s.so.1        │ 811            │
       
   119  │ /lib/x86_64-linux-gnu/libpcre.so.3         │ 788            │
       
   120  │ /lib/x86_64-linux-gnu/liblzma.so.5         │ 749            │
       
   121  │ /usr/lib/x86_64-linux-gnu/libstdc++.so.6   │ 742            │
       
   122  │ /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 │ 681            │
       
   123  │ /lib/x86_64-linux-gnu/libbsd.so.0          │ 658            │
       
   124  │ /usr/lib/x86_64-linux-gnu/libXau.so.6      │ 648            │
       
   125  │ /usr/lib/x86_64-linux-gnu/libXdmcp.so.6    │ 648            │
       
   126  │ /usr/lib/x86_64-linux-gnu/libxcb.so.1      │ 648            │
       
   127  │ /usr/lib/x86_64-linux-gnu/libX11.so.6      │ 638            │
       
   128  │ /usr/lib/x86_64-linux-gnu/libpng16.so.16   │ 622            │
       
   129  │ /lib/x86_64-linux-gnu/libgpg-error.so.0    │ 616            │
       
   130  │ /lib/x86_64-linux-gnu/libgcrypt.so.20      │ 613            │
       
   131  │ /usr/lib/x86_64-linux-gnu/liblz4.so.1      │ 575            │
       
   132  ╰────────────────────────────────────────────┴────────────────╯
       
   133 Record count: 20]]></m:pre>
       
   134 
       
   135 		<p>
       
   136 			Future versions might offer an option to gather more file metadata, such as hashes or Exif data.
       
   137 			But even in the current version, it is possible to gather literally any metadata using a custom script (as we have shown with <code>ldd</code> above).
       
   138 			Extended attributes are already supported (the <code>--xattr</code> option).
       
   139 		</p>
       
   140 		
       
   141 	</text>
       
   142 
       
   143 </stránka>