README.txt

This Poller class demonstrates access to poll(2) functionality in Java.

Requires the Solaris production (native threads) JDK 1.2 or later; currently
the C code compiles only on Solaris (SPARC and Intel).

Poller.java is the class; Poller.c is the supporting JNI code.

PollingServer.java is a sample application which uses the Poller class
to multiplex sockets.

SimpleServer.java is the functional equivalent that does not multiplex
but instead uses a single thread to handle each client connection.

Client.java is a sample application to drive against either server.

To build the Poller class and client/server demo :

    javac PollingServer.java Client.java
    javah Poller
    cc -G -o libpoller.so -I ${JAVA_HOME}/include -I ${JAVA_HOME}/include/solaris \
        Poller.c

You will need to set the environment variable LD_LIBRARY_PATH to search
the directory containing libpoller.so.
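
For example, if libpoller.so was built in the current directory (the path
shown is illustrative) :

    setenv LD_LIBRARY_PATH `pwd`                     (csh)
    LD_LIBRARY_PATH=`pwd`; export LD_LIBRARY_PATH    (sh/ksh)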

To use the client/server demo, raise your file descriptor limit to handle the
connections you want (root access is needed to go beyond 1024). For information
on changing your file descriptor limit, type "man limit". If you are using
Solaris 2.6 or later, a regression in loopback read() performance may hit you
at low numbers of connections, so run the client on another machine.
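
For example, to raise the limit to 8192 descriptors (the value is illustrative,
and going beyond 1024 requires root) :

    limit descriptors 8192        (csh)
    ulimit -n 8192                (sh/ksh)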

Basics of Poller class usage :
Run "javadoc Poller" or see Poller.java for more details.

  {
    Poller Mux = new Poller(65535); // allow it to contain 64K IO objects

    int fd1 = Mux.add(socket1, Poller.POLLIN);
    ...
    int fdN = Mux.add(socketN, Poller.POLLIN);

    int[] fds = new int[100];
    short[] revents = new short[100];

    int numEvents = Mux.waitMultiple(100, fds, revents, timeout);

    for (int i = 0; i < numEvents; i++) {
      /*
       * Probably need a more sophisticated mapping scheme than this!
       */
      if (fds[i] == fd1) {
        System.out.println("Got data on socket1");
        socket1.getInputStream().read(byteArray);
        // Do something based upon the state of the fd1 connection
      }
      ...
    }
  }

Poller class implementation notes :

Currently all add(), remove(), isMember(), and waitMultiple() methods
are synchronized for each Poller object. If one thread is blocked in
pObj.waitMultiple(), another thread calling pObj.add(fd) will block
until waitMultiple() returns. There is no mechanism provided to
interrupt waitMultiple(), since one would normally expect a ServerSocket
to be in the list being waited on (see PollingServer.java), in which case
the arrival of a new connection will itself cause waitMultiple() to return.

One might also need to interrupt waitMultiple() in order to remove()
fds/sockets, in which case one could create a Pipe or loopback localhost
connection (at the level of PollingServer) and write() to that
connection to interrupt the wait. Or, better, one could queue up deletions
until the next return of waitMultiple(), as sketched below. Or one could
implement an interrupt mechanism in the JNI C code using a pipe(), and
expose it at the Java level.
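
A minimal sketch of the queued-deletion approach, done at the application
level (the pendingRemovals list and requestRemove() helper are illustrative,
not part of the Poller class; check "javadoc Poller" for the exact remove()
signature) :

    // Collect removal requests here rather than calling remove() directly,
    // since remove() would block while another thread is in waitMultiple().
    List pendingRemovals = Collections.synchronizedList(new ArrayList());

    // Any thread may request a removal at any time.
    void requestRemove(Object sock) {
        pendingRemovals.add(sock);
    }

    // The polling thread drains the queue each time waitMultiple() returns,
    // when no thread is blocked inside the Poller object.
    int numEvents = Mux.waitMultiple(100, fds, revents, timeout);
    synchronized (pendingRemovals) {
        for (Iterator it = pendingRemovals.iterator(); it.hasNext(); ) {
            Mux.remove(it.next());
            it.remove();
        }
    }
    // ... then process the numEvents entries in fds/revents as usual.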

If frequent deletions/re-additions of socks/fds are to be done with
very large sets of monitored fds, the Solaris 7 kernel cache will
likely perform poorly without some tuning. One could differentiate
between deleted (no longer cared for) fds/socks and those that are
merely being disabled while data is processed on their behalf. In
that case, re-enabling a disabled fd/sock could put it back in its
original position in the poll array, thereby improving the kernel
cache performance. This would best be done in Poller.c. Of course
this is not necessary for optimal /dev/poll performance.

Caution : the next paragraph gets a little technical, for the
benefit of those who already understand poll()ing fairly well. Others
may choose to skip over it and go on to the notes on the demo server.

An optimal solution for frequent enabling/disabling of socks/fds
could involve a separately synchronized structure of "async"
operations. Using a simple array (0..64k) containing the action
(ADD, ENABLE, DISABLE, NONE), the events, and the index into the poll
array, and having nativeWait() wake up in the poll() call periodically
to process these async operations, I was able to speed up performance
of the PollingServer by a factor of 2x at 8000 connections. Of course,
much of that gain came from the fact that I could (with the advent of
an asyncAdd() method) move the accept() loop into a separate thread
from the main poll() loop, and avoid the overhead of calling poll()
with up to 7999 fds just for an accept. In implementing the async
Disable/Enable, a further large optimization was to auto-disable fds
with events available (before returning from nativeWait()), so I could
just call asyncEnable(fd) after processing (read()ing) the available
data. This removed the need for the inefficient gang scheduling that the
attached PollingServer uses. In order to synchronize the async structure
separately, yet still be able to operate on it from within nativeWait(),
synchronization had to be done at the C level. Due to the new complexities
this introduced, as well as the fact that it was tuned specifically for the
Solaris 7 poll() improvements (not /dev/poll), this extra logic was left
out of this demo.


Client/Server Demo Notes :

Do not run the sample client/server with high numbers of connections
unless you have a lot of free memory on your machine, as it can saturate
the CPU and lock you out of CDE simply because of its resource-intensive
nature (much more so the SimpleServer than the PollingServer).

Different OS versions behave very differently with respect to poll()
performance (and the availability of /dev/poll) but, generally, real-world
applications "hit the wall" much earlier when a separate thread is used
to handle each client connection. Issues of thread synchronization and
locking granularity become performance killers. There is some overhead
associated with multiplexing, such as keeping track of the state of each
connection; as the number of connections gets very large, however, this
overhead is more than made up for by the reduced synchronization overhead.

As an example, running the servers on a Solaris 7 PC (2 x 350 MHz Pentium II
CPUs) with 1 GB RAM, and the client on an Ultra-2, I got the following
times (shorter is better) :

  1000 connections :

    PollingServer took 11 seconds
    SimpleServer  took 12 seconds

  4000 connections :

    PollingServer took 20 seconds
    SimpleServer  took 37 seconds

  8000 connections :

    PollingServer took 39 seconds
    SimpleServer  took 1 minute 48 seconds

This demo is not, however, meant to be taken as proof
that multiplexing with the Poller class will gain you performance; this
code is actually heavily biased towards the non-polling server, as
very little synchronization is done and most of the overhead is in the
kernel IO for both servers. Use of multiplexing may be helpful in
many, but certainly not all, circumstances.

Benchmarking a major Java server application that can run either
in single-thread-per-client mode or with the new Poller class showed that
Poller provided a 253% improvement in throughput at a moderate load, as
well as a 300% improvement in peak capacity. It also yielded a 21%
smaller memory footprint at the lower load level.

Finally, there is code in Poller.c to take advantage of /dev/poll
on OS versions that have that device; however, DEVPOLL must be defined
when compiling Poller.c (and it must be compiled on a machine with
/usr/include/sys/devpoll.h) in order to use it. Code compiled with DEVPOLL
turned on will still work on machines that don't have kernel support for
the device, as it will fall back to using poll() in those cases.
Currently /dev/poll does not correctly return an error if you attempt
to remove() an object that was never added, but this should be fixed
in an upcoming /dev/poll patch. The binary as shipped is not built with
/dev/poll support, as our build machine does not have devpoll.h.
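
For example, on a machine that does have /usr/include/sys/devpoll.h, the
library can be rebuilt with /dev/poll support by adding the define to the
compile line shown earlier :

    cc -G -DDEVPOLL -o libpoller.so -I ${JAVA_HOME}/include \
        -I ${JAVA_HOME}/include/solaris Poller.c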