[wt4hq] A bit late...
Mario Lorenz
ml at vdazone.org
Fri Aug 17 00:23:19 CEST 2007
... but I've finally gotten around to writing down a few
of our impressions from the IARU contest.
So hello everybody;
this is to prove the old proverb that one shouldn't postpone
what ought to be done immediately, but well...
The issues with the tunnel server and client have already
been discussed and addressed; the memory leaks in
dxTelnet and the tunnel client have also already been
mentioned and will hopefully be fixed.
Having to start the contest without logsync, and seeing the logs
diverge even on the local LAN again, I tried several times to
get logsync back on, with several strategies.
First of all, with only the regular network protocol, without
logsync, there *is* QSO loss on the LAN. This was reported
by other stations too.
I have no plausible explanation for that, but I therefore suggest
thinking about moving the protocol to something TCP based, to
prevent this type of loss.
(P2P apps routinely have more TCP connections open than would
be required even for large WT setups; if there are still concerns,
the network protocol could elect a set of Super-Nodes to handle
the traffic in a hierarchy)
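
As a rough illustration of what "something TCP based" could mean on the
wire, here is a minimal sketch of length-prefixed message framing over a
stream socket. It is not WT's actual protocol: the 4-byte length header,
the function names and the example payloads are all assumptions made for
this sketch. The point is only that TCP delivers the bytes reliably and
in order, so the application merely has to mark message boundaries.

```python
import socket
import struct

# Assumed framing for this sketch: 4-byte big-endian length, then payload.
HEADER = struct.Struct("!I")


def send_msg(sock, payload: bytes) -> None:
    """Send one message as <length><payload>; sendall retries partial writes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock) -> bytes:
    """Receive one complete message that was sent with send_msg()."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

A lost connection shows up as an error on the sender or receiver, so a
station at least knows that it has to resend or resync, which is not the
case when a single broadcast datagram silently disappears.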
Logsync. Sigh. I am not sure if the current approach will
ever scale to networks the size we usually have, with running,
support, and hot standby stations, all live in the net.
First experiments with turning logsync back on happened at night,
after I had finally found the server bug.
With all these stations already having roughly 12000 Q's but most
of them missing a different few, turning on logsync caused
IHAVE/IHAVELAN announcements to flood the LAN at a rate that
drove our tunnel client computer (a 500MHz box) to a CPU
utilization of ~80%, with operators sitting in front
of similar computers complaining about WT reacting slowly.
Fortunately we had a new high-speed internet uplink, but even with
the filtering employed by the tunnel client, I'm afraid we
pretty much flooded the server.
We had to turn it back off. As I said, I tried this several times,
with several combinations of stations having logsync on or off, but
the ensuing flood of traffic made it difficult even to turn logsync off
on the remote stations via remote commands.
I do not think the current logsyncing approach will ever work unless
the number of involved stations is low, all of the stations are mostly
in sync, high bandwidth links are available, or all of the above.
I do wonder if it would not be better to turn off logsync
altogether, use only the normal network updating, and have all stations
on a site regularly save a backup of their log into one (shared) directory,
where these could be merged automatically into one merged log,
which all stations would then fetch from that directory and import back
into their local databases (preferably in the background so that running
the contest is not affected). This approach would hopefully eliminate
the logsync network protocol, and produce accurate global log files
to fix cases where ADDQSO's got lost, at a fraction of the bandwidth
involved, and in a much more controllable way.
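
To make the idea a bit more concrete, here is a minimal sketch of such a
merge step. It rests on assumptions that are not part of WT: each station
is assumed to drop a CSV export with the hypothetical columns utc, band,
mode, call and station into the shared directory, with utc as an ISO 8601
string so that plain string sorting is chronological. The file names and
the duplicate key are likewise made up for illustration.

```python
import csv
import os
import tempfile
from pathlib import Path

# Hypothetical export columns; WT's real log format is not assumed here.
FIELDS = ["utc", "band", "mode", "call", "station"]


def qso_key(row):
    """Two rows with the same time, band, mode and call count as one QSO."""
    return (row["utc"], row["band"], row["mode"], row["call"].upper())


def merge_logs(shared_dir, merged_name="merged.csv"):
    """Merge all station exports in shared_dir into one de-duplicated log."""
    shared = Path(shared_dir)
    merged = {}
    for path in sorted(shared.glob("*.csv")):
        if path.name == merged_name:
            continue  # never feed the previous merge result back in
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                merged.setdefault(qso_key(row), row)  # first copy wins
    rows = sorted(merged.values(), key=lambda r: r["utc"])

    # Write to a temp file and rename, so that stations fetching the
    # merged log never see a half-written file.
    fd, tmp = tempfile.mkstemp(dir=str(shared), suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp, shared / merged_name)
    return rows
```

Run periodically (say every few minutes) on one machine, this costs one
file copy per station per interval, regardless of how far the individual
logs have drifted apart.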
Next: Could you please implement a filter for the official contest
start time in logsync?
While we issued remote clearlogall's right before the contest started,
obviously one or two stations were not connected or missed the command,
with the result that test QSOs from the day before got synced into
everyone's log.
While they were of course removed before the log was sent to the ARRL,
they showed up as worked Multis during the contest (when they were not
worked) and also made any statistics/objective/target graphs
meaningless, because the statistics would assume the oldest QSO to be
the beginning of the contest, which happened to have been on the day
before...
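
The kind of filter I have in mind could be as simple as the following
sketch. The record layout (a dict with a "utc" datetime) and the function
names are assumptions for illustration only, not WT internals; the idea
is just that logsync drops anything outside the configured contest
window before it ever reaches the local log.

```python
from datetime import datetime, timezone


def in_contest(qso_time, start, end=None):
    """True if qso_time lies in [start, end); end=None means open-ended."""
    if qso_time < start:
        return False
    return end is None or qso_time < end


def filter_incoming(qsos, start, end=None):
    """Keep only the synced QSOs that fall inside the contest window."""
    return [q for q in qsos if in_contest(q["utc"], start, end)]


if __name__ == "__main__":
    # Example values only: a 24 hour window starting at 1200 UTC.
    start = datetime(2007, 7, 14, 12, 0, tzinfo=timezone.utc)
    end = datetime(2007, 7, 15, 12, 0, tzinfo=timezone.utc)
    incoming = [
        {"call": "TEST1", "utc": datetime(2007, 7, 13, 18, 0, tzinfo=timezone.utc)},
        {"call": "DL1ABC", "utc": datetime(2007, 7, 14, 12, 30, tzinfo=timezone.utc)},
    ]
    for q in filter_incoming(incoming, start, end):
        print(q["call"])
```

With such a check on the receiving side, a station that missed the
clearlogall could still announce its test QSOs, but nobody else would
import them.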
And worst of all: We had several crashes, particularly after the remote
clearlogall, or otherwise during reloading of the contest, in one case
wiping out one .wtb completely and starting over at QSO 1.
So, all in all, this year's WT experience at the IARU
championship was not too good, but we can't really complain, either.
So CU next year!
Mario
--
Mario Lorenz Internet: <ml at vdazone.org>
Ham Radio: DL5MLO at DB0ERF.#THR.DEU.EU
Trust the computer industry to shorten "Year 2000" to Y2K. It was this kind
of thinking that caused the problem in the first place.