[WT-support] Network LogSync problems...
Mario Lorenz
ml-wt at vdazone.org
Wed Sep 4 17:18:47 CEST 2013
On 04 Sep 2013, at 13:30:19, Laurent HAAS - F6FVY wrote:
> Hi Mario et al.
>
> On 04/09/2013 00:44, Mario Lorenz wrote:
>
> >Question: At this year's IARU HF, we saw severe problems with
> >network logsync. Essentially, QSO numbers didn't match at all, there was
> >quite significant traffic, and things were so bad that chat messages
> >even vanished on the local LAN.
Hello Larry,
what actually happened at DA0HQ (as far as I could reproduce it) was
that one machine was disconnected from the net when I sent the
CLEARLOGNOW remote commands to clear out any test QSOs or other QSOs
before the contest started.
When this machine then reconnected, several hundred invalid QSOs
started synching around.
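To illustrate the effect, here is a toy model. It is only a sketch under the
assumption that logsync behaves roughly like a set union between nodes; the
names Node, clear_log and sync_with are made up for this example and are not
Win-Test code.

```python
# Toy model only: assumes logsync behaves like a set union between nodes.
# Node, clear_log and sync_with are hypothetical names, not Win-Test APIs.

class Node:
    def __init__(self, name, qsos=()):
        self.name = name
        self.online = True
        self.qsos = set(qsos)

    def clear_log(self):
        # A remote clear command only reaches nodes that are online.
        if self.online:
            self.qsos.clear()

    def sync_with(self, other):
        # Union merge: each side ends up with everything either side had.
        if self.online and other.online:
            merged = self.qsos | other.qsos
            self.qsos = set(merged)
            other.qsos = set(merged)


def simulate():
    test_qsos = {("B", n) for n in range(1, 301)}  # 300 pre-contest test QSOs
    a = Node("A", test_qsos)
    b = Node("B", test_qsos)
    b.online = False           # B is off the net when the clear is sent
    for node in (a, b):
        node.clear_log()       # only A actually clears
    after_clear = len(a.qsos)
    b.online = True            # B reconnects with its stale log
    a.sync_with(b)             # the stale QSOs flow back to A
    return after_clear, len(a.qsos)
```

In this model, A is empty right after the clear, and holds all of the stale
QSOs again as soon as B reconnects and syncs.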
Now for scoring, the QSOs were easy to identify and to delete, but
nevertheless this situation
a) destabilized the Tunnel server links - I expected that much -
b) caused chat messages to get lost even on the local LAN, and then
likely ADDQSOs as well, adding to the inconsistency and thus to the problem.
Point b) was a surprise that I didn't expect, given that we have a
100M LAN and reasonably fast (P4/2.8GHz) machines.
> >Have there been any changes in the networking / logsync code in the last
> >versions? (I'm not doing this test for each version, but I'd wager that
> >with 4.7 and possibly (not sure) 4.9 we didn't observe this.)
>
> There has been no major modification on this part of the code,
> except the various improvements regarding the bridgehead concept to
> decrease network traffic in the LANs.
Hm... OK. It was just a question... I might try rolling back to 4.8 or
some such anyway.
> Actually, the sync code and concept are not tailored at all to run
> the test case you described (and tried): having one "master log"
> only, of several thousand QSOs, and 11 (!!!) other *empty* logs to
> sync together is not a realistic view of what happens in
> real life.
Oh, I don't know, in view of what I wrote above. Granted, in a normal
contest this is rather unlikely to happen this way since one doesn't have
to shepherd some 40+ distributed machines, but again, in the LAN case,
even copying over whole QSO databases ought to generate less traffic.
Nevertheless, the system should remain stable...
> This generates HUGE traffic, and since the frames are UDP
> only, as you know, no doubt some of your chit-chat (gab) frames were
> lost!
Then perhaps some traffic could be rate-limited?
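What I have in mind is something like a token bucket on the bulk sync
frames, so that chat and new-QSO frames still get through. A minimal sketch,
not based on the actual Win-Test code: TokenBucket, rate and burst are names
and parameters I am making up here.

```python
import time


class TokenBucket:
    """Hypothetical limiter for bulk sync frames (not a Win-Test API)."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens (frames) refilled per second
        self.burst = float(burst)    # maximum number of stored tokens
        self.clock = clock           # injectable clock, handy for testing
        self.tokens = float(burst)   # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        # Refill according to elapsed time, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Spend tokens if enough are available, otherwise refuse the frame.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A sender would call allow() before each bulk sync frame and defer the frame
when it returns False, while chat frames would bypass the bucket entirely.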
> Also, unless I misunderstood, disconnecting
> the "master log" while none of the 11 other machines is in sync will
> definitely lead to incomplete logs. But, once again, it has _not_
> been designed for this kind of configuration.
How come? I thought the rule was that if some node sees a missing QSO,
it tries to fetch it from the originating node, but if that's not available,
it will also sync from other nodes?
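In other words, I assumed the lookup works roughly as sketched here. This
only reflects my understanding of the rule, not the real implementation;
fetch_missing_qso and its arguments are hypothetical.

```python
def fetch_missing_qso(qso_id, origin, peers, logs, online):
    """Return (qso, source_node), or (None, None) if no reachable node has it.

    Hypothetical helper: `logs` maps node name -> {qso_id: qso},
    `online` is the set of currently reachable node names.
    """
    # Ask the originating node first, then fall back to every other peer.
    candidates = [origin] + [p for p in peers if p != origin]
    for node in candidates:
        if node in online and qso_id in logs.get(node, {}):
            return logs[node][qso_id], node
    return None, None
```

With this rule, a QSO is lost only if the originating node is offline and
no other reachable node has already received a copy of it.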
> I understand the sync process could be improved or even redesigned
> to fit this kind of configuration / usage, but I will personally not
> take this risk for such rare occasions. Also, I'm not the original
> author of this part of the Win-Test code, which doesn't help.
I'd be willing to help, if needed...
73s,
Mario
--
Mario Lorenz Internet: <ml at vdazone.org>
Ham Radio: DL5MLO at DB0ERF.#THR.DEU.EU