Under review

Chris21 import unexpectedly hangs

Adrian Corston 3 months ago in UNIFYBroker/Frontier ichris/chris21 updated 3 months ago 5

A chris21 DET import has hung, seemingly forever. Is there some way to find out what caused it to hang? Could we have a timeout added so it automatically recovers? Right now there's no way to find out there's an issue until the customer rings to ask why processing has stopped.

Exactly the same behaviour happened again yesterday at 15:00 UTC, so could you please escalate this to an urgent Bug rather than a Question?

The customer is planning a demonstration of this proof-of-concept environment to their management on Friday, so it needs to be working 100% before then or else the concept will not be proven :-)


Hi Adrian

The chris21 agent does have a timeout, and yours appears to be set to 1 hour. This is a per-request timeout, not a timeout for the whole import, so I'd recommend reducing it. The bulk of the requests made during a full import will be for chunks of 1000 users (the chunk size set on your connector), so choose a timeout that suits requests of this size in your environment.

There's nowhere that suppresses timeout errors, so if one occurs, the import should immediately fail.
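To illustrate why a per-request timeout matters here, the rough sketch below estimates how long a chunked import could run before a timeout would stop it. It assumes the requests are made one after another and uses an illustrative total of 20,000 entities; the function name `worst_case_import_seconds` is hypothetical and is not part of UNIFYBroker or the chris21 agent.

```python
import math


def worst_case_import_seconds(total_entities, chunk_size, per_request_timeout_s):
    """Upper bound on import duration when every request runs just under its timeout.

    Assumes sequential requests, one per chunk (an assumption for illustration).
    """
    request_count = math.ceil(total_entities / chunk_size)
    return request_count * per_request_timeout_s


# Illustrative figures: 20,000 entities in chunks of 1000 is 20 requests.
one_hour = worst_case_import_seconds(20_000, 1000, 60 * 60)
ten_minutes = worst_case_import_seconds(20_000, 1000, 10 * 60)

print(one_hour / 3600)    # hours of worst-case runtime with a 1 hour timeout
print(ten_minutes / 60)   # minutes of worst-case runtime with a 10 minute timeout
```

Under these assumptions, a 1 hour per-request timeout allows an import to run for up to 20 hours without any single request timing out, while a 10 minute timeout caps it at 200 minutes.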

Thanks Beau.  I've set my timeout to 10 minutes instead of one hour.

In the UNIFYBroker log (private attachment above) I see the following chunking messages (each chunk being a request):

After that, there are no chunk log entries recorded until the service is forcibly restarted.  So the Chris21 timeout does not appear to have worked correctly - hours after the last request started the connector import is still running, and Cancel Import does not stop it from running.  The connector normally returns about 20,000 entities.  Are you able to make it so that the timeout works?

The timeout is part of the underlying web request framework, so it's not something we control, other than setting it. It definitely looks to be getting set, though.

I did identify a possible threading issue with the chunking requests code, which would have the same symptoms as a non-functional timeout. I've provided a patch to David.
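The actual defect and patch are not described in this thread. As a generic illustration only, the sketch below shows one way a threading problem can look like a broken timeout: the request timeout can be working, but if the code collecting chunk results waits indefinitely on a worker that has silently stopped, the import never finishes. All names here (`producer`, `consume`, `run_import`, `ImportStalled`) are hypothetical and do not come from UNIFYBroker.

```python
import queue
import threading


class ImportStalled(Exception):
    """Raised when no chunk arrives within the allowed wait."""


def producer(results, chunks_before_stall):
    # Hypothetical worker: delivers some chunks, then stops
    # without signalling completion or failure.
    for i in range(chunks_before_stall):
        results.put(f"chunk-{i}")


def consume(results, expected_chunks, wait_s):
    received = []
    for _ in range(expected_chunks):
        try:
            # A bare results.get() would block forever once the worker stops.
            # Bounding the wait turns a silent hang into a visible error.
            received.append(results.get(timeout=wait_s))
        except queue.Empty:
            raise ImportStalled(
                f"no chunk after {wait_s}s; got {len(received)} of {expected_chunks}"
            ) from None
    return received


def run_import(expected_chunks, chunks_before_stall, wait_s=0.2):
    results = queue.Queue()
    worker = threading.Thread(
        target=producer, args=(results, chunks_before_stall), daemon=True
    )
    worker.start()
    return consume(results, expected_chunks, wait_s)
```

Calling `run_import(5, 2)` raises `ImportStalled` after the second chunk instead of waiting forever, which is the kind of failure that can be logged or alerted on.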

Awesome thanks.  I'll get David to install it after the customer's internal demo on Friday, so we can try it out.