Fixed

Op List aborts after approx 6 hours

Peter Wass 8 years ago • updated by anonymous 3 years ago 8

When processing a large op list (89 MAs, each taking a long time), the op list aborts after about 6 hours with System.Threading.ThreadAbortException: Thread was being aborted.

This is causing issues at large sites. There is a workaround of breaking up the op lists so that each one completes within 6 hours. However, there are still issues when the delta imports take longer than that, as this interferes with normal operation of the solution.

Complete log files attached.
Confirming EB Version with client.
Will get an export of the config as well.


EBConfig.zip
Logs.zip

Matthew,

You may be able to supply your experience at DET to assist Peter Wass with this issue. Please work with him to resolve.

Peter,

A few comments based on the above. It may just be me, but I can't see where the complete logs are attached.

I have seen that ThreadAbortException at DET, but it does not seem to interrupt operation lists (it seems to occur when it is attempting to close an unresponsive thread). If a ThreadAbortException is killing an operation list, then I believe this is a bug. That said, it could be something else.

Something to check would be the "timeouts" of the individual operation lists, seen by right-clicking on the Operation -> General Properties -> Retry Settings. By default this is 10 minutes and is fairly arbitrary, but if the time taken is longer than this, Event Broker considers the entire operation list to have failed. Solutions include increasing the timeout (advisable) or setting the "On Failure" for an operation to Continue, rather than Fail.
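
To make the timeout and "On Failure" behaviour described above concrete, here is a rough Python model of it. This is only a sketch of my understanding, not Event Broker code: the Operation class, its field names and run_list are all made up for illustration, and only the 10 minute default and the Continue/Fail options come from the product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Operation:
    # Hypothetical model of one operation in a list (not an Event Broker type).
    name: str
    duration_min: float          # how long the run actually takes
    timeout_min: float = 10      # default timeout mentioned above
    on_failure: str = "Fail"     # "Fail" or "Continue"


def run_list(ops: List[Operation]) -> Tuple[bool, List[str]]:
    """Return (list_succeeded, names of operations that completed)."""
    completed = []
    for op in ops:
        if op.duration_min <= op.timeout_min:
            completed.append(op.name)
        elif op.on_failure == "Continue":
            # Timed out, but the list carries on with the next operation.
            continue
        else:
            # Timed out with On Failure = Fail: the whole list is failed.
            return False, completed
    return True, completed


ops = [
    Operation("MA1 delta import", 5),
    Operation("MA2 delta import", 45),   # longer than the 10 minute default
    Operation("MA3 delta import", 5),
]
print(run_list(ops))
```

With the defaults, the 45 minute import fails the whole list after MA1. Raising that operation's timeout_min above 45 lets all three complete, while switching it to on_failure="Continue" lets the list finish but without MA2.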

Another point is that it is not advisable to create a single operation list containing 87 MAs. The reason for this is that if a single point fails, Event Broker does not resume from the point in the list where it failed; it resumes from the start. There may be several options for restructuring these lists, especially if they are only performing delta operations (for example, the Schedule for each MA could contain the weekly/daily processing that needs to happen for the MA, rather than attempting it all in one hit).
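
As a back-of-the-envelope illustration of why restarting from the top hurts, the sketch below compares total run time for one big list against one list per MA. It assumes the failure is transient (the retry succeeds) and that the failing operation uses its full duration before failing; the function names and the 30 minute figures are invented for the example.

```python
from typing import List


def total_minutes_single_list(durations: List[float], fail_at: int) -> float:
    # One big list: everything up to and including the failing operation is
    # wasted, then the whole list is run again from the start.
    first_attempt = sum(durations[: fail_at + 1])
    rerun = sum(durations)
    return first_attempt + rerun


def total_minutes_split_lists(durations: List[float], fail_at: int) -> float:
    # One list per MA: every list runs once, and only the failed one is rerun.
    return sum(durations) + durations[fail_at]


durations = [30.0] * 10      # ten MAs at 30 minutes each (made-up numbers)
print(total_minutes_single_list(durations, fail_at=8))
print(total_minutes_split_lists(durations, fail_at=8))
```

The later in the list the failure happens, the bigger the gap between the two figures, which is why a single very long list is the worst case.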

Finally, all large operations that could potentially clash should be put on the same thread to avoid database contention within ILM (e.g. if you have pending exports that will be fired off on 7 systems during a large full sync job).
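
The idea of putting clashing operations on the same thread is the general one of serialising work through a single worker. The Python sketch below shows that idea only; it is not how Event Broker is implemented, and run_serialized and the job names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_serialized(names: List[str]) -> List[Tuple[str, str]]:
    """Run one job per name on a single worker thread; return the event log."""
    events: List[Tuple[str, str]] = []

    def job(name: str) -> None:
        events.append(("start", name))
        # ... the heavy database work would happen here ...
        events.append(("end", name))

    # max_workers=1 means a job cannot start until the previous one has
    # finished, so two heavy jobs never hit the database at the same time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for future in [pool.submit(job, n) for n in names]:
            future.result()
    return events


print(run_serialized(["full sync", "pending exports"]))
```

Each job's start and end appear as an adjacent pair in the log, in submission order. With more than one worker the pairs could interleave, which is the contention case the advice above is trying to avoid.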

Let me know if this helps. The Event Broker logs (preferably in Debug mode), as well as the above information you are collecting, will definitely be of assistance.

Work logged, please review

Matt, probably best to reassign back to Peter if you want him to review your log entry.

Peter, Matt put his thoughts against the work log, rather than the comments.

Referred the suggestions back to Mark @ ACT Education to see if changing the timeout value fixes the issue. We were having the issue on deltas as well, so timeout may well be a factor. We'd already discussed breaking up the Op list.

Once I get a response from Mark I'll update the job with more info.

Logs and Configs attached. Version is 2.2.3

Timeout issue - working now.