Not a bug

There were not enough free threads in the ThreadPool to complete the operation

Anthony Soquin, 6 years ago (updated by anonymous 6 years ago)

Hi,

A client reported the following error from PROD Event Broker:

Operation aa1021b4-fb70-447b-b8e2-e625079329f7 failed in operation list with id 975036c6-c301-4b07-92d5-f6757732b0bc for the following reason. This is retry number 0: Unify.Product.EventBroker.RestAPIAgentSendRequestFailedException: The sending of the request failed. See the inner exception for more information. ---> System.InvalidOperationException: There were not enough free threads in the ThreadPool to complete the operation.
   at System.Net.HttpWebRequest.BeginGetRequestStream(AsyncCallback callback, Object state)
   at System.Net.Http.HttpClientHandler.StartGettingRequestStream(RequestState state)
   at System.Net.Http.HttpClientHandler.PrepareAndStartContentUpload(RequestState state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Unify.Product.EventBroker.RestAPIAgent.<SendPostRequestAsync>d__3.MoveNext()
   --- End of inner exception stack trace ---
   at Unify.Product.EventBroker.RestAPIPlugIn.Execute()
   at Unify.Product.EventBroker.OperationListExecutorBase.RunNextOperations(IEnumerator`1 operationEnumerator)

For info:

This error comes from the Event Broker operation list "IDB - STUDENT IMPORT". The operation that failed is "STUDENT UNIT IMPORT", which requests an import run of the Identity Broker connector "Student_Unit".

Moreover, the Event Broker website is no longer accessible.

Configuration:

Event Broker 3.2.0

Identity Broker 5.0.3

Example of the configuration in UAT:

[Image: UAT configuration]


Thanks.

Regards,

Answer
Not a bug

Hi Anthony,

How frequently are you experiencing this error? After you saw it for the first time, has it continued to occur until a service restart or any other recovery actions?

The error simply suggests that at the time the operation started, there were too many other operations running in parallel, exhausting the thread pool. As other operations complete, threads should free up and the issue should resolve itself automatically.
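To illustrate why the error should clear on its own, here is a minimal Python sketch of a pool that, like the failing call in the stack trace, refuses new work when every worker is busy instead of queueing it. `FailFastPool` and `demo` are hypothetical names; this is an analogy, not how the .NET ThreadPool or Event Broker is implemented.

```python
import threading


class FailFastPool:
    """Hypothetical pool that rejects work when every worker slot is busy.

    An analogy for the .NET error message only; not the .NET ThreadPool.
    """

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def try_start(self, work, *args) -> threading.Thread:
        # Non-blocking acquire: fail immediately instead of waiting for a slot.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("There were not enough free threads to complete the operation")

        def run() -> None:
            try:
                work(*args)
            finally:
                self._slots.release()  # the slot frees up when the operation completes

        thread = threading.Thread(target=run)
        thread.start()
        return thread


def demo() -> list[str]:
    pool = FailFastPool(size=2)
    gate = threading.Event()
    outcomes = []
    # Two long-running operations occupy every slot.
    busy = [pool.try_start(gate.wait) for _ in range(2)]
    try:
        pool.try_start(lambda: None)
        outcomes.append("started")
    except RuntimeError:
        outcomes.append("rejected")
    gate.set()  # let the running operations finish
    for thread in busy:
        thread.join()
    # The same request now succeeds because slots have been released.
    pool.try_start(lambda: None).join()
    outcomes.append("started")
    return outcomes


print(demo())  # ['rejected', 'started']
```

Once the two long-running operations finish and release their slots, the same request succeeds, which is the self-resolving behaviour described above.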

Curtis, I have spoken to Adam V.

We have confirmed that the issue has occurred twice in 4 days. The Event Broker service was restarted 4 days ago and again yesterday. We have agreed with Adam V to request another restart and to determine whether we can add more memory and space out the sync operations.

As discussed, perform a restart and monitor over the next few days. Consider increasing RAM if it is constrained; otherwise, schedule tasks to run less frequently, or run fewer operations at the same time (exclusion groups).
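The "fewer operations at the same time" option can be sketched as a concurrency cap. In this hypothetical Python example, `max_concurrent` plays the role of the limit: surplus operations wait for a free worker rather than all competing for threads at once. It is not Event Broker's scheduler or configuration, only an illustration of the principle.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_operations: int, max_concurrent: int) -> int:
    """Run hypothetical operations through a capped pool and report peak overlap."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def operation(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for an import or sync run
        with lock:
            running -= 1

    # max_workers is the cap: extra operations queue up instead of
    # demanding more threads at the same moment.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        list(pool.map(operation, range(num_operations)))
    return peak


print(peak_concurrency(num_operations=20, max_concurrent=3))  # never exceeds 3
```

All 20 operations still complete; they simply take turns, trading some latency for a bounded demand on threads and memory.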

Exceptions themselves are not an issue and do not automatically equal a bug in the product. In this case it may just be that the machine is not spec'd properly for the number of concurrent operations that are being performed.

As the errors are so infrequent, and MIM Event Broker has been designed to retry failed operations, the issue should resolve itself in subsequent retries.
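The retry behaviour mentioned above can be sketched as a wrapper that logs each failure and tries again after a growing delay. `run_with_retries`, `max_retries` and `base_delay` are hypothetical; Event Broker's actual retry count and timing are not described in this thread. The log line only mimics the "This is retry number 0" wording from the original error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(operation: Callable[[], T], max_retries: int = 3,
                     base_delay: float = 0.01) -> T:
    """Hypothetical retry wrapper; not Event Broker's actual retry policy."""
    for retry in range(max_retries + 1):
        try:
            return operation()
        except RuntimeError as exc:
            print(f"Operation failed. This is retry number {retry}: {exc}")
            if retry == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** retry))  # exponential backoff
    raise AssertionError("unreachable when max_retries >= 0")


attempts = {"n": 0}


def flaky() -> str:
    # Fails twice (as if the pool were exhausted), then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("not enough free threads")
    return "imported"


print(run_with_retries(flaky))  # two failure lines, then "imported"
```

The backoff gives running operations time to finish and free their threads before the next attempt.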

With regards to the UI not being responsive after/during the occurrence of this issue, turn the built-in web server off and use IIS in its place. The built-in web server is deprecated and must also compete for shared resources with MIM Event Broker.

Dear Product team,

We have just experienced a new occurrence of the problem.


Below is a summary of what we know:

On the 18/11 : the following patches have been applied

ComputerName Date                KB        Title                                                                       
------------ ----                --        -----                                                                       
FIMSYNC01    18/11/2017 3:13:... KB4048958 2017-11 Security Monthly Quality Rollup for Windows Server 2012 R2 for x6...
FIMSYNC01    18/11/2017 3:05:... KB4047206 Cumulative Security Update for Internet Explorer 11 for Windows Server 20...
FIMSYNC01    18/11/2017 3:03:... KB890830  Windows Malicious Software Removal Tool for Windows 8, 8.1, 10 and Window...


On Monday 20/11 (exact time TBC): the Event Broker stopped working (service still running, not processing).

Action taken: service restarted.


On Wednesday 22/11 (exact time TBC): the Event Broker stopped working (service still running, not processing).

Action taken: service restarted.


On Thursday 23/11 (exact time TBC): the Event Broker stopped working (service still running, not processing).

Action taken: service restarted.


Next steps: product team to advise.

Thanks for the update, Paul. Could you please clarify what you mean by "not processing"? Do you mean that the error is recurring, that operations are not executing, that there is no activity in the logs at all, or that the website is not responding? This may or may not be related to the original issue raised on this ticket ("There were not enough free threads in the ThreadPool to complete the operation").

Could you please zip up the Extensibility directory and the Log files and attach them to this ticket?

If the website is not responding, please see Adam's previous comment regarding switching to IIS (see also: Configuring MIM Event Broker for use with IIS).

Hi Curtis,

When Paul said "not processing", he meant that the service is "running" but blocked, in a waiting state, in an infinite loop, etc. As a consequence, Event Broker doesn't request new actions from FIM.

As per my previous comment, that sounds consistent with the logs. It's not that Event Broker isn't processing; it's that the operations are frequently blocking each other. Over the last 5 days I count 25,000 occurrences of an operation failing to start because it was blocked by the exclusion group, which contains 27 operation lists. Please see Operation List Groups for advice on exclusion groups. Could you try limiting the number of operation lists in each exclusion group and monitor whether the issue persists?
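For scale, 25,000 blocked starts over 5 days is about 5,000 per day, or roughly 3.5 per minute. The minimal Python model below shows why one large exclusion group produces so many. It assumes that at most one operation list per group runs at a time and that a blocked start is simply counted and skipped; `ExclusionScheduler` and `started_in_one_tick` are hypothetical names, and the round-robin group assignment is only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExclusionScheduler:
    """Hypothetical model: at most one operation list per exclusion group runs at once."""

    group_of: dict[str, str]  # operation list name -> exclusion group name
    running_groups: set[str] = field(default_factory=set)
    blocked_starts: int = 0

    def try_start(self, op_list: str) -> bool:
        group = self.group_of[op_list]
        if group in self.running_groups:
            self.blocked_starts += 1  # would be logged as "blocked by exclusion group"
            return False
        self.running_groups.add(group)
        return True

    def finish(self, op_list: str) -> None:
        self.running_groups.discard(self.group_of[op_list])


def started_in_one_tick(num_lists: int, num_groups: int) -> tuple[int, int]:
    """Every list tries to start at once; return (started, blocked)."""
    names = [f"list{i}" for i in range(num_lists)]
    # Round-robin assignment of operation lists to groups (illustrative only).
    scheduler = ExclusionScheduler({n: f"group{i % num_groups}" for i, n in enumerate(names)})
    started = sum(scheduler.try_start(n) for n in names)
    return started, scheduler.blocked_starts


print(started_in_one_tick(27, 1))  # (1, 26)
print(started_in_one_tick(27, 9))  # (9, 18)
```

With all 27 lists in one group, only one can start and the other 26 are blocked; spreading them across 9 groups lets 9 start. The trade-off is that each additional group allows one more operation list to run concurrently, which brings back the thread-pool pressure behind the original error, so group sizes need balancing against the server's capacity.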

Hi,


The Windows updates have been applied in UAT as well. No issues in UAT so far.

The configuration of the server in PROD:


I will update the log level to "Diagnostic" and wait for the next occurrence. Then I will provide you with the logs for investigation.

Thanks.

Thanks for providing the extra details on the server specifications.

Has there been any update on this?

Hi,

Not for the moment.

This morning the client approved the change to the log level.

I will provide you with the logs when the issue appears again.

Thanks.

Hi Anthony,

Any update on this issue?

The client agreed to park and monitor the issue for the moment.