Fixed

This operation returned because the timeout period expired

Bob Bradley 5 years ago updated by Beau Harrison (Senior Product Software Engineer) 5 years ago

After many hours of processing the initial data load we are seeing repeated exceptions such as the following in the logs:

20190210,21:51:22,UNIFY Identity Broker,Adapter,Error,"Adapter
Adapter c700d25d-1825-4caf-ad26-b01910879914 page errored on page reflection. Duration: 00:00:17.3285030. Error: System.AggregateException: One or more errors occurred. ---> System.ApplicationException: This operation returned because the timeout period expired. (Exception from HRESULT: 0x800705B4)
   at System.Threading.ReaderWriterLock.AcquireWriterLockInternal(Int32 millisecondsTimeout)
   at System.Threading.ReaderWriterLock.AcquireWriterLock(TimeSpan timeout)
   at Unify.Framework.Collections.ThreadsafeHashSet`1.Add(T item)
   at Unify.Product.IdentityBroker.EntityBase`3.SetValue[TValue](TKey key, TValue value)
   at Unify.Product.IdentityBroker.AttributeMapper.MapAttributeValues(IEntity leftSideEntity, IEntity rightSideEntity)
   at System.Linq.Parallel.PartitionedDataSource`1.ListContiguousIndexRangeEnumerator.MoveNext(T& currentElement, Int32& currentKey)
   at System.Linq.Parallel.PipelineSpoolingTask`2.SpoolingWork()
   at System.Linq.Parallel.SpoolingTaskBase.Work()
   at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
   at System.Threading.Tasks.Task.Execute()
   --- End of inner exception stack trace ---
   at System.Linq.Parallel.QueryTaskGroupState.QueryEnd(Boolean userInitiatedDispose)
   at System.Linq.Parallel.AsynchronousChannelMergeEnumerator`1.MoveNextSlowPath()
   at System.Linq.Parallel.QueryOpeningEnumerator`1.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)
   at System.Linq.Enumerable.Aggregate[TSource,TAccumulate](IEnumerable`1 source, TAccumulate seed, Func`3 func)
   at Unify.Product.IdentityBroker.Adapter.ReflectChangesInner()
   at Unify.Product.IdentityBroker.Adapter.ReflectChanges()
   at Unify.Product.IdentityBroker.AdapterAuditingDecorator.ReflectChanges()
   at Unify.Product.IdentityBroker.AdapterNotifierDecorator.ReflectChanges()
   at Unify.Product.IdentityBroker.ReflectAdapterOnChangeDueJob.<runbase>b__9_0(IOperationalAdapter adapter)
---> (Inner Exception #0) System.ApplicationException: This operation returned because the timeout period expired. (Exception from HRESULT: 0x800705B4)
   at System.Threading.ReaderWriterLock.AcquireWriterLockInternal(Int32 millisecondsTimeout)
   at System.Threading.ReaderWriterLock.AcquireWriterLock(TimeSpan timeout)
   at Unify.Framework.Collections.ThreadsafeHashSet`1.Add(T item)
   at Unify.Product.IdentityBroker.EntityBase`3.SetValue[TValue](TKey key, TValue value)
   at Unify.Product.IdentityBroker.AttributeMapper.MapAttributeValues(IEntity leftSideEntity, IEntity rightSideEntity)
   at System.Linq.Parallel.PartitionedDataSource`1.ListContiguousIndexRangeEnumerator.MoveNext(T& currentElement, Int32& currentKey)
   at System.Linq.Parallel.PipelineSpoolingTask`2.SpoolingWork()
   at System.Linq.Parallel.SpoolingTaskBase.Work()
   at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
   at System.Threading.Tasks.Task.Execute()<---

---> (Inner Exception #1) System.ApplicationException: This operation returned because the timeout period expired. (Exception from HRESULT: 0x800705B4)
   at System.Threading.ReaderWriterLock.AcquireWriterLockInternal(Int32 millisecondsTimeout)
   at System.Threading.ReaderWriterLock.AcquireWriterLock(TimeSpan timeout)
   at Unify.Framework.Collections.ThreadsafeCollectionBase`2.Remove(TValue item)
   at Unify.Framework.Data.LinqContextConversionWithUpdateBase`5.UpdateOnSubmitDelegate(TInterface obj)
   at Unify.Product.IdentityBroker.EntityBase`3.SetValue[TValue](TKey key, TValue value)
   at Unify.Product.IdentityBroker.AttributeMapper.MapAttributeValues(IEntity leftSideEntity, IEntity rightSideEntity)
   at System.Linq.Parallel.PartitionedDataSource`1.ListContiguousIndexRangeEnumerator.MoveNext(T& currentElement, Int32& currentKey)
   at System.Linq.Parallel.PipelineSpoolingTask`2.SpoolingWork()
   at System.Linq.Parallel.SpoolingTaskBase.Work()
   at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
   at System.Threading.Tasks.Task.Execute()<---
.
Error details: [duplicate of the System.AggregateException stack trace and inner exceptions above]
",Normal

During this time the CPU is near 100% and the processed entity counts do not seem to move - we have been stuck on Processed Entity Count = 30304, Pending Changes = 7654 since 4:30 am this morning.
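
For context, the 0x800705B4 message in these traces is the standard behaviour of System.Threading.ReaderWriterLock when a writer lock cannot be acquired within its timeout. A minimal sketch, not Broker code and with an illustrative timeout, that reproduces the same ApplicationException:

    // One thread holds the writer lock; a second writer times out waiting.
    using System;
    using System.Threading;

    class LockTimeoutSketch
    {
        static readonly ReaderWriterLock Rwl = new ReaderWriterLock();

        static void Main()
        {
            var holder = new Thread(() =>
            {
                Rwl.AcquireWriterLock(Timeout.Infinite);
                Thread.Sleep(5000);            // simulate a long write
                Rwl.ReleaseWriterLock();
            });
            holder.Start();
            Thread.Sleep(100);                 // let the holder acquire first

            try
            {
                // Throws "This operation returned because the timeout
                // period expired." once the one-second wait elapses.
                Rwl.AcquireWriterLock(TimeSpan.FromSeconds(1));
            }
            catch (ApplicationException ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }

Under heavy contention (many PLINQ workers writing to the same thread-safe collections), waits like this can exceed the configured timeout even though no thread is deadlocked.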

Logs will be attached along with Extensibility files in the ticket comments.

Identity Broker version is 5.2.1 RTM

Broker console is presently unresponsive.

Additionally, we understand there may be one or more duplicates present in the source data, and there is at least one adapter join which is not on a keyed column.

Urgent assistance is requested to identify and resolve issues preventing data load and completion of deployment (TEST and PROD in parallel) to allow customer UAT to commence.

In the TEST environment I performed essentially the same process, having first removed all the transformations on the adapter, and the same behaviour was evident - about 15k records generate changes just fine, then the Broker CPU hits 100% and it slows to a halt.

Personally, I don't believe the joins are the cause of the problem; it seems to be purely the number of records being loaded (and possibly the large number of fields on the objects).

Under review

Hi Bob

First of all, do your environments meet the minimum required specs for Broker, specifically CPU and RAM? Remember these values are for Broker only and don't include whatever may be required by other applications (MIM/SQL Server/etc.) running on the same box.

Are the connectors able to import successfully? I'd recommend disabling all connectors and adapters, then importing the connectors one by one. Once all connectors are imported, enable each adapter one by one, waiting for all changes to be processed before moving on to the next adapter.

If the performance of import or reflection degrades during the above, rebuild the Broker database indexes between steps.
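
If it helps to script that step, here's a hedged sketch; the connection string, database name and table names are assumptions to adjust for the actual Broker schema:

    // Rebuild all indexes on the assumed hot tables between load steps.
    using Microsoft.Data.SqlClient;

    class RebuildIndexesSketch
    {
        static void Main()
        {
            const string connectionString =
                "Server=.;Database=IdentityBroker;Integrated Security=true;TrustServerCertificate=true";

            using var connection = new SqlConnection(connectionString);
            connection.Open();

            foreach (var table in new[] { "dbo.Entity", "dbo.EntityValue" })
            {
                using var command = new SqlCommand(
                    $"ALTER INDEX ALL ON {table} REBUILD", connection);
                command.CommandTimeout = 0;    // rebuilds can take a while
                command.ExecuteNonQuery();
            }
        }
    }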

The outcome of this process will help determine the next steps we should take.

Hi Beau

The minimum requirements are indeed being met - this is not the first time we've deployed to either environment, and a previous Broker config loaded successfully (V0.16) - although load times are not recorded, the load was eventually successful (approx. 1 day, but not sure). The Broker differences are outlined in the project OneNote here: Summary of changes since v0.17 (Web view). Specifically with respect to RAM and CPU, we have 3 virtual CPUs on each Broker host server and 16 GB RAM. The MIM CPU usage is hovering at near 100%, but SQL is not resource bound (CPU average approx. 10% with a 60% peak, although RAM is tighter at 83% usage of the available 8 GB).

Connectors all import successfully - disabling adapters is standard in Adrian's approach. Observations are that once the adapter load commences, the CPU doesn't hit 100% until about the 10-15K employee mark. At the moment all adapters apart from the (problematic) employee adapter are disabled.

We are already in the habit of periodically rebuilding the indexes - and have done this several times (including this morning after we observed the counters had not moved since 4:30 AM, when we stopped the service, rebuilt the indexes and restarted again). On each occasion fragmentation quickly deteriorates (within minutes) - at one point Adrian was rebuilding indexes interactively for the Entity and EntityValue tables every 2 minutes (during the load) and this wasn't helping (i.e. we're still seeing timeouts reported).

I will go into greater analysis of the minimum requirements, but I will give you a chance to digest the above first. I believe that very soon we will reach the point where the next step is a shared desktop. Alternatively, is there anything else that you want me to check?

Thanks

Bob

P.S. Results of x-check against Broker minimum specs can be found here: Minimum Requirements X-Check  (Web view)


Hi Bob,

Just to keep you in the loop, I've identified an issue in Broker where an exception being thrown during reflection, particularly when processing a large number of changes simultaneously, can result in that set of changes being continuously reprocessed every reflection cycle. Reflecting a large number of changes is a processor-heavy operation, and this on repeat is the cause of the 100% CPU usage that we're seeing.
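
A minimal self-contained sketch of that failure mode (all names and numbers are illustrative, not Broker source): one poisoned change fails the whole parallel batch, nothing gets marked as processed, and the full set is reworked every cycle:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class ReflectionLoopSketch
    {
        static readonly List<int> PendingChanges = Enumerable.Range(0, 7654).ToList();

        static void Main()
        {
            for (int cycle = 1; cycle <= 3; cycle++)
            {
                try
                {
                    // CPU-heavy parallel pass over the whole pending set.
                    PendingChanges.AsParallel().ForAll(change =>
                    {
                        if (change == 5000)        // one poisoned change...
                            throw new ApplicationException("timeout");
                    });
                    PendingChanges.Clear();        // ...means this is never reached,
                }
                catch (AggregateException)
                {
                    // ...so the same batch is picked up again next cycle.
                    Console.WriteLine($"Cycle {cycle}: {PendingChanges.Count} changes to reprocess");
                }
            }
        }
    }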

I'm working on an interim solution that will allow this blockage to pass, with only a small number of changes around the exception-causing change/entity being blocked. Following that, further improvements to reflection error handling can be made as needed, as well as looking into whether the reflection process can be optimized at all.

I'm treating the timeout errors as a symptom at this point; however, that can be reassessed after the initial fix if they persist.

Thanks for the update Beau. Do you have an ETA for the interim solution?


Will probably be a few hours, maybe till the end of today.

Hi Bob

I have the solution to the 100% CPU load issue, the caveat being that reflection will still stop and be unable to progress until the error is resolved. The current design for reflection assumes changes are processed in sequence (a fact which slipped my mind when I described my plan previously). I'll attach the patch in a private comment. However, considering your desire to avoid untested code when discussing adding the empty string check to the SAP connector, would this even be usable? I'd consider this a riskier change than adding the check to the connector (from a product perspective, at least).

I've also got a recommendation for you regarding the performance loss you mentioned yesterday with DN templates with multiple parts. Broker calculates the hierarchical structure for an entity based on its DN and stores a reference to any containers in the database. When you configure a DN template like you have, and all fields involved are largely unique, Broker will create a large number of container records (one for each entity if two DN components are configured). If you can, reconfigure your DN template to use the plus separator instead of the comma. Plus-separated DN components are treated as a single component in the hierarchy, so no pointless parent containers will be generated.
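
To illustrate with hypothetical field names (@name and @employeeId are examples only, not your configuration):

    Comma-separated template:  CN=@name,OU=@employeeId
        -> two hierarchy components; Broker stores an OU=<value> container per entity
    Plus-separated template:   CN=@name+OU=@employeeId
        -> one multi-valued RDN; no extra container records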

Hi Bob, just to confirm, was there anything else outstanding for this topic? 

Hi Beau.

Thanks for preparing the patch, but while resolution is still outstanding, we have been able to employ an acceptable work-around (CN=@idBID DN format) to allow us to negotiate the issue with the production dataset which was preventing production deployment.  We are obviously happy to have a work-around for the moment which keeps our project on track.

The hold-up is now with the project and our ability to install, revert to the former DN format (still desirable - albeit with the + instead of , delimiter as you suggest) and test this, due to the stage we are at with deployment and testing, and our ability to simulate this condition with test data in DEV (Adrian can explain this). As such we would like you to mark this as on hold pending the next development cycle, which will be the first opportunity for us to revisit this patch.

The following summarises the expected behaviour of the patch if we were to reconfigure the adapter with the original DN format (or a suitable variation thereof):

  • Without this patch, the outcome would continue to be unacceptable (timeouts/memory bloat/CPU 100%, etc.);
  • With this patch, while the outcome would be improved (no timeouts/memory bloat/CPU 100%, etc.), the anticipated adapter sync step failure may still not be acceptable either, given that this could be expected to fail the MIM import (ideally surfacing the root cause to MIM, but possibly not), resulting in a ticket being raised to investigate and resolve a problem for which the resolution would be a data fix in the source. History tells us that it is a bad idea to introduce a vulnerability of this nature into the design, because it generally results in multiple BAU tickets of this nature being raised and a rapid satisfaction drop with the system users;
  • A better outcome would definitely be to find a way of isolating the data issue and reporting it with the record, but not failing the entire batch. I realise that this may be difficult, but it should not be impossible.

Let's revisit this when we have a chance to retest.  In the meantime if you believe that the improvement provided by the patch is worth incorporating in the next version of the Broker, please update this ticket accordingly.

Thanks again.

Fixed

Thanks Bob

Going to mark this one as fixed rather than put it on hold, because the bug in question has been fixed and will be included in the next v5.2 release. When you enter the next dev cycle, it can be reopened or a new topic created as necessary. A proper patch (not debug only) can be provided at that time as well.

Being able to isolate individual failing changes would be nice, however that would be a much larger piece of work and may be something we consider in the future.