Answered

Identity Broker dropping connector space

Eddie Kirkman 7 years ago in PowerShell connector updated by anonymous 7 years ago 11

The PS script for importing all users from O365 sometimes errors with the following:

Import all entities from connector failed.
Import all entities from connector Office 365 Staff Licenses failed with reason An unexpected error occurred.. Duration: 00:00:06.8594919
Error details:
Microsoft.Online.Administration.Automation.MicrosoftOnlineException: An unexpected error occurred.
at Unify.Product.IdentityBroker.PowerShellConnector.<GetEntitiesInScript>d__a.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at Unify.Framework.Collections.ActionOnExceptionEnumerator`1.MoveNext()
at Unify.Framework.Collections.EnumerableExtensions.<ActionOnLast>d__19`1.MoveNext()
at Unify.Framework.Collections.EnumerableExtensions.<ProduceAutoPages>d__a`1.MoveNext()
at Unify.Framework.Visitor.ThreadsafeVisitorEvaluator`1.Visit()
at Unify.Product.IdentityBroker.RepositoryChangeDetectionWorkerBase.PerformChangeDetection(IEnumerable`1 connectorEntities)
at Unify.Product.IdentityBroker.ChangeDetectionImportAllJob.ImportAllChangeProcess()
at Unify.Product.IdentityBroker.ChangeDetectionImportAllJob.RunBase()
at Unify.Framework.DefinedScopeJobAuditTrailJobDecorator.Run()
at Unify.Product.IdentityBroker.ConnectorJobExecutor.<>c_DisplayClass29.<Run>b_27()
at Unify.Framework.AsynchronousJobExecutor.PerformJobCallback(Object state)

The original discussion with the product team suggested that, because the function returns an IEnumerable result, the call would not complete if the connection dropped partway through. I modified the script to define an array for the result and populate that. Sometimes the array populates; sometimes it fails. A failure normally stops the script, but sometimes it is seen as an empty result and the 45000 users are wiped from the IdB connector, which flows deletes to the FIM CS. The next successful or partially successful load puts them back and they rejoin, but this should not be happening.

The portion of the PS script that connects and gets users has been run standalone from the server and did not drop out or fail, but running it from IdB seems to be consistently flaky.

I understand that this is more likely to be an MSOL or PS issue, but would appreciate any assistance around how to troubleshoot the unexpected errors or any suggestions for possible workarounds.


prodo365staff.ps1
staff.ps1
student.ps1
Unify.IdentityBroker.Entity.PowerShell.dll

Hi Eddie,

From memory I suggested the .ToArray() call so that we could avoid any potential delays in processing and make sure the call completes as quickly as possible.

Now that it's not working, I'd recommend trying the following (all together, as they won't interfere with each other):

  • Remove the .ToArray();
  • Instead of calling foreach over the result set, construct a while (enumerable.MoveNext()) { enumerable.Current }, with error handling around each to know which call fails;
  • Code retries (e.g. limited to 3, timeout on failure) into enumerable.MoveNext() and enumerable.Current calls;
  • It's scary that the service is failing silently, so confirm that it isn't your script that is doing that;
    • If it's MSOL that's failing silently, add a counter that increases for each entity, and throw an exception at the end of the script if a certain number isn't reached (either hardcoded, or some number based off $components.ContextEntities.Count()).
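
The enumeration, retry, and counter suggestions above could be sketched roughly as follows. This is only an illustration: Read-AllWithRetry and its parameters are hypothetical names, the retry limit and delay are assumed values, and whether a failed MoveNext() can usefully be retried depends on the enumerator that MSOL returns. At minimum, the warning shows which call failed.

```powershell
# Hypothetical helper (not part of Identity Broker or MSOL): walks an enumerable
# by hand so a failure can be pinned to a specific MoveNext() call.
function Read-AllWithRetry {
    param(
        [Parameter(Mandatory = $true)] $Source,  # anything enumerable, e.g. the Get-MsolUser result
        [int] $MaxAttempts = 3,                  # assumed retry limit
        [int] $MinimumExpected = 0               # assumed lower bound on the entity count
    )

    $enumerator = $Source.GetEnumerator()
    $items = New-Object System.Collections.Generic.List[object]

    while ($true) {
        $moved = $false
        for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
            try {
                $moved = $enumerator.MoveNext()
                break
            }
            catch {
                Write-Warning "MoveNext attempt $attempt of $MaxAttempts failed after $($items.Count) items: $_"
                if ($attempt -eq $MaxAttempts) { throw }   # give up loudly, never silently
                Start-Sleep -Seconds 1
            }
        }
        if (-not $moved) { break }
        $items.Add($enumerator.Current)
    }

    # Counter check: fail the script rather than hand back a suspiciously small result.
    if ($items.Count -lt $MinimumExpected) {
        throw "Only $($items.Count) entities read; expected at least $MinimumExpected."
    }
    return ,$items.ToArray()
}
```

Throwing when the count is too low means a short read stops the import instead of being treated as an empty result.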

Thanks.

Forgot to mention that you should add logging at as many points as feasible so that you can tell exactly where it's failing.

Further investigation has been undertaken with the following results:

  1. A standalone script performing the first parts of the production script (get-msoluser -all into an array, then counting the array) has been run in PowerShell on the same server as IdB, using runas to open PS as the IdB service account.
    That script performs flawlessly, getting the correct number of users every time.
  2. The production script, using the production details and credentials and connecting to the production O365 staff tenancy has been deployed to the development IdB. That was left running and effectively ran successfully (getting the right number of users) for many days (I think the connection dropped maybe 2 or 3 times in a week).
  3. Further tests were deployed in Prod with a threshold in place to retry the get-msoluser command if the number retrieved was under the threshold. This improved the success rate, but looking at the numbers, it appears that sometimes the command retrieves the number of users in the staff tenancy and other times it retrieves the number of users in the student tenancy, suggesting some sort of session contamination in the PowerShell/IdB space.
  4. The production student script has been deployed to dev IdB and immediately the same varying number/cross contamination was seen.
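
For reference, the threshold retry described in item 3 might look something like the sketch below. Get-WithThreshold and its parameter names are hypothetical, and the attempt limit is an assumed value; the production script may differ.

```powershell
# Hypothetical helper: re-runs a fetch until it returns at least $Threshold items.
function Get-WithThreshold {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $Fetch,  # e.g. { Get-MsolUser -All }
        [Parameter(Mandatory = $true)] [int] $Threshold,      # assumed minimum plausible user count
        [int] $MaxAttempts = 3,                               # assumed retry limit
        [int] $DelaySeconds = 0
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $users = @(& $Fetch)   # force an array so Count is always available
        if ($users.Count -ge $Threshold) { return ,$users }

        Write-Warning "Attempt $attempt returned $($users.Count) users, below threshold $Threshold."
        if ($attempt -lt $MaxAttempts -and $DelaySeconds -gt 0) {
            Start-Sleep -Seconds $DelaySeconds
        }
    }

    # Abort instead of returning a short result that would be read as deletes.
    throw "Fetch stayed below threshold $Threshold after $MaxAttempts attempts."
}
```

A lower bound alone would not catch a result from the wrong tenancy if that tenancy happens to be larger, so it guards against short reads rather than against the contamination itself.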

I would add something to the script to break the MSOL connection but there does not seem to be such a command. Other than that I might have to force PowerShell to open a new session that I can then close. Open to any other suggestions.

Hi Eddie Kirkman,

I have added a new unit test for the PowerShell components, but was unable to get anything to leak between different instances running concurrently. My next step is to try and reproduce in Identity Broker.

Thanks.

Hi Eddie Kirkman,

I have set up a test in the PowerShell connector (streaming a CSV), but was unable to get the variables to leak between the two connectors.

I have come across people suggesting that running in a new app domain might do it. I'm working on a version that can do this and will provide a dll to test once I have it working (I'll be unable to prove whether it's done anything though as I cannot reproduce).

Thanks.

Attached Unify.IdentityBroker.Entity.PowerShell.dll. It will allow you to test using a new app domain.

Instructions:

  1. Stop Identity Broker;
  2. Copy file to Identity Broker services directory;
  3. Unblock the dll (right click, unblock, apply);
  4. Start Identity Broker;
  5. Comment out the PowerShell script lines that return anything to Identity Broker (anything that uses $entities, $components, $logger, etc);
  6. Try to reproduce the error and see if the app domain change has worked.

To get it back to the original state, remove the dll and uncomment the PowerShell script lines.

Let me know how it goes, then we can decide what the plan is from there.

Thanks.

For the record, Microsoft have been aware of this defect since May 2012 and have still done nothing to address it through a disconnect command (https://community.office365.com/en-us/f/148/t/53335). As discussed some days ago, the only option provided by Microsoft is to use a new PowerShell session (which can be disconnected).

Have you had any luck reproducing the issue in an Azure environment?

Hey Eddie Kirkman
I've attached the revised full import scripts that should work correctly now. The issue was that the $entities object was not being passed into the Invoke-Command block, and even when it was, it could not be used to create entities.

The solution was to collect entity values as a hashtable and return an array of entity hashtables. Then, outside the Invoke-Command block, call $entities.Create() and populate from the return values.
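
A rough sketch of that pattern, not the attached scripts themselves: the field names and sample users are made up for illustration, and Invoke-Command is run locally here so the snippet is self-contained. The real scripts would presumably pass -Session with a session created by New-PSSession (and removed afterwards with Remove-PSSession), and call Get-MsolUser -All inside the block.

```powershell
# Runs inside Invoke-Command: only plain hashtables cross the boundary, never $entities.
# The body is a stand-in; the real block would connect to MSOL and call Get-MsolUser -All.
$fetchBlock = {
    param($users)
    foreach ($user in $users) {
        @{
            UserPrincipalName = $user.UserPrincipalName
            DisplayName       = $user.DisplayName
        }
    }
}

# Illustrative objects in place of Get-MsolUser output (made-up data).
$sampleUsers = @(
    New-Object psobject -Property @{ UserPrincipalName = 'a@example.com'; DisplayName = 'A' }
    New-Object psobject -Property @{ UserPrincipalName = 'b@example.com'; DisplayName = 'B' }
)

# Without -Session this runs locally; the unary comma passes the array as one argument.
$records = @(Invoke-Command -ScriptBlock $fetchBlock -ArgumentList (,$sampleUsers))

foreach ($record in $records) {
    # In the connector script, this is where $entities.Create() would be called and
    # the new entity populated from $record (those calls are not shown in this thread).
    Write-Output ("{0} ({1})" -f $record.DisplayName, $record.UserPrincipalName)
}
```

Keeping the entity creation outside the block means the remote side only ever handles serializable data.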

I've run both connector imports in parallel several times and all imports completed successfully. Check for yourself and let us know if there are any further problems.
-Beau

Thanks heaps. I will leave it running in Azure for a bit and see if I can get it to break at all.

I have left the scripts (as provided by Beau) running and scheduled to run every 10 minutes (staggered but still overlapping) in my Azure lab and have not seen any unexpected results.
The only amendment I made to the scripts was to make the variable names in use in each script "tenancy specific" (i.e. instead of my admin user being $admin, I used $studadmin and $staffadmin).
I have deleted some users from my test staff tenant and saw the expected decrease in numbers in the IdB connector. I am adding another 20000 to my student tenant and seeing the connector number increasing as expected too.

The modified scripts were deployed to USC non-prod environment on 16 July and have run successfully over the weekend, talking to the production tenants.

I believe this issue is now successfully resolved.