Under review

Gateway was unable to be started due to One or more errors occurred.

Liam Schulz 1 year ago in UNIFYBroker Service updated by Matthew Davis (Technical Product Manager) 1 year ago

Hi,

We have seen across multiple Broker instances that the following error occurs for LDAP gateways:
"The gateway <gateway name> (guid) was unable to be started due to One or more errors occurred."

Unfortunately, there doesn't seem to be much more information than what is provided in the log. Examining the log file further with CMTrace doesn't reveal any more information.

In one particular affected customer's case, I checked the Azure Provisioning service to see if there was any significant event that may have caused this, but could not find anything there either.

The workaround is to recycle the gateway, but this currently relies on manually checking whether the error has occurred, and it appears to be happening frequently. We would like to address the root cause if possible.
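Until the root cause is fixed, the manual check could be partly automated by scanning exported log files for the failure message quoted above. The sketch below is a minimal, hypothetical helper, not a Broker feature. It assumes the guid is a standard 36-character hyphenated GUID and, because the CSV column layout isn't known here, it searches each whole line for the message rather than parsing columns. The names `find_failed_gateways` and `scan_log_file` are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Matches the failure message quoted in this thread. ASSUMPTIONS: the guid is a
# standard 36-character hyphenated GUID, and the message may appear anywhere in
# the line (the CSV column layout is not known here).
FAILED_START = re.compile(
    r"The gateway (?P<name>.+?) \((?P<guid>[0-9a-fA-F-]{36})\) "
    r"was unable to be started due to One or more errors occurred\."
)


def find_failed_gateways(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (gateway name, lower-cased guid) pairs, in first-seen order."""
    seen = set()
    found: List[Tuple[str, str]] = []
    for line in lines:
        match = FAILED_START.search(line)
        if match:
            key = (match.group("name"), match.group("guid").lower())
            if key not in seen:
                seen.add(key)
                found.append(key)
    return found


def scan_log_file(path: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: scan one exported log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_failed_gateways(log)
```

The list it returns could feed whatever alerting is already in place, so that someone is told which gateway to recycle instead of having to look.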

Are there additional logging levels that could be applied to find out what is causing this?

Thanks,
Liam


Hi Liam,

Are there particular circumstances under which this happens? For example, following a restart of the service or a modification of the configuration? A gateway should only be started under certain circumstances: normally on creation or on service start, after which it persists for the lifetime of the Broker service.

Is it happening consistently in particular environments? For example, is it happening in ALL environments, or just Production? And is it happening only for the LDAP gateway, or for other gateways as well?

JSON logs (such as Log Analytics) should contain the full stack trace of the exception; unfortunately, the base CSV log writer currently receives little information beyond this message. Is there another log writer already configured, or could one be configured, to capture more information?

Hi Matt,

To clarify, this is happening with SCIM gateways to Azure Applications.

I was able to find the following stack trace in JSON logs for one customer:

System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
   at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
   --- End of inner exception stack trace ---
   at System.Net.TlsStream.EndWrite(IAsyncResult asyncResult)
   at System.Net.ConnectStream.WriteHeadersCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Owin.Security.ActiveDirectory.WsFedMetadataRetriever.GetSigningKeys(String metadataEndpoint, TimeSpan backchannelTimeout, HttpMessageHandler backchannelHttpHandler)
   at Microsoft.Owin.Security.ActiveDirectory.WsFedCachingSecurityTokenProvider.RetrieveMetadata()
   at Microsoft.Owin.Security.ActiveDirectory.WsFedCachingSecurityTokenProvider..ctor(String metadataEndpoint, ICertificateValidator backchannelCertificateValidator, TimeSpan backchannelTimeout, HttpMessageHandler backchannelHttpHandler)
   at Owin.WindowsAzureActiveDirectoryBearerAuthenticationExtensions.UseWindowsAzureActiveDirectoryBearerAuthentication(IAppBuilder app, WindowsAzureActiveDirectoryBearerAuthenticationOptions options)
   at Microsoft.SystemForCrossDomainIdentityManagement.WebApplicationStarter.ConfigureApplication(IAppBuilder applicationBuilder)
   at Microsoft.Owin.Hosting.Engine.HostingEngine.Start(StartContext context)
   at Microsoft.SystemForCrossDomainIdentityManagement.Service.Start(Uri baseAddress)
   at Unify.Product.IdentityBroker.SCIMGateway.StartGateway()
   at Unify.Product.IdentityBroker.GatewayBase.Start()
   at Unify.Product.IdentityBroker.GatewayNotifierDecorator.Start()
---> (Inner Exception #0) System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
   at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
   --- End of inner exception stack trace ---
   at System.Net.TlsStream.EndWrite(IAsyncResult asyncResult)
   at System.Net.ConnectStream.WriteHeadersCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---<---

This suggests to me that the gateway connection drops and that Broker is then unable to restore it immediately.
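Reading a dump like this mostly comes down to following the " ---> " links to the innermost exception, which here is the `SocketException` raised when the remote host closed the connection. As a minimal sketch for triaging many such dumps from JSON logs, the hypothetical helper below lists the exception types in the order they appear in the header. It assumes the .NET `Exception.ToString()` layout seen above (`TypeName: message`, nested exceptions joined by " ---> ", stack frames starting with "at "), and it re-joins header lines that were wrapped when copied.

```python
from typing import List


def exception_chain(trace: str) -> List[str]:
    """Return the exception type names in a .NET exception dump, outermost first.

    Only the header (the text before the first "at ..." stack frame) is
    examined. Wrapped header lines are re-joined; a wrapped message line that
    happens to begin with "at " would end the header early.
    """
    header_lines = []
    for line in trace.splitlines():
        stripped = line.strip()
        if stripped.startswith("at "):
            break  # first stack frame reached: the header is complete
        header_lines.append(stripped)
    header = " ".join(header_lines)

    chain = []
    for segment in header.split("--->"):
        segment = segment.strip()
        if segment:
            # Each segment starts with "<TypeName>: <message>".
            chain.append(segment.split(":", 1)[0].strip())
    return chain
```

For the trace above, the last entry of the returned list is `System.Net.Sockets.SocketException`, the root cause.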

Thanks for the clarification, Liam. Those logs are helpful.

It looks like the Broker service is restarting itself. On restart, the SCIM gateway reaches out to the AAD service to retrieve metadata for bearer token validation (the stack trace shows the failure inside WsFedMetadataRetriever.GetSigningKeys). This may be failing because the service has only just restarted and the underlying connectivity is still being established, which would be consistent with the logs showing that sometimes only one of the gateways fails and sometimes both do.

We'll investigate whether we can add more resiliency to this startup procedure. However, the metadata retrieval is housed in a library Microsoft provides for SCIM functionality, so it might not be something we can fully control.
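One common form of startup resiliency is to retry the start call with exponential backoff when the error looks transient, such as a connection reset during the metadata fetch. The sketch below shows that pattern in Python only for illustration. It is not how UNIFYBroker or Microsoft's SCIM library behave, and the function name, defaults and the choice of `ConnectionError`/`TimeoutError` as "transient" are all assumptions. In .NET the corresponding exceptions would be the `HttpRequestException`/`SocketException` chain shown above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def start_with_retry(
    start: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call start(); on a transient error, wait and try again.

    Delays grow exponentially (2s, 4s, 8s, ... with the defaults), capped at
    max_delay. Non-transient errors propagate immediately, and the last
    transient error is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except transient:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` keeps the backoff testable without real waiting. Capping the delay keeps a long outage from pushing retries out indefinitely, while the final re-raise still lets the "unable to be started" error reach the log if connectivity never returns.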

I'll also loop in David (though I suggest you reach out to him as well), as this issue only occurs because something is restarting the service automatically. Gateways normally restart only under specific conditions (service start, gateway configuration change, etc.), so repeated restarts are not normal operation for them.