
Doesn't work with AAD #26

Closed
ravikd744 opened this issue Jul 27, 2020 · 38 comments

@ravikd744

ravikd744 commented Jul 27, 2020

Trying to use the SQL Spark connector to connect to Azure SQL (single instance) from Databricks runtime (6.6) using Active Directory Password auth. I have uploaded the adal library to the cluster.

import adal

dbname = "G_Test"
servername = "jdbc:sqlserver://" + "samplesql.database.windows.net:1433"
database_name = dbname
url = servername + ";" + "database_name=" + dbname + ";"
table_name = "dbo.cap"
aduser="user@msft.com"
adpwd="mypwd"
dfCountry = spark.read
.format("com.microsoft.sqlserver.jdbc.spark")
.option("url", url)
.option("dbtable", table_name)
.option("authentication", "ActiveDirectoryPassword")
.option("hostNameInCertificate", "*.database.windows.net")
.option("user", aduser)
.option("password", adpwd)
.option("encrypt", "true").load()

Getting Authentication Exception:
<command> in <module>()
17 .option("user", aduser)
18 .option("password", adpwd)
---> 19 .option("encrypt", "true").load()
20
21

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
170 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
171 else:
--> 172 return self._df(self._jreader.load())
173
174 @since(1.4)

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(

Py4JJavaError: An error occurred while calling o510.load.
: java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:3609)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:3580)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:3548)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:261)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:103)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$100(SQLServerConnection.java:82)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:55)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:311)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.microsoft.aad.adal4j.AuthenticationException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 36 more

@rajmera3
Contributor

@ravikd744 Did you use the release jar on the repo? If so, please redownload the jar - it was just updated.

@ravikd744
Author

Hi Rahul,

it's the same error! :(

Would you be able to add a sample notebook into the databricks folder for Azure Single Instance using AAD?

thanks!

@shivsood shivsood added the help wanted Extra attention is needed label Jul 29, 2020
@arvindshmicrosoft
Member

@ravikd744 can you also try adding the com.microsoft.aad.adal4j library from Maven, detach and re-attach the notebook to your cluster, and retry your example? It looks like for AAD authentication using username/password, we need the underlying ADAL4J JAR as well...
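
For example, on the cluster's Libraries tab, "Install New" → "Maven" with coordinates along these lines should pull it in (the version here is an assumption; it is the one referenced later in this thread):

com.microsoft.azure:adal4j:1.6.4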

@ravikd744
Author

Hi Arvind,

I added adal4j and restarted the cluster; no difference :(


Should I be adding any specific import statements to make it work?

@arvindshmicrosoft
Member

@cchighman for advice

@ravikd744
Author

Hi everyone, there is a similar issue (#28) reported on the old connector:
Azure/azure-sqldb-spark#28

I included the additional jars as suggested in that thread, but with no success.

I haven't removed the existing mssql jars on Databricks; I'm not sure if this connector requires or uses mssql-jdbc-6.4.0.jre8.jar.


@cchighman
Member

What version of Spark are you running?

@cchighman
Member

Also, if you had previously installed libraries or versions on this cluster, I've found it's not very resilient to simply uninstalling an offending library and reinstalling a different version. I've had the best success spinning up a new cluster and installing the libraries of interest fresh.

I suggest giving that a shot, if only as a control to test against the current cluster.

@cchighman
Member

@arvindshmicrosoft
It's possible that in my tests I was working with a cluster I had already run the init script against. I wonder if it's expected that we would need to run the same init script with this driver, to account for driver differences with what ships on Databricks?

@ravikd744
Author

Hi Chris, I am running Databricks runtime 6.6 (Spark 2.4.5).

Sure, let me try with a new cluster.

Do you think I should remove the mssql jar and add it again? Does this connector rely on the mssql jar?

@ravikd744
Author

I just tried doing pip install adal and authenticating with a service principal; it works.

In addition, it works with no issues for SQL auth, I guess because there is no ADAL involved.

I have issues only with Active Directory Password. I saw a thread suggesting it could be related to the logger class referenced in the jar. Does that give a clue?

@cchighman
Member

cchighman commented Jul 30, 2020

@ravikd744 @shivsood @arvindshmicrosoft @rajmera3
I re-tested totally from scratch in a fresh subscription without issues. I have seen the issues you describe, but only when working on a cluster that has had past intrusive actions applied, where an intrusive action is anything that changes or installs a driver or library compiled against a different version than the one the system expects.

Even uninstalling doesn't necessarily fix it. Your best bet is to work from a fresh cluster. If you still run into the issue after following the steps below, please post your driver logs and we'll see if we can take a closer look.

I created a new Azure AD tenant, removed all security defaults, created a new serverless SQL server, and applied the associated Active Directory Admin:

[screenshot]

I began with a fresh cluster:

[screenshot]

Installed two libraries (not sure about the other two, which come by default):

[screenshot]

I executed a query in Databricks without any other requirements, except for creating a table.

[screenshot]

**Afterwards, I'm able to observe the login from Azure AD, noting a JDBC driver login:**

[screenshot]

**The authentication details are CloudOnlyPassword:**

[screenshot]

@cchighman
Member

cchighman commented Jul 30, 2020

> Would you be able to add a sample notebook into the databricks folder for Azure Single Instance using AAD?
@ravikd744
The notebook in my comment above with the screenshots is the same one now in the samples folder under Databricks.

@cchighman
Member

cchighman commented Jul 30, 2020

@ravikd744 @shivsood @arvindshmicrosoft @rajmera3

This confirms that the AAD auth requirements from the older Azure SQL connector only apply there, because that connector handled its own connection implementation directly against the driver. Because this connector is a pass-through to Spark, we now rely on Spark's handling of the mssql JDBC driver version, which aligns nicely, since Spark is what is installed on Databricks (see the sketch at the end of this comment).

If you are coming from the previous Azure SQL connector and manually installed drivers onto that cluster for AAD compatibility, you will definitely need to remove those drivers. After removing them, if you don't have the previous ones to revert to, the cluster may be left somewhat unusable for this purpose.

With this new connector, you should be able to simply install and go on a cluster using the default drivers (a new or existing cluster that hasn't had its drivers modified), or a cluster that previously used modified drivers for the older Azure SQL connector, provided the modified drivers were removed and the previous defaults restored.

We may want to call this out in the README for those migrating.
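
To illustrate the pass-through point, a sketch (server, database, and credentials are placeholders; loginTimeout is just one example of a standard mssql-jdbc connection property):

df = (spark.read
    .format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>")
    .option("dbtable", "dbo.cap")
    .option("authentication", "ActiveDirectoryPassword")
    .option("user", "<aad-user>")
    .option("password", "<password>")
    # Forwarded as-is to the mssql JDBC driver:
    .option("loginTimeout", "30")
    .load())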

@ravikd744
Author

Hi Chris,

I tried the exact steps you described; unfortunately, the result is no different.
I am going to try in my personal subscription to rule out any corporate proxy issues with Azure AD auth. We have federated auth (ADFS); I suspect my DC could be rejecting the request with Databricks as the resource.

thanks!

@cchighman
Member

> I tried the exact steps you described; unfortunately, the result is no different. [...]

Is the exception returned still "java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException", or is it something related to an authentication failure?

@cchighman
Member

cchighman commented Jul 30, 2020

If it's tied to an auth failure, getting MFA disabled can be tricky at first. I disabled security defaults, added a one-time bypass, whitelisted the cluster IP to skip the MFA requirement, and made sure the Azure SQL server settings allowed connections from Azure resources. I also whitelisted the cluster host IP in the Azure SQL Server firewall settings. Chances are you don't need to go through nearly this much; it's just not my area of expertise. Going to "Enterprise Applications" in Azure Active Directory and looking at the "Sign-in" attempts should show you failure scenarios that can help guide you towards getting auth working, if it's indeed an auth error at this point.

You'll find the request actually comes from "Mobile Apps and Desktop Client" as the type, per the screenshot I posted above. So security defaults or conditional access policies may apply.

@cchighman
Member

@ravikd744 Logging into SSMS using Azure Active Directory was my control: if it worked there, it should absolutely work with the JDBC driver via the Spark interface.

@ravikd744
Author

ravikd744 commented Jul 30, 2020 via email

@cchighman
Member

cchighman commented Jul 30, 2020

@ravikd744

Just to confirm...

1.) You are installing adal by going to Libraries and choosing "PyPi".

2.) You are installing the SQL Spark Connector by going to Libraries and selecting the JAR from releases.

3.) This is a fresh cluster install.

If these are true, please provide the cluster logs for the driver.

@ravikd744
Author

ravikd744 commented Jul 30, 2020 via email

@cchighman
Member

cchighman commented Jul 30, 2020

@ravikd744

This may be safer than pasting your entire logs. Ideally I would like the initial startup and error logs, if those are clean. At minimum, the items below under your "Environment" settings would be helpful.

Go to your cluster "Spark Cluster UI - Master"

image

Click "Databricks Shell" under Application

image

Click "Environment" Tab

image

  • At minimum, please provide everything listed under ClassPath. If you can provide everything under Environment, more info is better.
  • Please provide everything you can under "Standard Error" in the Driver Logs.
  • Please try to provide the cluster initialization under the Standard and Log4j logs.
  • The initial startup should be clean of anything proprietary; I'm mainly just looking for what it's doing when it starts, from the above two logs.

Thanks!

@ravikd744
Author

I tried calling ADAL directly, passing the ID and password to get a token. Does this give a clue?

import adal

aduser = "user@msft.com"
adpwd = "mypwd"

authority_uri = "https://login.microsoftonline.com/c2e31356-f1e7-439e-a071-6a036aacbd93"
resource_uri = "https://database.windows.net/"
client_id = "f302da74-c20b-44a7-b1d2-ea0202574455"

context = adal.AuthenticationContext(authority_uri)
token = context.acquire_token_with_username_password(resource_uri, aduser, adpwd, client_id)
access_token = token["accessToken"]

There is a conditional access policy that prevents non-interactive login. We need a way for Databricks to pass the logged-in user's token to this jar.

Get Token request returned http error: 400 and server response: {"error":"interaction_required","error_description":"AADSTS53003: Access has been blocked by Conditional Access policies. The access policy does not allow token issuance.\r\nTrace ID: 7c721a5e-a7f9-4d38-bca5-43f9261c3200\r\nCorrelation ID: a7a3cc53-6c74-42a6-8ab3-8ceb51955bf1\r\nTimestamp: 2020-07-30 14:42:55Z","error_codes":[53003],"timestamp":"2020-07-30 14:42:55Z","trace_id":"7c721a5e-a7f9-4d38-bca5-43f9261c3200","correlation_id":"a7a3cc53-6c74-42a6-8ab3-8ceb51955bf1","error_uri":"https://login.microsoftonline.com/error?code=53003","suberror":"message_only","claims":"{\"access_token\":{\"capolids\":{\"essential\":true,\"values\":[\"b9ecb45a-5615-4391-8a51-4a7b1eea66e3\",\"9fda8814-c986-4662-bcf8-348795524012\",\"3632fb92-ec48-421b-a2b0-5bd734944321\",\"f241f958-369d-40ef-bc86-585429f5e5d4\",\"54ad26f7-cd6a-4654-bc80-d71dbb251918\"]}}}"}

@cchighman
Member

> I tried calling ADAL directly, passing the ID and password to get a token. [...] AADSTS53003: Access has been blocked by Conditional Access policies. The access policy does not allow token issuance.

@ravikd744
I mentioned above in my setup that I had to disable the Active Directory "Security Defaults". If that is already done, it sounds like an Active Directory conditional access policy is in play. Above, I shared the "type" by which a conditional access policy would apply. Your Active Directory admin can see this failure in the "Enterprise Apps" sign-on details view within Azure Active Directory.

If you are an Active Directory admin, I would read the docs on "Conditional Access Policies". Otherwise, I would reach out to whoever supports this for you and request that the policy be adjusted.

For ActiveDirectoryPassword auth, if you are still getting NoClassDefFound, I would need the aforementioned items in order to help further.

You do have to go a bit out of your way just to get Active Directory to accept password authentication. It may be worth just using a service principal.

@arvindshmicrosoft
Member

@ravikd744 I second @cchighman in recommending staying with a service principal. Since you mentioned that SPN/secret authentication worked for you, I think that is the pragmatic way to go.

@ravikd744
Author

I agree, @cchighman @arvindshmicrosoft; I will continue with a service principal as a tactical solution.
I will investigate the conditional access with our Active Directory admin and get back to you. Unfortunately, there is RBAC to respect on the underlying data (which schemas are visible, read-only vs. read-write, masking of specific columns), which we cannot enforce with a standard service principal, and we wouldn't be able to let end users manage separate service principal credentials.

Thank you so much; I really appreciate your effort and time investigating this for so long. Hopefully I can help provide some settings for users with similar conditional access infrastructure.

@thereverand

This issue exists because this package bundles a SQL Server driver, but not ADAL4J. So when the driver jar is loaded and then needs ADAL4J, it expects to find it in the same place, and fails. There is a simple set of steps to fix this if you're stuck on it, but this dependency problem should have a top-down fix :)

Steps to fix the issue:

  • Add a global or cluster init script to remove old versions of the mssql driver from the /databricks/jars folder, or add this line to an existing script: rm /databricks/jars/*mssql*
  • Add the adal4j and mssql packages; I used Maven, but any method should work. DO NOT install the SQL Spark connector this way.
  • Add the driver class to your connection configuration:

connectionProperties = {
  "Driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}

This works, repeatably. :)
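
For context, a sketch of how connectionProperties like the above typically feed Spark's built-in JDBC source (server, database, table, and credentials are placeholders):

connection_properties = {
    "user": "<aad-user>",
    "password": "<password>",
    "authentication": "ActiveDirectoryPassword",
    # The driver class added in the last step above:
    "Driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}
df = spark.read.jdbc(
    url="jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>",
    table="dbo.cap",
    properties=connection_properties)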

@ravikd744
Author

Hi @cchighman, @arvindshmicrosoft ,

Is it possible to support service principal authentication within the Spark connector via a new authentication type?

Service principal authentication currently looks like this:

dfCountry = (spark.read
    .format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", servername)
    .option("databaseName", dbname)
    .option("dbtable", table_name)
    .option("hostNameInCertificate", "*.database.windows.net")
    .option("trustServerCertificate", "true")
    .option("accessToken", access_token)
    .option("encrypt", "true")
    .load())

We have to install the Python ADAL library and import its modules to generate the access token and pass it to the connector. It would be cleaner if the connector took three parameters and used its own MSAL library to get the access token, something like:

.option("Authentication Type", "Service Principal with Secret")
.option("client_id", client_id)
.option("secret", client_secret)

Any suggestions?

thanks!

@ravikd744
Author

> This issue exists because this package bundles a SQL Server driver, but not ADAL4J. [...] This works, repeatably. :)

Hi @thereverand, I am raising a Microsoft support ticket; hopefully they can either remove the JDBC driver from the Databricks base image altogether or package ADAL along with the SQL driver. But that would be the regular JDBC driver; I'm not sure whether @cchighman plans to package the Spark connector into the base image in the future.

thanks!

@cchighman
Member

Because Databricks happens to ship with a copy of the MS SQL JDBC driver and Spark natively supports JDBC, we don't need to maintain a separate connector that is aware JDBC exists. The previous connector had its own implementation, but it was redundant. Everything you specify as an "option" gets passed directly to MSSQL's JDBC driver via Spark.

I wasn't able to identify a problem you're running into at the moment, @ravikd744, beyond wondering whether it would be possible to use fewer options with the query. The documentation in the README does a pretty good job of covering the pass-throughs to JDBC. In the PR for AAD support, I also posted a pretty exhaustive list of items I found.

If you are still running into a problem getting AAD support to work, please let me know.

Christopher

@ravikd744
Author

Hi @cchighman, I have an issue unrelated to the connector: my Databricks workspace does not have a network route out to the PyPI repo, so I had to manually download the wheels for adal and the PyJWT library it uses.

The behavior is inconsistent: sometimes, when loading the adal lib, Databricks tries to go out to the PyPI repo to download PyJWT even though the wheel is available in DBFS alongside the adal lib. I was hoping that if the ADAL inside the connector reached out to AAD for the token, we wouldn't need to create a Python ADAL context and manage the token ourselves.
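
If it helps, one sketch that avoids the index lookup entirely (assuming a runtime where %pip is available, and that both wheels are already uploaded to DBFS; the paths are placeholders):

%pip install /dbfs/FileStore/wheels/adal-1.2.7-py2.py3-none-any.whl --no-index --find-links=/dbfs/FileStore/wheels

The --no-index flag stops pip from reaching out to PyPI, and --find-links resolves dependencies like PyJWT from the local folder instead.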

I am not sure if I am explaining it correctly.

@idc101

idc101 commented Aug 26, 2020

I am not running on Databricks but experienced the same issue:
java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException

In my case it was due to Hadoop including a very old version of the mssql driver. To fix it:

rm $HADOOP_HOME/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar

This is probably the same jar Databricks includes by default, which is why rm /databricks/jars/*mssql* fixes it.

@rajmera3 rajmera3 removed the help wanted Extra attention is needed label Aug 28, 2020
@rajmera3
Contributor

Solved. See @thereverand's and @idc101's answers.

@amk-cgill

@idc101 Can I ask how you are running that rm command? I've got this exact issue, yet I'm not quite sure how to remove the old versions of the mssql driver from the cluster.

@arvindshmicrosoft
Member

> Can I ask how you are running that rm command?

In the case of Azure Databricks, you can use Init Scripts to run this.
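
A minimal sketch of such an init script, written to DBFS from a notebook (the path and file name are arbitrary):

dbutils.fs.put(
    "/databricks/init/scripts/remove-mssql.sh",
    """#!/bin/bash
# Remove the mssql-jdbc jar that Databricks ships by default
rm /databricks/jars/*mssql*
""",
    True)  # overwrite if the script already exists

Then reference dbfs:/databricks/init/scripts/remove-mssql.sh under the cluster's init scripts settings and restart the cluster; see the full script later in this thread.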

@idc101

idc101 commented Sep 2, 2020

@amk-cgill In my case I am running Spark on Kubernetes, so in my Dockerfile:

RUN rm $HADOOP_HOME/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar

However, in general, you need to find that jar in your Spark/Hadoop distribution and delete it somehow. As @arvindshmicrosoft says, for Databricks this is best done in an init script.
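
A quick way to locate the offending jar (a sketch, assuming SPARK_HOME/HADOOP_HOME are set in your environment):

find "$SPARK_HOME" "$HADOOP_HOME" -name 'mssql-jdbc-*.jar'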

@thereverand

thereverand commented Sep 3, 2020 via email

@amk-cgill

Putting the solution here so nobody else spends half a week of their time trying to fix what is really a lack of documentation on Microsoft's part.

High-level steps:

  1. Create a new cluster and let it start. The cluster must be on a Databricks runtime version > 7, per the docs here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases-azure
  2. Create or attach a notebook to the cluster. Do not install any libraries yet.
  3. In the notebook, add a new cell with the following:

dbutils.fs.put("/databricks/init/scripts/sqlaadauth.sh",
"""#!/bin/bash
rm /databricks/jars/*mssql*
sleep 10s
wget https://repo1.maven.org/maven2/com/microsoft/azure/adal4j/1.6.4/adal4j-1.6.4.jar -O /databricks/jars/adal4j-1.6.4.jar
wget https://repo1.maven.org/maven2/com/nimbusds/oauth2-oidc-sdk/6.5/oauth2-oidc-sdk-6.5.jar -O /databricks/jars/oauth2-oidc-sdk-6.5.jar
wget https://repo1.maven.org/maven2/com/google/code/gson/gson/2.8.0/gson-2.8.0.jar -O /databricks/jars/gson-2.8.0.jar
wget https://repo1.maven.org/maven2/net/minidev/json-smart/1.3.1/json-smart-1.3.1.jar -O /databricks/jars/json-smart-1.3.1.jar
wget https://repo1.maven.org/maven2/com/nimbusds/nimbus-jose-jwt/8.2.1/nimbus-jose-jwt-8.2.1.jar -O /databricks/jars/nimbus-jose-jwt-8.2.1.jar
wget https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.21/slf4j-api-1.7.21.jar -O /databricks/jars/slf4j-api-1.7.21.jar
wget https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar -O /databricks/jars/mssql-jdbc-6.4.0.jre8.jar""")

  4. Add a new cluster init script with the path: dbfs:/databricks/init/scripts/sqlaadauth.sh
  5. Restart the cluster to let the init script do its thing; the cluster should show 'Running' before proceeding.
  6. Go to the cluster's Libraries page and install the following package from the Maven library: com.microsoft.azure:azure-sqldb-spark:1.0.2
  7. Restart the cluster again (may not be necessary).
  8. Done.

A snippet of code to verify with:

%scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

%scala
val config = Config(Map(
  "url" -> "dbName.database.windows.net",
  "databaseName" -> "databaseName",
  "user"  -> "***********",
  "password" -> "***************",
  "dbTable" -> "dbo.tmp",
  "Authentication" -> "ActiveDirectoryPassword",
  "trustServerCertificate" -> "true"
))

val collection = sqlContext.read.sqlDB(config)
collection.show(3)

It's unclear if adding the libraries by hand would fix this, but I somewhat doubt it.
