JSON generated manually works, but JSON created through json.dumps does not, even though the output seems to be exactly the same

I am using the Marketo API through the Python library marketo-rest-python. I can create leads and also update them through the following basic code: leads =...
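A common cause of this symptom is passing the output of json.dumps (a str) where the client library expects a dict, so the payload gets serialized twice. A minimal sketch of the difference, independent of the Marketo client:

```python
import json

lead = {"email": "jdoe@example.com", "firstName": "J"}

# A dict: a client library that serializes internally handles this correctly.
payload_ok = lead

# json.dumps returns a str; if the library calls json.dumps again, the
# result is a double-encoded JSON string that *prints* like the original
# but is sent to the API as one quoted string value.
payload_bad = json.dumps(lead)

print(type(payload_ok))         # <class 'dict'>
print(type(payload_bad))        # <class 'str'>
print(json.dumps(payload_bad))  # "{\"email\": ...}" -- double-encoded
```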

Databricks job getting javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure when calling an API running in Google Cloud

A Spark job running as a Databricks job tries to access an external REST API over HTTP, and the following error occurs: ERROR ScalaDriverLocal: User Code Stack Trace:...
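handshake_failure usually means the server rejected the TLS protocol versions or cipher suites the JVM offered. A quick diagnostic sketch (the hostname is a placeholder) that checks what the endpoint negotiates from Python:

```python
import socket
import ssl

# Placeholder for the Google Cloud API endpoint.
host = "api.example.com"

# See which TLS version and cipher the server accepts; if it requires
# TLS 1.2+, the JVM on the cluster must be configured to offer it
# (for example via -Dhttps.protocols=TLSv1.2 in the driver's Java options).
ctx = ssl.create_default_context()
with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print(tls.version(), tls.cipher())
```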

How can I download GeoMesa on Azure Databricks?

I am interested in performing Big Data geospatial analysis on Apache Spark. My data is stored in Azure Data Lake, and I am restricted to using Azure Databricks. Is there any way to download GeoMesa...
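One option worth trying is attaching GeoMesa as a Maven library through the Libraries API. A sketch, assuming the geomesa-spark-jts artifact; the coordinate and version are assumptions to match against the cluster's Spark/Scala version:

```python
import requests

HOST = "https://<databricks-instance>"  # placeholder workspace URL
TOKEN = "dapi..."                       # placeholder personal access token

# Attach a Maven coordinate to a running cluster.
resp = requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": "<cluster-id>",  # placeholder
        "libraries": [
            # Coordinate is an assumption; pick the artifact matching
            # your Scala version and GeoMesa release.
            {"maven": {"coordinates": "org.locationtech.geomesa:geomesa-spark-jts_2.11:2.4.0"}}
        ],
    },
)
resp.raise_for_status()
```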

Azure Databricks: Installing Maven libraries on a cluster through the API causes an error (Library resolution failed. Cause: java.lang.RuntimeException)

I am trying to install some Maven libraries on an existing or newly created Azure Databricks cluster through the API from Python. Cluster details: Python 3, 5.5 LTS (includes Apache Spark 2.4.3,...
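"Library resolution failed" often means the artifact is not in the default repository or a transitive dependency cannot be resolved; the Libraries API lets you name an explicit repository and exclude problem dependencies. A sketch (coordinates, repo, and exclusion are placeholders):

```python
import requests

HOST = "https://<databricks-instance>"  # placeholder workspace URL
TOKEN = "dapi..."                       # placeholder token

resp = requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": "<cluster-id>",
        "libraries": [{
            "maven": {
                "coordinates": "com.example:my-lib:1.0.0",  # placeholder
                "repo": "https://repo1.maven.org/maven2/",  # explicit repository
                "exclusions": ["org.slf4j:slf4j-log4j12"],  # example exclusion
            }
        }],
    },
)
print(resp.status_code, resp.text)
```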

How to list Databricks secret scopes using Python when working with the Secrets API

I can create a scope. However, I want to create the scope only when it does not already exist, and I want to do the checking using Python. Is that doable? What I have found out is...
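This is doable from Python: the Secrets API exposes a scopes/list endpoint you can check before calling scopes/create. A sketch using plain requests (host, token, and scope name are placeholders):

```python
import requests

HOST = "https://<databricks-instance>"  # placeholder workspace URL
TOKEN = "dapi..."                       # placeholder token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def scope_exists(name: str) -> bool:
    # GET /api/2.0/secrets/scopes/list returns {"scopes": [{"name": ...}, ...]}
    r = requests.get(f"{HOST}/api/2.0/secrets/scopes/list", headers=HEADERS)
    r.raise_for_status()
    return any(s["name"] == name for s in r.json().get("scopes", []))

if not scope_exists("my-scope"):
    r = requests.post(f"{HOST}/api/2.0/secrets/scopes/create",
                      headers=HEADERS, json={"scope": "my-scope"})
    r.raise_for_status()
```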

Databricks Job timed out with error: Lost executor 0 on [IP]. Remote RPC client disassociated

Complete error: Databricks Job timed out with error: Lost executor 0 on [IP]. Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs...
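This message is typically a symptom of executors running out of memory or heartbeats timing out rather than one specific bug. While investigating the driver logs, one common mitigation is loosening the timeout settings in the cluster's Spark config; a sketch of the relevant spark_conf fragment, with illustrative values only:

```python
# Illustrative spark_conf for the job cluster; the values are examples,
# not recommendations -- the real fix depends on what the driver logs show.
spark_conf = {
    "spark.network.timeout": "600s",            # default is 120s
    "spark.executor.heartbeatInterval": "60s",  # must stay well below the timeout
}
```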

Error Connecting to Databricks from local machine

I am attempting to make a connection to Databricks from my Mac (Mojave). I did a pip install -U databricks-connect==5.5.* and can start a spark-shell, but when I try to query in Spark I get the following...
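After the pip install, databricks-connect needs databricks-connect configure (host, token, cluster ID, org ID, port), and the client's minor version must match the cluster's runtime. A minimal smoke test, assuming configuration is done:

```python
from pyspark.sql import SparkSession

# With databricks-connect configured, this session is created against
# the remote Databricks cluster rather than a local Spark.
spark = SparkSession.builder.getOrCreate()

# If the handshake works, this runs on the cluster and prints 10.
print(spark.range(10).count())
```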

Verification of the instance profile failed. AWS error: You are not authorized to perform this operation

I encountered that error message when I tried to add an IAM role in Databricks to allow "access data from Databricks clusters without the need to manage, deploy, or rotate AWS keys." Here are the steps...
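This AWS error usually means the credentials Databricks uses lack iam:PassRole on the role behind the instance profile. A diagnostic sketch with boto3 (the profile name is a placeholder) to confirm the profile exists and see which role it wraps:

```python
import boto3

iam = boto3.client("iam")

# Placeholder name; use the instance profile you registered in Databricks.
profile = iam.get_instance_profile(InstanceProfileName="databricks-s3-access")

print(profile["InstanceProfile"]["Arn"])
for role in profile["InstanceProfile"]["Roles"]:
    # The IAM identity that registers the profile in Databricks needs
    # iam:PassRole on this role's ARN.
    print(role["Arn"])
```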

Giving View permission to Databricks jobs using CLI or API

I am creating a Databricks job using the CLI. Is it possible to give another user View permission on my job using the Databricks CLI or API? If so, please provide details on how this can be done.
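The Permissions API supports this: PATCH the job's permissions with an access_control_list entry at CAN_VIEW. A sketch (host, token, job ID, and user are placeholders):

```python
import requests

HOST = "https://<databricks-instance>"  # placeholder workspace URL
TOKEN = "dapi..."                       # placeholder token

job_id = 123  # placeholder job ID, e.g. from `databricks jobs list`

resp = requests.patch(
    f"{HOST}/api/2.0/permissions/jobs/{job_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"access_control_list": [
        {"user_name": "colleague@example.com", "permission_level": "CAN_VIEW"}
    ]},
)
resp.raise_for_status()
```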

Azure Databricks API: import entire directory with notebooks

I need to import many notebooks (both Python and Scala) into Databricks using the Databricks REST API 2.0. My source path (local machine) is ./db_code and the destination (Databricks workspace) is...
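The Workspace API imports one file per call, so a directory import is a walk over the local tree: mkdirs for each folder, then import each notebook base64-encoded. A sketch (host, token, and destination path are placeholders):

```python
import base64
import os
import requests

HOST = "https://<databricks-instance>"  # placeholder workspace URL
TOKEN = "dapi..."                       # placeholder token
SRC, DST = "./db_code", "/Shared/db_code"
LANG = {".py": "PYTHON", ".scala": "SCALA"}

def post(endpoint, payload):
    r = requests.post(f"{HOST}/api/2.0/workspace/{endpoint}",
                      headers={"Authorization": f"Bearer {TOKEN}"}, json=payload)
    r.raise_for_status()

for root, _, files in os.walk(SRC):
    remote_dir = DST + root[len(SRC):]
    post("mkdirs", {"path": remote_dir})
    for name in files:
        ext = os.path.splitext(name)[1]
        if ext not in LANG:
            continue
        with open(os.path.join(root, name), "rb") as fh:
            content = base64.b64encode(fh.read()).decode()
        post("import", {
            "path": f"{remote_dir}/{os.path.splitext(name)[0]}",
            "format": "SOURCE",
            "language": LANG[ext],
            "content": content,
            "overwrite": True,
        })
```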

How to get job/run level logs in Databricks?

Databricks only provides cluster-level logs in the UI and the API. Is there a way to configure Spark or log4j in Databricks so that we get run/job-level logs?
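One approach, when each run uses a new job cluster, is to set cluster_log_conf in the job's cluster spec: every run then gets its own cluster and therefore its own log directory. A sketch of the relevant fragment of a Jobs API payload (paths and values are placeholders):

```python
# Fragment of a Jobs API payload; because each run of a job with a
# new_cluster spins up its own cluster, delivering cluster logs
# effectively yields per-run logs under per-cluster subdirectories.
new_cluster = {
    "spark_version": "7.3.x-scala2.12",  # example runtime
    "node_type_id": "Standard_DS3_v2",   # example node type
    "num_workers": 2,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/my-job"}  # placeholder path
    },
}
```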

Databricks CLI: SSLError, can't find local issuer certificate

I have installed and configured the Databricks CLI, but when I try using it I get an error indicating that it can't find a local issuer certificate: $ dbfs ls dbfs:/databricks/cluster_init/ Error:...
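The CLI makes its HTTP calls through the requests library, so behind a corporate proxy one common workaround is pointing REQUESTS_CA_BUNDLE at the CA certificate that signs the proxy's TLS interception. A sketch (the bundle path is a placeholder):

```python
import os
import subprocess

# Point requests (and therefore the Databricks CLI) at the corporate
# CA bundle; the path is a placeholder.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/corp-ca.pem"

subprocess.run(["dbfs", "ls", "dbfs:/databricks/cluster_init/"], check=True)
```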

How to configure Databricks token inside Docker File

I have a Dockerfile in which I want to: download the Databricks CLI, configure the CLI by adding a host and token, and then run a Python file that uses the Databricks token. I am able to install...
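databricks configure is interactive, but the CLI also reads DATABRICKS_HOST and DATABRICKS_TOKEN from the environment, or a ~/.databrickscfg file, either of which a Dockerfile can set up without prompts. A sketch that writes the config file from Python (host and token are placeholders):

```python
import pathlib

# Equivalent to answering the `databricks configure --token` prompts;
# alternatively, export DATABRICKS_HOST / DATABRICKS_TOKEN as ENV vars.
cfg = pathlib.Path.home() / ".databrickscfg"
cfg.write_text(
    "[DEFAULT]\n"
    "host = https://<databricks-instance>\n"  # placeholder workspace URL
    "token = dapi...\n"                       # placeholder token
)
```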

Accessing a file from Python code in Databricks

I am trying to access a model file I had previously copied over via the CLI by using the following code in a notebook at https://community.cloud.databricks.com/: with open("/dbfs/cat_encoder.joblib",...
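On Community Edition the /dbfs FUSE mount is not always available, so open() on a /dbfs path can fail even though the file exists in DBFS. A common workaround is copying the file to the driver's local disk first; a sketch:

```python
# Copy from DBFS to the driver's local filesystem, then open it with
# ordinary Python file APIs (joblib here, matching the question).
dbutils.fs.cp("dbfs:/cat_encoder.joblib", "file:/tmp/cat_encoder.joblib")

import joblib
encoder = joblib.load("/tmp/cat_encoder.joblib")
```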

Read Delta table from multiple folders

I'm working on Databricks. I'm reading my Delta table like this: path = "/root/data/foo/year=2021/" df = spark.read.format("delta").load(path) However, within the year=2021 folder there are...
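A Delta table must be loaded from its root (where the _delta_log directory lives), not from a partition subfolder; partition pruning then happens through a filter. A sketch:

```python
# Load from the table root and filter on the partition column; Delta
# uses the transaction log plus the predicate to prune partitions.
df = (
    spark.read.format("delta")
    .load("/root/data/foo")
    .where("year = 2021")
)
```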

Spark: How to write a bytes string to HDFS in PySpark for a spark-xml transformation?

In Python, a bytes string can simply be saved to a single XML file: with open('/home/user/file.xml', 'wb') as f: f.write(b'<Value>1</Value>') Current output: /home/user/file.xml (file saved in...
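From the driver, one way to write raw bytes to HDFS is to go through the JVM's Hadoop FileSystem API via py4j, since plain open() only sees the local disk. A sketch (the target path is a placeholder):

```python
# Access the Hadoop FileSystem of the active Spark session via py4j.
jvm = spark._jvm
conf = spark._jsc.hadoopConfiguration()
fs = jvm.org.apache.hadoop.fs.FileSystem.get(conf)

# Create the file and write raw bytes; py4j converts bytearray to byte[].
out = fs.create(jvm.org.apache.hadoop.fs.Path("/user/me/file.xml"))  # placeholder path
out.write(bytearray(b"<Value>1</Value>"))
out.close()
```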

How to get the cluster's JDBC/ODBC parameters programmatically?

Databricks documentation shows how to get the cluster's hostname, port, HTTP path, and JDBC URL parameters from the JDBC/ODBC tab in the UI. See image: Is there a way to get the same information...
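There is no single API field for the whole JDBC URL, but its parts are derivable: the host is the workspace URL, the port is 443, and the HTTP path follows a fixed pattern built from the org and cluster IDs. A sketch of assembling it (all identifiers are placeholders; the URL shape matches the Simba Spark driver shown in the UI):

```python
host = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace host
org_id = "1234567890123456"                          # placeholder org ID
cluster_id = "0123-456789-abcde123"                  # placeholder cluster ID

http_path = f"sql/protocolv1/o/{org_id}/{cluster_id}"
jdbc_url = (
    f"jdbc:spark://{host}:443/default;transportMode=http;ssl=1;"
    f"httpPath={http_path};AuthMech=3;UID=token;PWD=<personal-access-token>"
)
print(jdbc_url)
```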

Read a Databricks table via the Databricks API in Python?

Using Python 3, I am trying to compare an Excel (xlsx) sheet to an identical Spark table in Databricks. I want to avoid doing the comparison in Databricks, so I am looking for a way to read the Spark...
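One way to do the comparison locally is databricks-connect, which lets a local PySpark session read the remote table and pull it into pandas next to the Excel sheet. A sketch (table and file names are placeholders):

```python
import pandas as pd
from pyspark.sql import SparkSession

# With databricks-connect configured, spark.table() reads the remote table.
spark = SparkSession.builder.getOrCreate()
table_df = spark.table("my_db.my_table").toPandas()  # placeholder table name

excel_df = pd.read_excel("sheet.xlsx")               # placeholder file

# Align column order and dtypes before comparing in practice.
print(table_df.equals(excel_df))
```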

Azure Databricks PAT token creation for Azure Service Principal Name

I was not able to add an Azure AD Service Principal to Azure Databricks through the portal. Finally, I added my Service Principal to Azure Databricks with the help of the Databricks APIs...
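One API-based route is to get an AAD access token for the service principal against the Azure Databricks resource, then call the Token API with it to mint a PAT. A sketch (tenant, IDs, and host are placeholders; the scope GUID is the Azure Databricks resource ID):

```python
import requests

TENANT = "<tenant-id>"                  # placeholder
CLIENT_ID = "<sp-app-id>"               # placeholder
SECRET = "<sp-secret>"                  # placeholder
HOST = "https://<databricks-instance>"  # placeholder workspace URL

# 1. AAD client-credentials token for the Azure Databricks resource.
aad = requests.post(
    f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": SECRET,
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
).json()["access_token"]

# 2. Mint a Databricks PAT for the service principal via the Token API.
pat = requests.post(
    f"{HOST}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {aad}"},
    json={"lifetime_seconds": 3600, "comment": "SP token"},
).json()["token_value"]
```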

Install a conda package manually on the Databricks ML Runtime

I have a Databricks ML Runtime cluster. I am trying to install the fbprophet library using a cluster init script, following the example in the Databricks documentation: #!/bin/bash set -x ....
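On ML runtimes the pattern is to call the runtime's own conda binary from the init script. A sketch that writes such a script to DBFS from a notebook; the conda path and channel are assumptions to verify against your runtime version:

```python
# Write an init script to DBFS; /databricks/conda is where ML runtimes
# have shipped their conda installation (verify for your runtime version).
dbutils.fs.put(
    "dbfs:/databricks/init-scripts/install-fbprophet.sh",
    """#!/bin/bash
set -x
/databricks/conda/bin/conda install -y -c conda-forge fbprophet
""",
    overwrite=True,
)
```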

Parallel REST API requests using Spark (Databricks)

I want to leverage Spark (running on Databricks, using PySpark) to send parallel requests to a REST API. Right now I might face two scenarios: REST API 1: Returns data...
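The usual pattern is to put the request parameters in a DataFrame and let the executors make the calls, for example through a UDF; the endpoint below is a placeholder, and for high volume a mapPartitions variant with one HTTP session per partition is more efficient:

```python
import requests
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

@udf(StringType())
def call_api(item_id):
    # Runs on the executors, so requests are issued in parallel per task.
    r = requests.get(f"https://api.example.com/items/{item_id}")  # placeholder
    return r.text

df = spark.range(100).select(col("id").alias("item_id"))
results = df.withColumn("response", call_api("item_id"))
```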

Spark Delta table restore to version

I am trying to restore a Delta table to a previous version via Spark Java, using a local IDE. The code is as below: import io.delta.tables.*; DeltaTable deltaTable = DeltaTable.forPath(spark,...
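For reference, the Python equivalent: Delta Lake 1.2+ ships restoreToVersion on DeltaTable, and on older releases the workaround is reading the old snapshot with versionAsOf and overwriting. A sketch (the path is a placeholder):

```python
from delta.tables import DeltaTable

path = "/tmp/delta/events"  # placeholder table path

# Delta Lake >= 1.2: restore in place.
DeltaTable.forPath(spark, path).restoreToVersion(1)

# Older Delta releases: read the snapshot and overwrite the table.
# (spark.read.format("delta").option("versionAsOf", 1).load(path)
#  .write.format("delta").mode("overwrite").save(path))
```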

Databricks Job API create job with single node cluster

I am trying to figure out why I get the following error when I use the Databricks Jobs API: { "error_code": "INVALID_PARAMETER_VALUE", "message": "Cluster validation error: Missing required...
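A single-node job cluster needs num_workers: 0 together with the singleNode profile, a local master, and the SingleNode resource-class tag; leaving any of these out produces cluster validation errors. A sketch of the new_cluster block (runtime and node type are examples):

```python
new_cluster = {
    "spark_version": "7.3.x-scala2.12",  # example runtime
    "node_type_id": "Standard_DS3_v2",   # example node type
    "num_workers": 0,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```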

Get the second-to-last value in each row of a data frame in R

I am trying to get the second-to-last value in each row of a data frame, meaning the first job a person has had. (Job1_latest is the most recent job, and people had a different number of jobs in the...

Databricks API 2.0: create a secret scope in PowerShell using service principal credentials

I am trying to create a Key Vault-backed secret scope in Azure Databricks using a PowerShell script that runs during an Azure DevOps deployment. It works fine when I run it locally using my own...
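When the caller is a service principal, the create call needs an AAD token for the SP plus two extra headers: a management-endpoint token and the workspace resource ID. A Python sketch of the request shape, which a PowerShell script can mirror (all identifiers are placeholders):

```python
import requests

aad_token = "<AAD token for the service principal>"                   # placeholder
mgmt_token = "<AAD token for https://management.core.windows.net/>"   # placeholder
workspace_id = ("/subscriptions/<sub>/resourceGroups/<rg>/providers/"
                "Microsoft.Databricks/workspaces/<workspace>")        # placeholder

resp = requests.post(
    "https://<region>.azuredatabricks.net/api/2.0/secrets/scopes/create",
    headers={
        "Authorization": f"Bearer {aad_token}",
        "X-Databricks-Azure-SP-Management-Token": mgmt_token,
        "X-Databricks-Azure-Workspace-Resource-Id": workspace_id,
    },
    json={
        "scope": "kv-scope",
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": "<key-vault-resource-id>",       # placeholder
            "dns_name": "https://<vault>.vault.azure.net/", # placeholder
        },
    },
)
print(resp.status_code, resp.text)
```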

How to access shared Google Drive files through Python?

I am trying to access shared Google Drive files through Python. I have created an OAuth 2.0 client ID as well as the OAuth consent screen, and copy-pasted this code:...
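For items on shared drives, the Drive v3 list call must opt in with supportsAllDrives and includeItemsFromAllDrives, otherwise the results silently omit them. A sketch using the installed-app OAuth flow the question describes (the client-secret filename is a placeholder):

```python
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]

# OAuth flow for a desktop client (client_secret.json is a placeholder).
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)

service = build("drive", "v3", credentials=creds)
resp = service.files().list(
    corpora="allDrives",
    supportsAllDrives=True,
    includeItemsFromAllDrives=True,
    fields="files(id, name)",
).execute()
for f in resp.get("files", []):
    print(f["id"], f["name"])
```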

Airflow: assume an AWS role from the DAG Python file

I'm developing an Airflow pipeline that triggers a Spark Databricks job once a sensor task finds a _SUCCESS file in a specific bucket path. The problem is that Airflow doesn't have direct access to...
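One way is to call STS from the DAG code and hand the temporary credentials to the S3 client the sensor logic uses; the role ARN and bucket below are placeholders. Airflow's AWS connection can also hold a role_arn so hooks assume it automatically.

```python
import boto3

# Assume the role that has read access to the bucket.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/airflow-s3-reader",  # placeholder
    RoleSessionName="airflow-success-sensor",
)["Credentials"]

# Build an S3 client from the temporary credentials and probe the marker.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
s3.head_object(Bucket="my-bucket", Key="path/to/data/_SUCCESS")  # placeholder
```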

How to use the Azure Databricks API to submit a job?

I am a beginner with Azure Databricks, and I want to use the APIs to create a cluster and submit a job in Python. I am stuck, as I am unable to do so. Also, if I have an existing cluster, how will the code look...
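For a one-off run, the Jobs API's runs/submit endpoint takes either a new_cluster spec or an existing_cluster_id. A sketch (host, token, paths, and IDs are placeholders):

```python
import requests

HOST = "https://<region>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "dapi..."                              # placeholder token

payload = {
    "run_name": "example-run",
    "new_cluster": {
        "spark_version": "5.5.x-scala2.11",  # example runtime
        "node_type_id": "Standard_DS3_v2",   # example node type
        "num_workers": 2,
    },
    # To run on an existing cluster, drop new_cluster and use:
    # "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Shared/my_notebook"},  # placeholder
}

r = requests.post(f"{HOST}/api/2.0/jobs/runs/submit",
                  headers={"Authorization": f"Bearer {TOKEN}"}, json=payload)
print(r.json())  # contains run_id for polling runs/get
```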

Microsoft Graph: failing to get a token from Python code: Error SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED]

I need to call a web API, and for that I need a bearer token. I am using Databricks (Python) code to first authenticate against Microsoft AAD and then get a bearer token for my service_user. I followed...
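CERTIFICATE_VERIFY_FAILED in this flow usually means a proxy or missing corporate CA between the cluster and login.microsoftonline.com; rather than disabling verification, point the request at the right CA bundle. A sketch of the client-credentials token call with an explicit bundle (identifiers and path are placeholders):

```python
import requests

resp = requests.post(
    "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "<app-id>",      # placeholder
        "client_secret": "<secret>",  # placeholder
        "scope": "https://graph.microsoft.com/.default",
    },
    verify="/path/to/corp-ca-bundle.pem",  # placeholder CA bundle
)
token = resp.json().get("access_token")
```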

How to use dbutils in a SparkListener on Databricks

Using Azure Databricks Runtime 9.1, I want to start a SparkListener and access dbutils features inside of the SparkListener. This listener should log some information on the start of the Spark...