Loading Data from Google BigQuery into Spark (on Databricks)

I want to load data into Spark (on Databricks) from Google BigQuery. I notice that Databricks offers a lot of support for Amazon S3 but not for Google. What is the best way to load data into Spark...

Connect to Blob storage "no credentials found for them in the configuration"

I'm working with a Databricks notebook backed by a Spark cluster. I'm having trouble connecting to Azure Blob storage. I used this link and tried the section Access Azure Blob Storage Directly...

Is the spark-snowflake connector only available for Databricks Spark?

Using Databricks Spark, I am able to write data into Snowflake using the spark-snowflake connector (spark-snowflake_2.11-2.3.0.jar, snowflake-jdbc-2.8.1.jar), not using a JDBC connection. But without...

MLFlow Projects throw JSONDecode error when run

I'm trying to get MLFlow Projects to run using the MLFlow CLI, and following the tutorial leads to an error. For any project I try to run from the CLI, I get the following error: Traceback...

Databricks notebook detaches in standard cluster mode

EDIT: Update. This happens regardless of the number of users. Even with one user, it still happens. The databricks notebook is repeatedly detaching while in use. Our data scientist comes from a...

How to deal with Databricks Bulk Insert Error to Azure DB

I'm trying to run a bulk insert using Scala & the Spark Connector via Azure Databricks. I'm getting closed connection errors from SQL Server. A portion of the data will pass through to the...

Connect AWS S3 to Databricks PySpark

I'm trying to connect to and read all my CSV files from an S3 bucket with Databricks PySpark. When I use a bucket that I have admin access to, it works without error: data_path =...

Intermittent failures of a scheduled Spark Job on Databricks cluster after a few runs

Current setup: an Azure Data Factory pipeline, scheduled to run every 15 minutes, runs some Databricks notebooks on an always-on interactive Databricks cluster. The issue is that this pipeline fails...

How can I connect to a Databricks Community Edition cluster from PyCharm?

I want to work on some small exercise projects, and I wish to use a Databricks cluster. Can this be done? I am hoping there is some way to connect to a Databricks cluster through the databricks-connect utility....

Error Connecting to Databricks from local machine

I am attempting to make a connection to Databricks from my Mac (Mojave). I did a pip install -U databricks-connect==5.5.* and started a spark-shell, but when I try to query in Spark I get the following...

Switching between Databricks Connect and local Spark environment

I am looking to use Databricks Connect for developing a pyspark pipeline. DBConnect is really awesome because I am able to run my code on the cluster where the actual data resides, so it's perfect...

Connecting OneDrive data to Azure Databricks

I have created an Azure Databricks cluster and would like to connect to a SharePoint folder to read and upload files. I cannot seem to find any solution to this. Please advise.

Connection refused on connecting to postgresql:dbserver db to Databricks via JDBC connection

I'm trying to connect to a PostgreSQL database on my local machine from Databricks using a JDBC connection. There are several useful posts on Stack Overflow. I'm following the procedure mentioned...

Can I use Jupyter lab to interact with databricks spark cluster using Scala?

Can I use Jupyter lab to connect to a databricks spark cluster that is hosted remotely? There are KB articles about databricks connect, which allows a scala or java client-process to control a...

Databricks Secrets with Apache Spark SQL Connecting to Oracle

I'm using spark SQL to pull tables from an Oracle database, some of them fairly sizable, into Azure databricks as tables so I can run jobs on them and leave them visible for the team to use. I...

Setting data lake connection in cluster Spark Config for Azure Databricks

I'm trying to simplify notebook creation for developers/data scientists in my Azure Databricks workspace that connects to an Azure Data Lake Gen2 account. Right now, every notebook has this at the...

Reading parquet file from ADLS gen2 using service principal

I am using the azure-storage-file-datalake package to connect to ADLS Gen2: from azure.identity import ClientSecretCredential # service principal credential tenant_id = 'xxxxxxx' client_id =...

Can I have more than one connection in databricks-connect?

I have set up a Miniconda Python environment on my PC, where I have installed the databricks-connect package and configured the tool with databricks-connect configure to connect to a databricks...

java.sql.SQLException: No suitable driver when trying to run a Python Script on Databricks cluster using Databricks Connect

I am trying to run a Python script from Visual Studio Code on a Databricks cluster using Databricks Connect. The jar files for the Apache Spark connector: SQL Server & Azure SQL have been installed...

How can I connect JMeter to a Databricks Spark cluster?

I want to connect JMeter to Databricks (Spark cluster) using the JDBC connection associated with that Spark cluster. I need to perform a concurrency test using JMeter's JDBC request on an apache spark...

How do I use the Spark connector in DataBricks to do a bulk insert into SQL?

I have a dataframe in DataBricks which I am trying to bulk insert into SQL Server. I have followed this tutorial on Microsoft's website, specifically using this code: # df is created as a...

Databricks: Remote execution of non-spark code

Using databricks-connect, I am able to run spark-code on a cluster. The official documentation (https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect) also only mentions...

No module found error while using Databricks-connect

I have custom libraries that I have installed on my cluster using EGG files and also on my local machine. I use databricks connect to connect my IDE to my cluster. When I try importing the module,...

Azure Synapse: connecting to serverless SQL pool from Databricks - Failed to find data source: com.databricks.spark.sqldw

I'm using Synapse in Azure. I have data in the serverless SQL pool. I want to import that data into a DataFrame in Databricks. I am getting the following error: Py4JJavaError: An error occurred while...

How to prevent users from dropping a Delta table from the Hive metastore

I'm having problems trying to run data object privileges in Databricks. I have a configuration with a shared metastore between two Azure Databricks workspaces. To accomplish this, I used another...

Unable to get metrics from PrometheusServlet on Databricks Spark 3.1.1

Trying to get Prometheus metrics with a Grafana dashboard working for Databricks clusters on AWS, but I cannot seem to get connections on the ports as required. I've tried a few different setups, but...

Using databricks-connect debugging a notebook that runs another notebook

I am able to connect to the Azure Databricks cluster from my Linux CentOS VM, using Visual Studio Code. The code below even works without any issue: from pyspark.sql import SparkSession spark =...

How to use dbutils in a SparkListener on Databricks

Using Azure Databricks Runtime 9.1, I want to start a SparkListener and access dbutils features inside of the SparkListener. This listener should log some information on the start of the Spark...

Databricks Connect java.lang.ClassNotFoundException

I updated our databricks cluster to DBR 9.1 LTS on Azure Databricks, but a package I use regularly is giving me an error when I try to run it in VS Code with Databricks-connect, where it didn't...

Multiple jobs from a single action (Read, Transform, Write)

Currently using PySpark on Databricks Interactive Cluster (with Databricks-connect to submit jobs) and Snowflake as Input/Output data. My Spark application is supposed to read data from Snowflake,...