Azure Databricks Spark - write to blob storage

I have a data frame with two columns - filepath (a wasbs file path for a blob) and string - and I want to write each string to a separate blob with that file name. How can I do this?
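One way is to collect the (path, content) pairs to the driver and write each with `dbutils.fs.put`. A minimal sketch, assuming the columns are named `filepath` and `content` (renaming the `string` column, since that name is awkward to work with); the writer is injected so the loop itself runs outside Databricks too:

```python
def write_rows(rows, put):
    """Write each (filepath, content) pair with the supplied writer.

    rows : iterable of objects with `filepath` and `content` attributes
           (e.g. Row objects from df.collect())
    put  : callable like dbutils.fs.put(path, contents, overwrite)
    """
    for row in rows:
        put(row.filepath, row.content, True)  # True = overwrite existing blob

# On Databricks (assumed column names):
#   write_rows(df.select("filepath", "content").collect(), dbutils.fs.put)
```

Collecting is fine for modest row counts; for very large frames you would instead write from the executors, e.g. with `foreachPartition`.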

Databricks: How do I get path of current notebook?

Databricks is smart and all, but how do you identify the path of your current notebook? The guide on the website does not help. It suggests: %scala dbutils.notebook.getContext.notebookPath res1:...

How to fix 'command not found' error in Databricks when creating a secret scope

I am trying to create a secret scope in a Databricks notebook. The notebook is running using a cluster created by my company's admin - I don't have access to create or edit clusters. I'm following...

How to map the coefficients obtained from a logistic regression model to the feature names in PySpark

I built a logistic regression model using a pipeline similar to the one listed by Databricks: https://docs.databricks.com/spark/latest/mllib/binary-classification-mllib-pipelines.html the features...
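The coefficient vector is in the same order as the `VectorAssembler`'s input columns, so the mapping is just a zip. A sketch of the pairing logic, assuming variables named `assembler` (the fitted pipeline's VectorAssembler stage) and `lr_model` (the fitted LogisticRegressionModel):

```python
def coef_by_feature(feature_names, coefficients):
    """Pair each feature name with its coefficient, in assembler order."""
    if len(feature_names) != len(coefficients):
        raise ValueError("feature/coefficient length mismatch")
    return dict(zip(feature_names, coefficients))

# On Databricks (assumed variable names):
#   coef_by_feature(assembler.getInputCols(), lr_model.coefficients.toArray())
```

If categorical features go through a OneHotEncoder first, each category expands into several vector slots, so the raw input-column names no longer line up one-to-one.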

How to get the path of the Databricks Notebook dynamically?

Please don't give solution for IPython/Jupyter notebooks. The technology is different. I want to get the path of my Databricks notebook dynamically. Which is something I can get from the UI "Copy...
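In Python the notebook context is reachable through `dbutils.notebook.entry_point`. The accessor chain below matches what current Databricks runtimes expose, but it is best treated as an internal API that may change:

```python
def get_notebook_path(dbutils):
    """Return the workspace path of the current notebook.

    Relies on the (undocumented) entry_point accessor chain of the
    Databricks Python runtime.
    """
    return (dbutils.notebook.entry_point
            .getDbutils().notebook().getContext()
            .notebookPath().get())
```

This is the Python equivalent of the Scala `dbutils.notebook.getContext.notebookPath` mentioned in the docs.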

Can't read .xlsx file on Azure Databricks

I'm on Azure Databricks notebooks using Python, and I'm having trouble reading an Excel file and putting it in a Spark dataframe. I saw that there were topics about the same problem, but they don't...

Connect AWS S3 to Databricks PySpark

I'm trying to connect to and read all my CSV files from an S3 bucket with Databricks PySpark. When I use a bucket that I have admin access to, it works without error: data_path =...

Databricks: drop a Delta table?

How can I drop a Delta Table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the folder 'delta' with the magic command or...
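For a managed table, `DROP TABLE` removes both the metadata and the data; for an external (path-based) table, it only removes the metadata, so the files have to be deleted separately. A sketch, with the path deletion injected so it can stand in for `dbutils.fs.rm` (the table and path names are placeholders):

```python
def drop_delta_table(spark, table, path=None, rm=None):
    """Drop a Delta table; for an unmanaged (path-based) table also
    remove the underlying files."""
    spark.sql(f"DROP TABLE IF EXISTS {table}")
    if path is not None and rm is not None:
        rm(path, True)  # e.g. dbutils.fs.rm(path, recurse=True)

# On Databricks:
#   drop_delta_table(spark, "mydb.events")                       # managed
#   drop_delta_table(spark, "mydb.events", "/mnt/delta/events",  # external
#                    dbutils.fs.rm)
```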

How to list Databricks scopes using Python when working with its Secrets API

I can create a scope. However, I want to be sure to create the scope only when it does not already exist. Also, I want to do the checking using Python - is that doable? What I have found out is...
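The REST API exposes `GET /api/2.0/secrets/scopes/list`, so an existence check can be done from plain Python. A sketch using only the standard library (host and token are placeholders; the response-parsing helpers are split out so they can be exercised without a workspace):

```python
import json
import urllib.request

def scopes_url(host):
    """Secret-scope list endpoint of the Databricks REST API 2.0."""
    return host.rstrip("/") + "/api/2.0/secrets/scopes/list"

def scope_names(payload):
    """Extract scope names from the response body (a JSON object with
    an optional 'scopes' list)."""
    return [s["name"] for s in payload.get("scopes", [])]

def scope_exists(host, token, name):
    req = urllib.request.Request(
        scopes_url(host),
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return name in scope_names(payload)
```

Creating the scope only when `scope_exists(...)` is False then goes through `POST /api/2.0/secrets/scopes/create`.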

How to get all the table columns at once in an Azure Databricks database

I need all the table columns at once for a particular DB in Azure Databricks. I know the approach for SQL Server using the following query. I need the same kind of...
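One option is to walk `spark.catalog`: list the tables in the database, then list each table's columns. A sketch with the catalog passed in (the database name is a placeholder):

```python
def columns_by_table(catalog, db):
    """Map each table in `db` to its column names, using objects shaped
    like those returned by spark.catalog.listTables / listColumns."""
    return {t.name: [c.name for c in catalog.listColumns(t.name, db)]
            for t in catalog.listTables(db)}

# On Databricks: columns_by_table(spark.catalog, "my_db")
```

The SQL equivalent is `SHOW TABLES IN my_db` followed by `DESCRIBE TABLE` per table.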

PySpark and Protobuf Deserialization UDF Problem

I'm getting this error Can't pickle <class 'google.protobuf.pyext._message.CMessage'>: it's not found as google.protobuf.pyext._message.CMessage when I try to create a UDF in PySpark....

Switching between Databricks Connect and local Spark environment

I am looking to use Databricks Connect for developing a pyspark pipeline. DBConnect is really awesome because I am able to run my code on the cluster where the actual data resides, so it's perfect...

Install Python packages using init scripts in a Databricks cluster

I have installed the databricks-cli tool by running the following command: pip install databricks-cli, using the appropriate version of pip for your Python installation. If you are using Python 3,...
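For packages that the cluster itself needs, the usual route is a cluster-scoped init script that runs on every node at startup. A minimal sketch of such a script (the package list and the `/databricks/python` pip path are assumptions about the runtime layout):

```shell
#!/bin/bash
# Cluster-scoped init script: runs on every node when the cluster starts.
# Install into the cluster's Python environment, not the system one, and
# pin versions so restarts are reproducible.
set -e
/databricks/python/bin/pip install databricks-cli nltk==3.8.1
```

Upload the script to storage the cluster can read (e.g. a DBFS or workspace path) and reference it under the cluster's Advanced Options > Init Scripts.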

No HTML webpage shown from tensorflow data validation visualize_statistics() when run from a Databricks notebook

I am trying to use tensorflow (2.2) data validation (TFDV version: 0.22.2) to visualize data on databricks GPU cluster. From databricks notebook, I am running the code at...

Optimizing Merge in Delta Lake (Databricks Open Source)

I am trying to implement merge using Delta Lake OSS; my history data is around 7 billion records and the delta is around 5 million. The merge is based on a composite key (5 columns). I am...

Calling NLTK gives a "punkt not found" error on Databricks PySpark

I would like to call NLTK to do some NLP on Databricks via PySpark. I have installed NLTK from the library tab of Databricks, so it should be accessible from all nodes. My py3 code: import...

Fetching the username inside a notebook in Databricks on a high-concurrency cluster?

While trying to fetch user data on a high-concurrency cluster, I am facing this issue. I am using the command below to fetch the user...
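On high-concurrency clusters some `dbutils` context calls are restricted, but the SQL function `current_user()` generally still works. A sketch (the alias is arbitrary):

```python
def current_user(spark):
    """Fetch the signed-in user's name via SQL; an alternative to the
    dbutils context tags, which can be restricted on high-concurrency
    clusters."""
    return spark.sql("SELECT current_user() AS user").first()["user"]
```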

AADToken: HTTP connection to https://login.microsoftonline.com/<tenantID>/oauth2/token failed for getting token from AzureAD

I want to get access to Azure Data Lake Storage Gen2 from an Azure Databricks cluster - Scala version, via a mount point in the filesystem. I tried the following code, where an Azure service principal...

List the files of a directory and its subdirectories recursively in Databricks (DBFS)

Using Python/dbutils, how do I display the files of the current directory and its subdirectories recursively in the Databricks file system (DBFS)?
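`dbutils.fs.ls` is not recursive, but its `FileInfo` entries expose `path` and `isDir()`, so recursion is straightforward. A sketch with the listing function injected, so the traversal logic runs anywhere:

```python
def list_files_recursive(path, ls):
    """Recursively collect file paths under `path`.

    ls : callable like dbutils.fs.ls, returning entries with `path`
         and `isDir()` (FileInfo objects on Databricks).
    """
    files = []
    for entry in ls(path):
        if entry.isDir():
            files.extend(list_files_recursive(entry.path, ls))
        else:
            files.append(entry.path)
    return files

# On Databricks: list_files_recursive("dbfs:/mnt/data", dbutils.fs.ls)
```

For very deep or wide trees an explicit stack avoids Python's recursion limit, but for typical mounts this is fine.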

Install JAR files in DBFS and Maven packages using an init script

I have a few JAR files/packages in DBFS and I want an init script (so that I can place it on the automated cluster) to install the JAR packages every time the cluster starts. I also want to...

AssertionError: assertion failed: No plan for DeleteFromTable In Databricks

Is there any reason this command works well: %sql SELECT * FROM Azure.Reservations WHERE timestamp > '2021-04-02' returning 2 rows, while the below: %sql DELETE FROM Azure.Reservations WHERE...

Databricks SQL analytics

I am trying to do this tutorial about Databricks SQL analytics (https://docs.microsoft.com/en-us/azure/databricks/sql/get-started/admin-quickstart), but when I create my Databricks workspace I do not...

Standard scaling is taking too much time on a PySpark dataframe

I've tried standard scaler from spark.ml with the following function: def standard_scale_2(df, columns_to_scale): """ Args: df : spark dataframe columns_to_scale : list of columns...

How to use Selenium in Databricks and accessing and moving downloaded files to mounted storage

I've seen a couple of posts on using Selenium in Databricks using %sh to install Chrome drivers and Chrome. This works fine for me, but I had a lot of trouble when I needed to download a file. The...

Apache Spark in Azure Synapse: 'overwrite' method not working

I have a nice function that allows me to overwrite and rename a file when I save the results of a query to ADLS; see the following: from pyspark.sql import SparkSession spark =...

How to install libs for R arrow package on ubuntu without internet?

I am working on Azure Databricks and its compute server is Ubuntu 18.04. I want to install the arrow R package, but without internet access for security reasons. I downloaded the arrow tar file on...

How to pass the script path to %run magic command as a variable in databricks notebook?

I want to run a notebook in Databricks from another notebook using %run. I also want to be able to send the path of the notebook that I'm running to the main notebook as a parameter. The reason...
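`%run` only accepts a literal path, so a path held in a variable has to go through `dbutils.notebook.run` instead. A sketch (the timeout and argument names are placeholders):

```python
def run_notebook(dbutils, path, args=None, timeout=600):
    """Run another notebook by a path held in a variable.

    Unlike %run, dbutils.notebook.run executes the child in a separate
    context (its variables do not leak into the caller) and returns the
    child's exit value (what it passes to dbutils.notebook.exit)."""
    return dbutils.notebook.run(path, timeout, args or {})

# On Databricks:
#   child_path = "/Shared/child_notebook"   # hypothetical path
#   result = run_notebook(dbutils, child_path, {"param1": "value1"})
```

If you specifically need the child's definitions in the caller's namespace (what %run provides), there is no variable-path equivalent; restructuring the shared code into a library is the usual workaround.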

pyspark delta-lake metastore

Using "spark.sql.warehouse.dir" in the same Jupyter session (no Databricks) works. But after a kernel restart in Jupyter, the catalog DB and tables aren't recognized anymore. Isn't it possible...

Databricks - is not empty but it's not a Delta table

I run a query on Databricks: DROP TABLE IF EXISTS dublicates_hotels; CREATE TABLE IF NOT EXISTS dublicates_hotels ... I'm trying to understand why I receive the following error: Error in SQL...

How to use dbutils in a SparkListener on Databricks

Using Azure Databricks Runtime 9.1, I want to start a SparkListener and access dbutils features inside of the SparkListener. This listener should log some information on the start of the Spark...