Apache Spark + Delta Lake concepts

I have several questions about Spark + Delta. 1) Databricks proposes three layers (bronze, silver, gold), but which layer is recommended for machine learning, and why? I suppose they propose...

Delta Lake independent of Apache Spark?

I have been exploring the data lakehouse concept and Delta Lake. Some of its features seem really interesting. Right there on the project home page https://delta.io/ there is a diagram showing...

spark "delta" source not found

While using the Kafka and delta_core dependencies in a Spark project, I'm receiving the following warning: [WARNING] delta-core_2.12-0.7.0.jar, spark-sql-kafka-0-10_2.12-3.1.1.jar define 1 overlapping...

Spark Maven dependency incompatibility between delta-core and spark-avro

I'm trying to add delta-core to my Scala Spark project, running 2.4.4. Oddly, it seems to conflict with spark-avro. The Maven build succeeds, but at runtime I'm...

Write Spark dataframe into delta lake

I am trying to convert a Spark DataFrame into Delta format using the example code provided in the documentation, but I always get this strange error. Can you please help or...

Spark Delta table restore to version

I am trying to restore a Delta table to its previous version via Spark in Java, using a local IDE. The code is as below: import io.delta.tables.*; DeltaTable deltaTable = DeltaTable.forPath(spark,...

Error reading delta file from spark structured streaming

We use Spark Structured Streaming with Spark 2.2. At some point the stream crashes, and when it restarts it tries to read from the checkpoint and fails: java.lang.IllegalStateException: Error reading...

Partition pruning on Spark delta lake merge

I'm using delta lake ("io.delta" %% "delta-core" % "0.4.0") and merge in foreachBatch like: foreachBatch { (s, batchid) => deltaTable.alias("t") .merge( ...

Spark Delta Table Updates

I am working in a Microsoft Azure Databricks environment using Spark SQL and PySpark. I have a Delta table on a lake where data is partitioned by, say, file_date. Every partition contains files...

Error while creating Delta table with Spark SQL

I am using Scala 2.11.8, Spark 2.4.4, and Delta 0.4.0. My usage: val deltaQuery = """CREATE TABLE <SCHEMA_NAME>.<TABLE_NAME> ( abc String, pqr...

Windowed lag/delta with Spark Structured Streaming

First of all, I'm pretty new to spark, so apologies if I'm missing the obvious! I'm developing a POC using Spark, which consumes a stream of data from Apache Kafka. My first goal was general...

How to Convert Parquet to Spark Delta Lake?

I was trying to convert a set of parquet files into delta format in-place. I tried using the CONVERT command as mentioned in the Databricks documentation....

how to read .delta files in spark checkpoint

The Spark checkpoint directory contains .delta files that record the intermediate state of the stream. How can I read their content? When I force-open the .delta files as text (UTF-8), it...

Spark Delta format on non-Databricks platforms

> To improve query speed, Delta Lake on Databricks supports the ability to optimize the layout of data stored in cloud storage. Delta Lake on Databricks supports two layout algorithms:...

How to CREATE TABLE USING delta with Spark 2.4.4?

This is Spark 2.4.4 and Delta Lake 0.5.0. I'm trying to create a table using the delta data source, and it seems I'm missing something. Although the CREATE TABLE USING delta command worked fine, neither...

Spark read delta table, getting NoSuchObjectException(message:There is no database named delta) error

I am reading Delta-format data using Spark: spark.sql("select * from delta.`/mnt/data/test`").createOrReplaceTempView("test"). The test view is created in the Spark program and I can use this...

update query in spark sql- delta format

I was trying to perform a simple update query in spark sql on delta tables update...

Issue with snapshot using delta on spark-dbt

I'm creating a dbt snapshot on Spark using Delta Lake. After the initial dbt snapshot run, from the second dbt snapshot command onwards, I'm getting the error snapshot: > target is not a...

Install Delta Lake package for Apache Spark 2.4.3 (PySpark)

I want to use Delta Lake on a Hadoop cluster using PySpark. I haven't found any installation guide for Delta Lake apart from the one below: pyspark --packages...

Creation of test spark delta table very slow

I am attempting to write some test cases for our spark logic by creating tiny input delta tables with known values. However I am noticing that the creation of a single item delta table is taking a...

Delta Lake Python

I have set up a virtual environment inside my existing Hadoop cluster. Since the current cluster does not have Spark > 3, I installed delta spark using the virtual environment. While trying to...

How to add Delta Lake support to Zeppelin's spark interpreter?

I'm trying to add Delta Lake support to Zeppelin. So far I've tried adding the io.delta:delta-core_2.12:0.7.0 dependency to the Spark interpreter, as well as a couple of other related actions...

Spark Update Multiple Columns in Delta from another table

I am trying to update multiple columns in one Delta table based on values fetched from another Delta table. The update SQL below works in Oracle but not in Spark Delta. Can you please...

Convert spark dataframe to Delta table on azure databricks - warning

I am saving my Spark DataFrame on Azure Databricks and creating a Delta Lake table. It works fine; however, I am getting this warning message during execution. Question: why am I still getting this...

Update Spark Dataframe's window function row_number column for Delta Data

I need to update the DataFrame's row number column for the delta data. I have implemented the base load's row number as below: Input Data: val base = List(List("001", "a",...

Schema mismatch - Spark DataFrame written to Delta

When writing a dataframe to delta format, the resulting delta does not seem to follow the schema of the dataframe that was written. Specifically, the 'nullable' property of a field seems to be...

Spark DataFrame is not saved in Delta format

I want to save a Spark DataFrame in Delta format to S3; however, for some reason, the data is not saved. I debugged all the processing steps and there was data, and right before saving it, I ran count on...

Delta Table to Spark Streaming to Synapse Table in azure databricks

I need to write and synchronize our merged Delta tables to Azure Data Warehouse. We are trying to read the Delta table, but Spark streaming doesn't allow streaming writes to Synapse tables. Then...

Delta table versioning while writing from a Spark structured streaming job

Will writing to a Delta table from a Spark structured streaming job create a version for every micro batch of data written?

Spark 3.0 -> Delta Lake 0.7.0 <-> AWS Glue Catalog -> Athena - implementing integration

I am using standalone Spark (PySpark) 3.0 with Delta 0.7.0 on an EC2 instance. Can someone direct me to a guide on how to migrate from the Hive Metastore catalog (on Derby) to the Glue Catalog? If it...