How to specify the location of a deltalake table in spark structured streaming?

I have incoming streaming data which I am saving as a deltalake table using the below...
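
A minimal sketch of how the sink location is specified, assuming the delta package is already configured on the session; the rate source and paths below are hypothetical stand-ins:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical streaming source; any streaming DataFrame works the same way.
df = spark.readStream.format("rate").load()

# The delta table's location is the path passed to start()
# (equivalently, .option("path", ...)); a checkpoint location is also required.
query = (
    df.writeStream
      .format("delta")
      .outputMode("append")
      .option("checkpointLocation", "/tmp/delta/events/_checkpoints")
      .start("/tmp/delta/events")
)
```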

How to count the number of messages fetched from a Kafka topic in a day?

I am fetching data from Kafka topics and storing it in Deltalake (parquet) format. I wish to find the number of messages fetched on a particular day. My thought process: I thought to read the...
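
One way to count per day, sketched under the assumption that the Kafka timestamp column survived into the delta table; the path and date are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Assumes the Kafka 'timestamp' column was kept when writing to delta.
df = spark.read.format("delta").load("/delta/kafka_messages")

count_for_day = (
    df.filter(F.to_date(F.col("timestamp")) == "2020-06-01")
      .count()
)
print(count_for_day)
```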

Not able to get metadata information of the Delta Lake table using Spark

I am trying to get metadata information on the Delta Lake table created using a DataFrame: information on the version and timestamp. I tried spark.sql("describe deltaSample").show(10,false), but this is...
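
If the table was saved to a path, the history API (available since delta-core 0.4.0) exposes version and timestamp; the path below is hypothetical:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# For path-based tables; on Databricks, spark.sql("DESCRIBE HISTORY deltaSample")
# works for metastore-registered tables as well.
dt = DeltaTable.forPath(spark, "/delta/deltaSample")
dt.history().select("version", "timestamp", "operation").show(10, False)
```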

Write to csv file from deltalake table in databricks

How do I write the contents of a deltalake table to a csv file in Azure databricks? Is there a way to do this without first dumping the contents to a dataframe?...
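
Since Spark is lazy, reading into a dataframe does not materialize the table in memory; this sketch (hypothetical paths) streams it straight out to CSV:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(
    spark.read.format("delta").load("/delta/my_table")
         .write.option("header", "true")
         .csv("/mnt/exports/my_table_csv")
)
```

If a single output file is required, add .coalesce(1) before .write, at the cost of funneling all data through one task.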

How to refer to deltalake tables in a jupyter notebook using pyspark

I'm trying to start using Delta Lake with Pyspark. To be able to use deltalake, I invoke pyspark on the Anaconda shell prompt as: pyspark --packages io.delta:delta-core_2.11:0.3.0. Here is the...
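
The same packages setting can be applied from inside a fresh Jupyter kernel instead of the shell flag; a sketch, with a hypothetical table path:

```python
from pyspark.sql import SparkSession

# Equivalent to `pyspark --packages io.delta:delta-core_2.11:0.3.0`; this must
# run before any other Spark action so the JVM starts with the jar available.
spark = (
    SparkSession.builder
    .appName("delta-notebook")
    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.3.0")
    .getOrCreate()
)

df = spark.read.format("delta").load("/delta/some_table")  # hypothetical path
```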

How to list all tables by searching a given column name in spark or deltalake

I'm looking for a metadata table which holds all column names, table names, and creation timestamps within spark sql and delta lake. I need to be able to search by a given column name and list all the...
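
Open-source Spark does not expose an information_schema-style view for this, but the catalog API can be scanned; a sketch where 'customer_id' is a hypothetical column name (this can be slow on large metastores):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = "customer_id"  # hypothetical column to search for
matches = []
for db in spark.catalog.listDatabases():
    for table in spark.catalog.listTables(db.name):
        cols = [c.name for c in spark.catalog.listColumns(table.name, db.name)]
        if target in cols:
            matches.append(f"{db.name}.{table.name}")

print(matches)
```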

Connect to Azure Data Lake Gen 2 from local Spark job

I'm trying to connect from a local Spark job to my ADLS Gen 2 data lake to read some Databricks delta tables, which I've previously stored through a Databricks Notebook, but I'm getting a very...
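
A sketch of account-key auth from a local job, assuming the hadoop-azure/ABFS jars and the delta package are available locally; the account, container, and key are placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("local-adls")
    # Hadoop configs need the spark.hadoop. prefix when set this way.
    .config("spark.hadoop.fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
            "<storage-account-key>")
    .getOrCreate()
)

df = spark.read.format("delta").load(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/delta/my_table"
)
```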

Is it possible to connect to databricks deltalake tables from ADF

I'm looking for a way to connect to Databricks deltalake tables from ADF and other Azure services (like Data Catalog). I don't see a databricks data store listed in ADF data sources. On a...

How does Delta Lake (deltalake) guarantee ACID transactions?

What mechanisms does Delta Lake use to ensure the atomicity, consistency, isolation, and durability of transactions initiated by user operations on a DeltaTable?
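
The short version is that every commit is a new JSON file added atomically to the table's _delta_log directory, and readers only trust data files referenced from committed entries. A quick way to see this, with a hypothetical local table path:

```python
import os

# Each committed transaction appears as one numbered JSON file, e.g.
# 00000000000000000000.json, 00000000000000000001.json, ...
log_dir = "/delta/my_table/_delta_log"
for name in sorted(os.listdir(log_dir)):
    print(name)
```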

Can Azure Data Factory read data from Delta Lake format?

We were able to read the files by specifying the delta file source as a parquet dataset in ADF. Although this reads the delta file, it ends up reading all versions/snapshots of the data in the...

How to ingest different spark dataframes in a single spark job

I want to write an ETL pipeline in spark that handles different input sources while using as few computing resources as possible, and I am having problems using the 'traditional' spark ETL approach. I have a number...
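
One common pattern is to share a single SparkSession and drive all sources from a config list rather than launching one job per source; the formats and paths below are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sources = [
    {"format": "json",    "path": "/landing/orders"},
    {"format": "csv",     "path": "/landing/customers"},
    {"format": "parquet", "path": "/landing/events"},
]

for src in sources:
    df = spark.read.format(src["format"]).load(src["path"])
    table_name = src["path"].rstrip("/").split("/")[-1]
    df.write.format("delta").mode("append").save(f"/delta/{table_name}")
```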

How to fix "origin location must be absolute" error in sbt project (with Spark 2.4.5 and DeltaLake 0.6.1)?

I am trying to set up an SBT project for Spark 2.4.5 with DeltaLake 0.6.1. My build file is as follows. However, it seems this configuration cannot resolve some dependencies. [info] Reapplying...

Is delta lake supported by Spark 2.x?

So I was trying to use delta lake write: df_concat.write.format("delta").mode("overwrite").save("file"). It gives me this error: java.lang.NoClassDefFoundError:...
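
Delta Lake 0.x does target Spark 2.4.x; a NoClassDefFoundError usually means the delta-core jar is not on the classpath or was built for the wrong Scala version. A sketch for Spark 2.4 (Scala 2.11), with a stand-in dataframe:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
    .getOrCreate()
)

df_concat = spark.range(10)  # stand-in for the question's dataframe
df_concat.write.format("delta").mode("overwrite").save("/tmp/delta/file")
```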

Creating database in Azure databricks on External Blob Storage giving error

I have mounted my blob storage to dbfs:/mnt/ under the name /mnt/deltalake, and the blob storage container name is deltalake. Mounting to DBFS is done using an Azure KeyVault-backed secret scope. When I try...
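
For reference, the usual syntax for creating a database whose location sits on the mounted container looks like this; the database name is hypothetical and the mount name matches the question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Point the database's default location at the mount, not the raw storage URL.
spark.sql("""
  CREATE DATABASE IF NOT EXISTS my_db
  LOCATION 'dbfs:/mnt/deltalake/my_db'
""")
```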

Configuring TTL on a deltaLake table

I'm looking for a way to add a TTL (time-to-live) to my deltaLake table so that any record in it goes away automatically after a fixed span. I haven't found anything concrete yet; does anyone know if...
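
Delta has no built-in per-record TTL; a common workaround is a scheduled job that deletes expired rows and then vacuums. A sketch with hypothetical path, column, and retention values:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

dt = DeltaTable.forPath(spark, "/delta/events")

# Logically expire records older than 30 days ...
dt.delete(F.col("created_at") < F.date_sub(F.current_date(), 30))

# ... then physically drop files outside the retention window (hours).
dt.vacuum(168)
```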

Can underlying parquet files be deleted without negatively impacting DeltaLake _delta_log

Using .vacuum() on a DeltaLake table is very slow (see https://stackoverflow.com/q/62822265/5060792). If I manually deleted the underlying parquet files and did not add a new json log file or add...

More than one column in the record key in a spark Hudi job while making an upsert

I am currently doing a POC on deltalake, where I came across a framework called Apache Hudi. Below is the data I am trying to write using the apache spark framework. private val INITIAL_ALBUM_DATA...
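
For a composite key, Hudi takes a comma-separated column list plus the ComplexKeyGenerator; a hedged sketch where the column names and path are hypothetical and albums_df stands in for the question's dataframe:

```python
# Assumes albums_df is the DataFrame being upserted.
hudi_options = {
    "hoodie.table.name": "albums",
    "hoodie.datasource.write.recordkey.field": "album_id,artist_id",
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

(albums_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("/tmp/hudi/albums"))
```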

Can Glue Crawler crawl the deltalake files to create tables in aws glue catalogue?

We have an existing infrastructure where we are crawling the S3 directories through aws crawlers. These S3 directories are created as part of AWS datalake and dumped through the spark job. Now in...

Deltalake error: MERGE destination only supports Delta sources

I am trying to implement SCD type 2 in delta lake and I am getting the following error: "MERGE destination only supports Delta sources". Below is the code snippet I am executing. MERGE INTO...
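
That error means the MERGE target is not a Delta table (for example, plain parquet). One fix, sketched with hypothetical paths and assuming an `updates` view holds the incoming changes, is to convert the target in place first:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Convert the existing parquet target to delta (delta-core 0.4.0+).
DeltaTable.convertToDelta(spark, "parquet.`/data/dim_customer`")

# Assumes a temp view named 'updates' is already registered.
spark.sql("""
  MERGE INTO delta.`/data/dim_customer` AS t
  USING updates AS s
  ON t.customer_id = s.customer_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```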

How to improve the performance of a merge operation with an incremental DeltaLake table?

I am specifically looking to optimize performance by updating and inserting data into a DeltaLake base table with about 4 trillion records. Environment: Spark 3.0.0, DeltaLake 0.7.0. In context this...
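
A common lever is to include the partition column(s) in the merge condition so only matching partitions are rewritten instead of scanning the whole table; a sketch where the table path, incremental_df, and column names are hypothetical:

```python
from delta.tables import DeltaTable

# Assumes spark and incremental_df already exist and the base table is
# partitioned by event_date.
base = DeltaTable.forPath(spark, "/delta/base_table")

(base.alias("t")
     .merge(
         incremental_df.alias("s"),
         # partition-pruning predicate plus the actual key match
         "t.event_date = s.event_date AND t.id = s.id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())
```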

Data Lake, Layers and ETL processing on GCP

I am coming from an on-prem/hadoop data-platform background and now want to understand good practices for doing this with GCP cloud technologies. As shown in the diagram, I have used HDFS/Hive to...

DeltaLake: How to Time Travel infinitely across Datasets?

The use case: store versions of large datasets (CSV/Snowflake tables) and query across versions. DeltaLake says that unless we run the vacuum command, we retain historical information in a DeltaTable....
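
Retention is governed by table properties, so one approach is to raise them well past the defaults and simply never vacuum; the path is hypothetical and the intervals below are illustrative, not a recommendation:

```python
# Assumes spark already exists.
spark.sql("""
  ALTER TABLE delta.`/delta/versioned_dataset`
  SET TBLPROPERTIES (
    'delta.logRetentionDuration' = 'interval 3650 days',
    'delta.deletedFileRetentionDuration' = 'interval 3650 days'
  )
""")
```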

Error reading Kafka source - Spark 3.0.0 on k8s: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class

I run a spark streaming app with the k8s spark operator, Spark version 3.0.0, reading data from Kafka with spark-sql-kafka-0-10. I do not have any BigQuery dependencies, but the log shows: Exception in thread "main"...

How does a spark structured streaming job handle a stream-static DataFrame join?

I have a spark structured streaming job which reads a mapping table from cassandra and deltalake and joins it with the streaming df. I would like to understand the exact mechanism here. Does spark hit...
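
For reference, the shape of a stream-static join; with a Delta static side, each micro-batch generally sees the table's latest snapshot at batch planning time rather than a copy cached at query start. Paths and the join key are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

mapping_df = spark.read.format("delta").load("/delta/mapping")       # static
stream_df = spark.readStream.format("delta").load("/delta/events")   # stream

joined = stream_df.join(mapping_df, on="key", how="left")

query = (
    joined.writeStream
          .format("delta")
          .option("checkpointLocation", "/chk/joined")
          .start("/delta/joined")
)
```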

Using Delta Tables in Azure Synapse Dedicated/Serverless SQL Pools

I am currently employed as a Junior Data Developer and recently saw a post saying that Azure Synapse can now create SQL tables from Delta tables. I tried creating an SQL table from a Delta table...

How to write to Synapse dedicated sql pool

Looking for options to load incremental data from either parquet/deltaLake into a Synapse SQL warehouse using spark notebooks.
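
From Databricks notebooks, one option is the Synapse (SQL DW) connector, which stages data through ADLS; all option values below are placeholders:

```python
# Assumes df is the dataframe to load and the staging storage is accessible.
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;"
                  "database=mydw;user=user@myserver;password=<password>")
   .option("tempDir", "abfss://tmp@mystorageaccount.dfs.core.windows.net/stage")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .mode("append")
   .save())
```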

AttributeError: module 'pyspark.sql.utils' has no attribute 'convert_exception'

Hi, I have an issue with deltalake. I'm trying to run from delta import * but I get the following error; if anyone has any idea how to solve it, please share it. Thanks in advance. Traceback (most...

How to write to delta table/delta format in Python without using Pyspark?

I am looking for a way to write back to a delta table in python without using pyspark. I know there is a library called deltalake/delta-lake-reader that can be used to read delta tables and...
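
Newer versions of the deltalake package (the delta-rs Python bindings) can write as well as read, with no Spark involved; a sketch with hypothetical paths and columns:

```python
import pandas as pd
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
write_deltalake("/tmp/delta/no_spark_table", df, mode="append")

# Read it back to verify, without Spark.
print(DeltaTable("/tmp/delta/no_spark_table").to_pandas())
```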