How to deploy a Kedro project and run it in a new environment after the kedro package command?

I used an already-built pipeline on the iris data and created a wheel and an egg file using "kedro package". After this I created a virtual environment using Python and installed both wheel and egg...

How do I add many CSV files to the catalog in Kedro?

I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much...

Does Kedro support Checkpointing/Caching of Results?

Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so when nodes after a checkpoint are changed or added only...

Dynamic instance of pipeline execution based on dataset partition/iterator logic

Not sure if this is possible or not, but this is what I am trying to do: - I want to extract out portions (steps) of a function as individual nodes (ok so far), but the catch is I have an iterator...

Data versioning of "Hello_World" tutorial

I have added "versioned: true" in the "catalog.yml" file of the "hello_world" tutorial. example_iris_data: type: pandas.CSVDataSet filepath: data/01_raw/iris.csv versioned: true Then when...

kedro-airflow creates DAGs that throw errors

I am using kedro-airflow to create a DAG for airflow but the DAG created throws an error (see below). The flow is just a test flow - very simple - and it runs without errors with kedro run....

Kedro: How to pass "list" parameters from command line?

I'd like to control kedro parameters via command line. According to docs, kedro can specify runtime parameters as follows: kedro run --params key:value > {'key': 'value'} It works. In the same...

Azure Common Credentials: Error in connection to blob storage when retrieving access token in get_token_with_client_credentials

I am trying to connect to a blob storage with the ServicePrincipalCredentials class and I get this error randomly: Error adal-python OAuth2Client:Get Token request failed The code that I use to...

Override nested parameters using kedro run CLI command

I am using nested parameters in my parameters.yml and would like to override these using runtime parameters for the kedro run CLI command: train: batch_size: 32 train_ratio: 0.9 ...

How to catalog datasets & models by S3 URI, but keep a local copy?

I'm trying to figure out how to store intermediate Kedro pipeline objects both locally AND on S3. In particular, say I have a dataset on S3: my_big_dataset.hdf5: type:...

Kedro airflow on spark

I am looking for a Kedro + Airflow implementation on Spark. Is the plugin now available for Spark? I looked at PipelineX but couldn't find relevant examples on Spark.

Using dictionary rather than parameter.yml for Kedro

Is there a way to use a dictionary rather than a YAML config for parameters.yml? I want to keep it as a Python object because my IDE can then track the dependency easily. For my parameters, I...

using gunicorn for nested folders

I'm new to gunicorn and Heroku, so I would appreciate any help. I want to deploy my Python Dash app onto Heroku and I know I need a Procfile. The thing is that my project structure uses the Kedro...

kedro nodes input accept kwargs?

https://kedro.readthedocs.io/en/stable/kedro.pipeline.node.Node.html#kedro.pipeline.node.Node.inputs I have a function def function(**kwargs): return How can I pass variables to it as a node...

How to use tf.data.Dataset with kedro?

I am using tf.data.Dataset to prepare a streaming dataset which is used to train a tf.keras model. With kedro, is there a way to create a node and return the created tf.data.Dataset to use it in...

Jupyter notebooks as Kedro node

How can I use a Jupyter Notebook as a node in a Kedro pipeline? This is different from converting functions from Jupyter Notebooks into Kedro nodes. What I want to do is use the full notebook as the node.

Kedro - Can't instantiate abstract class ProjectContext with abstract methods project_name, project_version

I'm new to kedro and I have a problem when opening Jupyter Lab/Notebook from Kedro using the command kedro jupyter lab. The error was: TypeError: Can't instantiate abstract class ProjectContext...

PartitionedDataSet not found when Kedro pipeline is run in Docker

I have multiple text files in an S3 bucket which I read and process. So, I defined PartitionedDataSet in Kedro datacatalog which looks like this: raw_data: type: PartitionedDataSet path:...

How to create a list of catalog entries and pass them in as inputs in Kedro Pipeline

I am trying to get a list of datasets from a catalog file I have created and pass them in as inputs of a single node to combine them and ultimately run the pipeline on airflow using the...

kedro: train image classifier with keras ImageDataGenerator

Which kedro dataset should be used when working with images and keras ImageDataGenerator? I know there is ImageDataset but the number of images is too large to fit in memory. And all that keras...

How would one use databricks delta lake format with Kedro?

We are using kedro in our project. Normally, one can define datasets as such: client_table: type: spark.SparkDataSet filepath: ${base_path_spark}/${env}/client_table file_format: parquet ...

DataBricks + Kedro Vs GCP + Kubeflow Vs Server + Kedro + Airflow

We are deploying a data consortium between more than 10 companies. We will deploy several machine learning models (in general advanced analytics models) for all the companies and we will...

Kedro install - Cannot uninstall `terminado`

When running kedro install I get the following error: Attempting uninstall: terminado Found existing installation: terminado 0.8.3 ERROR: Cannot uninstall 'terminado'. It is a distutils...

How do I add a directory of .wav files to the Kedro data catalogue?

This is my first time trying to use the Kedro package. I have a list of .wav files in an s3 bucket, and I'm keen to know how I can have them available within the Kedro data catalog. Any thoughts?

Kedro 0.17 Override global.yml with extra params

I'm currently not able to update the globals.yml file with extra params passed at run time as I previously did with Kedro 0.16.x. I run kedro through run.py. @hook_impl def...

How to load a specific catalog dataset instance in kedro 0.17.0?

We were using kedro version 0.15.8 and we were loading one specific item from the catalog this way: from kedro.context import load_context get_context().catalog.datasets.__dict__[key] Now, we...

Kedro context and catalog missing from Jupyter Notebook

I am able to run my pipelines using the kedro run command without issue. For some reason though I can't access my context and catalog from Jupyter Notebook anymore. When I run kedro jupyter...

Parquet file larger than memory consumption of pandas DataFrame

I am storing two different pandas DataFrames as parquet files (through kedro). Both DataFrames have identical dimensions and dtypes (float32) before getting written to disk. Also, their memory...

Kedro : Failed to find the pipeline named '__default__'

Having issues with kedro. The 'register_pipelines' function doesn't seem to be running or creating the default Pipeline that I'm returning from it. The error is (kedro-environment)...

Specify host and port in mlflow.yml and run "kedro mlflow ui", but host and port still default (localhost:5000) not change

I built a sample Kedro project by referring to this page and specified my global IP address as the host in mlflow.yml. But when I run the "kedro mlflow ui" command, it still listens on localhost. Even when I only specify...