azuremlsdk R: How to convert dataset into R dataframe?

With the AzureML Python SDK, we can use Dataset.get_by_name(), which returns the dataset.

from azureml.core import Dataset
mydata = Dataset.get_by_name(myworkspace, 'mydata')

And I can get the pandas dataframe of mydata with the .to_pandas_dataframe() method:

mydata.to_pandas_dataframe()

For the R equivalent, I'm stuck here:

mydata <- azuremlsdk::get_dataset_by_name(myworkspace, 'mydata')

The question is: what are the options in R for getting the table as, say, a CSV file or a tibble?

I notice R's AzureML SDK is not as well documented as Python's, which makes migrating to AzureML pretty challenging for our R code base.

Answers

An Azure Machine Learning Dataset lets you load all of its records into a dataframe, or convert it into a FileDataset containing CSV files or Parquet files. The azuremlsdk R package has one function for each:

load_dataset_into_data_frame() => Load all records from the dataset into a dataframe.

convert_to_dataset_with_csv_files() => Convert the current dataset into a FileDataset containing CSV files.

convert_to_dataset_with_parquet_files() => Convert the current dataset into a FileDataset containing Parquet files.

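Example: Convert data into a dataframe, CSV files, or Parquet files. Each function is a thin wrapper around a method of the dataset object: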

#' Load all records from the dataset into a dataframe.
#'
#' @description
#' Load all records from the dataset into a dataframe.
#'
#' @param dataset The Tabular Dataset object.
#' @return A dataframe.
#' @export
#' @md
load_dataset_into_data_frame <- function(dataset) {
  dataset$to_pandas_dataframe()
}

#' Convert the current dataset into a FileDataset containing CSV files.
#'
#' @description
#' Convert the current dataset into a FileDataset containing CSV files.
#'
#' @param dataset The Tabular Dataset object.
#' @param separator The separator to use to separate values in the resulting file.
#' @return A new FileDataset object with a set of CSV files containing the data
#' in this dataset.
#' @export
#' @md
convert_to_dataset_with_csv_files <- function(dataset, separator = ",") {
  dataset$to_csv_files(separator)
}

#' Convert the current dataset into a FileDataset containing Parquet files.
#'
#' @description
#' Convert the current dataset into a FileDataset containing Parquet files.
#' The resulting dataset will contain one or more Parquet files, each corresponding
#' to a partition of data from the current dataset. These files are not materialized
#' until they are downloaded or read from.
#'
#' @param dataset The Tabular Dataset object.
#' @return A new FileDataset object with a set of Parquet files containing the
#' data in this dataset.
#' @export
#' @md
convert_to_dataset_with_parquet_files <- function(dataset) {
  dataset$to_parquet_files()
}
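
Putting this together for the dataset in the question, a minimal sketch might look like the following. The helper name dataset_to_csv and its loader argument are hypothetical (they are not part of azuremlsdk); the loader is only a parameter so the helper can be tried without a workspace. It assumes that load_dataset_into_data_frame() hands back an ordinary R data.frame, and it converts to a tibble only if the tibble package is installed.

```r
# Sketch only: `dataset_to_csv` and `loader` are hypothetical names,
# not part of azuremlsdk.
dataset_to_csv <- function(dataset, path,
                           loader = azuremlsdk::load_dataset_into_data_frame) {
  # Pull every record of the TabularDataset into an in-memory data frame.
  df <- loader(dataset)

  # Use a tibble when the tibble package is available; otherwise keep
  # the plain data.frame.
  if (requireNamespace("tibble", quietly = TRUE)) {
    df <- tibble::as_tibble(df)
  }

  # Write a local CSV copy without the row-name column.
  utils::write.csv(df, path, row.names = FALSE)

  # Return the table so the caller can keep working with it.
  df
}

# Usage against a real workspace (assumes a config.json is available):
# ws     <- azuremlsdk::load_workspace_from_config()
# mydata <- azuremlsdk::get_dataset_by_name(ws, "mydata")
# tbl    <- dataset_to_csv(mydata, "mydata.csv")
```

This route loads the whole dataset into memory, so it suits tables that fit comfortably in RAM. For larger data, convert_to_dataset_with_csv_files() or convert_to_dataset_with_parquet_files() may be the better starting point, since they produce a FileDataset rather than an in-memory table.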


Reference: Azuremlsdk - working with datasets

Posted by CHEEKATLAPRADEEP-MSFT