What is the right Date/Datetime format in JSON for Spark SQL to automatically infer the schema for it?

Spark SQL has support for automatically inferring the schema from a JSON input source (each row is a standalone JSON file) - it does so by scanning the entire data set to create the schema but...

Spark structured streaming kafka convert JSON without schema (infer schema)

I read that Spark Structured Streaming doesn't support schema inference for reading Kafka messages as JSON. Is there a way to retrieve the schema the same way Spark Streaming does: val dataFrame =...

SHACL SPARQL targets not giving correct inference while using pySHACL

When I try to do SPARQL based SHACL validation, I am getting the wrong results. I am trying to filter out processes where cranecapacity is less than module weight using SHACL SPARQL target. ...

Only single thread executes parallel SQL query with PySpark using multiprocessing pool

I have a case where I am using PySpark (or Spark if I can't do it with Python and instead need to use Scala or Java) to pull data from several hundred database tables that lack primary keys. (Why...

Reasoning in Apache Jena Fuseki: "Reload" dataset or "trigger" inference

We have an Apache-Fuseki Server running with the following configuration: @prefix : <http://base/#> . @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> . @prefix tdb2: ...

What is the fastest correct way to detect that there are no duplicates in a JSON array?

I need to check if all items are unique in an array of serde_json::Value. Since this type does not implement Hash I came up with the following solution: use serde_json::{json, Value}; use...

Error: "Your deployment does not have an associated swagger.json" - ACI deployment on Stream Analytics Job

Latest update: In the current release of the public review link of Stream Analytics Job the ACI container deployment is not supported. Thus, I will close this question until further notice. For...

Validating that every subject has a type of class

I have the following Data & Shape Graph. @prefix hr: <http://learningsparql.com/ns/humanResources#> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs:...

Why does Spark output nullable = true for JSON when schema inference is left to Spark?

Why does Spark show nullable = true when the schema is not specified and its inference is left to Spark? // shows nullable = true for fields which are present in all JSON...

aws personalize putevents does not update recommendations

I'm trying to use AWS Personalize. After creating dataset and batch inference, I am updating the user-item-interactions with personalize.putEvents (using Javascript SDK, docs) Snippet: const...

Azure ML: how to access logs of a failed Model deployment

I'm deploying a Keras model that is failing with the error below. The exception says that I can retrieve the logs by running "print(service.get_logs())", but that's giving me empty results. I am...

Spark not able to write into a new hive table in partitioned and append mode

I created a new partitioned table in Hive in ORC format. I am writing into this table from Spark using append mode, ORC format, and partitioning. It fails with the...

How to use AWS Glue metadata in queries with the DynamoDB-Athena Connector

I am trying to use the Athena Federated query system with the pre-built Athena-DynamoDB Connector. I have the connector setup so I can run queries like this: SELECT * FROM...

How should I process nested data structures (e.g. JSON, XML, Parquet) with Dask?

We often work with scientific datasets distributed as small (<10G compressed), individual, but complex files (xml/json/parquet). UniProt is one example, and here is a schema for it. We typically...

How can I infer optional properties in TypeScript?

I have the following type definition for a database schema: type Schema<T> = { [K in keyof T]: SchemaType<T[K]> } interface SchemaType<T> { optional: boolean validate(t: T):...

No template named 'unary_function'; did you mean 'binary_function'? when using TensorFlow and OpenCV on macOS

When I run my code that uses OpenCV and TensorFlow, it throws many errors. I can't post my code here because it spans many files, but I see that it throws errors in library files of OpenCV or TensorFlow ...

spark sql protocol buffer support

I have been attempting to write against Java RDDs and Datasets and use protocol buffers (v2.5.x) for Spark to infer the schema. However, Spark fails on protocol buffer field members. Given a...

Renaming columns in a PySpark DataFrame with a performant select operation

There are other threads on how to rename columns in a PySpark DataFrame, see here, here and here. I don't think the existing solutions are sufficiently performant or generic (I have a solution...

Azure Stream Analytics: ML Service function call in cloud job results in no output events

I've got a problem with an Azure Stream Analytics (ASA) job that should call an Azure ML Service function to score the provided input data. The query was developed and tested in Visual Studio (VS)...

SOLVED - GatsbyJS - GraphQL ACF query issue

So I am encountering some issues with the way that my ACF is structured on a custom post type. Due to the way the post type works (it's for work case studies) there is the option to choose to use...

Tensorflow lite model output always gives same output no matter the input

My goal is to run a Keras model I have made on my ESP32 microcontroller. I have the libraries all working correctly. I have created a Keras model using Google Colab that looks to be working fine...

Azure Machine Learning Studio: cannot deploy model with rpy2 as a dependency

I am trying to deploy a custom model on Azure Machine Learning Studio that needs rpy2 (Python wrapper for R) to run. So, I created the following yml file (myenv.yml), specifying the required...

Spark structured streaming: Schema Inference in Scala

I'm trying to infer the dynamic JSON schema from a Kafka topic. I found this piece of code in a blog, which infers the schema using PySpark. def read_kafka_topic(topic): df_json = (spark.read ...

azureml No module named 'xgboost'

I'm running a Python notebook in Azure ML, created an AutoML experiment, and am attempting to deploy a model using a Python script. The model deploys successfully and I can see the endpoints, however, when...

glue job schema inference issue

Requirement: I need a Glue job to get the AWS DynamoDB data (a nested structure combining maps and lists) into S3. My approach: First, I used a Glue dynamic frame to get all the data from DynamoDB ...

Azure ML Inference Schema - "List index out of range" error

I have an ML model deployed on Azure ML Studio and I was updating it with an inference schema to allow compatibility with Power BI as described here. When sending data up to the model via REST API...

Pyspark dataframe or parquet file to DynamoDB

I want to put a PySpark DataFrame or a Parquet file into a DynamoDB table. The PySpark DataFrame that I have has 30MM rows and 20 columns. Solution 1: using boto3, pandas and Batch writing (Amazon...

find a frequency of words in text file

Can anybody please help me? I'm a beginner and I have a hard assignment. I need to write a C++ program that does the following: Ask the user to enter two text files; the first one contains the...

How can I automatically infer schemas of CSV files on S3 as I load them?

Context: Currently I am using Snowflake as a Data Warehouse and AWS' S3 as a data lake. The majority of the files that land on S3 are in the Parquet format. For these, I am using a new limited...

Malformed records are detected in schema inference parsing json

I have a really frustrating error trying to parse basic JSON read from Blob Storage using a dataset within ADF. My JSON is...