Generating an AVRO schema from a JSON document

Is there any tool able to create an AVRO schema from a 'typical' JSON document? For example: { "records":[{"name":"X1","age":2},{"name":"X2","age":4}] } I found http://jsonschema.net/reboot/#/...
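As a rough illustration of what such a tool would have to do, here is a minimal sketch in Python that infers an Avro schema from the example document in the question. The function name `infer_avro_type`, the generated record names (`Root`, `RootRecordsItem`), and the choice of `long`/`double` for numbers are assumptions made for this sketch, not the behavior of any existing tool. It also assumes every array item has the same shape as the first one.

```python
import json

def infer_avro_type(value, name_hint="Root"):
    """Return an Avro type for a parsed JSON value (simplified heuristic)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"  # assumption: widest integer type, since JSON does not say
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: all items share the shape of the first item.
        item = infer_avro_type(value[0], name_hint + "Item") if value else "null"
        return {"type": "array", "items": item}
    if isinstance(value, dict):
        fields = [
            {"name": key, "type": infer_avro_type(val, name_hint + key.capitalize())}
            for key, val in value.items()
        ]
        # Record names are invented from the path to the value.
        return {"type": "record", "name": name_hint, "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")

doc = json.loads('{"records":[{"name":"X1","age":2},{"name":"X2","age":4}]}')
schema = infer_avro_type(doc)
print(json.dumps(schema, indent=2))
```

A real document would need more care: optional or missing keys, mixed-type arrays, and keys that are not valid Avro names are all ignored here.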

How to create a schema containing a list of objects using Avro?

Does anyone know how to create an Avro schema which contains a list of objects of some class? I want my generated classes to look like the ones below: class Child { String name; } class Parent { ...
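In Avro, a list of objects is normally expressed as an `array` whose `items` is a record schema. The sketch below builds such a schema as a Python dictionary and prints it as JSON. The `Child` record with a `name` string comes from the question; the body of `Parent` is cut off in the excerpt, so the field name `children` is a hypothetical placeholder.

```python
import json

# Record for the element type; mirrors "class Child { String name; }".
child_schema = {
    "type": "record",
    "name": "Child",
    "fields": [{"name": "name", "type": "string"}],
}

# Parent holds an array whose items are Child records.
parent_schema = {
    "type": "record",
    "name": "Parent",
    "fields": [
        # "children" is a hypothetical field name; the original Parent body is elided.
        {"name": "children", "type": {"type": "array", "items": child_schema}},
    ],
}

print(json.dumps(parent_schema, indent=2))
```

The printed JSON could be saved as an `.avsc` file and passed to a code generator, which would be expected to map the array field to a list of `Child` in the generated `Parent` class.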

How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?

I'm trying to validate a JSON file using an Avro schema and write the corresponding Avro file. First, I've defined the following Avro schema named user.avsc: {"namespace": "example.avro", "type":...
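An error of this kind usually points to Avro's JSON encoding of unions: a non-null union value must be written as an object keyed by the branch type, for example `{"int": 7}` rather than a bare `7`. The sketch below shows that wrapping in Python. The schema in the excerpt is truncated, so the record `User` and the fields `name` and `score` are hypothetical, and the helper `wrap_unions` only handles two-branch nullable unions of primitive types.

```python
import json

def wrap_unions(record, schema):
    """Wrap non-null values of union-typed fields as {"branch": value}."""
    out = {}
    for field in schema["fields"]:
        value = record.get(field["name"])
        if isinstance(field["type"], list) and value is not None:
            # Assumption: the union has exactly one non-null primitive branch.
            branch = next(t for t in field["type"] if t != "null")
            out[field["name"]] = {branch: value}
        else:
            out[field["name"]] = value  # plain field, or null in a union
    return out

# Hypothetical schema; the one in the question is cut off.
schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "score", "type": ["int", "null"]},
    ],
}

wrapped = wrap_unions({"name": "Alice", "score": 7}, schema)
print(json.dumps(wrapped))
```

A null value for a union field is left as a bare JSON `null`, which is how Avro's JSON encoding represents the null branch.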

How to mix record with map in Avro?

I'm dealing with server logs which are in JSON format, and I want to store my logs on AWS S3 in Parquet format (and Parquet requires an Avro schema). First, all logs have a common set of fields,...

How to insert into Hive table with a column of data type array<struct<int>>

I am trying to insert data into a table in Hive I created. I’ve been struggling, so I’m trying to simplify it as much as possible to get to the root of the issue. Here is my simplified code...

java.lang.ClassCastException:xx cannot be cast to org.apache.avro.generic.IndexedRecord

I was able to publish my Java bean class as an Avro record to Kafka, but when I try to consume it I get a class cast exception. Why does this occur? producer Schema schema = new Schema.Parser().parse(new...

Amazon redshift: load Avro files compressed using BZIP2

I have Avro files (compressed using BZIP2) stored in HDFS and S3 and I want to load them into Amazon Redshift. The copy command gives an error: error: Invalid AVRO file code: 8001 ...

Kafka connect confluent elasticsearch sink (no class found error)

I am very new to Kafka Connect. I want to push my messages from a Kafka topic to Elasticsearch. After following the available documentation, I downloaded and compiled the Elasticsearch sink from...

Kafka Consumer Vs Apache Flink

I did a PoC in which I read data from Kafka using Spark Streaming. But our organization uses either Apache Flink or a Kafka consumer to read data from Apache Kafka as a standard process. So I...

Avro with Java 8 dates as logical type

The latest Avro compiler (1.8.2) generates Java sources for date logical types with Joda-Time based implementations. How can I configure the Avro compiler to produce sources that use the Java 8 date-time API?

Comparison of loading from different file formats in BigQuery

We currently load most of our data into BigQuery either via csv or directly via the streaming API. However, I was wondering if there were any benchmarks available (or maybe a Google engineer could...

Kafka Stream with Avro in Java: "schema.registry.url" which has no default value

I have the following configuration for my Kafka Streams application: Properties config = new Properties(); config.put(StreamsConfig.APPLICATION_ID_CONFIG,this.applicaionId); ...

How to get partition info and offset for Kafka topic without knowing consumer group info

I am totally a squat in Kafka land. If I run the command /cfintools/confluent-4.0.0/bin/kafka-avro-console-consumer --topic $t --bootstrap-server $bt --consumer.config...

How can I convert avsc files to avdl files?

It is common to convert avro avdl files (idl files) to avsc files (schema files). I want to convert in the other direction, from avsc to avdl, because I have some avsc files created manually and...

Failed to deserialize data for topic

I'm using the Confluent cp-all-in-one project configuration from here: https://github.com/confluentinc/cp-docker-images/blob/5.2.2-post/examples/cp-all-in-one/docker-compose.yml I'm POST-ing a message...

AWS push down predicate not working when reading HIVE partitions

I'm trying to test out some Glue functionality, and the push down predicate is not working on Avro files within S3 that were partitioned for use in Hive. Our partitions are as follows: ...

Python AVRO reader returns AssertionError when decoding kafka messages

Newbie playing with Kafka and AVRO. I am trying to deserialise AVRO messages in Python 3.7.3 using kafka-python, avro-python3 packages and following this answer. The function responsible for...

PutSQL date format error cannot convert date value to timestamp

I am using Apache NiFi to query a Teradata database using a custom SQL script and store the result in an Oracle database. The dataflow I used in NiFi is as follows: ExecuteSQL (execute SQL...

How to send http post request with an Avro file?

I have a Flask API that is expecting a POST request in Avro. The problem is I'm not sure how to send Avro requests to test it. The API reads the data using the...

Kafka S3 Source Connector

I have a requirement where sources outside of our application will drop a file in an S3 bucket that we have to load in a kafka topic. I am looking at Confluent's S3 Source connector and currently...

ArrayIndexOutOfBoundsException while reading avro file using BinaryDecoder in java

I am using the avro-1.10.1 jar, and below is the sample code. It reads Avro file data stored on the local system. The exception is thrown in the reader.read method, where it's trying to access in.readIndex and...

Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1001: org.apache.kafka.common.errors.DisconnectException

I am using Spring Kafka and facing some errors: Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1001: org.apache.kafka.common.errors.DisconnectException. my consumer producer...

Confluent Schema registry failed POST request

My local dev Kafka schema registry was working and I was able to POST the schema from my producer and get an ID back. I use auto-register=false, same as my production server. But I made an update...

How can I automatically infer schemas of CSV files on S3 as I load them?

Context: Currently I am using Snowflake as a Data Warehouse and AWS' S3 as a data lake. The majority of the files that land on S3 are in the Parquet format. For these, I am using a new limited...

Should auto.register.schemas create a new version in schema-registry on modifying .avsc?

I've noticed a strange thing while trying to follow a tutorial on Kafka schema-registry and a simple producer. It doesn't auto-generate a new schema in schema-registry if I change the .avsc file. Steps I...

How to avoid losing messages with Kafka streams

We have a streams application that consumes messages from a source topic, does some processing and forwards the results to a destination topic. The structure of the messages is controlled by some...

Connect to Kafka on host from Docker (ksqlDB)

I'm running ksqldb-server from a docker-compose file found here: https://ksqldb.io/quickstart.html#quickstart-content My Kafka bootstrap server is running on the same VM in standalone mode. I can see...

Flink Kafka : Expecting type to be a PojoTypeInfo

My customer class is already created using the maven-avro plugin. When I try to run this program I get the error: Exception in thread "main" java.lang.IllegalStateException: Expecting type to be a...

In Foundry, how can I parse a dataframe column that has a JSON response

I am trying to bring JIRA data into Foundry using an external API. When it comes in via Magritte, the data gets stored in AVRO and there is a column called response. The response column has...

Data validation with mostly feature in List<T> using .NET Core C#

I have fetched the data as List<T> by reading from different formats, e.g. CSV, Parquet, Avro, JSON. I want to validate the data with the mostly feature, e.g. the temperature should remain within...