HBase memstore manual flush

By design, HBase buffers writes in a memstore and, when the memstore reaches its size limit, flushes it to HDFS. This flushing happens automatically...

How do I exclude all instances of a transitive dependency when using Gradle?

My gradle project uses the application plugin to build a jar file. As part of the runtime transitive dependencies, I end up pulling in org.slf4j:slf4j-log4j12. (It's referenced as a sub-transitive...
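A common fix for this situation (a sketch, not necessarily the asker's build) is to exclude the module from every configuration, so no transitive path can reintroduce it:

```groovy
// build.gradle fragment: excludes org.slf4j:slf4j-log4j12 everywhere,
// regardless of which transitive dependency pulls it in.
configurations.all {
    exclude group: 'org.slf4j', module: 'slf4j-log4j12'
}
```

Excluding on a single dependency declaration only blocks that one path; `configurations.all` covers all of them at once.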

Strange Jackson Illegal character ((CTRL-CHAR, code 0)) Exception in Map Reduce Combiner

I have a Map-Reduce job with a mapper which takes a record and converts it into an object, an instance of MyObject, which is marshalled to JSON using Jackson. The value is just another Text field...

How to specify the AWS Access Key ID and Secret Access Key as part of an Amazon s3n URL

I am passing input and output folders as parameters to a MapReduce word count program from a webpage and am getting the error below: HTTP Status 500 - Request processing failed; nested exception...
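Keys can be embedded in the URL itself as `s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@bucket/path`, but this tends to break when the secret contains characters such as `/`. A safer sketch is to configure them in `core-site.xml` instead (these are the property names the s3n filesystem reads):

```xml
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With these set, the URL can stay as plain `s3n://bucket/path`.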

Hadoop DataStreamer Exception: File could only be replicated to 0 nodes instead of minReplication (=1)

I tried to load JSON data from my local filesystem into HDFS. I use these commands, and they throw an exception: hadoop fs -copyFromLocal path/files/file.json input/ hadoop fs -put path/files/file.json...

Unable to ingest log data from Flume into HDFS

I am using the following configuration to push data from a log file to HDFS: agent.channels.memory-channel.type = memory agent.channels.memory-channel.capacity=5000 agent.sources.tail-source.type =...
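For an agent like this, the sink side also has to be wired to the same channel. A sketch of what that usually looks like (the agent and sink names here are illustrative, matching the channel name in the excerpt):

```properties
# Illustrative HDFS sink bound to the memory channel above.
agent.sinks = hdfs-sink
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/logs
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
```

A missing or mis-spelled `channel` binding on the sink is a frequent cause of data never reaching HDFS.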

Getting an error while loading data from Twitter to HDFS

I am getting an error while loading data from Twitter to HDFS. I am using the Hortonworks Ambari sandbox with hadoop-2.7. This is my flume.conf file: TwitterAgent.sources = Twitter ...

bin/hadoop no such file or directory

I'm trying to install hadoop 2.6 on Ubuntu 14.04. When I run the command bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' this is the...

XML parsing in Hadoop mapreduce

I have written MapReduce code to parse XML into CSV. But I don't find any output in my output directory after running the job, and I am not sure whether the file is not being read or not being written. This is my...

How to connect to remote HDFS

I am trying to connect to an HDFS instance running on a remote machine. I am running Eclipse on a Windows machine and HDFS is running on a Unix box. Here is what I have tried ...

Spark: why tasks assigned only to one worker?

I'm new to Apache Spark and trying to run a simple program on my cluster. The problem is that the driver allocates all tasks to one worker. I am running in Spark standalone cluster mode on 2...

Gradle archive contains more than 65535 entries

I am integrating hadoop-2.5.0 to run a MapReduce job with the spring-boot-1.2.7 release, and I am getting the error "archive contains more than 65535 entries". My Gradle jar...
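The zip format caps entry counts at 65535 unless Zip64 extensions are enabled, and Gradle's archive tasks refuse to build past that limit by default. A sketch of the usual fix in build.gradle:

```groovy
// Allow the jar to exceed 65535 entries via the Zip64 extension.
jar {
    zip64 = true
}
```

Note that plain `java -jar` cannot always read Zip64 archives; fat-jar plugins that merge dependencies into one archive are a common alternative when that matters.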

Apache Kylin: Intermediate table not found

I'm new to Kylin. After installing it, I ran sample.sh and then built the cube, but got this error message: java.io.IOException:...

Error: java.net.NoRouteToHostException: no route to host

I run select * from customers in Hive and I get the result. But when I run select count(*) from customers, the job status is failed. In JobHistory I found 4 failed maps, and in the map log file I have...

How to append a line at the end of /etc/sudoers file using shell script (without using pssh tool)

I want to append a few lines at the end of the /etc/sudoers file. Below is an example of the lines I want to append: nagios ALL = NOPASSWD: /bin/su - root -c /etc/init.d/crond status nagios ALL =...
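A minimal sketch of a safe append, demonstrated here on a scratch copy rather than the real file. Against the real /etc/sudoers you would pipe the same `printf` through `sudo tee -a /etc/sudoers` and then validate the syntax with `visudo -cf /etc/sudoers` before the next sudo invocation.

```shell
# Append the rule(s) with printf; each quoted argument becomes one line.
# Demonstrated on a temp file standing in for /etc/sudoers.
target=$(mktemp)
printf '%s\n' \
  'nagios ALL = NOPASSWD: /bin/su - root -c /etc/init.d/crond status' \
  >> "$target"
grep -c 'NOPASSWD' "$target"
```

Editing /etc/sudoers without a `visudo -c` check risks locking yourself out of sudo entirely if a line is malformed.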

Hive Vertex failed: killed/failed due to:ROOT_INPUT_INIT_FAILURE Caused by: java.lang.NullPointerException

I was querying a table, a simple count(*), and received the following error: Vertex failed, vertexName=Map 1, vertexId=vertex_1486982569467_0809_3_00, diagnostics=[Vertex...

Partitions are still showing in hive even though they are dropped for an external table

I have an external table in Hive partitioned by year, month, and day. I dropped one partition, but I still see it in show partitions. >use test_raw_tables; >show partitions...

curl: how to use Kerberos instead of NTLM authentication on Windows?

I'm trying to connect to a Livy REST service secured with Kerberos. On Linux CentOS, curl works fine with negotiate: after obtaining a Kerberos ticket with kinit, the connection through curl...

Presto query Array of Rows

So I have a Hive external table whose schema looks like this: { . . `x` string, `y` ARRAY<struct<age:string,cId:string,dmt:string>>, `z` string } Basically I need to query a column (column "y")...
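In Presto, the usual way to query such a column (a sketch, assuming a table named `tbl` with the schema above) is `CROSS JOIN UNNEST`, which expands the array of rows into one output row per element and splits the row fields into columns:

```sql
-- One output row per element of y; the alias list names the row fields.
SELECT t.age, t.cId, t.dmt
FROM tbl
CROSS JOIN UNNEST(y) AS t (age, cId, dmt)
```

Rows of `tbl` where `y` is empty produce no output with a plain CROSS JOIN UNNEST.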

PySpark column generation by looking up previous rows and computing

I need to generate a column dynamically by looking up previous row values. So far I have tried the code shared below. The Spark DataFrame is below: cat a b c 1 null 0 0 1 0 9 0 2 ...

How to handle new line characters in hive?

I am exporting a table from Teradata to Hive. The table in Teradata has an address field containing newline characters (\n). Initially I am exporting the table to a mounted filesystem path from...

HADOOP_HOME is not set correctly

I downloaded the binary tarball of hadoop from here: http://hadoop.apache.org/releases.html (ver 2.8.4). I unpacked the tar.gz file and then changed the etc/hadoop-env.sh from export...
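The variable has to point at the root of the unpacked tarball, not at its `bin/` subdirectory. A sketch, assuming the archive was unpacked to `/opt/hadoop-2.8.4` (a hypothetical path; substitute your own):

```shell
# HADOOP_HOME is the unpack root; bin/ and sbin/ are appended to PATH
# separately so the hadoop and hdfs launchers can be found.
export HADOOP_HOME=/opt/hadoop-2.8.4
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
echo "$HADOOP_HOME"
```

Putting these lines in your shell profile (and the same `export` forms, not the defaults, in hadoop-env.sh) avoids the "HADOOP_HOME is not set correctly" check failing on startup.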

DF.toPandas() throwing an error in PySpark

I am processing a huge text file using PyCharm and PySpark. This is what I am trying to do: spark_home = os.environ.get('SPARK_HOME', None) os.environ["SPARK_HOME"] =...

Hive: split a string to get all the items except the first one?

I have a column "testdata" with data like "abc,def,ghi,jkl", and I want to retrieve the output "def,ghi,jkl". I am able to retrieve the first item like this: SELECT split(testdata,'[\,]')[0] FROM...
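One way in Hive itself (my suggestion, not the asker's code) is to strip everything up to and including the first comma: `regexp_replace(testdata, '^[^,]*,', '')`. The same "drop the first field, keep the rest" idea can be sanity-checked at the shell:

```shell
# Drop the first comma-separated field and keep the remainder.
echo 'abc,def,ghi,jkl' | cut -d, -f2-   # → def,ghi,jkl
```

Note that on an input with no comma, `regexp_replace` leaves the string unchanged, so single-item rows pass through as-is.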

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException) for hadoop 3.1.3

I am trying to run a MapReduce job but I am getting an error on Hadoop-3.1.3: hadoop jar WordCount.jar WordcountDemo.WordCount /mapwork/Mapwork /r_out Error 2020-04-04 19:59:11,379 INFO...

String Index Out Of Bounds Exception When Initializing Spark Context

I've been working with Spark for more than 5 years. Recently, I encountered a basic error I have never seen before, and it has stopped development cold. When I do a routine call to create a Spark...

Using the AWS EMR artifact repository with a build tool

I am trying to use the EMR artifact repository to package the emrfs-hadoop-assembly and its dependencies into my application. The resources I found tell me the URL of the Maven repository I have to...

Is it possible to view logs in the spark history server with spark.eventLog.compress enabled?

I would like to enable spark.eventLog.compress in an EMR cluster to save log space without losing functionality from the spark history server. I've tried enabling the configuration setting and...

AWS Glue Passing Parameters via Boto3 causing exception

For the life of me I cannot figure out what is going on here. I am starting a Glue Job via Boto3 (from Lambda but testing locally gives the exact same issue) and when I pass parameters in via the...

How to configure authorization and authentication parameters from Apache Spark to Google Cloud Platform from outside of GCP?

I am trying to load data into GCP using Spark from outside of GCP (from one of our on-prem clusters). To do that, I wrote the following code. val conf = new SparkConf() ...
