Hadoop "Unable to load native-hadoop library for your platform" warning

I'm currently configuring Hadoop on a server running CentOS. When I run start-dfs.sh or stop-dfs.sh, I get the following error: WARN util.NativeCodeLoader: Unable to load native-hadoop library...

How to specify AWS Access Key ID and Secret Access Key as part of an Amazon s3n URL

I am passing input and output folders as parameters to a MapReduce word count program from a web page, and I get the error below: HTTP Status 500 - Request processing failed; nested exception...

How do you retrieve the replication factor info for HDFS files?

I have set the replication factor for my file as follows: hadoop fs -D dfs.replication=5 -copyFromLocal file.txt /user/xxxx When a NameNode restarts, it makes sure under-replicated blocks are...

Create a directory in the Hadoop filesystem

I'm new to Hadoop. I am trying to create a directory in HDFS but I am not able to. I have logged in as "hduser", hence I assumed "/home/hduser" pre-exists, as in the Unix filesystem. So I tried to create...

How to start hiveserver2 as a service

Hi all, I have set up a multi-node cluster (i.e. 5 nodes) in my network, which is working fine. Now I want to insert and retrieve data from the cluster using Hive, so I have set up the latest Hive release...

Does HBaseTestingUtility work at all with a MiniCluster?

I have a simple unit test I want to run against the HBaseTestingUtility MiniCluster. The transitive dependencies needed to run a test with HBaseTestingUtility are missing. I've been tracking...

What does "vcore-seconds" in a Hadoop job log mean?

Job Counters Launched map tasks=3 Launched reduce tasks=45 Data-local map tasks=1 Rack-local map tasks=2 Total time spent by all maps in occupied slots (ms)=29338 Total...

What is the right way to set up HBase, Hadoop and Hive to access HBase through Hive?

I have a problem configuring and installing HBase/Hadoop/Hive. What I did so far on a VM with Ubuntu 14.04.3 LTS: installed jdk like this with the Version...

Facebook Presto unable to retrieve data from Azure Blob Storage

I have an Azure HDInsight Hadoop cluster and a Parquet file stored on an Azure blob, from which I created a Hive table called hive_table. I am able to execute Hive SQL queries successfully. Next,...

How to run spark-shell with YARN in client mode?

I've installed spark-1.6.1-bin-hadoop2.6.tgz on a 15-node Hadoop cluster. All nodes run Java 1.8.0_72 and the latest version of Hadoop. The Hadoop cluster itself is functional, e.g. YARN can run...

How to access a Hive database from a secured Kerberos environment using Java

I am using Hadoop in a Kerberos environment and I am new to Kerberos. I want to access a Hive database using Java. I went through the official Hive site, but they have given very generalize...

Hadoop fs -du -h sorting by size for M, G, T, P, E, Z, Y

I am running this command: sudo -u hdfs hadoop fs -du -h /user | sort -nr, and the output is not sorted in terms of gigs, terabytes, GB. I found this command: hdfs dfs -du -s...
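One way to approach the sorting problem described above is to convert each human-readable size back to bytes before comparing. The following is a minimal Python sketch, not a definitive solution. It assumes every line starts with a number followed by an optional unit letter (K, M, G, T, P, E, Z, Y), each unit being a power of 1024. The helper names `to_bytes` and `sort_by_size` and the sample lines are hypothetical.

```python
import re

# Multiplier for each unit letter; an empty unit means plain bytes.
# Assumption: units are powers of 1024.
UNITS = {u: 1024 ** i for i, u in enumerate(["", "K", "M", "G", "T", "P", "E", "Z", "Y"])}

# Matches a leading size such as "512", "1.5 G" or "20.1M",
# which must be followed by whitespace or the end of the line.
SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGTPEZY]?)(?=\s|$)")


def to_bytes(line):
    """Return the leading size of a line in bytes, or 0.0 if none is found."""
    match = SIZE_RE.match(line)
    if match is None:
        return 0.0
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def sort_by_size(lines):
    """Sort lines from largest to smallest leading size."""
    return sorted(lines, key=to_bytes, reverse=True)


# Hypothetical sample lines, for illustration only.
sample = [
    "1.5 G  /user/a",
    "20.1 M  /user/b",
    "2 T  /user/c",
    "512  /user/d",
]

for line in sort_by_size(sample):
    print(line)
```

Sorting on the converted byte count avoids the problem with `sort -nr`, which compares only the leading number and ignores the unit.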

Unable to drop table

I'm working on HBase 0.98.12-hadoop2 and phoenix-4.7.0. I created a table in Phoenix to map to an existing table in HBase. After index testing, it failed to drop the table with an error. Error: ERROR 1010...

Out of memory issue when comparing two large datasets using Spark Scala

I am importing 10 million records daily from MySQL to Hive using a Spark Scala program and comparing yesterday's and today's datasets. val yesterdayDf=sqlContext.sql("select * from...

Not able to recover partitions through alter table in Hive 1.2

I am not able to run ALTER TABLE MY_EXTERNAL_TABLE RECOVER PARTITIONS; on Hive 1.2. However, when I run the alternative MSCK REPAIR TABLE MY_EXTERNAL_TABLE, it just lists the partitions which...

Hive not launching MapReduce jobs, getting stuck in the middle of execution

I am trying to run select count(*) on a table but, after submitting the job, it gets stuck. Please find the details below. hive> select count(*) from txnrecords; WARNING: Hive-on-MR is deprecated in Hive 2...

Apache PIG, ELEPHANTBIRDJSON Loader

I'm trying to parse the input below (there are 2 records in this input) using Elephantbird json...

Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Number of reduce tasks is set to 0 since there's no reduce operator Job running in-process (local Hadoop) 2017-03-23 12:19:17,371 Stage-1 map = 0%, reduce = 0% Ended Job = job_local1571094051_0001...

Is Apache Kylin a good alternative to SSRS (SQL Server Reporting Services)?

We have a framework for analyzing data with the help of cube design for OLAP and a warehouse that has ETL connections. All of them are in SQL Server structure and SSRS (SQL Server Reporting...

HDFS: list and select the latest updated files

I am trying to work out a solution for getting the latest updated data in a list of files in HDFS. Explanation: hdfs dfs -ls -l /tmp/workday1/list/date=20170101/ The command above displays the list of...
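As an illustration of selecting the most recently modified entry from a listing like this, here is a minimal Python sketch. It assumes each entry line has the eight whitespace-separated columns that `hdfs dfs -ls` normally prints (permissions, replication, owner, group, size, date, time, path). The function name `latest_entry`, the file names, owners, and timestamps in the sample are hypothetical.

```python
from datetime import datetime


def latest_entry(ls_lines):
    """Return (modified_time, path) of the newest entry, or None if there is none."""
    entries = []
    for line in ls_lines:
        # Split into at most 8 fields so a path containing spaces stays intact.
        parts = line.split(None, 7)
        if len(parts) != 8:
            # Skips the "Found N items" header and malformed lines.
            continue
        try:
            modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue
        entries.append((modified, parts[7]))
    return max(entries) if entries else None


# Hypothetical listing, for illustration only.
sample = [
    "Found 3 items",
    "-rw-r--r--   3 hdfs supergroup   1024 2017-01-01 10:15 /tmp/workday1/list/date=20170101/part-00000",
    "-rw-r--r--   3 hdfs supergroup   2048 2017-01-02 08:30 /tmp/workday1/list/date=20170101/part-00001",
    "-rw-r--r--   3 hdfs supergroup    512 2017-01-03 23:59 /tmp/workday1/list/date=20170101/part-00002",
]

newest = latest_entry(sample)
if newest is not None:
    print(newest[1])
```

The listing only shows timestamps to the minute, so two files modified within the same minute are ordered by path in this sketch.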

Multiple compression on same hive table

I have a Hive table partitioned by Year/Month and it contains data for at least 7 years. What I want to do is compress the latest data (up to 1 year old) with Snappy but the older data...

Dataproc Reading from Google Cloud Storage

I am trying to read a CSV or TXT file from GCS in a Dataproc PySpark application. I have tried so many things. So far the most promising: #!/usr/bin/python import os import sys import...

Building Apache Flink from source fails when packaging flink-1.5.2

I just started learning Flink the day before yesterday, and I downloaded the newest version of Flink, flink1.5.2. I ran mvn clean package -DskipTests on Win10, Ubuntu 14.0 and macOS 10.13, and both...

Flink checkpoints to Google Cloud Storage

I am trying to configure checkpoints for Flink jobs in GCS. Everything works fine if I run a test job locally (no Docker or any cluster setup) but it fails with an error if I run it using...

How to start JanusGraph Server with FoundationDB on a different host or container?

I am trying to create a docker-compose project with JanusGraph server, ElasticSearch server and FoundationDB server. I am using the following Docker images: for ElasticSearch -...

Failed connecting to Hive metastore: [localhost:9083]

I'm getting an error while connecting the Presto server to the Hive metastore. Here is my...

What is the difference between fsimage and snapshot in Hadoop?

I am new to Hadoop. I want to know the difference between a snapshot and the fsimage used for file system state in Hadoop. I heard that both do the same work. What, then, is the difference between them?

How to merge partitions in HDFS?

Assuming I have a partitioned table in my HDFS, that gets new information all the time. New data will be partitioned by days by default, while all of the other files are partitioned by months. How...

PySpark: load a DataFrame into BigQuery from local

I want to load a pyspark dataframe into a Google BigQuery table. I run the job by running spark-submit --jars batch/jars/gcs-connector-hadoop2-latest.jar,batch/jars/spark-bigquery-latest.jar...

Hive is automatically rounding off the precision

I am doing a division operation in Hive and it seems to be automatically rounding off the values. Is there a way I can avoid this? Example: select cast(600/27701.47 as...