Hive: difference between INSERT and LOAD DATA

I am new to Hadoop and Hive, and I am confused about Hive's INSERT INTO and LOAD DATA statements.

When I execute INSERT INTO TABLE_NAME (field1, field2) VALUES (value1, value2);, HiveServer runs a MapReduce job.

When I execute LOAD DATA LOCAL INPATH PATH_TO_MY_DATA INTO TABLE TABLE_NAME;, it only loads data from the file and does nothing else.

I wrote a program in Python, and here is my problem: if I use pyhs2 and save records with the INSERT statement, every record triggers its own MapReduce job, which is very slow. Should I first save my data somewhere and then load it with the LOAD DATA statement?

Answers

Load

Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move data files into the locations corresponding to Hive tables.
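Because a load is only a file copy/move, the "save my data somewhere first" approach from the question comes down to writing the records into a file whose layout already matches the table. The sketch below is one possible way to do that in Python. It assumes the table is stored as plain text with tab-separated fields; the function names and the example path are hypothetical, and the file has to be readable from wherever the LOCAL path is resolved.

```python
import os
import tempfile


def write_rows_for_load(rows, path, delimiter="\t"):
    """Write one record per line, fields joined by the delimiter.

    Assumption: the target table is a text table using this delimiter.
    Hive will not transform the file, so it must already match the table.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            fields = [str(value) for value in row]
            for field in fields:
                # A delimiter or newline inside a field would shift columns.
                if delimiter in field or "\n" in field:
                    raise ValueError("field contains delimiter or newline: %r" % field)
            f.write(delimiter.join(fields) + "\n")
    return path


def build_load_statement(path, table):
    """Return the LOAD DATA statement text for a file written above."""
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    return "LOAD DATA LOCAL INPATH '{}' INTO TABLE {}".format(escaped, table)


if __name__ == "__main__":
    rows = [("fred flintstone", 35, 1.28), ("barney rubble", 32, 2.32)]
    target = os.path.join(tempfile.mkdtemp(), "students.tsv")
    write_rows_for_load(rows, target)
    # The printed statement is what would be sent to Hive in a single call.
    print(build_load_statement(target, "students"))
```

All records end up in one file, so a single LOAD DATA statement brings them into the table instead of one statement per record.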

Insert

Query results can be inserted into tables by using the INSERT clause.

INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

With LOAD, all the data in the file is copied into the table as-is. With INSERT, you can choose which data goes in, for example by selecting only the rows that meet some condition.

Your solution

You execute your HQL statement once for every single row, so a MapReduce job runs every time.

If you want to insert the rows with a single MapReduce job, put them all into one statement:

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

Build a single query like this and execute it once. If you have many records, you can split them into batches and run one statement per batch.
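As a rough illustration of the batching idea, the Python sketch below turns a list of records into multi-row INSERT statements, one per batch. The helper names and the default batch size are hypothetical, and the quoting is deliberately minimal (backslash-escaping of backslashes and single quotes), so treat it as a starting point and not as a complete escaping routine.

```python
def hive_literal(value):
    """Render a Python value as a literal for a VALUES clause (minimal sketch)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Assumption: strings are single-quoted with backslash escapes.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def build_batched_inserts(table, rows, batch_size=1000):
    """Return one multi-row INSERT statement per batch of rows."""
    statements = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        tuples = ", ".join(
            "(" + ", ".join(hive_literal(v) for v in row) + ")" for row in batch
        )
        statements.append("INSERT INTO TABLE {} VALUES {}".format(table, tuples))
    return statements


if __name__ == "__main__":
    rows = [("fred flintstone", 35, 1.28), ("barney rubble", 32, 2.32)]
    for statement in build_batched_inserts("students", rows):
        # Each statement would be passed once to the client's execute call.
        print(statement)
```

Each returned statement is executed once (for example through the cursor's execute method in pyhs2), so the number of jobs depends on the number of batches, not on the number of records.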

Posted on by Kishore