As I understand it, Hadoop HDFS can't increase the network speed, but I was in a discussion with a few colleagues brainstorming how to significantly speed up our uploads, and someone claimed they had significantly improved upload speed by using HDFS.
If a user is on a 100 Mbps LAN, is there some way Hadoop HDFS can help increase the upload speed when the user uploads a large file (>100 GB) through their browser?
The web browser and web server then become the bottleneck themselves. The server must buffer the file and then upload it to HDFS, as opposed to the direct datanode writes performed by `hadoop fs -copyFromLocal`.
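A back-of-the-envelope calculation shows why the relay hurts. This sketch uses the numbers from the question (100 Mbps LAN, 100 GB file) and assumes, purely for illustration, that the webserver-to-HDFS hop runs at the same link speed as the browser-to-webserver hop:

```python
# Rough transfer-time estimate for a 100 GB file on a 100 Mbps LAN.
# Assumption (not from the thread): the second hop (webserver -> HDFS)
# runs at the same 100 Mbps as the first, so relaying roughly doubles the time.

FILE_BYTES = 100 * 10**9   # 100 GB
LINK_BPS = 100 * 10**6     # 100 Mbps

direct_s = FILE_BYTES * 8 / LINK_BPS   # client writes straight to datanodes
relayed_s = 2 * direct_s               # browser -> webserver, then webserver -> HDFS

print(f"direct:  {direct_s / 3600:.1f} h")   # -> direct:  2.2 h
print(f"relayed: {relayed_s / 3600:.1f} h")  # -> relayed: 4.4 h
```

In practice the two hops can overlap if the server streams rather than fully buffering, but with full buffering the doubling is the worst case.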
HUE (which uses WebHDFS) operates in this fashion. I don't think there is an easy way to stream a file that large over HTTP onto HDFS unless you do chunked uploads, and once you do, you'd end up with multiple smaller files on HDFS rather than the original 100+ GB one (assuming you're not appending to the same file reference on HDFS).
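To make the chunked-upload trade-off concrete, here is a minimal client-side sketch of the splitting step. The function name `split_into_chunks` is hypothetical, the actual per-chunk upload (e.g. a WebHDFS `CREATE` per chunk) is not shown, and the demo uses a 1 MB stand-in file instead of 100+ GB:

```python
import os
import tempfile

def split_into_chunks(path, chunk_bytes):
    """Yield (index, data) pairs covering the file at `path`.
    Without server-side append/concat, each chunk would land on HDFS
    as its own file rather than reassembling the original."""
    with open(path, "rb") as f:
        idx = 0
        while True:
            data = f.read(chunk_bytes)
            if not data:
                break
            yield idx, data
            idx += 1

# Demo on a small temporary file (sizes shrunk for illustration).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1_000_000))  # 1 MB stand-in for the large upload
    path = tmp.name

chunks = list(split_into_chunks(path, 256_000))
print(len(chunks))  # -> 4 (three 256 KB chunks plus a 232 KB remainder)
os.unlink(path)
```

This is exactly why you either accept many small files on HDFS or lean on something like WebHDFS append or HDFS concat to stitch the chunks back into one file.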