Some experience with uploading and downloading files on HDFS using the Java Spring framework

  1. Using streams to transfer data

At first, I tried an easy way to transfer files: read them into memory as byte arrays, and then upload them to HDFS. However, I ran into java.lang.OutOfMemoryError: Java heap space. The cause was a redundant call such as MultipartFile.getBytes(), which loads the entire file into memory.

I then fixed the problem by using two streams: one from the client (browser) to the web server, and another from the web server to the HDFS cluster.

  • From Browser To Web Server

We can use the usual approach, a MultipartFile parameter, to receive the uploaded file.

The file contents are either stored in memory or temporarily on disk. In either case, the user is responsible for copying file contents to a session-level or persistent store as and if desired. The temporary storage will be cleared at the end of request processing.
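A minimal controller sketch of this, assuming standard Spring MVC (the /upload path and the handler name are illustrative, not from the original post):

```java
import java.io.InputStream;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class UploadController {

    @PostMapping("/upload")
    public String upload(@RequestParam("file") MultipartFile file) throws java.io.IOException {
        // Stream the request body straight through; never call file.getBytes(),
        // which would load the whole file into the heap.
        try (InputStream in = file.getInputStream()) {
            // hand the InputStream to the HDFS client here
        }
        return "ok";
    }
}
```

The key point is that getInputStream() lets us forward the data without ever materializing the full file in memory.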

Developers can configure the threshold that decides whether the temporary file is kept in memory or written to disk by editing the multipart settings in the property file.
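In a Spring Boot application, for example, these limits and the in-memory threshold can be set in application.properties. The property names below are Spring Boot's standard multipart settings; the values are illustrative:

```properties
# Maximum size of a single uploaded file and of the whole request
spring.servlet.multipart.max-file-size=100MB
spring.servlet.multipart.max-request-size=100MB
# Files larger than this threshold are written to a temporary location
# on disk instead of being held in memory
spring.servlet.multipart.file-size-threshold=2MB
```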


This keeps memory usage bounded, so the server can support a high level of concurrent uploads.

  • From Web Server to HDFS Server

After the FileSystem has been initialized from a Configuration object, the upload itself is straightforward:

FSDataOutputStream out = fs.create(path, true);             // true: overwrite if the file already exists
IOUtils.copyBytes(file.getInputStream(), out, 1024, true);  // 1024-byte buffer; true: close both streams when done
  • From HDFS to Web Server

When downloading a file, Hadoop offers an equally convenient API: given the file path, we can open an InputStream directly.

FSDataInputStream in = fs.open(path);
  • From Web Server to Browser

Normally, in the Spring framework, developers use a Resource instance to transfer files. Conveniently, Spring provides an InputStreamResource(InputStream) constructor, so we can stream files from HDFS to the browser directly.
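Putting the two halves together, a download endpoint might look like the following sketch (the /download path and the request parameter are illustrative; fs is an already-initialized org.apache.hadoop.fs.FileSystem):

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.core.io.InputStreamResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DownloadController {

    private final FileSystem fs; // initialized elsewhere from a Configuration

    public DownloadController(FileSystem fs) {
        this.fs = fs;
    }

    @GetMapping("/download")
    public ResponseEntity<InputStreamResource> download(@RequestParam String path) throws java.io.IOException {
        FSDataInputStream in = fs.open(new Path(path));
        // InputStreamResource streams the HDFS file to the browser without
        // buffering it in memory; Spring closes the stream after writing the response.
        return ResponseEntity.ok()
                .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"file\"")
                .body(new InputStreamResource(in));
    }
}
```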

  2. File Name Encoding Problem

This problem mainly affects Chinese developers: Chinese characters become garbled in the HTTP header (for example, the file name in Content-Disposition). The solution is to re-encode the file name as ISO-8859-1:

String fileName = new String(fileName.getBytes("UTF-8"), "ISO-8859-1");
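The round trip can be checked with plain JDK code. This is a sketch with helper names of my own; it re-encodes the UTF-8 bytes as ISO-8859-1 (the charset servlet containers historically use for response headers), and shows that a UTF-8-aware client recovers the original name:

```java
import java.nio.charset.StandardCharsets;

public class FileNameEncoding {

    // Re-encode a UTF-8 file name into ISO-8859-1 so the container
    // writes the original UTF-8 bytes into the header unchanged.
    public static String toHeaderValue(String fileName) {
        return new String(fileName.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Inverse transform: what a client effectively does when it
    // interprets the received header bytes as UTF-8 again.
    public static String fromHeaderValue(String headerValue) {
        return new String(headerValue.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u62A5\u544A.pdf"; // "报告.pdf"
        String wire = toHeaderValue(original);
        System.out.println(fromHeaderValue(wire).equals(original)); // prints "true"
    }
}
```

ASCII names pass through both transforms unchanged, so the trick is harmless for non-Chinese file names.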