- Using Streams to Transfer Data
At first, I tried a simple approach to transfer files: read them into memory as byte arrays, then upload them to HDFS. However, I ran into
java.lang.OutOfMemoryError: Java heap space. The cause was reading whole files into memory with calls like
MultipartFile.getBytes(), which consumed far too much heap.
I then fixed the problem by streaming the data in two hops: from the client (browser) to the web server, and from the web server to the HDFS cluster.
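To see why streaming avoids the OutOfMemoryError, here is a minimal, HDFS-free sketch using only java.io (class and method names are illustrative): instead of materializing the whole file with getBytes(), the copy moves data through a small fixed-size buffer, so heap usage stays constant no matter how large the file is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies the input to the output through a small fixed-size buffer,
    // so memory use is bounded by bufSize rather than by the file size.
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[10_000]; // stand-in for an uploaded file
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink, 1024);
        System.out.println(copied); // prints 10000
    }
}
```

The same pattern is what Hadoop's IOUtils.copyBytes applies between the multipart stream and the HDFS output stream.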
- From Browser to Web Server
We can use the standard Spring mechanism, a
MultipartFile parameter, to receive the upload.
The file contents are either stored in memory or temporarily on disk. In either case, the user is responsible for copying file contents to a session-level or persistent store as and if desired. The temporary storage will be cleared at the end of request processing.
Developers can configure the threshold that decides whether a temporary upload is held in memory or spilled to disk by editing the application's property file. Keeping large uploads on disk lets the server handle a high level of concurrent uploads without exhausting the heap.
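For example, with Spring Boot the multipart limits and the in-memory threshold can be set in application.properties (property names are valid for Spring Boot 2.x; the sizes below are illustrative, and older Spring MVC setups configure a multipart resolver bean instead):

```properties
# Maximum size of a single file and of the whole request (illustrative values)
spring.servlet.multipart.max-file-size=512MB
spring.servlet.multipart.max-request-size=512MB
# Uploads larger than this threshold are spilled to a temporary file on disk
# instead of being buffered in memory:
spring.servlet.multipart.file-size-threshold=2MB
```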
- From Web Server to HDFS Server
Once we obtain a FileSystem instance from a Configuration object, the process is easy:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = fs.create(path, true);
IOUtils.copyBytes(file.getInputStream(), out, 1024, true);
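Putting the two hops together, an upload endpoint might look like the following sketch (class name and URL path are illustrative, not from the original; it assumes the Hadoop client and Spring Web are on the classpath and that core-site.xml points at the cluster):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class HdfsUploadController { // illustrative name

    @PostMapping("/upload")
    public String upload(@RequestParam("file") MultipartFile file) throws IOException {
        Configuration conf = new Configuration(); // reads core-site.xml etc.
        try (FileSystem fs = FileSystem.get(conf)) {
            Path dst = new Path("/uploads/" + file.getOriginalFilename());
            // Stream straight from the multipart temp store into HDFS;
            // no byte array of the whole file is ever built.
            try (FSDataOutputStream out = fs.create(dst, true)) {
                IOUtils.copyBytes(file.getInputStream(), out, 4096, false);
            }
        }
        return "ok";
    }
}
```

The try-with-resources blocks close both streams even on failure, which is why copyBytes is called with false as its last argument here.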
- From HDFS to Web Server
For downloads, Hadoop offers an equally convenient API: given only the file path, we can open an InputStream.
FSDataInputStream in = currentVfs.open(srcPath);
- From Web Server to Browser
Normally, in the Spring framework, a developer returns a
Resource instance to transfer a file. Conveniently, Resource has an InputStream-backed implementation,
InputStreamResource(InputStream), so we can stream files from HDFS to the browser directly.
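A sketch of the download endpoint (names are illustrative; it assumes fs is an already-configured org.apache.hadoop.fs.FileSystem such as the one obtained above):

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.springframework.core.io.InputStreamResource;
import org.springframework.core.io.Resource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;

// Inside a @RestController that holds the FileSystem instance `fs`:
@GetMapping("/download")
public ResponseEntity<Resource> download(@RequestParam String path) throws IOException {
    FSDataInputStream in = fs.open(new Path(path)); // stream straight from HDFS
    String fileName = new Path(path).getName();
    return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_DISPOSITION,
                    "attachment; filename=\"" + fileName + "\"")
            .body(new InputStreamResource(in)); // Spring copies and closes the stream
}
```

Because InputStreamResource wraps the open HDFS stream, the file is piped to the browser without ever being fully buffered on the web server.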
- File Name Encoding Problem
This problem mainly affects Chinese (and other non-ASCII) file names: the characters get garbled in the HTTP header. The solution is to re-encode the name with the "ISO-8859-1" charset:
String fileName = new String(fileName.getBytes("UTF-8"), "ISO-8859-1");
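A JDK-only sketch of why this works (class and method names are illustrative): the UTF-8 bytes of the name are reinterpreted as ISO-8859-1, where every byte maps to exactly one character, so the name survives the header unchanged and the original can be recovered by the inverse transform.

```java
import java.nio.charset.StandardCharsets;

public class HeaderName {
    // Re-encode a UTF-8 file name so each byte becomes one ISO-8859-1 char,
    // which is safe to place in an HTTP header value.
    static String toHeader(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Inverse transform: recover the original UTF-8 name from the header form.
    static String fromHeader(String header) {
        return new String(header.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "报告.txt";
        String wire = toHeader(original);
        System.out.println(fromHeader(wire).equals(original)); // prints true
    }
}
```

Plain-ASCII names pass through both transforms unchanged, since their UTF-8 and ISO-8859-1 byte forms are identical.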