This article describes our recent experience with multipart file uploads to AWS S3.
Our application builds line-delimited JSON files that contain patient information and uploads them to an AWS S3 bucket. Each file consists of a few hundred lines, and each line represents one patient record.
Since a single row could be as large as 10 MB, we adopted the multipart S3 upload approach, which allowed us to break the file into parts and upload them in parallel using a thread pool executor. This approach gave us good upload performance.
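The split-and-upload-in-parallel idea described above can be sketched with the standard library alone. This is a minimal sketch, not the application's actual code: `MultipartSketch`, `splitIntoParts`, `uploadAll`, `PartUploader`, and the thread count are hypothetical names and values chosen for illustration, and the real S3 call is hidden behind the interface. The sketch groups whole lines into parts, so a record is never split across two parts, and submits each part to an `ExecutorService`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MultipartSketch {

    /** Hypothetical stand-in for the real S3 part upload call. */
    interface PartUploader {
        String upload(int partNumber, byte[] partBytes) throws Exception;
    }

    /** Groups whole lines into parts of at least minPartBytes (the last part may be smaller). */
    static List<byte[]> splitIntoParts(List<String> lines, int minPartBytes) {
        List<byte[]> parts = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (String line : lines) {
            // Encode with an explicit charset so the byte count is predictable.
            byte[] encoded = (line + "\n").getBytes(StandardCharsets.UTF_8);
            current.write(encoded, 0, encoded.length);
            if (current.size() >= minPartBytes) {
                parts.add(current.toByteArray());
                current.reset();
            }
        }
        if (current.size() > 0) {
            parts.add(current.toByteArray());
        }
        return parts;
    }

    /** Uploads all parts in parallel and returns the per-part results in part order. */
    static List<String> uploadAll(List<byte[]> parts, PartUploader uploader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                final int partNumber = i + 1; // S3 part numbers start at 1
                final byte[] bytes = parts.get(i);
                futures.add(pool.submit(() -> uploader.upload(partNumber, bytes)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for each upload to finish
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("{\"id\":1}", "{\"id\":2}", "{\"id\":3}");
        List<byte[]> parts = splitIntoParts(lines, 12); // tiny threshold, for the demo only
        // The lambda is a fake uploader that just reports the part number and size.
        List<String> tags = uploadAll(parts, (n, b) -> "part-" + n + ":" + b.length, 2);
        System.out.println(tags); // prints [part-1:18, part-2:9]
    }
}
```

The 12-byte threshold is only for the demonstration. S3 requires every part except the last to be at least 5 MiB, so a real threshold would be at least that.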
In this approach, each part upload request must be given the part's input stream and its size in bytes.
The multipart upload to AWS S3 worked as anticipated.
However, after going live, we were hit with a production bug. An uploaded file had an incomplete line in the middle of the file, with a few characters missing at its end. The issue occurred sporadically, and at first we could not reproduce it.
We investigated the issue and found that it occurred when the patient record had a binary file, such as an image, attached. On further inspection, we realized that there was something fishy with the highlighted line.
After replicating the issue in our Dev environment with a suitable record, we changed the following lines and resolved the issue.
We found that “content.getBytes().length minus stream.available()” gave 0 (zero) for ordinary string content, and -1 or -2 when the content included binary data, such as an image, as a string. It happened with only a few images, not with all of them.
The Java documentation for “InputStream.available()” says, “Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not.”
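The documented behaviour is easy to reproduce with the standard library. The sketch below is an illustration only: the source does not say which stream type the application used, and `SequenceInputStream` is chosen here simply because its `available()` reports only the bytes of the stream it is currently reading. The class and method names (`AvailableDemo`, `countBytes`, `reportedAvailable`, `twoChunks`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;

class AvailableDemo {

    /** Counts the bytes the stream actually delivers by reading it to the end. */
    static long countBytes(InputStream in) {
        try {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns what available() reports at the moment of the call. */
    static int reportedAvailable(InputStream in) {
        try {
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A 7-byte stream made of a 3-byte chunk followed by a 4-byte chunk. */
    static InputStream twoChunks() {
        return new SequenceInputStream(
                new ByteArrayInputStream(new byte[3]),
                new ByteArrayInputStream(new byte[4]));
    }

    public static void main(String[] args) {
        InputStream stream = twoChunks();
        int reported = reportedAvailable(stream); // asked before reading anything
        long actual = countBytes(stream);
        // prints: available() = 3, actual = 7
        System.out.println("available() = " + reported + ", actual = " + actual);
    }
}
```

A plain `ByteArrayInputStream` would report the exact remaining count, so whether `available()` matches the real size depends entirely on the stream implementation in use.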
The “InputStream.available()” method is therefore not a reliable way to determine the size of a stream in all cases. This is why we decided to stop using it.
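One way to avoid relying on “available()” is to encode the content once and take both the stream and the size from the same byte array. The source does not show the lines that were actually changed, so the sketch below is only one plausible shape of such a fix. `PartPayload` and `fromString` are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for the two values a part upload request needs. */
class PartPayload {
    final InputStream stream;
    final long size;

    private PartPayload(InputStream stream, long size) {
        this.stream = stream;
        this.size = size;
    }

    /** Encodes the content once, so the stream and the size can never disagree. */
    static PartPayload fromString(String content) {
        // Assumption: the file is written as UTF-8. An explicit charset avoids
        // depending on the platform default.
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        return new PartPayload(new ByteArrayInputStream(bytes), bytes.length);
    }

    public static void main(String[] args) {
        // 15 characters, one of which (\u00eb) takes two bytes in UTF-8.
        PartPayload payload = fromString("{\"name\":\"Zo\u00eb\"}\n");
        System.out.println(payload.size); // prints 16
    }
}
```

Because the size comes from the array itself, it does not depend on how any particular stream implements “available()”.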
Please refer to this link to learn more about multipart AWS S3 upload. We hope this information helps everyone who uses AWS S3 file upload. If you have any comments, feedback, or suggestions, please feel free to drop them here.