
AWS S3 Multipart File Upload: Our Recent Experience at Ideas2IT

This article describes our recent experience with multipart file upload to AWS S3. Our application builds line-delimited JSON files that contain patient information and uploads them to an AWS S3 bucket. Each file consists of a few hundred lines, and each line represents one patient record.

Because each row could be up to 10 MB, we adopted S3 multipart upload. This method lets us break the file into parts and upload them in parallel using a thread pool executor, which gives good performance. With this approach, every part upload request must be given the part's input stream and its size in bytes.
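As a rough illustration of the approach described above, the sketch below uploads a list of parts on a fixed-size thread pool. It is not the production code from this article: the PartUploader interface is a hypothetical stand-in for the real S3 upload-part call, and the class and method names are invented for the example. It assumes UTF-8 content and takes each part's size from the same byte array that backs its stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelPartUploadSketch {

    /** Hypothetical stand-in for the real S3 upload-part call; returns a part identifier. */
    interface PartUploader {
        String uploadPart(int partNumber, InputStream partStream, long partSize);
    }

    /** Uploads every part on a thread pool and returns the results in part order. */
    static List<String> uploadAll(List<String> parts, PartUploader uploader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                final int partNumber = i + 1; // S3 part numbers start at 1
                // Assumption: content is UTF-8. Stream and size share one byte array.
                final byte[] bytes = parts.get(i).getBytes(StandardCharsets.UTF_8);
                pending.add(CompletableFuture.supplyAsync(
                        () -> uploader.uploadPart(partNumber, new ByteArrayInputStream(bytes), bytes.length),
                        pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                results.add(future.join()); // waits for the part; list order follows part order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real upload, the returned values would be the part identifiers (ETags) that S3 needs, together with the part numbers, to complete the multipart upload.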

[Image: Multipart file upload to AWS S3]

The multipart upload initially worked as anticipated. However, after going live, we encountered a production bug: the uploaded file had an incomplete line in the middle, with a few characters missing at the end of that line. The issue was sporadic and, at first, hard to reproduce.

Upon investigation, we discovered that the issue occurred when the patient record had a binary file, such as an image, attached. Further inspection revealed something unusual in the line that computes the part's byte size.

After replicating the issue in our Dev environment with a suitable record, we changed the following lines and resolved the issue.

[Image: Multipart file upload to AWS S3]

We changed content.getBytes().length to inputStream.available().

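We found that content.getBytes().length minus inputStream.available() was 0 for ordinary text content and -1 or -2 for content containing binary data, such as images. In other words, the size we declared was one or two bytes smaller than what the stream reported, which is consistent with a few characters going missing at the end of a line. The discrepancy occurred with some images but not all.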
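The article does not say why the two counts differed. One plausible cause, offered here as an assumption rather than a confirmed diagnosis, is a charset mismatch: the size is computed with one encoding while the stream is built with another. The sketch below reproduces a gap of 0 or -1 under that assumption, using ISO-8859-1 for the size and UTF-8 for the stream. The class name, method name, and sample records are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PartSizeMismatchDemo {

    /**
     * Returns declared size minus actual stream size when the size is computed
     * with one charset and the stream is built with another (assumed scenario).
     */
    static int sizeGap(String content, Charset sizeCharset, Charset streamCharset) {
        int declaredSize = content.getBytes(sizeCharset).length;
        ByteArrayInputStream stream = new ByteArrayInputStream(content.getBytes(streamCharset));
        // For ByteArrayInputStream, available() is the exact number of remaining bytes.
        int actualSize = stream.available();
        return declaredSize - actualSize;
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;
        // ASCII-only record: both encodings use one byte per character.
        System.out.println(sizeGap("{\"name\":\"Ann\"}", latin1, utf8));      // prints 0
        // One non-ASCII character: one byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println(sizeGap("{\"name\":\"Zo\u00eb\"}", latin1, utf8)); // prints -1
    }
}
```

If something like this were the cause, a declared size smaller than the stream would cut off the last bytes of the part, and only records containing non-ASCII characters would be affected.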

The Java documentation for InputStream.available() cautions, "Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not." InputStream.available() is therefore not a reliable size for every kind of stream, which is why we initially decided against using it.
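To make the documentation's caveat concrete, here is a small standard-library sketch in which available() does not equal the total number of bytes in the stream. It chains two in-memory streams with SequenceInputStream, whose available() reports only the current underlying stream. The class and method names are illustrative and do not come from the code discussed in this article.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;

final class AvailableVersusTotal {

    /** Two chained in-memory streams: 3 bytes followed by 4 bytes. */
    static InputStream chained() {
        return new SequenceInputStream(
                new ByteArrayInputStream(new byte[] {1, 2, 3}),
                new ByteArrayInputStream(new byte[] {4, 5, 6, 7}));
    }

    /** Returns {available() before reading, total bytes actually read}. */
    static long[] availableVersusTotal(InputStream in) {
        try {
            long reported = in.available();
            long total = 0;
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // count every byte until end of stream
            }
            return new long[] {reported, total};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = availableVersusTotal(chained());
        // prints "3 available, 7 total"
        System.out.println(result[0] + " available, " + result[1] + " total");
    }
}
```

For a stream backed by a single in-memory byte array, available() does match the remaining byte count, so whether it is safe as a part size depends on the concrete stream type in use.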

Conclusion

Switching from content.getBytes().length to inputStream.available() fixed the incomplete lines in our S3 multipart uploads for records that contain binary data such as images, and it stabilized the upload process. The problem arose because the byte count from content.getBytes().length did not always match the number of bytes in the stream for such content. Because the documentation warns that available() does not return the total byte count for every InputStream implementation, the fix depends on the type of stream being used. This experience underscores the importance of thorough testing and of choosing methods that suit the nature of the data to ensure reliable file uploads.

Ideas2IT Team
