Richard Willis - S3 ETag - Generate an accurate S3 ETag for any file

Posted on: Tuesday, 21 December 2021

I've been developing a new GitHub Action to sync to S3 (https://github.com/badsyntax/github-action-aws-s3), and one of the features of the tool is the ability to sync based on content hash, a feature the AWS CLI does not provide. This was easier said than done, as calculating an MD5 hash that matches the S3 ETag is not straightforward if the files were uploaded as multipart uploads.

Here's the general algorithm for accurately calculating an S3 ETag for multipart uploads:

"Say you uploaded a 14MB file to a bucket without server-side encryption, and your part size is 5MB. Calculate 3 MD5 checksums corresponding to each part, i.e. the checksum of the first 5MB, the second 5MB, and the last 4MB. Then take the checksum of their concatenation. MD5 checksums are often printed as hex representations of binary data, so make sure you take the MD5 of the decoded binary concatenation, not of the ASCII or UTF-8 encoded concatenation. When that's done, add a hyphen and the number of parts to get the ETag."
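The steps in that quote can be sketched in Node.js roughly as follows. This is a minimal illustration, not the code of the s3-etag package: the function name `s3ETagSketch` and the in-memory Buffer input are made up for this example, and it assumes that anything no larger than one part was uploaded in a single request (so its ETag is the plain hex MD5) and that the object is not server-side encrypted.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper for illustration only; not the s3-etag package API.
// `data` is a Buffer holding the whole file, `partSizeBytes` is the part
// size that was used for the multipart upload.
function s3ETagSketch(data, partSizeBytes) {
  // Assumption: data that fits in one part was uploaded with a single PUT,
  // in which case the ETag is just the hex MD5 of the content.
  if (data.length <= partSizeBytes) {
    return crypto.createHash('md5').update(data).digest('hex');
  }

  // MD5 of each part, kept as raw binary digests (not hex strings).
  const partDigests = [];
  for (let offset = 0; offset < data.length; offset += partSizeBytes) {
    const part = data.subarray(offset, offset + partSizeBytes);
    partDigests.push(crypto.createHash('md5').update(part).digest());
  }

  // MD5 of the concatenated binary digests, then "-<number of parts>".
  const combined = crypto
    .createHash('md5')
    .update(Buffer.concat(partDigests))
    .digest('hex');
  return `${combined}-${partDigests.length}`;
}
```

For example, `s3ETagSketch(fs.readFileSync('some-file.bin'), 5 * 1024 * 1024)` would give a value ending in `-3` for a 14MB file. The part size has to match the one used for the original upload, otherwise the result won't match. For large files you would want to stream the parts rather than read the whole file into memory.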

Note this algorithm has been kindly shared on Stack Overflow: https://stackoverflow.com/a/19896823/492325. That SO thread has a bunch of answers implementing the algorithm in different languages, but none for Node.js. Well, there is one, but it doesn't seem correct, so I decided to implement the algorithm myself.

Introducing S3 ETag: Generate an accurate S3 ETag in Node.js for any file (including multipart).

GitHub Repo: https://github.com/badsyntax/s3-etag
NPM Package: https://www.npmjs.com/package/s3-etag

I validated the logic against the original shell script. In fact, I improved on the shell script. It appears my Node.js implementation of the algorithm is correct.

Hopefully some will find this helpful. Any questions or suggestions? Leave a comment below!

