Question:

AWS Glacier inventory is either wrong or asynchronous?

Stella: 02 February 2022

Using Boto3, I uploaded 12 files to Amazon's Glacier last night at 21:43 EST.

I received archive IDs for all 12 files, so I assume they uploaded correctly.

According to the AWS Management Console, the most recent inventory was run at 02:53 EST this morning, about 5 hours after the upload.

But the inventory does not show those 12 files. There is only 1 file (that I uploaded 1 week ago).

I know I have to wait another day for the next AWS Inventory to run, but I thought I'd ask if this is expected behavior?

Does the inventory time that Amazon reports not really match when the inventory was run? Is it possible the inventory actually ran before I uploaded those files?

If not, why would I get archive IDs (indicating a successful upload) for files that aren't listed in an inventory run AFTER they were uploaded?

Edit:

The files did show up on the next inventory.

But I'm still curious why the inventory whose 'Last Inventory' time in the AWS Console was 5 hours after the upload showed none of the new files. My only explanation is that the 'Last Inventory' time in the AWS Console is when the information was published to the console, and that the ACTUAL inventory could have run hours earlier (hence it missed the recently uploaded files).

Answer:
Mia: 02 February 2022

When you initiate a job for a vault inventory, Amazon Glacier returns the last inventory it generated, which is a point-in-time snapshot and not real-time data.

http://docs.aws.amazon.com/amazonglacier/latest/dev/vault-inventory.html

Asking for an inventory apparently doesn't trigger the actual generation -- it just preps the last inventory for fetching.

Inventories are updated approximately once every 24 hours, so there's a good chance of those new files not appearing in the timeframe you describe.
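If you want to check this from Boto3 rather than the console, something like the following might help. It is only a rough sketch: it assumes a client created with `boto3.client("glacier")`, and the helper names (`start_inventory_job`, `fetch_inventory`, `missing_archive_ids`) are made up for illustration. It requests the last generated inventory, reads the `InventoryDate` that the snapshot itself reports, and lists which of your archive IDs are not in it yet.

```python
import json


def start_inventory_job(glacier, vault_name):
    """Ask Glacier to prepare its most recent inventory; returns the job ID."""
    response = glacier.initiate_job(
        accountId="-",  # "-" means the account of the current credentials
        vaultName=vault_name,
        jobParameters={"Type": "inventory-retrieval", "Format": "JSON"},
    )
    return response["jobId"]


def fetch_inventory(glacier, vault_name, job_id):
    """Return the parsed inventory, or None if the job has not completed yet."""
    job = glacier.describe_job(accountId="-", vaultName=vault_name, jobId=job_id)
    if not job["Completed"]:
        return None
    output = glacier.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    return json.loads(output["body"].read())


def missing_archive_ids(inventory, uploaded_ids):
    """Return the uploaded archive IDs that the inventory snapshot does not list."""
    listed = {archive["ArchiveId"] for archive in inventory["ArchiveList"]}
    return [archive_id for archive_id in uploaded_ids if archive_id not in listed]
```

Inventory jobs usually take hours to complete, so you would call `fetch_inventory` again later (or use a job-completion notification) rather than wait in a loop. An archive ID that shows up in `missing_archive_ids` only means the snapshot predates it, not that the upload failed.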

Unless you're interested in features only available through the Glacier API, like vault locks, you may find the S3/Glacier integration provides a more useful interface. Files uploaded as S3 objects and then transitioned to the Glacier storage class by lifecycle policies are not visible through the Glacier API -- they continue to appear as S3 objects, making it more straightforward to iterate through them and their metadata, all of which is effectively in real time.
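For comparison, listing objects on the S3 side could look roughly like this. Again a sketch, assuming a client from `boto3.client("s3")`; the function name and the bucket and prefix arguments are placeholders, not anything from your setup.

```python
def list_objects_with_storage_class(s3, bucket, prefix=""):
    """Yield (key, storage_class, last_modified) for each object under a prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            # Fall back to STANDARD if the listing omits the storage class.
            storage_class = obj.get("StorageClass", "STANDARD")
            yield obj["Key"], storage_class, obj["LastModified"]
```

Objects that a lifecycle policy has already transitioned should report a storage class of `GLACIER` here, and a new upload appears in the listing without waiting for a daily inventory.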