Simple Storage Service (S3)
- Global Storage Platform
- Private by default
- By default, only the account root user (the bucket owner) has access
- Region Based - Data is stored in a specific region
- Regionally resilient - Data is replicated across AZs in the region
- Public Service running from AWS Public Zone
Components
- Objects
- Like files
- Made of object key, object value and some metadata
- Size ranges from 0 bytes to 5 TB
- Buckets
- Containers for objects
- Created in specific region
- Data inside the bucket never leaves the region
- Blast Radius = Region
- Bucket name needs to be globally unique
- Can store unlimited number of objects - Infinitely scalable storage system
- There is no concept of file type. The file extension is simply part of the object key
- Names must be 3-63 characters long, lowercase, with no underscores
- Can't be formatted as IP addresses (e.g. 192.168.5.4)
- 100 soft limit, 1000 hard limit per account
- Flat Structure
- Everything is stored in the bucket at the root level
- When the key contains "/", S3 presents this as a folder structure. Folders are referred to as prefixes in S3
- Object storage, not file or block storage
- Can't mount an S3 bucket
- All buckets are private by default
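The naming rules and the flat key structure above can be sketched in a few lines of Python. Both helpers, `is_valid_bucket_name` and `list_level`, are hypothetical: the first checks only the rules listed in these notes (real S3 naming has extra cases), and the second mimics how a listing with a "/" delimiter (the `CommonPrefixes` returned by ListObjectsV2) turns flat keys into apparent folders.

```python
import re

# Checks only the rules listed above: 3-63 chars, lowercase letters,
# digits, hyphens and periods, starting/ending with a letter or digit,
# and not formatted as an IP address. Real S3 rules have more cases.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name: str) -> bool:
    if not _NAME_RE.match(name):
        return False
    return not _IP_RE.match(name)


def list_level(keys: list[str], prefix: str = "") -> tuple[list[str], list[str]]:
    """Mimic a delimiter="/" listing: (objects, "folders") directly under prefix."""
    objects, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # Everything up to the next "/" is shown as a folder (a common prefix)
            folders.add(prefix + rest.split("/", 1)[0] + "/")
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


keys = ["cat.jpg", "old/dog.jpg", "old/2020/fish.jpg"]
print(is_valid_bucket_name("my-bucket-01"))  # True
print(is_valid_bucket_name("192.168.5.4"))   # False
print(list_level(keys))                       # (['cat.jpg'], ['old/'])
print(list_level(keys, "old/"))               # (['old/dog.jpg'], ['old/2020/'])
```

All three keys live at the root of the bucket; "old/" and "old/2020/" exist only as prefixes of those keys.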
Security
Bucket Policy
- Type of Resource Policy
- A resource policy is like an identity policy, but it is attached to resources instead of identities
- Identity policies can only grant access to identities in their own account; resource policies can grant access to identities in the same account or in other accounts
- Resource policies can also grant or deny access to anonymous principals
- Identity policies control permissions from an identity perspective and the resource policy is from a resource perspective
- One bucket policy allowed per bucket
- A single policy can contain multiple statements
- When an identity in the same account tries to access the bucket, the final permission is a combination of the identity policy & the bucket policy: either one can grant access, and an explicit deny in either wins
- When an identity in a different account tries to access the bucket, the final permission is a combination of that identity's policy & the bucket policy
- Both must allow it: the identity's own account must still grant it permission to access the bucket in S3
- When an anonymous identity tries to access a bucket inside the account, then the final policy is just the bucket policy
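The three access cases above can be modelled as a small decision function. This is a simplified, hypothetical sketch: `can_access` and `Decision` are illustrative names, and the model ignores ACLs, Block Public Access, SCPs and the other layers that real AWS policy evaluation includes.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"   # explicit deny
    NONE = "none"   # policy says nothing (implicit deny)


def can_access(caller: str, identity: Decision, bucket: Decision) -> bool:
    """Simplified model of same-account, cross-account and anonymous access.

    caller: "same-account", "cross-account" or "anonymous".
    identity: result of the caller's identity policy (unused for anonymous).
    bucket: result of the bucket policy.
    """
    if caller == "anonymous":
        # No identity policy exists, so only the bucket policy counts
        return bucket is Decision.ALLOW
    # An explicit deny in either policy always wins
    if Decision.DENY in (identity, bucket):
        return False
    if caller == "same-account":
        # Either policy granting access is enough
        return Decision.ALLOW in (identity, bucket)
    if caller == "cross-account":
        # Both the caller's account and the bucket owner must grant access
        return identity is Decision.ALLOW and bucket is Decision.ALLOW
    raise ValueError(f"unknown caller type: {caller}")


print(can_access("same-account", Decision.ALLOW, Decision.NONE))   # True
print(can_access("cross-account", Decision.ALLOW, Decision.NONE))  # False
```

The asymmetry is the key point: inside one account a single "allow" is enough, across accounts both sides have to agree.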
ACLs
- Legacy concept
- Inflexible & Simple permissions
Block Public Access
- Applies only to anonymous (public) principals
- Does not apply to authenticated identities
Object Storage Classes
Standard
- Default is S3 standard class
- Replicated across at least 3 AZs
- 99.999999999% (11 nines) durability
- Content-MD5 checksums and cyclic redundancy checks (CRCs) are used to detect and fix data corruption
- When objects are stored successfully, an HTTP/1.1 200 OK response is returned
- Millisecond first-byte latency, and objects can be made publicly available
- To be used for frequently accessed data
- No retrieval fee
Standard - Infrequent Access
- Almost same as Standard
- Replicated across at least 3 AZs
- Same durability as Standard; designed availability is slightly lower (99.9% vs 99.99%)
- Same basic cost
- Storage cost is about half that of the Standard class
- Has retrieval fee
- Minimum duration charge of 30 days and minimum capacity charge of 128KB per object
- Used for long-lived, infrequently accessed data
One Zone IA
- Similar to Std. IA
- Has a retrieval fee, minimum duration charge and minimum capacity charge
- Data is not replicated and stored only in 1 AZ
- Cheaper to store data than Std. IA
- Used for non-critical, easily replaceable data that is accessed infrequently
Glacier - Instant Retrieval
- Like Std. IA but with cheaper storage, more expensive retrieval and a longer minimum duration
- Has a per-GB retrieval fee, a minimum duration of 90 days and a minimum capacity charge of 128KB per object
Glacier - Flexible
- 1/6th of Storage cost of Std
- Objects are not immediately available
- A retrieval process needs to be started to gain access which costs money
- Retrieved data is temporarily stored in Std IA
- Retrieval Types
- Expedited: 1-5 mins
- Standard: 3-5 Hours
- Bulk: 5-12 Hours
- First byte latency of mins or hours
- 40KB min billable size / 90 days min billable duration
- Used for storing archival data
- Objects cannot be made publicly available
Glacier - Deep Archive
- Cheapest option for storage
- 40KB min billable size / 180 days min billable duration
- Objects cannot be made publicly available
- A retrieval process needs to be started to gain access which costs money
- Retrieved data is temporarily stored in Std IA
- Retrieval Types
- Standard: 12 Hours
- Bulk: 48 Hours
- First Byte Latency: hours or days
- Used for archiving regulatory data
Intelligent Tiering
- Contains 5 different storage tiers
- Frequent Access - Std
- Infrequent Access - Std IA
- Archive Instant Access - Glacier Instant
- Archive Access - Glacier Flexible
- Deep Archive - Glacier Deep Archive
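The minimum size and duration charges above change what you actually pay for small or short-lived objects. A minimal sketch, assuming the minimums listed in these notes (plus the 128KB minimum that applies to One Zone IA and Glacier Instant Retrieval); the dictionary is keyed by the storage-class names the S3 API uses, and `billable` is a hypothetical helper.

```python
KB = 1024

# (minimum billable size in bytes, minimum billable duration in days)
# Standard has no minimums.
MINIMUMS = {
    "STANDARD": (0, 0),
    "STANDARD_IA": (128 * KB, 30),
    "ONEZONE_IA": (128 * KB, 30),
    "GLACIER_IR": (128 * KB, 90),
    "GLACIER": (40 * KB, 90),        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": (40 * KB, 180),
}


def billable(storage_class: str, size_bytes: int, days_stored: int) -> tuple[int, int]:
    """Return (billable bytes, billable days) after applying the class minimums."""
    min_size, min_days = MINIMUMS[storage_class]
    return max(size_bytes, min_size), max(days_stored, min_days)


# A 10KB object kept for 7 days in Std IA is billed as 128KB for 30 days
print(billable("STANDARD_IA", 10 * KB, 7))       # (131072, 30)
# Large, long-lived objects are unaffected by the minimums
print(billable("DEEP_ARCHIVE", 1_000_000, 200))  # (1000000, 200)
```

This is why the IA and Glacier classes suit large, long-lived objects: lots of tiny or short-lived objects pay for capacity and time they never use.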
S3 Static Website Hosting
- Allows objects to be accessed over HTTP, e.g. to host a static website
- After enabling it, set an index document and an error document
- A static website hosting address is created
- The endpoint name depends on the bucket name and region
- Custom domain can be used
- Bucket name must match the custom domain name
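A small sketch of how the website endpoint is built from the bucket name and region. Both functions are hypothetical helpers; AWS uses either a dash (`s3-website-<region>`) or a dot (`s3-website.<region>`) form depending on the region, so the `dash_style` flag is an assumption you should check against the AWS endpoint list for your region.

```python
def website_endpoint(bucket: str, region: str, dash_style: bool = True) -> str:
    """Build the S3 static website endpoint (HTTP only) for a bucket."""
    sep = "-" if dash_style else "."
    return f"http://{bucket}.s3-website{sep}{region}.amazonaws.com"


def can_serve_domain(bucket: str, domain: str) -> bool:
    # For a custom domain, the bucket must be named exactly like the host name
    return bucket == domain.lower().rstrip(".")


print(website_endpoint("example-bucket", "us-east-1"))
# http://example-bucket.s3-website-us-east-1.amazonaws.com
print(can_serve_domain("www.example.com", "www.example.com"))  # True
```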
Pricing
- Storage Charge
- Data transfer charge
- Data transfer into S3 is free
- Data transfer out of S3 has a per-GB charge
- Request Charge
- GET, PUT, POST and LIST operations are charged per request; rates vary by storage class
- Free Tier
- 5GB of storage
- 20000 Get requests
- 2000 Put requests
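The pricing components above add up as simple arithmetic. A minimal sketch, assuming the free-tier allowances listed in these notes as monthly amounts; the `Rates` values are placeholders you would fill in from the AWS pricing page for your region and storage class, not real prices.

```python
from dataclasses import dataclass

# Free-tier allowances from the notes above (per month)
FREE_STORAGE_GB = 5
FREE_GETS = 20_000
FREE_PUTS = 2_000


@dataclass
class Rates:
    """Placeholder prices; look up real rates for your region and class."""
    storage_per_gb: float       # per GB-month
    transfer_out_per_gb: float
    get_per_1000: float
    put_per_1000: float


def monthly_cost(rates: Rates, storage_gb: float, gb_out: float,
                 gets: int, puts: int, free_tier: bool = False) -> float:
    # Data transferred into S3 is free, so it does not appear here
    if free_tier:
        storage_gb = max(0.0, storage_gb - FREE_STORAGE_GB)
        gets = max(0, gets - FREE_GETS)
        puts = max(0, puts - FREE_PUTS)
    return (storage_gb * rates.storage_per_gb
            + gb_out * rates.transfer_out_per_gb
            + gets / 1000 * rates.get_per_1000
            + puts / 1000 * rates.put_per_1000)


unit = Rates(1.0, 1.0, 1.0, 1.0)  # unit placeholder rates to show the arithmetic
print(monthly_cost(unit, 10, 2, 30_000, 3_000, free_tier=True))  # 18.0
```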
Object Versioning
- Disabled by default
- Once enabled, versioning cannot be disabled, but it can be suspended and enabled again
- Without versioning, modification results in replacement of an object
- With versioning, a modification creates a new version
- Each version is assigned an automatically generated version ID
- Old version is also retained
- A specific version can be provided during access
- Versioning impacts deletion
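- When an object is deleted without specifying a version, S3 adds a delete marker as the latest version; the object appears deleted but older versions are kept
- Deleting the delete marker undoes the delete
- Deleting a specific version removes it permanently; if it was the latest, the previous version becomes the latest
- Space is consumed by all versions, which impacts billing
- Suspending versioning does not remove existing versions; to get back to zero version overhead, delete the old versions, or delete the bucket and re-upload the files to a bucket without versioning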
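The versioning behaviour above (new versions on overwrite, delete markers, undoing a delete, deleting a specific version) can be modelled as a toy in-memory class. `VersionedBucket` is hypothetical and is not the real S3 API; real version IDs are opaque strings, here they are simple counters.

```python
import itertools
from typing import Optional


class VersionedBucket:
    """Toy in-memory model of S3 versioning (not the real API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data None means a delete marker
        self._versions = {}
        self._ids = itertools.count(1)

    def put(self, key: str, data: bytes) -> str:
        vid = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((vid, data))
        return vid

    def delete(self, key: str, version_id: Optional[str] = None) -> str:
        history = self._versions.setdefault(key, [])
        if version_id is None:
            # No version given: add a delete marker as the new latest version
            vid = f"v{next(self._ids)}"
            history.append((vid, None))
            return vid
        # Version given: permanently remove that version (or delete marker)
        self._versions[key] = [v for v in history if v[0] != version_id]
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)  # latest is a delete marker: object looks deleted
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not None:
                return data
        raise KeyError(f"{key}@{version_id}")

    def stored_versions(self, key: str) -> int:
        # Every real (non-marker) version consumes billed storage
        return sum(1 for _, data in self._versions.get(key, []) if data is not None)


b = VersionedBucket()
v1 = b.put("cat.jpg", b"old")
v2 = b.put("cat.jpg", b"new")     # overwrite creates a second version
marker = b.delete("cat.jpg")      # object now appears deleted
b.delete("cat.jpg", marker)       # deleting the marker undoes the delete
print(b.get("cat.jpg"))           # b'new'
b.delete("cat.jpg", v2)           # permanently delete the latest version
print(b.get("cat.jpg"))           # b'old'
```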
MFA Delete
- MFA is required to change versioning state
- MFA is required to delete versions
- The MFA device serial number and the current MFA code are passed with the API call
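As an illustration of how the serial number and code travel with the API call, this sketch builds the keyword arguments for boto3's S3 client `put_bucket_versioning`, where the `MFA` parameter is the device serial number, a space, then the current code. The helper `mfa_delete_request`, the bucket name and the device ARN are hypothetical; MFA Delete can only be enabled by the bucket owner's root user.

```python
def mfa_delete_request(bucket: str, mfa_serial: str, mfa_code: str) -> dict:
    """Keyword arguments for an S3 put_bucket_versioning call enabling MFA Delete."""
    return {
        "Bucket": bucket,
        # Serial number (or ARN of a virtual device), a space, then the current code
        "MFA": f"{mfa_serial} {mfa_code}",
        "VersioningConfiguration": {"Status": "Enabled", "MFADelete": "Enabled"},
    }


params = mfa_delete_request(
    "example-bucket",
    "arn:aws:iam::123456789012:mfa/root-account-mfa-device",  # hypothetical device
    "123456",
)
# With root-user credentials for the bucket owner, this could be sent as:
#   import boto3
#   boto3.client("s3").put_bucket_versioning(**params)
print(params["MFA"])
```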
Performance Optimization
- By default, an upload is done using a single PUT Object API call
- If the stream fails, the whole upload fails and must be restarted
- A single PUT upload supports objects of up to 5GB
Multipart Upload
- Data is broken up
- Generally used for objects of 100MB or more
- An upload can be split into a maximum of 10,000 parts
- Each part can be from 5MB to 5GB
- Last part can be smaller than 5MB
- Failed parts can be retried individually
- Parts upload in parallel, so the transfer rate is the sum of the speeds of all parts
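The multipart limits above (at most 10,000 parts, each 5MB to 5GB, last part allowed to be smaller) determine how an object gets split. A minimal sketch with a hypothetical `plan_parts` helper; the 100MB default part size is an assumption, and the helper grows it when the object would otherwise need more than 10,000 parts.

```python
MB = 1024 ** 2
GB = 1024 ** 3
MIN_PART = 5 * MB
MAX_PART = 5 * GB
MAX_PARTS = 10_000


def plan_parts(object_size: int, part_size: int = 100 * MB) -> list[tuple[int, int]]:
    """Split an object into (offset, length) parts within the multipart limits."""
    if object_size <= 0:
        raise ValueError("object must not be empty")
    # Grow the part size if needed so the object fits in 10,000 parts
    # (-(-a // b) is ceiling division)
    part_size = max(part_size, MIN_PART, -(-object_size // MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for multipart upload")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)  # last part may be smaller
        parts.append((offset, length))
        offset += length
    return parts


parts = plan_parts(250 * MB)
print(len(parts), parts[-1][1] // MB)  # 3 50
```

Each (offset, length) pair is independent, so parts can be uploaded concurrently (for example with `concurrent.futures`) and only failed parts retried; with boto3 the real calls are `create_multipart_upload`, `upload_part` and `complete_multipart_upload`.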
Accelerated Transfer
- By default, S3 uploads travel over the public internet, which can be slow and unreliable over long distances
- Uses AWS edge location networks
- By default transfer acceleration is turned off
- Bucket name cannot contain periods and the name should be DNS name compatible
- With acceleration enabled, data travels over the public internet only as far as the closest edge location, after which it is transferred over the AWS private network
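A short sketch of the naming constraint and the accelerated endpoint. `accelerate_endpoint` is a hypothetical helper that rejects names with periods or other non-DNS-compatible characters and returns the `<bucket>.s3-accelerate.amazonaws.com` endpoint that uploads are sent to once acceleration is enabled on the bucket.

```python
import re

# DNS-compatible single label: lowercase letters, digits and hyphens only
_LABEL = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint, rejecting incompatible names."""
    if "." in bucket or not _LABEL.match(bucket):
        raise ValueError(f"{bucket!r} cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("my-bucket"))  # https://my-bucket.s3-accelerate.amazonaws.com
```

With boto3, passing `botocore.config.Config(s3={"use_accelerate_endpoint": True})` when creating the client makes it use this endpoint instead of building the URL by hand.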
Labs
Bucket Policy
- Create a bucket policy to provide public access
- Create a bucket policy to deny access to specific IP
- Create a bucket policy to deny access without MFA to a prefix
- Access a bucket in an account
- By an identity in the same account
- By identity in another account
- By anonymous identity
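The three bucket policy labs above can be sketched as policy documents built in Python. The bucket name, the `confidential/` prefix and the IP address (from the 203.0.113.0/24 documentation range) are hypothetical; `aws:SourceIp` and `aws:MultiFactorAuthAge` are standard IAM condition keys. Since only one bucket policy is allowed per bucket, these are alternatives, or their statements can be merged into one policy.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name
ARN = f"arn:aws:s3:::{BUCKET}"

# 1. Public read: any principal, including anonymous ones, may GET objects
public_read = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"{ARN}/*",
    }],
}

# 2. Deny all S3 actions on the bucket from one IP address
deny_ip = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOneIp",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [ARN, f"{ARN}/*"],
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.10/32"}},
    }],
}

# 3. Deny access to a prefix unless the request was authenticated with MFA
#    (the MFA age key is absent, i.e. Null, for non-MFA requests)
deny_no_mfa = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyWithoutMfa",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": f"{ARN}/confidential/*",
        "Condition": {"Null": {"aws:MultiFactorAuthAge": "true"}},
    }],
}

print(json.dumps(public_read, indent=2))
```

The printed JSON can be pasted into the bucket policy editor in the console, or saved to a file and applied with `aws s3api put-bucket-policy --bucket example-bucket --policy file://policy.json`. The public-read policy only takes effect if Block Public Access is turned off for the bucket.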
Static Website Hosting
- Host a static website backed by S3
- Redirect requests for an object
- Custom redirection rules
- Redirect from http to https
- Track website usage metrics
- Requester pays model
Versioning
- Request a specific version of an object
Performance Optimization
- S3 Accelerated Transfer
- S3 multipart upload
Storage Classes
- Store objects in different storage classes
- Try out intelligent tiering
- Features
- Components
- Buckets
- Objects
- Access Points
- Security
- Monitoring
- Perf Optimization
- Managing Storage
- Replication
- Versioning
- Archiving
- Backups
- Object Locking
- Storage Classes