Amazon S3 (Simple Storage Service) is a file storage web service offered as part of Amazon Web Services (AWS). Files are uploaded using HTTP REST, e.g. via a super basic web UI hosted by AWS S3 or via third-party upload tools. It’s of course also accessible programmatically via the countless SDKs available for all languages/platforms. If files are marked “public” they can optionally be served as a static website. It’s possible to set up logging to track how often each file/page is loaded, configure custom 404 pages, and set custom HTTP response headers including cache control, CORS and so on. It should be noted that while the REST endpoint offers HTTPS, the static website endpoint is HTTP only. S3 can also act as a BitTorrent seeder: just append ?torrent to any file URL.
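
For example, response headers like Cache-Control and Content-Type can be set per object when it is uploaded. A minimal sketch using the AWS CLI utility (introduced further down), assuming a bucket called “my-bucket” that you own:

# Upload a file and set the Cache-Control and Content-Type headers that S3
# will serve it with ("my-bucket" is just an example name)
aws s3 cp index.html s3://my-bucket/ --content-type "text/html" --cache-control "max-age=300"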

S3 first launched in 2006 and since then many competitors have sprung up (often API compatible with S3), for example Google Cloud Storage and Azure Storage. S3 has a 99.9% SLA (i.e. it can be down up to 60 * 24 * 30 * (1 - 0.999) == 43.2 minutes per month), but this only means that you get a 10-25% discount the next month if the service has been down more than that. S3 is billed both based on how much you store and how much of that content is downloaded. It does not offer any automated cost control (a hard quota based on a pre-paid amount), but you can configure AWS to send you an alert if your bill exceeds a limit you set. There is a billing dashboard that gives you a quick overview, and if you’re on the free tier it will tell you how many free S3 PUTs/GETs you have left each month etc. It’s also possible to create “Requester Pays” buckets where the person downloading pays for the request, or optionally pays for the request plus some price that is paid out to the bucket owner via AWS DevPay.
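
“Requester Pays” is a per-bucket setting; a minimal sketch for turning it on using the AWS CLI utility (introduced further down), assuming a bucket you own called “my-bucket”:

# Make downloaders pay the request and transfer costs for this bucket
# ("my-bucket" is just an example name)
aws s3api put-bucket-request-payment --bucket my-bucket \
        --request-payment-configuration Payer=Requester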

Another basic way to interact with S3 buckets is via the AWS CLI utility. If you’re using Ubuntu you can usually just do sudo apt-get install awscli to get it, or if you for some reason need the absolute latest version you might want to do sudo apt-get install python-pip followed by sudo pip install awscli instead. Once you have that installed, you can list the contents of public S3 buckets. For example, there is a public S3 bucket called “flaws.cloud” and you can list its contents by running:

aws --no-sign-request s3 ls s3://flaws.cloud

By default the AWS CLI utility will sign all requests using your credentials, but you can access public buckets without even having your own credentials if you pass the --no-sign-request parameter like above.

To create your own S3 buckets you need to register an AWS account (there is a free tier, usable for up to a year, that includes basic variants of both S3 and EC2 and a bunch of other stuff). Once you have registered, go to the AWS management console and click your name in the top-right corner. From that dropdown menu, select “My Security Credentials”, and then click “Continue to Security Credentials”. You really don’t want your AWS credentials stolen, so before doing anything else it’s a good idea to enable two-factor authentication (you can use any TOTP device, like a smartphone running Google Authenticator). To do that, expand “Multi-Factor Authentication”, click “Activate MFA”, select “A virtual MFA device”, scan the barcode and so on.

To access AWS programmatically or via the AWS CLI utility you need to create an access key. You can do this directly from the “Security Credentials” screen on your account, but such a key would have complete access to your entire account, so it’s instead recommended that you create separate sub-accounts using Amazon IAM and give each such account only the specific access that it needs. To do this, click the “Services” link in the top left corner and select “IAM”. Then click “Users”, then “Add User”, check “Programmatic Access”, and in the “Permissions” step use “Attach existing policies directly” to give the user “AmazonS3FullAccess”. For a production system you’d probably want to attach the permissions to a group instead, of course. Once the user has been created you get an “Access key ID” (starting with “AKIA…”) and a “Secret access key”.
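
The same IAM setup can also be scripted once you already have a working key configured for the CLI; a rough sketch (the user name “s3-uploader” is just an example, and the key you run this with needs IAM permissions):

# Create an IAM user, give it full S3 access and create an access key for it
aws iam create-user --user-name s3-uploader
aws iam attach-user-policy --user-name s3-uploader \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam create-access-key --user-name s3-uploader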

Once you have an “access key id” and the corresponding “secret” you can configure these in the AWS CLI utility by running aws configure. This will make the specified key the default key, so all commands submitted to AWS using the CLI utility will be signed by this key.
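
A typical aws configure session looks something like this (the values below are the placeholder examples from the AWS documentation, not real credentials):

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: eu-west-1
Default output format [None]: json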

Once you have a default key setup, you can find S3 commands described in aws s3 help and more advanced S3 API calls described in aws s3api help. For example:

# Try to create a bucket called "test" (hits BucketAlreadyExists since name is taken)
aws s3 mb s3://test

# List all your buckets
aws s3 ls

# Upload a file to a bucket
aws s3 cp file.txt s3://bucket-name-here/

# Upload a file to a bucket ... and also make it world readable
aws s3 cp file.txt s3://bucket-name-here/ --acl public-read

# Show the contents of a text file stored in a bucket
aws s3 cp s3://some-bucket-name/object -

# Remove a file from a bucket
aws s3 rm s3://my-bucket-name/file.txt

# Remove a bucket (only works if bucket is empty)
aws s3 rb s3://bucket-name

The above commands all operate on the default region you specified when you ran aws configure, but you can pass --region foo to use another region. Also, if you want, you can configure several keys in the AWS CLI tool; just use a --profile profile-name-here parameter to select the key you want to use (both when you run “aws configure” and then every time you invoke a regular command).
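
For example, something like this (the profile name “work” is just an example):

# Configure a second key under a named profile
aws configure --profile work

# Run a single command using that profile, against another region
aws s3 ls --profile work --region us-east-1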

When interacting with the S3 REST endpoint, you can use two different URL styles, called “Virtual Hosted-style” and “Path-style”. In the former, the bucket name is part of the domain, and in the latter it is the first part of the path. For example, if you have uploaded a file to stuff/items/file.txt in the bucket situla-123 in region eu-west-1, it can be referred to using either one of the curl command URLs below:

$ aws s3 --region eu-west-1 mb s3://situla-123
$ echo hello | aws s3 cp - s3://situla-123/stuff/items/file.txt --acl public-read
$ aws s3 cp s3://situla-123/stuff/items/file.txt -
hello

# First curl the file using "virtual hosted-style URL"
$ curl https://situla-123.s3-eu-west-1.amazonaws.com/stuff/items/file.txt
hello

# And now curl the file using "path-style URL"
$ curl https://s3-eu-west-1.amazonaws.com/situla-123/stuff/items/file.txt
hello

One thing to keep in mind is that you might run into an SSL certificate name mismatch error when using “virtual hosted-style URLs” for buckets that have . in their name.

$ aws s3 --region eu-west-1 mb s3://fail.pail
$ echo hello | aws s3 cp - s3://fail.pail/file.txt --acl public-read

# Beware that bucket names that contain "." give you an SSL name mismatch error
$ curl https://fail.pail.s3-eu-west-1.amazonaws.com/file.txt
curl: (51) SSL: certificate subject name (*.s3-eu-west-1.amazonaws.com) does
      not match target host name 'fail.pail.s3-eu-west-1.amazonaws.com'

# But loading the same file over HTTPS using "Path-style URL" works fine
$ curl https://s3-eu-west-1.amazonaws.com/fail.pail/file.txt
hello

You are also not allowed to create buckets with names that cannot be used in a hostname; for example foo..bar, foobar. and .foobar are all illegal bucket names. Bucket names must consist of 3-63 characters.
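
Trying to create a bucket with an illegal name simply fails; a quick example (the exact error message depends on the CLI version):

# Rejected, e.g. with an InvalidBucketName error
aws s3 mb s3://foo..bar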

It’s also possible to make an S3 bucket listable even for anonymous users (for people who know the bucket name). However, just like it’s a good idea to keep directory listing turned off on webservers, most of the time it’s best not to allow anonymous users to list your bucket contents; this is true even if many or all objects (files) in the bucket are marked with the ACL “public-read”. If you want to make a bucket listable for anonymous users anyway, you can do it like this:

# Try to list bucket contents anonymously
$ aws --no-sign-request s3 ls s3://situla-123

# Make the bucket contents listable for anonymous users (previous command now works)
$ aws s3api put-bucket-acl --bucket situla-123 --acl public-read

Other useful things you might need to do are:

# Upload (or update if it already exists) a whole directory to S3
aws s3 sync ~/mydir s3://my-bucket/

# List files inside a bucket recursively
aws s3 ls --recursive s3://situla-123

# Make a public file private again
aws s3api put-object-acl --bucket my-bucket --key stuff/hello.txt --acl private

# Create a presigned URL for public access (valid for 30 seconds, then it stops working)
aws s3 presign s3://situla-42/private.txt --expires-in 30

To access your bucket as a static website you need to enable “Static website hosting” for it. The easiest way to do this is via the AWS console web UI: just select the bucket, switch to the “Properties” tab and you’ll find it there. The main difference compared to HTTP GET operations issued to the REST API endpoint is that the static website endpoint will serve an HTTP 403 Forbidden response that contains HTML instead of an XML error message. For URLs that point to files that don’t exist, the website endpoint returns a 404 document that you can configure when you enable “Static website hosting”. Unlike the REST endpoint, the website endpoint can also return redirects. You can access the website endpoint via:

http://foo-bucket.s3-website-REGION.amazonaws.com/prefix1/prefix2/file.txt
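
The same setting can also be enabled from the CLI; a sketch, assuming the situla-123 bucket from earlier and that you have already uploaded index.html and error.html documents (just example names):

# Enable static website hosting with an index and an error document
aws s3 website s3://situla-123/ --index-document index.html --error-document error.html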

When talking about the file s3://my-bucket/subdir/blah/file.txt, subdir/blah is a prefix to the full key name, which is subdir/blah/file.txt. There is no concept of folders or subdirectories, even though the simple web UI that S3 hosts kind of makes it look like S3 has folders. The full key name is a string of Unicode characters whose UTF-8 encoding is at most 1024 bytes. While all Unicode characters are allowed, certain characters like ?, & and + need URL encoding, so they are a bit messy (especially if your application has a bug and you need to do some troubleshooting using curl etc). The AWS S3 documentation has a list of characters that you should avoid in key names if possible.
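
As a small illustration of the URL encoding mess (the bucket name and key below are just examples; keys like this are best avoided):

# Upload an object whose key contains a space and a "?"
aws s3 cp file.txt "s3://my-bucket/notes/what now?.txt" --acl public-read

# When fetching it over HTTP, the space and the "?" must be percent-encoded
curl "https://my-bucket.s3-eu-west-1.amazonaws.com/notes/what%20now%3F.txt"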

In the properties page of the bucket inside the AWS web UI you can also enable versioning of all objects in that bucket. Once you’ve done this, the change history of each file is saved. You can then do:

# List all versions of the object specified by --prefix (private.txt below)
$ aws s3api list-object-versions --bucket bkt \
        --prefix private.txt | jq -r '.Versions[] | .LastModified + " " + .VersionId'

# Use a .VersionId printed above to retrieve that old version of the file
$ aws s3api get-object --bucket bkt --key private.txt --version-id VERSION_ID foo.txt
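
If you prefer to turn versioning on from the CLI instead of the web UI, a sketch (using the same bucket name bkt as above):

# Enable versioning for the bucket
$ aws s3api put-bucket-versioning --bucket bkt \
        --versioning-configuration Status=Enabled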

Finally, I would like to recommend the flAWS challenge; it’s a fun little challenge where you need to fiddle around with the AWS CLI tool a bit to gain access to the next “level”.