I am trying to use transfer to download a directory from an S3 bucket, but I am getting the error "An error occurred (404) when calling the HeadObject operation: Not Found". Please help.
S3 structure:
Bucket
  Folder1
    File1
Note: Trying to download Folder1
transfer.download_file(self.bucket_name, self.dir_name, self.file_dir + self.dir_name)
asked Oct 8, 2017 at 20:46
I had the same issue recently. You are probably misspelling the path and folder name. In my case, for example, I was messing up with the ‘/’.
To fix it, make sure the variables you are using as arguments for the function contain the correct names of the directories, folders, and files as they appear in S3. Also, make sure you put the ‘/’ in the correct places in the correct variables. For instance, in my case I found that:
- bucket name: bucket_name (with no ‘/’ at the end, and no ‘s3://’)
- directory name: folder1/folder2/file_name (with no ‘/’ at the beginning)
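For instance, a minimal sketch of a correctly formed call (the bucket and key names here are just illustrations):
import boto3

s3 = boto3.client('s3')

bucket_name = 'bucket_name'           # no 's3://' prefix and no '/' at the end
key = 'folder1/folder2/file_name'     # no '/' at the beginning
s3.download_file(bucket_name, key, '/tmp/file_name')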
I hope it helps you and others to get around this error easily.
answered Dec 20, 2017 at 16:02
Yet another possibility is that you entered an incorrect endpoint_url parameter when creating your S3 resource.
For future users, create your resource like this:
import boto3

s3 = boto3.resource(
    's3',
    region_name=[your region, e.g. eu-central-1],
    aws_access_key_id=[your access key],
    aws_secret_access_key=[your secret key]
)
In the above, it is possible to pass an endpoint_url parameter as well, as I erroneously did (I later found out that I had accidentally passed the endpoint URL of a different AWS service).
If you are using the AWS CLI to authenticate, you can omit the region_name, aws_access_key_id, and aws_secret_access_key parameters, like so:
s3 = boto3.resource('s3')
answered Apr 18, 2019 at 12:40
I spent a lot of time figuring out why I got this error message on the DigitalOcean platform.
The request should be performed like this:
client = boto3.client('s3', endpoint_url='https://fra1.digitaloceanspaces.com')
client.download_file('mybucketname', 'remotefilekeytoread', 'localfilenametosave')
If endpoint_url is set to 'https://mybucketname.fra1.digitaloceanspaces.com', the download will fail with a 404 error, even though other things, like requesting signed URLs, work with this endpoint URL. Hope this helps someone.
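For contrast, a sketch of the failing setup described above (same hypothetical names). With a bucket-scoped endpoint, boto3 still prepends the bucket to the request, so the key is effectively looked up under the wrong path, which likely explains the 404:
import boto3

client = boto3.client('s3', endpoint_url='https://mybucketname.fra1.digitaloceanspaces.com')
# Fails with 404 on download_file, per the answer above.
client.download_file('mybucketname', 'remotefilekeytoread', 'localfilenametosave')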
answered Sep 2, 2021 at 10:20
Spent too much time on this. Here’s a quick fix —
s3_client = boto3.client('s3')
s3_client.download_file('item1','item2', 'item3')
Here, in .download_file:
item1 = the bucket name, e.g. ‘lambda-ec2-test-bucket’
item2 = the key (path) of the key pair .pem file in that S3 bucket, e.g. ‘keys/kp08092022.pem’
item3 = the path in the "tmp" folder of your Lambda function where you want to save the downloaded file, e.g. ‘/tmp/keyname.pem’
Now the below code with the examples should work perfectly —
s3_client = boto3.client('s3')
#Download private key file from secure S3 bucket
s3_client.download_file('lambda-ec2-test-bucket', 'keys/kp08092022.pem', '/tmp/keyname.pem')
answered Aug 9, 2022 at 18:09
Describe the bug
Unable to get S3.Object fields for a "directory" object, but only if it was created via the boto3 API.
When we create directories manually in the AWS S3 Web Console, everything works fine.
Important
We have two user accounts with AdministratorAccess permission:
- First for the s3-bot and programmatic access.
- Second for a user and manual manipulation of the S3 service.
Steps to reproduce
This leads to ERROR 404
- Create an empty bucket (for example, "test-bucket") manually in the AWS S3 Web Console.
- Upload a file with two sub-directories in its path (for example, "first/second/file.txt"):
import boto3

s3_resource = boto3.resource(
    "s3",
    aws_access_key_id="my_access_key_id",
    aws_secret_access_key="my_secret_access_key",
    region_name="us-east-1"
)
bucket = s3_resource.Bucket(name="test-bucket")
bucket.upload_file(Filename=r"c:\testfile.txt", Key="first/second/file.txt")
- Try to get the object for the "first/" directory and print its content length:
first_dir = bucket.Object(key=r"first/")
print(first_dir.content_length)
- This raises the error botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found.
This works fine
- Create an empty bucket (for example, "test-bucket-2") manually in the AWS S3 Web Console.
- Manually create a nested directory structure in your new bucket: "first_dir/second_dir/".
- Try to get the object for the "first_dir/" directory and print its content length:
import boto3

s3_resource = boto3.resource(
    "s3",
    aws_access_key_id="my_access_key_id",
    aws_secret_access_key="my_secret_access_key",
    region_name="us-east-1"
)
bucket = s3_resource.Bucket(name="test-bucket-2")
first_dir = bucket.Object(key=r"first_dir/")
print(first_dir.content_length)
- Now we have our «content_length» value.
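A likely explanation, sketched below: the console's "Create folder" button writes a zero-byte object whose key ends in "/", whereas upload_file(..., Key="first/second/file.txt") creates only that one object, so there is no "first/" object for HeadObject to find. Creating the marker explicitly (bucket name as in the repro above) makes the first scenario behave like the second:
import boto3

s3 = boto3.client("s3")
# Write an explicit zero-byte "directory marker", as the web console does.
s3.put_object(Bucket="test-bucket", Key="first/")
# HeadObject on "first/" now succeeds instead of returning 404.
print(s3.head_object(Bucket="test-bucket", Key="first/")["ContentLength"])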
Expected behavior
Getting the object for a directory prefix and reading its content_length should behave the same whether the directory was created manually in the Web Console or implicitly via the boto3 API.
Debug logs
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from before-call.apigateway to before-call.api-gateway
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2020-05-22 17:49:04,527 botocore.hooks [DEBUG] Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
2020-05-22 17:49:04,543 botocore.hooks [DEBUG] Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2020-05-22 17:49:04,543 botocore.hooks [DEBUG] Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
2020-05-22 17:49:04,543 botocore.hooks [DEBUG] Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2020-05-22 17:49:04,543 botocore.hooks [DEBUG] Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
2020-05-22 17:49:04,574 botocore.loaders [DEBUG] Loading JSON file: C:\Projects\Veeam Test\boto3_example\venv\lib\site-packages\boto3\data\s3\2006-03-01\resources-1.json
2020-05-22 17:49:04,574 botocore.loaders [DEBUG] Loading JSON file: C:\Projects\Veeam Test\boto3_example\venv\lib\site-packages\botocore\data\endpoints.json
2020-05-22 17:49:04,574 botocore.hooks [DEBUG] Event choose-service-name: calling handler <function handle_service_name_alias at 0x0000027DB613FF28>
2020-05-22 17:49:04,626 botocore.loaders [DEBUG] Loading JSON file: C:\Projects\Veeam Test\boto3_example\venv\lib\site-packages\botocore\data\s3\2006-03-01\service-2.json
2020-05-22 17:49:04,631 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x0000027DB6116B70>
2020-05-22 17:49:04,631 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x0000027DB636AB70>
2020-05-22 17:49:04,653 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x0000027DB6116950>
2020-05-22 17:49:04,654 botocore.endpoint [DEBUG] Setting s3 timeout as (60, 60)
2020-05-22 17:49:04,655 botocore.loaders [DEBUG] Loading JSON file: C:\Projects\Veeam Test\boto3_example\venv\lib\site-packages\botocore\data\_retry.json
2020-05-22 17:49:04,656 botocore.client [DEBUG] Registering retry handlers for service: s3
2020-05-22 17:49:04,656 boto3.resources.factory [DEBUG] Loading s3:s3
2020-05-22 17:49:04,657 boto3.resources.factory [DEBUG] Loading s3:Bucket
2020-05-22 17:49:04,657 boto3.resources.model [DEBUG] Renaming Bucket attribute name
2020-05-22 17:49:04,657 botocore.hooks [DEBUG] Event creating-resource-class.s3.Bucket: calling handler <function lazy_call.<locals>._handler at 0x0000027DB63B2510>
2020-05-22 17:49:04,658 s3transfer.utils [DEBUG] Acquiring 0
2020-05-22 17:49:04,658 s3transfer.tasks [DEBUG] UploadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x0000027DB6934780>}) about to wait for the following futures []
2020-05-22 17:49:04,658 s3transfer.tasks [DEBUG] UploadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x0000027DB6934780>}) done waiting for dependent futures
2020-05-22 17:49:04,659 s3transfer.tasks [DEBUG] Executing task UploadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x0000027DB6934780>}) with kwargs {'client': <botocore.client.S3 object at 0x0000027DB68AC588>, 'config': <boto3.s3.transfer.TransferConfig object at 0x0000027DB69340B8>, 'osutil': <s3transfer.utils.OSUtils object at 0x0000027DB6934080>, 'request_executor': <s3transfer.futures.BoundedExecutor object at 0x0000027DB69342B0>, 'transfer_future': <s3transfer.futures.TransferFuture object at 0x0000027DB6934780>}
2020-05-22 17:49:04,659 s3transfer.futures [DEBUG] Submitting task PutObjectTask(transfer_id=0, {'bucket': 'tarakhti-test', 'key': 'first/second/file.txt', 'extra_args': {}}) to executor <s3transfer.futures.BoundedExecutor object at 0x0000027DB69342B0> for transfer request: 0.
2020-05-22 17:49:04,659 s3transfer.utils [DEBUG] Acquiring 0
2020-05-22 17:49:04,659 s3transfer.tasks [DEBUG] PutObjectTask(transfer_id=0, {'bucket': 'tarakhti-test', 'key': 'first/second/file.txt', 'extra_args': {}}) about to wait for the following futures []
2020-05-22 17:49:04,659 s3transfer.tasks [DEBUG] PutObjectTask(transfer_id=0, {'bucket': 'tarakhti-test', 'key': 'first/second/file.txt', 'extra_args': {}}) done waiting for dependent futures
2020-05-22 17:49:04,659 s3transfer.tasks [DEBUG] Executing task PutObjectTask(transfer_id=0, {'bucket': 'tarakhti-test', 'key': 'first/second/file.txt', 'extra_args': {}}) with kwargs {'client': <botocore.client.S3 object at 0x0000027DB68AC588>, 'fileobj': <s3transfer.utils.ReadFileChunk object at 0x0000027DB6934B38>, 'bucket': 'tarakhti-test', 'key': 'first/second/file.txt', 'extra_args': {}}
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x0000027DB61796A8>
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x0000027DB6176AE8>
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x0000027DB6179F28>
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x0000027DB6176A60>
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x0000027DB68CF630>>
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x0000027DB68CFCC0>>
2020-05-22 17:49:04,660 botocore.hooks [DEBUG] Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x0000027DB61766A8>
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x0000027DB61769D8>
2020-05-22 17:49:04,662 s3transfer.utils [DEBUG] Releasing acquire 0/None
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x0000027DB6176D90>
2020-05-22 17:49:04,662 botocore.handlers [DEBUG] Adding expect 100 continue header to request.
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x0000027DB68CF630>>
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event before-call.s3.PutObject: calling handler <function inject_api_version_header_if_needed at 0x0000027DB617A0D0>
2020-05-22 17:49:04,662 botocore.endpoint [DEBUG] Making request for OperationModel(name=PutObject) with params: {'url_path': '/tarakhti-test/first/second/file.txt', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Boto3/1.13.14 Python/3.7.3 Windows/10 Botocore/1.16.14 Resource', 'Content-MD5': 'mfivAtEyk6BJ20L8BGF/oA==', 'Expect': '100-continue'}, 'body': <s3transfer.utils.ReadFileChunk object at 0x0000027DB6934B38>, 'url': 'https://s3.amazonaws.com/tarakhti-test/first/second/file.txt', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x0000027DB68AC6A0>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'tarakhti-test'}}}
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event request-created.s3.PutObject: calling handler <function signal_not_transferring at 0x0000027DB66DAAE8>
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x0000027DB68ACA20>>
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event choose-signer.s3.PutObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x0000027DB63F8908>>
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x0000027DB6176598>
2020-05-22 17:49:04,662 botocore.hooks [DEBUG] Event before-sign.s3.PutObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x0000027DB68CFD30>>
2020-05-22 17:49:04,662 botocore.utils [DEBUG] Defaulting to S3 virtual host style addressing with path style addressing fallback.
2020-05-22 17:49:04,662 botocore.utils [DEBUG] Checking for DNS compatible bucket for: https://s3.amazonaws.com/tarakhti-test/first/second/file.txt
2020-05-22 17:49:04,662 botocore.utils [DEBUG] URI updated to: https://tarakhti-test.s3.amazonaws.com/first/second/file.txt
2020-05-22 17:49:04,662 botocore.auth [DEBUG] Calculating signature using v4 auth.
2020-05-22 17:49:04,662 botocore.auth [DEBUG] CanonicalRequest:
PUT
/first/second/file.txt
content-md5:mfivAtEyk6BJ20L8BGF/oA==
host:tarakhti-test.s3.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20200522T144904Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2020-05-22 17:49:04,662 botocore.auth [DEBUG] StringToSign:
AWS4-HMAC-SHA256
20200522T144904Z
20200522/us-east-1/s3/aws4_request
589c3fb87a672582117df8c32afc0410a55501ced06b296b5973897af5e1804a
2020-05-22 17:49:04,663 botocore.auth [DEBUG] Signature:
0a50f542fac6771b98a0f3cb4a55e2cb7b7bcc676035389f9cb9a4db3051c140
2020-05-22 17:49:04,663 botocore.hooks [DEBUG] Event request-created.s3.PutObject: calling handler <function signal_transferring at 0x0000027DB66DAB70>
2020-05-22 17:49:04,663 botocore.endpoint [DEBUG] Sending http request: <AWSPreparedRequest stream_output=False, method=PUT, url=https://tarakhti-test.s3.amazonaws.com/first/second/file.txt, headers={'User-Agent': b'Boto3/1.13.14 Python/3.7.3 Windows/10 Botocore/1.16.14 Resource', 'Content-MD5': b'mfivAtEyk6BJ20L8BGF/oA==', 'Expect': b'100-continue', 'X-Amz-Date': b'20200522T144904Z', 'X-Amz-Content-SHA256': b'UNSIGNED-PAYLOAD', 'Authorization': b'AWS4-HMAC-SHA256 Credential=AKIATJFUOQC4GVET5CJ5/20200522/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=0a50f542fac6771b98a0f3cb4a55e2cb7b7bcc676035389f9cb9a4db3051c140', 'Content-Length': '21'}>
2020-05-22 17:49:04,663 urllib3.connectionpool [DEBUG] Starting new HTTPS connection (1): tarakhti-test.s3.amazonaws.com:443
2020-05-22 17:49:05,168 botocore.awsrequest [DEBUG] Waiting for 100 Continue response.
2020-05-22 17:49:05,324 botocore.awsrequest [DEBUG] 100 Continue response seen, now sending request body.
2020-05-22 17:49:05,496 urllib3.connectionpool [DEBUG] https://tarakhti-test.s3.amazonaws.com:443 "PUT /first/second/file.txt HTTP/1.1" 200 0
2020-05-22 17:49:05,496 botocore.parsers [DEBUG] Response headers: {'x-amz-id-2': 'f+mYBYi9cjNIiD583lJwA6/7vm1FT8sZFo5XTbR2zTZcNu6WIRbCp6+wZoenJndTPMjpJ7+wnXs=', 'x-amz-request-id': '8CF3961EB3937A02', 'Date': 'Fri, 22 May 2020 14:49:06 GMT', 'ETag': '"99f8af02d13293a049db42fc04617fa0"', 'Content-Length': '0', 'Server': 'AmazonS3'}
2020-05-22 17:49:05,496 botocore.parsers [DEBUG] Response body:
b''
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x0000027DB68CF978>
2020-05-22 17:49:05,496 botocore.retryhandler [DEBUG] No retry needed.
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x0000027DB68CF630>>
2020-05-22 17:49:05,496 s3transfer.utils [DEBUG] Releasing acquire 0/None
2020-05-22 17:49:05,496 boto3.resources.factory [DEBUG] Loading s3:Object
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event creating-resource-class.s3.Object: calling handler <function lazy_call.<locals>._handler at 0x0000027DB63B2598>
2020-05-22 17:49:05,496 boto3.resources.action [DEBUG] Calling s3:head_object with {'Bucket': 'tarakhti-test', 'Key': 'first/'}
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-parameter-build.s3.HeadObject: calling handler <function sse_md5 at 0x0000027DB6176AE8>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-parameter-build.s3.HeadObject: calling handler <function validate_bucket_name at 0x0000027DB6176A60>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-parameter-build.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x0000027DB68CF630>>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-parameter-build.s3.HeadObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x0000027DB68CFCC0>>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-parameter-build.s3.HeadObject: calling handler <function generate_idempotent_uuid at 0x0000027DB61766A8>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-call.s3.HeadObject: calling handler <function add_expect_header at 0x0000027DB6176D90>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-call.s3.HeadObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x0000027DB68CF630>>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event before-call.s3.HeadObject: calling handler <function inject_api_version_header_if_needed at 0x0000027DB617A0D0>
2020-05-22 17:49:05,496 botocore.endpoint [DEBUG] Making request for OperationModel(name=HeadObject) with params: {'url_path': '/tarakhti-test/first/', 'query_string': {}, 'method': 'HEAD', 'headers': {'User-Agent': 'Boto3/1.13.14 Python/3.7.3 Windows/10 Botocore/1.16.14 Resource'}, 'body': b'', 'url': 'https://s3.amazonaws.com/tarakhti-test/first/', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x0000027DB68AC6A0>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': 'tarakhti-test'}}}
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event request-created.s3.HeadObject: calling handler <function signal_not_transferring at 0x0000027DB66DAAE8>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event request-created.s3.HeadObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x0000027DB68ACA20>>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event choose-signer.s3.HeadObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x0000027DB63F8908>>
2020-05-22 17:49:05,496 botocore.hooks [DEBUG] Event choose-signer.s3.HeadObject: calling handler <function set_operation_specific_signer at 0x0000027DB6176598>
2020-05-22 17:49:05,512 botocore.hooks [DEBUG] Event before-sign.s3.HeadObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x0000027DB68CFD30>>
2020-05-22 17:49:05,512 botocore.utils [DEBUG] Checking for DNS compatible bucket for: https://s3.amazonaws.com/tarakhti-test/first/
2020-05-22 17:49:05,512 botocore.utils [DEBUG] URI updated to: https://tarakhti-test.s3.amazonaws.com/first/
2020-05-22 17:49:05,512 botocore.auth [DEBUG] Calculating signature using v4 auth.
2020-05-22 17:49:05,512 botocore.auth [DEBUG] CanonicalRequest:
HEAD
/first/
host:tarakhti-test.s3.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20200522T144905Z
host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2020-05-22 17:49:05,512 botocore.auth [DEBUG] StringToSign:
AWS4-HMAC-SHA256
20200522T144905Z
20200522/us-east-1/s3/aws4_request
9aecf738d76489a898ce323a23650c1ea2c1d5dc51a95ae364c4c613434770f7
2020-05-22 17:49:05,512 botocore.auth [DEBUG] Signature:
281122febb2c5d1164fb21175080d62ea2cae7d5a91dda2f6acf854e5f32c00b
2020-05-22 17:49:05,512 botocore.hooks [DEBUG] Event request-created.s3.HeadObject: calling handler <function signal_transferring at 0x0000027DB66DAB70>
2020-05-22 17:49:05,512 botocore.endpoint [DEBUG] Sending http request: <AWSPreparedRequest stream_output=False, method=HEAD, url=https://tarakhti-test.s3.amazonaws.com/first/, headers={'User-Agent': b'Boto3/1.13.14 Python/3.7.3 Windows/10 Botocore/1.16.14 Resource', 'X-Amz-Date': b'20200522T144905Z', 'X-Amz-Content-SHA256': b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': b'AWS4-HMAC-SHA256 Credential=AKIATJFUOQC4GVET5CJ5/20200522/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=281122febb2c5d1164fb21175080d62ea2cae7d5a91dda2f6acf854e5f32c00b'}>
2020-05-22 17:49:05,652 urllib3.connectionpool [DEBUG] https://tarakhti-test.s3.amazonaws.com:443 "HEAD /first/ HTTP/1.1" 404 0
2020-05-22 17:49:05,652 botocore.parsers [DEBUG] Response headers: {'x-amz-request-id': '2F8918091001D969', 'x-amz-id-2': 'VN5/HE64IJmBwjRTTAr7KDGHWzHbnLSH2OMJSRCmS5t2qWE4amQkdpYLRELyxYOGH/iuUMN98ug=', 'Content-Type': 'application/xml', 'Transfer-Encoding': 'chunked', 'Date': 'Fri, 22 May 2020 14:49:05 GMT', 'Server': 'AmazonS3'}
2020-05-22 17:49:05,652 botocore.parsers [DEBUG] Response body:
b''
2020-05-22 17:49:05,653 botocore.hooks [DEBUG] Event needs-retry.s3.HeadObject: calling handler <botocore.retryhandler.RetryHandler object at 0x0000027DB68CF978>
2020-05-22 17:49:05,653 botocore.retryhandler [DEBUG] No retry needed.
2020-05-22 17:49:05,653 botocore.hooks [DEBUG] Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x0000027DB68CF630>>
Traceback (most recent call last):
File "C:/Projects/Veeam Test/boto3_example/main.py", line 97, in <module>
main()
File "C:/Projects/Veeam Test/boto3_example/main.py", line 88, in main
print(first_dir.content_length)
File "C:ProjectsVeeam Testboto3_examplevenvlibsite-packagesboto3resourcesfactory.py", line 339, in property_loader
self.load()
File "C:ProjectsVeeam Testboto3_examplevenvlibsite-packagesboto3resourcesfactory.py", line 505, in do_action
response = action(self, *args, **kwargs)
File "C:ProjectsVeeam Testboto3_examplevenvlibsite-packagesboto3resourcesaction.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
File "C:ProjectsVeeam Testboto3_examplevenvlibsite-packagesbotocoreclient.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:ProjectsVeeam Testboto3_examplevenvlibsite-packagesbotocoreclient.py", line 635, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
Problem Statement − Use the boto3 library in Python to download an object from S3 to a given local path (or the default path), overwriting any existing file. For example, download test.zip from Bucket_1/testfolder in S3.
Approach/Algorithm to solve this problem
Step 1 − Import boto3 and botocore exceptions to handle exceptions.
Step 2 − From pathlib, import Path to check the filename.
Step 3 − s3path, localPath and overwrite_existing_file are the three parameters of the function download_object_from_s3.
Step 4 − Validate that s3path is passed in AWS format, as s3://bucket_name/key. By default, localPath = None and overwrite_existing_file = True. The user can pass these values as well, to download to a given local path.
Step 5 − Create an AWS session using boto3 library.
Step 6 − Create an AWS resource for S3.
Step 7 − Split the S3 path and perform operations to separate the root bucket name and the object path to download.
Step 8 − If overwrite_existing_file is set to False and the file already exists at the given local path, don't do any operation.
Step 9 − Else, download the object. If localPath is given, download there; else download to the default path.
Step 10 − Handle the exception based on response code to validate whether the file is downloaded or not.
Step 11 − Handle the generic exception if something went wrong while downloading the file.
Example
Use the following code to download a file from AWS S3 −
import boto3
from botocore.exceptions import ClientError
from pathlib import Path

def download_object_from_s3(s3path, localPath=None, overwrite_existing_file=True):
    if 's3://' not in s3path:
        print('Given path is not a valid s3 path.')
        raise Exception('Given path is not a valid s3 path.')
    session = boto3.session.Session()
    s3_resource = session.resource('s3')
    # Split "s3://bucket_name/key" into the bucket name, object path, and filename.
    s3_tokens = s3path.split('/')
    bucket_name = s3_tokens[2]
    object_path = ""
    filename = s3_tokens[len(s3_tokens) - 1]
    print('Filename: ' + filename)
    if len(s3_tokens) > 4:
        for tokn in range(3, len(s3_tokens) - 1):
            object_path += s3_tokens[tokn] + "/"
        object_path += filename
    else:
        object_path += filename
    print('object: ' + object_path)
    try:
        # Skip the download when overwriting is disabled and the file is already present.
        if not overwrite_existing_file and Path(filename).is_file():
            pass
        else:
            if localPath is None:
                s3_resource.meta.client.download_file(bucket_name, object_path, filename)
            else:
                s3_resource.meta.client.download_file(bucket_name, object_path, localPath + '/' + filename)
        print('Filename: ' + filename)
        return filename
    except ClientError as error:
        if error.response['Error']['Code'] == '404':
            print(s3path + " File not found: ")
            raise Exception(s3path + " File not found: ")
    except Exception as error:
        print("Unexpected error in download_object function of s3 helper: " + error.__str__())
        raise Exception("Unexpected error in download_object function of s3 helper: " + error.__str__())

#Download into default localpath
print(download_object_from_s3("s3://Bucket_1/testfolder/test.zip"))
#Download into given path
print(download_object_from_s3("s3://Bucket_1/testfolder/test.zip", "C://AWS"))
#File doesn't exist in S3
print(download_object_from_s3("s3://Bucket_1/testfolder/abc.zip"))
Output
#Download into default localpath
Filename: test.zip
object: testfolder/test.zip
Filename: test.zip
#Download into given path
Filename: test.zip
object: testfolder/test.zip
Filename: test.zip
#File doesn't exist in S3
Filename: abc.zip
object: testfolder/abc.zip
s3://Bucket_1/testfolder/abc.zip File not found:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
Note: The default download path is the directory where this function is written. If a local path is not provided, the file will be downloaded into that same directory.
For example, if this function is written in S3_class and this class is present at C://AWS/src/S3_class, then the file test.zip will be downloaded into C://AWS/src/test.zip.
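To make the destination independent of where the code runs, here is a small sketch (bucket and paths are the hypothetical ones from the example above) that passes an absolute local path explicitly:
import boto3
from pathlib import Path

s3 = boto3.client('s3')
# Hypothetical absolute destination; adjust to your environment.
target = Path('C:/AWS/src') / 'test.zip'
s3.download_file('Bucket_1', 'testfolder/test.zip', str(target))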
Using the AWS SDK for Python can be confusing. First of all, there seem to be two different SDKs (Boto and Boto3). Even if you choose one, each seems to have multiple ways to authenticate and connect to AWS services. Googling solutions can quickly become confusing, as you may find different variations of code examples. If you are just getting into AWS, this can be scary. In this post, I will explain the differences and give you code examples that work, using the example of downloading files from S3.
Boto is the older version of the Python AWS SDK; Boto3 is the newer version. The 3 in Boto3 doesn't mean it is only for Python 3; it works for Python 2.6 or 2.7 as well. All the code provided in this post works for both Python versions. As Boto3 was rewritten from the ground up, the code you write with Boto3 is different from Boto. You can read further about the changes made in Boto3 here.
Generally speaking, you should use Boto3 if you are writing new programs. It is the new and improved version of Boto. I personally find it more efficient and easier to use. You may still need to know Boto to maintain legacy code.
Before you start, you need to install boto and boto3. All the code works for both Python 2.7 and 3. You can use either one of them:
pip install boto
pip install boto3
(1) Downloading S3 Files With Boto3
Boto3 provides a super-easy way to configure credentials and access to AWS resources. To connect to S3, you can either create an S3 resource or an S3 client. A resource is an object-oriented interface to AWS that provides a higher-level abstraction, while a client is a low-level interface to AWS whose methods map close to 1:1 with service APIs.
1-1. Using Resource
You can create an S3 resource with boto3.resource('s3'). It will use the AWS credentials configured in your AWS CLI (see here). This is perfect when you want to use the code across different environments, as the credentials come from environment variables and you do not need to hardcode them. Once you have the resource, create the bucket object and use the download_file method. This page shows the different S3 methods that you can use with the resource.
import boto3
import botocore

def download_file_with_resource(bucket_name, key, local_path):
    s3 = boto3.resource('s3')
    s3.Bucket(bucket_name).download_file(key, local_path)
    print('Downloaded File with boto3 resource')

bucket_name = '<your bucket name>'
key = '<folder…/filename>'
local_path = '<e.g. ./log.txt>'
download_file_with_resource(bucket_name, key, local_path)
Here is a troubleshooting tip for the 404 error: An error occurred (404) when calling the HeadObject operation: Not Found.
First, make sure you have the correct bucket and key names. If you still get this error after triple-checking the bucket name and object key, make sure your key does not start with '/'. For example, key = '/2018-02-27/log.txt' should be key = '2018-02-27/log.txt' in the code above.
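A small defensive sketch along those lines: normalize the key before calling download_file, since a leading '/' becomes part of the key name and triggers the 404.
# Strip any accidental leading '/' from the object key.
key = '/2018-02-27/log.txt'
key = key.lstrip('/')  # '2018-02-27/log.txt'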
1-2. Using Client
With a client, you can specify the credentials. As opposed to Boto, you do not need to specify the region; Boto3 handles it for you. You can also configure multiple credentials in the AWS CLI and choose one of them to connect to a non-default S3 with the client. This documentation gives further details on configuring credentials. This page shows the list of methods you can use with the client. Note that the download_file method arguments are different from the ones used with the resource.
import boto3
import botocore

def download_file_with_client(access_key, secret_key, bucket_name, key, local_path):
    client = boto3.client(
        's3',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    client.download_file(bucket_name, key, local_path)
    print('Downloaded file with boto3 client')

access_key = '<Access Key>'
secret_key = '<Secret Key>'
bucket_name = '<your bucket name>'
key = '<folder…/filename>'
local_path = '<e.g. ./log.txt>'
download_file_with_client(access_key, secret_key, bucket_name, key, local_path)
(2) Downloading S3 Files With Boto
Boto requires more parameters compared to Boto3 and you need to know a little bit more about the package to make it work.
Here are some common errors and how to handle them.
Error 1:
ssl.CertificateError: hostname 'your.bucket.name.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'
You will get this error when your bucket name contains '.' and you do not specify OrdinaryCallingFormat() in connect_s3().
Error 2:
Cannot resolve boto.exception.S3ResponseError: S3ResponseError: 301 Moved Permanently
You will get this error if you do not specify the host with the region and you are not using one of the default US regions. Otherwise, Boto will use s3.amazonaws.com as the host name, which assumes your bucket was created in the default US region. Alternatively, you can use boto.s3.connect_to_region() to specify the region.
Here is a further reference on Python S3 with Boto.
2-1. Using boto.connect_s3() and default credentials
Let's start with using the default credentials configured with the AWS CLI. This is the equivalent of the first code example in Boto3. You can see how much simpler Boto3 is.
import boto
import boto.s3.connection

def download_data(region, bucket_name, key, local_path):
    conn = boto.connect_s3(
        host='s3-{}.amazonaws.com'.format(region),
        calling_format=boto.s3.connection.OrdinaryCallingFormat())
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key)
    key.get_contents_to_filename(local_path)
    print('Downloaded File {} to {}'.format(key, local_path))

region = '<Region e.g. ap-southeast-2>'
bucket_name = '<your bucket name>'
key = '<folder…/filename>'
local_path = '<e.g. ./log.txt>'
download_data(region, bucket_name, key, local_path)
2-2. Using boto.connect_s3() with custom credentials
Of course, you can also configure the credentials within the code.
import boto
import boto.s3.connection

def download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path):
    conn = boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host='s3-{}.amazonaws.com'.format(region),
        calling_format=boto.s3.connection.OrdinaryCallingFormat()
    )
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key)
    key.get_contents_to_filename(local_path)
    print('Downloaded File {} to {}'.format(key, local_path))

region = '<Region e.g. ap-southeast-2>'
access_key = '<Access Key>'
secret_key = '<Secret Key>'
bucket_name = '<your bucket name>'
key = '<folder…/filename>'
local_path = '<e.g. ./log.txt>'
download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path)
2-3. Using boto.s3.connect_to_region()
The main difference from connect_s3() is that this function takes the region as an argument, whereas with connect_s3() the region had to be embedded in the host URL. This is a slightly simpler way to specify the region.
import boto
import boto.s3.connection

def download_data_connect_to_region(region, access_key, secret_key, bucket_name, key, local_path):
    '''This will use the connect_to_region() function in boto'''
    conn = boto.s3.connect_to_region(
        region_name=region,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        calling_format=boto.s3.connection.OrdinaryCallingFormat()
    )
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key)
    key.get_contents_to_filename(local_path)
    print('Downloaded File {} to {}'.format(key, local_path))

region = '<Region e.g. ap-southeast-2>'
access_key = '<Access Key>'
secret_key = '<Secret Key>'
bucket_name = '<your bucket name>'
key = '<folder…/filename>'
local_path = '<e.g. ./log.txt>'
download_data_connect_to_region(region, access_key, secret_key, bucket_name, key, local_path)
Hope this clarifies a few things!
If you are interested in moving data to S3 and Redshift, check out this post: Data Engineering in S3 and Redshift with Python.
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
HeadObject returns 404 when specifying a version ID in an unversioned bucket
Expected Behavior
It should return 400:
aws s3api head-object --bucket my-unversioned-bucket --key what --version-id nope
An error occurred (400) when calling the HeadObject operation: Bad Request
How are you starting LocalStack?
With a docker run command
Steps To Reproduce
How are you starting localstack (e.g., bin/localstack command, arguments, or docker-compose.yml)
docker run --rm -d -p 4566:4566 -e 'DEFAULT_REGION=eu-west-1' -e 'SERVICES=s3' --name localstack localstack/localstack:0.12.17.5
Client commands (e.g., AWS SDK code snippet, or sequence of "awslocal" commands)
awslocal s3api create-bucket --bucket noversion
awslocal s3api head-object --bucket noversion --version-id what --key test
Returns An error occurred (404) when calling the HeadObject operation: Not Found
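For reference, a boto3 sketch of the same repro (assuming LocalStack on the default port with dummy credentials):
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:4566',
    region_name='eu-west-1',
    aws_access_key_id='test',
    aws_secret_access_key='test',
)
s3.create_bucket(Bucket='noversion',
                 CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'})
# LocalStack 0.12.17.5 raises ClientError (404) here; AWS returns 400 Bad Request.
s3.head_object(Bucket='noversion', Key='test', VersionId='what')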
Environment
- OS: Arch
- LocalStack: 0.12.17.5
Anything else?
No response