Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Making HTTP Requests With Python
The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.
Throughout this article, you’ll see some of the most useful features that requests has to offer as well as how to customize and optimize those features for different situations you may come across. You’ll also learn how to use requests efficiently and how to prevent requests to external services from slowing down your application.
In this tutorial, you’ll learn how to:
- Make requests using the most common HTTP methods
- Customize your requests’ headers and data, using the query string and message body
- Inspect data from your requests and responses
- Make authenticated requests
- Configure your requests to help prevent your application from backing up or slowing down
Though I’ve tried to include as much information as you need to understand the features and examples included in this article, I do assume a very basic general knowledge of HTTP. That said, you may still be able to follow along fine without it.
Now that that is out of the way, let’s dive in and see how you can use requests in your application!
Getting Started With requests
Let’s begin by installing the requests library. To do so, run the following command:
$ pip install requests
If you prefer to use Pipenv for managing Python packages, you can run the following:
$ pipenv install requests
Once requests is installed, you can use it in your application. Importing requests looks like this:
import requests
Now that you’re all set up, it’s time to begin your journey through requests. Your first goal will be learning how to make a GET request.
The GET Request
HTTP methods, such as GET and POST, determine which action you’re trying to perform when making an HTTP request. Besides GET and POST, there are several other common methods that you’ll use later in this tutorial.
One of the most common HTTP methods is GET. The GET method indicates that you’re trying to get or retrieve data from a specified resource. To make a GET request, invoke requests.get().
To test this out, you can make a GET request to GitHub’s Root REST API by calling get() with the following URL:
>>> requests.get('https://api.github.com')
<Response [200]>
Congratulations! You’ve made your first request. Let’s dive a little deeper into the response of that request.
The Response
A Response is a powerful object for inspecting the results of the request. Let’s make that same request again, but this time store the return value in a variable so that you can get a closer look at its attributes and behaviors:
>>> response = requests.get('https://api.github.com')
In this example, you’ve captured the return value of get(), which is an instance of Response, and stored it in a variable called response. You can now use response to see a lot of information about the results of your GET request.
Status Codes
The first bit of information that you can gather from Response is the status code. A status code informs you of the status of the request.
For example, a 200 OK status means that your request was successful, whereas a 404 NOT FOUND status means that the resource you were looking for was not found. There are many other possible status codes as well to give you specific insights into what happened with your request.
By accessing .status_code, you can see the status code that the server returned:
>>> response.status_code
200
.status_code returned a 200, which means your request was successful and the server responded with the data you were requesting.
Sometimes, you might want to use this information to make decisions in your code:
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')
With this logic, if the server returns a 200 status code, your program will print Success!. If the result is a 404, your program will print Not Found.
requests goes one step further in simplifying this process for you. If you use a Response instance in a conditional expression, it will evaluate to True unless the status code indicates a client or server error (400 through 599), in which case it evaluates to False.
Therefore, you can simplify the last example by rewriting the if statement:
if response:
    print('Success!')
else:
    print('An error has occurred.')
Keep in mind that this shorthand does not verify that the status code is equal to 200. The reason for this is that other status codes within the 200 to 400 range, such as 204 NO CONTENT and 304 NOT MODIFIED, are also considered successful in the sense that they provide some workable response.
For example, a 204 tells you that the response was successful, but there’s no content to return in the message body.
So, make sure you use this convenient shorthand only if you want to know if the request was generally successful and then, if necessary, handle the response appropriately based on the status code.
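Here’s a small offline sketch of that rule. It builds bare Response objects by hand and sets only .status_code, which is enough because truthiness depends on nothing else. In real code, requests.get() creates the Response for you, so treat this construction purely as a way to experiment without a network.

```python
import requests


def truthiness(status_code):
    """Build an empty Response offline (no network) and report bool(response)."""
    response = requests.Response()
    response.status_code = status_code
    return bool(response)


results = {code: truthiness(code) for code in (200, 204, 304, 404, 500)}
print(results)  # {200: True, 204: True, 304: True, 404: False, 500: False}
```

Note that 304 counts as truthy even though it carries no body, which is exactly why the shorthand is only a first check.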
Let’s say you don’t want to check the response’s status code in an if statement. Instead, you want to raise an exception if the request was unsuccessful. You can do this using .raise_for_status():
import requests
from requests.exceptions import HTTPError

for url in ['https://api.github.com', 'https://api.github.com/invalid']:
    try:
        response = requests.get(url)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # Python 3.6+
    except Exception as err:
        print(f'Other error occurred: {err}')  # Python 3.6+
    else:
        print('Success!')
If you invoke .raise_for_status(), an HTTPError will be raised for client error (4xx) and server error (5xx) status codes. If the status code indicates a successful request, the program will proceed without that exception being raised.
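The sketch below, again built on hand-made Response objects so it runs offline, checks which status codes make .raise_for_status() raise. The URL is a placeholder that only ends up in the error message. It also shows that the HTTPError keeps a reference to the failing response, which is handy when you need the status code inside an except block.

```python
import requests
from requests.exceptions import HTTPError


def describe(status_code):
    """Build a Response offline and report what raise_for_status() does with it."""
    response = requests.Response()
    response.status_code = status_code
    response.url = 'https://api.github.com/invalid'  # only used in the error message
    try:
        response.raise_for_status()
    except HTTPError as err:
        # The exception carries the Response that triggered it.
        return f'HTTPError for {err.response.status_code}'
    return 'no exception'


for code in (200, 301, 404, 503):
    print(code, describe(code))
# 200 no exception
# 301 no exception
# 404 HTTPError for 404
# 503 HTTPError for 503
```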
Now, you know a lot about how to deal with the status code of the response you got back from the server. However, when you make a GET request, you rarely only care about the status code of the response. Usually, you want to see more. Next, you’ll see how to view the actual data that the server sent back in the body of the response.
Content
The response of a GET request often has some valuable information, known as a payload, in the message body. Using the attributes and methods of Response, you can view the payload in a variety of different formats.
To see the response’s content in bytes, you use .content:
>>> response = requests.get('https://api.github.com')
>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. response will do that for you when you access .text:
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
Because the decoding of bytes to a str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
>>> response.encoding = 'utf-8' # Optional: requests infers this internally
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
>>> response.json()
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.github.com/user/keys', 'notifications_url': 'https://api.github.com/notifications', 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}', 'organization_url': 'https://api.github.com/orgs/{org}', 'public_gists_url': 'https://api.github.com/gists/public', 'rate_limit_url': 'https://api.github.com/rate_limit', 'repository_url': 'https://api.github.com/repos/{owner}/{repo}', 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}', 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}', 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}', 'starred_gists_url': 'https://api.github.com/gists/starred', 'team_url': 'https://api.github.com/teams', 'user_url': 'https://api.github.com/users/{user}', 'user_organizations_url': 'https://api.github.com/user/orgs', 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}', 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
The type of the return value of .json() depends on the JSON document in the body. Here, the payload is a JSON object, so .json() returns a dictionary, and you can access values in the object by key.
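Here’s a self-contained sketch that ties .content, .encoding, .text, and .json() together. Rather than calling GitHub, it assumes you can start a throwaway HTTP server on 127.0.0.1 (any free port); JSONHandler and its one-key JSON document are made up for illustration. The non-ASCII é makes the difference between the raw bytes and the decoded text easy to spot.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class JSONHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint returning a small UTF-8 JSON document."""

    def do_GET(self):
        # ensure_ascii=False keeps 'é' as raw UTF-8 bytes instead of a \u escape.
        body = json.dumps({'greeting': 'héllo'}, ensure_ascii=False).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = ThreadingHTTPServer(('127.0.0.1', 0), JSONHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    response = requests.get(f'http://127.0.0.1:{server.server_port}/', timeout=5)
    raw = response.content        # bytes, exactly as received
    encoding = response.encoding  # taken from the charset in Content-Type
    text = response.text          # raw decoded with response.encoding
    data = response.json()        # the text parsed as JSON
finally:
    server.shutdown()
    server.server_close()

print(raw)               # b'{"greeting": "h\xc3\xa9llo"}'
print(encoding)          # utf-8
print(text)              # {"greeting": "héllo"}
print(data['greeting'])  # héllo
```

Because this server declares charset=utf-8, requests doesn’t need to guess. When a server leaves the charset out, requests falls back on defaults or on guessing from the bytes, so setting .encoding yourself is the safer choice whenever you know the right value.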
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
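As a small preview of working with headers: response.headers is a requests.structures.CaseInsensitiveDict, so header names can be looked up in any letter case. The sketch below builds one by hand with made-up header values so it runs offline.

```python
import requests
from requests.structures import CaseInsensitiveDict

# Every Response starts out with an (empty) CaseInsensitiveDict for its headers.
empty_headers = requests.Response().headers

# Hypothetical header values standing in for what a server might send back.
headers = CaseInsensitiveDict({
    'Content-Type': 'application/json; charset=utf-8',
    'X-RateLimit-Remaining': '59',
})

print(type(empty_headers).__name__)          # CaseInsensitiveDict
print(headers['content-type'])               # application/json; charset=utf-8
print(headers.get('x-ratelimit-remaining'))  # 59
print(headers.get('ETag'))                   # None, because the header is absent
```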
Query String Parameters
One common way to customize a GET request is to pass values through query string parameters in the URL. To do this using get(), you pass data to params. For example, you can use GitHub’s Search API to look for the requests library:
import requests

# Search GitHub's repositories for requests
response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests language:python'},
)

# Inspect some attributes of the `requests` repository
json_response = response.json()
repository = json_response['items'][0]
print(f'Repository name: {repository["name"]}')  # Python 3.6+
print(f'Repository description: {repository["description"]}')  # Python 3.6+
By passing the dictionary {'q': 'requests language:python'} to the params parameter of .get(), you are able to modify the results that come back from the Search API. requests URL-encodes the dictionary for you, so the space is sent as + and the colon as %3A; a literal + typed into a dictionary value would be escaped as %2B instead.
You can pass params to get() in the form of a dictionary, as you have just done, or as a list of tuples:
>>> requests.get(
...     'https://api.github.com/search/repositories',
...     params=[('q', 'requests language:python')],
... )
<Response [200]>
You can even pass the values as bytes. requests sends bytes as-is, so they must already be URL-encoded (here, + stands for a space):
>>> requests.get(
...     'https://api.github.com/search/repositories',
...     params=b'q=requests+language:python',
... )
<Response [200]>
Query strings are useful for parameterizing GET requests. You can also customize your requests by adding or modifying the headers you send.
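To see exactly what requests builds from params and headers without sending anything, you can prepare a request yourself. In this sketch, the helper name prepared and the User-Agent string are made up for illustration; it shows how dictionary, tuple, and bytes parameters end up in the URL and how custom headers are attached.

```python
import requests

SEARCH_URL = 'https://api.github.com/search/repositories'


def prepared(params, headers=None):
    """Return a PreparedRequest built from the arguments; nothing is sent."""
    return requests.Request('GET', SEARCH_URL, params=params, headers=headers).prepare()


# Dicts and lists of tuples are URL-encoded: a space becomes '+', ':' becomes '%3A'.
from_dict = prepared({'q': 'requests language:python'}).url
from_tuples = prepared([('q', 'requests language:python')]).url
# Bytes are used as-is, so they must already be encoded ('+' here means a space).
from_bytes = prepared(b'q=requests+language:python').url
# A literal '+' in a dict value is escaped to '%2B', which is rarely what you want.
literal_plus = prepared({'q': 'requests+language:python'}).url

# Custom request headers travel on the prepared request as well.
with_headers = prepared(
    {'q': 'requests language:python'},
    headers={'Accept': 'application/vnd.github+json', 'User-Agent': 'example-script/0.1'},
)

print(from_dict)     # https://api.github.com/search/repositories?q=requests+language%3Apython
print(from_tuples)   # same as from_dict
print(from_bytes)    # https://api.github.com/search/repositories?q=requests+language:python
print(literal_plus)  # https://api.github.com/search/repositories?q=requests%2Blanguage%3Apython
print(with_headers.headers['Accept'])  # application/vnd.github+json
```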
Other HTTP Methods
Aside from GET, other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS. requests provides a function, with a similar signature to get(), for each of these HTTP methods:
>>> requests.post('https://httpbin.org/post', data={'key':'value'})
>>> requests.put('https://httpbin.org/put', data={'key':'value'})
>>> requests.delete('https://httpbin.org/delete')
>>> requests.head('https://httpbin.org/get')
>>> requests.patch('https://httpbin.org/patch', data={'key':'value'})
>>> requests.options('https://httpbin.org/get')
Each function call makes a request to the httpbin service using the corresponding HTTP method. For each method, you can inspect the response in the same way you did before:
>>> response = requests.head('https://httpbin.org/get')
>>> response.headers['Content-Type']
'application/json'
>>> response = requests.delete('https://httpbin.org/delete')
>>> json_response = response.json()
>>> json_response['args']
{}
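To try every one of these functions without depending on httpbin.org, you can point them at a tiny local server. The sketch below assumes you can bind to 127.0.0.1 on a free port; EchoMethodHandler is a made-up stand-in that replies with the HTTP method it received. The HEAD response comes back with an empty body, since HEAD asks for headers only.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class EchoMethodHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin: replies with the method it received."""

    def _reply(self):
        # Read and discard any request body before answering.
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        body = self.command.encode('ascii')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        if self.command != 'HEAD':  # a HEAD response carries headers only
            self.wfile.write(body)

    # Route every method used below to the same handler.
    do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = do_HEAD = do_OPTIONS = _reply

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(('127.0.0.1', 0), EchoMethodHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

bodies = {}
try:
    for name in ('get', 'post', 'put', 'patch', 'delete', 'head', 'options'):
        send = getattr(requests, name)  # requests.get, requests.post, ...
        extra = {'data': {'key': 'value'}} if name in ('post', 'put', 'patch') else {}
        response = send(url, timeout=5, **extra)
        bodies[name.upper()] = response.text
finally:
    server.shutdown()
    server.server_close()

print(bodies)
# {'GET': 'GET', 'POST': 'POST', 'PUT': 'PUT', 'PATCH': 'PATCH',
#  'DELETE': 'DELETE', 'HEAD': '', 'OPTIONS': 'OPTIONS'}
```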
Headers, response bodies, status codes, and more are returned in the Response for each method. Next, you’ll take a closer look at the POST, PUT, and PATCH methods and learn how they differ from the other request types.
The Message Body
According to the HTTP specification, POST, PUT, and the less common PATCH requests pass their data through the message body rather than through parameters in the query string. Using requests, you’ll pass the payload to the corresponding function’s data parameter.
data takes a dictionary, a list of tuples, bytes, or a file-like object. You’ll want to adapt the data you send in the body of your request to the specific needs of the service you’re interacting with.
For example, if your request’s content type is application/x-www-form-urlencoded, you can send the form data as a dictionary:
>>> requests.post('https://httpbin.org/post', data={'key':'value'})
<Response [200]>
You can also send that same data as a list of tuples:
>>> requests.post('https://httpbin.org/post', data=[('key', 'value')])
<Response [200]>
If, however, you need to send JSON data, you can use the json parameter. When you pass JSON data via json, requests will serialize your data and add the correct Content-Type header for you.
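You can compare what each option produces by preparing requests without sending them. In this sketch, the helper name preview is made up; the URL is the httpbin endpoint used in this section, but it is never contacted.

```python
import requests


def preview(**kwargs):
    """Prepare (but don't send) a POST and return its Content-Type and body."""
    prepared = requests.Request('POST', 'https://httpbin.org/post', **kwargs).prepare()
    return prepared.headers.get('Content-Type'), prepared.body


form_dict = preview(data={'key': 'value'})
form_pairs = preview(data=[('key', 'value'), ('key', 'other')])  # tuples allow repeated keys
as_json = preview(json={'key': 'value'})

print(form_dict)   # ('application/x-www-form-urlencoded', 'key=value')
print(form_pairs)  # ('application/x-www-form-urlencoded', 'key=value&key=other')
print(as_json)     # ('application/json', b'{"key": "value"}')
```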
httpbin.org is a great resource created by the author of requests, Kenneth Reitz. It’s a service that accepts test requests and responds with data about the requests. For instance, you can use it to inspect a basic POST request:
>>> response = requests.post('https://httpbin.org/post', json={'key':'value'})
>>> json_response = response.json()
>>> json_response['data']
'{"key": "value"}'
>>> json_response['headers']['Content-Type']
'application/json'
You can see from the response that the server received your request data and headers as you sent them. requests also provides this information to you in the form of a PreparedRequest.
Inspecting Your Request
When you make a request, the requests library prepares the request before actually sending it to the destination server. Request preparation includes things like validating headers and serializing JSON content.
You can view the PreparedRequest by accessing .request:
>>> response = requests.post('https://httpbin.org/post', json={'key':'value'})
>>> response.request.headers['Content-Type']
'application/json'
>>> response.request.url
'https://httpbin.org/post'
>>> response.request.body
b'{"key": "value"}'
Inspecting the PreparedRequest gives you access to all kinds of information about the request being made such as payload, URL, headers, authentication, and more.
So far, you’ve made a lot of different kinds of requests, but they’ve all had one thing in common: they’re unauthenticated requests to public APIs. Many services you may come across will want you to authenticate in some way.
Authentication
Authentication helps a service understand who you are. Typically, you provide your credentials to a server by passing data through the Authorization header or a custom header defined by the service. All the request functions you’ve seen to this point provide a parameter called auth, which allows you to pass your credentials.
One example of an API that requires authentication is GitHub’s Authenticated User API. This endpoint provides information about the authenticated user’s profile. To make a request to the Authenticated User API, you can pass your GitHub username and a personal access token in a tuple to get(). GitHub no longer accepts account passwords for API requests, so enter the token when getpass() prompts for a password:
>>> from getpass import getpass
>>> requests.get('https://api.github.com/user', auth=('username', getpass()))
<Response [200]>
The request succeeded if the credentials you passed in the tuple to auth are valid. If you try to make this request with no credentials, you’ll see that the status code is 401 Unauthorized:
>>> requests.get('https://api.github.com/user')
<Response [401]>
When you pass your username and password in a tuple to the auth parameter, requests applies the credentials using HTTP’s Basic access authentication scheme under the hood.
Therefore, you could make the same request by passing explicit Basic authentication credentials using HTTPBasicAuth:
>>> from requests.auth import HTTPBasicAuth
>>> from getpass import getpass
>>> requests.get(
... 'https://api.github.com/user',
... auth=HTTPBasicAuth('username', getpass())
... )
<Response [200]>
Though you don’t need to be explicit for Basic authentication, you may want to authenticate using another method. requests provides other methods of authentication out of the box such as HTTPDigestAuth and HTTPProxyAuth.
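If you’re curious what Basic authentication actually puts on the wire, you can prepare a request without sending it and decode the Authorization header. This sketch uses a placeholder username and token, and it also confirms that the tuple shorthand and HTTPBasicAuth produce the same header.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

URL = 'https://api.github.com/user'
CREDENTIALS = ('username', 'not-a-real-token')  # placeholders, never real secrets

# Prepare (but don't send) one request per style of passing credentials.
from_tuple = requests.Request('GET', URL, auth=CREDENTIALS).prepare()
from_class = requests.Request('GET', URL, auth=HTTPBasicAuth(*CREDENTIALS)).prepare()

header = from_tuple.headers['Authorization']
scheme, encoded = header.split(' ', 1)
decoded = base64.b64decode(encoded).decode('latin1')

print(scheme)                                         # Basic
print(decoded)                                        # username:not-a-real-token
print(header == from_class.headers['Authorization'])  # True
```

Since the credentials are only base64-encoded, not encrypted, Basic authentication is only safe over HTTPS.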
You can even supply your own authentication mechanism. To do so, you must first create a subclass of AuthBase. Then, you implement __call__():
import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Implements a custom authentication scheme."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        """Attach an API token to a custom auth header."""
        r.headers['X-TokenAuth'] = f'{self.token}'  # Python 3.6+
        return r


requests.get('https://httpbin.org/get', auth=TokenAuth('12345abcde-token'))
Here, your custom TokenAuth mechanism receives a token, then includes that token in the X-TokenAuth header of your request.
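A common variation on this pattern sends the token in the standard Authorization header using the Bearer scheme instead of a custom header. The BearerAuth class below is a hypothetical helper written for illustration (requests does not ship one under that name); preparing the request without sending it lets you check the header it adds.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Hypothetical helper: sends a token as 'Authorization: Bearer <token>'."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the PreparedRequest; modify its headers and hand it back.
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r


# Prepare (but don't send) a request to see the header the auth object adds.
prepared = requests.Request(
    'GET', 'https://httpbin.org/get', auth=BearerAuth('12345abcde-token')
).prepare()
print(prepared.headers['Authorization'])  # Bearer 12345abcde-token
```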
Bad authentication mechanisms can lead to security vulnerabilities, so unless a service requires a custom authentication mechanism for some reason, you’ll always want to use a tried-and-true auth scheme like Basic or OAuth.
While you’re thinking about security, let’s consider dealing with SSL Certificates using requests.
SSL Certificate Verification
Any time the data you are trying to send or receive is sensitive, security is important. The way that you communicate with secure sites over HTTPS is by establishing an encrypted connection using SSL/TLS, which means that verifying the target server’s SSL Certificate is critical.
The good news is that requests does this for you by default. However, there are some cases where you might want to change this behavior.
If you want to disable SSL Certificate verification, you pass False to the verify parameter of the request function:
>>> requests.get('https://api.github.com', verify=False)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
<Response [200]>
requests even warns you when you’re making an insecure request to help you keep your data safe!
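Besides True and False, verify also accepts a path to a CA bundle file, which is useful when a server’s certificate is signed by a private certificate authority. This sketch only sets things up and sends nothing: it looks up the bundle that requests uses by default (provided by the certifi package and exposed as requests.certs.where()) and assigns it to a session. In practice, you’d substitute the path to your own bundle, which is an assumption you’d have to supply.

```python
import os

import requests
from requests.certs import where

# Path of the CA bundle requests verifies against by default (shipped by certifi).
default_bundle = where()
bundle_exists = os.path.isfile(default_bundle)

with requests.Session() as session:
    # verify accepts True, False, or a path to a CA bundle file. Setting it on a
    # session applies it to every request that session makes; replace this path
    # with your own bundle (for example, a private company CA) as needed.
    session.verify = default_bundle
    configured = session.verify

print(bundle_exists)                 # True
print(configured == default_bundle)  # True
```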
Performance
When using requests, especially in a production application environment, it’s important to consider performance implications. Features like timeout control, sessions, and retry limits can help you keep your application running smoothly.
Timeouts
When you make an inline request to an external service, your system will need to wait for the response before moving on. If your application waits too long for that response, requests to your service could back up, your user experience could suffer, or your background jobs could hang.
By default, requests will wait indefinitely for the response, so you should almost always specify a timeout duration to prevent these things from happening. To set the request’s timeout, use the timeout parameter. timeout can be an integer or float representing the number of seconds to wait for a response before timing out:
>>> requests.get('https://api.github.com', timeout=1)
<Response [200]>
>>> requests.get('https://api.github.com', timeout=3.05)
<Response [200]>
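In the first request, the request will time out after 1 second. In the second request, the request will time out after 3.05 seconds. Keep in mind that timeout is not a limit on the total time the response takes to download: requests raises a Timeout if it can’t connect in time or if the server sends no bytes for that many seconds.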
You can also pass a tuple to timeout with the first element being a connect timeout (the time it allows for the client to establish a connection to the server), and the second being a read timeout (the time it will wait on a response once your client has established a connection):
>>> requests.get('https://api.github.com', timeout=(2, 5))
<Response [200]>
If the request establishes a connection within 2 seconds and receives data within 5 seconds of the connection being established, then the response will be returned as it was before. If the request times out, then the function will raise a Timeout exception:
import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://api.github.com', timeout=1)
except Timeout:
    print('The request timed out')
else:
    print('The request did not time out')
Your program can catch the Timeout exception and respond accordingly.
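To watch a read timeout happen on demand, you can aim requests at a deliberately slow local server. This sketch assumes you can bind to 127.0.0.1 on a free port; SlowHandler is a made-up service that sleeps for one second before answering, while the client only waits 0.2 seconds for the response. It uses the (connect, read) tuple form and catches ReadTimeout, which is a subclass of Timeout.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests
from requests.exceptions import ReadTimeout, Timeout


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical slow service that waits one second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'too late')
        except OSError:
            pass  # the client has probably given up and closed the connection

    def log_message(self, *args):
        pass  # silence per-request logging


server = ThreadingHTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

try:
    # Allow 2 seconds to connect, but only 0.2 seconds to wait for the response.
    requests.get(url, timeout=(2, 0.2))
    outcome = 'no timeout'
except ReadTimeout:
    outcome = 'read timeout'
finally:
    server.shutdown()
    server.server_close()  # may wait for the sleeping handler to finish

print(outcome)                           # read timeout
print(issubclass(ReadTimeout, Timeout))  # True, so `except Timeout` catches it too
```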
The Session Object
Until now, you’ve been dealing with high-level requests APIs such as get() and post(). These functions are abstractions of what’s going on when you make your requests. They hide implementation details such as how connections are managed so that you don’t have to worry about them.
Underneath those abstractions is a class called Session. If you need to fine-tune your control over how requests are being made or improve the performance of your requests, you may need to use a Session instance directly.
Sessions are used to persist parameters across requests. For example, if you want to use the same authentication across multiple requests, you could use a session:
import requests
from getpass import getpass

# By using a context manager, you can ensure the resources used by
# the session will be released after use
with requests.Session() as session:
    session.auth = ('username', getpass())

    # Instead of requests.get(), you'll use session.get()
    response = session.get('https://api.github.com/user')

# You can inspect the response just like you did before
print(response.headers)
print(response.json())
Each time you make a request with session, once it has been initialized with authentication credentials, the credentials will be persisted.
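You can watch that persistence happen without sending anything by letting the session prepare requests for you. In this sketch, the credentials and the Accept value are placeholders. prepare_request() merges the session’s auth and headers into each request, and a header passed on an individual request overrides the session’s value.

```python
import requests

# Placeholder credentials and header values; nothing is sent in this sketch.
with requests.Session() as session:
    session.auth = ('username', 'not-a-real-token')
    session.headers.update({'Accept': 'application/vnd.github+json'})

    # prepare_request() merges the session's settings into each request.
    first = session.prepare_request(
        requests.Request('GET', 'https://api.github.com/user')
    )
    second = session.prepare_request(
        requests.Request(
            'GET',
            'https://api.github.com/user/repos',
            headers={'Accept': 'application/json'},  # request-level value wins
        )
    )

print(first.headers['Accept'])   # application/vnd.github+json
print(second.headers['Accept'])  # application/json
print('Authorization' in first.headers and 'Authorization' in second.headers)  # True
```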
The primary performance optimization of sessions comes in the form of persistent connections. When your app makes a connection to a server using a Session, it keeps that connection around in a connection pool. When your app wants to connect to the same server again, it will reuse a connection from the pool rather than establishing a new one.
Max Retries
When a request fails, you may want your application to retry the same request. However, requests will not do this for you by default. To apply this functionality, you need to configure a Transport Adapter.
Transport Adapters let you define a set of configurations per service you’re interacting with. For example, let’s say you want all requests to https://api.github.com to retry three times before finally raising a ConnectionError. You would build a Transport Adapter, set its max_retries parameter, and mount it to an existing Session:
import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError

github_adapter = HTTPAdapter(max_retries=3)

session = requests.Session()

# Use `github_adapter` for all requests to endpoints that start with this URL
session.mount('https://api.github.com', github_adapter)

try:
    session.get('https://api.github.com')
except ConnectionError as ce:
    print(ce)
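When you mount the HTTPAdapter, github_adapter, to session, session will adhere to its configuration for each request to https://api.github.com. Note that an integer max_retries applies only to failed DNS lookups, socket connections, and connection timeouts; it never retries a request whose data has already reached the server, and it doesn’t retry based on the response’s status code.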
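If you also want retries on certain error responses, or a pause between attempts, HTTPAdapter accepts a urllib3 Retry object in place of an integer. The specific numbers below are illustrative assumptions rather than recommendations. By default, retries triggered by status_forcelist apply only to idempotent methods such as GET and PUT, not to POST. The sketch sends no requests; it just checks which adapter each URL would use.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_policy = Retry(
    total=3,                                # at most three retries overall
    backoff_factor=0.5,                     # sleep between attempts grows exponentially
    status_forcelist=[500, 502, 503, 504],  # also retry on these server errors
)
github_adapter = HTTPAdapter(max_retries=retry_policy)

session = requests.Session()
session.mount('https://api.github.com', github_adapter)

# get_adapter() shows which adapter a URL will use; no request is sent here.
chosen = session.get_adapter('https://api.github.com/repos/psf/requests')
other = session.get_adapter('https://example.com/')
session.close()

print(chosen is github_adapter)  # True
print(chosen.max_retries.total)  # 3
print(other is github_adapter)   # False, other hosts keep the default adapter
```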
Timeouts, Transport Adapters, and sessions help keep your code efficient and your application resilient.
Conclusion
You’ve come a long way in learning about Python’s powerful requests library.
You’re now able to:
- Make requests using a variety of different HTTP methods such as GET, POST, and PUT
- Customize your requests by modifying headers, authentication, query strings, and message bodies
- Inspect the data you send to the server and the data the server sends back to you
- Work with SSL Certificate verification
- Use requests effectively using max_retries, timeout, Sessions, and Transport Adapters
Because you learned how to use requests, you’re equipped to explore the wide world of web services and build awesome applications using the fascinating data they provide.
Contents
- HOWTO Fetch Internet Resources Using The urllib Package
  - Introduction
  - Fetching URLs
    - Data
    - Headers
  - Handling Exceptions
    - URLError
    - HTTPError
      - Error Codes
  - Wrapping it Up
    - Number 1
    - Number 2
  - info and geturl
  - Openers and Handlers
  - Basic Authentication
  - Proxies
  - Sockets and Layers
  - Footnotes
HOWTO Fetch Internet Resources Using The urllib Package
There is a French translation of an earlier revision of this HOWTO, available at urllib2 — Le Manuel manquant.
Introduction
You may also find the following article on fetching web resources with Python useful:
- A tutorial on Basic Authentication, with examples in Python.
urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function. This is capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations — like basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers.
urllib.request supports fetching URLs for many “URL schemes” (identified by the string before the “:” in the URL; for example, “ftp” is the URL scheme of “ftp://python.org/”) using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most common case, HTTP.
For straightforward situations urlopen is very easy to use. But as soon as you encounter errors or non-trivial cases when opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol. The most comprehensive and authoritative reference to HTTP is RFC 2616. This is a technical document and not intended to be easy to read. This HOWTO aims to illustrate using urllib, with enough detail about HTTP to help you through. It is not intended to replace the urllib.request docs, but is supplementary to them.
Fetching URLs
The simplest way to use urllib.request is as follows:
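Here is a minimal sketch of that simplest case. To keep it runnable without network access, it opens a data: URL, which is handled by urllib.request's built-in DataHandler (available since Python 3.4). For a real page you would pass an http:// URL instead. The payload text is made up.

```python
import urllib.request

# A data: URL carries its payload inline, so this runs offline.
# For a real fetch, pass something like "http://python.org/" instead.
url = "data:text/plain,Hello%2C%20urllib"

# urlopen returns a file-like response; the with-block closes it for us.
with urllib.request.urlopen(url) as response:
    html = response.read()  # raw bytes, not str

print(html)  # b'Hello, urllib'
```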
If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the shutil.copyfileobj() and tempfile.NamedTemporaryFile() functions:
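The following sketch assumes you want to keep the file after the function returns. That is why it passes delete=False, and it leaves removing the file to the caller. The download_to_tempfile helper name is ours, not part of urllib, and the demo again uses an offline data: URL.

```python
import os
import shutil
import tempfile
import urllib.request

def download_to_tempfile(url: str) -> str:
    """Copy the resource at `url` into a named temporary file and return its path.

    delete=False keeps the file after it is closed, so the caller is
    responsible for removing it when done.
    """
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            # copyfileobj copies in fixed-size chunks instead of one big read().
            shutil.copyfileobj(response, tmp_file)
    return tmp_file.name

# Offline demo with a data: URL; an http:// URL works the same way.
path = download_to_tempfile("data:,spam%20and%20eggs")
with open(path, "rb") as f:
    contents = f.read()
os.remove(path)  # clean up, since delete=False left the file behind
```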
Many uses of urllib will be that simple (note that instead of an ‘http:’ URL we could have used a URL starting with ‘ftp:’, ‘file:’, etc.). However, it’s the purpose of this tutorial to explain the more complicated cases, concentrating on HTTP.
HTTP is based on requests and responses — the client makes requests and servers send responses. urllib.request mirrors this with a Request object which represents the HTTP request you are making. In its simplest form you create a Request object that specifies the URL you want to fetch. Calling urlopen with this Request object returns a response object for the URL requested. This response is a file-like object, which means you can for example call .read() on the response:
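This sketch shows the Request-based form. The Request object is built first, and nothing is sent until urlopen() is called. The data: URL stands in for a real http:// address so the example runs offline. The two-line payload is invented to show readline() working on the file-like response.

```python
import urllib.request

# Build the request explicitly; nothing is sent yet.
req = urllib.request.Request("data:,line%20one%0Aline%20two")

with urllib.request.urlopen(req) as response:
    # The response is file-like: readline() and read() behave as on a binary file.
    first_line = response.readline()  # b'line one\n'
    rest = response.read()            # b'line two'
```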
Note that urllib.request makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like so:
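Creating a Request does no network I/O, so the FTP case can be shown without a server. The host and path below are placeholders. What matters is that urllib parses the scheme and will hand the request to its FTP handler once the request is opened.

```python
import urllib.request

# Placeholder host and path; constructing the Request only parses the URL.
req = urllib.request.Request("ftp://ftp.example.com/pub/README")

print(req.type)      # ftp  (urlopen would dispatch this to FTPHandler)
print(req.host)      # ftp.example.com
print(req.selector)  # /pub/README

# urllib.request.urlopen(req) would now fetch it, given a reachable FTP server.
```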
In the case of HTTP, there are two extra things that Request objects allow you to do: First, you can pass data to be sent to the server. Second, you can pass extra information (“metadata”) about the data or about the request itself, to the server — this information is sent as HTTP “headers”. Let’s look at each of these in turn.
Data
Sometimes you want to send data to a URL (often the URL will refer to a CGI (Common Gateway Interface) script or other web application). With HTTP, this is often done using what’s known as a POST request. This is often what your browser does when you submit an HTML form that you filled in on the web. Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way, and then passed to the Request object as the data argument. The encoding is done using a function from the urllib.parse library.
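The sketch below encodes form fields and attaches them as data. The URL and field values are hypothetical. The example stops short of sending the request so that it runs offline, but it does show that supplying data turns the request into a POST.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, for illustration only.
url = "http://www.example.com/cgi-bin/register.cgi"
values = {"name": "Jane Doe", "location": "London", "language": "Python"}

# urlencode produces application/x-www-form-urlencoded text; Request needs bytes.
data = urllib.parse.urlencode(values).encode("ascii")
req = urllib.request.Request(url, data)

print(req.get_method())  # POST, because data was supplied
print(req.data)          # b'name=Jane+Doe&location=London&language=Python'

# Sending it would look like:
# with urllib.request.urlopen(req) as response:
#     the_page = response.read()
```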
Note that other encodings are sometimes required (e.g. for file upload from HTML forms — see HTML Specification, Form Submission for more details).
If you do not pass the data argument, urllib uses a GET request. One way in which GET and POST requests differ is that POST requests often have “side-effects”: they change the state of the system in some way (for example by placing an order with the website for a hundredweight of tinned spam to be delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to always cause side-effects, and GET requests never to cause side-effects, nothing prevents a GET request from having side-effects, nor a POST request from having no side-effects. Data can also be passed in an HTTP GET request by encoding it in the URL itself.
This is done as follows:
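Here is a sketch of the GET variant, using the same kind of hypothetical URL and values. No data argument is passed, so the method stays GET and the encoded values travel in the query string.

```python
import urllib.parse
import urllib.request

# Hypothetical values and URL, for illustration only.
params = {"name": "Jane Doe", "location": "London", "language": "Python"}
url_values = urllib.parse.urlencode(params)
full_url = "http://www.example.com/example.cgi" + "?" + url_values

req = urllib.request.Request(full_url)  # no data argument, so GET
print(full_url)
# http://www.example.com/example.cgi?name=Jane+Doe&location=London&language=Python
print(req.get_method())  # GET

# urllib.request.urlopen(req) would send the query string to the server.
```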
Notice that the full URL is created by adding a ? to the URL, followed by the encoded values.
Headers
We’ll discuss here one particular HTTP header, to illustrate how to add headers to your HTTP request.
Some websites 1 dislike being browsed by programs, or send different versions to different browsers 2. By default urllib identifies itself as Python-urllib/x.y (where x and y are the major and minor version numbers of the Python release, e.g. Python-urllib/2.5 ), which may confuse the site, or just plain not work. The way a browser identifies itself is through the User-Agent header 3. When you create a Request object you can pass a dictionary of headers in. The following example makes the same request as above, but identifies itself as a version of Internet Explorer 4.
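This sketch passes a custom User-Agent through the headers dictionary. The endpoint and form values are hypothetical. The user-agent string is the MSIE 6 one quoted in the footnotes. Request normalises header names with str.capitalize(), which is why the lookup uses "User-agent".

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form values; the MSIE 6 user agent is from the footnotes.
url = "http://www.example.com/cgi-bin/register.cgi"
user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
headers = {"User-Agent": user_agent}
data = urllib.parse.urlencode({"name": "Jane Doe", "language": "Python"}).encode("ascii")

req = urllib.request.Request(url, data, headers)

# Stored header names are capitalize()d, so "User-Agent" becomes "User-agent".
print(req.get_header("User-agent"))

# urllib.request.urlopen(req) would send the request with this header.
```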
The response also has two useful methods. See the section on info and geturl which comes after we have a look at what happens when things go wrong.
Handling Exceptions
urlopen raises URLError when it cannot handle a response (though as usual with Python APIs, built-in exceptions such as ValueError , TypeError etc. may also be raised).
HTTPError is the subclass of URLError raised in the specific case of HTTP URLs.
The exception classes are exported from the urllib.error module.
URLError
Often, URLError is raised because there is no network connection (no route to the specified server), or the specified server doesn’t exist. In this case, the exception raised will have a ‘reason’ attribute. In Python 3 this is usually the underlying exception (for example, a socket error carrying an error code and a text error message) or a plain message string.
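The following sketch inspects reason. Rather than depending on a real unreachable server, it opens a file: URL for a made-up path that should not exist, which also makes urlopen raise URLError. For a failed DNS lookup, the reason would typically be a socket.gaierror instead.

```python
import pathlib
import tempfile
import urllib.error
import urllib.request

# A made-up file name in the temp directory; it should not exist.
missing = pathlib.Path(tempfile.gettempdir(), "no-such-file-urllib-demo.txt").as_uri()

reason = None
try:
    urllib.request.urlopen(missing)
except urllib.error.URLError as e:
    # Here the reason is typically the underlying OSError (FileNotFoundError).
    reason = e.reason
    print(reason)
```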
HTTPError
Every HTTP response from the server contains a numeric “status code”. Sometimes the status code indicates that the server is unable to fulfil the request. The default handlers will handle some of these responses for you (for example, if the response is a “redirection” that requests the client fetch the document from a different URL, urllib will handle that for you). For those it can’t handle, urlopen will raise an HTTPError. Typical errors include ‘404’ (page not found), ‘403’ (request forbidden), and ‘401’ (authentication required).
See section 10 of RFC 2616 for a reference on all the HTTP error codes.
The HTTPError instance raised will have an integer ‘code’ attribute, which corresponds to the error sent by the server.
Error Codes
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100–299 range indicate success, you will usually only see error codes in the 400–599 range.
http.server.BaseHTTPRequestHandler.responses is a useful dictionary that contains all the response codes used by RFC 2616. It maps each code to a tuple of a short message and a longer explanation.
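Here is an example of looking up codes in that table. The keys are http.HTTPStatus members, which compare equal to plain integers.

```python
from http.server import BaseHTTPRequestHandler

responses = BaseHTTPRequestHandler.responses

# Each value is a (short message, long explanation) tuple.
short_message, explanation = responses[404]
print(short_message)  # Not Found
print(explanation)

# List every 4xx code the table knows about.
for code in sorted(responses):
    if 400 <= code < 500:
        print(int(code), responses[code][0])
```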
When an error is raised the server responds by returning an HTTP error code and an error page. You can use the HTTPError instance as a response on the page returned. This means that as well as the code attribute, it also has the read, geturl, and info methods, like the response objects defined in the urllib.response module:
Wrapping it Up
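The self-contained sketch below treats the HTTPError as a response. It starts a throwaway local server (our own NotFoundHandler class, bound to a free port on 127.0.0.1) that answers every request with a 404. It also uses an opener with an empty ProxyHandler so that proxy environment variables cannot interfere.

```python
import http.server
import socketserver
import threading
import urllib.error
import urllib.request

class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    """Demo server that answers every GET with a 404 error page."""
    def do_GET(self):
        self.send_error(404, "No such page")

    def log_message(self, *args):
        pass  # silence per-request logging

# Port 0 lets the OS pick a free port on the loopback interface.
server = socketserver.TCPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/missing" % server.server_address[1]

# An empty ProxyHandler stops *_proxy environment variables from
# routing this loopback request through a proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    opener.open(url)
except urllib.error.HTTPError as e:
    status = e.code                          # 404
    page_url = e.geturl()                    # the URL that produced the error
    content_type = e.info()["Content-Type"]  # headers of the error page
    body = e.read()                          # the server's HTML error page
    e.close()
finally:
    server.shutdown()
    server.server_close()
```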
So if you want to be prepared for HTTPError or URLError there are two basic approaches. I prefer the second approach.
Number 1
The except HTTPError must come first, otherwise except URLError will also catch an HTTPError .
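Here is a sketch of approach number 1 as a function. The fetch name and messages are illustrative. The demo uses an offline data: URL for the success path and a made-up, missing file: URL to trigger URLError.

```python
import pathlib
import tempfile
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def fetch(url: str) -> str:
    req = Request(url)
    try:
        response = urlopen(req)
    except HTTPError as e:  # must come before URLError, its base class
        return "The server couldn't fulfill the request. Error code: %s" % e.code
    except URLError as e:
        return "We failed to reach a server. Reason: %s" % e.reason
    else:
        with response:  # everything is fine
            return "OK: " + response.read().decode()

# A made-up file name that should not exist; opening it raises URLError.
missing = pathlib.Path(tempfile.gettempdir(), "no-such-file-urllib-demo.txt").as_uri()

print(fetch("data:,all%20good"))  # OK: all good
print(fetch(missing))
```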
Number 2
info and geturl
The response returned by urlopen (or the HTTPError instance) has two useful methods, info() and geturl(), and is defined in the module urllib.response.
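This sketch implements approach number 2, which catches only URLError and tells the cases apart afterwards. In Python 3, HTTPError also has a reason attribute, so this version checks for code first.

```python
import pathlib
import tempfile
from urllib.error import URLError
from urllib.request import Request, urlopen

def fetch(url: str) -> str:
    req = Request(url)
    try:
        response = urlopen(req)
    except URLError as e:
        # HTTPError is a URLError that also carries .code; plain URLError does not.
        if hasattr(e, "code"):
            return "The server couldn't fulfill the request. Error code: %s" % e.code
        return "We failed to reach a server. Reason: %s" % e.reason
    with response:  # everything is fine
        return "OK: " + response.read().decode()

# A made-up file name that should not exist; opening it raises URLError.
missing = pathlib.Path(tempfile.gettempdir(), "no-such-file-urllib-demo.txt").as_uri()

print(fetch("data:,all%20good"))  # OK: all good
print(fetch(missing))
```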
geturl — this returns the real URL of the page fetched. This is useful because urlopen (or the opener object used) may have followed a redirect. The URL of the page fetched may not be the same as the URL requested.
info — this returns a dictionary-like object that describes the page fetched, particularly the headers sent by the server. It is currently an http.client.HTTPMessage instance.
Typical headers include ‘Content-length’, ‘Content-type’, and so on. See the Quick Reference to HTTP Headers for a useful listing of HTTP headers with brief explanations of their meaning and use.
Openers and Handlers
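The following sketch shows why geturl() matters. A throwaway local server (our own RedirectingHandler, on a free loopback port) redirects /old to /new. After urllib follows the redirect, geturl() reports the final URL, and info() exposes the headers of the page actually fetched.

```python
import http.server
import socketserver
import threading
import urllib.request

class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Demo server: /old redirects to /new, which returns a short text body."""
    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"you found it"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = socketserver.TCPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# Bypass any configured proxy so the loopback server is reached directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    with opener.open(base + "/old") as response:
        final_url = response.geturl()                   # .../new, not .../old
        content_type = response.info()["Content-Type"]  # text/plain
        body = response.read()
finally:
    server.shutdown()
    server.server_close()
```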
When you fetch a URL you use an opener (an instance of the perhaps confusingly named urllib.request.OpenerDirector ). Normally we have been using the default opener — via urlopen — but you can create custom openers. Openers use handlers. All the “heavy lifting” is done by the handlers. Each handler knows how to open URLs for a particular URL scheme (http, ftp, etc.), or how to handle an aspect of URL opening, for example HTTP redirections or HTTP cookies.
You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or to get an opener that does not handle redirections.
To create an opener, instantiate an OpenerDirector , and then call .add_handler(some_handler_instance) repeatedly.
Alternatively, you can use build_opener , which is a convenience function for creating opener objects with a single function call. build_opener adds several handlers by default, but provides a quick way to add more and/or override the default handlers.
Other sorts of handlers you might want can handle proxies, authentication, and other common but slightly specialised situations.
install_opener can be used to make an opener object the (global) default opener. This means that calls to urlopen will use the opener you have installed.
Opener objects have an open method, which can be called directly to fetch urls in the same way as the urlopen function: there’s no need to call install_opener , except as a convenience.
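This sketch contrasts the two ways of creating an opener and then calls install_opener. The cookie handler is just one example of an extra handler, and data: URLs keep the demo offline. Note that install_opener changes process-wide state.

```python
import http.cookiejar
import urllib.request

# 1) By hand: an OpenerDirector only knows the handlers you add to it.
manual_opener = urllib.request.OpenerDirector()
manual_opener.add_handler(urllib.request.DataHandler())

# 2) build_opener: the default handlers plus yours (here, cookie support).
cookie_jar = http.cookiejar.CookieJar()
cookie_opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar))

# Either opener can be used directly through .open() ...
with manual_opener.open("data:,manual") as response:
    print(response.read())  # b'manual'

# ... or installed globally, so that plain urlopen() uses it from now on.
urllib.request.install_opener(cookie_opener)
with urllib.request.urlopen("data:,installed") as response:
    print(response.read())  # b'installed'
```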
Basic Authentication
To illustrate creating and installing a handler we will use the HTTPBasicAuthHandler . For a more detailed discussion of this subject – including an explanation of how Basic Authentication works — see the Basic Authentication Tutorial.
When authentication is required, the server sends a header (as well as the 401 error code) requesting authentication. This specifies the authentication scheme and a ‘realm’. The header looks like: WWW-Authenticate: SCHEME realm="REALM".
The client should then retry the request with the appropriate name and password for the realm included as a header in the request. This is ‘basic authentication’. In order to simplify this process we can create an instance of HTTPBasicAuthHandler and an opener to use this handler.
The HTTPBasicAuthHandler uses an object called a password manager to handle the mapping of URLs and realms to passwords and usernames. If you know what the realm is (from the authentication header sent by the server), then you can use an HTTPPasswordMgr. Frequently one doesn’t care what the realm is. In that case, it is convenient to use HTTPPasswordMgrWithDefaultRealm. This allows you to specify a default username and password for a URL. This will be supplied in the absence of you providing an alternative combination for a specific realm. We indicate this by providing None as the realm argument to the add_password method.
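This sketch wires a default-realm password manager into an opener. The credentials, realm name, and URLs are hypothetical. The lookups at the end run offline and show which URLs the stored password applies to.

```python
import urllib.request

# Hypothetical credentials and URLs, for illustration only.
username, password = "alice", "s3cret"
top_level_url = "http://example.com/foo/"

# With the default-realm manager, realm=None means "any realm at this URL".
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# URLs "deeper" than top_level_url match too; these lookups need no network.
print(password_mgr.find_user_password("Some Realm", "http://example.com/foo/bar.html"))
# ('alice', 's3cret')
print(password_mgr.find_user_password("Some Realm", "http://example.com/other/"))
# (None, None)

# opener.open(a_url) would now answer a 401 challenge automatically;
# urllib.request.install_opener(opener) would make urlopen() do the same.
```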
The top-level URL is the first URL that requires authentication. URLs “deeper” than the URL you pass to .add_password() will also match.
In the above example we only supplied our HTTPBasicAuthHandler to build_opener . By default openers have the handlers for normal situations – ProxyHandler (if a proxy setting such as an http_proxy environment variable is set), UnknownHandler , HTTPHandler , HTTPDefaultErrorHandler , HTTPRedirectHandler , FTPHandler , FileHandler , DataHandler , HTTPErrorProcessor .
top_level_url is in fact either a full URL (including the ‘http:’ scheme component and the hostname and optionally the port number), e.g. "http://example.com/", or an “authority” (i.e. the hostname, optionally including the port number), e.g. "example.com" or "example.com:8080" (the latter example includes a port number). The authority, if present, must NOT contain the “userinfo” component: for example, "joe:password@example.com" is not correct.
Proxies
urllib will auto-detect your proxy settings and use those. This is through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that’s a good thing, but there are occasions when it may not be helpful 5. One way to stop urllib from using the proxy is to set up our own ProxyHandler, with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler:
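This sketch builds an opener that ignores any detected proxy settings. Passing an empty dictionary to ProxyHandler means "no proxies". The data: URL stands in for something like a localhost test server.

```python
import urllib.request

# An empty mapping means "no proxies", overriding anything detected from
# environment variables such as http_proxy.
proxy_support = urllib.request.ProxyHandler({})
opener = urllib.request.build_opener(proxy_support)

# Use it directly ...
with opener.open("data:,direct") as response:
    print(response.read())  # b'direct'

# ... or make it the default for urlopen():
# urllib.request.install_opener(opener)
```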
Currently urllib.request does not support fetching of https locations through a proxy. However, this can be enabled by extending urllib.request as shown in the recipe 6.
HTTP_PROXY will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies() .
Sockets and Layers
The Python support for fetching resources from the web is layered. urllib uses the http.client library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. In Python 3, both urllib.request.urlopen() and http.client.HTTPConnection accept a timeout argument for individual requests. Alternatively, you can set the default timeout globally for all sockets using socket.setdefaulttimeout().
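This minimal sketch shows both options: the process-wide socket.setdefaulttimeout() call and the per-request timeout argument of urlopen(). The 10- and 5-second values are arbitrary, and the fetch helper is ours.

```python
import socket
import urllib.request

# Global default: applies to sockets created after this call,
# including the ones urllib.request opens for HTTP.
socket.setdefaulttimeout(10)  # seconds

def fetch(url: str, timeout: float = 5.0) -> bytes:
    # Per-request alternative: an explicit timeout overrides the global default.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

print(socket.getdefaulttimeout())  # 10.0
print(fetch("data:,quick"))        # b'quick' (offline data: URL demo)
```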
Footnotes
This document was reviewed and revised by John Lee.
1. Google for example.
2. Browser sniffing is a very bad practice for website design: building sites using web standards is much more sensible. Unfortunately a lot of sites still send different versions to different browsers.
3. The user agent for MSIE 6 is ‘Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)’.
4. For details of more HTTP request headers, see Quick Reference to HTTP Headers.
5. In my case I have to use a proxy to access the internet at work. If you attempt to fetch localhost URLs through this proxy it blocks them. IE is set to use the proxy, which urllib picks up on. In order to test scripts with a localhost server, I have to prevent urllib from using the proxy.
6. urllib opener for SSL proxy (CONNECT method): ASPN Cookbook Recipe.
#lambda #python-3.8
Question:
I'm calling a public OAuth 2.0-enabled API from Python 3.8 inside an AWS Lambda function, so that I can save the JSON response to AWS S3. I'm getting the error HTTP Error 401: Unauthorized.
The function that gets the token (fnGetToken) works, and I can successfully use the token in Postman and get a response. The code breaks in the fnGetFeed function with the error message shown below. I suspect the urllib.request.urlopen(req) call, but I believe that usage is standard. I'm not a Python expert, so I'm asking the experts to share their feedback to help solve this problem.
import json
import base64
import urllib.request
import os
import boto3

def fnGetToken():
    url = os.environ['authurl']
    headers = {}
    key = os.environ['okey']
    secret = os.environ['osecret']
    # auth header will be combination of client id, secret with 'Basic' Auth Header
    authHeader = 'Basic ' + str(base64.b64encode(bytes((key + ':' + secret), 'utf-8')), "utf-8")
    print(authHeader)
    headers['Authorization'] = authHeader
    headers['Content-Type'] = 'application/x-www-form-urlencoded;charset=UTF-8'
    data = "grant_type=client_credentials"
    req = urllib.request.Request(url, headers = headers, method = 'POST')
    response = urllib.request.urlopen(req, data.encode('utf-8'))
    respData = response.read()
    data = json.loads(respData)
    return data['access_token']

def fnGetFeed(token):
    screenname = os.environ['screenname']
    url = str(os.environ['apiurl']) + screenname
    print("url generated {}".format(url))
    headers = {}
    authHeader = 'bearer ' + token
    print("token passed: {}".format(authHeader))
    headers['Authorization'] = authHeader
    req = urllib.request.Request(url, headers = headers, method = 'GET')
    response = urllib.request.urlopen(req)
    respData = response.read()
    data = json.loads(respData)
    return data

def lambda_handler(event, context):
    token = fnGetToken()
    print ("Token generated successfully {}".format(token))
    data = fnGetFeed(token)
    print ("response from get feed function {}".format(data))
{
  "errorMessage": "HTTP Error 401: Unauthorized",
  "errorType": "HTTPError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 58, in lambda_handler\n    data = fnGetFeed(token)\n",
    "  File \"/var/task/lambda_function.py\", line 42, in fnGetFeed\n    response = urllib.request.urlopen(req)\n",
    "  File \"/var/lang/lib/python3.8/urllib/request.py\", line 222, in urlopen\n    return opener.open(url, data, timeout)\n",
    "  File \"/var/lang/lib/python3.8/urllib/request.py\", line 531, in open\n    response = meth(req, response)\n",
    "  File \"/var/lang/lib/python3.8/urllib/request.py\", line 640, in http_response\n    response = self.parent.error(\n",
    "  File \"/var/lang/lib/python3.8/urllib/request.py\", line 569, in error\n    return self._call_chain(*args)\n",
    "  File \"/var/lang/lib/python3.8/urllib/request.py\", line 502, in _call_chain\n    result = func(*args)\n",
    "  File \"/var/lang/lib/python3.8/urllib/request.py\", line 649, in http_error_default\n    raise HTTPError(req.full_url, code, msg, hdrs, fp)\n"
  ]
}
Thanks!
Requests is a Python module that makes working with HTTP requests simpler. It is so much more convenient than the built-in urllib that even the Python documentation recommends using Requests.
Installing the Requests library
The simplest way to install third-party packages in Python is pip, the package manager. pip usually comes preinstalled with the interpreter. If it's missing, you can bootstrap it by running the following at the command line:
Linux / MacOS
python -m ensurepip --upgrade
Windows
py -m ensurepip --upgrade
Once pip is installed, install the Requests module with:
pip install requests
Configuring Requests
The library needs no additional configuration: you can start using it right away.
Getting started
Let's look at the simplest possible request made with Requests:
import requests

# request the page https://sky.pro/media/ for reading
response = requests.get('https://sky.pro/media/')
print(response.ok)  # was the request successful?
print(response.text)  # print the response we received
And here is how to do the same thing with the built-in urllib library:
from urllib.request import urlopen

# open the page https://sky.pro/media/ for reading
with urlopen('https://sky.pro/media/') as response:
    response_status = response.status  # save the request status in a variable
    html = response.read()  # read the response into a variable
print(response_status == 200)  # was the request successful?
print(html.decode())  # print the response we received
The Requests module simplifies and automates many steps that you have to do by hand with the standard library. That is exactly why so many developers love it and use it.
Let's look at how to work with Requests and at what HTTP requests are made of.
HTTP request methods
HTTP is a protocol for transferring information on the internet. It defines the rules and format of communication between two parties: for example, how a browser should describe a request and how a server should build a response. HTTP/1.1 is a text-based protocol, so a human can read it too.
Let's break down the simplest request:
GET /media/ HTTP/1.1
Host: sky.pro
The first line forms the request: we tell the server that we want to read (GET) the resource at /media/. The protocol version, HTTP/1.1, comes at the end.
From the second line on, additional information called headers is sent. Headers are optional, except for Host, which specifies the domain where the requested resource lives.
An HTTP response looks similar:
HTTP/1.1 200 OK
Content-Type: text/html

<response body>
The first line contains the protocol version and the response code: a status that describes the result of the request. The following lines list headers, just as in the request. Here the server says that the response contains an HTML page (Content-Type: text/html).
At the very end comes the response body: a file, an HTML page, or nothing at all. The browser renders the response body, and that is what a person sees when loading a page.
HTTP request methods tell the server what action we want to perform on a resource. A resource is the target of an HTTP request: it can be a document, a photo, or just a web page.
Let's go through the common methods with examples: what each one is for and how they differ. Note that below we describe how each method works as defined in the specification. In practice behavior may differ, but that is uncommon.
OPTIONS
The OPTIONS method asks the server which methods a resource supports. It is rarely used directly; browsers usually send it automatically (for example, as a CORS preflight request). Not every site or resource supports it. Example:
import requests

response = requests.options('https://httpbin.org')
print(response.text)  # will be empty
print(response.headers['Allow'])  # 'HEAD, GET, OPTIONS'
GET
GET is the most common HTTP method. It is used to read a web resource: the browser sends a GET request whenever we open a website. Example:
import requests

response = requests.get('https://httpbin.org/get')
print(response.text)
POST
The POST method sends data to the server in the request body. To do this, pass the data argument to requests.post(); it accepts a dictionary, a list of tuples, bytes, or a file.
If the data is in JSON format, you can pass json instead of data. This is simply a convenience that builds the request correctly: Requests serializes the object and sets the Content-Type: application/json header. Example:
import requests

data_response = requests.post('https://httpbin.org/post', data={'foo': 'bar'})
print(data_response.text)  # the submitted data is under the "form" key

json_response = requests.post('https://httpbin.org/post', json={'foo': 'bar'})
print(json_response.text)  # "form" is empty; the data is now under "json"
HEAD
This method is very similar to GET, except that HEAD returns an empty response body. It is useful when you only need the headers and don't want to download the whole response.
For example, say we want to keep an up-to-date copy of a PDF with a bus timetable. The file is hosted on some site and updated periodically. Instead of downloading the file and comparing it by hand every time, we can use the HEAD method to quickly check the file's modification date in the response headers.
import requests

response = requests.head('https://httpbin.org/get')
print(response.text)  # the body will be empty
print(response.headers)
PUT
The PUT method is very similar to POST, except that several consecutive identical PUT calls must produce the same result.
POST doesn't guarantee this and can lead to unexpected results, such as a duplicate of the created entity.
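To make the difference concrete, here is a toy in-memory model instead of a real server; the ToyStore class is invented for illustration. Repeating a POST creates a second entry, while repeating a PUT to the same ID leaves the store unchanged.

```python
import itertools

class ToyStore:
    """In-memory stand-in for a server-side collection (not a real HTTP API)."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, item):
        # POST: the server picks a new ID each time, so a retry creates a duplicate.
        new_id = next(self._ids)
        self.items[new_id] = item
        return new_id

    def put(self, item_id, item):
        # PUT: the client names the resource, so a retry overwrites the same entry.
        self.items[item_id] = item
        return item_id

posted = ToyStore()
posted.post("timetable.pdf")
posted.post("timetable.pdf")
print(len(posted.items))  # 2: the retry created a duplicate

put_store = ToyStore()
put_store.put(42, "timetable.pdf")
put_store.put(42, "timetable.pdf")
print(len(put_store.items))  # 1: same result however many times it runs
```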
import requests

response = requests.put('https://httpbin.org/put', data={'foo': 'bar'})
print(response.text)
PATCH
PATCH is similar to POST but differs in two ways: it is used for partial modifications of a resource, and it can't be used in HTML forms.
The request body carries a set of modifications to apply.
import requests

response = requests.patch('https://httpbin.org/patch', data={'foo': 'bar'})
print(response.text)
DELETE
This method deletes a resource. It allows data to be sent but doesn't require it: the request body can be empty.
Like PUT, repeated DELETE calls must produce the same result.
import requests

response = requests.delete('https://httpbin.org/delete')
print(response.text)
HTTP status codes
Every HTTP response must include a status code: a three-digit number that characterizes the result. From this code you can tell whether everything worked and, if an error occurred, why.
Status codes fall into five groups:
1xx codes.
This group contains informational status codes. They tell the client about the interim status of the request and are not the final result.
There are only a few of them, and we won't dwell on them because they are rarely encountered.
2xx codes.
Codes in this group mean that the server received and processed the request without errors:
- 200 OK: the request succeeded. This is by far the most common code.
- 201 Created: the request resulted in a new resource being created. Servers usually return it for POST and sometimes PUT requests.
- 202 Accepted: the request has been accepted but not yet carried out. It is used when the server can't fulfill the request right away for some reason, for example when processing is done by a separate process that runs once a day.
- 204 No Content: the response body is empty, but the headers may contain useful information. It is not used with the HEAD method, since a response to HEAD must always have an empty body anyway.
3xx codes.
This is the group of redirection codes. They mean the client must take some further action for the request to complete:
- 301 Moved Permanently: the URL of the requested resource has changed; the new URL is given in the response.
- 302 Found: similar to the previous code, except that the URL has changed temporarily. With this status, search engines won't replace the link in their results with the new one.
- 304 Not Modified: the resource has been cached and its content hasn't changed, so there is no need to transfer it again.
4xx codes.
These are codes for errors the client made when forming the request:
- 400 Bad Request: the request is malformed, so the server can't process it. There can be many causes, but the error is most often in the request body.
- 401 Unauthorized: you need to log in (authenticate) to continue.
- 403 Forbidden: the user is logged in but doesn't have permission to access the resource.
- 404 Not Found: the familiar "page not found" code. Some sites return 404 instead of 403 to hide information from unauthorized users.
- 405 Method Not Allowed: the resource doesn't support the request method. This happens, for example, when a developer sends a PUT request to a resource that doesn't support it.
- 429 Too Many Requests: a protective mechanism has kicked in that limits overly frequent requests from a single user. Sites use it to protect themselves against DDoS and brute-force attacks.
5xx codes.
These are errors that occurred on the server while it was processing the request:
- 500 Internal Server Error: an unexpected error occurred on the server. Usually this happens because an exception was raised in the server code.
- 502 Bad Gateway: occurs when the server sits behind a reverse proxy that couldn't reach the application.
- 503 Service Unavailable: the server isn't ready to handle the request yet. The response may also say when the service will be available again.
- 504 Gateway Timeout: the reverse proxy didn't receive a response within the allotted time (usually 60 seconds).
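The first digit identifies the group, so classifying a code takes one integer division. Here is a small sketch that uses the standard library's http.HTTPStatus for the reason phrases. The describe_status helper is ours; it is not part of Requests.

```python
from http import HTTPStatus

GROUPS = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' based on the code's first digit."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library doesn't list
        phrase = "Unknown"
    group = GROUPS.get(code // 100, "invalid")
    return f"{code} {phrase} ({group})"

for code in (200, 201, 304, 404, 429, 503):
    print(describe_status(code))
```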
Headers, response body, and cookies
Now let's see how to work with requests and responses in Requests. There are three ways to look at the result of an HTTP request.
Which one to use depends on the data we received. When in doubt, use the text attribute, which returns the content as a string:
import requests

response = requests.get('https://httpbin.org/get')
print(response.text)
If you know in advance that the response will be JSON, use the json() method, which parses the response automatically and returns it as Python objects (typically a dictionary):
json_response = response.json()
print(json_response)
Notice how the output of print() changes.
Finally, if the response is a file, use the content attribute, which returns bytes:
import requests

response = requests.get('https://httpbin.org/image/jpeg')
print(response.content)
Try printing response.text for the previous request and compare the results.
A header is additional information exchanged between the client and the server. Headers can carry the response size (Content-Length), the format of the data being sent (Content-Type), or information about the client (User-Agent).
The full list is very long. You don't need to know them all, and some are filled in automatically: for example, Requests often sets Content-Type (the data format) by itself.
A header consists of a name and a value separated by a colon, so the most convenient way to pass headers is as a dictionary. Here's how it works:
import requests

response = requests.get('https://httpbin.org/image', headers={'Accept': 'image/jpeg'})
print(response.headers)
Here we passed a header that states which format we want the image in. Try changing the value to image/png and see how the response changes.
You can look at the request headers the same way:
print(response.request.headers)
Note that Requests filled in the client information (User-Agent) by itself.
A cookie is a piece of information that the server sends to the browser to store. Cookies let a site keep some state: for example, a cookie can record that the user is already logged in. The browser stores it and sends it back to the server with every request, so you don't have to log in again each time.
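To see what this exchange looks like at the header level, here is a sketch that uses the standard library's http.cookies module rather than Requests. The cookie name and value are made up. The server emits a Set-Cookie header, and the client parses it and sends it back in a Cookie header.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)  # Set-Cookie: session_id=abc123; Path=/

# Client side: parse what the server sent and echo it back on the next request.
client_cookie = SimpleCookie()
client_cookie.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_cookie.items())
print("Cookie: " + cookie_header)  # Cookie: session_id=abc123
```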
Working with cookies in Requests is very simple:
import requests

response = requests.get('https://httpbin.org/cookies', cookies={'foo': 'bar'})
print(response.text)
You can see which cookies came from the server through the cookies attribute of the Response object:
print(response.cookies)
How to send requests with Python Requests
Let's go through a few common examples to see how to send requests with the Requests module.
Downloading files
import requests

response = requests.get('https://www.python.org/static/img/python-logo.png')
with open('python_logo.png', 'wb') as image:
    image.write(response.content)
This isn't the most efficient way to download files. If the file is large, the code above loads the entire result into memory. At best the program will crash with an error; at worst, everything will hang.
Here's how to fix that:
import requests

response = requests.get('https://www.python.org/static/img/python-logo@2x.png', stream=True)
with open('python_logo.png', 'wb') as image:
    for chunk in response.iter_content(chunk_size=1024):
        image.write(chunk)
In this version we use the stream=True parameter, which opens the connection but doesn't download the content. Then we set the chunk size (the piece of data downloaded in one loop iteration) to 1 KB (1024 bytes). Requests closes the connection by itself after the last chunk has been read.
To find out the file size in advance, you can use the HEAD method. This information is sent in the Content-Length header, in bytes.
import requests

head_response = requests.head('https://www.python.org/static/img/python-logo@2x.png')
image_size = int(head_response.headers['Content-Length'])
print('Size of the file to download: {0} KB'.format(image_size / 1024))
Authenticating on a site
Let's look at the two most common authentication methods: Basic Auth and Bearer Auth. In both cases the mechanism is very similar: the request must send an Authorization header with some value. For Basic Auth, the value is the username and password encoded in base64. For Bearer, it's a token obtained from the site in advance.
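Here is a sketch of how both header values are formed, using only the standard library. The helper names are ours. Base64 is an encoding, not encryption, so Basic Auth credentials are protected only if the connection uses HTTPS. For the credentials foo/bar, this yields the same Basic Zm9vOmJhcg== value that Requests computes for auth=('foo', 'bar').

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Authorization value for Basic Auth: 'Basic ' + base64('user:password')."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def bearer_auth_header(token: str) -> str:
    """Bearer Auth sends the token unchanged after the scheme name."""
    return "Bearer " + token

print(basic_auth_header("foo", "bar"))   # Basic Zm9vOmJhcg==
print(bearer_auth_header("some_token"))  # Bearer some_token
```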
Для базовой авторизации у модуля Requests есть очень удобный параметр auth=, который делает всю работу за нас:
import requests response = requests.get('https://httpbin.org/basic-auth/foo/bar') print(response.status_code) # 401 response = requests.get('https://httpbin.org/basic-auth/foo/bar', auth=('foo', 'bar')) print(response.status_code) # 200 print(response.request.headers[‘Authorization’]) # 'Basic Zm9vOmJhcg=='
Обратите внимание, что модуль Requests сам добавил заголовок Authorization и подставил туда закодированные логин и пароль.
Для Bearer Auth нам придется добавлять его самостоятельно:
import requests response = requests.get('https://httpbin.org/bearer') print(response.status_code) # 401 headers = {'Authorization': 'Bearer some_token'} response = requests.get('https://httpbin.org/bearer', headers=headers) print(response.status_code) # 200
У каждого API своя спецификация — вместо Bearer может быть Token или что-то другое. Поэтому важно внимательно читать документацию сервиса.
Мультискачивание
Напишем код, который умеет скачивать сразу несколько файлов. Для этого вынесем работу с модулем Requests в отдельную функцию и параметризируем место сохранения файла.
Не забывайте про сохранение файла по чанкам, чтобы крупные файлы не загружались в память целиком.
import requests def download_file(url, save_path): response = requests.get(url, stream=True) with open(save_path, 'wb') as file: for chunk in response.iter_content(chunk_size=1024): file.write(chunk) download_list = [ 'https://cdn.pixabay.com/photo/2022/04/10/19/33/house-7124141_1280.jpg', 'https://cdn.pixabay.com/photo/2022/08/05/18/50/houseplant-7367379_1280.jpg', 'https://cdn.pixabay.com/photo/2022/06/09/04/53/ride-7251713_1280.png', ] for url in download_list: save_path = url.split('/')[-1] download_file(url, save_path)
Conclusion
Requests is a powerful module that lets a developer make even a complex HTTP request in just a couple of lines. Its intuitive interface is a big part of why it's so popular in the Python community.
With Requests you can handle a wide range of tasks, from authenticating with a site to downloading a whole batch of files.
Upon further testing, there may be other places where redirects (or something along those lines) need to be fixed. I just hit the following error:
```
Traceback (most recent call last):
  File "/snap/pycharm-community/43/helpers/pydev/pydevd.py", line 1668, in <module>
    main()
  File "/snap/pycharm-community/43/helpers/pydev/pydevd.py", line 1662, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/snap/pycharm-community/43/helpers/pydev/pydevd.py", line 1072, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-community/43/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "run.py", line 13, in <module>
    Bot().run()
  File "/lib/python3.5/site-packages/mattermost_bot/bot.py", line 36, in run
    self._dispatcher.loop()
  File "/lib/python3.5/site-packages/mattermost_bot/dispatcher.py", line 120, in loop
    'user_added', 'user_removed']):
  File "/lib/python3.5/site-packages/mattermost_bot/mattermost.py", line 193, in messages
    if not self.connect_websocket():
  File "/lib/python3.5/site-packages/mattermost_bot/mattermost.py", line 180, in connect_websocket
    self._connect_websocket(url, cookie_name='MMAUTHTOKEN')
  File "/lib/python3.5/site-packages/mattermost_bot/mattermost.py", line 189, in _connect_websocket
    else ssl.CERT_NONE
  File "/lib/python3.5/site-packages/websocket/_core.py", line 494, in create_connection
    websock.connect(url, **options)
  File "/lib/python3.5/site-packages/websocket/_core.py", line 220, in connect
    self.handshake_response = handshake(self.sock, *addrs, **options)
  File "/lib/python3.5/site-packages/websocket/_handshake.py", line 69, in handshake
    status, resp = _get_resp_headers(sock)
  File "lib/python3.5/site-packages/websocket/_handshake.py", line 135, in _get_resp_headers
    raise WebSocketBadStatusException("Handshake status %d %s", status, status_message)
websocket._exceptions.WebSocketBadStatusException: Handshake status 301 Moved Permanently
```
It makes sense that the websocket connection would need a similar fix too, no?
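One possible shape for such a fix, sketched under assumptions and not taken from mattermost_bot itself: resolve the server URL over plain HTTP first, letting Requests follow the 301, then derive the websocket URL from wherever the redirect chain ends. Both helper names are hypothetical, and the sketch assumes the websocket endpoint lives on the same host and path prefix as the redirect target.

```python
from urllib.parse import urlsplit, urlunsplit

import requests


def resolve_redirects(url: str, timeout: float = 10) -> str:
    """Return the final URL after following redirects (hypothetical helper)."""
    # requests.get follows 3xx redirects by default; response.url is where it ended up
    response = requests.get(url, timeout=timeout)
    return response.url


def http_to_ws(url: str) -> str:
    """Swap http -> ws and https -> wss, keeping host, path and query unchanged."""
    parts = urlsplit(url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))
```

For example, if a request to http://chat.example.com/ ends at https://chat.example.com/, then http_to_ws(resolve_redirects("http://chat.example.com/")) gives wss://chat.example.com/. The websocket client never has to handle the 301 because it connects to the final location directly.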