In this tutorial, you’ll learn how to use timeouts in the Python requests library when working with any type of HTTP request. By default, the requests library will not time out any request you make, which can result in your program running indefinitely if a server doesn’t respond.
By the end of this tutorial, you’ll have learned:
- How to set timeouts in requests
- How to set unique timeouts for connecting and reading in Python requests
- How to catch and handle timeout errors in Python requests
How Does Python requests Handle Timeouts?
By default, the Python requests library does not set a timeout for any request it sends. This is true for GET, POST, and PUT requests. While this can prevent unexpected errors, it can result in your request running indefinitely.
Because of this, it’s important to set a timeout to prevent unexpected behavior. Remember, the Python requests library will not time out by default unless explicitly instructed.
How to Set a Timeout for Python requests
In order to set a timeout in an HTTP request made via the requests library, you can use the timeout parameter. The parameter accepts either an integer or a floating point value, which describes the time in seconds.
It’s important to note that this behavior is different from many other HTTP request libraries, such as those in JavaScript, where timeouts tend to be expressed in milliseconds.
Let’s take a look at an example of how we can send a GET request with a timeout:
# Setting a Timeout on a GET Request with an Integer
import requests
resp = requests.get('https://datagy.io', timeout=3)
In the example above, we set a timeout of 3 seconds. We used an integer to represent the time of our timeout. If we wanted to be more precise, we could also pass in a floating point value:
# Setting a Timeout on a GET Request with a Floating Point Value
import requests
resp = requests.get('https://datagy.io', timeout=3.5)
By passing in a single value, we set a timeout for the request. If we wanted to set different timeouts for connecting and reading a request, we can pass in a tuple of values.
How to Set Timeouts for Connecting and Reading in Python requests
In some cases, you’ll want to set different timeouts for making a connection and for reading results. This can easily be done using the timeout parameter in the requests library. Similar to the example above, this can be applied to any type of request being made.
Let’s see how we can pass in different timeout limits for connecting and reading requests in the Python requests library:
# Setting Different Timeouts for Connecting and Reading Requests
import requests
resp = requests.get('https://datagy.io', timeout=(1, 2))
In the example above, we set the request to timeout after 1 second for connecting and 2 seconds for reading the request.
In the following section, you’ll learn how to catch and handle errors that arise due to requests timing out.
How to Catch and Handle Timeout Errors in Python requests
When applying a timeout, it’s important to note that this is not a time limit on the entire response. Instead, an exception is raised if no bytes have been received on the underlying socket for the duration of the timeout.
If the request does not receive any bytes within the specified timeout limit, a Timeout error is raised. Let’s see what this looks like:
# Raising a Timeout Error in a Python requests GET Request
import requests
resp = requests.get('https://datagy.io', timeout=0.0001)
# Raises:
# ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fbc988f59a0>, 'Connection to datagy.io timed out. (connect timeout=0.0001)')
In order to prevent your program from crashing, you need to handle the exception using a try-except block. Let’s see how this can be done:
# Handling a Timeout Error
import requests
from requests.exceptions import ConnectTimeout

try:
    requests.get('https://datagy.io', timeout=0.0001)
except ConnectTimeout:
    print('Request has timed out')

# Returns:
# Request has timed out
We can see in the code above that the error was handled safely. In order to do this, we:
- Imported the error from the exceptions module of the requests library
- Created a try-except block to handle the ConnectTimeout error
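If you want to handle read timeouts as well as connection timeouts, you can catch the broader Timeout exception, which is the parent class of both ConnectTimeout and ReadTimeout. Below is a minimal sketch of that pattern with a simple retry loop; the URL, retry count, and timeout values are only illustrations:
# Retrying a request that times out (illustrative sketch)
import requests
from requests.exceptions import Timeout

def get_with_retries(url, retries=3, timeout=(1, 2)):
    # Timeout covers both ConnectTimeout and ReadTimeout
    for attempt in range(retries):
        try:
            return requests.get(url, timeout=timeout)
        except Timeout:
            print(f'Attempt {attempt + 1} timed out')
    return None

resp = get_with_retries('https://datagy.io')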
Frequently Asked Questions
What is the default timeout for Python requests?
None. There is no default timeout for Python requests, unless explicitly set using the timeout parameter.
How do you set a timeout for requests made in Python?
You set a timeout (in seconds) using the timeout= parameter when making HTTP requests in the Python requests library.
What is the best time to set for a timeout for requests made in Python?
While there is no best set value for timeouts for HTTP requests made in Python, a good practice is to set them under 500ms. This allows your application to provide a better user experience and to process more requests.
Conclusion
In this tutorial, you learned how to handle timeouts in the Python requests library. You first learned how the requests library handles timeouts by default. Then, you learned how to set timeouts when making HTTP requests, using both integers and floating point values, and how to specify separate timeouts for connecting and reading. Finally, you learned how to handle timeout exceptions in the Python requests library.
Additional Resources
To learn more about related topics, check out the tutorials below:
- Python Requests Response Object Explained
- Python Requests Headers Explained
- Python Requests Sessions Explained
- Official Documentation: Python requests Timeout
Requests is the de facto standard when it comes to making HTTP requests in Python. The open-source library abstracts away the complexity of managing network connections and makes sending HTTP requests a breeze.
Most of you may already know how to send HTTP POST, GET, and other types of HTTP requests. In this article, we will guide you through how to set a timeout in requests, as well as how to manage exceptions.
Timeouts in Python requests
You can tell the requests library to stop waiting for a response after a given amount of time by passing a number to the timeout parameter. If the requests library does not receive a response within x seconds, it will raise a Timeout error.
It’s a best practice for production code to use this parameter in all network requests. Failure to do so can cause your program to hang indefinitely. If no timeout is specified explicitly, requests do not time out.
requests.get('https://example.com/', timeout=10)

# OUTPUT
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='example.com', port=80): Request timed out. (timeout=10)
Catch timeout exception
Once a timeout value has been set, every request that doesn’t receive a response in the specified timeframe will raise a Timeout error. It’s important to handle this exception, otherwise your program will be terminated.
In order to catch Timeout errors in requests, you have to import the exception itself using from requests.exceptions import Timeout.
import requests
from requests.exceptions import Timeout

try:
    requests.get('https://www.example.com', timeout=10)
except Timeout:
    print('Timeout has been raised.')
Advanced timeout handling
Most requests to external servers should have a timeout attached, in case the server is not responding in a timely manner. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more.
The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect() call on the socket). It’s a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.
Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response. (Specifically, it’s the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte).
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
Request timeout in async coroutines
Since a lot of beginners have asked how to use requests in asynchronous programs or coroutines, we’ve dedicated this section to answering just that.
requests is a blocking library, which means it should not be used inside asynchronous coroutines. The proper libraries to use in this context are aiohttp and async-timeout. aiohttp is an asynchronous HTTP client/server framework, while async-timeout is an asyncio-compatible timeout context manager that can wrap any coroutine in a “timeout” context.
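As a minimal sketch of what that can look like (the URL and timeout value are arbitrary, and aiohttp must be installed separately), you can use aiohttp’s built-in ClientTimeout to bound the whole request:
# Asynchronous request with a total timeout using aiohttp (illustrative sketch)
import asyncio
import aiohttp

async def fetch(url):
    # ClientTimeout(total=...) limits the entire request, in seconds
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            return await response.text()

asyncio.run(fetch('https://example.com'))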
Eager to get started? This page gives a good introduction in how to get started
with Requests.
First, make sure that:
- Requests is installed
- Requests is up-to-date
Let’s get started with some simple examples.
Make a Request
Making a request with Requests is very simple.
Begin by importing the Requests module:
>>> import requests
Now, let’s try to get a webpage. For this example, let’s get GitHub’s public
timeline:
>>> r = requests.get('https://api.github.com/events')
Now, we have a Response object called r. We can get all the information we need from this object.
Requests’ simple API means that all forms of HTTP request are as obvious. For
example, this is how you make an HTTP POST request:
>>> r = requests.post('https://httpbin.org/post', data = {'key':'value'})
Nice, right? What about the other HTTP request types: PUT, DELETE, HEAD and
OPTIONS? These are all just as simple:
>>> r = requests.put('https://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('https://httpbin.org/delete')
>>> r = requests.head('https://httpbin.org/get')
>>> r = requests.options('https://httpbin.org/get')
That’s all well and good, but it’s also only the start of what Requests can
do.
Passing Parameters In URLs
You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val.
Requests allows you to provide these arguments as a dictionary of strings, using the params keyword argument. As an example, if you wanted to pass key1=value1 and key2=value2 to httpbin.org/get, you would use the following code:
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('https://httpbin.org/get', params=payload)
You can see that the URL has been correctly encoded by printing the URL:
>>> print(r.url)
https://httpbin.org/get?key2=value2&key1=value1
Note that any dictionary key whose value is None will not be added to the URL’s query string.
You can also pass a list of items as a value:
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> print(r.url)
https://httpbin.org/get?key1=value1&key2=value2&key2=value3
Response Content
We can read the content of the server’s response. Consider the GitHub timeline
again:
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...
Requests will automatically decode content from the server. Most unicode
charsets are seamlessly decoded.
When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property:
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
If you change the encoding, Requests will use the new value of r.encoding whenever you call r.text. You might want to do this in any situation where you can apply special logic to work out what the encoding of the content will be. For example, HTML and XML have the ability to specify their encoding in their body. In situations like this, you should use r.content to find the encoding, and then set r.encoding. This will let you use r.text with the correct encoding.
Requests will also use custom encodings in the event that you need them. If you have created your own encoding and registered it with the codecs module, you can simply use the codec name as the value of r.encoding and Requests will handle the decoding for you.
Binary Response Content
You can also access the response body as bytes, for non-text requests:
>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
The gzip and deflate transfer-encodings are automatically decoded for you.
For example, to create an image from binary data returned by a request, you can
use the following code:
>>> from PIL import Image
>>> from io import BytesIO
>>> i = Image.open(BytesIO(r.content))
JSON Response Content
There’s also a builtin JSON decoder, in case you’re dealing with JSON data:
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
In case the JSON decoding fails, r.json() raises an exception. For example, if the response gets a 204 (No Content), or if the response contains invalid JSON, attempting r.json() raises ValueError: No JSON object could be decoded.
It should be noted that the success of the call to r.json() does not indicate the success of the response. Some servers may return a JSON object in a failed response (e.g. error details with HTTP 500). Such JSON will be decoded and returned. To check that a request is successful, use r.raise_for_status() or check that r.status_code is what you expect.
Raw Response Content
In the rare case that you’d like to get the raw socket response from the server, you can access r.raw. If you want to do this, make sure you set stream=True in your initial request. Once you do, you can do this:
>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
In general, however, you should use a pattern like this to save what is being
streamed to a file:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
Using Response.iter_content will handle a lot of what you would otherwise have to handle when using Response.raw directly. When streaming a download, the above is the preferred and recommended way to retrieve the content. Note that chunk_size can be freely adjusted to a number that may better fit your use cases.
Note
An important note about using Response.iter_content versus Response.raw: Response.iter_content will automatically decode the gzip and deflate transfer-encodings. Response.raw is a raw stream of bytes – it does not transform the response content. If you really need access to the bytes as they were returned, use Response.raw.
More complicated POST requests
Typically, you want to send some form-encoded data — much like an HTML form. To do this, simply pass a dictionary to the data argument. Your dictionary of data will automatically be form-encoded when the request is made:
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("https://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}
The data argument can also have multiple values for each key. This can be done by making data either a list of tuples or a dictionary with lists as values. This is particularly useful when the form has multiple elements that use the same key:
>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
>>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
>>> payload_dict = {'key1': ['value1', 'value2']}
>>> r2 = requests.post('https://httpbin.org/post', data=payload_dict)
>>> print(r1.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
>>> r1.text == r2.text
True
There are times that you may want to send data that is not form-encoded. If you pass in a string instead of a dict, that data will be posted directly.
For example, the GitHub API v3 accepts JSON-Encoded POST/PATCH data:
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
Instead of encoding the dict yourself, you can also pass it directly using the json parameter (added in version 2.4.2) and it will be encoded automatically:
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)
Note, the json parameter is ignored if either data or files is passed.
Using the json parameter in the request will change the Content-Type in the header to application/json.
POST a Multipart-Encoded File
Requests makes it simple to upload Multipart-encoded files:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}
You can set the filename, content_type and headers explicitly:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}
If you want, you can send strings to be received as files:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "some,data,to,send\nanother,row,to,send\n"
  },
  ...
}
In the event you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but there is a separate package which does — requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.
For sending multiple files in one request refer to the advanced section.
Warning
It is strongly recommended that you open files in binary mode. This is because Requests may attempt to provide the Content-Length header for you, and if it does this value will be set to the number of bytes in the file. Errors may occur if you open the file in text mode.
Response Status Codes
We can check the response status code:
>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code
200
Requests also comes with a built-in status code lookup object for easy
reference:
>>> r.status_code == requests.codes.ok
True
If we made a bad request (a 4XX client error or 5XX server error response), we can raise it with Response.raise_for_status():
>>> bad_r = requests.get('https://httpbin.org/status/404')
>>> bad_r.status_code
404
>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error
But, since our status_code for r was 200, when we call raise_for_status() we get:
>>> r.raise_for_status()
<Response [200]>
All is well.
Note
raise_for_status returns the response object for a successful response. This eases chaining in trivial cases, where we want bad codes to raise an exception, but use the response otherwise:
>>> value = requests.get('http://httpbin.org/ip').raise_for_status().json()['origin']
Cookies
If a response contains some Cookies, you can quickly access them:
>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)
>>> r.cookies['example_cookie_name']
'example_cookie_value'
To send your own cookies to the server, you can use the cookies parameter:
>>> url = 'https://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
Cookies are returned in a RequestsCookieJar, which acts like a dict but also offers a more complete interface, suitable for use over multiple domains or paths. Cookie jars can also be passed in to requests:
>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'https://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
Redirection and History
By default Requests will perform location redirection for all verbs except HEAD.
We can use the history property of the Response object to track redirection.
The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.
For example, GitHub redirects all HTTP requests to HTTPS:
>>> r = requests.get('http://github.com/')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]
If you’re using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:
>>> r = requests.get('http://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]
If you’re using HEAD, you can enable redirection as well:
>>> r = requests.head('http://github.com/', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
Timeouts
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely:
>>> requests.get('https://github.com/', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
Errors and Exceptions
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
Response.raise_for_status() will raise an HTTPError if the HTTP request returned an unsuccessful status code.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
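Since those exceptions share a common base class, one workable pattern (sketched here with an arbitrary URL) is to handle the specific cases you care about first and fall back to RequestException for everything else:
import requests
from requests.exceptions import HTTPError, ConnectionError, Timeout, RequestException

try:
    r = requests.get('https://httpbin.org/status/503', timeout=5)
    r.raise_for_status()
except Timeout:
    print('The request timed out')
except ConnectionError:
    print('A network problem occurred')
except HTTPError as err:
    print(f'Bad status code: {err}')
except RequestException as err:
    print(f'Some other requests error: {err}')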
Ready for more? Check out the advanced section.
The Python requests module is a simple and elegant Python HTTP library. It provides methods for accessing web resources via HTTP. In the following article, we will use the HTTP GET method of the requests module. This method requests data from the server, and exception handling comes in handy when the response is not successful. Here, we will go through such situations. We will use Python’s try and except functionality to explore the exceptions that arise from the requests module.
Some useful members of the Response object used below:
- url: Returns the URL of the response
- raise_for_status(): Raises an HTTPError if an error occurred during the request
- request: Returns the request object that requested this response
- status_code: Returns a number that indicates the status (200 is OK, 404 is Not Found)
Successful Connection Request
The first thing to know is that the response code is 200 if the request is successful.
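A minimal example that produces the output below (the URL is an assumption; any site that responds successfully will do):
import requests

# A reachable site returns status code 200 on success (URL is an example)
r = requests.get('https://www.google.com', timeout=1)
print(r.status_code)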
Output:
200
Exception Handling for HTTP Errors
Here, we request a URL that doesn’t exist and then call raise_for_status() on the response. If the try part is successful, we will get the response code 200; if the page we requested doesn’t exist, an HTTP error is raised. That error is handled by the requests module’s HTTPError exception, and you will see the 404 error below.
import requests

# URL taken from the output below; the page intentionally does not exist
url = "https://www.amazon.com/nothing_here"

try:
    r = requests.get(url, timeout=1)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print("HTTP Error")
    print(errh.args[0])

print(r)
Output:
HTTP Error
404 Client Error: Not Found for url: https://www.amazon.com/nothing_here
<Response [404]>
General Exception Handling
You could also use a general exception from the requests module, namely requests.exceptions.RequestException.
try:
    r = requests.get(url, timeout=1)
    r.raise_for_status()
except requests.exceptions.RequestException as errex:
    print("Exception request")
Output:
Exception request
Now, you may have noticed that there is an argument ‘timeout’ passed into the requests call. We can prescribe a time limit for the requested connection to respond. If it doesn’t respond in time, we can catch that using the exception requests.exceptions.ReadTimeout. To demonstrate this, let us find a website that responds successfully.
import requests

# url should point to a site that responds quickly and successfully
# (any fast-responding site works; this one is an example)
url = "https://www.google.com"

try:
    r = requests.get(url, timeout=1)
    r.raise_for_status()
except requests.exceptions.ReadTimeout as errrt:
    print("Time out")

print(r)
Output:
<Response [200]>
If we change timeout to 0.01, the same code times out, because the request cannot possibly complete that fast:
Time out
<Response [200]>
Exception Handling for Missing Schema
Another common error is that we might not specify HTTPS or HTTP in the URL. In that case, we can use requests.exceptions.MissingSchema to catch the exception.
url = "www.google.com"

try:
    r = requests.get(url, timeout=1)
    r.raise_for_status()
except requests.exceptions.MissingSchema as errmiss:
    print("Missing schema: include http or https")
except requests.exceptions.ReadTimeout as errrt:
    print("Time out")
Output:
Missing schema: include http or https
Exception Handling for Connection Error
Let us say that there is a site that doesn’t exist. A ConnectionError is raised in that case; the same error also occurs when you can’t make a connection at all, for example because there is no internet connection.
try:
    r = requests.get(url, timeout=1, verify=True)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print("HTTP Error")
    print(errh.args[0])
except requests.exceptions.ReadTimeout as errrt:
    print("Time out")
except requests.exceptions.ConnectionError as conerr:
    print("Connection error")
Output:
Connection error
Putting Everything Together
Here, we put together everything we have tried so far. The idea is that the exceptions are handled from the most specific to the most general.
For example, with url = “https://www.gle.com”, running this code produces one of the exception messages. In the absence of a connection, requests.exceptions.ConnectionError prints “Connection error”, while anything not caught by a more specific handler falls through to the general requests.exceptions.RequestException.
try:
    r = requests.get(url, timeout=1, verify=True)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print("HTTP Error")
    print(errh.args[0])
except requests.exceptions.ReadTimeout as errrt:
    print("Time out")
except requests.exceptions.ConnectionError as conerr:
    print("Connection error")
except requests.exceptions.RequestException as errex:
    print("Exception request")
Output:
Note: The output may change depending on the request.
Time out
The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.
Throughout this article, you’ll see some of the most useful features that requests has to offer as well as how to customize and optimize those features for different situations you may come across. You’ll also learn how to use requests in an efficient way as well as how to prevent requests to external services from slowing down your application.
In this tutorial, you’ll learn how to:
- Make requests using the most common HTTP methods
- Customize your requests’ headers and data, using the query string and message body
- Inspect data from your requests and responses
- Make authenticated requests
- Configure your requests to help prevent your application from backing up or slowing down
Though I’ve tried to include as much information as you need to understand the features and examples included in this article, I do assume a very basic general knowledge of HTTP. That said, you still may be able to follow along fine anyway.
Now that that is out of the way, let’s dive in and see how you can use requests
in your application!
Getting Started With requests
Let’s begin by installing the requests library. To do so, run the following command:
$ pip install requests
If you prefer to use Pipenv for managing Python packages, you can run the following:
$ pipenv install requests
Once requests is installed, you can use it in your application. Importing requests looks like this:
import requests
Now that you’re all set up, it’s time to begin your journey through requests. Your first goal will be learning how to make a GET request.
The GET Request
HTTP methods such as GET and POST determine which action you’re trying to perform when making an HTTP request. Besides GET and POST, there are several other common methods that you’ll use later in this tutorial.
One of the most common HTTP methods is GET. The GET method indicates that you’re trying to get or retrieve data from a specified resource. To make a GET request, invoke requests.get().
To test this out, you can make a GET request to GitHub’s Root REST API by calling get() with the following URL:
>>>
>>> requests.get('https://api.github.com')
<Response [200]>
Congratulations! You’ve made your first request. Let’s dive a little deeper into the response of that request.
The Response
A Response is a powerful object for inspecting the results of the request. Let’s make that same request again, but this time store the return value in a variable so that you can get a closer look at its attributes and behaviors:
>>>
>>> response = requests.get('https://api.github.com')
In this example, you’ve captured the return value of get(), which is an instance of Response, and stored it in a variable called response. You can now use response to see a lot of information about the results of your GET request.
Status Codes
The first bit of information that you can gather from Response is the status code. A status code informs you of the status of the request.
For example, a 200 OK status means that your request was successful, whereas a 404 NOT FOUND status means that the resource you were looking for was not found. There are many other possible status codes as well to give you specific insights into what happened with your request.
By accessing .status_code, you can see the status code that the server returned:
>>>
>>> response.status_code
200
.status_code returned a 200, which means your request was successful and the server responded with the data you were requesting.
Sometimes, you might want to use this information to make decisions in your code:
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')
With this logic, if the server returns a 200 status code, your program will print Success!. If the result is a 404, your program will print Not Found.
requests goes one step further in simplifying this process for you. If you use a Response instance in a conditional expression, it will evaluate to True if the status code was between 200 and 400, and False otherwise.
Therefore, you can simplify the last example by rewriting the if statement:
if response:
    print('Success!')
else:
    print('An error has occurred.')
Keep in mind that this method is not verifying that the status code is equal to 200. The reason for this is that other status codes within the 200 to 400 range, such as 204 NO CONTENT and 304 NOT MODIFIED, are also considered successful in the sense that they provide some workable response.
For example, the 204 tells you that the response was successful, but there’s no content to return in the message body.
So, make sure you use this convenient shorthand only if you want to know if the request was generally successful and then, if necessary, handle the response appropriately based on the status code.
Let’s say you don’t want to check the response’s status code in an if statement. Instead, you want to raise an exception if the request was unsuccessful. You can do this using .raise_for_status():
import requests
from requests.exceptions import HTTPError

for url in ['https://api.github.com', 'https://api.github.com/invalid']:
    try:
        response = requests.get(url)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # Python 3.6
    except Exception as err:
        print(f'Other error occurred: {err}')  # Python 3.6
    else:
        print('Success!')
If you invoke .raise_for_status(), an HTTPError will be raised for certain status codes. If the status code indicates a successful request, the program will proceed without that exception being raised.
Now, you know a lot about how to deal with the status code of the response you got back from the server. However, when you make a GET request, you rarely only care about the status code of the response. Usually, you want to see more. Next, you’ll see how to view the actual data that the server sent back in the body of the response.
Content
The response of a GET request often has some valuable information, known as a payload, in the message body. Using the attributes and methods of Response, you can view the payload in a variety of different formats.
To see the response’s content in bytes, you use .content:
>>>
>>> response = requests.get('https://api.github.com')
>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. response will do that for you when you access .text:
>>>
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
Because the decoding of bytes to a str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
>>>
>>> response.encoding = 'utf-8' # Optional: requests infers this internally
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
>>>
>>> response.json()
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.github.com/user/keys', 'notifications_url': 'https://api.github.com/notifications', 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}', 'organization_url': 'https://api.github.com/orgs/{org}', 'public_gists_url': 'https://api.github.com/gists/public', 'rate_limit_url': 'https://api.github.com/rate_limit', 'repository_url': 'https://api.github.com/repos/{owner}/{repo}', 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}', 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}', 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}', 'starred_gists_url': 'https://api.github.com/gists/starred', 'team_url': 'https://api.github.com/teams', 'user_url': 'https://api.github.com/users/{user}', 'user_organizations_url': 'https://api.github.com/user/orgs', 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}', 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
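The headers themselves are easy to inspect; here is a brief sketch (the header shown is a common one, not guaranteed for every server):
import requests

response = requests.get('https://api.github.com')
print(response.headers['Content-Type'])   # e.g. 'application/json; charset=utf-8'
print(response.headers['content-type'])   # lookups are case-insensitive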
Query String Parameters
One common way to customize a GET request is to pass values through query string parameters in the URL. To do this using get(), you pass data to params. For example, you can use GitHub’s Search API to look for the requests library:
import requests

# Search GitHub's repositories for requests
response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
)

# Inspect some attributes of the `requests` repository
json_response = response.json()
repository = json_response['items'][0]
print(f'Repository name: {repository["name"]}')  # Python 3.6+
print(f'Repository description: {repository["description"]}')  # Python 3.6+
By passing the dictionary {'q': 'requests+language:python'} to the params parameter of .get(), you are able to modify the results that come back from the Search API.
You can pass params to get() in the form of a dictionary, as you have just done, or as a list of tuples:
>>>
>>> requests.get(
... 'https://api.github.com/search/repositories',
... params=[('q', 'requests+language:python')],
... )
<Response [200]>
You can even pass the values as bytes:
>>>
>>> requests.get(
... 'https://api.github.com/search/repositories',
... params=b'q=requests+language:python',
... )
<Response [200]>
Query strings are useful for parameterizing GET requests. You can also customize your requests by adding or modifying the headers you send.
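As a sketch of what that looks like (the Accept value shown is GitHub’s documented media type for text-match metadata, included here as an illustration):
import requests

response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)

# The custom header is sent along with the request
print(response.request.headers['Accept'])
print(response.status_code)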
Other HTTP Methods
Aside from GET, other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS. requests provides a method, with a similar signature to get(), for each of these HTTP methods:
>>>
>>> requests.post('https://httpbin.org/post', data={'key':'value'})
>>> requests.put('https://httpbin.org/put', data={'key':'value'})
>>> requests.delete('https://httpbin.org/delete')
>>> requests.head('https://httpbin.org/get')
>>> requests.patch('https://httpbin.org/patch', data={'key':'value'})
>>> requests.options('https://httpbin.org/get')
Each function call makes a request to the httpbin service using the corresponding HTTP method. For each method, you can inspect their responses in the same way you did before:
>>>
>>> response = requests.head('https://httpbin.org/get')
>>> response.headers['Content-Type']
'application/json'
>>> response = requests.delete('https://httpbin.org/delete')
>>> json_response = response.json()
>>> json_response['args']
{}
Headers, response bodies, status codes, and more are returned in the Response for each method. Next you’ll take a closer look at the POST, PUT, and PATCH methods and learn how they differ from the other request types.
The Message Body
According to the HTTP specification, POST, PUT, and the less common PATCH requests pass their data through the message body rather than through parameters in the query string. Using requests, you’ll pass the payload to the corresponding function’s data parameter.
data takes a dictionary, a list of tuples, bytes, or a file-like object. You’ll want to adapt the data you send in the body of your request to the specific needs of the service you’re interacting with.
For example, if your request’s content type is application/x-www-form-urlencoded, you can send the form data as a dictionary:
>>>
>>> requests.post('https://httpbin.org/post', data={'key':'value'})
<Response [200]>
You can also send that same data as a list of tuples:
>>>
>>> requests.post('https://httpbin.org/post', data=[('key', 'value')])
<Response [200]>
If, however, you need to send JSON data, you can use the json parameter. When you pass JSON data via json, requests will serialize your data and add the correct Content-Type header for you.
httpbin.org is a great resource created by the author of requests, Kenneth Reitz. It’s a service that accepts test requests and responds with data about the requests. For instance, you can use it to inspect a basic POST request:
>>>
>>> response = requests.post('https://httpbin.org/post', json={'key':'value'})
>>> json_response = response.json()
>>> json_response['data']
'{"key": "value"}'
>>> json_response['headers']['Content-Type']
'application/json'
You can see from the response that the server received your request data and headers as you sent them. requests also provides this information to you in the form of a PreparedRequest.
Inspecting Your Request
When you make a request, the requests library prepares the request before actually sending it to the destination server. Request preparation includes things like validating headers and serializing JSON content.
You can view the PreparedRequest by accessing .request:
>>>
>>> response = requests.post('https://httpbin.org/post', json={'key':'value'})
>>> response.request.headers['Content-Type']
'application/json'
>>> response.request.url
'https://httpbin.org/post'
>>> response.request.body
b'{"key": "value"}'
Inspecting the PreparedRequest gives you access to all kinds of information about the request being made such as payload, URL, headers, authentication, and more.
So far, you’ve made a lot of different kinds of requests, but they’ve all had one thing in common: they’re unauthenticated requests to public APIs. Many services you may come across will want you to authenticate in some way.
Authentication
Authentication helps a service understand who you are. Typically, you provide your credentials to a server by passing data through the Authorization header or a custom header defined by the service. All the request functions you’ve seen to this point provide a parameter called auth, which allows you to pass your credentials.
One example of an API that requires authentication is GitHub’s Authenticated User API. This endpoint provides information about the authenticated user’s profile. To make a request to the Authenticated User API, you can pass your GitHub username and password in a tuple to get():
>>>
>>> from getpass import getpass
>>> requests.get('https://api.github.com/user', auth=('username', getpass()))
<Response [200]>
The request succeeded if the credentials you passed in the tuple to auth are valid. If you try to make this request with no credentials, you’ll see that the status code is 401 Unauthorized:
>>>
>>> requests.get('https://api.github.com/user')
<Response [401]>
When you pass your username and password in a tuple to the auth parameter, requests is applying the credentials using HTTP’s Basic access authentication scheme under the hood.
Therefore, you could make the same request by passing explicit Basic authentication credentials using HTTPBasicAuth:
>>>
>>> from requests.auth import HTTPBasicAuth
>>> from getpass import getpass
>>> requests.get(
... 'https://api.github.com/user',
... auth=HTTPBasicAuth('username', getpass())
... )
<Response [200]>
Though you don’t need to be explicit for Basic authentication, you may want to authenticate using another method. requests provides other methods of authentication out of the box such as HTTPDigestAuth and HTTPProxyAuth.
You can even supply your own authentication mechanism. To do so, you must first create a subclass of AuthBase. Then, you implement __call__():
import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Implements a custom authentication scheme."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        """Attach an API token to a custom auth header."""
        r.headers['X-TokenAuth'] = f'{self.token}'  # Python 3.6+
        return r

requests.get('https://httpbin.org/get', auth=TokenAuth('12345abcde-token'))
Here, your custom TokenAuth mechanism receives a token, then includes that token in the X-TokenAuth header of your request.
Bad authentication mechanisms can lead to security vulnerabilities, so unless a service requires a custom authentication mechanism for some reason, you’ll always want to use a tried-and-true auth scheme like Basic or OAuth.
While you’re thinking about security, let’s consider dealing with SSL Certificates using requests.
SSL Certificate Verification
Any time the data you are trying to send or receive is sensitive, security is important. The way that you communicate with secure sites over HTTP is by establishing an encrypted connection using SSL, which means that verifying the target server’s SSL Certificate is critical.
The good news is that requests does this for you by default. However, there are some cases where you might want to change this behavior.
If you want to disable SSL Certificate verification, you pass False to the verify parameter of the request function:
>>>
>>> requests.get('https://api.github.com', verify=False)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
<Response [200]>
requests even warns you when you’re making an insecure request to help you keep your data safe!
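Conversely, if the service you’re calling uses an internal certificate authority, you can point verify at a CA bundle file instead of disabling verification entirely (the URL and path below are placeholders):
import requests

response = requests.get(
    'https://internal.example.com',
    verify='/path/to/internal-ca-bundle.pem',  # placeholder path to a CA bundle
)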
Performance
When using requests, especially in a production application environment, it’s important to consider performance implications. Features like timeout control, sessions, and retry limits can help you keep your application running smoothly.
Timeouts
When you make an inline request to an external service, your system will need to wait upon the response before moving on. If your application waits too long for that response, requests to your service could back up, your user experience could suffer, or your background jobs could hang.
By default, requests will wait indefinitely on the response, so you should almost always specify a timeout duration to prevent these things from happening. To set the request’s timeout, use the timeout parameter. timeout can be an integer or float representing the number of seconds to wait on a response before timing out:
>>>
>>> requests.get('https://api.github.com', timeout=1)
<Response [200]>
>>> requests.get('https://api.github.com', timeout=3.05)
<Response [200]>
In the first request, the request will timeout after 1 second. In the second request, the request will timeout after 3.05 seconds.
You can also pass a tuple to timeout with the first element being a connect timeout (the time it allows for the client to establish a connection to the server), and the second being a read timeout (the time it will wait on a response once your client has established a connection):
>>>
>>> requests.get('https://api.github.com', timeout=(2, 5))
<Response [200]>
If the request establishes a connection within 2 seconds and receives data within 5 seconds of the connection being established, then the response will be returned as it was before. If the request times out, then the function will raise a Timeout exception:
import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://api.github.com', timeout=1)
except Timeout:
    print('The request timed out')
else:
    print('The request did not time out')
Your program can catch the Timeout exception and respond accordingly.
The Session Object
Until now, you’ve been dealing with high level requests APIs such as get() and post(). These functions are abstractions of what’s going on when you make your requests. They hide implementation details such as how connections are managed so that you don’t have to worry about them.
Underneath those abstractions is a class called Session. If you need to fine-tune your control over how requests are being made or improve the performance of your requests, you may need to use a Session instance directly.
Sessions are used to persist parameters across requests. For example, if you want to use the same authentication across multiple requests, you could use a session:
import requests
from getpass import getpass

# By using a context manager, you can ensure the resources used by
# the session will be released after use
with requests.Session() as session:
    session.auth = ('username', getpass())

    # Instead of requests.get(), you'll use session.get()
    response = session.get('https://api.github.com/user')

# You can inspect the response just like you did before
print(response.headers)
print(response.json())
Each time you make a request with session, once it has been initialized with authentication credentials, the credentials will be persisted.
The primary performance optimization of sessions comes in the form of persistent connections. When your app makes a connection to a server using a Session, it keeps that connection around in a connection pool. When your app wants to connect to the same server again, it will reuse a connection from the pool rather than establishing a new one.
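Sessions can persist more than authentication. For example, headers set on the session are merged into every request made through it; here is a small sketch (URLs arbitrary):
import requests

with requests.Session() as session:
    session.headers.update({'Accept': 'application/json'})

    first = session.get('https://httpbin.org/headers')
    second = session.get('https://httpbin.org/get')

    # Both requests carried the session-level Accept header
    print(first.request.headers['Accept'])
    print(second.request.headers['Accept'])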
Max Retries
When a request fails, you may want your application to retry the same request. However, requests will not do this for you by default. To apply this functionality, you need to implement a custom Transport Adapter.
Transport Adapters let you define a set of configurations per service you’re interacting with. For example, let’s say you want all requests to https://api.github.com to retry three times before finally raising a ConnectionError. You would build a Transport Adapter, set its max_retries parameter, and mount it to an existing Session:
import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError

github_adapter = HTTPAdapter(max_retries=3)

session = requests.Session()

# Use `github_adapter` for all requests to endpoints that start with this URL
session.mount('https://api.github.com', github_adapter)

try:
    session.get('https://api.github.com')
except ConnectionError as ce:
    print(ce)
When you mount the HTTPAdapter
, github_adapter
, to session
, session
will adhere to its configuration for each request to https://api.github.com.
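If you also want control over retry delays and which responses get retried, max_retries accepts a urllib3 Retry object instead of a plain integer. This goes a bit beyond the example above; a hedged sketch:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to three times, with a growing delay between attempts,
# and only for the listed server-error status codes.
retry_strategy = Retry(
    total=3,
    backoff_factor=0.5,                      # roughly 0.5s, 1s, 2s between attempts
    status_forcelist=[500, 502, 503, 504],   # retry these server errors
)
adapter = HTTPAdapter(max_retries=retry_strategy)

session = requests.Session()
session.mount('https://api.github.com', adapter)
response = session.get('https://api.github.com')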
Timeouts, Transport Adapters, and sessions are for keeping your code efficient and your application resilient.
Conclusion
You’ve come a long way in learning about Python’s powerful requests
library.
You’re now able to:
- Make requests using a variety of different HTTP methods such as GET, POST, and PUT
- Customize your requests by modifying headers, authentication, query strings, and message bodies
- Inspect the data you send to the server and the data the server sends back to you
- Work with SSL Certificate verification
- Use requests effectively using max_retries, timeout, Sessions, and Transport Adapters
Because you learned how to use requests
, you’re equipped to explore the wide world of web services and build awesome applications using the fascinating data they provide.
Contents
- Introduction
- Creating GET and POST requests
- Passing parameters in the URL
- Response content
- Binary response content
- JSON response content
- Raw response content
- Custom headers
- More complex POST requests
- POSTing a multipart-encoded file
- Response status codes
- Response headers
- Cookies
- Redirects and history
- Timeouts
- Errors and exceptions
Introduction
The python requests module is the de facto standard for working with HTTP requests.
It frees you from having to deal with low-level details, making work with requests simple and elegant.
This tutorial covers the most useful functions of the requests library and the different ways of using them.
Before using the module, you need to install it:
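Assuming you use pip, the standard installation command is:
pip install requests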
Creating GET and POST requests
First, add the Requests module to your code. Then let's create a request and get a response containing the page and all the data we need about it.
import requests
response = requests.get('https://www.google.ru/')
The response to the request ends up in the response variable. Through this object you can access any information related to the response.
Making a POST request is just as easy:
import requests
response = requests.post('https://www.google.ru/', data={'foo': 3})
Other kinds of HTTP requests, for example PUT, DELETE, and so on, are no harder to perform:
import requests
response = requests.put('https://www.google.ru/', data={'foo': 3})
response = requests.delete('https://www.google.ru/')
response = requests.head('https://www.google.ru/')
response = requests.options('https://www.google.ru/')
Passing parameters in the URL
Sometimes you need to send additional data along with the request URL. When a URL is built by hand, the parameters appear as key=value pairs after the "?" sign, for example https://www.google.ru/search?q=Python. The Requests module lets you pass these parameters as a dictionary via the params argument. If you want to pass q=Python and foo='bar' to the google.ru/search resource, you would use the following code:
import requests
params_dict = {'q': 'Python', 'foo': 'bar'}
response = requests.get('https://www.google.ru/search', params=params_dict)
print(response.url)
# Output:
https://www.google.ru/search?q=Python&foo=bar
Here we can see that the URL was built exactly as intended.
A key=value pair whose value is None will not be added to the URL query string.
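A quick sketch of that behavior (the foo parameter is dropped because its value is None):
import requests

# foo is None, so it is left out of the query string entirely
params_dict = {'q': 'Python', 'foo': None}
response = requests.get('https://www.google.ru/search', params=params_dict)
print(response.url)
# Output:
https://www.google.ru/search?q=Python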
You can also pass a list of values for a parameter:
import requests
params_dict = {'q': 'Python', 'foo': ['bar', 'eggs']}
response = requests.get('https://www.google.ru/search', params=params_dict)
print(response.url)
# Output:
https://www.google.ru/search?q=Python&foo=bar&foo=eggs
Response content
The code in the previous listing creates a Response object containing the server's reply to our request. Its .url attribute shows the address the request was sent to, and the .text attribute lets you view the content of the response. Here is how that works:
import requests
params_dict = {'q': 'Python'}
response = requests.get('https://www.google.ru/search', params=params_dict)
print(response.text)
# Output: <!doctype html><html lang="ru"><head><meta charset="UTF-8"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png"…
The library automatically tries to determine the response encoding based on its headers. You can find out which encoding the module chose like this:
import requests
params_dict = {'q': 'Python'}
response = requests.get('https://www.google.ru/search', params=params_dict)
print(response.encoding)
# Output:
windows-1251
You can also set the encoding yourself using the .encoding attribute:
import requests
params_dict = {'q': 'Python'}
response = requests.get('https://www.google.ru/search', params=params_dict)
response.encoding = 'utf-8'  # set the desired encoding manually
print(response.encoding)
# Output:
utf-8
Binary response content
You can also view the response as bytes:
import requests
params_dict = {'q': 'Python'}
response = requests.get('https://www.google.ru/search', params=params_dict)
print(response.content)
# Output:
b'<!doctype html><html lang="ru"><head><meta charset="UTF-8"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" …
If the response was sent compressed, it is automatically decoded for you.
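The .content attribute is handy, for example, when saving a binary response body such as an image to disk. A minimal sketch (the favicon URL and file name here are only assumptions for illustration):
import requests

response = requests.get('https://www.google.ru/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(response.content)  # write the raw bytes to a file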
JSON response content
Requests also has built-in handling for responses in JSON format:
import requests
import json
response = requests.get('http://api.open-notify.org/astros.json')
print(json.dumps(response.json(), sort_keys=True, indent=4))
# Output:
{
    "message": "success",
    "number": 10,
    "people": [
        {
            "craft": "ISS",
            "name": "Mark Vande Hei"
        },
        {
            "craft": "ISS",
            "name": "Oleg Novitskiy"
        },
        ...
If the response is not JSON, then .json() will raise an exception:
import requests
import json
response = requests.get('https://www.google.ru/search')
print(json.dumps(response.json(), sort_keys=True, indent=4))
# Output:
...
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
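If you cannot be sure the body is JSON, you can guard the call. A defensive sketch (depending on the requests version, .json() raises json.JSONDecodeError or requests.exceptions.JSONDecodeError, both of which inherit from ValueError):
import requests

response = requests.get('https://www.google.ru/search')
try:
    data = response.json()
except ValueError:
    # The body was not valid JSON; fall back gracefully
    data = None
    print('The response body is not valid JSON')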
Raw response content
If you need access to the raw server response at the socket level, use the .raw attribute. To do this, pass the stream=True parameter in the request; it makes the module read the data as it arrives.
import requests
response = requests.get('https://www.google.ru/', stream=True)
print(response.raw)
print('Q'*10)
print(response.raw.read(15))
# Output:
<urllib3.response.HTTPResponse object at 0x000001E368771FA0>
QQQQQQQQQQ
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\xc5[[s\xdb'
You can also use the .iter_content() method. It iterates over the streamed response data, which avoids reading the entire content into memory at once for large responses. The chunk_size parameter is the number of bytes to read into memory at a time, and you can change it as needed.
import requests
response = requests.get('https://www.google.ru/', stream=True)
print(response.iter_content)
print('Q'*10)
print([i for i in response.iter_content(chunk_size=256)])
# Output:
<bound method Response.iter_content of <Response [200]>>
QQQQQQQQQQ
[b'<!doctype html><html itemscope="" itemtype="http://sche', b'ma.org/WebPage" lang="ru"><head><meta content=…
response.iter_content will automatically decode a compressed response, while Response.raw is the pure stream of bytes, the unmodified response content.
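A common use of .iter_content() is to stream a large body straight to a file so it never has to fit in memory all at once. A minimal sketch (the output file name is just an assumption):
import requests

response = requests.get('https://www.google.ru/', stream=True)
with open('page.html', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # skip keep-alive chunks
            f.write(chunk)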
Custom headers
If you need to set headers on an HTTP request, pass a dictionary of them in the headers parameter. Header values must be of type string, bytestring, or unicode. Header names are not case-sensitive.
In the following example we set information about the browser being used:
import requests
response = requests.get('https://www.google.ru/', headers={'user-agent': 'unknown_browser'})
print(response.request.headers)
# Output:
{'user-agent': 'unknown_browser', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
More complex POST requests
There is a way to send data as if it were the result of filling out a form on a website:
import requests
response = requests.post('https://httpbin.org/post', data={'foo': 'bar'})
print(response.text)
# Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "foo": "bar"
  },
  "headers": {
  ...
The data parameter can have any number of values for each key. To do this, pass data as a list of tuples, or as a dict whose values are lists.
import requests
response = requests.post('https://httpbin.org/post', data={'foo': ['bar', 'eggs']})
print(response.json()['form'])
print('|'*10)
response = requests.post('https://httpbin.org/post', data=[('foo', 'bar'), ('foo', 'eggs')])
print(response.json()['form'])
# Output:
{'foo': ['bar', 'eggs']}
||||||||||
{'foo': ['bar', 'eggs']}
If you need to send data that is not form-encoded, pass a string to the request instead of a dictionary. The data will then be sent in its original form.
import requests
response = requests.post('https://httpbin.org/post', data={'foo': 'bar'})
print('URL:', response.request.url)
print('Body:', response.request.body)
print('-' * 10)
response = requests.post('https://httpbin.org/post', data='foo=bar')
print('URL:', response.request.url)
print('Body:', response.request.body)
# Output:
URL: https://httpbin.org/post
Body: foo=bar
----------
URL: https://httpbin.org/post
Body: foo=bar
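Besides form data and raw strings, requests can also serialize a dictionary to a JSON body for you via the json parameter, which additionally sets the Content-Type: application/json header. A short sketch using the same httpbin.org endpoint:
import requests

# The dict is serialized to JSON and sent as the request body;
# httpbin echoes it back under the 'json' key.
response = requests.post('https://httpbin.org/post', json={'foo': 'bar'})
print(response.json()['json'])
# Output:
{'foo': 'bar'}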
Post отправка multipart encoded файла
Запросы упрощают загрузку файлов с многостраничным кодированием (Multipart-Encoded):
import requests
url = 'https://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
response = requests.post(url, files=files)
print(response.text)
# Вывод:
{
...
"files": {
"file": "<censored...binary...data>"
},
...
}
You can set the file name, content_type, and headers explicitly:
import requests
url = 'https://httpbin.org/post'
files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
response = requests.post(url, files=files)
print(response.text)
# Output:
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}
You can also send strings that will be received as files:
import requests
url = 'https://httpbin.org/post'
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
response = requests.post(url, files=files)
print(response.text)
# Output:
{
  ...
  "files": {
    "file": "some,data,to,send\nanother,row,to,send\n"
  },
  ...
}
Response status codes
Perhaps the most important piece of data you can get with the requests library (and certainly the first) is the response status code.
A 200 status means the request completed successfully, while a 404 status means the resource was not found.
What matters most is the digit the status code starts with:
- 1XX — informational
- 2XX — success
- 3XX — redirection
- 4XX — client error (an error on our side)
- 5XX — server error (the scariest codes for a developer)
Using the .status_code attribute you can get the status the server returned:
import requests
response = requests.get('https://www.google.ru/')
print(response.status_code)
# Output:
200
.status_code returned 200, which means the request completed successfully and the server returned the requested data.
If you like, you can use this information in your Python script to make decisions:
import requests
response = requests.get('https://www.google.ru/')
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('The page has gone missing somewhere...')
# Output:
Success!
If the response status code is 200, the script prints "Success!", but if it is 404 the script prints "The page has gone missing somewhere...".
If you use the Response object in a conditional expression and check its truth value (if response), it evaluates to True when the status code is between 200 and 400, and False in all other cases.
Let's simplify the code from the previous example:
import requests
response = requests.get('https://www.google.ru/fake/')
if response:
    print('Success!')
else:
    print('Houston, we have a problem!')
# Output:
Houston, we have a problem!
This approach does not check that the status code is exactly 200.
The reason is that responses with codes in the 200-400 range, such as 204 and 304, are also successful, since they return a usable response. This approach therefore only divides requests into successful and unsuccessful ones, nothing more. In many cases you will need more detailed handling of status codes.
You can raise an exception when requests.get fails. You can build such a construct by calling .raise_for_status() inside a try-except block:
import requests
from requests.exceptions import HTTPError

for url in ['https://www.google.ru/', 'https://www.google.ru/invalid']:
    try:
        response = requests.get(url)
        response.raise_for_status()
    except HTTPError:
        print(f'An HTTP error occurred: {HTTPError}')
    except Exception as err:
        print(f'An unexpected error occurred: {err}')
    else:
        print('Success!')
# Output:
Success!
An HTTP error occurred: <class 'requests.exceptions.HTTPError'>
Response headers
We can inspect the headers of the server's response:
import requests
response = requests.get('https://www.google.ru/')
print(response.headers)
# Output:
{'Date': 'Sun, 27 Jun 2021 13:43:17 GMT', 'Expires': '-1', 'Cache-Control': 'private, max-age=0', 'Content-Type': 'text/html; charset=windows-1251', 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."', 'Content-Encoding': 'gzip', 'Server': 'gws', 'X-XSS-Protection': '0', 'X-Frame-Options': …
Cookies
You can view the cookies the server sends back to you using the .cookies attribute. Requests also lets you send your own cookies.
To add cookies to a request, pass a dict to the cookies parameter.
import requests
url = 'https://www.google.ru/'
headers = {'user-agent': 'your-own-user-agent/0.0.1'}
cookies = {'visit-month': 'February'}
response = requests.get(url, headers=headers, cookies=cookies)
print(response.request.headers)
# Output:
{'user-agent': 'your-own-user-agent/0.0.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Cookie': 'visit-month=February'}
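Cookies can also be persisted across requests with a Session: cookies set by the server on one request are sent back automatically on the next. A small sketch, assuming the httpbin.org test endpoints purely for illustration:
import requests

with requests.Session() as session:
    # The first request sets a cookie; the session stores it
    session.get('https://httpbin.org/cookies/set/visit-month/February')
    # The second request sends the stored cookie back automatically
    response = session.get('https://httpbin.org/cookies')
    print(response.json())
# Output:
{'cookies': {'visit-month': 'February'}}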
Redirects and history
By default, the Requests module follows redirects for all HTTP verbs except HEAD.
You can use the history attribute of the Response object to track redirects.
For example, GitHub redirects all HTTP requests to HTTPS (see the sketch after this example). In the request below no redirect occurs, so the history is empty:
import requests
response = requests.get('https://www.google.ru/')
print(response.url)
print(response.status_code)
print(response.history)
# Output:
https://www.google.ru/
200
[]
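To actually see a redirect in the history, request a URL that gets redirected. A minimal sketch with GitHub, which upgrades HTTP to HTTPS (the exact status code in the history may vary):
import requests

response = requests.get('http://github.com/')
print(response.url)          # the final URL after the redirect
print(response.status_code)  # status of the final response
print(response.history)      # the intermediate redirect response(s)
# Output:
https://github.com/
200
[<Response [301]>]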
Timeouts
You can just as easily control how long the program will wait for a response. The waiting time is set with the timeout parameter. This is a very important parameter: if you do not use it, your script may hang forever waiting for a reply from the server. Let's reuse the previous code:
import requests
response = requests.get('https://www.google.ru/', timeout=0.001)
print(response.url)
print(response.status_code)
print(response.history)
# Output:
...
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='www.google.ru', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001E331681C70>, 'Connection to www.google.ru timed out. (connect timeout=0.001)'))
The module does not wait for the full response to download. The exception is raised if the server does not respond (with at least one byte) within the specified time.
Errors and exceptions
If an unexpected situation arises, such as a connection error, the Requests module raises a ConnectionError exception.
response.raise_for_status() raises an HTTPError if an error occurred while processing the request. It is used for debugging and is therefore an integral part of working with Python requests.
If the request runs out of time, a Timeout exception is raised. If there are too many redirects, a TooManyRedirects exception is raised.
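Putting those exceptions together, a hedged sketch of a handler that distinguishes them (RequestException is the common base class, so it works as a catch-all):
import requests
from requests.exceptions import ConnectionError, Timeout, TooManyRedirects, RequestException

try:
    response = requests.get('https://www.google.ru/', timeout=3)
    response.raise_for_status()
except ConnectionError:
    print('A connection error occurred')
except Timeout:
    print('The request timed out')
except TooManyRedirects:
    print('Too many redirects')
except RequestException as err:
    # Any other requests error, including HTTPError from raise_for_status()
    print(f'An unexpected requests error occurred: {err}')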
I recently worked on a scraper for a client who needed to fetch about a million records from a real-estate website. Past a certain point the scraper stopped working, because I had forgotten to add certain checks; I assumed the client would not go in that direction, but they did!
A few days later I started thinking about writing the scraper in Python using Beautifulsoup. In this article I want to discuss how to make your scraper friendlier to people who are not particularly familiar with the technical side.
1. Check for status code 200
It is always good to check the HTTP status code early and to do so periodically. Here is a good example:
import requests

r = requests.get('https://google.com')
if r.status_code == 200:
    print('Everything is fine!')
if r.status_code == 404:
    print('The page does not exist!')
2. Never trust the HTML
The HTML can change at any time, especially when you cannot control it. Web scraping depends on the HTML DOM: a simple change to an element or class can break the whole script. The best way to deal with this is to check whether the selector returned None or not.
page_count = soup.select('.pager-pages > li > a')
if page_count:
    pass  # Everything is fine, carry on...
else:
    pass  # Error! Send a notification to the admin.
Here I check whether the CSS selector returned anything legitimate; if it did, we continue.
3. Set the headers
Python Requests does not force you to use headers when sending requests, but there are several clever sites that will not let you read anything important unless certain headers are present.
I once ran into a situation where the HTML I saw in the browser differed from the one my script received. So making your requests as correct as you can is very good practice. The least you should do is set a User-Agent.
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'
}
r = requests.get(url, headers=headers, timeout=5)
4. Set a timeout
One of the problems with Python Requests is that if you do not specify a timeout, it will wait for the site's response until the very end. That may be fine under certain conditions, but not in most cases. It is therefore always good to set a timeout value for every request. Here I set the timeout to 5 seconds:
r = requests.get(url, headers=headers, timeout=5)
5. Handle errors
It is always good to implement error handling. Not only does it help you avoid the script exiting unexpectedly, it also helps you keep a log of errors and notifications. When using Python requests, I prefer to catch errors as follows:
For example:
import requests

try:
    # The logic of our scraper goes here.
    r = requests.get('https://python-scripts.com')
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
except KeyboardInterrupt:
    print("Someone closed the program")
Look at the last part of the code. It tells the program that if someone wants to stop it with Ctrl+C, the work done so far is wrapped up first and only then does the program finish. This is handy if you are storing information in a file and want to dump it all at the moment of exit.
6. Handle files efficiently
One of a scraper's jobs is to store the collected data, whether in a database or in plain files such as CSV/text. If you are collecting a large amount of data, that does not mean the I/O operation should sit inside a loop. Let's look at how it is done.
Let's try:
import requests

try:
    property_urls = []
    property_urls.extend(a_func_return_record())
except requests.ConnectionError as e:
    print("Oops!! Internet connection error.")
    print(str(e))
except requests.Timeout as e:
    print("Oops!! The request timed out.")
    print(str(e))
except requests.RequestException as e:
    print("Oops!! An unexpected error occurred!")
    print(str(e))
except KeyboardInterrupt:
    print("Someone force-closed the program.")
finally:
    print("Total Records = " + str(len(property_urls)))
    try:
        # file for storing the URLs
        record_file = open('records_file.txt', 'a+')
        record_file.write("\n".join(property_urls))
        record_file.close()
    except Exception as ex:
        print("An error occurred while saving the data, error text:")
        print(str(ex))
Here I call a function (though you are not obliged to do the same) that appends records to a list. Once that is done, or the program is interrupted, before terminating it simply saves the whole list to a file in one go. Much better than several I/O operations.
I hope this article was useful to you. Please share your experience of how you make your scrapers more efficient!