Core functionality
Example Configuration
user www www;
worker_processes 2;

error_log /var/log/nginx-error.log info;

events {
    use kqueue;
    worker_connections 2048;
}

...
Directives
Syntax: | accept_mutex |
---|---|
Default: | accept_mutex off; |
Context: | events |
If accept_mutex is enabled, worker processes accept new connections in turn.
Otherwise, all worker processes are notified about new connections at once, and when the rate of new connections is low, some of the worker processes may run idle.
There is no need to enable accept_mutex on systems that support the EPOLLEXCLUSIVE flag (1.11.3) or when using reuseport.
Prior to version 1.11.3, the default value was on.
Syntax: | accept_mutex_delay |
---|---|
Default: | accept_mutex_delay 500ms; |
Context: | events |
If accept_mutex is enabled, sets the maximum time during which a worker process will try to resume accepting new connections if another worker process is currently accepting new connections.
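As a rough illustration of how the two directives fit together, the fragment below turns the mutex on and shortens the retry delay. The values are illustrative assumptions, not recommendations.

```nginx
events {
    # Let worker processes take turns accepting new connections
    # (the default has been off since 1.11.3).
    accept_mutex on;

    # A worker that did not get the mutex tries again within 100 ms
    # (illustrative value; the default is 500ms).
    accept_mutex_delay 100ms;

    worker_connections 1024;
}
```

On systems with EPOLLEXCLUSIVE support, or when reuseport is used, the text above suggests this block is usually unnecessary.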
Syntax: | daemon |
---|---|
Default: | daemon on; |
Context: | main |
Determines whether nginx should run as a daemon.
Mainly used during development.
Syntax: | debug_connection |
---|---|
Default: | none |
Context: | events |
Enables the debugging log for selected client connections.
Other connections use the logging level set by the error_log directive.
Debugged connections are specified by an IPv4 or IPv6 (1.3.0, 1.2.1) address or network.
A connection may also be specified using a hostname.
For connections over UNIX-domain sockets (1.3.0, 1.2.1), the debugging log is enabled by the “unix:” parameter.
events {
    debug_connection 127.0.0.1;
    debug_connection localhost;
    debug_connection 192.0.2.0/24;
    debug_connection ::1;
    debug_connection 2001:0db8::/32;
    debug_connection unix:;
    ...
}
For this directive to work, nginx must be built with --with-debug, see “A debugging log”.
Syntax: | debug_points |
---|---|
Default: | none |
Context: | main |
This directive is used for debugging.
When an internal error is detected, for example a socket leak at the moment worker processes are restarted, enabling debug_points leads to the creation of a core file (abort) or to stopping the process (stop) for further analysis with a system debugger.
Syntax: | env |
---|---|
Default: | env TZ; |
Context: | main |
By default, nginx removes all environment variables inherited from its parent process except the TZ variable.
This directive allows preserving some of the inherited variables, changing their values, or creating new environment variables.
These variables are then:
- inherited during a live upgrade of the executable file;
- used by the ngx_http_perl_module module;
- used by worker processes.
Keep in mind that controlling system libraries in this way is not always possible, because libraries often read variables only during initialization, that is, before they can be set using this directive.
An exception is the live upgrade of the executable file mentioned above.
If the TZ variable is not configured explicitly, it is always inherited and always available to the ngx_http_perl_module module.
Usage example:
env MALLOC_OPTIONS;
env PERL5LIB=/data/site/modules;
env OPENSSL_ALLOW_PROXY_CERTS=1;
The NGINX environment variable is used internally by nginx and should not be set directly by the user.
Syntax: | error_log |
---|---|
Default: | error_log logs/error.log error; |
Context: | main, http, mail, stream, server, location |
Configures logging.
Several logs can be specified on the same configuration level (1.5.2).
If writing a log to a file is not explicitly defined on the main configuration level, the default file is used.
The first parameter defines a file that will store the log.
The special value stderr selects the standard error file.
Logging to syslog is configured by specifying the “syslog:” prefix.
Logging to a cyclic memory buffer is configured by specifying the “memory:” prefix and the buffer size, and is generally used for debugging (1.7.11).
The second parameter determines the level of logging and can be one of the following: debug, info, notice, warn, error, crit, alert, or emerg.
The log levels above are listed in order of increasing severity.
Setting a certain log level causes all messages of the specified level and of more severe levels to be logged.
For example, the default level error causes error, crit, alert, and emerg messages to be logged.
If this parameter is omitted, error is used.
For the debug level to work, nginx must be built with --with-debug, see “A debugging log”.
The directive can be specified on the stream level starting from version 1.7.11, and on the mail level starting from version 1.9.0.
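The destinations and levels described above can be sketched in one fragment. This is an illustration under assumptions: the file paths and the syslog socket path are examples, and the debug level only takes effect in a binary built with --with-debug.

```nginx
# main context: file destination, messages of level warn and above
error_log /var/log/nginx/error.log warn;

# A second log on the same level (allowed since 1.5.2):
# keep recent debug output in a 32 MB cyclic memory buffer.
error_log memory:32m debug;

events {}

http {
    server {
        # Per-server log sent to a local syslog socket (example path),
        # notice and above
        error_log syslog:server=unix:/var/log/nginx.sock notice;
    }
}
```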
Syntax: | events { ... } |
---|---|
Default: | none |
Context: | main |
Provides the configuration file context in which the directives that affect connection processing are specified.
Syntax: | include |
---|---|
Default: | none |
Context: | any |
Includes another file, or files matching the specified mask, into the configuration.
Included files should consist of syntactically correct directives and blocks.
Usage example:
include mime.types;
include vhosts/*.conf;
Syntax: | load_module |
---|---|
Default: | none |
Context: | main |
This directive appeared in version 1.9.11.
Loads a dynamic module.
Example:
load_module modules/ngx_mail_module.so;
Syntax: | lock_file |
---|---|
Default: | lock_file logs/nginx.lock; |
Context: | main |
nginx uses a locking mechanism to implement accept_mutex and to serialize access to shared memory.
On most systems the locks are implemented using atomic operations, and this directive is ignored.
On other systems the lock file mechanism is used.
This directive specifies a prefix for the names of lock files.
Syntax: | master_process |
---|---|
Default: | master_process on; |
Context: | main |
Determines whether worker processes are started.
This directive is intended for nginx developers.
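Because both daemon and master_process are described as development aids, a development session might combine them to keep nginx in the foreground as a single process, for example under a debugger. A minimal sketch, assuming it is never used in production:

```nginx
# Development only: stay in the foreground and do not start worker processes
daemon off;
master_process off;

events {}
```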
Syntax: | multi_accept |
---|---|
Default: | multi_accept off; |
Context: | events |
If multi_accept is disabled, a worker process will accept one new connection at a time.
Otherwise, a worker process will accept all new connections at once.
The directive is ignored if the kqueue connection processing method is used, because that method itself reports the number of new connections waiting to be accepted.
Syntax: | pcre_jit |
---|---|
Default: | pcre_jit off; |
Context: | main |
This directive appeared in version 1.1.12.
Enables or disables the use of just-in-time compilation (PCRE JIT) for the regular expressions known by the time of configuration parsing.
PCRE JIT can speed up the processing of regular expressions significantly.
JIT requires the PCRE library version 8.20 or later, built with the --enable-jit configuration parameter.
When the PCRE library is built together with nginx (--with-pcre=), JIT support is enabled with the --with-pcre-jit configuration parameter.
Syntax: | pid |
---|---|
Default: | pid logs/nginx.pid; |
Context: | main |
Defines a file that will store the process ID (PID) of the main process.
Syntax: | ssl_engine |
---|---|
Default: | none |
Context: | main |
Defines the name of the hardware SSL accelerator.
Syntax: | thread_pool |
---|---|
Default: | thread_pool default threads=32 max_queue=65536; |
Context: | main |
This directive appeared in version 1.7.11.
Defines the name and parameters of a thread pool used for multi-threaded reading and sending of files without blocking worker processes.
The threads parameter defines the number of threads in the pool.
If all threads in the pool are busy, a new task will wait in the queue.
The max_queue parameter limits the number of tasks allowed to wait in the queue.
By default, up to 65536 tasks can wait in the queue.
When the queue overflows, the task is completed with an error.
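To show where a named pool would be referenced, here is a minimal sketch. The pool name file_io, the sizes, and the /downloads/ location are hypothetical, and the fragment assumes a build with thread support so that the aio directive of the HTTP core module accepts threads=.

```nginx
# main context: a named pool (the name "file_io" is hypothetical)
thread_pool file_io threads=16 max_queue=1024;

events {}

http {
    server {
        location /downloads/ {
            # Offload file reads for this location to the pool above
            aio threads=file_io;
        }
    }
}
```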
Syntax: | timer_resolution |
---|---|
Default: | none |
Context: | main |
Reduces timer resolution in worker processes, thus reducing the number of gettimeofday() system calls made.
By default, gettimeofday() is called after each operation that retrieves events from the kernel.
With reduced resolution, gettimeofday() is called only once per specified interval.
Usage example:
timer_resolution 100ms;
The internal implementation of the interval depends on the method used:
- the EVFILT_TIMER filter if kqueue is used;
- timer_create() if eventport is used;
- setitimer() otherwise.
Syntax: | use |
---|---|
Default: | none |
Context: | events |
Specifies the connection processing method to use.
There is normally no need to specify it explicitly, because nginx by default selects the most efficient method.
Syntax: | user |
---|---|
Default: | user nobody nobody; |
Context: | main |
Defines the user and group credentials used by worker processes.
If group is omitted, a group whose name equals that of user is used.
Syntax: | worker_aio_requests |
---|---|
Default: | worker_aio_requests 32; |
Context: | events |
This directive appeared in versions 1.1.4 and 1.0.7.
When using aio with the epoll connection processing method, sets the maximum number of outstanding asynchronous I/O operations for a single worker process.
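A hedged sketch of how this limit might sit next to the aio directive on Linux with epoll. The value 64, the /video/ location, and the directio threshold are illustrative assumptions.

```nginx
events {
    # Allow up to 64 outstanding AIO operations per worker
    # (illustrative value; the default is 32)
    worker_aio_requests 64;
}

http {
    server {
        location /video/ {
            # On Linux, kernel AIO is used together with directio
            aio on;
            directio 512;
        }
    }
}
```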
Syntax: | worker_connections |
---|---|
Default: | worker_connections 512; |
Context: | events |
Sets the maximum number of simultaneous connections that can be opened by a worker process.
Keep in mind that this number includes all connections (for example, connections with proxied servers), not only connections with clients.
Also note that the actual number of simultaneous connections cannot exceed the current limit on the maximum number of open files, which can be changed by worker_rlimit_nofile.
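The relationship between the connection limit and the open-file limit can be sketched as follows. The numbers are illustrative assumptions; the only point is that worker_rlimit_nofile is kept at or above worker_connections.

```nginx
worker_processes 4;

# Raise the per-worker open-file limit above worker_connections,
# since log files and other descriptors also count against it.
worker_rlimit_nofile 8192;

events {
    # Up to 4096 connections per worker, including upstream ones
    worker_connections 4096;
}
```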
Syntax: | worker_cpu_affinity |
---|---|
Default: | none |
Context: | main |
Binds worker processes to sets of CPUs.
Each CPU set is represented by a bitmask of the allowed CPUs.
A separate set should be defined for each worker process.
By default, worker processes are not bound to any specific CPUs.
For example,
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;
binds each worker process to a separate CPU, while
worker_processes 2;
worker_cpu_affinity 0101 1010;
binds the first worker process to CPU0/CPU2, and the second worker process to CPU1/CPU3.
The second example is suitable for hyper-threading.
The special value auto (1.9.10) allows binding worker processes automatically to available CPUs:
worker_processes auto;
worker_cpu_affinity auto;
The optional mask parameter can be used to limit the CPUs available for automatic binding:
worker_cpu_affinity auto 01010101;
The directive is only available on FreeBSD and Linux.
Syntax: | worker_priority |
---|---|
Default: | worker_priority 0; |
Context: | main |
Defines the scheduling priority for worker processes, as is done by the nice command: a negative number means higher priority.
The allowed range normally varies from -20 to 20.
Usage example:
worker_priority -10;
Syntax: | worker_processes |
---|---|
Default: | worker_processes 1; |
Context: | main |
Defines the number of worker processes.
The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and the load pattern.
When in doubt, setting it to the number of available CPU cores is a good start (the value “auto” tries to detect it automatically).
The auto parameter is supported starting from versions 1.3.8 and 1.2.5.
Syntax: | worker_rlimit_core |
---|---|
Default: | none |
Context: | main |
Changes the limit on the largest size of a core file (RLIMIT_CORE) for worker processes.
Used to increase the limit without restarting the main process.
Syntax: | worker_rlimit_nofile |
---|---|
Default: | none |
Context: | main |
Changes the limit on the maximum number of open files (RLIMIT_NOFILE) for worker processes.
Used to increase the limit without restarting the main process.
Syntax: | worker_shutdown_timeout |
---|---|
Default: | none |
Context: | main |
This directive appeared in version 1.11.11.
Configures a timeout, in seconds, for a graceful shutdown of worker processes.
When the time expires, nginx will try to close all currently open connections to speed up the shutdown.
Syntax: | working_directory |
---|---|
Default: | none |
Context: | main |
Defines the current working directory for a worker process.
It is primarily used when writing a core file, in which case a worker process should have write permission for the specified directory.
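The core-file directives described in this reference (working_directory, worker_rlimit_core, debug_points) are typically used together. A minimal sketch, assuming a hypothetical directory that the worker user can write to and an arbitrary size limit:

```nginx
# Directory must be writable by the worker user (path is an assumption)
working_directory /var/tmp/nginx-cores;

# Allow core files of up to 500 MB for worker processes
worker_rlimit_core 500m;

# Dump core when nginx detects an internal error
debug_points abort;
```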
NGINX logging is often overlooked as a critical part of the web service and is commonly referenced only when an error occurs. But it’s important to understand from the beginning how to configure NGINX logs and what information is considered most important.
Within NGINX there are two types of logs available, the error log and the access log. How then would you configure the error and access logs and in what format should be used? Read on to learn all about how NGINX logging works!
Prerequisites
To follow along with this tutorial, it is necessary to have a recent working NGINX installation, ideally version 1.21.1 or higher. In this tutorial, Ubuntu is used to host the NGINX installation. To view formatted JSON log file output in the terminal you may want to install the jq utility.
Learning the NGINX Logging System
The NGINX logging system has quite a few moving parts. Logging is made up of log formats (how logs are stored) and an NGINX configuration file (nginx.conf) to enable and tune how logs are generated.
First, let’s cover the NGINX configuration file. An NGINX configuration file defines a hierarchy of sections that are referred to as contexts within the NGINX documentation. These contexts are made up of a combination of the following, although not all available contexts are listed below.
- The “main” context is the root of the nginx.conf file
- The http context
- Multiple server contexts
- Multiple location contexts
Inside one or more of these contexts is where you can define access_log and error_log configuration items, or directives. A logging directive defines how NGINX is supposed to record logs under each context.
NGINX Logs Logging Directive Structure
Logging directives are defined under each context with the log name, the location to store the log, and the level of log data to store.
<log name> <log location> <logging level>;
- Log Location – You can store logs in three different areas: a file, e.g. /var/log/nginx/error.log; syslog, e.g. syslog:server=unix:/var/log/nginx.sock; or a cyclic memory buffer, e.g. memory:32m.
- Logging Levels – The available levels are debug, info, notice, warn, error, crit, alert, or emerg, with the default being error. The debug level may not be available unless NGINX was compiled with the --with-debug flag.
Allowed Logging Directive Contexts
Both the error_log and access_log directives are allowed only in certain contexts. error_log is allowed in the main, http, mail, stream, server, and location contexts, while the access_log directive is allowed in the http, server, location, if in location, and limit_except contexts.
Logging directives override higher-up directives. For example, the error_log directive specified in a location context will override the same directive specified in the http context.
The example configuration below contains several of these directives defined in different contexts.
# Log to a file on disk with all errors of the level warn and higher
error_log /var/log/nginx/error.log warn;
http {
    access_log /var/log/nginx/access.log combined;
    server {
        access_log /var/log/nginx/domain1.access.log combined;
        location / {
            # Log to a local syslog server as a local7 facility, tagged as nginx, and with the level of notice and higher
            error_log syslog:server=unix:/var/log/nginx.sock,facility=local7,tag=nginx notice;
        }
    }
    server {
        access_log /var/log/nginx/domain2.access.log combined;
        location / {
            # Log all info and higher error messages directly into memory, but max out at 32 MB
            error_log memory:32m info;
        }
    }
}
Log Formats and the Access Log Directive
Beyond just NGINX error logs, each access request to NGINX is logged. An access request could be anything from requesting a web page to a specific image. As you might surmise, there is a lot of data that can be included in the logged requests.
To record general NGINX request activity, NGINX relies on access logs using the access_log directive. Unlike the error_log directive, which has a standard format, you can configure NGINX access logs to store in a particular format.
The Default access_log Log Format
NGINX can record access log data in many different ways through log formats. By default, that log format is called combined. When you don’t specify a log format in the NGINX configuration file, NGINX will log all requests according to the following schema.
'$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"'
Below, you can see an example of the combined format in practice.
127.0.0.1 - - [10/Oct/2020:15:10:20 -0600] "HEAD / HTTP/1.1" 200 0 "https://example.com" "Mozilla/5.0..."
Defining Custom access_log Formats via log_format Directive
The default combined NGINX log format may work perfectly well for your needs, but what if you would like to add additional data, such as upstream service information, or write the log in JSON format instead? You’ll need to define a custom log format using the log_format directive.
The log_format directive allows you to define multiple different access_log formats to be used across the various contexts in the configuration file.
The example below defines a JSON logging format with many different fields and variables. You may choose which fields to include.
Check out all available variables via the NGINX documentation.
The json text displayed after the log_format directive is merely the name that is referenced by any access_log directive that wishes to use this format. By using log_format, multiple logging output formats may be defined and used by any combination of access_log directives throughout the NGINX configuration file.
log_format json escape=json '{ "time": "$time_iso8601", '
'"remote_addr": "$remote_addr", '
'"remote_user": "$remote_user", '
'"ssl_protocol_cipher": "$ssl_protocol/$ssl_cipher", '
'"body_bytes_sent": "$body_bytes_sent", '
'"request_time": "$request_time", '
'"status": "$status", '
'"request": "$request", '
'"request_method": "$request_method", '
'"http_referrer": "$http_referer", '
'"http_x_forwarded_for": "$http_x_forwarded_for", '
'"http_cf_ray": "$http_cf_ray", '
'"host": "$host", '
'"server_name": "$server_name", '
'"upstream_address": "$upstream_addr", '
'"upstream_status": "$upstream_status", '
'"upstream_response_time": "$upstream_response_time", '
'"upstream_response_length": "$upstream_response_length", '
'"upstream_cache_status": "$upstream_cache_status", '
'"http_user_agent": "$http_user_agent" }';
The log_format directive may only be used in the http context, but it can be referenced by any access_log directive regardless of location.
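To make the placement rule concrete, the sketch below declares a format once in the http context and refers to it from two lower contexts. The format name json_min, the shortened field list, the log paths, and the /api/ location are hypothetical.

```nginx
events {}

http {
    # Formats are declared once, in the http context.
    # "json_min" is a hypothetical name for a shortened format.
    log_format json_min escape=json '{ "time": "$time_iso8601", '
                                    '"status": "$status", '
                                    '"request": "$request" }';

    server {
        # Any access_log below the http level can refer to it by name
        access_log /var/log/nginx/domain1.access.log json_min;

        location /api/ {
            access_log /var/log/nginx/domain1.api.log json_min;
        }
    }
}
```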
Escaping Log Output
When a log format emits JSON, for example, variable values may contain characters such as double quotes that would break the output. To handle that, the log_format directive accepts an escape parameter with the values default, json, and none. If the escape parameter is omitted, the default escaping is used.
- default – Double quotes ("), backslashes (\), and all characters with values less than 32 or greater than 126 are escaped as “\xXX”. If a variable has no value, a hyphen (-) is logged.
- json – All characters not allowed in JSON strings are escaped.
- none – All escaping of values is disabled.
The log_format example above applies the json escape format (escape=json) to every variable it logs.
Configuring access_log Directives
For NGINX to begin recording access activity using the log format you defined earlier, you must enable it inside the NGINX configuration file using the access_log directive, much like the error_log directive.
An example of a typical access_log directive is shown below. The first line sends access logs in the json log_format, as previously defined, to a file (/var/log/nginx/domain.access.log). The second line uses the special off parameter, which disables access logging in the specific context where the directive is included.
access_log /var/log/nginx/domain.access.log json;
access_log off;
Perhaps you have defined an access_log
for a domain. How would you go about seeing the output from the below directive?
access_log /var/log/nginx/domain.access.log json;
To demonstrate NGINX sending log output as defined by the access_log
directive, first run the Linux cat
command to grab the file contents and pipe the output to the tail
command to show only a single line. Then finally, pass the single line to the jq
utility to nicely format the JSON output.
cat /var/log/nginx/domain.access.log | tail -n 1 | jq
Like error_log, access_log can write to syslog using the syslog: prefix in addition to the standard file output. The cyclic memory buffer (memory:) is available only for error_log.
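As a hedged illustration of the syslog destination for access logs, the fragment below reuses the socket path, facility, and tag from the earlier error_log example. The severity value is an assumption added for illustration.

```nginx
events {}

http {
    server {
        # Send access entries to a local syslog socket instead of a file
        access_log syslog:server=unix:/var/log/nginx.sock,facility=local7,tag=nginx,severity=info combined;
    }
}
```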
Configuring NGINX to Buffer Disk Writes
Since access logging typically produces far more output than error logging, NGINX includes additional options for compressing and buffering log data before it is written to disk, but they are not enabled by default. To avoid constant disk writes and potential request blocking while the web server waits for disk IO, tell NGINX to buffer disk writes.
An example of an access_log directive defining the gzip, buffer, and flush parameters is shown below. The log format name (json here) must come before these parameters.
access_log /var/log/nginx/domain.access.log json gzip=7 buffer=64k flush=3m;
- buffer – A buffer temporarily stores data before sending it elsewhere. The default buffer size is 64k, which you can change by specifying a size with the parameter, i.e. buffer=32k.
- gzip – Defines the level of GZIP compression to use from 1 to 9, with 9 being the slowest but highest level of compression. gzip defaults to level 1, while gzip=9 sets the compression to the highest level.
If you use gzip but not buffer, writes are buffered by default. Because GZIP-compressed log entries cannot be streamed to disk one at a time, buffering is required.
- flush – To avoid holding on to in-memory logs indefinitely for infrequently accessed sites, specify a flush time after which any buffered logging data is written to disk. For example, with flush=5m you force all logged data to be written to disk after five minutes, even if the buffer has not filled.
Logging Access Entries Conditionally
There are times when you will only want to log a particular access request. For example, instead of logging all requests including HTTP/200 (successful requests), perhaps you’d like to only log HTTP/404 (file not found requests). If so, you can define a logging condition in the access_log
directive using the if
parameter.
With the if= parameter of the access_log directive, a request is logged only when the associated variable evaluates to something other than “0” or an empty string.
As an example, perhaps you’d like to force NGINX to log only requests whose HTTP status code starts with a 4.
In the NGINX configuration file:
Define a map directive to assign a variable the value of either 0 or 1 depending on the evaluated condition. The first regular expression matches all HTTP statuses that do not start with 4. The default condition is the fallback for all values that do not match it.
map $status $logged {
    ~^[1235] 0;
    default 1;
}
The map directive must be defined at the http context level. You may use the map directive's output variable, shown below as $logged, further in the configuration file; it is not confined to the http context level.
Once you have defined the map directive, which assigns a value of 0 or 1 to the $logged variable, you can then use this variable in conditions as shown below. Here, using the if parameter, you’re telling NGINX to log activity to the access_404.log file only when the response status starts with 4.
access_log /var/log/nginx/access_404.log json if=$logged;
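Put together, the map and the conditional access_log might sit in a configuration like the sketch below. It assumes the built-in combined format rather than the custom json format so that the fragment stands on its own, and it adds an unconditional log next to the filtered one purely for comparison.

```nginx
events {}

http {
    # 0 for 1xx, 2xx, 3xx and 5xx statuses, 1 for everything else
    map $status $logged {
        ~^[1235]  0;
        default   1;
    }

    server {
        # Regular log keeps every request
        access_log /var/log/nginx/access.log combined;

        # Second log receives only entries where $logged is not 0 or empty
        access_log /var/log/nginx/access_404.log combined if=$logged;
    }
}
```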
Conclusion
Now that you know how to log errors and access requests in a variety of ways, you can start monitoring your NGINX installation for issues and also for user-facing problems.
What’s next? Try taking the results of the logs and ingesting those into a SIEM application for analysis!
Nginx is an open-source, high-performance HTTP and reverse proxy server responsible for handling the load of some of the largest sites on the Internet. When managing NGINX web servers, one of the most frequent tasks you’ll perform is checking the log files.
Knowing how to configure and read the logs is very useful when troubleshooting server or application issues, as they provide detailed debugging information.
Nginx writes records of its events in two types of logs: access logs and error logs. Access logs record information about client requests, and error logs record information about server and application issues.
This article explains how to configure and read the Nginx access and error logs.
Configuring the Access Log
Whenever a client request is processed, Nginx generates a new event in the access log. Each event record contains a timestamp and includes various information about the client and the requested resource. Access logs can show you the location of the visitors, the page they visit, how much time they spend on the page, and much more.
The log_format directive allows you to define the format of the logged messages. The access_log directive enables logging and sets the location of the log file and the format used.
The most basic syntax of the access_log directive is as follows:
access_log log_file log_format;
Where log_file is the full path to the log file, and log_format is the format used by the log file.
The access log can be enabled in the http, server, or location block.
By default, the access log is globally enabled in the http block inside the main Nginx configuration file.
/etc/nginx/nginx.conf
http {
    ...
    access_log /var/log/nginx/access.log;
    ...
}
For readability, it is recommended to create a separate access log file for each server block. The access_log directive set in the server block overrides the one set in the http (higher level) block.
/etc/nginx/conf.d/domain.com.conf
http {
    ...
    access_log /var/log/nginx/access.log;
    ...
    server {
        server_name domain.com;
        access_log /var/log/nginx/domain.access.log;
        ...
    }
}
If no log format is specified, Nginx uses the predefined combined format, which looks like this:
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
To change the logging format, override the default setting or define a new one. For example, to define a new logging format named custom that extends the combined format with the value of the X-Forwarded-For header, add the following definition in the http block (log_format is allowed only in the http context):
log_format custom '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';
To use the new format, specify its name after the log file as shown below:
access_log /var/log/nginx/access.log custom;
While the access log provides very useful information, it takes disk space and may affect server performance. If your server is low on resources and you have a busy website, you might want to disable the access log. To do that, set the value of the access_log directive to off:
access_log off;
Configuring the Error Log
Nginx writes messages about application and general server errors to the error log file. If you are experiencing errors in your web application, the error log is the first place to start troubleshooting.
The error_log directive enables the error log and sets its location and severity level. It takes the following form and can be set within an http, server, or location block:
error_log log_file log_level;
The log_level parameter sets the level of logging. The levels are listed below in order of severity (from low to high):
- debug - Debugging messages.
- info - Informational messages.
- notice - Notices.
- warn - Warnings.
- error - Errors while processing a request.
- crit - Critical issues. Requires prompt action.
- alert - Alerts. Action must be taken immediately.
- emerg - Emergency situation. The system is in an unusable state.
Each log level includes the higher levels. For example, if you set the log level to warn, Nginx will also log the error, crit, alert, and emerg messages.
If the log_level parameter is not specified, it defaults to error.
By default, the error_log directive is defined in the http block inside the main nginx.conf file:
/etc/nginx/nginx.conf
http {
    ...
    error_log /var/log/nginx/error.log;
    ...
}
As with access logs, it is recommended to create a separate error log file for each server block, which overrides the setting inherited from the higher levels.
For example, to set the error log of domain.com to the warn level, you would use:
http {
    ...
    error_log /var/log/nginx/error.log;
    ...
    server {
        server_name domain.com;
        error_log /var/log/nginx/domain.error.log warn;
        ...
    }
}
Whenever you modify the configuration file, you have to restart the Nginx service for the changes to take effect.
Location of the Log Files
By default, on most Linux distributions, such as Ubuntu, CentOS, and Debian, the access and error logs are located in the /var/log/nginx directory.
Reading and Understanding the Nginx Log Files
You can open and parse the log files using standard commands such as cat, less, grep, cut, awk, and so on.
Here is an example record from an access log file that uses the default Nginx combined log format:
192.168.33.1 - - [15/Oct/2019:19:41:46 +0000] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
Let’s break down what each field of the record means:
- $remote_addr - 192.168.33.1 - The IP address of the client making the request.
- $remote_user - - - The HTTP-authenticated user. When the user name is not set, this field shows -.
- [$time_local] - [15/Oct/2019:19:41:46 +0000] - The local server time.
- "$request" - "GET / HTTP/1.1" - The request type, path, and protocol.
- $status - 200 - The server response code.
- $body_bytes_sent - 396 - The size of the server response in bytes.
- "$http_referer" - "-" - The URL of the referral.
- "$http_user_agent" - Mozilla/5.0 ... - The user agent of the client (web browser).
Use the tail command to watch the log file in real time:
tail -f access.log
Conclusion
Log files provide useful information about server issues and about how visitors interact with your website.
Nginx allows you to configure the access and error logs according to your needs.
Если у вас есть какие-либо вопросы или отзывы, не стесняйтесь оставлять комментарии.
NGINX is one of the most widely used reverse proxy servers, web servers, and load balancers. It has capabilities like TLS offloading, can do health checks for backends, and offers support for HTTP2, gRPC, WebSocket, and most TCP-based protocols.
When running a tool like NGINX, which generally sits in front of your applications, it’s important to understand how to debug issues. And because you need to see the logs, you have to understand the different NGINX logging mechanisms. In addition to the errors in your application or web server, you need to look into NGINX performance issues, as they can lead to SLA breaches, negative user experience, and more.
In this article, we’ll explore the types of logs that NGINX provides and how to properly configure them to make troubleshooting easier.
What Are NGINX Logs?
NGINX logs are the files that contain information related to the tasks performed by the NGINX server, such as who tried to access which resources and whether any errors or issues occurred.
NGINX provides two types of logs: access logs and error logs. Before we show you how to configure them, let’s look at the possible log types and different log levels.
Here is the most basic NGINX configuration:
http {
    server {
        listen 80;
        server_name example.com www.example.com;
        access_log /var/log/nginx/access.log combined;
        root /var/www/virtual/big.server.com/htdocs;
    }
}
This server listens on port 80 and answers to the names example.com and www.example.com. The access_log directive sets the path and format of the access log, and the root directive defines the directory from which files are served.
What Are NGINX Access Logs?
NGINX access logs are files that have the information of all the resources that a client is accessing on the NGINX server, such as details about what is being accessed and how it responded to the requests, including client IP address, response status code, user agent, and more. All requests sent to NGINX are logged into NGINX logs just after the requests are processed.
Here are some important NGINX access log fields you should be aware of:
- remote_addr: The IP address of the client that requested the resource
- http_user_agent: The user agent in use that sent the request
- time_local: The local time of the server when the request was logged
- request: What resource was requested by the client (an API path or any file)
- status: The status code of the response
- body_bytes_sent: The size of the response in bytes
- request_time: The total time spent processing the request
- remote_user: The user name supplied with HTTP Basic authentication
- http_referer: The value of the Referer request header (the URL of the referring page)
- gzip_ratio: The compression ratio of gzip, if gzip is enabled
NGINX Access Log Location
You can find the access logs in the logs/access.log file and change their location by using the access_log directive in the NGINX configuration file.
access_log path [format [buffer=size] [gzip[=level]] [flush=time] [if=condition]];
access_log /var/log/nginx/access.log combined;
By changing the path field in the access_log directive, you can also change where you want to save your access logs.
An NGINX access log configuration can be overridden by another configuration at a lower level. For example:
http {
    access_log /var/log/nginx/access.log main;
    server {
        listen 8000;
        location /health {
            access_log off; # <----- this WILL work
            proxy_pass http://app1server;
        }
    }
}
Here, any calls to /health will not be logged, as the access logs are disabled for this path. All the other calls will be logged to the access log. There is a global config, as well as different local configs. The same goes for the other configurations that are in the NGINX config files.
How to Enable NGINX Access Logs
Most of the time, NGINX access logs are enabled by default. To enable them manually, you can use the access_log directive as follows:
access_log /var/log/nginx/access.log combined;
The first parameter is the location of the file, and the second is the log format. If you put the access_log directive in any of the server blocks, it will enable access logging for that server.
Setting Up NGINX Custom Log Format
To easily predefine the NGINX access log format and use it along with the access_log directive, use the log_format directive:
log_format upstream_time '$remote_addr - $remote_user [$time_local] '
                         '"$request" $status $body_bytes_sent '
                         '"$http_referer" "$http_user_agent" '
                         'rt=$request_time uct="$upstream_connect_time" '
                         'uht="$upstream_header_time" urt="$upstream_response_time"';
Most of the fields here are self-explanatory, but if you want to learn more, look up the NGINX logging documentation. You define log formats in the http context of the /etc/nginx/nginx.conf file and then use them in a server context.
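To show what the timing fields in the upstream_time format give you, here is a minimal Python sketch that pulls rt, uct, uht, and urt out of a log line. The sample line is made up for illustration, and the helper name parse_timings is an assumption of this sketch. It also assumes a single upstream per request: when NGINX contacts several upstreams, it logs several comma-separated values, and this sketch keeps only the first.

```python
import re

# Matches rt=0.120 as well as the quoted forms uct="0.004", uht="-", etc.
TIMING_RE = re.compile(r'\b(rt|uct|uht|urt)="?([\d.]+|-)"?')


def parse_timings(line):
    """Extract rt/uct/uht/urt values (in seconds) from one log line.

    A value of "-" (no upstream was contacted) is returned as None.
    """
    timings = {}
    for key, value in TIMING_RE.findall(line):
        timings[key] = None if value == "-" else float(value)
    return timings


# Hypothetical tail of a log line written with the upstream_time format.
sample = '"-" "curl/7.68.0" rt=0.120 uct="0.004" uht="0.118" urt="0.118"'
timings = parse_timings(sample)
print(timings)
```

Comparing rt with urt is a quick way to see whether time was spent in NGINX itself or in the backend.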
By default, NGINX access logs are written in a combined format, which looks something like this:
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
Once you have defined the log formats, you can use them with the access_log directive, like in the following examples:
server {
    access_log /var/log/nginx/access.log combined;
    access_log /var/log/nginx/access.log upstream_time; # defined in the first format
    …
}
Formatting the Logs as JSON
Logging to JSON is useful when you want to ship the NGINX logs, as JSON makes log parsing very easy. Since you have key-value information, it will be simpler for the consumer to understand. Otherwise, the parser has to understand the format NGINX is logging.
NGINX 1.11.8 and later support the escape=json parameter, which helps you define the NGINX JSON log format. For example:
log_format json_combined escape=json
    '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request":"$request",'
        '"status": "$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"http_referrer":"$http_referer",'
        '"http_user_agent":"$http_user_agent",'
        '"request_time":"$request_time"'
    '}';
You can now use this predefined log format in JSON with the access_log directive to get the logs in JSON.
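On the consumer side, a JSON log line needs no custom parser. Below is a minimal Python sketch, assuming one JSON object per line with the keys from the json_combined format above. The sample line and the helper name load_entry are made up for illustration. Because the format quotes every variable, all values arrive as strings, so the sketch converts the numeric ones.

```python
import json

# Hypothetical line as the json_combined format would write it.
line = (
    '{"time_local":"14/Feb/2022:11:25:44 +0000","remote_addr":"203.0.113.7",'
    '"remote_user":"","request":"GET /index.html HTTP/1.1","status": "200",'
    '"body_bytes_sent":"396","http_referrer":"","http_user_agent":"curl/7.68.0",'
    '"request_time":"0.003"}'
)


def load_entry(raw):
    """Parse one JSON log line and convert numeric fields from strings."""
    entry = json.loads(raw)
    entry["status"] = int(entry["status"])
    entry["body_bytes_sent"] = int(entry["body_bytes_sent"])
    entry["request_time"] = float(entry["request_time"])
    return entry


entry = load_entry(line)
print(entry["status"], entry["body_bytes_sent"], entry["request_time"])
```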
You can also use an open-source NGINX module, like https://github.com/jiaz/nginx-http-json-log, to do the JSON logging.
Configuring NGINX Conditional Logging
Sometimes, you want to write logs only when a certain condition is met. NGINX calls this conditional logging. For example:
map $remote_addr $log_enable {
    "192.168.4.1" 0;
    "192.168.4.2" 0;
    "192.168.4.3" 0;
    "192.168.4.4" 0;
    default 1;
}
access_log /var/log/nginx/access.log combined if=$log_enable;
This means that whenever the request comes from the IPs 192.168.4.1 to 192.168.4.4, the access logs will not be populated. For every other IP, the logs will be recorded.
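The decision NGINX makes here can be sketched in a few lines of Python. This is only an illustration of the logic, not how NGINX implements it: log_enable mimics the map block, and should_log mimics the if= parameter, which skips a request when the condition evaluates to "0" or an empty string. Both function names are assumptions of this sketch.

```python
# Addresses that the map block sets to 0 (do not log).
SKIPPED = {"192.168.4.1", "192.168.4.2", "192.168.4.3", "192.168.4.4"}


def log_enable(remote_addr):
    """Mimic the map: listed addresses give "0", everything else "1"."""
    return "0" if remote_addr in SKIPPED else "1"


def should_log(condition):
    """Mimic if=condition: skip the entry when the value is "0" or empty."""
    return condition not in ("0", "")


for addr in ("192.168.4.2", "192.168.4.5"):
    print(addr, should_log(log_enable(addr)))
```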
You can use conditional logging with NGINX in multiple scenarios. For example, if you are under attack and can identify the IPs of the attacker, you can log the requests to a different file. This allows you to process the file and get relevant information about the attack later.
How to View NGINX Access Logs
Linux utilities such as less and tail let you view NGINX logs easily. You can find the location of the NGINX access logs in the configuration files. On newer systems that run systemd, journalctl can follow the logs of the nginx service:
journalctl -fu nginx.service
You can also tail the log locations, as shown here:
tail -f /var/log/nginx/access.log
Keep in mind that journalctl shows all the logs of the service together, which can be a bit confusing.
How to Disable Access Logs
To disable an NGINX access log, pass the off argument to the access_log directive:
access_log off;
This can be useful when there are too many logs, which can overload the disk IO and, in rare cases, impact the performance of your NGINX server. However, disabling NGINX access logs is not usually recommended, as it can make troubleshooting difficult.
What Are NGINX Error Logs?
NGINX error logs are the files where all information about errors will be logged, including permission errors or any NGINX configuration-related access errors. While access logs are used to see the HTTP requests received by the server, error logs bring more value, as when there is an issue, they will show exactly what happened and provide detailed information about the issue.
Whenever there is an error with the requests, or when there are NGINX glitches, these issues will be recorded in the error log files configured in the NGINX configuration file.
Where Are the NGINX Error Logs Stored?
The location of NGINX error logs can be configured in the error_log directive in the NGINX configuration. By default, these logs are in the /var/log/nginx directory. You can configure the location separately for different server components that you can run in the NGINX configuration.
The default location is:
/var/log/nginx/error.log
NGINX Error Logs Configuration
NGINX error logs configuration is in the same place as access_log. You can use the error_log directive to enable and configure the log levels and the location of the log file. Here is the configuration line to enable the error_log:
error_log log_file_location log_level;
NGINX Error Log Levels
NGINX has eight log levels for different degrees of severity and verbosity:
- emerg: These are the emergency logs. They mean that the system is unusable.
- alert: An immediate action is required.
- crit: A critical condition occurred.
- error: An error or failure occurred while processing a request.
- warn: There was an unexpected event, or something needs to be fixed, but NGINX fulfilled the request as expected.
- notice: Something normal, but important, has happened, and it needs to be noted.
- info: These are messages that give you information about the process.
- debug: These are messages that help with debugging and troubleshooting. They are generally not enabled unless needed because they create a lot of noise.
Note that the log_level parameter is a threshold: every log level also includes all of the more severe levels listed before it. For example, if you set the level to notice (the sixth entry in the list above), your logs will contain entries from emerg through notice.
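The threshold behavior can be illustrated with a short Python sketch. It only models the ordering of the levels listed above; the helper names is_logged and levels_written are made up for this example.

```python
# Levels in the order listed above, most severe first.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]


def is_logged(message_level, configured_level):
    """True when a message is at least as severe as the configured level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


def levels_written(configured_level):
    """All levels that end up in the log for a given configured level."""
    return LEVELS[: LEVELS.index(configured_level) + 1]


print(levels_written("notice"))
print(is_logged("info", "notice"))
```

With the level set to notice, info and debug messages are dropped, while warn, error, and everything more severe are written.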
Enable Debug Logging and Other Levels
You can specify the log level with the error_log directive using the log_level argument. As the log level number increases, the logs will contain more information. If the application misbehaves, you can enable the debug logs to aid you in the troubleshooting process. With the extra information they provide, you will be able to pinpoint the issue more easily. You can read about this more in the NGINX documentation.
Keeping NGINX debug logs enabled continuously is not recommended, as it will make logs very noisy and large by printing information that is generally unnecessary. If you see an issue, you can switch the log level to debug, reload NGINX, solve the problem, then revert to a stricter severity. Note that the debug level only works when NGINX was built with the --with-debug option.
Logging to Multiple Files
You can forward NGINX error logs to separate files based on the different log levels. In the configuration below, you send logs to all the specified log directives based on the log severity level.
error_log /var/log/nginx/error.info info;
error_log /var/log/nginx/error.crit crit;
This configuration can be very useful when looking at the different log levels separately or if you want your logging agent to label these logs based on filenames. You can selectively discard the error logs based on their severity.
How to Check NGINX Error Logs
You can view NGINX error logs the same way as access logs: for example, by using tail, less, or other utilities. Below is an example of how to do it with tail, using the error_log location that you have set. These logs are also present in the journalctl output, but there they are combined with access log entries.
tail -f /var/log/nginx/error.log
How to Disable Error Logs
Disabling NGINX error logs can be tricky, as there is no off option in error_log. Unlike access_log off, the following line does not disable logging; NGINX treats off as a file name and writes the error log to a file called off:
error_log off;
To discard error messages, forward the log to /dev/null instead:
error_log /dev/null;
How to Send NGINX Logs to Syslog
NGINX can also ship your logs to log aggregators using syslog. This can be useful when you already collect other system or service logs in syslog or use syslog to export logs. To do this, use the syslog: prefix in place of the file path in the access_log and error_log directives.
Syslog can help you concentrate your NGINX logs in one place by forwarding them to a centralized logging solution:
error_log syslog:server=unix:/var/log/nginx.sock debug;
You can also send the logs to different syslog servers by defining the syslog server parameter to point to the IP or hostname and port of the syslog server.
error_log syslog:server=192.168.100.1 debug;
access_log syslog:server=127.0.0.1:9992,facility=local1,tag=nginx,severity=debug;
In the access_log configuration above, the logs are forwarded to the local syslog server on port 9992 with the facility set to local1, since syslog has no dedicated facility for NGINX.
Syslog has various options for keeping the forwarded logs segregated:
- Facility: Identifies who is logging to syslog.
- Severity: Specifies the log levels.
- Tag: Identifies the message sender or any other information that you want to send; the default is nginx.
NGINX Logging in Kubernetes Environments
In Kubernetes, NGINX Ingress runs as a pod. All the logs for the NGINX Ingress pods are sent to standard output and standard error. However, to see them you have to log in to the pod or use kubectl commands, which is not a very practical solution.
You also have to find a way to ship the logs from the containers. You can do this with any logging agent that is running in the Kubernetes environment. These agents run as pods and mount the file system that NGINX runs on, reading the logs from there.
How to See the NGINX Ingress Logs
Use the kubectl logs command to see the NGINX logs as streams:
kubectl logs -f nginx-ingress-pod-name -n namespace
It’s important to understand that pods can come and go, so the approach to debugging issues in the Kubernetes environment is a bit different than in VM or baremetal-based environments. In Kubernetes, the logging agent should be able to discover the NGINX Ingress pods, then scrape the logs from there. Also, the log aggregator should show the logs of the pods that were killed and discover any new pod that comes online.
NGINX Logging and Analysis with Sematext
Sematext Logs is a log aggregation and management tool with great support for NGINX logs. Its auto-discovery feature is helpful, particularly when you have multiple machines. Simply create an account with Sematext, create the NGINX Logs App and install the Sematext Agent. Once you’re set up, you get pre-built, out-of-the-box dashboards and the option to build your own custom dashboards.
Sematext Logs is part of Sematext Cloud, a full-stack monitoring solution that gives you all you need when it comes to observability. By correlating NGINX logs and metrics, you’ll get a more holistic view of your infrastructure, which helps you identify and solve issues quickly.
Using anomaly-detection algorithms, Sematext Cloud informs you in advance of any potential issues. These insights into your infrastructure help you prevent issues and troubleshoot more efficiently. With Sematext Cloud, you can also collect logs and metrics from a wide variety of tools, including HAProxy, Apache Tomcat, JVM, and Kubernetes. By integrating with other components of your infrastructure, this tool is a one-stop solution for all your logging and monitoring needs.
If you’re interested in how Sematext compares to other log management tools, read our review of the top NGINX log analyzers.
Conclusion
Managing, troubleshooting, and debugging large-scale NGINX infrastructures can be challenging, especially if you don’t have a proper way of looking into logs and metrics. It’s important to understand NGINX access and error logs, but if you have hundreds of machines, this will take a substantial amount of time. You need to be able to see the logs aggregated in one place.
Performance issues are also more common than you think. For example, you may not see anything in the error logs, but your APIs continue to degrade. To look into this properly, you need effective dashboarding around NGINX performance metrics, like response code and response time.
Sematext Logs can help you tackle these problems so you can troubleshoot more quickly. Sign up for our free trial today.
Author Bio
Gaurav Yadav
Gaurav has been involved with systems and infrastructure for almost 6 years now. He has expertise in designing underlying infrastructure and observability for large-scale software. He has worked on Docker, Kubernetes, Prometheus, Mesos, Marathon, Redis, Chef, and many more infrastructure tools. He is currently working on Kubernetes operators for running and monitoring stateful services on Kubernetes. He also likes to write about and guide people in DevOps and SRE space through his initiatives Learnsteps and Letusdevops.
In this tutorial, you will learn everything you need to know about logging in
NGINX and how it can help you troubleshoot and quickly resolve any problem you
may encounter on your web server. We will discuss where the logs are stored and
how to access them, how to customize their format, and how to centralize them in
one place with Syslog or a log management service.
Here’s an outline of what you will learn by following through with this tutorial:
- Where NGINX logs are stored and how to access them.
- How to customize the NGINX log format and storage location to fit your needs.
- How to utilize a structured format (such as JSON) for your NGINX logs.
- How to centralize NGINX logs through Syslog or a managed cloud-based service.
Prerequisites
To follow through with this tutorial, you need the following:
- A Linux server that includes a non-root user with sudo privileges. We tested
  the commands shown in this guide on an Ubuntu 20.04 server.
- The NGINX web server installed and enabled on your server.
Step 1 — Locating the NGINX log files
NGINX writes logs of all its events in two different log files:
- Access log: this file contains information about incoming requests and
  user visits.
- Error log: this file contains information about errors encountered while
  processing requests, or other diagnostic messages about the web server.
The location of both log files depends on the host operating system of the
NGINX web server and the mode of installation. On most Linux distributions, both
files will be found in the /var/log/nginx/ directory as access.log and
error.log, respectively.
A typical access log entry might look like the one shown below. It describes an
HTTP GET request to the server for a favicon.ico file.
Output
217.138.222.101 - - [11/Feb/2022:13:22:11 +0000] "GET /favicon.ico HTTP/1.1" 404 3650 "http://135.181.110.245/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36" "-"
Similarly, an error log entry might look like the one below, which was generated
because the server could not locate the favicon.ico file that was requested
above.
Output
2022/02/11 13:12:24 [error] 37839#37839: *7 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 113.31.102.176, server: _, request: "GET /favicon.ico HTTP/1.1", host: "192.168.110.245:80"
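An error log entry follows a recognizable layout: timestamp, level in brackets, process and thread IDs, an optional connection number, and the message. Here is a minimal Python sketch that splits the entry above into those parts. The regex and the helper name parse_error_line are assumptions of this sketch, written against this one sample rather than every message NGINX can emit.

```python
import re

ERROR_RE = re.compile(
    r'(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>\w+)\] '
    r'(?P<pid>\d+)#(?P<tid>\d+): '
    r'(?:\*(?P<connection>\d+) )?'   # connection number, absent on some lines
    r'(?P<message>.*)'
)


def parse_error_line(line):
    """Split one error log line into its parts, or return None."""
    match = ERROR_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    client = re.search(r'client: ([^,]+)', fields["message"])
    fields["client"] = client.group(1) if client else None
    return fields


sample = (
    '2022/02/11 13:12:24 [error] 37839#37839: *7 open() '
    '"/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), '
    'client: 113.31.102.176, server: _, request: "GET /favicon.ico HTTP/1.1", '
    'host: "192.168.110.245:80"'
)
fields = parse_error_line(sample)
print(fields["level"], fields["pid"], fields["connection"], fields["client"])
```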
In the next section, you’ll see how to view both NGINX log files from the
command line.
Step 2 — Viewing the NGINX log files
You can examine the NGINX logs in a variety of ways. One of the most common
methods is to use the tail command to view log entries in real time:
sudo tail -f /var/log/nginx/access.log
You will observe the following output:
Output
107.189.10.196 - - [14/Feb/2022:03:48:55 +0000] "POST /HNAP1/ HTTP/1.1" 404 134 "-" "Mozila/5.0"
35.162.122.225 - - [14/Feb/2022:04:11:57 +0000] "GET /.env HTTP/1.1" 404 162 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0"
45.61.172.7 - - [14/Feb/2022:04:16:54 +0000] "GET /.env HTTP/1.1" 404 197 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
45.61.172.7 - - [14/Feb/2022:04:16:55 +0000] "POST / HTTP/1.1" 405 568 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
45.137.21.134 - - [14/Feb/2022:04:18:57 +0000] "GET /dispatch.asp HTTP/1.1" 404 134 "-" "Mozilla/5.0 (iPad; CPU OS 7_1_2 like Mac OS X; en-US) AppleWebKit/531.5.2 (KHTML, like Gecko) Version/4.0.5 Mobile/8B116 Safari/6531.5.2"
23.95.100.141 - - [14/Feb/2022:04:42:23 +0000] "HEAD / HTTP/1.0" 200 0 "-" "-"
217.138.222.101 - - [14/Feb/2022:07:38:40 +0000] "GET /icons/ubuntu-logo.png HTTP/1.1" 404 197 "http://168.119.119.25/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
217.138.222.101 - - [14/Feb/2022:07:38:42 +0000] "GET /favicon.ico HTTP/1.1" 404 197 "http://168.119.119.25/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
217.138.222.101 - - [14/Feb/2022:07:44:02 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
217.138.222.101 - - [14/Feb/2022:07:44:02 +0000] "GET /icons/ubuntu-logo.png HTTP/1.1" 404 197 "http://168.119.119.25/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
The tail command prints the last 10 lines from the selected file. The -f option
causes it to continue displaying subsequent lines that are added to the file in
real time.
To examine the entire contents of an NGINX log file, you can use the cat command
or open it in your text editor:
sudo cat /var/log/nginx/error.log
If you want to filter the lines that contain a specific term, you can use the
grep command as shown below:
sudo grep "GET /favicon.ico" /var/log/nginx/access.log
The command above will print all the lines that contain GET /favicon.ico so we
can see how many requests were made for that resource.
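When you need counts rather than raw lines, the same filtering can be done in a short script. This is a minimal Python sketch: count_matching behaves like counting grep matches, and requests_per_path tallies requests by path. The sample entries are shortened for readability (user agents trimmed), and both function names are made up for this example.

```python
from collections import Counter

# Sample access log entries with user agents shortened for readability.
lines = [
    '217.138.222.101 - - [14/Feb/2022:07:38:42 +0000] "GET /favicon.ico HTTP/1.1" 404 197 "-" "Mozilla/5.0"',
    '217.138.222.101 - - [14/Feb/2022:07:44:02 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"',
    '217.138.222.101 - - [14/Feb/2022:07:44:03 +0000] "GET /favicon.ico HTTP/1.1" 404 197 "-" "Mozilla/5.0"',
]


def count_matching(entries, needle):
    """Count the entries that contain needle as a plain substring."""
    return sum(1 for entry in entries if needle in entry)


def requests_per_path(entries):
    """Tally requests by path, taken from the quoted request field."""
    paths = Counter()
    for entry in entries:
        quoted = entry.split('"')
        if len(quoted) < 2:
            continue  # not an access log line
        parts = quoted[1].split()  # e.g. ['GET', '/favicon.ico', 'HTTP/1.1']
        if len(parts) >= 2:
            paths[parts[1]] += 1
    return paths


print(count_matching(lines, "GET /favicon.ico"))
print(requests_per_path(lines).most_common())
```

To run it against a real file, replace the lines list with the lines read from /var/log/nginx/access.log.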
Step 3 — Configuring NGINX access logs
The NGINX access log stores data about incoming client requests to the server
which is beneficial when deciphering what users are doing in the application,
and what resources are being requested. In this section, you will learn how to
configure what data is stored in the access log.
One thing to keep in mind while following the instructions below is that you’ll
need to restart the nginx service after modifying the config file so that the
changes can take effect.
sudo systemctl restart nginx
Enabling the access log
The NGINX access log should be enabled by default. However, if this is not the
case, you can enable it manually in the NGINX configuration file
(/etc/nginx/nginx.conf) using the access_log directive within the http block.
Output
http {
access_log /var/log/nginx/access.log;
}
This directive is also applicable in the server and location configuration
blocks for a specific website:
Output
server {
    access_log /var/log/nginx/app1.access.log;
    location /app2 {
        access_log /var/log/nginx/app2.access.log;
    }
}
Disabling the access log
In cases where you’d like to disable the NGINX access log, you can use the
special off value:
access_log off;
You can also disable the access log on a virtual server or specific URIs by
editing its server or location block configuration in the
/etc/nginx/sites-available/ directory:
Output
server {
    listen 80;
    access_log off;
    location ~* \.(woff|jpg|jpeg|png|gif|ico|css|js)$ {
        access_log off;
    }
}
Logging to multiple access log files
If you’d like to duplicate the access log entries in separate files, you can do
so by repeating the access_log directive in the main config file or in a server
block as shown below:
Output
access_log /var/log/nginx/access.log;
access_log /var/log/nginx/combined.log;
Don’t forget to restart the nginx service afterward:
sudo systemctl restart nginx
Explanation of the default access log format
The access log entries produced using the default configuration will look like
this:
Output
127.0.0.1 - alice [07/May/2021:10:44:53 +0200] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4531.93 Safari/537.36"
Here’s a breakdown of the log message above:
- 127.0.0.1: the IP address of the client that made the request.
- -: a literal hyphen that the default format always prints in this position.
- alice: the remote username supplied with HTTP Basic authentication (- is used
  when there is none).
- [07/May/2021:10:44:53 +0200]: date and time of the request.
- "GET / HTTP/1.1": request method, path and protocol.
- 200: the HTTP response code.
- 396: the size of the response in bytes.
- "-": the value of the Referer request header (- is used when it is not
  available).
- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4531.93 Safari/537.36":
  detailed user agent information.
Step 4 — Creating a custom log format
Customizing the format of the entries in the access log can be done using the
log_format directive, which must be placed in the http block. Here’s an example
of what it could look like:
Output
log_format custom '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent"';
This yields a log entry in the following format:
Output
217.138.222.109 - - [14/Feb/2022:10:38:35 +0000] "GET /favicon.ico HTTP/1.1" 404 197 "http://192.168.100.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
The syntax for configuring an access log format is shown below. First, you need
to specify a nickname for the format that will be used as its identifier, and
then the log format string that represents the details and formatting for each
log message.
Output
log_format <nickname> '<formatting_variables>';
Here’s an explanation of each variable used in the custom log format shown
above:
- $remote_addr: the IP address of the client.
- $remote_user: the user name supplied with HTTP Basic authentication.
- $time_local: the server’s local date and time.
- $request: the full request line (method, path, and protocol).
- $status: the response code.
- $body_bytes_sent: the size of the response body in bytes.
- $http_referer: the value of the Referer request header.
- $http_user_agent: detailed user agent information.
You may also use the following variables in your custom log format (the NGINX
documentation has the complete list):
- $upstream_connect_time: the time spent establishing a connection with an
  upstream server.
- $upstream_header_time: the time between establishing a connection and
  receiving the first byte of the response header from the upstream server.
- $upstream_response_time: the time between establishing a connection and
  receiving the last byte of the response body from the upstream server.
- $request_time: the total time spent processing a request.
- $gzip_ratio: the ratio of gzip compression (if gzip is enabled).
After you create a custom log format, you can apply it to a log file by
providing a second parameter to the access_log directive:
Output
access_log /var/log/nginx/access.log custom;
You can use this feature to log different information into separate log files.
Create the log formats first:
Output
log_format custom '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer"';
log_format agent "$http_user_agent";
Then, apply them as shown below:
Output
access_log /var/log/nginx/access.log custom;
access_log /var/log/nginx/agent_access.log agent;
This configuration ensures that user agent information for all incoming requests
is logged into a separate access log file.
Step 5 — Formatting your access logs as JSON
A common way to customize NGINX access logs is to format them as JSON. This is
quite straightforward to achieve by combining the log_format directive with the
escape=json parameter, introduced in NGINX 1.11.8, to escape characters that
are not valid in JSON:
Output
log_format custom_json escape=json
'{'
'"time_local":"$time_local",'
'"remote_addr":"$remote_addr",'
'"remote_user":"$remote_user",'
'"request":"$request",'
'"status": "$status",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_time":"$request_time",'
'"http_referrer":"$http_referer",'
'"http_user_agent":"$http_user_agent"'
'}';
After applying the custom_json format to a log file and restarting the nginx
service, you will observe log entries in the following format:
{
"time_local": "14/Feb/2022:11:25:44 +0000",
"remote_addr": "217.138.222.109",
"remote_user": "",
"request": "GET /icons/ubuntu-logo.png HTTP/1.1",
"status": "404",
"body_bytes_sent": "197",
"request_time": "0.000",
"http_referrer": "http://192.168.100.1/",
"http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
}
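Structured entries like the one above are easy to aggregate. The following is a minimal Python sketch that groups entries by status class and finds the slowest request. It assumes one JSON object per line with the status and request_time keys from the custom_json format; the three sample lines are made up for illustration and carry only the keys the sketch needs.

```python
import json
from collections import Counter

# Hypothetical log lines, reduced to the keys used below.
raw_lines = [
    '{"request":"GET / HTTP/1.1","status": "200","request_time":"0.004"}',
    '{"request":"GET /icons/ubuntu-logo.png HTTP/1.1","status": "404","request_time":"0.000"}',
    '{"request":"GET /api HTTP/1.1","status": "502","request_time":"1.250"}',
]

entries = [json.loads(raw) for raw in raw_lines]

# Group by status class: "200" -> "2xx", "404" -> "4xx", and so on.
by_class = Counter(entry["status"][0] + "xx" for entry in entries)

# request_time is logged as a string, so convert before comparing.
slowest = max(entries, key=lambda entry: float(entry["request_time"]))

print(dict(by_class))
print(slowest["request"], slowest["request_time"])
```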
Step 6 — Configuring NGINX error logs
Whenever NGINX encounters an error, it stores the event data in the error log so
that it can be referred to later by a system administrator. This section will
describe how to enable and customize the error logs as you see fit.
Enabling the error log
The NGINX error log should be enabled by default. However, if this is not the
case, you can enable it manually in the relevant NGINX configuration file
(either at the http, server, or location levels) using the error_log directive.
Output
error_log /var/log/nginx/error.log;
The error_log directive can take two parameters. The first one is the location
of the log file (as shown above), while the second one is optional and sets the
severity level of the log. Events with a lower severity level than the one set
will not be logged.
Output
error_log /var/log/nginx/error.log info;
These are the possible levels of severity (from lowest to highest) and their
meaning:
- debug: messages used for debugging.
- info: informational messages.
- notice: a notable event occurred.
- warn: something unexpected happened.
- error: something failed.
- crit: critical conditions.
- alert: errors that require immediate action.
- emerg: the system is unusable.
Disabling the error log
The error_log directive has no off value: error_log off; does not disable
logging but writes the log to a file named off. To discard error messages,
redirect the log to /dev/null instead:
Output
error_log /dev/null;
Logging errors into multiple files
As is the case with access logs, you can log errors into multiple files, and you
can use different severity levels too:
Output
error_log /var/log/nginx/error.log info;
error_log /var/log/nginx/emerg_error.log emerg;
This configuration will log every event except those at the debug level to the
error.log file, while emergency events are also written to a separate
emerg_error.log file.
Step 7 — Sending NGINX logs to Syslog
Apart from logging to a file, it’s also possible to set up NGINX to transport
its logs to the syslog service, especially if you’re already using it for other
system logs. Logging to syslog is done by specifying the syslog: prefix in
either the access_log or error_log directive:
Output
error_log syslog:server=unix:/var/log/nginx.sock debug;
access_log syslog:server=127.0.0.1:1234,facility=local7,tag=nginx,severity=info;
Log messages are sent to a server, which can be specified as a domain name, an IPv4 or IPv6 address, or a UNIX-domain socket path.
In the example above, error log messages are sent to a UNIX-domain socket at the debug logging level, while the access log is written to a syslog server with an IPv4 address and port 1234. The facility= parameter specifies the type of program that is logging the message, the tag= parameter applies a custom tag to syslog messages, and the severity= parameter sets the severity level of the syslog entry for access log messages.
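The ways of addressing a syslog server listed above can be sketched as follows. This is a minimal fragment rather than a complete configuration; the addresses and the socket path are placeholders, and the combined format name assumes the predefined access log format.

```nginx
http {
    # IPv4 address; the default syslog port is used when none is given
    error_log syslog:server=192.168.1.1 debug;

    # UNIX-domain socket (placeholder path); nohostname omits the hostname field
    access_log syslog:server=unix:/var/log/nginx.sock,nohostname;

    # IPv6 addresses go in square brackets, followed by the port
    access_log syslog:server=[2001:db8::1]:12345,facility=local7,tag=nginx,severity=info combined;
}
```

Square brackets are only needed for IPv6 literals; an IPv4 address or a domain name is written without them.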
For more information on using Syslog to manage your logs, you can check out our
tutorial on viewing and configuring system logs on
Linux.
Step 8 — Centralizing your NGINX logs
In this section, we’ll describe how you can centralize your NGINX logs in a log
management service through Vector, a
high-performance tool for building observability pipelines. This is a crucial
step when administrating multiple servers so that you can monitor all your logs
in one place (you can also centralize your logs with an Rsyslog
server).
The following instructions assume that you’ve signed up for a free
Logtail account and retrieved your source
token. Go ahead and follow the relevant
installation instructions for Vector
for your operating system. For example, on Ubuntu, you may run the following
commands to install the Vector CLI:
curl -1sLf 'https://repositories.timber.io/public/vector/cfg/setup/bash.deb.sh' | sudo -E bash
sudo apt install vector
After Vector is installed, confirm that it is up and running through systemctl:
sudo systemctl status vector
You should observe that it is active and running:
Output
● vector.service - Vector
Loaded: loaded (/lib/systemd/system/vector.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-02-08 10:52:59 UTC; 48s ago
Docs: https://vector.dev
Process: 18586 ExecStartPre=/usr/bin/vector validate (code=exited, status=0/SUCCESS)
Main PID: 18599 (vector)
Tasks: 3 (limit: 2275)
Memory: 6.8M
CGroup: /system.slice/vector.service
└─18599 /usr/bin/vector
Otherwise, go ahead and start it with the command below.
sudo systemctl start vector
Afterward, change into a root shell and append your Logtail vector configuration
for NGINX into the /etc/vector/vector.toml
file using the command below. Don’t
forget to replace the <your_logtail_source_token>
placeholder below with your
source token.
sudo -s
wget -O - >> /etc/vector/vector.toml https://logtail.com/vector-toml/nginx/<your_logtail_source_token>
Then restart the vector service:
sudo systemctl restart vector
Your NGINX logs will then start coming through in Logtail.
Conclusion
In this tutorial, you learned about the different types of logs that the NGINX
web server keeps, where you can find them, and how to understand their formatting.
We also discussed how to create your own custom log formats (including a
structured JSON format), and how to log into multiple files at once. Finally, we
demonstrated the process of sending your logs to Syslog or a log management
service so that you can monitor them all in one place.
Thanks for reading, and happy logging!
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
NGINX Basics
- ≡ NGINX Basics
- Directories and files
- Commands
- Processes
- CPU pinning
- Shutdown of worker processes
- Configuration syntax
- Comments
- End of lines
- Variables, Strings, and Quotes
- Directives, Blocks, and Contexts
- External files
- Measurement units
- Regular expressions with PCRE
- Enable syntax highlighting
- Connection processing
- Event-Driven architecture
- Multiple processes
- Simultaneous connections
- HTTP Keep-Alive connections
- sendfile, tcp_nodelay, and tcp_nopush
- Request processing stages
- Server blocks logic
- Handle incoming connections
- Matching location
- rewrite vs return
- URL redirections
- try_files directive
- if, break, and set
- root vs alias
- internal directive
- External and internal redirects
- allow and deny
- uri vs request_uri
- Compression and decompression
- What is the best NGINX compression gzip level?
- Hash tables
- Server names hash table
- Log files
- Conditional logging
- Manually log rotation
- Error log severity levels
- How to log the start time of a request?
- How to log the HTTP request body?
- NGINX upstream variables returns 2 values
- Reverse proxy
- Passing requests
- Trailing slashes
- Passing headers to the backend
- Importance of the Host header
- Redirects and X-Forwarded-Proto
- A warning about the X-Forwarded-For
- Improve extensibility with Forwarded
- Response headers
- Load balancing algorithms
- Backend parameters
- Upstream servers with SSL
- Round Robin
- Weighted Round Robin
- Least Connections
- Weighted Least Connections
- IP Hash
- Generic Hash
- Other methods
- Rate limiting
- Variables
- Directives, keys, and zones
- Burst and nodelay parameters
- NAXSI Web Application Firewall
- OWASP ModSecurity Core Rule Set (CRS)
- Core modules
- ngx_http_geo_module
- 3rd party modules
- ngx_set_misc
- ngx_http_geoip_module
Directories and files
If you compile NGINX with default parameters, all files and directories are available under the /usr/local/nginx location.
For upstream NGINX packaging, paths can be as follows (depending on the type of system/distribution):
- /etc/nginx: the default configuration root for the NGINX service
  - other locations: /usr/local/etc/nginx, /usr/local/nginx/conf
- /etc/nginx/nginx.conf: the default configuration entry point used by the NGINX services; includes the top-level http block and all other configuration contexts and files
  - other locations: /usr/local/etc/nginx/nginx.conf, /usr/local/nginx/conf/nginx.conf
- /usr/share/nginx: the default root directory for requests; contains the html directory and basic static files
  - other locations: html/ in root directory
- /var/log/nginx: the default log (access and error log) location for NGINX
  - other locations: logs/ in root directory
- /var/cache/nginx: the default temporary files location for NGINX
  - other locations: /var/lib/nginx
- /etc/nginx/conf: contains custom/vhosts configuration files
  - other locations: /etc/nginx/conf.d, /etc/nginx/sites-enabled (I can't stand this debian/apache-like convention)
- /var/run/nginx: contains information about NGINX process(es)
  - other locations: /usr/local/nginx/logs, logs/ in root directory
See also Installation and Compile-Time Options — Files and Permissions.
Commands
🔖 Use reload option to change configurations on the fly — Base Rules — P2
- nginx -h: shows the help
- nginx -v: shows the NGINX version
- nginx -V: shows the extended information about NGINX: version, build parameters, and configuration arguments
- nginx -t: tests the NGINX configuration
- nginx -c <filename>: sets configuration file (default: /etc/nginx/nginx.conf)
- nginx -p <directory>: sets prefix path (default: /etc/nginx/)
- nginx -T: tests the NGINX configuration and prints the validated configuration on the screen
- nginx -s <signal>: sends a signal to the NGINX master process:
  - stop: discontinues the NGINX process immediately
  - quit: stops the NGINX process after it finishes processing inflight requests
  - reload: reloads the configuration without stopping processes
  - reopen: instructs NGINX to reopen log files
- nginx -g <directive>: sets global directives out of configuration file
Some useful snippets for management of the NGINX daemon:
- testing configuration:
/usr/sbin/nginx -t -c /etc/nginx/nginx.conf
/usr/sbin/nginx -t -q -g 'daemon on; master_process on;' # ; echo $?
/usr/local/etc/rc.d/nginx status
- starting daemon:
/usr/sbin/nginx -g 'daemon on; master_process on;'
service nginx start
systemctl start nginx
/usr/local/etc/rc.d/nginx start
# You can also start NGINX from start-stop-daemon script:
/sbin/start-stop-daemon --quiet --start --exec /usr/sbin/nginx --background --retry QUIT/5 --pidfile /run/nginx.pid
- stopping daemon:
# graceful shutdown (waiting for the worker processes to finish serving current requests)
/usr/sbin/nginx -s quit
# fast shutdown (kill connections immediately)
/usr/sbin/nginx -s stop
service nginx stop
systemctl stop nginx
/usr/local/etc/rc.d/nginx stop
# You can also stop NGINX from start-stop-daemon script:
/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
- reloading daemon:
/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
service nginx reload
systemctl reload nginx
/usr/local/etc/rc.d/nginx reload
kill -HUP $(cat /var/run/nginx.pid)
kill -HUP $(pgrep -f "nginx: master")
- restarting daemon:
service nginx restart
systemctl restart nginx
/usr/local/etc/rc.d/nginx restart
Something about testing configuration:
You cannot test half-baked configurations. For example, you defined a server section for your domain in a separate file. Any attempt to test such a file will throw errors. The file has to be complete in all respects.
Configuration syntax
🔖 Organising Nginx configuration — Base Rules — P2
🔖 Format, prettify and indent your Nginx code — Base Rules — P2
NGINX uses a micro programming language in the configuration files. This language’s design is heavily influenced by Perl and Bourne Shell. Configuration syntax, formatting and definitions follow a so-called C-style convention. For me, NGINX configuration has a simple and very transparent structure.
Comments
NGINX configuration files don't support comment blocks. A comment starts with # and runs to the end of the line; it can occupy a whole line or follow a directive.
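As a small illustration of the comment rules, here is a hedged sketch; the directive values are arbitrary and only serve as something to comment on.

```nginx
# A comment takes up the rest of the line after the hash sign.
worker_processes 2;   # it can also follow a directive on the same line

# There is no block-comment syntax, so every line
# of a longer note needs its own hash sign.
events {
    worker_connections 1024;
}
```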
End of lines
Lines containing directives must end with a semicolon (;
), otherwise NGINX will fail to load the configuration and report an error.
Variables, Strings, and Quotes
Variables start with $, and built-in variables are set automatically for each request. The ability to set variables at runtime and control logic flow based on them is part of the rewrite module and not a general feature of NGINX. By default, we cannot modify built-in variables like $host or $request_uri.
There are some directives that do not support variables, e.g. access_log (really an exception, because it can contain variables with restrictions) or error_log. Variables probably can't be (and shouldn't be, because they are evaluated at run time during the processing of each request and are rather costly compared to plain static configuration) declared anywhere, with very few exceptions: the root directive can contain variables, and the server_name directive only allows the strict $hostname built-in value as a variable-like notation (but it's more like a magic constant). If you use variables in an if context, you can only set them in if conditions (and maybe rewrite directives). Don't try to use them elsewhere.
To assign a value to a variable, use the set directive.
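A minimal sketch of the set directive in use. The variable names ($greeting, $target), the host name and the header name are illustrative assumptions, not taken from the handbook.

```nginx
server {
    listen 80;
    server_name example.com;   # hypothetical host name

    location / {
        # $greeting is a user-defined variable, created by set
        set $greeting "hello";

        # a value may interpolate other variables
        set $target "$scheme://$host$request_uri";

        add_header X-Greeting $greeting;
        return 200 "$greeting from $target\n";
    }
}
```

Because set belongs to the rewrite module, it is only allowed in the server, location and if contexts.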
See the if, break, and set section to learn more about variables.
Some interesting things about variables:
Make sure to read the agentzh’s Nginx Tutorials — it’s about NGINX tips & tricks. This guy is a NGINX Guru and creator of the OpenResty. In these tutorials he describes, amongst other things, variables in great detail. I also recommend nginx built-in variables post.
- most variables in NGINX only exist at runtime, not during configuration time
- the scope of variables spreads out all over the configuration
- variable assignment occurs when requests are actually being served
- variables have exactly the same lifetime as the corresponding request
- each request has its own copy of every variable's container (so the containers hold different values)
- requests do not interfere with each other even if they reference a variable with the same name
- the assignment operation is only performed in requests that access the location
Strings may be inputted without quotes unless they include blank spaces, semicolons or curly braces, then they need to be escaped with backslashes or enclosed in single/double quotes.
Quotes are required for values which contain space(s) and/or some other special characters, otherwise NGINX will not recognize them. You can either quote or \-escape some special characters like " " or ";" in strings (characters that would make the meaning of a statement ambiguous). So the following instructions are the same:
# 1)
add_header My-Header "nginx web server;";
# 2)
add_header My-Header nginx\ web\ server\;;
Variables in quoted strings are expanded normally unless the $
is escaped.
Directives, Blocks, and Contexts
Read this great article about the NGINX configuration inheritance model by Martin Fjordvald.
Configuration options are called directives. We have four types of directives:
- standard directive: one value per context
- array directive: multiple values per context:
  error_log /var/log/nginx/localhost/localhost-error.log warn;
- action directive: something which does not just configure:
  rewrite ^(.*)$ /msie/$1 break;
- try_files directive:
  try_files $uri $uri/ /test/index.html;
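The four directive types can be seen side by side in the following sketch. The paths and the rewrite rule are assumptions chosen for illustration, and the comments describe the types in the sense used by the list above.

```nginx
events {}

http {
    # standard directive: one value per context,
    # which a child context may override
    root /var/www/html;

    # array directive: repeating it in the same context adds another entry
    error_log /var/log/nginx/error.log warn;
    error_log /var/log/nginx/emerg_error.log emerg;

    server {
        listen 80;

        # action directive: does something during request processing
        rewrite ^/old/(.*)$ /new/$1 break;

        location / {
            # try_files directive: checks each entry in order, then falls back
            try_files $uri $uri/ /index.html;
        }
    }
}
```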
Valid directives begin with a directive name and then state an argument or series of arguments separated by spaces.
Directives are organised into groups known as blocks or contexts. Generally, a context is a block directive that can have other directives inside braces. Contexts are organised in a tree-like structure, defined by pairs of curly braces: { and }.
The curly braces actually denote a new configuration context.
As a general rule, if a directive is valid in multiple nested scopes, a declaration in a broader context will be passed on to any child contexts as default values. The children contexts can override these values at will.
Directives placed in the configuration file outside of any contexts are considered to be in the global/main context.
Special attention should be paid to some strange behavior associated with some directives. For more information please see Set the HTTP headers with add_header and proxy_*_header directives properly rule.
Directives can only be used in the contexts that they were designed for. NGINX will error out on reading a configuration file with directives that are declared in the wrong context.
If you want to review all directives see alphabetical index of directives.
Contexts can be layered within one another (a level of inheritance). Their structure looks like this:
Global/Main Context
|
|
+-----» Events Context
|
|
+-----» HTTP Context
| |
| |
| +-----» Server Context
| | |
| | |
| | +-----» Location Context
| |
| |
| +-----» Upstream Context
|
|
+-----» Mail Context
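The tree above maps onto a configuration file roughly as in this sketch. The user name, upstream name, addresses and paths are hypothetical; the gzip directive is used only to show a value inherited from http and overridden in one location.

```nginx
# main (global) context: not surrounded by braces
user nginx;                    # assumed account name
worker_processes auto;

events {
    worker_connections 1024;
}

http {
    gzip on;                   # inherited by every server and location below

    upstream backend {         # hypothetical pool of back-end servers
        server 127.0.0.1:8080;
        server 127.0.0.1:8081;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://backend;
        }

        location /downloads/ {
            gzip off;          # overrides the value inherited from http
        }
    }
}
```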
The most important contexts are shown in the following description. These will be the ones that you will be dealing with for the most part:
- global: contains global configuration directives; it sets NGINX-wide settings and is the only context that is not surrounded by curly braces
- events: configuration for the events module; it sets global options for connection processing
- http: controls all aspects of working with the HTTP module and holds directives for handling HTTP and HTTPS traffic; directives in this context can be grouped into:
  - HTTP client directives
  - HTTP file I/O directives
  - HTTP hash directives
  - HTTP socket directives
- server: defines virtual host settings and describes a logical separation of a set of resources associated with a particular domain or IP address
- location: defines directives to handle a client request and indicates a URI that comes either from the client or from an internal redirect
- upstream: defines a pool of back-end servers to which NGINX can proxy requests; commonly used to define a web server cluster for load balancing
NGINX also provides other contexts (e.g. used for mapping) such as:
- map: sets the value of a variable depending on the value of another variable. It provides a mapping of one variable's values to determine what the second variable should be set to
- geo: also specifies a mapping, but one that is used to categorize client IP addresses. It sets the value of a variable depending on the connecting IP address
- types: maps MIME types to the file extensions that should be associated with them
- if: provides conditional processing of the directives defined within it; they are executed if a given test returns true
- limit_except: restricts the use of certain HTTP methods within a location context
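A hedged sketch of map, geo, if and limit_except working together. The variable names ($is_mobile, $internal_client), the networks and the /api/ path are illustrative assumptions.

```nginx
http {
    # map: derive $is_mobile from the User-Agent request header
    map $http_user_agent $is_mobile {
        default      0;
        "~*mobile"   1;
    }

    # geo: derive $internal_client from the client address
    geo $internal_client {
        default        0;
        10.0.0.0/8     1;
        192.168.0.0/16 1;
    }

    server {
        listen 80;

        location /api/ {
            # limit_except: methods other than GET (and HEAD) are restricted
            limit_except GET {
                allow 10.0.0.0/8;
                deny  all;
            }

            # if: the body runs only when the test is true
            if ($is_mobile) {
                return 302 /m$request_uri;
            }
        }
    }
}
```

Both map and geo must be declared at the http level; the variables they define can then be used in any server or location.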
Look also at the graphic below. It presents the most important contexts with reference to the configuration:
For HTTP, NGINX lookup starts from the http block, then through one or more server blocks, followed by the location block(s).
External files
The include directive may appear inside any context. It includes another file, or files matching the specified mask:
include /etc/nginx/proxy.conf;
# or:
include /etc/nginx/conf/*.conf;
You cannot use variables in NGINX config file includes. This is because includes are processed before any variables are evaluated.
See also this:
Variables should not be used as template macros. Variables are evaluated in the run-time during the processing of each request, so they are rather costly compared to plain static configuration. Using variables to store static strings is also a bad idea. Instead, a macro expansion and «include» directives should be used to generate configs more easily and it can be done with the external tools, e.g. sed + make or any other common template mechanism.
Measurement units
It is recommended to always specify a suffix for the sake of clarity and consistency.
Sizes can be specified in:
- without a suffix: Bytes
- k or K: Kilobytes
- m or M: Megabytes
- g or G: Gigabytes
Time intervals can be specified in:
- without a suffix: Seconds
- ms: Milliseconds
- s: Seconds
- m: Minutes
- h: Hours
- d: Days
- w: Weeks
- M: Months (30 days)
- y: Years (365 days)
proxy_read_timeout 20; # =20s, default
Some of the time intervals can be specified only with a seconds resolution. You should also remember about this:
Multiple units can be combined in a single value by specifying them in the order from the most to the least significant, and optionally separated by whitespace. For example, 1h 30m specifies the same time as 90m or 5400s.
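The suffixes above might be used like this. The values are arbitrary examples, not recommendations.

```nginx
http {
    client_max_body_size    10m;     # 10 megabytes
    client_body_buffer_size 16k;     # 16 kilobytes
    keepalive_timeout       75s;     # explicit seconds
    client_body_timeout     1m30s;   # 90 seconds, most significant unit first
    resolver_timeout        500ms;   # milliseconds
    send_timeout            60;      # no suffix, read as seconds
}
```

Writing 1m30s without whitespace avoids the need to quote the value; with a space in between, the value would have to be quoted so that it is still read as a single argument.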
Regular expressions with PCRE
🔖 Enable PCRE JIT to speed up processing of regular expressions — Performance — P2
Before you start reading the next chapters, you should know what regular expressions are and how they work (they are not black magic, really). I recommend two great and short write-ups about regular expressions created by Jonny Fox:
- Regex tutorial — A quick cheatsheet by examples
- Regex cookbook — Top 10 Most wanted regex
Why? Regular expressions can be used in both the server_name and location directives (and in others too), and sometimes you need to be good at reading them. I think you should write the most readable regular expressions you can, so that they do not become spaghetti code that is impossible to debug and maintain.
NGINX uses the PCRE library to perform complex manipulations with your location blocks and to power the rewrite directive. To use a regular expression for string matching, it first needs to be compiled, which is usually done at the configuration phase.
You can also enable pcre_jit to use just-in-time compilation (PCRE JIT) for the regular expressions known at configuration parsing time. This option can improve performance; however, in some cases pcre_jit may have a negative effect. So, before enabling it, I recommend reading this great document: PCRE Performance Project.
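A short sketch of regular expressions in server_name and location together with pcre_jit. The domain, the capture name and the path layout are assumptions made for the example.

```nginx
pcre_jit on;    # main context; needs a PCRE library built with JIT support

events {}

http {
    server {
        listen 80;

        # regular expression in server_name, with a named capture
        server_name ~^(?<subdomain>[a-z0-9-]+)\.example\.com$;

        # case-insensitive regular expression in location
        location ~* \.(?:jpe?g|png|gif)$ {
            root /var/www/$subdomain;   # hypothetical directory per subdomain
            expires 30d;
        }
    }
}
```

A regular expression that contains curly braces has to be quoted, because { and } otherwise delimit configuration blocks.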
Below is also something interesting about regular expressions and PCRE:
- Learn PCRE in Y minutes
- PCRE Regex Cheatsheet
- Regular Expression Cheat Sheet — PCRE
- Regex cheatsheet
- Regular expressions in Perl
- Regexp Security Cheatsheet
- A regex cheatsheet for all those regex haters (and lovers)
You can also use external tools for testing regular expressions. For more please see online tools chapter.
If you’re good at it, check these very nice and brainstorming regex challenges:
- RegexGolf
- Regex Crossword
Enable syntax highlighting
vi/vim
# 1) Download vim plugin for NGINX:

# Official NGINX vim plugin:
mkdir -p ~/.vim/syntax/
wget "http://www.vim.org/scripts/download_script.php?src_id=19394" -O ~/.vim/syntax/nginx.vim

# Improved NGINX vim plugin (incl. syntax highlighting) with Pathogen:
mkdir -p ~/.vim/{autoload,bundle}/
curl -LSso ~/.vim/autoload/pathogen.vim https://tpo.pe/pathogen.vim
echo -en "\nexecute pathogen#infect()\n" >> ~/.vimrc
git clone https://github.com/chr4/nginx.vim ~/.vim/bundle/nginx.vim

# 2) Set location of NGINX config files:
cat > ~/.vim/filetype.vim << __EOF__
au BufRead,BufNewFile /etc/nginx/*,/etc/nginx/conf.d/*,/usr/local/nginx/conf/*,*/conf/nginx.conf if &ft == '' | setfiletype nginx | endif
__EOF__
It may be interesting for you: Highlight insecure SSL configuration in Vim.
Sublime Text
Install cabal, a system for building and packaging Haskell libraries and programs (on Ubuntu):
add-apt-repository -y ppa:hvr/ghc
apt-get update
apt-get install -y cabal-install-1.22 ghc-7.10.2

# Add this to the main configuration file of your shell:
export PATH=$HOME/.cabal/bin:/opt/cabal/1.22/bin:/opt/ghc/7.10.2/bin:$PATH
source $HOME/.<shellrc>

cabal update
- nginx-lint:
git clone https://github.com/temoto/nginx-lint
cd nginx-lint && cabal install --global
- sublime-nginx + SublimeLinter-contrib-nginx-lint:
Bring up the Command Palette and type install. Among the commands you should see Package Control: Install Package. Type nginx to install sublime-nginx, and after that do the same again to install SublimeLinter-contrib-nginx-lint: type SublimeLinter-contrib-nginx-lint.
Processes
🔖 Adjust worker processes — Performance — P3
🔖 Improve debugging by disable daemon, master process, and all workers except one — Debugging — P4
NGINX has one master process and one or more worker processes. It has also cache loader and cache manager processes but only if you enable caching.
The main purposes of the master process are to read and evaluate configuration files, maintain the worker processes (respawning a worker when it dies), handle signals, notify workers, open log files, and, of course, bind to ports.
The master process should be started as the root user, because this allows NGINX to open sockets below 1024 (it needs to be able to listen on port 80 for HTTP and 443 for HTTPS).
To define the number of worker processes, set the worker_processes directive.
The worker processes do the actual processing of requests and get commands from the master process. They run in an event loop (registering events and responding when one occurs), handle network connections, read and write content to disk, and communicate with upstream servers. They are spawned by the master process and run as the configured (unprivileged) user and group.
The worker processes spend most of the time just sleeping and waiting for new events (they are in the S state in top).
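The process model described above is controlled by a handful of main-context directives. A minimal sketch, assuming an unprivileged account called nginx exists on the system:

```nginx
user nginx;                 # account the worker processes run as (assumed name)
worker_processes auto;      # one worker per available CPU core
# worker_processes 4;       # or a fixed number of workers

events {
    worker_connections 1024;   # connection limit for each worker process
}
```

The master process keeps running as the user that started it (typically root), while only the workers switch to the account named in the user directive.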
The following signals can be sent to the NGINX master process:
SIGNAL | NUM | DESCRIPTION |
---|---|---|
TERM, INT | 15, 2 | quick shutdown |
QUIT | 3 | graceful shutdown |
KILL | 9 | halts a stubborn process |
HUP | 1 | configuration reload, start new workers, gracefully shutdown the old worker processes |
USR1 | 10 | reopen the log files |
USR2 | 12 | upgrade executable on the fly |
WINCH | 28 | gracefully shutdown the worker processes |
There’s no need to control the worker processes yourself. However, they support some signals too:
SIGNAL | NUM | DESCRIPTION |
---|---|---|
TERM, INT | 15, 2 | quick shutdown |
QUIT | 3 | graceful shutdown |
USR1 | 10 | reopen the log files |
CPU pinning
Moreover, it is important to mention the worker_cpu_affinity directive (it's only available on FreeBSD and Linux). CPU affinity is used to control which CPUs NGINX utilizes for individual worker processes. By default, worker processes are not bound to any specific CPUs. What's more, the system might schedule all worker processes to run on the same CPU, which may not be efficient enough.
CPU affinity is represented as a bitmask (written as a string of binary digits, one mask per worker process), with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU.
Here you will find an amazing explanation of this. There is a worker_cpu_affinity configuration generator for NGINX. After all, I would recommend letting the OS scheduler do the work, because there is no reason to ever set it up during normal operation.
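For reference, a sketch of the bitmask notation on a machine assumed to have four logical CPUs. Each mask is a string of binary digits and the rightmost bit stands for the first CPU.

```nginx
worker_processes 4;

# one mask per worker: worker 1 on CPU0, worker 2 on CPU1, and so on
worker_cpu_affinity 0001 0010 0100 1000;

# alternative (1.9.10+): bind workers to the available CPUs automatically
# worker_cpu_affinity auto;
```

A mask with several bits set, such as 0101, allows a worker to run on more than one CPU (here CPU0 and CPU2).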
Shutdown of worker processes
This should come in useful if you want to tweak NGINX’s shutdown process, particularly if other servers or load balancers are relying upon predictable restart times or if it takes a long time to close worker processes.
The worker_shutdown_timeout directive configures a timeout to be used when gracefully shutting down worker processes. When the timer expires, NGINX will try to close all the connections currently open to facilitate shutdown.
NGINX’s Maxim Dounin explains:
The worker_shutdown_timeout directive is not expected to delay shutdown if there are no active connections. It was introduced to limit possible time spent in shutdown, that is, to ensure fast enough shutdown even if there are active connections.
When a worker process enters the «exiting» state, it does a few things:
- mark itself as an exiting process
- set a shutdown timer, if worker_shutdown_timeout is defined
- close listening sockets
- close idle connections
Then, if the shutdown timer was set, after the worker_shutdown_timeout interval, all connections are closed.
By default, NGINX waits for and processes additional data from a client before fully closing a connection, but only if heuristics suggest that a client may be sending more data.
Sometimes you can see nginx: worker process is shutting down in your log file. The problem occurs when reloading the configuration: NGINX usually exits the existing worker processes gracefully, but at times it takes hours to close these processes. Every config reload may leave zombie workers behind, permanently eating up all of your system's memory. In this case, fast shutdown of worker processes might be a solution.
In addition, setting worker_shutdown_timeout can also solve the issue:
worker_shutdown_timeout 60s;
Test connection timeouts and how long your requests are processed by a server, then adjust the worker_shutdown_timeout value accordingly. 60 seconds is a value with a generous margin, and nothing valid should last longer than that.
In my experience, if you have multiple workers in a shutting down state, maybe you should first look at the loaded modules that may cause problems with hanging worker processes.
Connection processing
NGINX supports a variety of connection processing methods, which depend on the platform used.
In general there are four types of event multiplexing:
- select: an anachronism and not recommended, but installed on all platforms as a fallback
- poll: an anachronism and not recommended
And the most efficient implementations of non-blocking I/O:
- epoll: recommended if you're using GNU/Linux
- kqueue: recommended if you're using BSD (it is technically superior to epoll)
The select method can be enabled or disabled using the --with-select_module or --without-select_module configuration parameter. Similarly, the poll method can be enabled or disabled using the --with-poll_module or --without-poll_module configuration parameter.
epoll is an efficient method of processing connections available on Linux 2.6+. kqueue is an efficient method of processing connections available on FreeBSD 4.1+, OpenBSD 2.9+, and NetBSD 2.0+.
There is normally no need to specify it explicitly, because NGINX will by default use the most efficient method. If you do want to set it, use the use directive in the events context.
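Selecting the method explicitly could look like the sketch below. It assumes a Linux host; on the BSDs the value would be kqueue.

```nginx
events {
    use epoll;                 # on Linux; use kqueue on FreeBSD, OpenBSD or NetBSD
    worker_connections 1024;
}
```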
There are also great resources (also makes comparisons) about them:
- Kqueue: A generic and scalable event notification facility
- poll vs select vs event-based
- select/poll/epoll: practical difference for system architects
- Scalable Event Multiplexing: epoll vs. kqueue
- Async IO on Linux: select, poll, and epoll
- A brief history of select(2)
- Select is fundamentally broken
- Epoll is fundamentally broken
- I/O Multiplexing using epoll and kqueue System Calls
- Benchmarking BSD and Linux
- The C10K problem
Look also at libevent benchmark (read about libevent – an event notification library):
This infographic comes from daemonforums — An interesting benchmark (kqueue vs. epoll).
You may also want to see why big players use NGINX on FreeBSD instead of on GNU/Linux:
- FreeBSD NGINX Performance
- Why did Netflix use NGINX and FreeBSD to build their own CDN?
NGINX reports connections as follows (the following status information is provided by ngx_http_stub_status_module):
- Active connections: the current number of active (open) client connections including waiting connections and connections to backends
  - accepts: the total number of accepted client connections
  - handled: the total number of handled connections. Generally, the parameter value is the same as accepts unless some resource limits have been reached (for example, the worker_connections limit)
  - requests: the total number of client requests
- Reading: the current number of connections where NGINX is reading the request header
- Writing: the current number of connections where NGINX is writing the response back to the client (reads request body, processes request, or writes response to a client)
- Waiting: the current number of idle client connections waiting for a request, i.e. connections still open and waiting for either a new request or the keepalive expiration (actually it is Active - (Reading + Writing))
Waiting connections are keepalive connections. They are usually not a problem, but if you want to reduce them, set a lower value for the keepalive_timeout directive.
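To look at these counters yourself, the status page can be exposed with a configuration along these lines. This is a sketch that assumes NGINX was built with --with-http_stub_status_module; the listen address, the /basic_status path and the timeout value are illustrative choices.

```nginx
http {
    keepalive_timeout 15s;     # a lower value closes idle (Waiting) connections sooner

    server {
        listen 127.0.0.1:8080;         # local-only listener (assumed address)

        location = /basic_status {     # hypothetical path
            stub_status;
            allow 127.0.0.1;
            deny all;
        }
    }
}
```

Requesting the page from the server itself, for example with curl http://127.0.0.1:8080/basic_status, prints the Active, accepts, handled, requests, Reading, Writing and Waiting values described above.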
I recommend reading this:
Writing connections counter increasing might indicate one of the following:
- crashed or killed worker processes. This is unlikely in your case though, as this would also result in other values growing as well, notably Waiting
- a real socket leak somewhere. These usually result in sockets in the CLOSE_WAIT state (waiting for the FIN packet terminating the connection); try looking at netstat output without the grep -v CLOSE_WAIT filter. Leaked sockets are reported by NGINX during graceful shutdown of a worker process (for example, after a configuration reload): if there are any leaked sockets, NGINX will write open socket ... left in connection ... alerts to the error log
To further investigate things, please do the following:
- upgrade to the latest mainline versions, without any 3rd party modules, and check if you are able to reproduce the issue
- try disabling HTTP/2 to see if it fixes the issue
- check if you are seeing open socket ... left in connection ... (socket leaks) alerts on configuration reload
See also Debugging socket leaks (from this handbook).
Event-Driven architecture
Thread Pools in NGINX Boost Performance 9x! — this official article is an amazing explanation about thread pools and generally about handling connections. I also recommend Inside NGINX: How We Designed for Performance & Scale. Both are really great.
NGINX uses an Event-Driven architecture which heavily relies on Non-Blocking I/O. One advantage of non-blocking/asynchronous operations is that you can maximize the usage of a single CPU as well as memory, because your thread can continue its work in the meantime instead of waiting for I/O to complete. The end result is that even as load increases, memory and CPU usage remain manageable.
There is a perfectly good and brief summary about non-blocking I/O and multi-threaded blocking I/O by Werner Henze. I also recommend asynchronous vs non-blocking by Daniel Earwicker.
Take a look at this simple drawing:
This infographic comes from Kansas State Polytechnic website.
Blocking I/O system calls (a) do not return until the I/O is complete. Nonblocking I/O system calls return immediately. The process is later notified when the I/O is complete.
The forms of I/O, with examples of POSIX functions:
Blocking | Non-blocking | Asynchronous |
---|---|---|
`write`, `read` | `write`, `read` + `poll`/`select` | `aio_write`, `aio_read` |
Look also what the official documentation says about it:
It’s well known that NGINX uses an asynchronous, event‑driven approach to handling connections. This means that instead of creating another dedicated process or thread for each request (like servers with a traditional architecture), it handles multiple connections and requests in one worker process. To achieve this, NGINX works with sockets in a non‑blocking mode and uses efficient methods such as epoll and kqueue.
Because the number of full‑weight processes is small (usually only one per CPU core) and constant, much less memory is consumed and CPU cycles aren’t wasted on task switching. The advantages of such an approach are well‑known through the example of NGINX itself. It successfully handles millions of simultaneous requests and scales very well.
It is also worth mentioning non-blocking operation and 3rd party modules (also from the official documentation):
Unfortunately, many third‑party modules use blocking calls, and users (and sometimes even the developers of the modules) aren’t aware of the drawbacks. Blocking operations can ruin NGINX performance and must be avoided at all costs.
To handle concurrent requests with a single worker process, NGINX uses the reactor design pattern. Basically, each worker is single-threaded, but NGINX can fork several worker processes to utilize multiple cores.
However, NGINX is not a single-threaded application. Each worker process is single-threaded and can handle thousands of concurrent connections. Workers are used to get request parallelism across multiple cores. When a request would block on I/O, the worker moves on to another request.
NGINX does not create a new process or thread for each connection or request; instead it starts several worker processes at startup. Each worker handles its connections asynchronously with one thread, rather than using multi-threaded programming (it uses an event loop with asynchronous I/O).
That way, I/O and network operations are not a very big bottleneck (remember that your CPU would otherwise spend a lot of time waiting for your network interfaces, for example). This results from the fact that each worker uses only one thread to service all of its requests. When requests arrive, they are serviced one at a time. However, when the code being serviced has to wait for something else, it registers a callback in another queue and the main thread continues running (it doesn't wait).
Now you see why NGINX can handle a large number of requests so well.
For more information take a look at the following resources:
- Asynchronous, Non-Blocking I/O
- Asynchronous programming. Blocking I/O and non-blocking I/O
- Blocking I/O and non-blocking I/O
- Non-blocking I/O
- About High Concurrency, NGINX architecture and internals
- A little holiday present: 10,000 reqs/sec with Nginx!
- Nginx vs Apache: Is it fast, if yes, why?
- How is Nginx handling its requests in terms of tasks or threading?
- Why nginx is faster than Apache, and why you needn’t necessarily care
- How we scaled nginx and saved the world 54 years every day
Finally, look at these great previews:
Both infographics come from Inside NGINX: How We Designed for Performance & Scale.
Multiple processes
NGINX uses only asynchronous I/O, which makes blocking a non-issue. The only reason NGINX uses multiple processes is to make full use of multi-core, multi-CPU, and hyper-threading systems. NGINX requires only enough worker processes to get the full benefit of symmetric multiprocessing (SMP).
From official documentation:
The NGINX configuration recommended in most cases — running one worker process per CPU core — makes the most efficient use of hardware resources.
NGINX uses a custom event loop which was designed specifically for NGINX: all connections are processed in a highly efficient run-loop in a limited number of single-threaded processes called workers. Worker processes accept new requests from a shared listen socket and execute a loop. There’s no specialized distribution of connections to the workers in NGINX; this work is done by the OS kernel mechanisms, which notify the workers.
Upon startup, an initial set of listening sockets is created. workers then continuously accept, read from and write to the sockets while processing HTTP requests and responses. — from The Architecture of Open Source Applications — NGINX.
Multiplexing works by running a loop that handles one piece of work (a chunk of data, a new connection, and so on) per connection per iteration. It is all based on event multiplexing mechanisms like `epoll()` or `kqueue()`. Within each worker NGINX can handle many thousands of concurrent connections and requests per second.
See the Nginx Internals presentation, which has a lot of great material about the internals of NGINX.
NGINX does not fork a process or thread per connection (like Apache), so memory usage is very conservative and extremely efficient in the vast majority of cases. NGINX is faster and consumes less memory than Apache, and performs very well under load. It is also very friendly to the CPU because there’s no ongoing create-destroy pattern for processes or threads.
Finally, in summary, NGINX:
- uses Non-Blocking «Event-Driven» architecture
- uses the single-threaded reactor pattern to handle concurrent requests
- uses highly efficient loop for connection processing
- is not a single threaded application because it starts multiple worker processes (to handle multiple connections and requests) during start
Simultaneous connections
Okay, so how many simultaneous connections can be processed by NGINX?
worker_processes * worker_connections = max connections
According to this, if you are running 4 worker processes with 4,096 worker connections per worker, you will be able to serve 16,384 connections. Of course, these NGINX settings are themselves limited by the kernel (number of connections, number of open files, or number of processes).
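The formula above is plain arithmetic. A minimal sketch in shell, using the example values from this section (the function name `max_connections` is only illustrative, it is not an NGINX tool):

```shell
# Upper bound on the connections NGINX can hold open:
# worker_processes * worker_connections.
# $1 = worker_processes, $2 = worker_connections
max_connections() {
  echo $(( $1 * $2 ))
}

max_connections 4 4096   # prints 16384
```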
At this point, I would like to mention Understanding socket and port in TCP. It is a great and short explanation. I also recommend reading Theoretical maximum number of open TCP connections that a modern Linux box can have.
I’ve seen some admins directly translate the product of `worker_processes` and `worker_connections` into the number of clients that can be served simultaneously. In my opinion, this is a mistake, because certain clients (e.g. browsers, which have different values for this) open a number of parallel connections (see this to confirm my words). Clients typically establish 4-8 TCP connections so that they can download resources in parallel (the various components that compose a web page, for example images, scripts, and so on). This increases the effective bandwidth and reduces latency.
That limit of 6-8 concurrent connections is the one browsers apply per host under HTTP/1.1. The best solution to improve performance (without upgrading the hardware or putting a cache in the middle, e.g. a CDN or Varnish) is to use HTTP/2 (RFC 7540 [IETF]) instead of HTTP/1.1.
HTTP/2 multiplexes many HTTP requests on a single connection. While HTTP/1.1 has a limit of roughly 6-8, HTTP/2 does not have a standard limit but says: «It is recommended that this value (`SETTINGS_MAX_CONCURRENT_STREAMS`) be no smaller than 100» (RFC 7540). That number is better than 6-8.
Additionally, you must know that the `worker_connections` directive includes all connections per worker (e.g. connection structures are used for listen sockets, internal control sockets between NGINX processes, and connections with proxied/upstream servers), not only incoming connections from clients.
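To see how far the raw connection count is from a client count, a rough estimate can divide it by the connections each client consumes. This is only a sketch: `estimated_clients` is a hypothetical helper, the figure of 6 parallel connections per browser is picked from the range above, and treating every proxied client connection as also holding exactly one upstream connection is a simplifying assumption.

```shell
# Rough number of clients that can be served at once.
# $1 = worker_processes
# $2 = worker_connections
# $3 = parallel connections opened by one client
# $4 = worker connections used per client connection
#      (1 for static files, 2 when each one is proxied upstream)
estimated_clients() {
  echo $(( ($1 * $2) / ($3 * $4) ))
}

estimated_clients 4 4096 6 1   # static content only: prints 2730
estimated_clients 4 4096 6 2   # everything proxied:  prints 1365
```

The result is integer division, so it rounds down; real traffic mixes both cases and also spends a few connections on listen and control sockets.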
Be aware that every worker connection (in the sleeping state) needs 256 bytes of memory, so you can increase it easily.
The number of connections is limited above all by the maximum number of open files (`RLIMIT_NOFILE`) on your system (you can read about file descriptors and file handlers in this great explanation). The reason is that the operating system needs memory to manage each open file, and memory is a limited resource. This limit applies per process: child processes inherit the limits of their parent, but each process has its own separate count.
To change the limit on the maximum number of file descriptors (that can be opened by a single worker process) you can also edit the `worker_rlimit_nofile` directive. With this, NGINX lets you change the limit with a configuration reload, with no service restart.
The number of file descriptors is not the only limit on the number of connections; remember also the kernel network (TCP/IP stack) parameters and the maximum number of processes.
I don’t like this piece of the NGINX documentation. Maybe I’m missing something, but it says that `worker_rlimit_nofile` is a limit on the maximum number of open files for worker processes. I believe it applies to each single worker process.
If you set `RLIMIT_NOFILE` to 25,000 and `worker_rlimit_nofile` to 12,000, NGINX sets (only for workers) the maximum open files limit to the value of `worker_rlimit_nofile`, but the master process keeps the `RLIMIT_NOFILE` value. `worker_rlimit_nofile` has no default value, so by default NGINX takes the initial value of the maximum open files from the system limits.
    # On GNU/Linux (or /usr/lib/systemd/system/nginx.service):
    grep "LimitNOFILE" /lib/systemd/system/nginx.service
    LimitNOFILE=5000

    grep "worker_rlimit_nofile" /etc/nginx/nginx.conf
    worker_rlimit_nofile 256;

       PID       SOFT  HARD
     24430       5000  5000
     24431        256  256
     24432        256  256
     24433        256  256
     24434        256  256

    # To check fds on FreeBSD:
    sysctl kern.maxfiles kern.maxfilesperproc kern.openfiles
    kern.maxfiles: 64305
    kern.maxfilesperproc: 57870
    kern.openfiles: 143
This is also controlled by the OS because the worker is not the only process running on the server. It would be very bad if your workers used up all of the file descriptors available to all processes, so don’t set your limits in a way that makes this possible.
In my opinion, relying on `RLIMIT_NOFILE` (and its alternatives on other systems) rather than on the `worker_rlimit_nofile` value is more understandable and predictable. To be honest, it doesn’t really matter which method you use to set the limit, but you should keep a constant eye on which limit takes priority.
If you don’t set the `worker_rlimit_nofile` directive manually, then the OS settings will determine how many file descriptors can be used by NGINX.
I think that the chance of running out of file descriptors is minimal, but it might be a big problem on high-traffic websites.
OK, so how many fds are opened by NGINX?
- one file handler for the client’s active connection
- one file handler for the proxied connection (that will open a socket handling these requests to remote or local host/process)
- one file handler for opening file (e.g. static file)
- other file handlers for internal connections, shared libraries, log files, and sockets
Also important is:
NGINX can use up to two file descriptors per full-fledged connection.
Look also at these diagrams:
- 1 file handler for connection with client and 1 file handler for static file being served by NGINX:

      # 1 connection, 2 file handlers

                           +-----------------+
      +----------+         |                 |
      |          |    1    |                 |
      |  CLIENT <---------------> NGINX      |
      |          |         |        ^        |
      +----------+         |        |        |
                           |      2 |        |
                           |        |        |
                           |        |        |
                           | +------v------+ |
                           | | STATIC FILE | |
                           | +-------------+ |
                           +-----------------+

- 1 file handler for connection with client and 1 file handler for an open socket to the remote or local host/process:

      # 2 connections, 2 file handlers

                           +-----------------+
      +----------+         |                 |         +-----------+
      |          |    1    |                 |    2    |           |
      |  CLIENT <---------------> NGINX <---------------> BACKEND  |
      |          |         |                 |         |           |
      +----------+         |                 |         +-----------+
                           +-----------------+

- 2 file handlers for two simultaneous connections from the same client (1, 4), 1 file handler for connection with other client (3), 2 file handlers for static files (2, 5), and 1 file handler for an open socket to the remote or local host/process (6), so in total it is 6 file descriptors:

      # 4 connections, 6 file handlers

                        4
            +-----------------------+
            |              +--------|--------+
      +-----v----+         |        |        |
      |          |    1    |        v        |  6
      |  CLIENT <-----+---------> NGINX <---------------+
      |          |    |    |        ^        |    +-----v-----+
      +----------+    |    |        |        |    |           |
                    3 |    |      2 | 5      |    |  BACKEND  |
      +----------+    |    |        |        |    |           |
      |          |    |    |        |        |    +-----------+
      |  CLIENT <-----+    | +------v------+ |
      |          |         | | STATIC FILE | |
      +----------+         | +-------------+ |
                           +-----------------+
In the first two examples, NGINX needs 2 file handlers for a full-fledged connection (and, in the proxied case, 2 worker connections). In the third example, NGINX still needs 2 file handlers for every full-fledged connection (even when a client uses parallel connections).
So, to conclude, I think that the correct value of `worker_rlimit_nofile`, covering all connections of a worker, should be greater than `worker_connections`.
In my opinion, the safe value of `worker_rlimit_nofile` (and of the system limits) is:
# 1 file handler for 1 connection:
worker_connections + (shared libs, log files, event pool, etc.) = worker_rlimit_nofile
# 2 file handlers for 1 connection:
(worker_connections * 2) + (shared libs, log files, event pool, etc.) = worker_rlimit_nofile
That is probably how many files can be opened by each worker, and it should be greater than the number of connections per worker (according to the above formula).
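As a worked instance of the two per-worker formulas above, here is a small shell sketch. The helper `worker_nofile` and the allowance of 128 descriptors for shared libs, log files, and the event pool are assumptions made for illustration; measure the real overhead on your own system.

```shell
# Suggested worker_rlimit_nofile for a single worker.
# $1 = worker_connections
# $2 = file handlers per connection (1 or 2)
# $3 = allowance for shared libs, log files, etc. (assumed value)
worker_nofile() {
  echo $(( $1 * $2 + $3 ))
}

worker_nofile 4096 1 128   # 1 file handler per connection:  prints 4224
worker_nofile 4096 2 128   # 2 file handlers per connection: prints 8320
```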
In most articles and tutorials this parameter is set to a value similar to (or even higher than) the maximum number of all files opened by NGINX. If we assume that this parameter applies to each worker separately, these values look altogether excessive.
However, on deeper reflection they are rational, because they allow a single worker to use all the file descriptors rather than being confined to its own share if something happens to the other workers. Remember though that we are still limited by the connections per worker. May I remind you that any connection opens at least one file.
So, moving on, the maximum number of open files by the NGINX should be:
(worker_processes * worker_connections * 2) + (shared libs, log files, event pool, etc.) = max open files
To serve 16,384 connections across all workers (4,096 connections for each worker), and bearing in mind the other handlers used by NGINX, a reasonable value for the maximum number of file handlers in this case may be 35,000. I think it’s more than enough.
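The same numbers can be checked with a short shell sketch. `max_open_files` is a hypothetical helper and the allowance of 512 descriptors for everything that is not a connection is an assumed figure, chosen only to show how much headroom 35,000 leaves.

```shell
# (worker_processes * worker_connections * 2) + overhead
# $1 = worker_processes
# $2 = worker_connections
# $3 = allowance for shared libs, log files, etc. (assumed value)
max_open_files() {
  echo $(( $1 * $2 * 2 + $3 ))
}

needed=$(max_open_files 4 4096 512)
echo "$needed"                # prints 33280
echo $(( 35000 - needed ))    # headroom under 35,000: prints 1720
```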
Given the above, to change the limits you should:
- Edit the maximum, total, global number of file descriptors the kernel will allocate before choking (this step is optional; I think you should change this only for very, very high traffic):

      # Find out the system-wide maximum number of file handles:
      sysctl fs.file-max

      # Shows the current number of all file descriptors in kernel memory:
      #   first value:  <allocated file handles>
      #   second value: <unused-but-allocated file handles>
      #   third value:  <the system-wide maximum number of file handles> # fs.file-max
      sysctl fs.file-nr

      # Set it manually and temporarily:
      sysctl -w fs.file-max=150000

      # Set it permanently:
      echo "fs.file-max = 150000" > /etc/sysctl.d/99-fs.conf

      # And load new values of kernel parameters:
      sysctl -p       # for /etc/sysctl.conf
      sysctl --system # for /etc/sysctl.conf and all of the system configuration files
- Edit the system-wide value of the maximum file descriptor number that can be opened by a single process:
  - for non-systemd systems:

        # Set the maximum number of file descriptors for the users logged in via PAM:
        #   /etc/security/limits.conf
        nginx       soft    nofile    35000
        nginx       hard    nofile    35000

  - for systemd systems:

        # Set the maximum number (hard limit) of file descriptors for the services started via systemd:
        #   /etc/systemd/system.conf          - global config (default values for all units)
        #   /etc/systemd/user.conf            - this specifies further per-user restrictions
        #   /lib/systemd/system/nginx.service - default unit for the NGINX service
        #   /etc/systemd/system/nginx.service - for your own instance of the NGINX service
        [Service]
        # ...
        LimitNOFILE=35000

        # Reload a unit file and restart the NGINX service:
        systemctl daemon-reload && systemctl restart nginx
- Adjust the system limit on the number of open files for the NGINX worker. The maximum value cannot be greater than `LimitNOFILE` (in this example: 35,000). You can change it at any time:

      # Set the limit for file descriptors for a single worker process (change it as needed):
      #   nginx.conf within the main context
      worker_rlimit_nofile 10000;

      # You need to reload the NGINX service:
      nginx -s reload
To show the current hard and soft limits applying to the NGINX processes (set with `nofile`, `LimitNOFILE`, or `worker_rlimit_nofile`):

    for _pid in $(pgrep -f "nginx: [master,worker]") ; do

      echo -en "$_pid "
      grep "Max open files" /proc/${_pid}/limits | awk '{print $4" "$5}'

    done | xargs printf '%6s %10s\t%s\n%6s %10s\t%s\n' "PID" "SOFT" "HARD"
or use the following:
    # To determine the OS limits imposed on a process, read the file /proc/$pid/limits.
    # $pid corresponds to the PID of the process:
    for _pid in $(pgrep -f "nginx: [master,worker]") ; do

      echo -en ">>> $_pid\n"
      cat /proc/$_pid/limits

    done
To list the current open file descriptors for each NGINX process:
    for _pid in $(pgrep -f "nginx: [master,worker]") ; do

      _fds=$(find /proc/${_pid}/fd/*)
      _fds_num=$(echo "$_fds" | wc -l)

      echo -en "\n\n##### PID: $_pid ($_fds_num fds) #####\n\n"

      # List all files from the proc/{pid}/fd directory:
      echo -en "$_fds\n\n"

      # List all open files (log files, memory mapped files, libs):
      lsof -as -p $_pid | awk '{if(NR>1)print}'

    done
You should also remember the following rules:
- `worker_rlimit_nofile` serves to dynamically change the maximum number of file descriptors the NGINX worker processes can handle, which is typically defined by the system’s soft limit (`ulimit -Sn`)
- `worker_rlimit_nofile` works only at the process level, and it’s limited to the system’s hard limit (`ulimit -Hn`)
- if you have SELinux enabled, you will need to run `setsebool -P httpd_setrlimit 1` so that NGINX has permission to set its rlimit. To diagnose SELinux denials and attempts you can use `sealert -a /var/log/audit/audit.log`, or the `audit2why` and `audit2allow` tools
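The second rule can be turned into a quick sanity check before editing the configuration. This is a sketch under stated assumptions: `fits_hard_limit` is a hypothetical helper, and the hard limit of the current shell is used only as a stand-in, since the limit that matters is the one the NGINX service actually runs under (for example `LimitNOFILE`).

```shell
# Succeeds when a proposed worker_rlimit_nofile fits under a hard limit.
# $1 = proposed value
# $2 = hard limit as printed by `ulimit -Hn` (a number or "unlimited")
fits_hard_limit() {
  [ "$2" = "unlimited" ] || [ "$1" -le "$2" ]
}

# Compare the value from this section with the current shell's hard limit:
if fits_hard_limit 10000 "$(ulimit -Hn)"; then
  echo "10000 fits under the hard limit"
else
  echo "10000 exceeds the hard limit"
fi
```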
To sum up this example:
- each of the NGINX processes (master + workers) is able to open up to 35,000 files
- for all workers, the maximum number of file descriptors is 140,000 (`LimitNOFILE` per worker)
- for each worker, the initial/current number of file descriptors is 10,000 (`worker_rlimit_nofile`)
    nginx: master process         = LimitNOFILE (35,000)
      \_ nginx: worker process    = LimitNOFILE (35,000), worker_rlimit_nofile (10,000)
      \_ nginx: worker process    = LimitNOFILE (35,000), worker_rlimit_nofile (10,000)
      \_ nginx: worker process    = LimitNOFILE (35,000), worker_rlimit_nofile (10,000)
      \_ nginx: worker process    = LimitNOFILE (35,000), worker_rlimit_nofile (10,000)

                                  = master (35,000), all workers:
                                    - 140,000 by LimitNOFILE
                                    - 40,000 by worker_rlimit_nofile
Look also at this great article about Optimizing Nginx for High Traffic Loads.
HTTP Keep-Alive connections
🔖 Activate the cache for connections to upstream servers — Performance — P2
Before starting this section I recommend reading the following articles:
- HTTP Keepalive Connections and Web Performance
- Optimizing HTTP: Keep-alive and Pipelining
- Evolution of HTTP — HTTP/0.9, HTTP/1.0, HTTP/1.1, Keep-Alive, Upgrade, and HTTPS
The original model of HTTP, and the default one in HTTP/1.0, is short-lived connections. Each HTTP request is completed on its own connection; this means a TCP handshake happens before each HTTP request, and these are serialized. The client creates a new TCP connection for each transaction (and the connection is torn down after the transaction completes).
An HTTP Keep-Alive connection, or persistent connection, is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses (keep-alives work between requests), as opposed to opening a new connection for every single request/response pair.
When using keep-alive, the browser does not have to make multiple connections (keep in mind that establishing connections is expensive) but reuses the already established connection and controls how long it stays active/open. So keep-alive is a way to reduce the overhead of creating connections since, most of the time, a user will navigate through the site (plus there are multiple requests from a single page, to download CSS, JavaScript, images, etc.).
It takes a 3-way handshake to establish a TCP connection, so, when there is perceivable latency between the client and the server, keepalive greatly speeds things up by reusing existing connections.
This mechanism holds the TCP connection between the client and the server open after an HTTP transaction has completed. Keep in mind that NGINX still needs to close connections from time to time, even if you configure it to allow infinite keep-alive timeouts and a huge number of requests per connection, in order to return results as well as error and success messages.
Persistent connection model keeps connections opened between successive requests, reducing the time needed to open new connections. The HTTP pipelining model goes one step further, by sending several successive requests without even waiting for an answer, reducing much of the latency in the network.
This infographic comes from Mozilla MDN — Connection management in HTTP/1.x.
However, at present, browsers are not using pipelined HTTP requests. For more information please see Why is pipelining disabled in modern browsers?.
Look also at this example that shows how a Keep-Alive header could be used:
Client Proxy Server
| | |
+- Keep-Alive: timeout=600 -->| |
| Connection: Keep-Alive | |
| +- Keep-Alive: timeout=1200 -->|
| | Connection: Keep-Alive |
| | |
| |<-- Keep-Alive: timeout=300 --+
| | Connection: Keep-Alive |
|<- Keep-Alive: timeout=5000 -+ |
| Connection: Keep-Alive | |
| | |
NGINX official documentation says:
All connections are independently negotiated. The client indicates a timeout of 600 seconds (10 minutes), but the proxy is only prepared to retain the connection for at least 120 seconds (2 minutes). On the link between proxy and server, the proxy requests a timeout of 1200 seconds and the server reduces this to 300 seconds. As this example shows, the timeout policies maintained by the proxy are different for each connection. Each connection hop is independent.
Keepalive connections reduce overhead, especially when SSL/TLS is in use, but they also have drawbacks: even when idle they consume server resources, and under heavy load they can be abused for DoS attacks. In such cases, using non-persistent connections, which are closed as soon as they are idle, can provide better performance. So keep-alives will improve SSL/TLS performance considerably if clients are making multiple requests, but if you don’t have the resources to handle them, they will kill your servers.
NGINX closes keepalive connections when the `worker_connections` limit is reached (connections are kept in the cache till the origin server closes them).
To better understand how Keep-Alive works, please see the amazing explanation by Barry Pollard.
NGINX provides two layers to enable Keep-Alive:
Client layer
- the maximum number of keepalive requests a client can make over a given connection, which means a client can make e.g. 256 successful requests inside one keepalive connection:

      # Default: 1000 (100 before version 1.19.10)
      keepalive_requests 256;

- the server will close the connection after this time. A higher value may be required when there is a larger amount of traffic, so that TCP connections are not re-initiated too frequently. If you set it too low, most of your requests will not benefit from keep-alive, which slows the client down:

      # Default: 75s
      keepalive_timeout 10s;

      # Or tell the browser when it should close the connection by adding an optional second timeout
      # in the header sent to the browser (some browsers do not care about the header):
      keepalive_timeout 10s 25s;

Increase this to allow the keepalive connection to stay open longer, resulting in faster subsequent requests. However, setting this too high will waste resources (mainly memory), as the connection will remain open even if there is no traffic, potentially significantly affecting performance. I think this should be as close to your average response time as possible. You could also decrease the timeout little by little (75s -> 50s, then later 25s…) and see how the server behaves.
Upstream layer
- the number of idle keepalive connections that remain open for each worker process. The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process (when this number is exceeded, the least recently used connections are closed):

      # Default: disable
      keepalive 32;

NGINX, by default, only talks HTTP/1.0 to the upstream servers. To keep the TCP connection alive, both the upstream section and the origin server should be configured not to close the connection.
Please keep in mind that keepalive is a feature of HTTP/1.1, while NGINX uses HTTP/1.0 by default for upstreams.
Connections won’t be reused by default, because leaving `keepalive` out of the upstream section means no keepalive (you can see the TCP stream number increase with every request to the origin server).
HTTP keepalive enabled for NGINX upstream servers reduces latency, and thus improves performance, and it reduces the possibility that NGINX runs out of ephemeral ports.
The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.
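Because the cache is kept per worker process, the number of idle connections an upstream group may have to hold is the `keepalive` value multiplied by the number of workers. A minimal sketch (the helper name `max_idle_upstream` is hypothetical, and 4 workers is just the example count used earlier in this chapter):

```shell
# Upper bound on idle keepalive connections held open to one upstream group.
# $1 = worker_processes
# $2 = value of the keepalive directive in the upstream block
max_idle_upstream() {
  echo $(( $1 * $2 ))
}

max_idle_upstream 4 16   # prints 64
max_idle_upstream 4 32   # prints 128
```

This is why a value that looks small in the configuration can still tie up a noticeable share of a backend's connection slots.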
Update your upstream configuration to use keepalive:

    upstream bk_x8080 {

      ...

      # Sets the maximum number of idle keepalive connections to upstream servers
      # that are preserved in the cache of each worker process.
      keepalive 16;

    }

And enable the HTTP/1.1 protocol in all upstream requests:

    server {

      ...

      location / {

        # Default is HTTP/1.0, keepalive is only enabled in HTTP/1.1:
        proxy_http_version 1.1;

        # Remove the Connection header if the client sends it,
        # it could be "close" to close a keepalive connection:
        proxy_set_header Connection "";

        proxy_pass http://bk_x8080;

      }

      ...

    }
There are two basic cases when keeping connections alive is really beneficial:
- fast backends, which produce responses in a very short time, comparable to a TCP handshake
- distant backends, when a TCP handshake takes a long time, comparable to a backend response time
Look at the test:
- without keepalive for upstream:

      wrk -c 500 -t 6 -d 60s -R 15000 -H "Host: example.com" https://example.com/
      Running 1m test @ https://example.com/
        6 threads and 500 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency    24.13s    10.68s   49.55s    59.06%
          Req/Sec   679.21     42.44   786.00     78.95%
        228421 requests in 1.00m, 77.98MB read
        Socket errors: connect 0, read 0, write 0, timeout 1152
        Non-2xx or 3xx responses: 4
      Requests/sec:   3806.96
      Transfer/sec:      1.30MB

- with keepalive for upstream:

      wrk -c 500 -t 6 -d 60s -R 15000 -H "Host: example.com" https://example.com/
      Running 1m test @ https://example.com/
        6 threads and 500 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency    23.40s     9.53s   47.25s    60.67%
          Req/Sec     0.86k    50.19     0.94k    60.00%
        294148 requests in 1.00m, 100.41MB read
        Socket errors: connect 0, read 0, write 0, timeout 380
      Requests/sec:   4902.24
      Transfer/sec:      1.67MB
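To put the two runs side by side, the relative change can be computed from the figures reported above. The helper `percent_change` is hypothetical and simply wraps `awk` for floating-point arithmetic; it says nothing about how the test itself was run.

```shell
# Percentage change between two measurements.
# $1 = value without upstream keepalive
# $2 = value with upstream keepalive
percent_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

percent_change 3806.96 4902.24   # requests/sec:   prints 28.8
percent_change 1152 380          # socket timeouts: prints -67.0
```

In this single test, upstream keepalive gave roughly 29% more requests per second and about two thirds fewer timeouts; a different backend or network will give different numbers.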
sendfile, tcp_nodelay, and tcp_nopush
Before you start reading please review:
- Nginx optimization, understanding SENDFILE, TCP_NODELAY and TCP_NOPUSH
- Nginx Tutorial #2: Performance
As you’re making these changes, keep careful watch on your network traffic and see how each tweak impacts congestion.
sendfile
By default, NGINX handles file transmission itself and copies the file into a buffer before sending it. Enabling the `sendfile` directive eliminates the step of copying the data into the buffer and enables direct copying of data from one file descriptor to another.
Normally, when a file needs to be sent, the following steps are required:
- `malloc`: allocate a local buffer for storing object data
- `read`: retrieve and copy the object into the local buffer
- `write`: copy the object from the local buffer into the socket buffer

Look at this great explanation (from Nginx Tutorial #2: Performance):
This involves two context switches (read, write) which make a second copy of the same object unnecessary. As you may see, it is not the optimal way. Thankfully, there is another system call that improves sending files, and it’s called (surprise, surprise!): `sendfile(2)`. This call retrieves an object to the file cache, and passes the pointers (without copying the whole object) straight to the socket descriptor. Netflix states that using `sendfile(2)` increased the network throughput from 6Gbps to 30Gbps.
When a file is transferred by a process, the kernel first buffers the data and then sends the data to the process buffers. The process, in turn, sends the data to the destination.
NGINX employs a solution that uses the `sendfile` system call to perform a zero-copy data flow from disk to socket, and saves context switching from userspace on read/write. `sendfile` tells NGINX how to buffer or read the file (trying to stuff the contents directly into the network slot, or buffering its contents first).
This method is an improved method of data transfer, in which data is copied between file descriptors within the OS kernel space, that is, without transferring data to the application buffers. No additional buffers or data copies are required, and the data never leaves the kernel memory address space.
In my opinion enabling this really won’t make any difference unless NGINX is reading from something which can be mapped into the virtual memory space like a file (i.e. the data is in the cache). But please… do not let me influence you — you should in the first place be keeping an eye on this document: Optimizing TLS for High–Bandwidth Applications in FreeBSD [pdf].
By default NGINX disables the use of `sendfile`:

    # http, server, location, if in location contexts

    # To turn on sendfile (my recommendation):
    sendfile on;

    # To turn off sendfile:
    sendfile off; # default

Look also at the `sendfile_max_chunk` directive. The NGINX documentation says:
When set to a non-zero value, limits the amount of data that can be transferred in a single `sendfile()` call. Without the limit, one fast connection may seize the worker process entirely.
On a fast local connection, `sendfile()` in Linux may send tens of megabytes per syscall, blocking other connections. `sendfile_max_chunk` allows you to limit the maximum size of one `sendfile()` operation. With this, NGINX can reduce the maximum time spent in blocking `sendfile()` calls, since it won’t try to send the whole file at once, but will do it in chunks. For example:

    sendfile on;
    sendfile_max_chunk 512k;
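To get a feel for what the chunk limit means in practice, the sketch below counts the minimum number of `sendfile()` calls one file needs when each call is capped at the chunk size. `min_sendfile_calls` is a hypothetical helper and the 10 MiB file size is an assumed example; real transfers may need more calls, because a single call can send less than the cap.

```shell
# Minimum number of sendfile() calls for one file when each call
# is capped at sendfile_max_chunk bytes.
# $1 = file size in bytes
# $2 = chunk size in bytes
min_sendfile_calls() {
  echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
}

chunk=$(( 512 * 1024 ))                               # 512k
min_sendfile_calls $(( 10 * 1024 * 1024 )) "$chunk"   # 10 MiB file: prints 20
min_sendfile_calls 100 "$chunk"                       # small file:  prints 1
```

Between those calls the worker returns to its event loop, which is what gives other connections a chance to be served.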
tcp_nodelay
I recommend reading The Caveats of TCP_NODELAY and Rethinking the TCP Nagle Algorithm [pdf]. These great papers describe very interesting aspects of `TCP_NODELAY` and `TCP_NOPUSH`.
`tcp_nodelay` is used to manage Nagle’s algorithm, which is one mechanism for improving TCP efficiency by reducing the number of small packets sent over the network. If you set `tcp_nodelay on;`, NGINX adds the `TCP_NODELAY` option when opening a new socket.
The option only affects keep-alive connections. Otherwise there is a 100ms delay when NGINX sends the response tail in the last incomplete TCP packet. Additionally, it is enabled on SSL connections, for unbuffered proxying, and for WebSocket proxying.
Maybe you should think about enabling Nagle’s algorithm (`tcp_nodelay off;`), but it really depends on your specific workload and the dominant traffic patterns of the service. `tcp_nodelay on;` is more reasonable for the modern web; the whole delay business of TCP was reasonable for terminals. Typically LANs have fewer issues with traffic congestion than WANs. The Nagle algorithm is most effective if TCP/IP traffic is generated sporadically by user input, not by applications using stream-oriented protocols like HTTP.
So, for me, the recipe is simple. There is no need for Nagle’s algorithm in these cases:
- bulk sends or HTTP traffic
- applications that require lower latency
- non-interactive type of traffic
You should also know this interesting comment from the author of Nagle’s algorithm:
If you’re doing bulk file transfers, you never hit that problem. If you’re sending enough data to fill up outgoing buffers, there’s no delay. If you send all the data and close the TCP connection, there’s no delay after the last packet. If you do send, reply, send, reply, there’s no delay. If you do bulk sends, there’s no delay. If you do send, send, reply, there’s a delay.
The real problem is ACK delays. The 200ms «ACK delay» timer is a bad idea that someone at Berkeley stuck into BSD around 1985 because they didn’t really understand the problem. A delayed ACK is a bet that there will be a reply from the application level within 200ms. TCP continues to use delayed ACKs even if it’s losing that bet every time.
I think that if you are dealing with non-interactive traffic or bulk transfers such as HTTP/web traffic, then enabling TCP_NODELAY to disable Nagle’s algorithm may be useful (this is the default behavior of NGINX). This is especially relevant if you’re running applications or environments that only sometimes have highly interactive traffic and chatty protocols.
By default, NGINX enables the TCP_NODELAY option:
# http, server, location contexts

# To turn on tcp_nodelay and at the same time disable Nagle’s algorithm
# (my recommendation, unless you turn tcp_nopush on):
tcp_nodelay on; # default

# To turn off tcp_nodelay and at the same time enable Nagle’s algorithm:
tcp_nodelay off;
tcp_nopush
This option is only available if you are using sendfile
(NGINX uses tcp_nopush
for requests served with sendfile
). It causes NGINX to attempt to send its HTTP response head in one packet, instead of using partial frames. This is useful for prepending headers before calling sendfile
, or for throughput optimization.
Normally, using tcp_nopush along with sendfile is very good. However, there are some cases where it can slow things down (especially with cache systems), so run your own tests to find out whether it helps in your setup.
tcp_nopush enables the TCP_NOPUSH socket option on FreeBSD or the TCP_CORK socket option on Linux, which aggressively accumulates data and tells TCP to wait for the application to remove the cork before sending any packets.
If TCP_NOPUSH or TCP_CORK (they are not the same!) is enabled on a socket, data is not sent until the buffer fills to a fixed limit. This allows the application to control how packets are built, e.g. to pack a full HTTP response into a packet. To read more about it and get into the details of this option, please read TCP_CORK: More than you ever wanted to know.
I once read that tcp_nopush is the opposite of tcp_nodelay. I don’t agree with that because, as I understand it, the former aggregates data based on buffer pressure, whereas Nagle’s algorithm aggregates data while waiting for a return ACK, which the latter option disables.
It may appear that tcp_nopush and tcp_nodelay are mutually exclusive, but if all the directives are turned on, NGINX manages them very wisely:
- it ensures packets are full before sending them to the client
- for the last packet, tcp_nopush is removed, allowing TCP to send it immediately, without the 200ms delay
And let’s also remember (take a look at Tony Finch’s notes; he developed a kernel patch for FreeBSD which makes TCP_NOPUSH work like TCP_CORK):
- on Linux, sendfile() depends on the TCP_CORK socket option to avoid undesirable packet boundaries
- FreeBSD has a similar option called TCP_NOPUSH
- when TCP_CORK is turned off any buffered data is sent immediately, but this is not the case for TCP_NOPUSH
By default, NGINX disables the TCP_NOPUSH option:
# http, server, location contexts

# To turn on tcp_nopush (my recommendation):
tcp_nopush on;

# To turn off tcp_nopush:
tcp_nopush off; # default
Mixing all together
There are many opinions on this. My recommendation is to set all to on
. However, I quote an interesting comment (Mixing sendfile, tcp_nodelay and tcp_nopush illogical?) that should dispel any doubts:
When set indicates to always queue non-full frames. Later the user clears this option and we transmit any pending partial frames in the queue. This is meant to be used alongside sendfile() to get properly filled frames when the user (for example) must write out headers with a write() call first and then use sendfile to send out the data parts. TCP_CORK can be set together with TCP_NODELAY and it is stronger than TCP_NODELAY.
Summarizing:
- tcp_nodelay on; is generally at odds with tcp_nopush on;, as they are mutually exclusive
- NGINX has special behavior: if you have sendfile on;, it uses TCP_NOPUSH for everything but the last packet
- and then turns TCP_NOPUSH off and enables TCP_NODELAY to avoid the 200ms ACK delay
So in fact, the most important changes are listed below:
sendfile on; tcp_nopush on; # with this, the tcp_nodelay does not really matter
Request processing stages
When building filtering rules (e.g. with
allow/deny
) you should always remember to test them and to know what happens at each of the phases (which modules are used). For additional information about the potential problems, look at allow and deny section and Take care about your ACL rules — Hardening — P1.
There can be altogether 11 phases when NGINX handles (processes) a request:
- NGX_HTTP_POST_READ_PHASE: first phase, read the request header
  - example modules: ngx_http_realip_module
- NGX_HTTP_SERVER_REWRITE_PHASE: implementation of rewrite directives defined in a server block; to change request URI using PCRE regular expressions, return redirects, and conditionally select configurations
  - example modules: ngx_http_rewrite_module
- NGX_HTTP_FIND_CONFIG_PHASE: choose the location according to the URI (location lookup)
- NGX_HTTP_REWRITE_PHASE: URI transformation on location level
  - example modules: ngx_http_rewrite_module
- NGX_HTTP_POST_REWRITE_PHASE: URI transformation post-processing (the request is redirected to a new location)
  - example modules: ngx_http_rewrite_module
- NGX_HTTP_PREACCESS_PHASE: access preprocessing, such as request limits and connection limits (access restriction)
  - example modules: ngx_http_limit_req_module, ngx_http_limit_conn_module, ngx_http_realip_module
- NGX_HTTP_ACCESS_PHASE: verification of the client (the authentication process, limiting access)
  - example modules: ngx_http_access_module, ngx_http_auth_basic_module
- NGX_HTTP_POST_ACCESS_PHASE: access restrictions check post-processing phase; processes the satisfy any directive
  - example modules: ngx_http_access_module, ngx_http_auth_basic_module
- NGX_HTTP_PRECONTENT_PHASE: handlers called prior to generating content
  - example modules: ngx_http_try_files_module
- NGX_HTTP_CONTENT_PHASE: content processing (the response is generated)
  - example modules: ngx_http_index_module, ngx_http_autoindex_module, ngx_http_static_module
- NGX_HTTP_LOG_PHASE: log processing
  - example modules: ngx_http_log_module
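To connect the phases to everyday configuration, here is a rough sketch that annotates common directives with the phase in which their module runs. The addresses, the zone name per_ip, and the file paths are made up for illustration. The point is that execution order follows the phases, not the order of lines in the file.

```nginx
http {
    # Shared-memory zone for limit_req; the name "per_ip" is hypothetical.
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    server {
        listen 80;
        server_name example.com;

        # NGX_HTTP_POST_READ_PHASE (ngx_http_realip_module):
        # replace the client address with the one sent by a trusted proxy.
        set_real_ip_from 10.0.0.0/8;
        real_ip_header   X-Forwarded-For;

        # NGX_HTTP_SERVER_REWRITE_PHASE (ngx_http_rewrite_module):
        rewrite ^/old/(.*)$ /new/$1 last;

        # NGX_HTTP_FIND_CONFIG_PHASE: NGINX selects this location for /new/...
        location /new/ {
            # NGX_HTTP_ACCESS_PHASE (ngx_http_access_module),
            # even though it is written first in this block:
            allow 192.168.0.0/16;
            deny  all;

            # NGX_HTTP_REWRITE_PHASE (ngx_http_rewrite_module),
            # runs before the ACL above:
            set $section "new";

            # NGX_HTTP_PREACCESS_PHASE (ngx_http_limit_req_module):
            limit_req zone=per_ip burst=20;

            # NGX_HTTP_PRECONTENT_PHASE (ngx_http_try_files_module):
            try_files $uri $uri/ =404;

            # NGX_HTTP_LOG_PHASE (ngx_http_log_module):
            access_log /var/log/nginx/new.access.log;
        }
    }
}
```

With this sketch, a request from outside 192.168.0.0/16 still gets $section set and is still counted by limit_req before deny all rejects it, because the rewrite and preaccess phases run ahead of the access phase.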
You may feel lost now (me too…), so here is a great and simple overview:
This infographic comes from Inside NGINX official library.
On every phase you can register any number of your handlers. Each phase has a list of handlers associated with it.
I recommend reading a great explanation about HTTP request processing phases in Nginx and, of course, the official Development guide. I have also prepared a simple diagram that can help you understand which modules are used in each phase. It also contains short descriptions from the official development guide:
Server blocks logic
NGINX has server blocks (like virtual hosts in Apache) that use the listen directive to bind to TCP sockets and the server_name directive to identify virtual hosts.
Here is a short example of two server block contexts with several regular expressions:
http {

  index index.html;
  root /var/www/example.com/default;

  server {

    listen 10.10.250.10:80;
    server_name www.example.com;

    access_log logs/example.access.log main;

    root /var/www/example.com/public;

    location ~ ^/(static|media)/ { ... }

    location ~* /[0-9][0-9](-.*)(\.html)$ { ... }

    location ~* \.(jpe?g|png|gif|ico)$ { ... }

    location ~* (?<begin>.*app)/(?<end>.+\.php)$ { ... }

    ...

  }

  server {

    listen 10.10.250.11:80;
    # The regular expression and the plain name are two separate parameters:
    server_name "~^(api\.)?example\.com$" api.de.example.com;

    access_log logs/example.access.log main;

    location ~ ^(/[^/]+)/api(.*)$ { ... }

    location ~ ^/backend/id/([a-z].[a-z]*) { ... }

    ...

  }

}
Handle incoming connections
🔖 Define the listen directives with address:port pair — Base Rules — P1
🔖 Prevent processing requests with undefined server names — Base Rules — P1
🔖 Never use a hostname in a listen or upstream directives — Base Rules — P1
🔖 Use exact names in a server_name directive if possible — Performance — P2
🔖 Separate listen directives for 80 and 443 ports — Base Rules — P3
🔖 Use only one SSL config for the listen directive — Base Rules — P3
NGINX uses the following logic to determine which virtual server (server block) should be used:
- Match the address:port pair against the listen directive. There can be multiple server blocks with listen directives of the same specificity that can handle the request

  NGINX uses the address:port combination to handle incoming connections. This pair is assigned to the listen directive.

  The listen directive can be set to:
  - an IP address/port combination (127.0.0.1:80;)
  - a lone IP address; if only the address is given, port 80 is used (127.0.0.1;), which becomes 127.0.0.1:80;
  - a lone port, which listens on every interface on that port (80; or *:80;), which becomes 0.0.0.0:80;
  - the path to a UNIX domain socket (unix:/var/run/nginx.sock;)

  If the listen directive is not present, then either *:80 is used (if NGINX runs with superuser privileges), or *:8000 otherwise.

  To evaluate the listen directive, NGINX follows these steps:
  - NGINX translates all incomplete listen directives by substituting missing values with their default values (see above)
  - NGINX attempts to collect a list of the server blocks that match the request most specifically based on the address:port
  - Any block that is functionally using 0.0.0.0 will not be selected if there are matching blocks that list a specific IP
  - If there is only one most specific match, that server block will be used to serve the request
  - If there are multiple server blocks with the same level of matching, NGINX then begins to evaluate the server_name directive of each server block
Look at this short example:
# From client side:
GET / HTTP/1.0
Host: api.random.com

# From server side:
server {
  # This block will be processed:
  listen 192.168.252.10; # --> 192.168.252.10:80
  ...
}

server {
  listen 80; # --> *:80 --> 0.0.0.0:80
  server_name api.random.com;
  ...
}
- Match the Host header field against the server_name directive as a string (the exact names hash table)
- Match the Host header field against the server_name directive with a wildcard at the beginning of the string (the hash table with wildcard names starting with an asterisk)

  If one is found, that block will be used to serve the request. If multiple matches are found, the longest match will be used to serve the request.
- Match the Host header field against the server_name directive with a wildcard at the end of the string (the hash table with wildcard names ending with an asterisk)

  If one is found, that block is used to serve the request. If multiple matches are found, the longest match will be used to serve the request.
- Match the Host header field against the server_name directive as a regular expression

  The first server_name with a regular expression that matches the Host header will be used to serve the request.
- If the Host header doesn’t match any server_name, direct the request to the server block whose listen directive is marked as default_server (this makes the server block answer all requests that don’t match any server block)
- If the Host header doesn’t match any server_name and there is no default_server, direct the request to the first server with a listen directive that satisfies the first step
- Finally, NGINX goes to the location context
This list is based on Mastering Nginx — The virtual server section.
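The server_name lookup order can be tried out with a small sketch like the one below. All host names and the response texts are invented for the example, and every block listens on the same port so that only server_name decides.

```nginx
# Fallback for hosts that match none of the names below.
server {
    listen 80 default_server;
    server_name _;
    return 444;
}

# 1) Exact names (checked first).
server {
    listen 80;
    server_name example.com www.example.com;
    return 200 "exact name\n";
}

# 2) Wildcard at the beginning of the name.
server {
    listen 80;
    server_name *.example.com;
    return 200 "leading wildcard\n";
}

# 3) Wildcard at the end of the name.
server {
    listen 80;
    server_name mail.*;
    return 200 "trailing wildcard\n";
}

# 4) Regular expression (checked last, in order of appearance).
server {
    listen 80;
    server_name ~^(?<sub>.+)\.example\.net$;
    return 200 "regex, sub=$sub\n";
}

# Host: www.example.com   -> exact name
# Host: api.example.com   -> leading wildcard
# Host: mail.example.com  -> leading wildcard (checked before the trailing one)
# Host: mail.example.org  -> trailing wildcard
# Host: shop.example.net  -> regex
# Host: unknown.test      -> default_server (connection closed with 444)
```

Sending requests with different Host headers (for example with curl -H 'Host: api.example.com') is a quick way to confirm which block answers.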
Matching location
🔖 Make an exact location match to speed up the selection process — Performance — P3
For each request, NGINX goes through a process to choose the best location block that will be used to serve that request.
The location block enables you to handle several types of URIs/routes (Layer 7 routing based on URL), within a server block. Syntax looks like:
location optional_modifier location_match { ... }
In the above, location_match defines what NGINX should check the request URI against. The optional_modifier values below cause the associated location block to be interpreted as follows (the order doesn’t matter at this point):
- (none): if no modifiers are present, the location is interpreted as a prefix match. To determine a match, the location is matched against the beginning of the URI
- =: an exact match, without any wildcards, prefix matching or regular expressions; forces a literal match between the request URI and the location parameter
- ~: if a tilde modifier is present, the location is interpreted as a case-sensitive regular expression (RE match)
- ~*: if a tilde and asterisk modifier is used, the location is interpreted as a case-insensitive regular expression (RE match)
- ^~: assuming this block is the best non-RE match, a caret followed by a tilde means that RE matching will not take place
And now, a short introduction to how location priority is determined:
- the exact match has the highest priority (processed first); the search ends if it matches
- the prefix match comes second; there are two types of prefixes: ^~ and (none). If the longest matching prefix uses the ^~ modifier, searching stops
- regular expressions are checked last, in the order they are defined in the configuration file; there are two modifiers: ~ and ~*
- if regular expression searching yields a match, that result is used; otherwise, the match from prefix searching is used
So, look at this example, it comes from the Nginx documentation — ngx_http_core_module:
location = / {
# Matches the query / only.
[ configuration A ]
}
location / {
# Matches any query, since all queries begin with /, but regular
# expressions and any longer conventional blocks will be
# matched first.
[ configuration B ]
}
location /documents/ {
# Matches any query beginning with /documents/ and continues searching,
# so regular expressions will be checked. This will be matched only if
# regular expressions don't find a match.
[ configuration C ]
}
location ^~ /images/ {
# Matches any query beginning with /images/ and halts searching,
# so regular expressions will not be checked.
[ configuration D ]
}
location ~* \.(gif|jpg|jpeg)$ {
# Matches any request ending in gif, jpg, or jpeg. However, all
# requests to the /images/ directory will be handled by
# Configuration D.
[ configuration E ]
}
To help you understand how location matching works:
- Nginx location match tester
- Nginx location match visible
- NGINX Regular Expression Tester
The process of choosing an NGINX location block is as follows (a detailed explanation):
- NGINX searches for an exact match. If a = modifier (e.g. location = foo { ... }) exactly matches the request URI, this specific location block is chosen right away
  - this block is processed
  - match-searching stops
- Prefix-based NGINX location matches (no regular expression). Each location is checked against the request URI. If no exact (meaning no = modifier) location block is found, NGINX continues with non-exact prefixes. It starts with the longest matching prefix location for this URI, with the following approach:
  - In case the longest matching prefix location has the ^~ modifier (e.g. location ^~ foo { ... }), NGINX stops its search right away and chooses this location
    - the block of the longest (most explicit) of those matches is processed
    - match-searching stops
  - Assuming the longest matching prefix location doesn’t use the ^~ modifier, the match is temporarily stored and the process continues

  I’m not sure about the order. In the official documentation it is not clearly indicated and external guides explain it differently. It seems logical to check the longest matching prefix location first.
- As soon as the longest matching prefix location is chosen and stored, NGINX continues to evaluate the case-sensitive (e.g. location ~ foo { ... }) and case-insensitive (e.g. location ~* foo { ... }) regular expression locations. The first regular expression location that fits the URI is selected right away to process the request
  - the block of the first matching regex found (when parsing the config file top to bottom) is processed
  - match-searching stops
- If no regular expression locations are found that match the request URI, the previously stored prefix location (e.g. location foo { ... }) is selected to serve the request
  - location / is a kind of catch-all location
  - the block of the longest (most explicit) of those matches is processed
  - match-searching stops
You should also know, that the non-regex match-types are fully declarative — order of definition in the config doesn’t matter — but the winning regex-match (if processing even gets that far) is entirely based on its order of entry in the config file.
To better understand how this process works, please see this short cheatsheet that will allow you to design your location blocks in a predictable way:
I recommend using external tools for testing regular expressions. For more, please see the online tools chapter.
Ok, so here’s a more complicated configuration:
server { listen 80; server_name xyz.com www.xyz.com; location ~ ^/(media|static)/ { root /var/www/xyz.com/static; expires 10d; } location ~* ^/(media2|static2) { root /var/www/xyz.com/static2; expires 20d; } location /static3 { root /var/www/xyz.com/static3; } location ^~ /static4 { root /var/www/xyz.com/static4; } location = /api { proxy_pass http://127.0.0.1:8080; } location / { proxy_pass http://127.0.0.1:8080; } location /backend { proxy_pass http://127.0.0.1:8080; } location ~ logo.xcf$ { root /var/www/logo; expires 48h; } location ~* .(png|ico|gif|xcf)$ { root /var/www/img; expires 24h; } location ~ logo.ico$ { root /var/www/logo; expires 96h; } location ~ logo.jpg$ { root /var/www/logo; expires 48h; } }
And here is the table with the results:
URL | LOCATIONS FOUND | FINAL MATCH |
---|---|---|
/ | 1) prefix match for / | / |
/css | 1) prefix match for / | / |
/api | 1) exact match for /api | /api |
/api/ | 1) prefix match for / | / |
/backend | 1) prefix match for / 2) prefix match for /backend | /backend |
/static | 1) prefix match for / | / |
/static/header.png | 1) prefix match for / 2) case sensitive regex match for ^/(media\|static)/ | ^/(media\|static)/ |
/static/logo.jpg | 1) prefix match for / 2) case sensitive regex match for ^/(media\|static)/ | ^/(media\|static)/ |
/media2 | 1) prefix match for / 2) case insensitive regex match for ^/(media2\|static2) | ^/(media2\|static2) |
/media2/ | 1) prefix match for / 2) case insensitive regex match for ^/(media2\|static2) | ^/(media2\|static2) |
/static2/logo.jpg | 1) prefix match for / 2) case insensitive regex match for ^/(media2\|static2) | ^/(media2\|static2) |
/static2/logo.png | 1) prefix match for / 2) case insensitive regex match for ^/(media2\|static2) | ^/(media2\|static2) |
/static3/logo.jpg | 1) prefix match for /static3 2) prefix match for / 3) case sensitive regex match for logo.jpg$ | logo.jpg$ |
/static3/logo.png | 1) prefix match for /static3 2) prefix match for / 3) case insensitive regex match for .(png\|ico\|gif\|xcf)$ | .(png\|ico\|gif\|xcf)$ |
/static4/logo.jpg | 1) priority prefix match for /static4 2) prefix match for / | /static4 |
/static4/logo.png | 1) priority prefix match for /static4 2) prefix match for / | /static4 |
/static5/logo.jpg | 1) prefix match for / 2) case sensitive regex match for logo.jpg$ | logo.jpg$ |
/static5/logo.png | 1) prefix match for / 2) case insensitive regex match for .(png\|ico\|gif\|xcf)$ | .(png\|ico\|gif\|xcf)$ |
/static5/logo.xcf | 1) prefix match for / 2) case sensitive regex match for logo.xcf$ | logo.xcf$ |
/static5/logo.ico | 1) prefix match for / 2) case insensitive regex match for .(png\|ico\|gif\|xcf)$ | .(png\|ico\|gif\|xcf)$ |
rewrite vs return
Generally there are two ways of implementing redirects in NGINX: with rewrite
and return
directives.
These directives (from the ngx_http_rewrite_module) are very useful, but according to the NGINX documentation the only 100% safe things that may be done inside an if in a location context are:
return ...;
rewrite ... last;
Anything else may possibly cause unpredictable behaviour, including potential SIGSEGV
.
rewrite directive
The rewrite directives are executed sequentially in order of their appearance in the configuration file. A rewrite is slower (but still extremely fast) than a return because it has to evaluate a regular expression. When it issues a redirect, it returns HTTP 302 with the redirect flag (or when the replacement string starts with http://, https://, or $scheme) and HTTP 301 with the permanent flag.
The rewrite directive just changes the request URI, not the response to the request. Importantly, only the part of the original URL that matches the regex is rewritten. It can be used for temporary URL changes.
I sometimes use rewrite to capture elements in the original URL, to change or add elements in the path, and in general when I need to do something more complex:
location / {
  ...
  rewrite ^/users/(.*)$ /user.php?username=$1 last;

  # or:
  rewrite ^/users/(.*)/items$ /user.php?username=$1&page=items last;
}
Note that when rewrite issues a redirect, it returns only code 301 or 302.
The rewrite directive accepts optional flags:
- break: completes processing of the current set of rewrite directives and breaks the location lookup cycle by not doing any location lookup or internal jump at all
  - if you use the break flag inside a location block:
    - no more parsing of rewrite conditions
    - the internal engine continues to process the current location block

    Inside a location block, with break, NGINX only stops processing any more rewrite conditions.
  - if you use the break flag outside a location block:
    - no more parsing of rewrite conditions
    - the internal engine goes to the next phase (searching for a location match)

    Outside a location block, with break, NGINX stops processing any more rewrite conditions.
- last: completes processing of the current set of rewrite directives and starts a search for a new location matching the changed URI
  - if you use the last flag inside a location block:
    - no more parsing of rewrite conditions in the current location
    - the internal engine starts to look for another location match based on the result of the rewrite
    - rewrite directives in the newly matched location are processed as usual

    Inside a location block, with last, NGINX stops processing any more rewrite conditions and then starts to look for a new matching location block. Rewrites in the new location block are still evaluated, which is why a last rewrite that lands back in the same location can loop (NGINX gives up after 10 cycles and returns a 500 error).
  - if you use the last flag outside a location block:
    - no more parsing of rewrite conditions
    - the internal engine goes to the next phase (searching for a location match)

    Outside a location block, with last, NGINX stops processing any more rewrite conditions.
- redirect: returns a temporary redirect with the 302 HTTP response code
- permanent: returns a permanent redirect with the 301 HTTP response code
Note:
- outside location blocks, last and break are effectively the same
- processing of rewrite directives at server level may be stopped via break, but the location lookup will follow anyway
This explanation is based on the awesome answer by Pothi Kalimuthu to nginx url rewriting: difference between break and last.
The official documentation has great tutorials about Creating NGINX Rewrite Rules and Converting rewrite rules. I also recommend Clean Url Rewrites Using Nginx.
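To make the difference between the last and break flags concrete, here is a small sketch. The root path, the locations, and the rewrite targets are invented for the example and are not taken from a real setup.

```nginx
server {
    listen 80;
    root /var/www/example.com;   # hypothetical document root

    # last: the URI becomes /new/..., then NGINX searches for a location again,
    # so the request ends up in "location /new/" below.
    location /old/ {
        rewrite ^/old/(.*)$ /new/$1 last;
    }

    # break: the URI becomes /files/<name>.txt, but NGINX stays in this location.
    # The file is looked up as /var/www/example.com/files/<name>.txt and
    # "location /files/" is not consulted for the rewritten request.
    location /download/ {
        rewrite ^/download/(.*)\.html$ /files/$1.txt break;
    }

    location /new/ {
        return 200 "handled by /new/\n";
    }

    # Direct requests to /files/... are refused; rewritten ones never get here.
    location /files/ {
        return 403;
    }
}
```

Under these assumptions, GET /old/page is answered by the /new/ location, while GET /download/report.html is served from disk by the /download/ location itself.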
Finally, look at the difference between last
and break
flags in action:
last
directive:
break
directive:
This infographic comes from Internal rewrite — nginx by Ivan Dabic.
return directive
🔖 Use return directive for URL redirection (301, 302) — Base Rules — P2
🔖 Use return directive instead of rewrite for redirects — Performance — P2
The other way is the return directive. It’s faster than rewrite because there is no regexp that has to be evaluated. It stops processing and returns the specified code to the client (tells NGINX to respond directly to the request); with only a URL and no code, a temporary 302 redirect is returned. The entire URL is rerouted to the URL specified.
I use return
directive in the following cases:
-
force redirect from http to https:
server { ... return 301 https://example.com$request_uri; }
-
redirect from www to non-www and vice versa:
server {
  ...
  # It's only an example. You shouldn't use the 'if' statement in the following case:
  if ($host = www.example.com) {
    return 301 https://example.com$request_uri;
  }
}
-
close the connection and log it internally:
server { ... return 444; }
-
send 4xx HTTP response for a client without any other actions:
server {
  ...
  if ($request_method = POST) {
    return 405;
  }

  # or:
  if ($invalid_referer) {
    return 403;
  }

  # or:
  if ($request_uri ~ "^/app/(.+)$") {
    return 403;
  }

  # or:
  location ~ ^/(data|storage) {
    return 403;
  }
}
-
and sometimes for reply with HTTP code without serving a file or response body:
server {
  ...
  # NGINX will not allow a 200 with no response body (200's need to be with a resource in the response).
  # '204 No Content' is meant to say "I've completed the request, but there is no body to return":
  return 204 "it's all okay";

  # Or without body:
  return 204;

  # Because the default Content-Type is application/octet-stream, the browser will offer to "save the file".
  # If you want to see the reply in the browser you should add a proper Content-Type:
  # add_header Content-Type text/plain;
}
To the last example: be careful if you’re using such a configuration to do a healthcheck. While a 204 HTTP code is semantically perfect for a healthcheck (success indication with no content), some services do not consider it a success.
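For the www to non-www case above, the if can be avoided by letting server_name do the matching. A minimal sketch, assuming plain HTTP on port 80 and example.com as the canonical host:

```nginx
# Both names are matched by server_name, so no 'if ($host ...)' is needed.
# Every plain-HTTP request is sent to the canonical HTTPS host.
server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://example.com$request_uri;
}
```

Redirecting https://www.example.com the same way would need its own server block on port 443 with a certificate for that name, which is left out of this sketch.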
URL redirections
🔖 Use return directive for URL redirection (301, 302) — Base Rules — P2
🔖 Use return directive instead of rewrite for redirects — Performance — P2
HTTP allows servers to redirect a client request to a different location. This is useful when moving content to a new URL, when deleting pages or when changing domain names or merging websites.
URL redirection is done for various reasons:
- for URL shortening
- to prevent broken links when web pages are moved
- to allow multiple domain names belonging to the same owner to refer to a single web site
- to guide navigation into and out of a website
- for privacy protection
- for hostile purposes such as phishing attacks or malware distribution
It comes from Wikipedia — URL redirection.
I recommend reading:
- Redirections in HTTP
- 301 101: How Redirects Work
- Modify 301/302 response body (from this handbook)
- Redirect POST request with payload to external endpoint (from this handbook)
try_files directive
We have one more very interesting and important directive: try_files
(from the ngx_http_core_module
). This directive tells NGINX to check for the existence of a named set of files or directories (checks files conditionally breaking on success).
I think the best explanation comes from the official documentation:
try_files
checks the existence of files in the specified order and uses the first found file for request processing; the processing is performed in the current context. The path to a file is constructed from the file parameter according to the root and alias directives. It is possible to check directory’s existence by specifying a slash at the end of a name, e.g. $uri/. If none of the files were found, an internal redirect to the uri specified in the last parameter is made.
Generally it may check files on disk, redirect to proxies or internal locations, and return error codes, all in one directive.
Take a look at the following example:
server { ... root /var/www/example.com; location / { try_files $uri $uri/ /frontend/index.html; } location ^~ /images { root /var/www/static; try_files $uri $uri/ =404; } ...
- the default root directory for all locations is /var/www/example.com
- location /: matches all requests that have no more specific location, e.g. exact names
  - try_files $uri: when a request URI is matched by this block, try $uri first

    For example: https://example.com/tools/en.js. NGINX checks whether there’s a file called en.js inside /tools; if it’s found, NGINX serves it right away.
  - try_files $uri $uri/: if the first condition fails, try the URI as a directory

    For example: https://example.com/backend/. NGINX first checks whether a file called backend exists; if it can’t find it, it goes to the second check, $uri/, and if a directory called backend exists, NGINX tries to serve it.
  - try_files $uri $uri/ /frontend/index.html: if neither a file nor a directory is found, NGINX sends /frontend/index.html
- location ^~ /images: handles any query beginning with /images and halts searching
  - the default root directory for this location is /var/www/static
  - try_files $uri: when a request URI is matched by this block, try $uri first

    For example: https://example.com/images/01.gif. NGINX checks whether there’s a file called 01.gif inside /images; if it’s found, NGINX serves it right away.
  - try_files $uri $uri/: if the first condition fails, try the URI as a directory

    For example: https://example.com/images/. NGINX first checks whether a file called images exists; if it can’t find it, it goes to the second check, $uri/, and if a directory called images exists, NGINX tries to serve it.
  - try_files $uri $uri/ =404: if neither a file nor a directory is found, NGINX sends HTTP 404 (Not Found)
On the other hand, try_files
is relatively primitive. When encountered, NGINX will look for any of the specified files physically in the directory matched by the location block. If they don’t exist, NGINX does an internal redirect to the last entry in the directive.
Additionally, consider not checking for the existence of directories:
# Use this to take out an extra filesystem stat():
try_files $uri @index;

# Instead of this:
try_files $uri $uri/ @index;
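The @index fallback used above is a named location that the text does not define. One plausible definition, assuming the fallback is an application listening on 127.0.0.1:8080 (a made-up backend address), could look like this:

```nginx
server {
    listen 80;
    root /var/www/example.com;

    location / {
        # Serve the file if it exists, otherwise jump to the named location.
        try_files $uri @index;
    }

    # Named locations are only reachable through internal redirects
    # such as try_files or error_page, never directly by a client.
    location @index {
        proxy_pass http://127.0.0.1:8080;   # hypothetical backend
    }
}
```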
if, break and set
🔖 Avoid checks server_name with if directive — Performance — P2
The ngx_http_rewrite_module
also provides additional directives:
- break: stops processing the current set of rewrite directives; if it is specified inside a location, further processing of the request continues in this location:

      # It's useful for:
      if ($slow_resp) {
        limit_rate 50k;
        break;
      }
- if: you can use if inside a server but not the other way around. Also notice that you shouldn’t use if inside location, as it may not work as desired. For example, if statements aren’t a good way of setting custom headers because they may cause statements outside the if block to be ignored. The NGINX docs say:

  There are cases where you simply cannot avoid using an if, for example if you need to test a variable which has no equivalent directive.

  You should also remember this:

  The if context in NGINX is provided by the rewrite module and this is the primary intended use of this context. Since NGINX will test conditions of a request with many other purpose-made directives, if should not be used for most forms of conditional execution. This is such an important note that the NGINX community has created a page called if is evil (yes, it’s really evil and in most cases not needed).

  A long time ago I found this:

  That’s actually not true and shows you don’t understand the problem with it. When the if statement ends with return directive, there is no problem and it’s safe to use.

  On the other hand, the official documentation says:

  Directive if has problems when used in location context, in some cases it doesn’t do what you expect but something completely different instead. In some cases it even segfaults. It’s generally a good idea to avoid it if possible.
- set: sets a value for the specified variable. The value can contain text, variables, and their combination
Example usage of the if and set directives:
# It comes from: https://gist.github.com/jrom/1760790:
if ($request_uri = /) {
  set $test A;
}

if ($host ~* example.com) {
  set $test "${test}B";
}

if ($http_cookie !~* "auth_token") {
  set $test "${test}C";
}

if ($test = ABC) {
  proxy_pass http://cms.example.com;
  break;
}
root vs alias
Placing a root or alias directive in a location block overrides the root or alias directive that was applied at a higher scope.
With alias you can map a request to a different file name, while root forces the file on the server to carry the same name as in the URL. With alias, NGINX replaces the matched location prefix in the URL path, e.g. /robots.txt, with the alias value, e.g. /var/www/static/robots.01.txt, and then uses the result as a filesystem path. With root, NGINX inserts the root value, e.g. /var/www/static/, at the beginning of the URL path and then uses the result as a filesystem path.
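A small sketch of both mappings side by side. The robots.txt paths are the ones used in the text; the server name and the /humans.txt location are assumptions added for contrast:

```nginx
server {
  listen 80;
  server_name example.com;

  # alias: the matched URI is replaced by the alias value,
  # so /robots.txt is served from /var/www/static/robots.01.txt.
  location = /robots.txt {
    alias /var/www/static/robots.01.txt;
  }

  # root: the URI is appended to the root value,
  # so /humans.txt is served from /var/www/static/humans.txt.
  location = /humans.txt {
    root /var/www/static;
  }
}
```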
Look at the difference. An alias that maps a whole directory works:
    location ^~ /data/ {
      alias /home/www/static/data/;
    }
But the following won’t, because root appends the full URI, so /data/file.txt would be looked up in /home/www/static/data/data/file.txt:
    location ^~ /data/ {
      root /home/www/static/data/;
    }
This would have to be:
    location ^~ /data/ {
      root /home/www/static/;
    }
The root directive is typically placed in server and location blocks. Placing a root directive in the server block makes it available to all location blocks within the same server block.
This directive tells NGINX to take the request URI and append it to the specified directory. For example, with the following configuration block:
    server {
      server_name example.com;
      listen 10.250.250.10:80;
      index index.html;
      root /var/www/example.com;
      location / {
        try_files $uri $uri/ =404;
      }
      location ^~ /images {
        root /var/www/static;
        try_files $uri $uri/ =404;
      }
    }
NGINX will map the requests as follows:
- http://example.com/images/logo.png into the file path /var/www/static/images/logo.png
- http://example.com/contact.html into the file path /var/www/example.com/contact.html
- http://example.com/about/us.html into the file path /var/www/example.com/about/us.html
For example, if you want to serve all requests that start with /static and your data is present in /var/www/static, you should set:
- first path: /var/www
- last path: /static
- full path: /var/www/static
    location <last path> {
      root <first path>;
      ...
    }
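Filled in with the values listed above, the template could look like this (a sketch; the file name in the comment is a hypothetical example):

```nginx
location /static {
  # A request for /static/css/site.css is served from
  # /var/www/static/css/site.css (root value + full URI).
  root /var/www;
}
```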
The NGINX documentation on the alias directive suggests that it is better to use root over alias when the location matches the last part of the directive’s value.
The alias directive can only be placed in a location block. The following configuration illustrates how the alias directive is applied:
    server {
      server_name example.com;
      listen 10.250.250.10:80;
      index index.html;
      root /var/www/example.com;
      location / {
        try_files $uri $uri/ =404;
      }
      location ^~ /images {
        alias /var/www/static;
        try_files $uri $uri/ =404;
      }
    }
NGINX will map the requests as follows:
- http://example.com/images/logo.png into the file path /var/www/static/logo.png
- http://example.com/images/ext/img.png into the file path /var/www/static/ext/img.png
- http://example.com/contact.html into the file path /var/www/example.com/contact.html
- http://example.com/about/us.html into the file path /var/www/example.com/about/us.html
When the location matches the last part of the directive’s value, it is better to use the root directive (it seems like an arbitrary style choice because the authors don’t justify that instruction at all). Look at this example from the official documentation:
    location /images/ {
      alias /data/w3/images/;
    }
    # Better solution:
    location /images/ {
      root /data/w3;
    }
internal directive
This directive specifies that the location block is internal. In other words, the specified resource cannot be accessed by external requests.
For external requests to such a location, e.g. http://example.com/app.php/some-path, NGINX returns 404; only internal redirections are allowed to reach it. In brief, this tells NGINX that the location is not accessible from the outside (it doesn’t redirect anything).
The documentation for the internal directive lists the requests that are treated as internal:
- requests redirected by the error_page, index, random_index, and try_files directives
- requests redirected by the X-Accel-Redirect response header field from an upstream server
- subrequests formed by the include virtual command of the ngx_http_ssi_module module, by the ngx_http_addition_module module directives, and by the auth_request and mirror directives
- requests changed by the rewrite directive
Example 1:
    error_page 404 /404.html;
    location = /404.html {
      internal;
    }
Example 2:
The files are served from the directory /srv/hidden-files by the path prefix /hidden-files/. Pretty straightforward. The internal declaration tells NGINX that this path is accessible only through rewrites in the NGINX config, or via the X-Accel-Redirect header in proxied responses.
To use this, just return an empty response which contains that header. The content of the header should be the location you want to redirect to:
    location /hidden-files/ {
      internal;
      alias /srv/hidden-files/;
    }
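To make the flow concrete, here is a sketch of the public side of such a setup. The /download/ prefix, the backend address 127.0.0.1:8080, and the file name report.pdf are assumptions for illustration, not part of the original example:

```nginx
# Public entry point: the request is proxied to the application.
location /download/ {
  proxy_pass http://127.0.0.1:8080;
}

# The application checks permissions and replies with an empty body
# and a header such as:
#   X-Accel-Redirect: /hidden-files/report.pdf
# NGINX then performs an internal redirect to the location below and
# serves /srv/hidden-files/report.pdf itself.
location /hidden-files/ {
  internal;
  alias /srv/hidden-files/;
}
```

A client that requests /hidden-files/report.pdf directly still gets 404, because the location is internal.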
Example 3:
Another use case for internal redirects in NGINX is to hide credentials. Often you need to make requests to 3rd party services. For example, you want to send text messages or access a paid maps server. It would be most efficient to send these requests directly from your JavaScript front end. However, doing so means you would have to embed an access token in the front end. This means savvy users could extract this token and make requests on your account.
An easy fix is to make an endpoint in your back end which initiates the actual request. We could make use of an HTTP client library inside the back end. However, this will again tie up workers, especially if you expect a barrage of requests and the 3rd party service is responding very slowly.
    location /external-api/ {
      internal;
      set $redirect_uri "$upstream_http_redirect_uri";
      set $authorization "$upstream_http_authorization";
      # For performance:
      proxy_buffering off;
      # Pass on secret from backend:
      proxy_set_header Authorization $authorization;
      # Use URI determined by backend:
      proxy_pass $redirect_uri;
    }
Examples 2 and 3 (both are great!) come from How to use internal redirects in NGINX.
There is a limit of 10 internal redirects per request to prevent request processing cycles that can occur in incorrect configurations. If this limit is reached, the error HTTP 500 Internal Server Error is returned. In such cases, the rewrite or internal redirection cycle message can be seen in the error log.
Look also at Authentication Based on Subrequest Result from the official documentation.
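A minimal sketch of subrequest-based authentication, assuming NGINX is built with the ngx_http_auth_request_module module. The /private/ prefix, the document root, and the verification service at 127.0.0.1:9000/verify are hypothetical:

```nginx
location /private/ {
  # Every request first triggers a subrequest to /auth:
  # a 2xx answer allows access, 401 or 403 denies it.
  auth_request /auth;
  root /var/www/example.com;
}

location = /auth {
  internal;
  proxy_pass http://127.0.0.1:9000/verify;
  # The subrequest is sent without the original request body.
  proxy_pass_request_body off;
  proxy_set_header Content-Length "";
  proxy_set_header X-Original-URI $request_uri;
}
```

Marking /auth as internal keeps clients from calling the verification endpoint directly, while the subrequest issued by auth_request can still reach it.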
External and internal redirects
External requests originate directly from the client. So, if the client fetched https://example.com/directory, the request would fall directly into the preceding location block.
An internal redirect means that NGINX doesn’t send a 302 response to the client; it simply performs an implicit rewrite of the URL and processes it as though the user had typed the new URL originally.
The internal redirect is different from the external redirect defined by HTTP response codes 302 and 301: the client browser won’t update its address.
Before rewriting internally, we should explain the difference between redirects and internal rewrites. When the source points to a destination outside of the source domain, we call it a redirect, as the request goes from the source to an outside domain/destination.
With an internal rewrite you are basically doing the same, only the destination is a local path under the same domain and not an outside location.
There is also a great explanation of internal redirects:
The internal redirection (e.g. via the echo_exec or rewrite directive) is an operation that makes NGINX jump from one location to another while processing a request (it is very similar to the goto statement in the C language). This «jumping» happens completely within the server itself.
There are two different kinds of internal requests:
- internal redirects: redirect the client requests internally. The URI is changed, and the request may therefore match another location block and become eligible for different settings. The most common case of internal redirects is when using the rewrite directive, which allows you to rewrite the request URI
- sub-requests: additional requests that are triggered internally to generate content that is complementary to the main request (inserted into or appended to the body of the original response), e.g. by the addition or ssi modules
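The difference between an internal and an external redirect can be sketched as follows. All paths and host names here (/old-docs/, /docs/, blog.example.com) are hypothetical:

```nginx
server {
  listen 80;
  server_name example.com;
  root /var/www/example.com;

  # Internal redirect: the URI is rewritten and location matching starts
  # again; the client still sees /old-docs/... in its address bar.
  location /old-docs/ {
    rewrite ^/old-docs/(.*)$ /docs/$1 last;
  }

  # External redirect: the client receives a 301 response and issues
  # a new request to the URL from the Location header.
  location /blog/ {
    return 301 https://blog.example.com$request_uri;
  }

  location /docs/ {
    try_files $uri $uri/ =404;
  }
}
```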
allow and deny
🔖 Take care about your ACL rules — Hardening — P1
🔖 Reject unsafe HTTP methods — Hardening — P1
Both come from the ngx_http_access_module module and allow limiting access to certain client addresses. You can combine allow and deny rules.
deny will always return the 403 error code.
The easiest path would be to start out by denying all access, then only granting access to those locations you want. For example:
    location / {
      # without 'satisfy any' both should be passed:
      satisfy any;
      allow 192.168.0.0/16;
      deny all;
      # sh -c "echo -n 'user:' >> /etc/nginx/.secret"
      # sh -c "openssl passwd -apr1 >> /etc/nginx/.secret"
      auth_basic "Restricted Area";
      auth_basic_user_file /etc/nginx/.secret;
      root /usr/share/nginx/html;
      index index.html index.htm;
    }
Putting satisfy any; in your configuration tells NGINX to accept either HTTP authentication or the IP restriction. By default, when you define both, it will expect both.
See also this answer:
As you’ve found, it isn’t advisable to put the auth settings at the server level because they will apply to all locations. While it is possible to turn basic auth off there doesn’t appear to be a way to clear an existing IP whitelist.
A better solution would be to add the authentication to the / location so that it isn’t inherited by /hello.
The problem comes if you have other locations that require the basic auth and IP whitelisting in which case it might be worth considering moving the auth components to an include file or nesting them under /.
Both directives may work unexpectedly! Look at the following example:
    server {
      server_name example.com;
      deny all;
      location = /test {
        return 200 "it's all okay";
        more_set_headers 'Content-Type: text/plain';
      }
    }
If you generate a request:
curl -i https://example.com/test
HTTP/2 200
date: Wed, 11 Nov 2018 10:02:45 GMT
content-length: 13
server: Unknown
content-type: text/plain
it's all okay
Why? Look at the Request processing stages chapter. That’s because NGINX processes requests in phases, and the rewrite phase (where return belongs) goes before the access phase (where deny works).
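One possible workaround is to make the address check itself in the rewrite phase. The sketch below is an assumption, not the author's recommendation: it uses the geo module to compute a hypothetical $denied variable and tests it before the return that produces the response:

```nginx
# http context: $denied is 0 for trusted networks and 1 for everyone else.
geo $denied {
  default        1;
  192.168.0.0/16 0;
}

server {
  listen 80;
  server_name example.com;

  location = /test {
    # Evaluated in the rewrite phase, before the return below.
    if ($denied) {
      return 403;
    }

    return 200 "it's all okay";
  }
}
```

Rewrite module directives in a location run in the order they are written, so the 403 is returned before the 200 for any address outside 192.168.0.0/16.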
uri vs request_uri
🔖 Use $request_uri to avoid using regular expressions - Performance - P2
$request_uri is the original request URI (for example /foo/bar.php?arg=baz); it includes the arguments and can’t be modified. $uri refers to the current, possibly altered URI, so $uri is not equivalent to $request_uri.
See this great and short explanation by Richard Smith:
The $uri variable is set to the URI that NGINX is currently processing, but it is also subject to normalisation, including:
- removal of the ? and query string
- consecutive / characters are replaced by a single /
- URL encoded characters are decoded
The value of $request_uri is always the original URI and is not subject to any of the above normalisations.
Most of the time you would use $uri, because it is normalised. Using $request_uri in the wrong place can cause URL encoded characters to become doubly encoded.
Both exclude the scheme (https://) and the port (implicit 443 in the examples here), as defined by RFC 2616 - http URL [IETF] for the URL:
http_URL = "http(s):" "//" host [ ":" port ] [ abs_path [ "?" query ]]
Take a look at the following table:
URL | $request_uri | $uri
---|---|---
https://example.com/foo | /foo | /foo
https://example.com/foo/bar | /foo/bar | /foo/bar
https://example.com/foo/bar/ | /foo/bar/ | /foo/bar/
https://example.com/foo/bar? | /foo/bar? | /foo/bar
https://example.com/foo/bar?do=test | /foo/bar?do=test | /foo/bar
https://example.com/rfc2616-sec3.html#sec3.2 | /rfc2616-sec3.html | /rfc2616-sec3.html
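Since $request_uri already carries the original path and query string, a redirect can reuse it without any regular expression. A minimal sketch, assuming a plain-HTTP server that only redirects to HTTPS:

```nginx
server {
  listen 80;
  server_name example.com;

  # /foo/bar?do=test is redirected to https://example.com/foo/bar?do=test
  # without a rewrite and without a capture group.
  return 301 https://example.com$request_uri;
}
```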
Another way to repeat the location in the upstream request is to use the proxy_pass directive, which is quite easy:
    location /app/ {
      proxy_pass http://127.0.0.1:5000;
      # or:
      # proxy_pass http://127.0.0.1:5000/api/app/;
    }
LOCATION | proxy_pass | REQUEST | RECEIVED BY UPSTREAM
---|---|---|---
/app/ | http://localhost:5000/api$request_uri | /app/foo?bar=baz | /api/app/foo?bar=baz
/app/ | http://localhost:5000/api$uri | /app/foo?bar=baz | /api/app/foo
Compression and decompression
🔖 Mitigation of CRIME/BREACH attacks — Hardening Rules — P2
By default, gzip compression is disabled, and once it is enabled NGINX compresses only responses with the MIME type text/html. So, if you send a request with the Accept-Encoding: gzip header for any other type, you will not see Content-Encoding: gzip in the response.
To enable gzip compression, set the gzip directive to on.
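A minimal sketch, assuming compression should be enabled for the whole http context:

```nginx
http {
  # Compress responses on the fly; as long as gzip_types is not set,
  # only text/html responses are compressed.
  gzip on;
}
```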
To compress responses with other MIME types, include the gzip_types directive and list the additional types:
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
Remember: by default, NGINX doesn’t compress image files using its per-request gzip module.
I also highly recommend reading this (an interesting observation about gzip and performance by Barry Pollard):
To be honest gzip is not very processor intensive these days and gzipping on the fly (and then unzipping in the browser) is often the norm. It’s something web browsers are very good at.
So unless you are getting huge volumes of traffic you’ll probably not notice any performance or CPU load impact due to on the fly gzipping for most web files.
To test HTTP and Gzip compression I recommend two external tools:
- HTTP Compression Test
- HTTP Gzip Compression Test
Compress larger responses and avoid the temptation to compress everything: very small files barely benefit from compression, and already compressed formats (such as images, executables, etc.) gain little as well. You can tell NGINX not to compress responses smaller than e.g. 128 bytes with the gzip_min_length directive.
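A sketch of the 128-byte threshold mentioned above, assuming it is set in the http context:

```nginx
http {
  gzip on;

  # Skip responses shorter than 128 bytes; the length is taken from
  # the Content-Length response header field.
  gzip_min_length 128;
}
```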
For more information see Finding the Nginx gzip_comp_level Sweet Spot.
Compressing resources on the fly adds CPU load and latency (waiting for the compression to be done) every time a resource is served. NGINX also provides static compression with the ngx_http_gzip_static_module module. It is better for two reasons:
- you don’t have to gzip for each request
- you can use a higher gzip level
For example:
    # Enable static gzip compression:
    location ^~ /assets/ {
      gzip_static on;
      ...
    }
You should put gzip_static on; inside the blocks that configure static files, but if you’re only running one site, it’s safe to just put it in the http block.
NGINX does not automatically compress the files for you. You will have to do this yourself.
To compress files manually (bash and GNU find are assumed):
    cd assets/
    while IFS='' read -r -d '' _fd ; do
      gzip -N4c "${_fd}" > "${_fd}.gz"
    done < <(find . -maxdepth 1 -type f -regextype posix-extended \
               -regex '.*\.(css|js|jpg|gif|png|jpeg)' -print0)
So, for example, to service a request for /foo/bar/file, NGINX tries to find and send the file /foo/bar/file.gz directly, so no extra CPU cost or latency is added to your requests, speeding up the serving of your app.
What is the best NGINX compression gzip level?
The level of gzip compression simply determines how compressed the data is on a scale from 1 to 9, where 9 is the most compressed. The trade-off is that the most compressed data usually requires the most work to compress, but look also at this great answer: the author explains that the level of gzip compression doesn’t affect the difficulty of decompression.
I think the ideal compression level seems to be between 4 and 6. The gzip_comp_level directive sets how much responses are compressed.
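A sketch of setting the level; the value 5 is simply an assumed middle point of the 4 to 6 range suggested above, not a measured optimum:

```nginx
http {
  gzip on;

  # 1 is the fastest, 9 compresses the most.
  gzip_comp_level 5;
}
```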
Hash tables
Before you start reading this chapter, I recommend Hash tables explained.
To assist with the rapid processing of requests, NGINX uses hash tables. The NGINX hash is in principle the same as a typical hash list, but it has significant differences.
These tables are not meant for applications that add and remove elements dynamically; they are specifically designed to hold a set of init-time elements arranged in a hash list. All elements that are put in the hash list are known while creating the hash list itself. No dynamic addition or deletion is possible here.
The hash table is constructed and compiled during restart or reload, and afterwards it runs very fast. Its main purpose seems to be speeding up the lookup of elements that are added once.
Look at the Setting up hashes from official documentation:
To quickly process static sets of data such as server names, map directive’s values, MIME types, names of request header strings, NGINX uses hash tables. During the start and each re-configuration NGINX selects the minimum possible sizes of hash tables such that the bucket size that stores keys with identical hash values does not exceed the configured parameter (hash bucket size). The size of a table is expressed in buckets. The adjustment is continued until the table size exceeds the hash max size parameter. Most hashes have the corresponding directives that allow changing these parameters.
I also recommend Optimizations section and nginx — Hashing scheme explanation.
Some important information (based on this amazing research by brablc):
- the general recommendation is to keep both values as small as possible while having as few collisions as possible (during startup and with each reconfiguration, NGINX selects the smallest possible size for the hash tables)
- it depends on your setup: you can reduce the number of servers in the table and reload NGINX instead of restarting it
- if NGINX reports the need to increase hash_max_size or hash_bucket_size, then it is first necessary to increase the first parameter
- a bigger hash_max_size uses more memory; a bigger hash_bucket_size uses more CPU cycles during lookup and more transfers from main memory to cache. If you have enough memory, increase hash_max_size and try to keep hash_bucket_size as low as possible
- each hash table entry consumes space in a bucket. The space required is the length of the key, e.g. the domain name (with some overhead to store the domain’s actual length as well). Since stage.api.example.com is 21 characters, all entries consume at least 24 bytes in a bucket, and most consume 32 bytes or more
- as you increase the number of entries, you have to increase the size of the hash table and/or the number of hash buckets in the table
  If NGINX complains, increase hash_max_size first, as long as it complains. If the number exceeds some big number (32769 for instance), increase hash_bucket_size to a multiple of the default value on your platform, as long as it complains. If it does not complain anymore, decrease hash_max_size back as long as it does not complain. Now you have the best setup for your set of server names (each set of server names may need a different setup).
- with a hash bucket size of 64 or 128, a bucket is full after 4 or 5 entries hash to it
- hash_max_size is not directly related to the number of server names; if the number of servers doubles, you may need to increase hash_max_size 10 times or even more to avoid collisions. If you cannot avoid them, you have to increase hash_bucket_size
- if you have hash_max_size less than 10000 and a small hash_bucket_size, you can expect a long loading time because NGINX would try to find the optimal hash size in a loop (see src/core/ngx_hash.c)
- if you have hash_max_size bigger than 10000, there will be only 1000 loops performed before it would complain
Server names hash table
The hash with the names of servers is controlled by the following directives (inside the http context):
- server_names_hash_max_size: sets the maximum size of the server names hash tables; default value: 512
- server_names_hash_bucket_size: sets the bucket size for the server names hash tables; default values: 32, 64, or 128 (the default value depends on the size of the processor’s cache line)
The server_names_hash_bucket_size value is always aligned to a multiple of the processor’s cache line size.
If a server name is defined as too.long.server.name.example.com, then NGINX will fail to start and display an error message like:
nginx: [emerg] could not build server_names_hash, you should increase server_names_hash_bucket_size: 64
To fix this, increase the server_names_hash_bucket_size directive value to the next power of two (in this case to 128) and reload NGINX.
If a large number of server names are defined, and NGINX complains with the following error:
nginx: [emerg] could not build the server_names_hash, you should increase either server_names_hash_max_size: 512 or server_names_hash_bucket_size: 32
Try to set server_names_hash_max_size to a number close to the number of server names. Only if this does not help, or if NGINX’s start time is unacceptably long, try to increase the server_names_hash_bucket_size parameter.
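Both adjustments go into the http context. A sketch with assumed values (1024 stands in for "a number close to the number of server names"; tune it to your own setup):

```nginx
http {
  # Raised from the default 512; pick a value close to the number
  # of server names you actually define.
  server_names_hash_max_size 1024;

  # Raised to the next power of two so that long names such as
  # too.long.server.name.example.com fit into a bucket.
  server_names_hash_bucket_size 128;
}
```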
Log files
🔖 Use custom log formats — Debugging — P4
Log files are a critical part of NGINX management. NGINX writes information about client requests to the access log right after the request is processed (in the last phase: NGX_HTTP_LOG_PHASE).
By default:
- the access log is located in logs/access.log, but I suggest you move it to the /var/log/nginx directory
- data is written in the predefined combined/main format
- access.log stores a record of each request, and the log format is fully configurable
- error.log contains important operational messages
It is equivalent to the following configuration:
    # In nginx.conf (default log format):
    http {
      ...
      log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
      # but I suggest you change it to (use one definition or the other,
      # a format name can be declared only once):
      log_format main '$remote_addr - $remote_user [$time_local] '
                      '"$request_method $scheme://$host$request_uri '
                      '$server_protocol" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      '$request_time';
    }
For more information please see Configuring Logging.
Set access_log off; to completely turn off access logging.
If you don’t want 404 errors to show in your NGINX error logs, you should set log_not_found off;.
If you want to enable logging of subrequests into access_log, you should set log_subrequest on; and change the default logging format (you have to log $uri to see the difference). There is a great explanation of how to identify subrequests in NGINX log files.
I also recommend reading:
- ngx_http_log_module
- ngx_http_upstream_module
Conditional logging
Sometimes certain entries are there just to fill up the logs or are cluttering them. I sometimes exclude requests, by client IP or whatever else, when I want to debug log files more effectively.
So, in this example, if the value of the $error_codes variable is 0, nothing is logged (this covers 2xx and 3xx responses), but if it is 1 (e.g. 404 or 503 from the backend), the request is saved to the log:
    # Define map in the http context:
    http {
      ...
      map $status $error_codes {
        default 1;
        ~^[23]  0;
      }
      ...
      # Add if condition to the access log:
      access_log /var/log/nginx/example.com-access.log combined if=$error_codes;
    }
Manual log rotation
🔖 Configure log rotation policy — Base Rules — P1
NGINX will re-open its logs in response to the USR1 signal:
    cd /var/log/nginx
    mv access.log access.log.0
    kill -USR1 $(cat /var/run/nginx.pid) && sleep 1
    # >= gzip-1.6:
    gzip -k access.log.0
    # or, with any version:
    gzip < access.log.0 > access.log.0.gz
    # Test integrity of the archive and remove the original if the test passed:
    gzip -t access.log.0.gz && rm -f access.log.0
Error log severity levels
You can’t specify your own format, but NGINX has several built-in severity levels for error_log.
The following is a list of all severity levels:
TYPE | DESCRIPTION
---|---
debug | information that can be useful to pinpoint where a problem is occurring
info | informational messages that aren’t necessary to read but may be good to know
notice | something normal happened that is worth noting
warn | something unexpected happened, however is not a cause for concern
error | something was unsuccessful, contains the action of limiting rules (default)
crit | important problems that need to be addressed
alert | severe situation where action is needed promptly
emerg | the system is in an unusable state and requires immediate attention
For example: if you set the crit error log level, messages of the crit, alert, and emerg levels are logged.
For debug logging to work, NGINX needs to be built with --with-debug.
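A sketch of how the level is chosen per context. The log file paths and the server name are assumptions:

```nginx
# main context: log messages of the warn level and above.
error_log /var/log/nginx/error.log warn;

http {
  server {
    listen 80;
    server_name example.com;

    # A per-server log that records only crit, alert, and emerg.
    error_log /var/log/nginx/example.com-error.log crit;
  }
}
```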
Default values for the error level:
- in the main section: error
- in the HTTP section: crit
- in the server section: crit
How to log the start time of a request?
Most logging information requires the request to complete (status code, bytes sent, durations, etc.). If you want to log the start time of a request in NGINX, you should apply a patch that exposes the request start time as a variable.
The $time_local variable contains the time when the log entry is written. When the HTTP request header is read, NGINX does a lookup of the associated virtual server configuration. If the virtual server is found, the request goes through six phases:
- server rewrite phase
- location phase
- location rewrite phase (which can bring the request back to the previous phase)
- access control phase
- try_files phase
- log phase
Since the log phase is the last one, the $time_local variable is much closer to the end of the request than to its start.
How to log the HTTP request body?
NGINX doesn’t parse the client request body unless it really needs to, so it usually does not fill the $request_body variable.
The exceptions are when:
- it sends the request to a proxy
- or a fastcgi server
So you really need to add either the proxy_pass or the fastcgi_pass directive to your block.
    # 1) Set log format:
    log_format req_body_logging '$remote_addr - $remote_user [$time_local] '
                                '"$request" $status $body_bytes_sent '
                                '"$http_referer" "$http_user_agent" "$request_body"';
    # 2) Limit the request body size:
    client_max_body_size 1k;
    client_body_buffer_size 1k;
    client_body_in_single_buffer on;
    # 3) Put the log format:
    server {
      ...
      location /api/v4 {
        access_log logs/access_req_body.log req_body_logging;
        proxy_pass http://127.0.0.1;
        ...
      }
      location = /post.php {
        access_log /var/log/nginx/postdata.log req_body_logging;
        fastcgi_pass php_cgi;
        ...
      }
    }
For this, you can also use the echo module. To log a request body, use the echo_read_request_body directive together with the $request_body variable.
echo_read_request_body explicitly reads the request body so that the $request_body variable will always have a non-empty value (unless the body is so big that it has been saved by NGINX to a local temporary file).
    http {
      log_format req_body_logging '$request_body';
      access_log /var/log/nginx/access.log req_body_logging;
      ...
      server {
        location / {
          echo_read_request_body;
          ...
        }
        ...
      }
    }
NGINX upstream variables return 2 values
For example:
upstream_addr 192.168.50.201:8080 : 192.168.50.201:8080
upstream_bytes_received 427 : 341
upstream_connect_time 0.001 : 0.000
upstream_header_time 0.003 : 0.001
upstream_response_length 0 : 0
upstream_response_time 0.003 : 0.001
upstream_status 401 : 200
Below is a short description of each of them:
- $upstream_addr: keeps the IP address and port, or the path to the UNIX-domain socket of the upstream server. If several servers were contacted during request processing, their addresses are separated by commas, e.g. 192.168.1.1:80, 192.168.1.2:80, unix:/tmp/sock. If an internal redirect from one server group to another happens, initiated by X-Accel-Redirect or error_page, then the server addresses from different groups are separated by colons, e.g. 192.168.1.1:80, 192.168.1.2:80, unix:/tmp/sock : 192.168.10.1:80, 192.168.10.2:80
- $upstream_cache_status: keeps the status of accessing a response cache (0.8.3). The status can be either MISS, BYPASS, EXPIRED, STALE, UPDATING, REVALIDATED, or HIT
- $upstream_connect_time: time spent on establishing a connection with an upstream server
- $upstream_cookie_: cookie with the specified name sent by the upstream server in the Set-Cookie response header field (1.7.1). Only the cookies from the response of the last server are saved
- $upstream_header_time: time between establishing a connection and receiving the first byte of the response header from the upstream server
- $upstream_http_: keeps server response header fields. For example, the Server response header field is available through the $upstream_http_server variable. The rules of converting header field names to variable names are the same as for the variables that start with the $http_ prefix. Only the header fields from the response of the last server are saved
- $upstream_response_length: keeps the length of the response obtained from the upstream server (0.7.27); the length is kept in bytes. Lengths of several responses are separated by commas and colons like addresses in the $upstream_addr variable
- $upstream_response_time: time between establishing a connection and receiving the last byte of the response body from the upstream server
- $upstream_status: keeps the status code of the response obtained from the upstream server. Status codes of several responses are separated by commas and colons like addresses in the $upstream_addr variable
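To see these values for your own traffic, the variables can be written to a dedicated access log. A sketch; the format name upstream_debug, the field labels, and the log path are assumptions:

```nginx
http {
  # Each upstream variable may hold several comma- or colon-separated
  # values when more than one server was contacted.
  log_format upstream_debug '$remote_addr [$time_local] "$request" $status '
                            'ua="$upstream_addr" us="$upstream_status" '
                            'uct="$upstream_connect_time" '
                            'uht="$upstream_header_time" '
                            'urt="$upstream_response_time"';

  access_log /var/log/nginx/upstream.log upstream_debug;
}
```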
The official documentation says:
[…] If several servers were contacted during request processing, their addresses are separated by commas. […] If an internal redirect from one server group to another happens, initiated by “X-Accel-Redirect” or error_page, then the server addresses from different groups are separated by colons
This means that NGINX made multiple requests to a backend: most likely you either have a bare proxy_pass host that resolves to different IPs (frequently the case with something like Amazon ELB as an origin), or you have a configured upstream that has multiple servers. Unless disabled, the proxy module will make round robin attempts against all healthy backends. This can be configured with the proxy_next_upstream_* directives.
For example, if this is not the desired behavior, you can just set the following (the directive specifies in which cases a request should be passed to the next server):
    # One should bear in mind that passing a request to the next server is only possible
    # if nothing has been sent to a client yet. That is, if an error or timeout occurs
    # in the middle of the transferring of a response, fixing this is impossible.
    proxy_next_upstream off;
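If turning retries off completely is too strict, they can be narrowed instead. A sketch under assumed values; the upstream name backend, the second server address, and the chosen limits are hypothetical:

```nginx
upstream backend {
  server 192.168.50.201:8080;
  server 192.168.50.202:8080;
}

server {
  listen 80;
  server_name example.com;

  location / {
    proxy_pass http://backend;

    # Retry on connection errors, timeouts, and 502/503 responses only.
    proxy_next_upstream error timeout http_502 http_503;
    # Give up after two attempts or five seconds in total.
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 5s;
  }
}
```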
For more information please see ngx_http_upstream_module and proxy_next_upstream.
Reverse proxy
After reading this chapter, please see: Rules: Reverse Proxy.
This is one of the greatest features of NGINX. In simplest terms, a reverse proxy is a server that sits between internal applications and external clients, forwarding client requests to the appropriate server. It takes a client request, passes it on to one or more servers, and subsequently delivers the server’s response back to the client.
Official NGINX documentation says:
Proxying is typically used to distribute the load among several servers, seamlessly show content from different websites, or pass requests for processing to application servers over protocols other than HTTP.
You can also read a very good explanation about What’s the difference between proxy server and reverse proxy server.
A reverse proxy can offload many of the infrastructure concerns of a high-volume distributed web application.
This infographic comes from Jenkins with NGINX — Reverse proxy with https.
This allow you to have NGINX reverse proxy requests to unicorns, mongrels, webricks, thins, or whatever you really want to have run your servers.
Reverse proxy gives you number of advanced features such as:
- load balancing, failover, and transparent maintenance of the backend servers
- increased security (e.g. SSL termination, hide upstream configuration)
- increased performance (e.g. caching, load balancing)
- simplifies the access control responsibilities (single point of access and maintenance)
- centralised logging and auditing (single point of maintenance)
- add/remove/modify HTTP headers
In my opinion, the two most important things related to the reverse proxy are:
- the way requests are forwarded to the backend
- the type of headers forwarded to the backend
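Both points can be seen in a minimal sketch. The upstream name app_backend, its address, and the choice of forwarded headers are assumptions for illustration:

```nginx
upstream app_backend {
  server 127.0.0.1:8080;
}

server {
  listen 80;
  server_name example.com;

  location / {
    # How the request is forwarded: without a URI part in proxy_pass,
    # the original request URI is passed to the backend unchanged.
    proxy_pass http://app_backend;

    # Which headers are forwarded: tell the backend about the original
    # host, client address, and scheme.
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}
```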
If we are talking about the security of the proxy server, look at the recommendations in Guidelines on Securing Public Web Servers [NIST]. This document is a good starting point. It is old but still has interesting solutions and suggestions.
There is a great explanation about the benefits of improving security through the use of a reverse proxy server.
A reverse proxy gives you a couple things that may make your server more secure:
- a place to monitor and log what is going on separate from the web server
- a place to filter separate from your web server if you know that some area of your system is vulnerable. Depending on the proxy you may be able to filter at the application level
- another place to implement ACLs and rules if you cannot be expressive enough for some reason on your web server
- a separate network stack that will not be vulnerable in the same ways as your web server. This is particularly true if your proxy is from a different vendor
- a reverse proxy with no filtering does not automatically protect you against everything, but if the system you need to protect is high-value then adding a reverse proxy may be worth the support and performance costs
Another great answer about best practices for reverse proxy implementation:
In my experience some of the most important requirements and mitigations, in no particular order, are:
- make sure that your proxy, back-end web (and DB) servers cannot establish direct outbound (internet) connections (including DNS and SMTP, and particularly HTTP). This means (forward) proxies/relays for required outbound access, if required
- make sure your logging is useful (§9.1 in the above), and coherent. You may have logs from multiple devices (router, firewall/IPS/WAF, proxy, web/app servers, DB servers). If you can’t quickly, reliably and deterministically link records across each device together, you’re doing it wrong. This means NTP, and logging any or all of: PIDs, TIDs, session-IDs, ports, headers, cookies, usernames, IP addresses and maybe more (and may mean some logs contain confidential information)
- understand the protocols, and make deliberate, informed decisions: including cipher/TLS version choice, HTTP header sizes, URL lengths, cookies. Limits should be implemented on the reverse-proxy. If you’re migrating to a tiered architecture, make sure the dev team are in the loop so that problems are caught as early as possible
- run vulnerability scans from the outside, or get someone to do it for you. Make sure you know your footprint and that the reports highlight deltas, as well as the theoretical TLS SNAFU du-jour
- understand the modes of failure. Sending users a bare default «HTTP 500 — the wheels came off» when you have load or stability problems is sloppy
- monitoring, metrics and graphs: having normal and historic data is invaluable when investigating anomalies, and for capacity planning
- tuning: from TCP time-wait to listen backlog to SYN-cookies, again you need to make deliberate, informed decisions
- follow basic OS hardening guidelines, consider the use of chroot/jails, host-based IDS, and other measures, where available
Passing requests
🔖 Use pass directive compatible with backend protocol — Reverse Proxy — P1
When NGINX proxies a request, it sends the request to a specified proxied server, fetches the response, and sends it back to the client.
It is possible to proxy requests to:
- HTTP servers (e.g. NGINX, Apache, or other) with the proxy_pass directive:

    upstream bk_front {
      server 192.168.252.20:8080 weight=5;
      server 192.168.252.21:8080;
    }

    server {
      location / {
        proxy_pass http://bk_front;
      }
      location /api {
        proxy_pass http://192.168.21.20:8080;
      }
      location /info {
        proxy_pass http://localhost:3000;
      }
      location /ra-client {
        proxy_pass http://10.0.11.12:8080/guacamole/;
      }
      location /foo/bar/ {
        proxy_pass http://www.example.com/url/;
      }
      ...
    }

- non-HTTP servers (e.g. PHP, Node.js, Python, Java, or other) with the proxy_pass directive (as a fallback) or with directives specially designed for this:

  - fastcgi_pass, which passes a request to a FastCGI server (PHP FastCGI Example):

    server {
      ...
      location ~ ^/.+\.php(/|$) {
        fastcgi_pass 127.0.0.1:9000;
        include /etc/nginx/fcgi_params;
      }
      ...
    }

  - uwsgi_pass, which passes a request to a uWSGI server (Nginx support uWSGI):

    server {
      location / {
        root html;
        uwsgi_pass django_cluster;
        uwsgi_param UWSGI_SCRIPT testapp;
        include /etc/nginx/uwsgi_params;
      }
      ...
    }

  - scgi_pass, which passes a request to an SCGI server:

    server {
      location / {
        scgi_pass 127.0.0.1:4000;
        include /etc/nginx/scgi_params;
      }
      ...
    }

  - memcached_pass, which passes a request to a Memcached server:

    server {
      location / {
        set $memcached_key "$uri?$args";
        memcached_pass memc_instance:4004;
        error_page 404 502 504 = @memc_fallback;
      }
      location @memc_fallback {
        proxy_pass http://backend;
      }
      ...
    }

  - redis_pass, which passes a request to a Redis server (HTTP Redis):

    server {
      location / {
        set $redis_key $uri;
        redis_pass redis_instance:6379;
        default_type text/html;
        error_page 404 = @fallback;
      }
      location @fallback {
        proxy_pass http://backend;
      }
      ...
    }

The proxy_pass and other *_pass directives specify that all requests which match the location block should be forwarded to the specified socket, where the backend app is running.
However, more complex apps may need additional directives:
- proxy_pass: see ngx_http_proxy_module directives explanation
- fastcgi_pass: see ngx_http_fastcgi_module directives explanation
- uwsgi_pass: see ngx_http_uwsgi_module directives explanation
- scgi_pass: see ngx_http_scgi_module directives explanation
- memcached_pass: see ngx_http_memcached_module directives explanation
- redis_pass: see ngx_http_redis_module directives explanation
Trailing slashes
🔖 Be careful with trailing slashes in proxy_pass directive — Reverse Proxy — P3
If you have something like:
location /public/ { proxy_pass http://bck_testing_01; }
And go to http://example.com/public, NGINX will automatically redirect you to http://example.com/public/.
Look also at this example:
    location /foo/bar/ {
      # proxy_pass http://example.com/url/;
      proxy_pass http://192.168.100.20/url/;
    }

If the URI is specified along with the address, it replaces the part of the request URI that matches the location parameter. For example, here the request with the /foo/bar/page.html URI will be proxied to http://192.168.100.20/url/page.html (or to http://example.com/url/page.html with the commented-out variant).
If the address is specified without a URI, or it is not possible to determine the part of URI to be replaced, the full request URI is passed (possibly, modified).
Here is an example with a trailing slash in location, but no trailing slash in proxy_pass:
location /foo/ { proxy_pass http://127.0.0.1:8080/bar; }
See how bar and path are concatenated. If you go to http://yourserver.com/foo/path/id?param=1, NGINX will proxy the request to http://127.0.0.1:8080/barpath/id?param=1.
As stated in the NGINX documentation, if proxy_pass is used without a URI (i.e. without a path after server:port), NGINX passes the URI from the original request exactly as it was, with all double slashes, ../ and so on.
Look also at the configuration snippets: Using trailing slashes.
Below are additional examples:
LOCATION | PROXY_PASS | REQUEST | RECEIVED BY UPSTREAM |
---|---|---|---|
/app/ | http://localhost:5000/api/ | /app/foo?bar=baz | /api/foo?bar=baz |
/app/ | http://localhost:5000/api | /app/foo?bar=baz | /apifoo?bar=baz |
/app | http://localhost:5000/api/ | /app/foo?bar=baz | /api//foo?bar=baz |
/app | http://localhost:5000/api | /app/foo?bar=baz | /api/foo?bar=baz |
/app | http://localhost:5000/api | /appfoo?bar=baz | /apifoo?bar=baz |
In other words:
You usually always want a trailing slash, never want to mix with and without trailing slash, and only want without trailing slash when you want to concatenate a certain path component together (which I guess is quite rarely the case). Note how query parameters are preserved.
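The two basic cases from the table can be put side by side in one server block. This is only a sketch: the backend address 127.0.0.1:5000 and the /raw/ prefix are assumptions made for illustration.

```nginx
server {
    listen 80;

    # proxy_pass has a URI part ("/api/"): the matched prefix "/app/" is replaced.
    #   /app/foo?bar=baz  ->  /api/foo?bar=baz
    location /app/ {
        proxy_pass http://127.0.0.1:5000/api/;
    }

    # proxy_pass has no URI part: the request URI is passed on unchanged.
    #   /raw/foo?bar=baz  ->  /raw/foo?bar=baz
    location /raw/ {
        proxy_pass http://127.0.0.1:5000;
    }
}
```

Keeping the trailing slash on both the location and the proxy_pass URI (or on neither) avoids the accidental /apifoo and /api//foo results shown in the table.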
Passing headers to the backend
🔖 Set the HTTP headers with add_header and proxy_*_header directives properly — Base Rules — P1
🔖 Remove support for legacy and risky HTTP headers — Hardening — P1
🔖 Always pass Host, X-Real-IP, and X-Forwarded headers to the backend — Reverse Proxy — P2
🔖 Use custom headers without X- prefix — Reverse Proxy — P3
By default, NGINX redefines two header fields in proxied requests:
- the Host header is rewritten to the value defined by the $proxy_host variable. This will be the IP address or name and port number of the upstream, directly as defined by the proxy_pass directive
- the Connection header is changed to close. This header is used to signal information about the particular connection established between two parties. In this instance, NGINX sets this to close to indicate to the upstream server that this connection will be closed once the original request is responded to. The upstream should not expect this connection to be persistent
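If persistent connections to the backend are wanted, the default Connection: close has to be overridden. The following sketch follows the pattern from the ngx_http_upstream_module documentation; the upstream name, the address, and the value 16 are assumptions.

```nginx
upstream backend {
    server 127.0.0.1:8080;
    # Maximum number of idle keepalive connections kept open per worker process.
    keepalive 16;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Keepalive to the upstream requires HTTP/1.1 ...
        proxy_http_version 1.1;
        # ... and the default "Connection: close" must be cleared.
        proxy_set_header Connection "";
    }
}
```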
When NGINX proxies a request, it automatically makes some adjustments to the request headers it receives from the client:
- NGINX drops empty headers. There is no point in passing along empty values to another server; it would only serve to bloat the request
- NGINX, by default, will consider any header that contains underscores as invalid. It will remove these from the proxied request. If you wish to have NGINX interpret these as valid, you can set the underscores_in_headers directive to on, otherwise your headers will never make it to the backend server. Underscores in header fields are allowed (RFC 7230, sec. 3.2.), but indeed uncommon
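A minimal sketch of enabling headers with underscores is shown below. The backend address and the header name in the comment are assumptions; the directive is placed at the http level here so that it applies to every server.

```nginx
http {
    # Accept client header names such as "X_Custom_Id" instead of dropping them.
    underscores_in_headers on;

    server {
        listen 80;

        location / {
            proxy_pass http://127.0.0.1:8080;
        }
    }
}
```

Backends that map headers to CGI-style variables cannot tell X-Custom-Id from X_Custom_Id, so this is worth enabling only when a client really depends on such names.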
It is important to pass more than just the URI if you expect the upstream server to handle the request properly. The request coming from NGINX on behalf of a client will look different from a request coming directly from a client.
Please read Managing request headers from the official wiki.
NGINX supports arbitrary request header fields. The last part of the variable name is the field name converted to lower case, with dashes replaced by underscores:
$http_name_of_the_header_key
If you have X-Real-IP = 127.0.0.1 in the request headers, you can use $http_x_real_ip to get 127.0.0.1.
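One quick way to see the $http_* variables in action is a small diagnostic location that echoes a header back. The /whoami path is a hypothetical name chosen for this sketch.

```nginx
server {
    listen 80;

    # Echo back what NGINX received in the X-Real-IP request header.
    location = /whoami {
        default_type text/plain;
        return 200 "X-Real-IP seen by NGINX: $http_x_real_ip\n";
    }
}
```

Sending a request with curl -H "X-Real-IP: 127.0.0.1" to that location should return the value from the header; without the header the variable is empty.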
Use the proxy_set_header directive to set the headers that are sent to the backend servers.
HTTP headers are used to transmit additional information between client and server.
- add_header sends headers to the client (browser) and works on successful requests only, unless you set the always parameter
- proxy_set_header sends headers to the backend server. If the value of a header field is an empty string then this field will not be passed to a proxied server
It’s also important to distinguish between request headers and response headers. Request headers are for traffic inbound to the webserver or backend app. Response headers are going the other way (in the HTTP response you get back using client, e.g. curl or browser).
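The difference between the two directions can be sketched in one location. The header names X-Request-Start and X-Served-By and the backend address are assumptions for illustration; $msec and $hostname are built-in variables.

```nginx
location / {
    # Request header: added to the request sent to the backend only.
    proxy_set_header X-Request-Start $msec;

    # Empty value: the field is not passed to the backend at all.
    proxy_set_header Accept-Encoding "";

    # Response header: sent to the client; "always" also covers 4xx/5xx responses.
    add_header X-Served-By $hostname always;

    proxy_pass http://127.0.0.1:8080;
}
```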
Ok, so look at the following short explanation about proxy directives (for more information about valid header values please see this rule):
- proxy_http_version: defines the HTTP protocol version for proxying; by default it is set to 1.0. For Websockets and keepalive connections you need to use version 1.1:

    proxy_http_version 1.1;

- proxy_cache_bypass: sets conditions under which the response will not be taken from a cache:

    proxy_cache_bypass $http_upgrade;

- proxy_intercept_errors: means that any response with HTTP code 300 or greater is handled by the error_page directive and ensures that if the proxied backend returns an error status, NGINX will be the one showing the error page (as opposed to the error page on the backend side). If you want certain error pages still being delivered from the upstream server, then simply don’t specify the error_page <code> on the reverse proxy (without this, NGINX will forward the error page coming from the upstream server to the client):

    proxy_intercept_errors on;
    error_page 404 /404.html; # from proxy

    # To bypass error intercepting (if you have proxy_intercept_errors on):
    #   1 - don't specify the error_page 404 on the reverse proxy
    #   2 - go to the @debug location
    error_page 500 503 504 @debug;
    location @debug {
      proxy_intercept_errors off;
      proxy_pass http://backend;
    }

- proxy_set_header: allows redefining or appending fields to the request header passed to the proxied server

  - Upgrade and Connection: these header fields are required if your application is using Websockets:

    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

  - Host: the $host variable contains, in the following order of precedence: the host name from the request line, or the host name from the Host request header field, or the server name matching a request. NGINX uses the Host header for server_name matching. It does not use TLS SNI. This means that for an SSL server, NGINX must be able to accept the SSL connection, which boils down to having a certificate/key. The cert/key can be any, e.g. self-signed:

    proxy_set_header Host $host;

  - X-Real-IP: forwards the real visitor remote IP address to the proxied server:

    proxy_set_header X-Real-IP $remote_addr;

  - X-Forwarded-For: is the conventional way of identifying the originating IP address of the user connecting to the web server coming from either an HTTP proxy or a load balancer:

    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

  - X-Forwarded-Proto: identifies the protocol (HTTP or HTTPS) that a client used to connect to your proxy or load balancer:

    proxy_set_header X-Forwarded-Proto $scheme;

  - X-Forwarded-Host: defines the original host requested by the client:

    proxy_set_header X-Forwarded-Host $host;

  - X-Forwarded-Port: defines the original port requested by the client:

    proxy_set_header X-Forwarded-Port $server_port;

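Put together, the directives described above might look like the following sketch. The server name and backend address are assumptions, and the map for $connection_upgrade follows the pattern from the official WebSocket proxying documentation, so that Connection: upgrade is sent only when the client actually asked for an upgrade.

```nginx
# Send "Connection: upgrade" only when the client sent an Upgrade header.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;

        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host  $host;
        proxy_set_header X-Forwarded-Port  $server_port;
    }
}
```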
If you want to read about custom headers, take a look at Why we need to deprecate x prefix for HTTP headers? and this great answer by BalusC.
Importance of the Host header
🔖 Set and pass Host header only with $host variable — Reverse Proxy — P2
The Host header tells the webserver which virtual host to use (if set up). You can even have the same virtual host using several aliases (domains and wildcard-domains). This is why the Host header exists: it specifies which website or web application should process an incoming HTTP request.
In NGINX, $host equals $http_host, lowercased and without the port number (if present), except when HTTP_HOST is absent or is an empty value. In that case, $host equals the value of the server_name directive of the server which processed the request.
But look at this:
An unchanged Host request header field can be passed with $http_host. However, if this field is not present in a client request header then nothing will be passed. In such a case it is better to use the $host variable: its value equals the server name in the Host request header field or the primary server name if this field is not present.
For example, if you set Host: MASTER:8080, $host will be «master» (while $http_host will be MASTER:8080, as it just reflects the whole header).
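As a sketch of the advice above, the example below passes $host to the backend and adds a catch-all server that closes connections for unknown Host values. The names and addresses are assumptions, and the catch-all server is a common hardening pattern rather than something required by $host itself.

```nginx
# Catch-all: requests whose Host matches no configured server_name end up here.
server {
    listen 80 default_server;
    server_name _;
    return 444;   # close the connection without sending a response
}

server {
    listen 80;
    server_name example.com www.example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;

        # Lowercased, without port, falls back to server_name if Host is missing.
        proxy_set_header Host $host;

        # Alternatives, for comparison only:
        # proxy_set_header Host $http_host;    # raw client header, may be empty
        # proxy_set_header Host $proxy_host;   # default: name/port from proxy_pass
    }
}
```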
Look also at $10k host header and What is a Host Header Attack?.
Redirects and X-Forwarded-Proto
🔖 Don’t use X-Forwarded-Proto with $scheme behind reverse proxy — Reverse Proxy — P1
This header is very important because it prevents a redirect loop. When used inside an HTTPS server block, each HTTP response from the proxied server will be rewritten to HTTPS. Look at the following example:
1. Client sends the HTTP request to the Proxy
2. Proxy sends the HTTP request to the Server
3. Server sees that the URL is http://
4. Server sends back a 3xx redirect response telling the Client to connect to https://
5. Client sends an HTTPS request to the Proxy
6. Proxy decrypts the HTTPS traffic and sets the X-Forwarded-Proto: https
7. Proxy sends the HTTP request to the Server
8. Server sees that the URL is http:// but also sees that X-Forwarded-Proto is https and trusts that the request is HTTPS
9. Server sends back the requested web page or data
This explanation comes from Purpose of the X-Forwarded-Proto HTTP Header.
In step 6 above, the Proxy is setting the HTTP header X-Forwarded-Proto: https to specify that the traffic it received is HTTPS. In step 8, the Server then uses the X-Forwarded-Proto to determine if the request was HTTP or HTTPS.
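When NGINX itself sits behind another TLS-terminating proxy, $scheme would always report http. One possible approach, sketched here under the assumption that the front proxy is trusted and sets X-Forwarded-Proto itself, is to keep the incoming value and fall back to $scheme only when it is missing. The variable name $forwarded_proto and the backend address are hypothetical.

```nginx
# Keep the scheme reported by a trusted front proxy; otherwise use our own.
map $http_x_forwarded_proto $forwarded_proto {
    default $http_x_forwarded_proto;
    ''      $scheme;
}

server {
    listen 80;

    location / {
        proxy_set_header X-Forwarded-Proto $forwarded_proto;
        proxy_pass http://127.0.0.1:8080;
    }
}
```

If clients can reach this server directly, the incoming header can be spoofed, so in that case passing $scheme is the safer choice.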
You can read about how to set it up correctly here:
- Set correct scheme passed in X-Forwarded-Proto
- Don’t use X-Forwarded-Proto with $scheme behind reverse proxy — Reverse Proxy — P1
A warning about the X-Forwarded-For
🔖 Set properly values of the X-Forwarded-For header — Reverse Proxy — P1
I think we should stop here for a second. X-Forwarded-For is one of the most important headers, and it has security implications.
Where a connection passes through a chain of proxy servers, X-Forwarded-For
can give a comma-separated list of IP addresses with the first being the furthest downstream (that is, the user).
The HTTP X-Forwarded-For header accepts two directives, as mentioned above and described below:
- <client>: the IP address of the client
- <proxy>: the proxies that the request has to go through. If there are multiple proxies, then the IP address of each successive proxy is listed
Syntax:
X-Forwarded-For: <client>, <proxy1>, <proxy2>
X-Forwarded-For should not be used for any Access Control List (ACL) checks because it can be spoofed by attackers. Use the real IP address for this type of restriction. HTTP request headers such as X-Forwarded-For, True-Client-IP, and X-Real-IP are not a robust foundation on which to build any security measures, such as access controls.
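A minimal sketch of an ACL based on the connection address rather than on request headers is shown below. The /admin/ path, the management network, and the backend address are assumptions.

```nginx
location /admin/ {
    # allow/deny check the client address of the connection,
    # not the content of X-Forwarded-For or X-Real-IP.
    allow 192.168.10.0/24;   # assumed management network
    deny  all;

    proxy_pass http://127.0.0.1:8080;
}
```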
See Set properly values of the X-Forwarded-For header (from this handbook) for more detailed information on how to set the values of the X-Forwarded-For header properly.
But that’s not all. Behind a reverse proxy, the user IP we get is often the IP of the reverse proxy itself. If another HTTP server sits between the proxy and the app server, you should also configure it to interpret the values of this header correctly.
I recommend to read this amazing explanation by Nick M.
- Pass headers from proxy to the backend layer:
  - Always pass Host, X-Real-IP, and X-Forwarded headers to the backend
  - Set properly values of the X-Forwarded-For header (from this handbook)

- NGINX (backend): modify the set_real_ip_from and real_ip_header directives:

  For this, the http_realip_module must be installed (--with-http_realip_module).

  First of all, you should add the following lines to the configuration:

    # Add these to the set_real_ip.conf, there are the real IPs where your traffic
    # is coming from (front proxy/lb):
    set_real_ip_from 192.168.20.10; # IP address of master
    set_real_ip_from 192.168.20.11; # IP address of slave

    # You can also add an entire subnet:
    set_real_ip_from 192.168.40.0/24;

    # Defines a request header field used to send the address for a replacement,
    # in this case we use X-Forwarded-For:
    real_ip_header X-Forwarded-For;

    # The real IP from your client address that matches one of the trusted addresses
    # is replaced by the last non-trusted address sent in the request header field:
    real_ip_recursive on;

    # Include it to the appropriate context:
    server {
      include /etc/nginx/set_real_ip.conf;
      ...
    }

- NGINX: add/modify and set the log format:

    log_format combined-1 '$remote_addr forwarded for $http_x_real_ip - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent"';

    # or:
    log_format combined-2 '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/example.com/access.log combined-1;

This way, e.g. $_SERVER['REMOTE_ADDR'] will be correctly filled in for PHP FastCGI. You can test it with the following script:

    # tls_check.php
    <?php
    echo '<pre>';
    print_r($_SERVER);
    echo '</pre>';
    exit;
    ?>

And send a request to it:

    curl -H "Cache-Control: no-cache" -ks "https://example.com/tls_check.php?${RANDOM}" | \
      grep -E "HTTP_X_FORWARDED_FOR|HTTP_X_REAL_IP|SERVER_ADDR|REMOTE_ADDR"
    [HTTP_X_FORWARDED_FOR] => 172.217.20.206
    [HTTP_X_REAL_IP] => 172.217.20.206
    [SERVER_ADDR] => 192.168.10.100
    [REMOTE_ADDR] => 192.168.10.10
Improve extensibility with Forwarded
In 2014 the IETF approved a standard header definition for proxies, called Forwarded, documented here [IETF] and here, that should be used instead of the X-Forwarded headers. This is the one you should use to reliably get the originating IP when your request is handled by a proxy. The official NGINX documentation also explains how to do this in Using the Forwarded header.
In general, the proxy headers (Forwarded or X-Forwarded-For) are the right way to get your client IP only when you are sure they come to you via a proxy. If there is no proxy header, or it holds no usable value, you should default to the REMOTE_ADDR server variable.
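A simplified sketch of setting the Forwarded header is shown below, loosely based on the recipe in the official NGINX documentation. It assumes NGINX is the first proxy in the chain, so it overwrites any incoming Forwarded header instead of appending to it; the variable name $proxy_forwarded_elem and the backend address are illustrative.

```nginx
map $remote_addr $proxy_forwarded_elem {
    # IPv4 addresses can be sent as-is.
    ~^[0-9.]+$         "for=$remote_addr";
    # IPv6 addresses need to be bracketed and quoted.
    ~^[0-9A-Fa-f:.]+$  "for=\"[$remote_addr]\"";
    # UNIX domain socket names cannot be represented in this syntax.
    default            "for=unknown";
}

server {
    listen 80;

    location / {
        proxy_set_header Forwarded "$proxy_forwarded_elem;proto=$scheme";
        proxy_pass http://127.0.0.1:8080;
    }
}
```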
Response headers
🔖 Set the HTTP headers with add_header and proxy_*_header directives properly — Base Rules — P1
The add_header directive lets you define an arbitrary response header (mostly for informational/debugging purposes) and its value, which is added to responses with the following status codes:
- 2xx series: 200, 201, 204, 206
- 3xx series: 301, 302, 303, 304, 307, 308
For example:
add_header Custom-Header Value;
To change existing headers (add or remove them) you should use the headers-more-nginx-module module.
There is one thing you must watch out for if you use the add_header directive (this also applies to the proxy_*_header directives). See the following explanations:
- Nginx add_header configuration pitfall
- Be very careful with your add_header in Nginx! You might make your site insecure
This situation is described in the official documentation:
There could be several add_header directives. These directives are inherited from the previous level if and only if there are no add_header directives defined on the current level.
However, and this is important: as soon as you define a header in your server context, all the remaining headers defined in the http context will no longer be inherited. This means you have to define them in your server context again (or alternatively ignore them if they’re not important for your site).
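The inheritance pitfall can be illustrated with the following sketch. The header values are common choices and X-Served-By is a hypothetical header; the point is only where the directives are placed.

```nginx
http {
    add_header X-Frame-Options        "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff"    always;

    server {
        listen 80;

        # This level defines its own add_header, so nothing is inherited from http.
        add_header X-Served-By $hostname;

        # The http-level headers must be repeated here to stay in effect.
        add_header X-Frame-Options        "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff"    always;
    }
}
```

A shared file pulled in with include at each level is one way to avoid repeating the list by hand.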
Finally, a summary of the directives used to manipulate headers:
- proxy_set_header sets or removes a request header (and passes it or not to the backend)
- add_header adds a header to the response
- proxy_hide_header hides a response header
We also have the ability to manipulate request and response headers using the headers-more-nginx-module module:
- more_set_headers: replaces (if any) or adds (if not any) the specified output headers
- more_clear_headers: clears the specified output headers
- more_set_input_headers: very much like more_set_headers except that it operates on input headers (or request headers)
- more_clear_input_headers: very much like more_clear_headers except that it operates on input headers (or request headers)
The following figure describes the modules and directives responsible for manipulating HTTP request and response headers:
Load balancing algorithms
Load balancing is, in principle, a wonderful thing. You will appreciate it once you serve tens of thousands (or more) of requests every second. Of course, high load is not the only reason: think also about maintenance tasks without downtime.
Generally, load balancing is a technique used to distribute the workload across multiple computing resources and servers. I think you should always use this technique, even if you only have a simple app or anything else that you share with others.
The configuration is very simple. NGINX includes the ngx_http_upstream_module to define backends (groups of servers or multiple server instances). More specifically, the upstream directive is responsible for this.
upstream defines the load balancing pool: you only provide a list of servers, some kind of weight, and other parameters related to the backend layer.
Backend parameters
🔖 Tweak passive health checks — Load Balancing — P3
🔖 Don’t disable backends by comments, use down parameter — Load Balancing — P4
Before we start talking about the load balancing techniques, you should know something about the server directive. It defines the address and other parameters of a backend server.
This directive accepts the following options:
- weight=<num>: sets the weight of the origin server, e.g. weight=10
- max_conns=<num>: limits the maximum number of simultaneous active connections from the NGINX proxy server to an upstream server (default value: 0 = no limit), e.g. max_conns=8
  - if you set max_conns=4, the 5th connection will be rejected
  - if the server group does not reside in the shared memory (zone directive), the limitation works per each worker process
- max_fails=<num>: the number of unsuccessful attempts to communicate with the backend (default value: 1, 0 disables the accounting of attempts), e.g. max_fails=3;
- fail_timeout=<time>: the time during which the specified number of unsuccessful attempts to communicate with the server should happen to consider the server unavailable, and also the period for which the server is then considered unavailable (default value: 10 seconds), e.g. fail_timeout=30s;
- zone <name> <size>: defines the shared memory zone that keeps the group’s configuration and run-time state that are shared between worker processes, e.g. zone backend 32k;. Note that zone is a separate directive of the upstream block, not a parameter of server
- backup: a server marked as backup does not receive requests unless the primary servers are unavailable
- down: marks the server as permanently unavailable
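The parameters above can be combined in a single upstream block. The addresses and numeric values below are assumptions chosen only to show the syntax.

```nginx
upstream bck_testing_01 {
    # Shared memory zone: state (including max_conns counters) is shared by all workers.
    zone backend 64k;

    server 192.168.250.220:8080 weight=3 max_conns=100 max_fails=3 fail_timeout=30s;
    server 192.168.250.221:8080 max_fails=3 fail_timeout=30s;

    # Taken out of rotation without deleting or commenting out the line.
    server 192.168.250.222:8080 down;

    # Used only when the primary servers are unavailable.
    server 192.168.250.223:8080 backup;
}
```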
Upstream servers with SSL
Setting up SSL termination on NGINX is also very simple using the SSL module. For this you also need the upstream module and the proxy module. A very good case study is also given here.
For more information please read Securing HTTP Traffic to Upstream Servers from the official documentation.
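A sketch of terminating TLS on NGINX and re-encrypting traffic to the upstream might look as follows. All host names and file paths are assumptions; the proxy_ssl_* directives come from ngx_http_proxy_module.

```nginx
upstream secure_backend {
    server backend1.example.com:443;
}

server {
    listen 443 ssl;
    server_name example.com;

    # Certificate presented to clients (TLS termination).
    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    location / {
        # "https://" makes NGINX open a TLS connection to the upstream.
        proxy_pass https://secure_backend;

        # Send SNI and verify the backend certificate against a trusted CA.
        proxy_ssl_server_name on;
        proxy_ssl_name backend1.example.com;
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/nginx/ssl/backend-ca.crt;
        proxy_ssl_protocols TLSv1.2 TLSv1.3;
    }
}
```

Without proxy_ssl_name, the name used for verification defaults to the host part of proxy_pass, which here would be the upstream group name rather than the real host name.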
Round Robin
It’s the simplest load balancing technique. Round Robin takes the list of servers and forwards each request to the next server on the list, in order. Once it reaches the last server, the loop jumps back to the first server and starts again.

    upstream bck_testing_01 {
      # with default weight for all (weight=1)
      server 192.168.250.220:8080;
      server 192.168.250.221:8080;
      server 192.168.250.222:8080;
    }

Weighted Round Robin
In the Weighted Round Robin load balancing algorithm, each server is assigned a weight based on its configuration and ability to process requests.
This method is similar to Round Robin in the sense that the manner by which requests are assigned to the nodes is still cyclical, albeit with a twist: the node with the higher specs will be apportioned a greater number of requests.

    upstream bck_testing_01 {
      server 192.168.250.220:8080 weight=3;
      server 192.168.250.221:8080; # default weight=1
      server 192.168.250.222:8080; # default weight=1
    }

Least Connections
This method tells the load balancer to look at the connections going to each server and send the next connection to the server with the fewest active connections.

    upstream bck_testing_01 {
      least_conn;
      # with default weight for all (weight=1)
      server 192.168.250.220:8080;
      server 192.168.250.221:8080;
      server 192.168.250.222:8080;
    }

For example: if clients D10, D11 and D12 attempt to connect after A4, C2 and C8 have already disconnected but A1, B3, B5, B6, C7 and A9 are still connected, the load balancer will assign client D10 to server 2 instead of server 1 and server 3. After that, client D11 will be assigned to server 1 and client D12 will be assigned to server 2.
Weighted Least Connections
This is, in general, a very fair distribution method, as it uses the ratio of the number of connections and the weight of a server. The server in the cluster with the lowest ratio automatically receives the next request.
    upstream bck_testing_01 {
      least_conn;
      server 192.168.250.220:8080 weight=3;
      server 192.168.250.221:8080; # default weight=1
      server 192.168.250.222:8080; # default weight=1
    }

For example: if clients D10, D11 and D12 attempt to connect after A4, C2 and C8 have already disconnected but A1, B3, B5, B6, C7 and A9 are still connected, the load balancer will assign client D10 to server 2 or 3 (because they have the fewest active connections) instead of server 1. After that, clients D11 and D12 will be assigned to server 1 because it has the biggest weight parameter.
IP Hash
The IP Hash method uses the IP of the client to create a unique hash key and associates the hash with one of the servers. This ensures that a user is sent to the same server in future sessions (a basic kind of session persistence) except when this server is unavailable. If one of the servers needs to be temporarily removed, it should be marked with the down parameter in order to preserve the current hashing of client IP addresses.
This technique is especially helpful if state has to be kept between requests, e.g. products put in the shopping cart, or when the session state is of concern and is not handled by shared storage of the application.

    upstream bck_testing_01 {
      ip_hash;
      # with default weight for all (weight=1)
      server 192.168.250.220:8080;
      server 192.168.250.221:8080;
      server 192.168.250.222:8080;
    }

Generic Hash
This technique is very similar to the IP Hash but for each request the load balancer calculates a hash that is based on the combination of a text string, variable, or a combination you specify, and associates the hash with one of the servers.
    upstream bck_testing_01 {
      hash $request_uri;
      # with default weight for all (weight=1)
      server 192.168.250.220:8080;
      server 192.168.250.221:8080;
      server 192.168.250.222:8080;
    }

For example: the load balancer calculates the hash from the full original request URI (with arguments). Clients A4, C7, C8 and A9 send requests to the /static location and will be assigned to server 1. Similarly, clients A1, C2 and B6, which get the /sitemap.xml resource, will be assigned to server 2. Clients B3 and B5 send requests to /api/v4 and will be assigned to server 3.
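The hash directive also accepts a consistent parameter, which switches to ketama consistent hashing so that adding or removing a server remaps only a small share of the keys. A sketch with the same assumed addresses:

```nginx
upstream bck_testing_01 {
    # Consistent (ketama) hashing on the full request URI.
    hash $request_uri consistent;

    server 192.168.250.220:8080;
    server 192.168.250.221:8080;
    server 192.168.250.222:8080;
}
```

This matters mostly for caching backends, where remapping every key after a topology change would empty the caches.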
Other methods
Another option is to select the backend with a map. It is similar to the Generic Hash method because you can also specify a unique identifier, but the assignment to the appropriate server is under your control. I think it’s a somewhat primitive method and I wouldn’t say it is a full load balancing technique, but in some cases it is very useful.
Mainly, this helps reduce the mess in the configuration made by a lot of location blocks with similar configurations.
First of all, create a map:
map $request_uri $bck_testing_01 { default "192.168.250.220:8080"; /api/v4 "192.168.250.220:8080"; /api/v3 "192.168.250.221:8080"; /static "192.168.250.222:8080"; /sitemap.xml "192.168.250.222:8080"; }
And add proxy_pass
directive:
server { ... location / { proxy_pass http://$bck_testing_01; } ... }
Rate limiting
🔖 Limit concurrent connections — Hardening — P1
🔖 Use limit_conn to improve limiting the download speed — Performance — P3
NGINX has built-in modules for setting up rate limiting. For me, it’s one of the most useful protection features, but it is sometimes really hard to understand.
I think, in case of doubt, you should read up on the following documents:
- Rate Limiting with NGINX and NGINX Plus
- NGINX rate-limiting in a nutshell
- NGINX Rate Limiting
- How to protect your web site from HTTP request flood, DoS and brute-force attacks
Rate limiting rules are useful for:
- traffic shaping
- traffic optimising
- slowing down the rate of incoming requests
- protecting against HTTP request floods
- protecting against slow HTTP attacks
- preventing excessive bandwidth consumption
- mitigating DDoS attacks
- protecting against brute-force attacks
Variables
NGINX has the following variables (unique keys) that can be used in rate limiting rules. For example:
VARIABLE | DESCRIPTION |
---|---|
$remote_addr | client address |
$binary_remote_addr | client address in a binary form; it is smaller and saves space compared to $remote_addr |
$server_name | name of the server which accepted a request |
$request_uri | full original request URI (with arguments) |
$query_string | arguments in the request line |
Please see official documentation for more information about variables.
Directives, keys, and zones
NGINX also provides the following keys:
KEY | DESCRIPTION |
---|---|
limit_req_zone | sets the key and the shared memory zone that stores the request state (the current number of excessive requests) per key value |
limit_conn_zone | sets the key and the shared memory zone that stores the current number of connections per key value |
And directives:
DIRECTIVE | DESCRIPTION |
---|---|
limit_req | in combination with a limit_req_zone, sets the shared memory zone and the maximum burst size of requests |
limit_conn | in combination with a limit_conn_zone, sets the shared memory zone and the maximum allowed number of (simultaneous) connections per key value (e.g. per client IP) |
Keys are used to store the state of each IP address and how often it has accessed a limited object. This information are stored in shared memory available from all NGINX worker processes.
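As a minimal sketch of how the keys and directives pair up (the zone names `req_per_ip` and `conn_per_ip`, the sizes, and the limits are illustrative assumptions, not recommended values):

```nginx
http {
    # Zone definitions live in the http context.
    # "req_per_ip" and "conn_per_ip" are hypothetical zone names.
    limit_req_zone  $binary_remote_addr zone=req_per_ip:10m rate=10r/s;
    limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

    server {
        listen 80;

        location / {
            # limit_req refers to a zone created by limit_req_zone
            limit_req  zone=req_per_ip;
            # limit_conn refers to a zone created by limit_conn_zone
            # and allows at most 10 simultaneous connections per key value
            limit_conn conn_per_ip 10;
        }
    }
}
```

Note the difference in syntax: limit_req takes `zone=<name>`, while limit_conn takes the zone name followed by the connection limit.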
You can enable the dry run mode with limit_req_dry_run on;. In this mode, the request processing rate is not limited; however, the number of excessive requests is still accounted for in the shared memory zone as usual.
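A sketch of how dry run mode might be used to observe a limit before enforcing it. It assumes an NGINX version that provides limit_req_dry_run and the $limit_req_status variable; the zone name, log format name, and log path are hypothetical:

```nginx
http {
    limit_req_zone $binary_remote_addr zone=req_per_ip:10m rate=10r/s;

    # $limit_req_status records the limiter's decision for each request
    # (PASSED, DELAYED, REJECTED, DELAYED_DRY_RUN or REJECTED_DRY_RUN).
    log_format ratelimit '$remote_addr [$time_local] "$request" '
                         '$status $limit_req_status';

    server {
        listen 80;
        access_log /var/log/nginx/ratelimit.log ratelimit;

        location / {
            limit_req zone=req_per_ip burst=20;
            # Account for excessive requests but do not delay or reject them
            limit_req_dry_run on;
        }
    }
}
```

Once the log shows that only unwanted traffic would have been rejected, the limit_req_dry_run line can be removed to enforce the limit.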
Both modules also provide a directive that sets the HTTP status code returned when there are too many requests or connections (default 503):
limit_req_status <value>
limit_conn_status <value>
For example, if you want to return 429 instead of the default 503 when a request is rejected, and serve your own error page for it:
# Add this to the http context:
limit_req_status 429;

# Set your own error page for the 429 HTTP code:
error_page 429 /rate_limit.html;

location = /rate_limit.html {
  root /usr/share/www/http-error-pages/sites/other;
  internal;
}
And create this file:
cat > /usr/share/www/http-error-pages/sites/other/rate_limit.html << __EOF__
HTTP 429 Too Many Requests
__EOF__
Rate limiting rules also have zones that let you define a shared space in which to count the incoming requests or connections.
All requests or connections coming into the same space will be counted in the same rate limit. This is what allows you to limit per URL, per IP, or anything else. In HTTP/2 and SPDY, each concurrent request is considered a separate connection.
The zone has two required parts:
- <name> is the zone identifier
- <size> is the zone size
Example:
<key> <variable> zone=<name>:<size>;
State information for about 16,000 IP addresses takes 1 megabyte, which is roughly 64 bytes per state, so each kilobyte of a zone holds about 16 IP addresses.
The range of zones is as follows:
- http context
- server context
  server { ... zone=<name>;
- location directive
  location /api { ... zone=<name>;
All rate limiting rules (definitions) should be added to the NGINX http context.
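The following sketch shows zones defined once in the http context and then referenced at the server and location levels. The zone names `per_ip` and `per_server`, the rates, and `example.com` are assumptions made for illustration:

```nginx
http {
    # Definitions: one zone keyed by client address, one keyed by server name
    limit_req_zone $binary_remote_addr zone=per_ip:10m     rate=5r/s;
    limit_req_zone $server_name        zone=per_server:10m rate=100r/s;

    server {
        listen 80;
        server_name example.com;

        # Applies to every location in this server that has no limit_req of its own
        limit_req zone=per_server burst=50;

        location /api {
            # A location with its own limit_req does not inherit the server-level
            # one, so both zones are listed here to apply both limits.
            limit_req zone=per_ip     burst=10;
            limit_req zone=per_server burst=50;
        }
    }
}
```

Choosing the key decides what is counted together: `$binary_remote_addr` counts per client, while `$server_name` counts all traffic to a virtual server in one bucket.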
Also keep this answer in mind:
If you are loading a website, you are not loading only this site, but its assets as well. NGINX will treat them as independent requests. You have 10r/s defined and a burst size of 5. Therefore, above 10 requests/s the next requests will be delayed for rate limiting purposes. If the burst size (5) is exceeded, the following requests will receive a 503 error.
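The scenario described in that answer could be written as follows. This is only a sketch; the zone name `site` and the zone size are assumptions:

```nginx
http {
    # 10r/s is tracked with millisecond granularity: one request every 100 ms
    limit_req_zone $binary_remote_addr zone=site:10m rate=10r/s;

    server {
        listen 80;

        location / {
            # Up to 5 requests above the rate are queued and released
            # one per 100 ms; anything beyond that is rejected (503 by default).
            limit_req zone=site burst=5;
        }
    }
}
```

A page that pulls in many assets at once can exceed such a small burst with a single page view, which is why the burst size should be chosen with the number of assets per page in mind.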
The limit_req_zone key also takes the rate parameter, which sets the maximum request rate for the zone in requests per second (r/s) or requests per minute (r/m).
See also these examples (all come from this handbook):
- Limiting the rate of requests with burst mode
- Limiting the rate of requests with burst mode and nodelay
- Limiting the rate of requests per IP with geo and map
- Limiting the number of connections
Burst and nodelay parameters
To enable the queue, you should use the limit_req or limit_conn directives (see above). limit_req also provides optional parameters:
PARAMETER | DESCRIPTION |
---|---|
burst=<num> | sets the maximum number of excessive requests that may wait in the queue to be processed at the configured rate; requests beyond this number are rejected |
nodelay | imposes the rate limit without constraining the allowed spacing between requests, so queued requests are processed immediately; requests beyond the burst are still rejected (503 by default) |
The nodelay parameter is only useful when you also set burst.
Without nodelay, NGINX waits (no 503 response) and handles the excessive requests within the burst with some delay.
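A side-by-side sketch of the two behaviours. The zone names and the `/queued` and `/instant` locations are hypothetical and exist only to contrast the parameters:

```nginx
http {
    limit_req_zone $binary_remote_addr zone=queued:10m  rate=1r/s;
    limit_req_zone $binary_remote_addr zone=instant:10m rate=1r/s;

    server {
        listen 80;

        location /queued {
            # Up to 5 excessive requests wait in the queue and are
            # released one per second; the client sees added latency.
            limit_req zone=queued burst=5;
        }

        location /instant {
            # Up to 5 excessive requests are served immediately;
            # their burst slots are freed at 1 per second.
            limit_req zone=instant burst=5 nodelay;
        }
    }
}
```

In both locations, a request that arrives while all 5 burst slots are taken is rejected.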
NAXSI Web Application Firewall
- NAXSI
- NAXSI, a web application firewall for Nginx
NAXSI is an open-source, high-performance, low-rules-maintenance WAF (Web Application Firewall) module for NGINX and is usually referred to as a positive model application firewall.
OWASP ModSecurity Core Rule Set (CRS)
- OWASP Core Rule Set
- OWASP Core Rule Set — Official documentation
The OWASP ModSecurity Core Rule Set (CRS) is a set of generic attack detection rules for use with ModSecurity or compatible web application firewalls. The CRS aims to protect web applications from a wide range of attacks, including the OWASP Top Ten, with a minimum of false alerts.
Core modules
ngx_http_geo_module
Documentation:
ngx_http_geo_module
This module creates variables whose values depend on the IP address of the client. Combined with the GeoIP module, it allows for very elaborate rules that serve content according to the geolocation context.
By default, the IP address used for the lookup is $remote_addr, but it is possible to specify another variable.
If the value of a variable does not represent a valid IP address, then the 255.255.255.255 address is used.
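A small sketch of a geo block that takes the address from another variable. Here the address comes from a `remote_addr` query argument via `$arg_remote_addr`; the variable name `$is_internal` and the networks are illustrative assumptions:

```nginx
http {
    # The first parameter is the variable to look up instead of $remote_addr;
    # the second is the variable being created.
    geo $arg_remote_addr $is_internal {
        default        0;
        10.0.0.0/8     1;
        192.168.0.0/16 1;
    }
}
```

If the argument is missing or is not a valid address, the lookup is done for 255.255.255.255, which in this sketch falls through to `default`.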
Performance
Look at this (from the official documentation):
Since variables are evaluated only when used, the mere existence of even a large number of declared geo variables does not cause any extra costs for request processing.
This module (watch out: don't mistake it for the GeoIP module) builds an in-memory radix tree when loading the configuration. This is the same data structure as used in routing, and lookups are really fast. If the configuration takes a long time to load and you have many unique values per network, the load time is caused by searching for duplicate data in an array. Otherwise, it may be caused by insertions into the radix tree.
Examples
See Use geo/map modules instead of allow/deny from this handbook.
# The variable created is $trusted_ips:
geo $trusted_ips {
  default      0;
  192.0.0.0/24 0;
  8.8.8.8      1;
}

server {
  if ( $trusted_ips = 1 ) {
    return 403;
  }
  ...
}
You can also test IP ranges, for example:
# Create geo-ranges.conf:
127.0.0.0-127.255.255.255 loopback;

# Add geo definition:
geo $geo_ranges {
  ranges;
  default default;
  include geo-ranges.conf;
  10.255.0.0-10.255.255.255 internal;
}
3rd party modules
Not all external modules work properly with your current NGINX version. You should read the documentation of each module before adding it to the modules list. You should also check which version of the module is compatible with your NGINX release. What's more, be careful before adding modules in production. Some of them can cause strange behaviour, increase memory and CPU usage, and reduce the overall performance of NGINX.
Before installing external modules, please read the Event-Driven architecture section to understand why poor-quality 3rd party modules may reduce NGINX performance.
If NGINX is already running on your server and you want to add new modules, you need to compile them against the same version of NGINX that is currently installed (nginx -v). To make the new module compatible with the existing NGINX binary, you also need to use the same compile flags (nginx -V). For more, please see How to Compile Dynamic NGINX Modules.
If you use, e.g., --with-stream=dynamic, then all the stream_xxx modules must also be built as NGINX dynamic modules. Otherwise you will see linker errors.
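Once built, dynamic modules are loaded with the load_module directive. The sketch below assumes a build with `--with-stream=dynamic` and `--with-stream_geoip_module=dynamic`, and that the resulting `.so` files were installed in the `modules` directory under the NGINX prefix; adjust the paths to your installation:

```nginx
# load_module is only valid in the main context, so these lines go
# at the top of nginx.conf, before the events and http blocks.
# The stream module is listed before the stream_xxx module built on it.
load_module modules/ngx_stream_module.so;
load_module modules/ngx_stream_geoip_module.so;

events {
    worker_connections 1024;
}
```

Running `nginx -t` after adding the lines is a quick way to confirm that the modules were built against a matching binary.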
ngx_set_misc
Documentation:
ngx_set_misc
ngx_http_geoip_module
Documentation:
ngx_http_geoip_module
ngx_http_geoip2_module
This module allows real-time queries against the MaxMind GeoIP database. It uses the old version of the API, which is still very common in OS distributions. To use the new version of the GeoIP API, see the geoip2 module.
The MaxMind GeoIP database is a map of IP network address assignments to geographical locales. It can be useful, though approximate, for identifying the physical location with which an IP host address is associated at a relatively granular level.
Performance
The GeoIP module sets multiple variables. By default, NGINX parses and loads the GeoIP data into memory once, when it reads the config file, that is, only on (re)start or SIGHUP.
GeoIP lookups are served from a locally distributed database rather than from a remote server, so, unlike DNS, the worst-case performance hit is minimal. Additionally, from a performance point of view, you should not worry: the GeoIP databases are stored in memory (during the configuration reading phase) and NGINX does the lookups very fast.
The GeoIP module creates (and assigns values to) variables based on the IP address of the requesting client and one of the MaxMind GeoIP databases. One of the common uses is to set the country of the end user as an NGINX variable.
Variables in NGINX are evaluated only on demand. If a $geoip_* variable is not used during request processing, the GeoIP database is not looked up. So, if your configuration never references a geoip variable for a request, the GeoIP module won't be executed for it at all. The only inconvenience of using really large geo databases is the config reading time.
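A sketch of the on-demand behaviour described above. The database path, the `$blocked_country` variable, the `XX` country code placeholder, and the document root are assumptions for illustration:

```nginx
http {
    # Loaded into memory when the configuration is read; the path is an assumption
    geoip_country /usr/share/GeoIP/GeoIP.dat;

    # The map itself is also lazy: it is evaluated only where
    # $blocked_country is actually used.
    map $geoip_country_code $blocked_country {
        default 0;
        XX      1;   # placeholder for a two-letter country code
    }

    server {
        listen 80;

        location / {
            # Using $blocked_country here triggers the GeoIP lookup
            if ($blocked_country) {
                return 403;
            }
        }

        location /static/ {
            # No $geoip_* variable is referenced, so no lookup happens here
            root /var/www;
        }
    }
}
```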
Examples
See Restricting access by geographical location from this handbook.
The Nginx HTTP server has a phenomenal logging facility which is highly customizable. In this article, we will explain how to configure your own formats for access and error logs for Nginx in Linux.
The aim of this guide is to help you understand how logs are generated, so as to configure custom log formats for purposes of debugging, troubleshooting or analysis of what unfolds within your web server as well as web applications (such as tracing requests).
This article is made of three sections which will enlighten you about configuring access/error logs and how to enable conditional logging in Nginx.
Configuring Access Logs in Nginx
Under Nginx, all client requests to the server are recorded in the access log in a specified format using the ngx_http_log_module module.
The default log file is logs/access.log (usually /var/log/nginx/access.log on Linux systems) and the default format for logging is normally the combined or main format (this can vary from one distro to another).
The access_log directive (applicable in the http, server, location, if in location and limit_except contexts) is used to set the log file, and the log_format directive (applicable in the http context only) is used to set the log format. The log format is described by common variables and by variables that are generated only at the time when a log entry is written.
The syntax for configuring a log format is:
log_format format_name 'set_of_variables_to_define_format';
and the syntax for configuring access log is:
access_log /path/to/log_file format_name; #simplest form
OR
access_log /path/to/log_file [format [buffer=size] [gzip[=level]] [flush=time] [if=condition]];
The following is an excerpt from the default Nginx configuration file /etc/nginx/nginx.conf on CentOS 7.
/etc/nginx/nginx.conf
http {
    #main log format
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log;
}
Because the access_log directive here names no format, Nginx writes entries in the predefined combined format, which yields the following log entry.
127.0.0.1 - dbmanager [20/Nov/2017:18:52:17 +0000] "GET / HTTP/1.1" 401 188 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"
The following is another useful logging format, which we use for tracing requests to our web applications with some of the default variables. Most importantly, it has the request ID and logs client location details (country, country code, region and city).
/etc/nginx/nginx.conf
log_format custom '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent" '
                  '"$http_x_forwarded_for" $request_id '
                  '$geoip_country_name $geoip_country_code '
                  '$geoip_region_name $geoip_city ';
You can use it like this:
access_log /var/log/nginx/access.log custom;
This will produce a log entry which appears like this.
153.78.107.192 - - [21/Nov/2017:08:45:45 +0000] "POST /ngx_pagespeed_beacon?url=https%3A%2F%2Fwww.example.com%2Fads%2Ffresh-oranges-1509260795 HTTP/2.0" 204 0 "https://www.suasell.com/ads/fresh-oranges-1509260795" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0" "-" a02b2dea9cf06344a25611c1d7ad72db Uganda UG Kampala Kampala
You can specify several logs by using multiple access_log directives on the same level. Here we are using more than one log file in the http context.
/etc/nginx/nginx.conf
http {
    ##default log format
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    ##request tracing using custom format
    log_format custom '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$http_referer" "$http_user_agent" '
                      '"$http_x_forwarded_for" $request_id '
                      '$geoip_country_name $geoip_country_code '
                      '$geoip_region_name $geoip_city ';

    ##this uses the default log format
    access_log /var/log/nginx/access.log;

    ##this uses our custom log format
    access_log /var/log/nginx/custom_log custom;
}
The following are more advanced logging configuration examples, which are useful for buffering writes and for creating compressed log files (the second one assumes a log format named compression has been defined with log_format):
access_log /var/log/nginx/custom_log custom buffer=32k;
access_log /path/to/log.gz compression gzip flush=5m;
Configuring Error Logs in Nginx
In case Nginx experiences any glitches, it records information concerning them in the error log. These issues fall under different severity levels: debug, info, notice, warn, error (this is the default level and works globally), crit, alert, or emerg.
The default log file is logs/error.log, but it is normally located in /var/log/nginx/ on Linux distributions. The error_log directive is used to specify the log file, and it can be used in the main, http, mail, stream, server, and location contexts (in that order).
You should also note that:
- Configurations in the main context are always inherited by lower levels in the order above.
- Configurations in the lower levels override the configurations inherited from the higher levels.
You can configure error logging using the following syntax:
error_log /path/to/log_file log_level;
For example:
error_log /var/log/nginx/error_log warn;
This will instruct Nginx to log all messages of the warn level and of the more severe levels: error, crit, alert, and emerg.
In the next example, messages of crit, alert, and emerg levels will be logged.
error_log /var/www/example1.com/log/error_log crit;
Consider the configuration below. Here, we have defined error logging on different levels (in the http and server contexts). In case of an error, the message is written to only one error log, the one closest to the level where the error has appeared.
/etc/nginx/nginx.conf
http {
    log_format compression '$remote_addr - $remote_user [$time_local] '
                           '"$request" $status $body_bytes_sent '
                           '"$http_referer" "$http_user_agent" "$gzip_ratio"';

    error_log /var/log/nginx/error_log crit;

    server {
        listen 80;
        server_name example1.com;

        #this logs error messages for example1.com only
        error_log /var/log/nginx/example1.error_log warn;
        ...
    }

    server {
        listen 80;
        server_name example2.com;

        #this logs error messages for example2.com only
        error_log /var/log/nginx/example2.error_log;
        ...
    }
}
If you use more than one error_log directive on the same level, as in the configuration below, the messages are written to all specified logs.
/etc/nginx/nginx.conf
server {
    listen 80;
    server_name example1.com;

    error_log /var/www/example1.com/log/error_log warn;
    error_log /var/log/nginx/example1.error_log crit;
    ...
}
Configuring Conditional Logging in Nginx
In some cases, we may want Nginx to perform conditional logging of messages. Not every message has to be logged by Nginx, so we can leave insignificant or less important entries out of our access logs in particular instances.
We can use the ngx_http_map_module module, which creates variables whose values depend on the values of other variables. The parameters inside a map block (which should exist in the http context only) specify a mapping between source and resulting values.
For this kind of setting, a request will not be logged if the condition evaluates to "0" or an empty string. This example excludes requests with HTTP status codes 2xx and 3xx.
/etc/nginx/nginx.conf
http {
    map $status $condition {
        ~^[23]  0;
        default 1;
    }

    server {
        access_log /path/to/access.log custom if=$condition;
    }
}
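The same technique can key on something other than the status code. As a sketch, the map below skips logging for a couple of exact URIs; the `/healthz` path, the `$loggable` variable name, and the log path are assumptions for illustration:

```nginx
http {
    # $uri is the normalized request path without arguments
    map $uri $loggable {
        default      1;
        /healthz     0;   # hypothetical health-check endpoint
        /favicon.ico 0;
    }

    server {
        listen 80;
        # Requests for which $loggable is "0" are not written to the log
        access_log /var/log/nginx/access.log combined if=$loggable;
    }
}
```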
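Here is another useful example for debugging a web application in a development phase. A request is written to the debug access log only when $debuggable evaluates to 1.
/etc/nginx/nginx.conf
http {
    # $info is not a built-in variable; it must be defined elsewhere in the configuration
    map $info $debuggable {
        default 0;
        debug   1;
    }

    server {
        ...
        # "debug" is a log format name here and must be defined with log_format
        access_log /var/log/nginx/testapp_debug_access_log debug if=$debuggable;

        #logs all requests in the main format
        access_log /var/log/nginx/testapp_access.log main;
        ...
    }
}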
You can find out more information, including logging to syslog here.
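For reference, logging to syslog uses the `syslog:` prefix in place of a file path. In this sketch the server address `192.0.2.10:514`, the facility, and the tag are placeholder assumptions:

```nginx
http {
    # Send errors of level warn and above to a remote syslog server
    error_log  syslog:server=192.0.2.10:514,facility=local7,tag=nginx warn;

    # Send access log entries in the combined format to the same server
    access_log syslog:server=192.0.2.10:514,facility=local7,tag=nginx,severity=info combined;
}
```

File-based and syslog destinations can be combined by listing several access_log or error_log directives on the same level.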
That's all for now! In this guide, we explained how to configure custom logging formats for access and error logs in Nginx.