09 ELK in Practice: Collecting Nginx Access Logs


Nginx Log Format and Log Variables

  1. Like Apache, Nginx supports a custom log output format. Before defining the format, let's review a few concepts related to obtaining the user's real IP behind multiple layers of proxies.
  2. remote_addr: the client address, with one caveat. If no proxy is involved, it is the client's real IP; if the request comes through a proxy, it is the IP of the proxy closest to the server.
  3. X-Forwarded-For: XFF for short, an HTTP extension header whose format is X-Forwarded-For: client, proxy1, proxy2.
  4. Suppose an HTTP request passes through three proxies Proxy1, Proxy2 and Proxy3, with IPs IP1, IP2 and IP3, before reaching the server, and the user's real IP is IP0.
  5. According to the XFF convention, the server ultimately receives: X-Forwarded-For: IP0, IP1, IP2.
  6. Note that IP3 never appears in X-Forwarded-For; it is exactly the address that remote_addr holds.
1. $remote_addr: if the request comes through a proxy, this variable holds the IP of the proxy closest to the server; otherwise it is the client's real IP address.
2. $http_x_forwarded_for: this variable holds the value of the X-Forwarded-For header.
3. $proxy_add_x_forwarded_for: this variable is $http_x_forwarded_for with $remote_addr appended.
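A quick way to see all three variables side by side, without building a real proxy chain, is to inject an X-Forwarded-For header by hand. This is only an illustration: the addresses 10.0.0.1 and 10.0.0.2 are made up, and 172.17.70.235 is the backend Nginx server configured later in this article.

# curl can fake an upstream proxy chain by sending the header itself
curl -H "X-Forwarded-For: 10.0.0.1, 10.0.0.2" http://172.17.70.235/
# For this request Nginx sees:
#   $remote_addr               = the IP of the machine running curl
#   $http_x_forwarded_for      = 10.0.0.1, 10.0.0.2
#   $proxy_add_x_forwarded_for = 10.0.0.1, 10.0.0.2, <remote_addr>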

Customizing the Nginx Log Format

  1. Now that the meaning of the Nginx log variables is clear, we can rework the output format. As before, Nginx will be set to emit its logs in JSON.
  2. Only the log-format and access-log definitions from nginx.conf are shown below; the finished configuration looks like this:
# When $http_x_forwarded_for == "", assign $remote_addr to $clientRealIp.
# When it is not empty, the regex ~^(?P<firstAddr>[0-9\.]+),?.*$ extracts the first IP into $firstAddr, which then becomes $clientRealIp.
# accessip_list: $http_x_forwarded_for with $remote_addr appended (i.e. $proxy_add_x_forwarded_for).
# client_ip: the real client IP extracted by the map.


[root@filebeat1 nginx]# vim nginx.conf
...
http {
    ...
    map $http_x_forwarded_for $clientRealIp {
        ""                              $remote_addr;
        ~^(?P<firstAddr>[0-9\.]+),?.*$  $firstAddr;
    }

    log_format nginx_log_json '{"accessip_list":"$proxy_add_x_forwarded_for","client_ip":"$clientRealIp","http_host":"$host","@timestamp":"$time_iso8601","method":"$request_method","url":"$request_uri","status":"$status","http_referer":"$http_referer","body_bytes_sent":"$body_bytes_sent","request_time":"$request_time","http_user_agent":"$http_user_agent","total_bytes_sent":"$bytes_sent","server_ip":"$server_addr"}';

    access_log /var/log/nginx/access.log nginx_log_json;
...
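After editing nginx.conf, the syntax can be checked before relying on the new format; if Nginx is already running, a reload picks up the change without interrupting service:

[root@filebeat1 nginx]# nginx -t
[root@filebeat1 nginx]# systemctl reload nginx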

Two-Layer Proxy Configuration

Backend (this host)   172.17.70.235
Proxy 1               172.17.70.236
Proxy 2               172.17.70.230
[root@filebeat1 conf.d]# vim /etc/nginx/conf.d/test.conf 

server {
    listen       80;
    server_name  172.17.70.235;
    root         /data/;
    index        index.html;
}
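
The backend serves static content from /data/; if no test page exists there yet, a minimal one can be created first (the file content below is just a placeholder):

[root@filebeat1 conf.d]# mkdir -p /data
[root@filebeat1 conf.d]# echo "backend test page" > /data/index.html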

[root@filebeat1 conf.d]# systemctl start nginx

# Configure the level-1 proxy, 172.17.70.236

[root@logstash nginx]# vim conf.d/proxy.conf

upstream test {
    server 172.17.70.235:80;
}

server {
    listen       80;
    server_name  172.17.70.236;
    location / {
        proxy_pass http://test;
        include    proxy_params;
    }
}

[root@logstash nginx]# vim proxy_params

proxy_redirect default;
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

proxy_connect_timeout 60;
proxy_send_timeout 60;
proxy_read_timeout 60;

proxy_buffer_size 32k;
proxy_buffering on;
proxy_buffers 4 128k;
proxy_busy_buffers_size 256k;
proxy_max_temp_file_size 256k;

[root@logstash nginx]# curl 172.17.70.236
# Configure the level-2 proxy, 172.17.70.230

[root@server2 conf.d]# vim proxy.conf

upstream test {
    server 172.17.70.236:80;
}

server {
    listen       80;
    server_name  172.17.70.230;
    location / {
        proxy_pass http://test;
        include    proxy_params;
    }
}

[root@server2 conf.d]# curl 172.17.70.230/index.html

Observing the Logs

[root@filebeat1 nginx]# tail -200 /var/log/nginx/access.log 
{"accessip_list":"172.17.70.235","client_ip":"172.17.70.235","http_host":"172.17.70.235","@timestamp":"2019-10-28T15:05:39+08:00","method":"GET","url":"/","status":"200","http_referer":"-","body_bytes_sent":"11","request_time":"0.000","http_user_agent":"curl/7.29.0","total_bytes_sent":"246","server_ip":"172.17.70.235"}
{"accessip_list":"172.17.70.236, 172.17.70.236","client_ip":"172.17.70.236","http_host":"172.17.70.236","@timestamp":"2019-10-28T15:05:45+08:00","method":"GET","url":"/","status":"200","http_referer":"-","body_bytes_sent":"11","request_time":"0.000","http_user_agent":"curl/7.29.0","total_bytes_sent":"241","server_ip":"172.17.70.235"}
{"accessip_list":"172.17.70.230, 172.17.70.230, 172.17.70.236","client_ip":"172.17.70.230","http_host":"172.17.70.230","@timestamp":"2019-10-28T15:05:51+08:00","method":"GET","url":"/index.html","status":"200","http_referer":"-","body_bytes_sent":"11","request_time":"0.000","http_user_agent":"curl/7.29.0","total_bytes_sent":"241","server_ip":"172.17.70.235"}
{"accessip_list":"61.51.152.214, 172.17.70.230, 172.17.70.236","client_ip":"61.51.152.214","http_host":"60.205.217.112","@timestamp":"2019-10-28T15:06:59+08:00","method":"GET","url":"/","status":"304","http_referer":"-","body_bytes_sent":"0","request_time":"0.000","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36","total_bytes_sent":"173","server_ip":"172.17.70.235"}

  1. In this output you can see the difference between client_ip and accessip_list: the client_ip field holds the real client IP address,
  2. while accessip_list is the accumulated list of proxy addresses (each log line is a standalone JSON object; see the pretty-printing sketch after this list).
    • The first entry comes from accessing http://172.17.70.235 directly, without any proxy.
    • The second entry comes from accessing 172.17.70.236, i.e. through one proxy layer.
    • The third entry comes from accessing 172.17.70.230, i.e. through two proxy layers.
    • The last entry comes from accessing the server's public address over the Internet.
  3. Getting the real client IP in Nginx is therefore straightforward and needs no special post-processing, which also saves a lot of work later when writing the Logstash event configuration file.
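Since every entry is a single JSON object per line, it can be pretty-printed for a quick manual check. A minimal sketch, assuming the jq utility is installed on the Nginx host:

# Pretty-print the most recent access-log entry
[root@filebeat1 nginx]# tail -1 /var/log/nginx/access.log | jq .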

Configuring Filebeat

  1. Filebeat is installed on the Nginx server itself.
# Only two changes are needed compared with the earlier Apache configuration:
# one is the monitored log path, now pointing at the Nginx access log,
# the other is the topic name, now nginxlogs.

[root@filebeat1 filebeat]# vim filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_topic: nginxlogs

output.kafka:
  enabled: true
  hosts: ["172.17.70.232:9092", "172.17.70.233:9092", "172.17.70.234:9092"]
  version: "0.10"
  topic: '%{[fields][log_topic]}'
  partition.round_robin:
    reachable_only: true
  worker: 2
  required_acks: 1
  compression: gzip
  max_message_bytes: 10000000

#processors:
#- drop_fields:
#    fields: ["beat", "input", "source", "offset", "prospector"]

logging.level: debug
name: "172.17.70.235"
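Before starting Filebeat it is worth validating the configuration file and the Kafka connection; a sketch using Filebeat's built-in test subcommands:

[root@filebeat1 filebeat]# ./filebeat test config -c filebeat.yml
[root@filebeat1 filebeat]# ./filebeat test output -c filebeat.yml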

Verifying Reception in Kafka

# Start Filebeat.
# Filebeat adds fields of its own to each event; since nothing is dropped here, they can only be filtered out later in Logstash.
[root@filebeat1 filebeat]# ./filebeat -e -c filebeat.yml

# List the topics
[root@kafkazk1 kafka]# bin/kafka-topics.sh --zookeeper 172.17.70.232:2181,172.17.70.233:2181,172.17.70.234:2181 --list
__consumer_offsets
apachelogs
nginxlogs
osmessages

# Consume the data
# Refresh the public-facing page, i.e. access through the level-2 proxy
[root@kafkazk1 kafka]# bin/kafka-console-consumer.sh --zookeeper 172.17.70.232:2181,172.17.70.233:2181,172.17.70.234:2181 --topic nginxlogs
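You can also check the partition and replica layout of the new topic with the same tool:

[root@kafkazk1 kafka]# bin/kafka-topics.sh --zookeeper 172.17.70.232:2181,172.17.70.233:2181,172.17.70.234:2181 --describe --topic nginxlogs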

Configuring Logstash

  1. Since the log format is already defined on the Nginx side, Logstash does not need to parse or dissect the log lines.
  2. The Logstash event configuration file kafka_nginx_into_es.conf is given below in full.
  3. With Apache we had to derive the real client IP ourselves; with Nginx the map block has already done it.
[root@logstash logstash]# vim kafka_nginx_into_es.conf

input {
  kafka {
    bootstrap_servers => "172.17.70.232:9092,172.17.70.233:9092,172.17.70.234:9092"
    topics => "nginxlogs"
    group_id => "logstash"
    codec => json {
      charset => "UTF-8"
    }
    add_field => { "[@metadata][myid]" => "nginxaccess_log" }
  }
}

filter {
  if [@metadata][myid] == "nginxaccess_log" {
    mutate {
      # Escape the \x byte sequences in the message field, so that URLs containing Chinese characters (UTF-8 single-byte escapes) do not break JSON parsing.
      gsub => ["message", "\\x", "\\\x"]
    }
    # If the message is a HEAD request, drop the event.
    if ( 'method":"HEAD' in [message] ) {
      drop {}
    }
    json {
      source => "message"
      remove_field => "prospector"
      remove_field => "beat"
      remove_field => "source"
      remove_field => "input"
      remove_field => "offset"
      remove_field => "fields"
      remove_field => "host"
      remove_field => "@version"
      remove_field => "message"
    }
  }
}

output {
  if [@metadata][myid] == "nginxaccess_log" {
    stdout {
      codec => "rubydebug"
    }
  }
}
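The file's syntax can be checked first without starting a full pipeline (the flag is available in Logstash 5.x and later):

[root@logstash logstash]# bin/logstash -f kafka_nginx_into_es.conf --config.test_and_exit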
[root@logstash logstash]# bin/logstash -f kafka_nginx_into_es.conf 
# Access the site to generate some log entries

Outputting to the ES Cluster

# The condition matches the [@metadata][myid] value set in the input section above; when there are
# multiple input sources, different tags can route events to different outputs.
# The elasticsearch output sends events to the Elasticsearch cluster at the given addresses.
# The index option names the Nginx-log index in Elasticsearch; this name will be used later in Kibana.
# Index names are conventionally prefixed with logstash, followed by an identifier and the date.

output {
  if [@metadata][myid] == "nginxaccess_log" {
    elasticsearch {
      hosts => ["172.17.70.229:9200","172.17.70.230:9200","172.17.70.231:9200"]
      index => "logstash_nginxlogs-%{+YYYY.MM.dd}"
    }
  }
}

[root@logstash logstash]# bin/logstash -f kafka_nginx_into_es.conf

# Refresh the page to generate some data:
# http://60.205.217.112/
# Then configure the index pattern in Kibana:
http://60.205.217.112:5601/
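Before creating the index pattern in Kibana, you can confirm that the index exists in Elasticsearch; a minimal check against the cluster's REST API:

[root@logstash logstash]# curl 'http://172.17.70.229:9200/_cat/indices?v' | grep nginxlogs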
# Run Logstash in the background
[root@logstashserver ~]# cd /usr/local/logstash
[root@logstash logstash]# nohup bin/logstash -f kafka_nginx_into_es.conf &
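
To confirm the background instance is still running and to watch its output:

[root@logstash logstash]# ps -ef | grep logstash
[root@logstash logstash]# tail -f nohup.out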