系统日志采集方法特征

  • 构建应用系统和分析系统的桥梁,并将它们之间的关联解耦。
  • 支持近实时的在线分析系统和分布式并发的离线分析系统。
  • 具有高可扩展性,也就是说,当数据量增加时,可以通过增加节点进行水平扩展。

常用的系统日志采集系统

  • Hadoop的Chukwa
  • Apache Flume
  • Facebook的Scribe
  • LinkedIn的Kafka

Flume

  • 基本概念
    Flume是一个高可用的、高可靠的、分布式的海量日志采集、聚合和传输系统。Flume支持在日志系统中定制各类数据发送方,用于收集数据,同时,Flume提供对数据进行简单处理,并写到各种数据接收方(如文本、HDFS、HBase等)的能力
  • 核心
    Flume的核心是把数据从数据源(Source)收集过来,再将收集到的数据送到指定的目的地(Sink)。为了保证输送的过程一定成功,在送到目的地之前,会先缓存数据到管道(Channel),待数据真正到达目的地之后,Flume再删除缓存的数据。
  • 事件
    Flume的数据流由事件(Event)贯穿始终,事件是将传输的数据进行封装而得到的,是Flume传输数据的基本单位。如果是文本文件,事件通常是一行记录。事件携带日志数据并携带头信息,这些事件由Agent外部的数据源生成,当Source捕获事件后会进行特定的格式化,然后Source会把事件推入(单个或多个)Channel中。Channel可以看作是一个缓冲区,它将保存事件直到Sink处理完该事件。Sink负责持久化日志或者把事件推向另一个Source。
  • 安装
    1)下载安装包apache-flume-1.7.0-bin.tar.gz,并解压缩。
    2)在/etc/profile文件中配置JAVA_HOME、FLUME_HOME,保存退出后,执行source /etc/profile命令使配置文件生效。
export JAVA_HOME=/opt/jdk1.8.0_251
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=./://$JAVA_HOME/lib:$JRE_HOME/libexport FLUME_HOME=/opt/apache-flume-1.7.0-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin

3)验证是否安装成功
进入$FLUME_HOME/bin目录,执行flume-ng version命令可验证是否安装成功。

[root@localhost ~]# cd $FLUME_HOME/bin
[root@localhost bin]# flume-ng version
Flume 1.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
From source with checksum 0d21b3ffdc55a07e1d08875872c00523
  • 使用方法
    Flume的用法很简单,主要是编写一个用户配置文件夹。在配置文件当中描述Source、Channel与Sink的具体实现,而后运行一个Agent实例。在运行Agent实例的过程中会读取配置文件的内容,这样Flume就会采集到数据。Flume提供了大量内置的Source、Channel和Sink类型,而且不同类型的Source、Channel和Sink可以进行灵活组合。
  • 配置文件的编写原则
    1)从整体上描述Agent中Sources、Sinks、Channels所涉及的组件
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

2)详细描述Agent中每一个Source、Sink与Channel的具体实现,即需要指定Source到底是什么类型的,是接收文件的、HTTP的,还是Thrift的;对于Sink,需要指定结果是输出到HDFS中,还是HBase中等;对于Channel,需要指定格式是内存、数据库,还是文件等。

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444# Describe the sink
a1.sinks.k1.type = logger# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

3)通过Channel将Source与Sink连接起来。

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4)启动Agent的shell操作

[root@localhost bin]# flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console

参数说明:

  • -n 指定Agent的名称(与配置文件中代理的名字相同)
  • -c 指定Flume中配置文件的目录
  • -f 指定配置文件
  • -Dflume.root.logger=DEBUG,console 设置日志等级
    4)使用Telnet发送数据
[root@localhost bin]# telnet localhost 44444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
I am liguogang!
OK

效果如下:

[root@localhost bin]# flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console
Info: Sourcing environment configuration script /opt/apache-flume-1.7.0-bin/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /opt/jdk1.8.0_251/bin/java -Xmx20m -Dflume.root.logger=DEBUG,console -cp '/opt/apache-flume-1.7.0-bin/conf:/opt/apache-flume-1.7.0-bin/lib/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n a1 -f ../conf/example.file
2020-07-18 22:50:51,914 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2020-07-18 22:50:51,916 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:79)] Configuration provider started
2020-07-18 22:50:51,916 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:127)] Checking file:../conf/example.file for changes
2020-07-18 22:50:51,917 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:134)] Reloading configuration file:../conf/example.file
2020-07-18 22:50:51,922 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: k1 Agent: a1
2020-07-18 22:50:51,922 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:k1
2020-07-18 22:50:51,922 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for k1: channel
2020-07-18 22:50:51,922 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:k1
2020-07-18 22:50:51,922 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: a1
2020-07-18 22:50:51,923 (conf-file-poller-0) [INFO - org.apache.flume.conf.LogPrivacyUtil.<clinit>(LogPrivacyUtil.java:51)] Logging of configuration details is disabled. To see configuration details in the log run the agent with -Dorg.apache.flume.log.printconfig=true JVM argument. Please note that this is not recommended in production systems as it may leak private information to the logfile.
2020-07-18 22:50:51,925 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:467)] Created channel c1
2020-07-18 22:50:51,928 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: k1 using LOGGER
2020-07-18 22:50:51,928 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:135)] Channels:c12020-07-18 22:50:51,928 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Sinks k12020-07-18 22:50:51,928 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sources r12020-07-18 22:50:51,928 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [a1]
2020-07-18 22:50:51,929 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2020-07-18 22:50:51,933 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel c1 type memory
2020-07-18 22:50:51,935 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel c1
2020-07-18 22:50:51,936 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source r1, type netcat
2020-07-18 22:50:51,941 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k1, type: logger
2020-07-18 22:50:51,943 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel c1 connected to [r1, k1]
2020-07-18 22:50:51,946 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:137)] Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3a2e57 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
2020-07-18 22:50:51,951 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:144)] Starting Channel c1
2020-07-18 22:50:51,978 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
2020-07-18 22:50:51,979 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: c1 started
2020-07-18 22:50:51,979 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:171)] Starting Sink k1
2020-07-18 22:50:51,979 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:182)] Starting Source r1
2020-07-18 22:50:51,979 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2020-07-18 22:50:51,980 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:141)] Polling sink runner starting
2020-07-18 22:50:51,986 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:169)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
2020-07-18 22:50:51,986 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.source.NetcatSource.start(NetcatSource.java:190)] Source started
2020-07-18 22:50:51,986 (Thread-2) [DEBUG - org.apache.flume.source.NetcatSource$AcceptHandler.run(NetcatSource.java:270)] Starting accept handler
2020-07-18 22:51:13,466 (netcat-handler-0) [DEBUG - org.apache.flume.source.NetcatSource$NetcatSocketHandler.run(NetcatSource.java:315)] Starting connection handler
2020-07-18 22:51:21,980 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:127)] Checking file:../conf/example.file for changes
2020-07-18 22:51:35,414 (netcat-handler-0) [DEBUG - org.apache.flume.source.NetcatSource$NetcatSocketHandler.run(NetcatSource.java:327)] Chars read = 17
2020-07-18 22:51:35,416 (netcat-handler-0) [DEBUG - org.apache.flume.source.NetcatSource$NetcatSocketHandler.run(NetcatSource.java:331)] Events processed = 1
2020-07-18 22:51:37,999 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 49 20 61 6D 20 6C 69 67 75 6F 67 61 6E 67 21 0D I am liguogang!. }