개발자 모도리: ElasticSearch


Friday, May 18, 2018

Building a Server Error Log Dashboard with the Elastic Stack

FileBeat ships the error logs scattered across each server to LogStash, which filters the data and stores it in ElasticSearch. Kibana then visualizes the stored data so that the error logs from every server can be viewed in one place.

System architecture

Installation environment

ElasticSearch, LogStash, and Kibana are installed on a single server dedicated to log aggregation

CPU : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz (dual-core)
RAM : 4GB
OS : Ubuntu 16.04 LTS

FileBeat is installed on each server that produces logs

CPU : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz (single-core)
RAM : 1GB
OS : Ubuntu 14.04 LTS, 16.04 LTS

Prerequisites

Install Java (Java 8 or later required; Oracle JDK 1.8.0_131 recommended)
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default
Check the Java version
java -version

Installation

ElasticSearch

Download the installation package
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.deb
Install the package
sudo dpkg -i elasticsearch-6.2.4.deb
Register / unregister the service (start automatically at boot)
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl disable elasticsearch.service
Start / stop the service
sudo systemctl start elasticsearch.service
sudo systemctl stop elasticsearch.service

Verify the installation

curl -X GET "localhost:9200/"
{
  "name" : "Cp8oag6",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "AT69_T_DTp-1qgIJlatQqA",
  "version" : {
    "number" : "6.2.4",
    "build_hash" : "f27399d",
    "build_date" : "2016-03-30T09:51:41.449Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "1.2.3",
    "minimum_index_compatibility_version" : "1.2.3"
  },
  "tagline" : "You Know, for Search"
}
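
If the check needs to run from a script rather than by eye, the version number can be pulled out of the response above. The following is only a minimal sketch: es_version is a hypothetical helper name, and it assumes the pretty-printed layout shown above, where "number" sits on its own line.

```shell
#!/bin/sh
# es_version is a hypothetical helper, not part of any Elastic tool.
# It reads the root-endpoint JSON on stdin and prints the first "number" value.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage against a live node (commented out so the sketch has no side effects):
# curl -s "localhost:9200/" | es_version
```

A real JSON parser would be more robust; sed is used here only to avoid extra dependencies.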

Kibana

Download the installation package
wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-amd64.deb
Install the package
sudo dpkg -i kibana-6.2.4-amd64.deb
Register the service
sudo systemctl daemon-reload
sudo systemctl enable kibana.service
Start the service
sudo systemctl start kibana.service
Verify the installation
Open localhost:5601 in a web browser and check that the Kibana screen appears

LogStash

Download the installation package
wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.deb
Install the package
sudo dpkg -i logstash-6.2.4.deb
Register the service
sudo systemctl daemon-reload
sudo systemctl enable logstash.service
Start the service
sudo systemctl start logstash.service

FileBeat

Download the installation package
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-amd64.deb
Install the package
sudo dpkg -i filebeat-6.2.4-amd64.deb
Register the service
sudo systemctl daemon-reload
sudo systemctl enable filebeat.service
Start the service
sudo systemctl start filebeat.service

Configuration

ElasticSearch

Edit the configuration file
sudo vi /etc/elasticsearch/elasticsearch.yml
- Set the IP address that ElasticSearch binds to. (The port defaults to 9200 if left unchanged.)
```
network.host : 0.0.0.0
```

Restart the service
sudo systemctl restart elasticsearch.service

Kibana

Edit the configuration file
sudo vi /etc/kibana/kibana.yml
- Set the IP address that Kibana binds to. (The port defaults to 5601 if left unchanged.)
```
server.host : "0.0.0.0"
elasticsearch.url : "http://elasticsearch_server_address:9200"
```
Restart the service
sudo systemctl restart kibana.service

LogStash

Create a configuration file
sudo vi /etc/logstash/conf.d/[any file name].conf

input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][server_name] == "api-server" {
    if [fields][log_type] == "nginx_access" {
      grok {
        match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
        remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
      }
    }
  }
}

output {
  if([fields][server_name] == "api-server" and [fields][log_type] == "nginx_access") {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "new-api-access-log-%{+YYYY.MM.dd}"
    }
  }
  else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "error-log-%{+YYYY.MM.dd}"
    }
  }
}

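This configuration receives log data from filebeat, stores only the nginx_access logs in a separate index (new-api-access-log-*), and stores everything else in elasticsearch under the default index (error-log-*).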
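
The routing rule in the output block can be restated as a small shell function, which may help when deciding which index to look in for a given event. This is only an illustration: target_index is a hypothetical helper, and the date is passed in by hand, whereas LogStash derives it from the event's @timestamp.

```shell
#!/bin/sh
# target_index is a hypothetical helper that mirrors the if/else in the output block.
# Arguments: server_name, log_type, and the event date formatted as YYYY.MM.dd.
target_index() {
  server_name=$1
  log_type=$2
  day=$3
  if [ "$server_name" = "api-server" ] && [ "$log_type" = "nginx_access" ]; then
    echo "new-api-access-log-$day"
  else
    # Everything else, including events with no fields at all, lands here
    echo "error-log-$day"
  fi
}

target_index api-server nginx_access 2018.05.18
target_index api-server nginx_error 2018.05.18
```

Events that carry no fields block at all fall through to the error-log index as well, because neither condition matches.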

Restart the service
sudo systemctl restart logstash.service

FileBeat

Edit the configuration file
sudo vi /etc/filebeat/filebeat.yml

filebeat.prospectors:
- type : log
  enabled: true
  paths:
    - /var/log/php7.0-fpm.log


- type : log
  enabled: true
  paths:
    - /var/log/nginx/error.log
  fields:
    server_name: api-server
    log_type: nginx_error

#output.elasticsearch:

output.logstash:
  hosts: ["logstash_server_address:5044"]

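This configuration collects the error logs produced by php-fpm and nginx. The fields block (highlighted in red in the original post) attaches additional data to each event. If you want to filter by server_name in kibana, giving each server a different server_name like this makes it easy to view only the servers you are interested in. Because the data goes through logstash instead of being sent directly to elasticsearch, the elasticsearch output is commented out.

Enabling modules
: Commonly used log sources are provided in advance as modules. (See the list of supported modules.)

Enable the nginx module
sudo filebeat modules enable nginx
List the configured modules
sudo filebeat modules list
Initial setup
sudo filebeat setup -e

Edit the module configuration file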
sudo vi /etc/filebeat/modules.d/nginx.yml

- module: nginx
  # Access logs
  access:
    enabled: true
    prospector:
      fields:
        server_name: api-server
        log_type: nginx_access

Restart the service
sudo systemctl restart filebeat.service

Kibana

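Access
Enter kibana_server_ip:5601 in a web browser to connect

Setup

Loading the data

Management -> Index Patterns

+ Create Index Pattern

Enter the index pattern. If you followed the configuration above, two kinds of index will have been created:
new-api-access-log-YYYY.MM.dd
error-log-YYYY.MM.dd

Enter the pattern (for example, error-log-*) and click the Next step button.
Set the Time filter field name

Extracting the data you want

Discover -> select an index
Check the raw data
View only the data you want : find the field you want to see and click add, and only that field is shown.
View the selected field data
Here only fields.log_type and fields.server_name, which were added earlier in the filebeat configuration, and message are shown.
View only the data that has a specific value
Save the filtered result

Creating a graph

Visualize -> [+]
Choose the graph type you want
Select an index
Configure the graph
The error-log data filtered by server and shown as a bar chart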

Tuesday, October 31, 2017

Running a Hadoop Job on EMR and Writing the Results to ElasticSearch

References

https://www.elastic.co/guide/en/elasticsearch/hadoop/current/troubleshooting.html
http://libqa.com/wiki/807

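EMR cluster setup

To test the connection between the EMR cluster and an ElasticSearch server, I first installed ElasticSearch on the EMR cluster's master node.

Install Elastic Search on the master node by following the post "Storing Hadoop Output in Elastic Search".

Edit the Elastic Search configuration file.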
vi $ES_HOME/config/elasticsearch.yml

network.host: 192.168.0.1

http.port: 9200

Enter the master node's private IP.

The same IP must also be set in the Java program.

conf.set("es.nodes","192.168.0.1:9200");

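Editing the security group inbound rules

Edit the inbound rules of the master node's security group.

Because default rules (all TCP, all UDP, and so on) already exist between the master and slave nodes, no additional configuration should be needed when ElasticSearch is installed on the master node itself.

If ElasticSearch must be reachable from outside, open a path for external access.

Click Add Rule, enter the following, and save.

- Type : Custom TCP

- Protocol : TCP

- Port range : 9200

- Source : 'My IP' (or 'Anywhere' if you do not have a static IP)

Use scp to copy the mapreduce jar file built locally to the master node.
Start elasticsearch and run the wordcount example to check that it runs without errors.
If it ran correctly, check that the data was stored correctly.

On the master node
curl -XGET 'http://[Private IP]:9200/hadoop/_search?pretty=true&q=*:*'

From outside
curl -XGET 'http://[Public IP]:9200/hadoop/_search?pretty=true&q=*:*'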

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1035,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "hadoop",
        "_type" : "wordcount",
        "_id" : "AV9wXlTwywZ-ibD3mov5",
        "_score" : 1.0,
        "_source" : {
          "reaffirm" : 1
        }
      },
      {
        "_index" : "hadoop",
        "_type" : "wordcount",
        "_id" : "AV9wXlTwywZ-ibD3mov8",
        "_score" : 1.0,
        "_source" : {
          "recognition," : 1
        }
      },
      {
        "_index" : "hadoop",
        "_type" : "wordcount",
        "_id" : "AV9wXlTwywZ-ibD3mov_",
        "_score" : 1.0,
        "_source" : {
          "refused" : 1
        }
      },

The wordcount results are stored in _source like this.

Friday, October 27, 2017

Storing Hadoop Output in Elastic Search

References

http://wpcertification.blogspot.kr/2014/05/using-elasticsearch-to-store-output-of.html

First, install Elastic Search.
http://modoleesi.blogspot.kr/2017/10/elastic-search.html

Elastic provides the ES-Hadoop library for this integration.
https://www.elastic.co/kr/products/hadoop

Download the library and extract it.
The dist directory of the extracted archive contains many files; of these, we will use elasticsearch-hadoop-mr-5.6.3.jar.

Copy that file to wherever you keep your library files.
I keep mine in $ES_HOME/lib/.
cp elasticsearch-hadoop-mr-5.6.3.jar $ES_HOME/lib/

Environment variables

vi $HOME/.bashrc

#ELASTIC-HADOOP
export ES_HADOOP_MR_LIB=$ES_HOME/lib/elasticsearch-hadoop-mr-5.6.3.jar
export LIBJARS=$ES_HADOOP_MR_LIB
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$ES_HADOOP_MR_LIB

source $HOME/.bashrc

hadoop classpath
Run the command above to check that the jar was added to the classpath.
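
Scanning the long hadoop classpath output by eye is error-prone, so the check can be scripted. The sketch below is only an assumption about how one might do it: classpath_has is a hypothetical helper, and it only detects jars that are listed explicitly, not jars pulled in through wildcard entries such as a trailing /*.

```shell
#!/bin/sh
# classpath_has is a hypothetical helper: it succeeds when a colon-separated
# classpath string contains an entry that ends in /<jar file name>.
classpath_has() {
  jar_name=$1
  classpath=$2
  case ":$classpath:" in
    *"/$jar_name:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a machine with Hadoop installed (commented out; it needs the hadoop CLI):
# classpath_has elasticsearch-hadoop-mr-5.6.3.jar "$(hadoop classpath)" && echo "found"
```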

Now let's modify the wordcount example so that its results are stored in ElasticSearch.
The latest EMR release (5.9) supports Hadoop 2.7.3, so from here on I will probably write against 2.7.3.

pom.xml

<dependencies>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-hadoop-mr</artifactId>
            <version>5.6.3</version>
        </dependency>

    </dependencies>

EsHadoopJob.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class EsHadoopJob extends Configured implements Tool {

    public int run(String[] args) throws Exception {

        // Verify the number of parameters
        if (args.length != 2) {
            System.err.printf("Usage : %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        // Create a configuration
        Configuration conf = getConf();

        // Configuration for the ES-Hadoop connector
        conf.set("es.nodes","localhost:9200");
        conf.set("es.resource", "hadoop/wordcount");

        // Create a job from the default configuration that will use the wordcount class
        Job job = Job.getInstance(conf, "wordcount");

       // for ES-HADOOP
        job.setSpeculativeExecution(false);

        // Configure the job: name, mapper, reducer, and combiner
        job.setJarByClass(EsHadoopJob.class);
        job.setMapperClass(EsHadoopMapper.class);
       //  job.setCombinerClass(EsHadoopReducer.class); 
        job.setReducerClass(EsHadoopReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Configure the input/output format and key, value class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(MapWritable.class);

        // OutputFormatClass for the ES-Hadoop connector
        job.setOutputFormatClass(EsOutputFormat.class);

        // Define our input path as the first command line argument and our output path as the second
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);

        // Create File Input/Output formats for these paths (in the job)
        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        // Delete the output path from HDFS automatically so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath,true);

        // Run the job
        return (job.waitForCompletion(true) ? 0 : 1);
    }


    public static void main(String[] args) throws Exception {
        // Start the WordCount MapReduce application
        int res = ToolRunner.run(new Configuration(), new EsHadoopJob(), args);
        System.exit(res);
    }
}

EsHadoopMapper.java -> no changes needed

EsHadoopReducer.java

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

public class EsHadoopReducer extends Reducer<Text, IntWritable, Text, MapWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }

        // Wrap the count in a MapWritable, which EsOutputFormat writes out as one document
        MapWritable result = new MapWritable();
        result.put(key, new IntWritable(sum));
        context.write(key, result);
    }
}

Creating the Elastic Search index

List the indices, delete the index if it already exists, create it again, and set the field limit to 4000. (Without this setting an error can occur, so set it up front. The error is described below.)

curl -XGET 'localhost:9200/_cat/indices?v&pretty'

curl -XDELETE http://localhost:9200/hadoop

curl -XPUT 'localhost:9200/hadoop?pretty'

curl -XPUT 'localhost:9200/hadoop/_settings' -d '
{
    "index.mapping.total_fields.limit": 4000
}'
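
The create and settings requests above can also be combined, since the create-index API accepts a settings object in its body. The following is a minimal sketch under that assumption: index_settings_json is a hypothetical helper, and the Content-Type header is added because Elasticsearch 6.x rejects request bodies without it (5.6 accepts either form).

```shell
#!/bin/sh
# index_settings_json is a hypothetical helper that prints a create-index body
# carrying the total_fields limit, so the index and the setting arrive together.
index_settings_json() {
  limit=$1
  printf '{"settings":{"index.mapping.total_fields.limit":%s}}\n' "$limit"
}

index_settings_json 4000

# Usage against a live node (commented out so the sketch has no side effects):
# curl -XPUT 'localhost:9200/hadoop?pretty' \
#   -H 'Content-Type: application/json' \
#   -d "$(index_settings_json 4000)"
```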

Build and run

mvn clean install

hadoop jar es-hadoop-1.0-SNAPSHOT.jar EsHadoopJob -libjars elasticsearch-hadoop-mr-5.6.3.jar input output

Checking the data

http://localhost:9200/hadoop/_search?pretty=true&q=*:*&size=999

Entering this URL in a browser returns up to 999 of the documents stored in the index.

Errors encountered

Error saying es.resource was not specified
: The problem was where it was set. The job is created with Job.getInstance(conf, "wordcount"), which uses conf, but I was setting the conf values after that line. Moving them above it fixed the problem.

The library placed on the classpath could not be found (the -libjars option had no effect)
: I had been creating conf with new Configuration() when creating the job; I changed that to getConf().

wrong value class: class org.apache.hadoop.io.MapWritable is not class org.apache.hadoop.io.IntWritable
: This happened because I kept using the Combiner class as-is. A combiner's output must match the mapper's output, but the combiner was set to the same class as the reducer, so a MapWritable was emitted where an IntWritable was expected.

failed to execute bulk item (index) BulkShardRequest [[hadoop][1]] containing [232] requests org.elasticsearch.index.mapper.MapperParsingException: failed to parse
: The details are written to elasticsearch.log in the log directory under the elastic search directory.
object field starting or ending with a [.] makes object resolution ambiguous: [survive...that]
: The error concerns field names that start or end with a dot; it occurred when survive...that was used as a field name.
wordcount treated survive...that as a single word, so I simply removed the ... from the source file. The goal for now is just to test, and later on there will be no reason to put a . in a field name.
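
Instead of editing the source file by hand, the input could be cleaned before it is uploaded. This is only a sketch of one possible preprocessing step, not something the original job does: clean_words is a hypothetical helper that splits on every non-alphanumeric character, so tokens such as survive...that or recognition, can no longer become field names containing dots or punctuation.

```shell
#!/bin/sh
# clean_words is a hypothetical preprocessing step, not part of the original job:
# it splits text on every non-alphanumeric character and prints one word per line.
clean_words() {
  LC_ALL=C tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

printf 'survive...that recognition,\n' | clean_words
```

Run against a whole file, this would look like clean_words < input.txt > input_clean.txt, where both file names are placeholders.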

Limit of total fields [1000] in index [hadoop] has been exceeded
: The number of fields exceeded the limit.
Delete the index, create it again, and raise the limit.

curl -XDELETE http://localhost:9200/hadoop
curl -XPUT 'localhost:9200/hadoop?pretty'
curl -XPUT 'localhost:9200/hadoop/_settings' -d '
{
    "index.mapping.total_fields.limit": 4000
}'

Thursday, October 26, 2017

Installing Elastic Search

References

http://gyrfalcon.tistory.com/entry/elastic-stack-2-elasticsearch-%EC%84%A4%EC%B9%98-quick-start

Installation environment

Ubuntu 16.04 LTS 64bit
Elastic Search 5.6.3

Downloading the binary

Download the tar file from https://www.elastic.co/downloads/elasticsearch

tar zxvf elasticsearch-5.6.3.tar.gz

Environment variables

vi ~/.bashrc

#ELASTIC SEARCH
export ES_HOME=$HOME/elasticsearch
export PATH=$PATH:$ES_HOME/bin

source ~/.bashrc

Running

Run the elasticsearch executable
(to run it as a daemon, use elasticsearch -d)

then, in another terminal, run curl -XGET 'http://localhost:9200'.

If you see output like the following, the installation is complete.

{
  "name" : "V8xgQGe",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "i-O8Ro4JSiOaEmuoXkKL5w",
  "version" : {
    "number" : "5.6.3",
    "build_hash" : "1a2f265",
    "build_date" : "2017-10-06T20:33:39.012Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

Troubleshooting

1. Errors caused by insufficient file descriptors or memory
max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

See the link for a detailed explanation
http://libqa.com/wiki/807

Check the currently configured limits
ulimit -Sa
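
To compare the current values with the minimums named in the two error messages, a small helper can do the arithmetic. This is only a sketch: meets_minimum is a hypothetical function, and the sample values are copied from the error messages above, not read from a live system.

```shell
#!/bin/sh
# meets_minimum is a hypothetical helper: it reports whether a numeric setting
# reaches the minimum that Elasticsearch asks for.
meets_minimum() {
  name=$1
  current=$2
  required=$3
  if [ "$current" -ge "$required" ]; then
    echo "$name: ok ($current)"
  else
    echo "$name: too low ($current < $required)"
  fi
}

# Values taken from the two error messages
meets_minimum "max file descriptors" 4096 65536
meets_minimum "vm.max_map_count" 65530 262144

# On a live Linux host the current values could be read like this (commented out):
# meets_minimum "max file descriptors" "$(ulimit -n)" 65536
# meets_minimum "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144
```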

Edit this file to raise the maximum number of file descriptors

sudo vi /etc/security/limits.conf

hadoop    hard    nofile    65536
hadoop    soft    nofile    65536
hadoop    hard    nproc     65536
hadoop    soft    nproc     65536

Set the number of virtual memory areas a single process may own

sudo vi /etc/rc.local

echo 1048575 > /proc/sys/vm/max_map_count

Set the maximum number of files that can be open system-wide

sudo sysctl -w fs.file-max=262114

Reboot after completing the settings

sudo reboot 0