Elasticsearch

[Elasticsearch] Installing the Korean Morphological Analyzer 은전한닢 (Eunjeon)

Jack Moon 2015. 8. 21. 18:57

Installing mecab-ko
$ cd /usr/local
$ wget https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz
$ tar zxvf mecab-0.996-ko-0.9.2.tar.gz
$ cd mecab-0.996-ko-0.9.2
$ ./configure
$ make
$ make check
$ make install
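
A quick sanity check after the build can save time later. The sketch below only confirms that a mecab binary is reachable on PATH and prints its version; have_cmd is a hypothetical helper name, not part of mecab-ko.

```shell
# Sketch: report whether a command is on PATH (have_cmd is a hypothetical helper).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd mecab; then
    mecab --version    # prints the installed MeCab version string
else
    echo "mecab not found on PATH; check that 'make install' succeeded" >&2
fi
```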


If the build fails on CentOS 5.9 or a similar release, upgrade gcc and then install:
# yum install gcc44 gcc44-c++
$ ./configure CXX=g++44
$ make
# make install


Installing mecab-ko-dic
$ cd /usr/local
$ wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.0-20150517.tar.gz
$ tar zxvf mecab-ko-dic-2.0.0-20150517.tar.gz
$ cd mecab-ko-dic-2.0.0-20150517
$ ./configure
$ make
$ make install


Installing libMeCab.so
$ cd /usr/local
$ wget https://mecab.googlecode.com/files/mecab-java-0.996.tar.gz
$ tar zxvf mecab-java-0.996.tar.gz
$ cd mecab-java-0.996
$ vi Makefile
    # Set the Java include path.            ; INCLUDE=/usr/local/jdk1.6.0_41/include
    # When using OpenJDK, change to "-O1".  ; $(CXX) -O1 -c -fpic $(TARGET)_wrap.cxx  $(INC)
    # Add "-cp .".                          ; $(JAVAC) -cp . test.java
$ make
$ cp libMeCab.so /usr/local/lib
$ ldconfig

 

$ vi ~/.bash_profile
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LD_LIBRARY_PATH

 

$ source ~/.bash_profile
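
The profile line above appends unconditionally, so if LD_LIBRARY_PATH starts out empty the result begins with an empty entry, which the dynamic linker treats as the current directory, and sourcing the profile repeatedly adds duplicates. A minimal sketch of a more defensive append follows; path_append is a hypothetical helper name.

```shell
# Append a directory to a colon-separated path list only if it is not already present.
# path_append is a hypothetical helper; $1 = current list, $2 = directory to add.
path_append() {
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;        # already present: unchanged
        "::")       printf '%s\n' "$dir" ;;         # empty list: just the directory
        *)          printf '%s\n' "$list:$dir" ;;   # otherwise append
    esac
}

LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" /usr/local/lib)
export LD_LIBRARY_PATH
```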


Test
$ mecab -d /usr/local/lib/mecab/dic/mecab-ko-dic
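
The command above starts mecab reading from standard input. For a non-interactive check, a sentence can be piped in instead. The wrapper below is only a sketch: analyze_ko is a hypothetical name, and the default dictionary path is the one used in this post.

```shell
# Analyze one sentence with mecab; analyze_ko is a hypothetical wrapper.
# $1 = sentence, $2 = dictionary directory (defaults to the path used in this post).
analyze_ko() {
    dic=${2:-/usr/local/lib/mecab/dic/mecab-ko-dic}
    if [ ! -d "$dic" ]; then
        echo "dictionary directory not found: $dic" >&2
        return 1
    fi
    printf '%s\n' "$1" | mecab -d "$dic"
}
```

Example:
$ analyze_ko '동해물과 백두산이 마르고 닳도록'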


Installing the Elasticsearch plugin
$ cd /usr/local/elasticsearch-1.7.1
$ ./bin/plugin --install analysis-mecab-ko-0.17.0 --url https://bitbucket.org/eunjeon/mecab-ko-lucene-analyzer/downloads/elasticsearch-analysis-mecab-ko-0.17.0.zip
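
To confirm the plugin landed where expected, one option is to look for its directory. This sketch assumes the 1.x plugin script unpacks into plugins/<name> under the Elasticsearch home, using the name passed to --install; plugin_installed is a hypothetical helper.

```shell
# Check whether a plugin directory exists under an Elasticsearch home.
# plugin_installed is a hypothetical helper; $1 = ES home, $2 = plugin name.
# Assumption: the plugin was unpacked into <ES home>/plugins/<name>.
plugin_installed() {
    [ -d "$1/plugins/$2" ]
}
```

Example:
$ plugin_installed /usr/local/elasticsearch-1.7.1 analysis-mecab-ko-0.17.0 && echo installed

The plugin script in the 1.x series should also accept --list, which prints the installed plugins.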

 

 

Starting Elasticsearch

$ /usr/local/elasticsearch-1.7.1/bin/elasticsearch -Djava.library.path=/usr/local/lib -d
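
Because -d sends the process to the background, the HTTP port may not answer right away. A small polling loop, sketched below, waits until the node responds before the next curl calls are made; wait_for_http is a hypothetical helper, and the one-second interval and 30-attempt default are arbitrary choices.

```shell
# Poll a URL until it answers or the attempts run out.
# wait_for_http is a hypothetical helper; $1 = URL, $2 = max attempts (default 30).
wait_for_http() {
    url=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # curl exits non-zero when the connection fails or times out
        if curl -s -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

Example:
$ wait_for_http http://localhost:9200/ && echo "Elasticsearch is up"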

 

 

Creating a test index

$ curl -XPUT 'http://localhost:9200/book3' -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "korean_analyzer" : {
            "type":"custom",
            "tokenizer":"mecab_ko_standard_tokenizer"
        }
      }
    }
  }
}'

 

 

Final test

$ curl -XPOST 'http://localhost:9200/book3/_analyze?analyzer=korean_analyzer&pretty' -d '동해물과 백두산이 마르고 닳도록'
{
  "tokens" : [ {
    "token" : "동해",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "NNP",
    "position" : 1
  }, {
    "token" : "물과",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "EOJEOL",
    "position" : 2
  }, {
    "token" : "물",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "NNG",
    "position" : 2
  }, {
    "token" : "백두",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "NNG",
    "position" : 3
  }, {
    "token" : "백두산이",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "EOJEOL",
    "position" : 3
  }, {
    "token" : "백두산",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "COMPOUND",
    "position" : 3
  }, {
    "token" : "산",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "NNG",
    "position" : 4
  }, {
    "token" : "마르고",
    "start_offset" : 10,
    "end_offset" : 13,
    "type" : "EOJEOL",
    "position" : 5
  }, {
    "token" : "마르/VV",
    "start_offset" : 10,
    "end_offset" : 12,
    "type" : "VV",
    "position" : 5
  }, {
    "token" : "닳도록",
    "start_offset" : 14,
    "end_offset" : 17,
    "type" : "EOJEOL",
    "position" : 6
  }, {
    "token" : "닳/VV",
    "start_offset" : 14,
    "end_offset" : 15,
    "type" : "VV",
    "position" : 6
  } ]
}
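
The response is long, so for a quick look it can help to print only the token strings. The filter below is a rough sketch that relies on the pretty-printed layout shown above (one "token" entry per line); it is not a general JSON parser, and extract_tokens is a hypothetical name.

```shell
# Print one token per line from a pretty-printed _analyze response on stdin.
# extract_tokens is a hypothetical helper; it matches lines of the form
#     "token" : "VALUE",
extract_tokens() {
    sed -n 's/^[[:space:]]*"token"[[:space:]]*:[[:space:]]*"\(.*\)",*[[:space:]]*$/\1/p'
}
```

Example:
$ curl -s -XPOST 'http://localhost:9200/book3/_analyze?analyzer=korean_analyzer&pretty' -d '동해물과 백두산이 마르고 닳도록' | extract_tokens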

 

<Miscellaneous>

Adding a user dictionary: https://bitbucket.org/eunjeon/mecab-ko-dic/src/081f29d23688f16da245ee89109853173ca5e25a/final/user-dic/README.md