SoFunction
Updated on 2024-10-29

How to Install the Elasticsearch Search Tool and Configure the Python Driver

Elasticsearch is a Lucene-based search server. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, released as open source under the Apache License, and is the second most popular enterprise search engine. It is designed for cloud computing and offers real-time search; it is stable, reliable, fast, and easy to install and use.
Suppose we build a website or an application and want to add search. We quickly discover that search is hard. We want our search solution to be fast; we want zero configuration and a completely free search model; we want to index data simply by sending JSON over HTTP; we want our search servers to be always available; we want to start with one server and scale to hundreds; we want real-time search and simple multi-tenancy; and we want to build a cloud-based solution. Elasticsearch is designed to solve all of these problems and more.
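To make "index data by sending JSON over HTTP" concrete, here is a minimal Python sketch that builds, but does not send, the HTTP request that would store one JSON document. It assumes a node on localhost:9200 and the 1.x-era /index/type/id URL layout; the index "gene", type "gene_info", ID "1", and the sample document are illustrative assumptions.

```python
import json
from urllib.request import Request


def build_index_request(doc, index="gene", doc_type="gene_info", doc_id="1",
                        base_url="http://localhost:9200"):
    """Build (but do not send) a PUT request that stores `doc` as JSON.

    The /index/type/id layout is the 1.x-era convention; index, type and
    id values here are hypothetical examples.
    """
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


# Hypothetical sample document; sending it would be urllib.request.urlopen(req).
req = build_index_request({"Symbol": "A1BG", "tax_id": "9606"})
print(req.get_method(), req.full_url)
```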
Elasticsearch is a newcomer among open source search platforms and a powerful tool for real-time data analysis. It is developing rapidly and is based on Lucene: RESTful, distributed, designed for the cloud, with real-time and full-text search; it is stable, highly reliable, scalable, and easy to install and use. The official introduction sounds very good, but the only way to find out whether it is actually good to use is to take it for a spin.
I ran a simple test on two identical virtual machines with about 20 million records: Elasticsearch inserted data much more slowly than MongoDB (though tolerably so), but searches and queries were about 10 times faster. That was on a single machine; in a multi-machine cluster, Elasticsearch performs even better. The installation steps below were performed on Ubuntu Server 14.04 LTS.
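If you want to repeat a comparison like this, one simple approach is to run the same query several times and take the median wall-clock time. The sketch below is a generic timing helper, not the method behind the numbers above; run_query is a hypothetical callable that you would point at a real Elasticsearch or MongoDB query.

```python
import statistics
import time


def median_latency(run_query, repeats=5):
    """Call run_query() `repeats` times and return the median wall time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; replace with e.g. lambda: es.search(index="gene", body=...)
print(median_latency(lambda: sum(range(10000)), repeats=3))
```

The median is used rather than the mean so that a single slow first run (cold caches) does not dominate the result.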

Installing Elasticsearch
After upgrading your system, install Oracle Java 7. Since Elasticsearch officially recommends Oracle JDK 7, don't try JDK 8 or OpenJDK:

$ sudo apt-get update
$ sudo apt-get upgrade
 
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
 
$ sudo apt-get install oracle-java7-installer
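Because the Java version matters here, it can help to check which JVM is actually installed. A minimal sketch follows; it parses the text that java -version prints (on stderr), handling both the old "1.7.0_55" numbering and the newer "11.0.2" numbering. The helper names are hypothetical.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first, second = match.group(1), match.group(2)
    # Old scheme: "1.7.0_55" means Java 7; new scheme: "11.0.2" means Java 11.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and parse it; the version text goes to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

After the installer above finishes, installed_java_major() would be expected to return 7.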

Add the official Elasticsearch repository, then install elasticsearch:

$ wget -O - /GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb /elasticsearch/1.1/debian stable main" | sudo tee -a /etc/apt/
 
$ sudo apt-get update
$ sudo apt-get install elasticsearch

Register elasticsearch to start at boot, start the service, and use curl to check that the installation succeeded:

$ sudo  elasticsearch defaults 95 1
 
$ sudo /etc//elasticsearch start
 
$ curl -X GET 'http://localhost:9200'
{
 "status" : 200,
 "name" : "Fer-de-Lance",
 "version" : {
  "number" : "1.1.1",
  "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
  "build_timestamp" : "2014-04-16T14:27:12Z",
  "build_snapshot" : false,
  "lucene_version" : "4.7"
 },
 "tagline" : "You Know, for Search"
}
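The same check can be scripted. This sketch, which assumes the node listens on localhost:9200, fetches the root endpoint and pulls out the node name and version fields shown in the response above; summarize_root and fetch_root are hypothetical helper names, and the parser is exercised here on a trimmed copy of that response.

```python
import json
from urllib.request import urlopen


def summarize_root(response_text):
    """Pull the node name and version numbers out of the GET / response."""
    info = json.loads(response_text)
    version = info.get("version", {})
    return {
        "name": info.get("name"),
        "es_version": version.get("number"),
        "lucene_version": version.get("lucene_version"),
    }


def fetch_root(base_url="http://localhost:9200"):
    """Python equivalent of: curl -X GET 'http://localhost:9200'"""
    with urlopen(base_url) as resp:
        return summarize_root(resp.read().decode("utf-8"))


sample = ('{"status": 200, "name": "Fer-de-Lance", '
          '"version": {"number": "1.1.1", "lucene_version": "4.7"}, '
          '"tagline": "You Know, for Search"}')
print(summarize_root(sample))
```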

Marvel, Elasticsearch's cluster and data management interface, is very impressive, but unfortunately it is free only for development use; if it were free for production as well, it would be unbeatable. Installation is simple: after installing the plugin, restart the service and visit http://192.168.2.172:9200/_plugin/marvel/ to see the interface:

$ sudo /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest
 
$ sudo /etc//elasticsearch restart
 * Stopping Elasticsearch Server                      [ OK ]
 * Starting Elasticsearch Server                      [ OK ]
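Site plugins such as Marvel are served under /_plugin/<name>/ on the HTTP port, as the URL above shows. Here is a small sketch for building such addresses and checking that the page responds; the helper names are hypothetical, and the reachability check simply treats an HTTP 200 as success.

```python
from urllib.request import urlopen


def plugin_url(host, plugin, port=9200):
    """Build the address of a site plugin served under /_plugin/<name>/."""
    return "http://{}:{}/_plugin/{}/".format(host, port, plugin)


def plugin_is_reachable(url, timeout=5):
    """Return True if the plugin page answers with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


print(plugin_url("192.168.2.172", "marvel"))
```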


Installing the Python Client Driver
As with MongoDB, we usually interact with Elasticsearch from programs. Elasticsearch provides client drivers for many languages; here we install only the Python driver. See the official documentation for other languages.

$ sudo apt-get install python-pip
$ sudo pip install elasticsearch
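A minimal connection sketch, assuming a node on the default localhost:9200. It uses the client's info() call, which returns the same JSON document as the curl test earlier; connect and server_version are hypothetical names. The import sits inside connect so the helpers can be loaded even where the package is missing. Newer versions of the client require an explicit URL such as "http://localhost:9200" instead of no arguments.

```python
def connect(hosts=None):
    """Create a client; the older client defaults to localhost:9200 with no arguments."""
    from elasticsearch import Elasticsearch  # installed by `pip install elasticsearch`
    return Elasticsearch(hosts) if hosts else Elasticsearch()


def server_version(es):
    """Return the server version string reported by es.info()."""
    # es.info() returns the same document as `curl http://localhost:9200`.
    return es.info()["version"]["number"]
```

Usage against a running node would be: print(server_version(connect())).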

Write a simple program to import gene_info.txt data into Elasticsearch:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
 
import csv
from elasticsearch import Elasticsearch
 
# Column names of gene_info.txt, in file order (one document field per column).
FIELDS = [
  'tax_id', 'GeneID', 'Symbol', 'LocusTag', 'Synonyms', 'dbXrefs',
  'chromosome', 'map_location', 'description', 'type_of_gene',
  'Symbol_from_nomenclature_authority',
  'Full_name_from_nomenclature_authority',
  'Nomenclature_status', 'Other_designations', 'Modification_date',
]
 
def import_to_db():
  es = Elasticsearch()  # connects to localhost:9200 by default
  # gene_info.txt is tab-separated.
  with open('gene_info.txt') as f:
    data = csv.reader(f, delimiter='\t')
    next(data)  # skip the header line
    for row in data:
      # Map the first 15 columns to the field names above.
      doc = dict(zip(FIELDS, row))
      # Index each row as one document of type gene_info in index "gene".
      es.index(index="gene", doc_type='gene_info', body=doc)
 
def main():
  import_to_db()
 
if __name__ == "__main__":
  main()
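Indexing one document per request, as the script does, is simple but slow for millions of rows. The Python client also ships a bulk helper, elasticsearch.helpers.bulk, which sends many documents per request. The sketch below is an alternative under the same assumptions as the script (index "gene", type "gene_info"); gene_actions and bulk_import are hypothetical names, and the _type field only applies to older Elasticsearch versions that still have mapping types.

```python
def gene_actions(rows, fields, index="gene", doc_type="gene_info"):
    """Turn CSV rows into action dicts for elasticsearch.helpers.bulk."""
    for row in rows:
        yield {
            "_index": index,
            "_type": doc_type,  # mapping types exist only in older Elasticsearch versions
            "_source": dict(zip(fields, row)),
        }


def bulk_import(es, rows, fields):
    """Index all rows in batches; returns the number of successful actions."""
    from elasticsearch import helpers
    # helpers.bulk groups actions into chunks (500 by default) per request.
    success, _errors = helpers.bulk(es, gene_actions(rows, fields))
    return success
```

In the script above, the for loop could be replaced by bulk_import(es, data, FIELDS), since data is already an iterator of rows.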

Kibana is a powerful data visualization client that integrates with Elasticsearch as a plugin. Installation is easy: download and unpack it, then restart the Elasticsearch service and visit http://192.168.2.172:9200/_plugin/kibana/ to see the interface:

$ wget /kibana/kibana/kibana-3.0.
$ tar zxvf kibana-3.0.
$ sudo mv kibana-3.0.1 /usr/share/elasticsearch/plugins/_site
$ sudo /etc//elasticsearch restart
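Kibana 3 runs in the browser and queries Elasticsearch's search API directly. The same kind of query can be issued from Python; this sketch runs a match query on the Symbol field created by the import script. symbol_query and search_symbol are hypothetical names, and the body= style of es.search matches the older client used in this setup.

```python
def symbol_query(symbol, size=10):
    """Build a match query on the Symbol field (field name from the import script)."""
    return {"query": {"match": {"Symbol": symbol}}, "size": size}


def search_symbol(es, symbol, index="gene"):
    """Run the query and return the matching documents' _source dicts."""
    result = es.search(index=index, body=symbol_query(symbol))
    return [hit["_source"] for hit in result["hits"]["hits"]]
```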
