Elasticsearch Quick Start

// Check disk usage and shard allocation on each node
GET _cat/allocation?v

// List all indices
GET _cat/indices

// Sort indices by document count (descending); the second request sorts by index name
GET _cat/indices?s=docs.count:desc
GET _cat/indices?v&s=index

// List the nodes in the cluster
GET _cat/nodes

// Cluster health
GET _cluster/health?pretty=true
GET _cat/indices/*?v&s=index

// Get the shards that a search against the given index would run on
GET logs/_search_shards

...

Cluster health

curl -s -XGET 'http://<host>:9200/_cluster/health?pretty'

// Response when the cluster is healthy
{
  "cluster_name" : "es-qwerty",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
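
When the health check is scripted rather than read by eye, the response can be reduced to a single verdict. The following is a minimal sketch, not part of the original notes: the function name `summarize_health` and the verdict labels are made up for illustration, and only fields that appear in the response above are read.

```python
# Maps the "status" field of a _cluster/health response to a verdict.
# green: all shards assigned; yellow: some replicas unassigned;
# red: at least one primary shard unassigned.
VERDICTS = {"green": "OK", "yellow": "WARN", "red": "CRITICAL"}


def summarize_health(health: dict) -> str:
    """Return a one-line summary of a parsed _cluster/health response."""
    status = health["status"]
    verdict = VERDICTS.get(status, "UNKNOWN")
    return (
        f"{verdict}: status={status}, "
        f"unassigned_shards={health['unassigned_shards']}, "
        f"active={health['active_shards_percent_as_number']:.1f}%"
    )


# Subset of the healthy response shown above.
healthy = {
    "status": "green",
    "unassigned_shards": 0,
    "active_shards_percent_as_number": 100.0,
}
print(summarize_health(healthy))
```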

Searching documents

POST logs/_search
{
  "query":{
    "range":{
      "createdAt":{
        "gt":"2020-04-25",
        "lt":"2020-04-27",
        "format": "yyyy-MM-dd"
      }
    }
  },
  "size":0,
  "aggs":{
    "url_type_stats":{
      "terms": {
        "field": "urlType.keyword",
        "size": 2
      }
    }
  }
}

POST logs/_search
{
  "query":{
    "range":{
      "createdAt":{
        "gte":"2020-04-26 00:00:00",
        "lte":"now",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "size":0,
  "aggs":{
    "url_type_stats":{
      "terms": {
        "field": "urlType.keyword",
        "size": 2
      }
    }
  }
}
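
The two requests above share one shape: a date range filter, `size` 0, and a `terms` aggregation. When such bodies are built in application code, a small helper keeps them consistent. This is a sketch under assumptions: the helper name `terms_in_range` and its parameters are hypothetical, and it uses a half-open range (`gte` / `lt`), which differs slightly from the `gt` / `lt` and `gte` / `lte` pairs used above.

```python
import json


def terms_in_range(date_field: str, start: str, end: str, terms_field: str,
                   agg_name: str = "url_type_stats", size: int = 2,
                   date_format: str = "yyyy-MM-dd") -> dict:
    """Build a _search body: filter by a half-open date range, then count terms."""
    return {
        "query": {
            "range": {
                date_field: {"gte": start, "lt": end, "format": date_format}
            }
        },
        "size": 0,  # return aggregation results only, no hits
        "aggs": {agg_name: {"terms": {"field": terms_field, "size": size}}},
    }


body = terms_in_range("createdAt", "2020-04-25", "2020-04-27", "urlType.keyword")
print(json.dumps(body, indent=2))  # body for: POST logs/_search
```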

POST logs/_search
{
  "query":{
    "range": {
      "createdAt": {
        "gte": "2020-04-26 00:00:00",
        "lte": "now",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "size" : 0,
  "aggs":{
    "total_clientIp":{
      "cardinality":{
        "field": "clientIp.keyword"
      }
    },
    "total_userAgent":{
      "cardinality": {
        "field": "userAgent.keyword"
      }
    }
  }
}

POST logs/_search
{
  "size" : 0,
  "aggs":{
    "date_total_ClientIp":{
      "date_histogram":{
        "field": "createdAt",
        "interval": "quarter",
        "format": "yyyy-MM-dd HH:mm:ss",
        "extended_bounds":{
          "min": "2020-04-26 13:00:00",
          "max": "2020-04-26 14:00:00"
        }
      },
      "aggs":{
        "url_type_api": {
          "terms": {
            "field": "urlType.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
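
A `date_histogram` with a nested `terms` aggregation returns buckets inside buckets, which is awkward to feed into a report. The sketch below flattens such a response into rows. The function name `flatten_histogram` is hypothetical, and the sample response is invented for illustration (the keys `api` and `page` and all counts are made up); only the bucket layout (`buckets`, `key_as_string`, `key`, `doc_count`) follows the Elasticsearch response format.

```python
def flatten_histogram(response: dict, hist_name: str, terms_name: str) -> list:
    """Turn nested date_histogram/terms buckets into (date, term, count) rows."""
    rows = []
    for bucket in response["aggregations"][hist_name]["buckets"]:
        for sub in bucket[terms_name]["buckets"]:
            rows.append((bucket["key_as_string"], sub["key"], sub["doc_count"]))
    return rows


# Invented sample response, trimmed to the fields the function reads.
sample = {
    "aggregations": {
        "date_total_ClientIp": {
            "buckets": [
                {
                    "key_as_string": "2020-04-01",
                    "doc_count": 5,
                    "url_type_api": {
                        "buckets": [
                            {"key": "api", "doc_count": 3},
                            {"key": "page", "doc_count": 2},
                        ]
                    },
                }
            ]
        }
    }
}

for row in flatten_histogram(sample, "date_total_ClientIp", "url_type_api"):
    print(row)
```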

POST logs/_search
{
  "size" : 0,
  "aggs":{
    "total_clientIp":{
      "terms":{
        "size":30,
        "field": "clientIp.keyword"
      }
    }
  }
}

删除文档

// Delete all documents in the index
POST logs/_delete_by_query
{"query":{"match_all": {}}}

// Delete the index
DELETE logs

Creating an index

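Data migration is essentially an index rebuild. Reindex does not try to set up the destination index, and it does not copy the settings of the source index. Set up the destination index before running it, including its mappings, number of shards, and number of replicas.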
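
Since the destination index has to exist with the right settings before any data is copied, it helps to generate its definition explicitly. The sketch below builds the body of a create-index request. Everything specific in it is an assumption: the helper name `index_definition`, the shard and replica counts, and the field types, which are guessed from the queries in these notes (`createdAt` as a date, `urlType` and `clientIp` as text with a `keyword` subfield). Versions 5.x and 6.x nest `properties` under a mapping type name, so the helper takes an optional `doc_type`.

```python
import json


def index_definition(shards: int, replicas: int, properties: dict,
                     doc_type: str = None) -> dict:
    """Build the body of a create-index request (PUT <index>)."""
    mappings = {"properties": properties}
    if doc_type is not None:
        # 5.x/6.x expect the mapping nested under a mapping type name.
        mappings = {doc_type: mappings}
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": mappings,
    }


# Field types are assumptions inferred from the queries in these notes.
text_with_keyword = {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
properties = {
    "createdAt": {"type": "date"},
    "urlType": text_with_keyword,
    "clientIp": text_with_keyword,
}

body = index_definition(shards=3, replicas=1, properties=properties)
print(json.dumps(body, indent=2))  # send as the body of: PUT dest
```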

Data migration

Reindex from Remote

// Reindex can also rebuild an index from a remote Elasticsearch cluster:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

// The host parameter must contain a scheme, host, and port (for example https://otherhost:9200)
// The username and password parameters are optional

Before using this, set the reindex.remote.whitelist property in elasticsearch.yml on the cluster that runs the reindex. Several entries can be listed (for example, otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*).

For details, see Reindex from Remote in the Elasticsearch documentation.
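
The requirement that the remote host carries a scheme, host, and port is easy to get wrong when the request body is built from configuration. The sketch below validates the host before building the `_reindex` body. The function name `remote_reindex_body` and the host `source-cluster` are placeholders invented for this example; the body layout follows the request shown above.

```python
from urllib.parse import urlsplit


def remote_reindex_body(host: str, source_index: str, dest_index: str,
                        username: str = None, password: str = None,
                        query: dict = None) -> dict:
    """Build a _reindex body that pulls documents from a remote cluster."""
    parts = urlsplit(host)
    if parts.scheme not in ("http", "https") or not parts.hostname or parts.port is None:
        raise ValueError(f"host must include scheme, host and port: {host!r}")

    remote = {"host": host}
    if username is not None:  # credentials are optional
        remote["username"] = username
        remote["password"] = password

    source = {"remote": remote, "index": source_index}
    if query is not None:  # optional filter on the source documents
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


body = remote_reindex_body(
    "http://source-cluster:9200", "source", "dest",
    username="user", password="pass",
    query={"match": {"test": "data"}},
)
print(body)  # body for: POST _reindex
```

A host such as `http://source-cluster` (no port) or `source-cluster:9200` (no scheme) raises `ValueError` instead of producing a request that the cluster would reject.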

Elasticsearch-Dump

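Elasticsearch-Dump is an open-source tool for importing and exporting Elasticsearch data. Installation and migration can be run from a cloud host in the same availability zone as the cluster, which makes it easy to use.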

It requires a Node.js environment; install elasticdump with npm.

npm install elasticdump -g
elasticdump

// Copy an index from production to staging with analyzer and mapping:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data
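
The three commands above differ only in `--type`, and they run in a fixed order: analyzer, then mapping, then data. When the copy is driven from a script, the argument lists can be generated instead of typed three times. This is a sketch only: the helper name `elasticdump_commands` is hypothetical, it uses just the `--input`, `--output`, and `--type` flags shown above, and it prints the commands rather than running them.

```python
import shlex


def elasticdump_commands(src: str, dst: str,
                         types=("analyzer", "mapping", "data")) -> list:
    """Return one elasticdump argument list per --type, in the order given."""
    return [
        ["elasticdump", f"--input={src}", f"--output={dst}", f"--type={t}"]
        for t in types
    ]


commands = elasticdump_commands(
    "http://production.es.com:9200/my_index",
    "http://staging.es.com:9200/my_index",
)
for argv in commands:
    print(shlex.join(argv))
```

To execute the copy, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence before the data is copied.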

// Copy the data of a single shard:
elasticdump \
  --input=http://es.com:9200/api \
  --output=http://es.com:9200/api2 \
  --params='{"preference" : "_shards:0"}'

For the other elasticdump command-line parameters, see Elasticdump Options.

Deep pagination

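Elasticsearch rejects paginated queries that reach past 10,000 hits (from + size larger than index.max_result_window, which defaults to 10000). The officially supported way to page deeper is search_after.
search_after takes the sort values of the last hit on the previous page. The requests below sort on two keys (createdAt, with _id as a tiebreaker), so two values are required.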

//https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-after.html
GET logs/_search
{
  "from":9990,
  "size":10,
  "_source": ["url","clientIp","createdAt"],
  "query":{
    "match_all": {}
  },
  "sort":[
    {
      "createdAt":{
        "order":"desc"
      }
    },
    {
      "_id":{
        "order":"desc"
      }
    }
  ]
}

GET logs/_search
{
  "from":0,
  "size":10,
  "_source": ["url","clientIp","createdAt"],
  "query":{
    "match_all": {}
  },
  "search_after": [1588042597000, "V363vnEBz1D1HVfYBb0V"],
  "sort":[
    {
      "createdAt":{
        "order":"desc"
      }
    },
    {
      "_id":{
        "order":"desc"
      }
    }
  ]
}
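
In application code, the second request is repeated in a loop: each page's last `sort` values become the next request's `search_after`. The sketch below shows that loop. It assumes a caller-supplied `search` function that sends a body to `logs/_search` and returns the parsed JSON response; that function, the name `scan_with_search_after`, and the in-memory `fake_search` with its `DOCS` are all stand-ins written for this example, not part of any client library.

```python
from typing import Callable, Iterator


def scan_with_search_after(search: Callable[[dict], dict],
                           base_body: dict) -> Iterator[dict]:
    """Yield every hit by feeding each page's last sort values into search_after."""
    # Drop any offset: search_after replaces from-based paging.
    body = {k: v for k, v in base_body.items() if k != "from"}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        body = {**body, "search_after": hits[-1]["sort"]}


# In-memory stand-in for the cluster, only here to make the sketch runnable.
DOCS = [{"_id": f"doc{i}", "createdAt": 1588042597000 + i // 2} for i in range(7)]


def fake_search(body: dict) -> dict:
    """Mimic a search sorted by createdAt desc, then _id desc."""
    ordered = sorted(DOCS, key=lambda d: (d["createdAt"], d["_id"]), reverse=True)
    after = body.get("search_after")
    if after is not None:
        # Keep only documents that sort strictly after the cursor.
        ordered = [d for d in ordered if [d["createdAt"], d["_id"]] < after]
    page = ordered[: body["size"]]
    return {"hits": {"hits": [
        {"_id": d["_id"], "sort": [d["createdAt"], d["_id"]]} for d in page
    ]}}


body = {
    "size": 3,
    "sort": [{"createdAt": {"order": "desc"}}, {"_id": {"order": "desc"}}],
}
ids = [hit["_id"] for hit in scan_with_search_after(fake_search, body)]
print(ids)
```

The tiebreaker on `_id` matters here: several documents share a `createdAt` value, and without a second sort key the cursor could skip or repeat hits at a page boundary.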