Elasticsearch monitor reference
Published: 2019-06-19



https://www.elastic.co/guide/en/elasticsearch/guide/2.x/heap-sizing.html


https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_jvm_section

1.

"jvm": {
    "timestamp": 1408556438203,
    "uptime_in_millis": 14457,
    "mem": {
        "heap_used_in_bytes": 457252160,
        "heap_used_percent": 44,
        "heap_committed_in_bytes": 1038876672,
        "heap_max_in_bytes": 1038876672,
        "non_heap_used_in_bytes": 38680680,
        "non_heap_committed_in_bytes": 38993920,
  • The jvm section first lists some general stats about heap memory usage: how much of the heap is in use, how much is committed (actually allocated to the process), and the maximum size the heap is allowed to grow to. Ideally, heap_committed_in_bytes should be identical to heap_max_in_bytes. If the committed size is smaller, the JVM will eventually have to resize the heap, and this is a very expensive operation. If your numbers are not identical, see Heap: Sizing and Swapping (https://www.elastic.co/guide/en/elasticsearch/guide/2.x/heap-sizing.html) for how to configure the heap correctly.

    The heap_used_percent metric is a useful number to keep an eye on. Elasticsearch is configured to initiate GCs when the heap reaches 75% full. If your node is consistently at or above 75%, it is experiencing memory pressure, a warning sign that slow GCs may be in your near future.

    If heap usage is consistently at or above 85%, you are in trouble. Heaps above 90–95% risk severe performance problems: long 10–30 s GCs at best, and out-of-memory (OOM) exceptions at worst.
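The thresholds above translate directly into a check. The following is a minimal Python sketch, assuming `mem` is the `jvm.mem` object from the node stats response shown earlier; `check_heap` and its warning texts are illustrative, not part of Elasticsearch. A single sample cannot tell you whether usage is *consistently* high, so in practice you would run it over a series of samples.

```python
def check_heap(mem: dict) -> list:
    """Return warnings for ONE sample of a node's jvm.mem stats.

    Thresholds (75 / 85 / 90 %) follow the guidance in the text above.
    """
    warnings = []
    # A committed heap smaller than the max means the JVM may resize it later (expensive).
    if mem["heap_committed_in_bytes"] != mem["heap_max_in_bytes"]:
        warnings.append("committed heap differs from max heap: the JVM may need to resize it")
    used = mem["heap_used_percent"]
    if used >= 90:
        warnings.append(f"heap at {used}%: risk of 10-30 s GCs or OOM")
    elif used >= 85:
        warnings.append(f"heap at {used}%: in trouble")
    elif used >= 75:
        warnings.append(f"heap at {used}%: memory pressure, slow GCs likely")
    return warnings


# Values taken from the sample jvm.mem output above.
mem = {"heap_used_percent": 44,
       "heap_committed_in_bytes": 1038876672,
       "heap_max_in_bytes": 1038876672}
print(check_heap(mem))  # → []
```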

2.

"gc": {
    "collectors": {
        "young": {
            "collection_count": 13,
            "collection_time_in_millis": 923
        },
        "old": {
            "collection_count": 0,
            "collection_time_in_millis": 0
        }
    }
}

Young-generation collections are frequent on a healthy node, so a high young count is normal. In contrast, the old generation collection count should remain small and have a small collection_time_in_millis. These are cumulative counts (cumulative since when? Not since the cluster started, but since that node's JVM started, so they reset when the node restarts), which makes it hard to give an exact number at which you should start worrying: a node with a one-year uptime will have a large count even if it is healthy. This is one of the reasons tools such as Marvel are so helpful. What matters is how the GC counts change over time.
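Because the counters are cumulative, the useful signal is the difference between two samples. A minimal Python sketch, assuming `prev` and `curr` are two snapshots of the `jvm.gc.collectors` object taken `elapsed_s` seconds apart; `gc_rates` is a hypothetical helper, and the `curr` values below are made up for illustration.

```python
def gc_rates(prev: dict, curr: dict, elapsed_s: float) -> dict:
    """Per-collector GC activity per minute between two jvm.gc.collectors samples."""
    per_min = 60.0 / elapsed_s
    rates = {}
    for name, now in curr.items():
        before = prev.get(name, {"collection_count": 0, "collection_time_in_millis": 0})
        d_count = now["collection_count"] - before["collection_count"]
        d_time = now["collection_time_in_millis"] - before["collection_time_in_millis"]
        # The counters reset when the node's JVM restarts; a negative delta means a
        # restart happened, so fall back to the counts accumulated since that restart.
        if d_count < 0 or d_time < 0:
            d_count = now["collection_count"]
            d_time = now["collection_time_in_millis"]
        rates[name] = {"collections_per_min": d_count * per_min,
                       "gc_ms_per_min": d_time * per_min}
    return rates


prev = {"young": {"collection_count": 13, "collection_time_in_millis": 923},
        "old": {"collection_count": 0, "collection_time_in_millis": 0}}
curr = {"young": {"collection_count": 25, "collection_time_in_millis": 1800},
        "old": {"collection_count": 1, "collection_time_in_millis": 150}}
print(gc_rates(prev, curr, elapsed_s=60))
```

Plotting these per-minute rates over days shows a trend that the raw one-year totals hide.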

The time spent in GC is also important. For example, indexing documents generates a certain amount of garbage. This is normal and causes a GC every now and then. These GCs are almost always fast and have little effect on the node: a young-generation collection takes a millisecond or two, and an old-generation collection a few hundred milliseconds. That is very different from 10-second GCs.
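Dividing the change in collection_time_in_millis by the change in collection_count gives the average duration of a single collection in that window, which is what separates a healthy few-millisecond young GC from a 10-second one. A Python sketch; the `SLOW_MS` thresholds are hypothetical values chosen loosely from the figures above, not Elasticsearch defaults.

```python
from typing import Optional

# Hypothetical alert thresholds in ms per collection: young GCs of a millisecond
# or two and old GCs of a few hundred ms are normal; multi-second pauses are not.
SLOW_MS = {"young": 100.0, "old": 1000.0}


def avg_pause_ms(prev: dict, curr: dict) -> Optional[float]:
    """Average duration of one collection between two samples of the same collector."""
    d_count = curr["collection_count"] - prev["collection_count"]
    d_time = curr["collection_time_in_millis"] - prev["collection_time_in_millis"]
    if d_count <= 0:
        return None  # no collections in this window (or the node restarted)
    return d_time / d_count


def slow_collectors(prev: dict, curr: dict) -> list:
    """Return (collector, average ms) pairs that exceed their SLOW_MS threshold."""
    slow = []
    for name, threshold in SLOW_MS.items():
        if name in prev and name in curr:
            avg = avg_pause_ms(prev[name], curr[name])
            if avg is not None and avg > threshold:
                slow.append((name, avg))
    return slow


prev = {"young": {"collection_count": 13, "collection_time_in_millis": 923},
        "old": {"collection_count": 0, "collection_time_in_millis": 0}}
curr = {"young": {"collection_count": 23, "collection_time_in_millis": 943},
        "old": {"collection_count": 2, "collection_time_in_millis": 20000}}
print(slow_collectors(prev, curr))  # → [('old', 10000.0)]
```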

Our best advice is to collect collection counts and durations periodically (or use Marvel) and watch for frequent GCs. You can also enable slow-GC logging.
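Periodic collection can be as simple as polling the node stats API. A minimal sketch using only the Python standard library, assuming a node reachable at `http://localhost:9200` (an assumption; adjust to your cluster) and the `GET /_nodes/stats/jvm` response layout, where each entry under `nodes` carries a `name` and a `jvm.gc.collectors` object. `parse_gc`, `fetch_gc`, and `poll` are hypothetical helper names.

```python
import json
import time
import urllib.request


def parse_gc(body: dict) -> dict:
    """Map node name -> jvm.gc.collectors from a node stats response body."""
    return {node["name"]: node["jvm"]["gc"]["collectors"]
            for node in body["nodes"].values()}


def fetch_gc(base_url: str = "http://localhost:9200") -> dict:
    """GET /_nodes/stats/jvm and extract the GC collectors for every node."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm") as resp:
        return parse_gc(json.load(resp))


def poll(fetch=fetch_gc, interval_s: float = 60.0, samples: int = 3,
         sleep=time.sleep) -> list:
    """Take `samples` timestamped snapshots, `interval_s` seconds apart."""
    history = []
    for i in range(samples):
        history.append((time.time(), fetch()))
        if i < samples - 1:
            sleep(interval_s)  # no sleep after the final snapshot
    return history
```

Because the counters are cumulative, consecutive snapshots in `history` are meant to be diffed per node rather than read on their own. Passing `fetch` and `sleep` as parameters keeps the loop testable without a running cluster.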

3.

It is much better to handle queuing in your application, by gracefully responding to the back pressure from a full queue. When you receive bulk rejections, you should take these steps:

  1. Pause the import thread for 3–5 seconds.
  2. Extract the rejected actions from the bulk response, since it is probable that many of the actions were successful. The bulk response will tell you which succeeded and which were rejected.
  3. Send a new bulk request with just the rejected actions.
  4. Repeat from step 1 if rejections are encountered again.
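The four steps above can be sketched as a retry loop. This is a Python sketch under stated assumptions, not a client library's API: `send_bulk` is a hypothetical callable that sends one bulk request and returns the parsed response, whose `items` list has one entry per action in request order, each keyed by the operation type (`index`, `create`, `update`, `delete`). It also assumes rejected items carry HTTP status 429, as they do in recent Elasticsearch versions; check what your version returns. Other per-item failures are not retried here.

```python
import random
import time


def bulk_with_backoff(send_bulk, actions: list, max_rounds: int = 10,
                      sleep=time.sleep) -> list:
    """Resend only the rejected actions, pausing 3-5 s before each retry.

    Returns the actions still rejected after max_rounds (empty list on success).
    """
    pending = list(actions)
    for attempt in range(max_rounds):
        if not pending:
            break
        if attempt > 0:
            sleep(random.uniform(3, 5))  # step 1: pause before retrying
        response = send_bulk(pending)  # step 3: send only what is still pending
        # Step 2: items line up with the request; each holds one key (the op type).
        pending = [action for action, item in zip(pending, response["items"])
                   if next(iter(item.values()))["status"] == 429]
    return pending  # step 4 is the loop itself
```

`send_bulk` could wrap whichever client you use to call the `_bulk` endpoint. Capping the number of rounds keeps a persistently overloaded cluster from stalling the import forever.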

Using this procedure, your code naturally adapts to the load on your cluster and backs off when it is busy.

Rejections are not errors: they just mean you should try again later.

There are a dozen threadpools. Most can safely be ignored, but a few are worth keeping an eye on:

indexing

Threadpool for normal indexing requests

bulk

Bulk requests, which are distinct from the nonbulk indexing requests

get

Get-by-ID operations

search

All search and query requests

merging

Threadpool dedicated to managing Lucene merges
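One way to keep an eye on these pools is to watch their rejection counters. A minimal Python sketch, assuming `prev` and `curr` are two snapshots of one node's `thread_pool` object from `GET /_nodes/stats/thread_pool`, and that each pool's `rejected` counter is cumulative like the GC counters. The pool names in `WATCHED` are a hypothetical selection: the names the API reports differ from the labels above and vary by version.

```python
# Hypothetical selection: check the pool names your Elasticsearch version reports.
WATCHED = ("index", "bulk", "get", "search")


def new_rejections(prev: dict, curr: dict, watched=WATCHED) -> dict:
    """Return {pool: rejections since prev} for watched pools that rejected work."""
    out = {}
    for pool in watched:
        if pool in prev and pool in curr:
            delta = curr[pool]["rejected"] - prev[pool]["rejected"]
            if delta > 0:
                out[pool] = delta
    return out


prev = {"bulk": {"queue": 0, "rejected": 5}, "search": {"queue": 2, "rejected": 0}}
curr = {"bulk": {"queue": 50, "rejected": 12}, "search": {"queue": 1, "rejected": 0}}
print(new_rejections(prev, curr))  # → {'bulk': 7}
```

A rising bulk count points back to the back-pressure handling described earlier rather than to a fault in the cluster.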


Reposted from: https://my.oschina.net/fayebrooke/blog/693346
