Aldar
3y

Any Elasticsearch gurus here? I have a box with too many young gen GCs (one every 2 or 3 seconds) and irregular, very long old gen GCs (one every several hours, taking around a minute and freeing about two thirds of the old gen space). I was thinking of changing the new gen ratio from 2/3 to something like 3/4 or 4/5.

However, after reading an elastic article about settings to never touch... I'm no longer so sure...

The only other option I was considering is going from CMS to G1GC to cut back on the old gen GC time... A minute-long pause for Elastic is rather problematic.

Any thoughts? The box is rather old: it runs Elastic 5.6 with 20 GB of heap, 207 shards and 306k docs.

Comments
  • 0
    We would need far more data...

    Memory allocation in Elasticsearch is a very complicated thing.

    The memory is split into several areas, which may or may not belong to Elasticsearch itself.

    5.6 is ancient... If you can, I highly recommend upgrading to 6 so that you have an upgrade path to 7. 8 is already in the works and will be out soon.

    G1GC on JDK 8 isn't a silver bullet. Far from it... It's an entirely different thing from G1GC in JDK 11. People tend to treat them as the same; they're not.

    Is your JDK 8 up to date? 8u292 if installed from the repository, 8u302 otherwise?

    https://wiki.openjdk.java.net/displ...

    If not, stop immediately and update first.
    Lots of stuff was backported to JDK 8 which could help.

    The JVM settings for Elasticsearch I use are backported from the Elasticsearch main repository.

    I would advise against fiddling with them. Elasticsearch allocates and uses the JDK heap for several entirely different things, so you are more likely to make things worse than better...

    https://pastebin.com/12xFgaM5

    You should check the environment. I wrote systemd units to start Elasticsearch, ensuring that the limits / sysctl configuration is sane.

    Now to the interesting parts: disable X-Pack monitoring and check whether the problem persists.

    The monitoring via Marvel / X-Pack is really bad in ES. In 7 they fixed several things, especially regarding its resource usage.

    I'd recommend Prometheus with a plugin for that.

    Then take a look at the cache statistics:

    https://elastic.co/guide/en/...

    You'll find most of the interesting caches in the Modules - Indices category of the docs...

    It comes down to analyzing node statistics, cache usage and query behaviour.

    There is seldom a silver bullet here...

    Monitoring (e.g. with Grafana) is the key to finding out why the GCs are happening.

    GCs aren't necessarily a bad thing though, except when (as you said) they run for dozens of seconds.
  • 0
    Regarding shards: Shard size might play a role.

    If the shard size varies widely across the cluster, e.g.

    Index 1 - 6 Shards, each 5 GB
    Index 2 - 5 Shards, each 30 GB
    Index 3 - 8 Shards, each 50 GB

    You can run into the funny problem of load on the smaller shards pushing the larger shards out of memory.

    It makes sense to manually partition the cluster by node attributes / index settings, so that each node hosts indexes with roughly equal shard sizes.

    E.g.

    Nodes 1 - 2 host indexes with a max. shard size of 10 GB
    Nodes 3 - 4 host indexes with a max. shard size of 30 GB
    ....
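
The partitioning idea above can be sketched as follows, assuming nodes are tagged with a custom attribute in `elasticsearch.yml` and indexes are pinned to them with the `index.routing.allocation.require.*` index setting. The attribute name `size_tier`, the tier boundaries and all function names are hypothetical, chosen only for illustration. The input rows mimic `GET _cat/shards?format=json&bytes=b`, and the sample values are made up to match the example sizes above.

```python
GB = 1024 ** 3
# Hypothetical tiers: (tier name, max shard size in bytes), smallest first.
TIERS = [("small", 10 * GB), ("medium", 30 * GB), ("large", float("inf"))]


def largest_primary_per_index(cat_shards):
    """Map index name -> size in bytes of its largest primary shard."""
    sizes = {}
    for row in cat_shards:
        if row["prirep"] != "p" or row.get("store") is None:
            continue  # skip replicas and unassigned shards
        size = int(row["store"])
        sizes[row["index"]] = max(size, sizes.get(row["index"], 0))
    return sizes


def tier_for(size_bytes):
    """Pick the first tier whose limit covers the shard size."""
    for name, limit in TIERS:
        if size_bytes <= limit:
            return name
    raise ValueError("no tier covers this size")


def allocation_settings(cat_shards):
    """Build one `PUT /<index>/_settings` body per index."""
    return {
        index: {"index.routing.allocation.require.size_tier": tier_for(size)}
        for index, size in largest_primary_per_index(cat_shards).items()
    }


if __name__ == "__main__":
    sample = [
        {"index": "index-1", "prirep": "p", "store": str(5 * GB)},
        {"index": "index-2", "prirep": "p", "store": str(30 * GB)},
        {"index": "index-3", "prirep": "p", "store": str(50 * GB)},
        {"index": "index-3", "prirep": "r", "store": str(50 * GB)},
    ]
    for index, body in allocation_settings(sample).items():
        print(index, body)
```

Each node would then need a matching `node.attr.size_tier` value in its configuration. Tiering by the largest primary shard is just one possible choice; an average or a percentile would work as well.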