那年那日那朵花

An es bug: a leftover .es_temp_file prevents the node from starting

2018-09-12 10:25 elk

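Yesterday I ran into a bug: one of our es nodes would no longer start.
Here is what happened. A server kept rebooting because of a problem of its own, and once it had been repaired, the es node on that server refused to start. The es version is 5.3.3. The error output was: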

[2018-09-11T16:24:28,786][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-7] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to created node environment
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:58) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.3.3.jar:5.3.3]
Caused by: java.lang.IllegalStateException: Failed to created node environment
        at org.elasticsearch.node.Node.<init>(Node.java:265) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.node.Node.<init>(Node.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.3.jar:5.3.3]
        ... 6 more
Caused by: java.io.IOException: failed to write in data directory [/data/es/data/nodes/0/indices/PaXTVdBDSXCPqKicNCfOnA/3/_state] write permission is required
        at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1075) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1058) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:277) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.node.Node.<init>(Node.java:262) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.node.Node.<init>(Node.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.3.jar:5.3.3]
        ... 6 more
Caused by: java.nio.file.FileAlreadyExistsException: /data/es/data/nodes/0/indices/PaXTVdBDSXCPqKicNCfOnA/3/_state/.es_temp_file
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
        at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_131]
        at java.nio.file.Files.createFile(Files.java:632) ~[?:1.8.0_131]
        at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1072) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1058) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:277) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.node.Node.<init>(Node.java:262) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.node.Node.<init>(Node.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.3.jar:5.3.3]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.3.jar:5.3.3]
        ... 6 more

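The trace more or less speaks for itself: startup fails because the file /data/es/data/nodes/0/indices/PaXTVdBDSXCPqKicNCfOnA/3/_state/.es_temp_file already exists. So what is this file, and where does it come from?
After searching online, I learned that this is probably a bug in es. The relevant links: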
https://github.com/elastic/elasticsearch/pull/21210
https://github.com/elastic/elasticsearch/issues/20992

We just ran into this same issue on 5.3.1. Looking at the fix, moving the delete into finally does not fully address the issue, so the fix is incomplete. If the system crashes right after the file is initially created, the file will be left behind and forever prevent the system from starting up.

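My guess is that, because the server kept rebooting, the es instance configured to start at boot was forcibly killed before it had finished starting up. That left .es_temp_file behind on disk, which then made every later startup fail.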
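The failure mode guessed at above can be reproduced with a small Python analogue. This is only a sketch, not Elasticsearch's actual Java code: the function name try_write_temp_file mirrors the tryWriteTempFile frame in the trace, and the logic (exclusively create a marker file, then delete it) is an assumption based on the trace and the quoted comment.

```python
import os

TEMP_FILE_NAME = ".es_temp_file"  # file name taken from the stack trace


def try_write_temp_file(directory):
    """Probe a directory for write access by creating, then deleting, a marker file.

    Hypothetical analogue of the write check seen in the trace.
    """
    probe = os.path.join(directory, TEMP_FILE_NAME)
    try:
        # Mode "x" is exclusive creation: it fails if the file already exists,
        # much like Files.createFile in the trace.
        with open(probe, "x"):
            pass
    except OSError as exc:
        raise OSError(
            f"failed to write in data directory [{directory}] "
            "write permission is required"
        ) from exc
    # A crash between the creation above and this removal leaves the marker
    # behind, and every later probe of this directory then fails.
    os.remove(probe)
```

If the process dies after the marker is created but before it is removed, the next call raises an error that talks about write permission even though the real cause is the stale file, which matches the misleading message in the trace.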

The fix is simple: manually delete the hidden file .es_temp_file, and the es node will start again.
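Since the same leftover file could in principle sit in more than one directory, a small helper can list every .es_temp_file under the data path before anything is deleted. This is a minimal sketch: the function names and the dry_run parameter are made up for illustration, and it should only be run while the node is stopped.

```python
from pathlib import Path


def find_stale_temp_files(data_dir, name=".es_temp_file"):
    """Return every regular file called `name` anywhere under data_dir."""
    return sorted(p for p in Path(data_dir).rglob(name) if p.is_file())


def remove_stale_temp_files(data_dir, dry_run=True):
    """List leftover marker files; delete them only when dry_run is False."""
    stale = find_stale_temp_files(data_dir)
    for path in stale:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
    return stale
```

Calling remove_stale_temp_files("/data/es/data") only prints what it finds; passing dry_run=False actually deletes the files. The path here is the data directory from the trace and will differ on other installations.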

Cloudhu personal notes | built by django |
