Three ways to delete data in ES
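1 Newer versions of ES do not support bulk deletion out of the box, so the first approach is to query ES for the primary key ids and then delete the documents one by one by id.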
    import org.elasticsearch.action.search.SearchType
    import org.elasticsearch.client.transport.TransportClient
    import org.elasticsearch.common.unit.TimeValue
    import org.elasticsearch.index.query.QueryBuilders

    def scrollScanDeleteByTopic(client: TransportClient, index: String, topic: String) = {
      // Open a scroll over every document whose topicName matches the topic
      var searchResponse = client.prepareSearch(index).setTypes("docs")
        .setQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("topicName", topic)))
        .setSearchType(SearchType.DEFAULT)
        .addStoredField("id")
        .setSize(1000).setScroll(TimeValue.timeValueMinutes(5))
        .execute().actionGet()
      var num = searchResponse.getHits.getHits.length
      // Loop until all data has been traversed
      while (num != 0) {
        println(" num " + num)
        val res = searchResponse.getHits.getHits
        // Delete each hit of the current page by id
        res.foreach { x =>
          client.prepareDelete(index, "docs", x.getId).execute().actionGet()
        }
        println(s"========================= $index deleted ${res.length}==${topic} ==========")
        // Fetch the next page of the scroll
        searchResponse = client.prepareSearchScroll(searchResponse.getScrollId)
          .setScroll(TimeValue.timeValueMinutes(5))
          .execute().actionGet()
        num = searchResponse.getHits.getHits.length
      }
    }
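2 Deleting one by one turned out to be rather slow. As an improvement on the first approach, the ids are still fetched by query, but the deletes are run on multiple threads.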
    def scrollScanDeleteByTopic(client: TransportClient, index: String, topic: String) = {
      var searchResponse = client.prepareSearch(index).setTypes("docs")
        .setQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("topicName", topic)))
        .setSearchType(SearchType.DEFAULT)
        .addStoredField("id")
        .setSize(1000).setScroll(TimeValue.timeValueMinutes(5))
        .execute().actionGet()
      var num = searchResponse.getHits.getHits.length
      // Loop until all data has been traversed
      while (num != 0) {
        println(" num " + num)
        val res = searchResponse.getHits.getHits.map(_.getId)
        // Delete the ids of the current page with 4 threads
        deleteDocuments(4, res)
        println(s"========================= $index deleted ${res.length}==${topic} ==========")
        searchResponse = client.prepareSearchScroll(searchResponse.getScrollId)
          .setScroll(TimeValue.timeValueMinutes(5))
          .execute().actionGet()
        num = searchResponse.getHits.getHits.length
      }
    }

    def deleteDocuments(num: Int, arr: Array[String]) = {
      // Slice size per thread, rounded up so that every id is covered
      val step = if (arr.length % num == 0) arr.length / num else arr.length / num + 1
      val threads = for (i <- 0 until num) yield {
        val t = new Thread(new Runnable {
          override def run(): Unit = {
            val client = Es_test.getDeleteClient()
            val beginNum = i * step
            val endNum = math.min((i + 1) * step, arr.length)
            // Delete only the ids in this thread's slice (arr(j), not arr(i))
            for (j <- beginNum until endNum) {
              client.prepareDelete("ods_wj_scenes_detail", "docs", arr(j)).execute().actionGet()
            }
            Es_test.close(client)
          }
        })
        t.start() // the thread does nothing until it is started
        t
      }
      // Wait for all threads before the caller fetches the next page
      threads.foreach(_.join())
    }
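The slicing arithmetic in deleteDocuments is easy to get wrong, so here is a minimal sketch of the same idea that runs without an ES cluster. The names ParallelDelete, sliceSize, deleteAll and the deleteOne callback are assumptions made for illustration and do not appear in the original code; in real use deleteOne would wrap the client.prepareDelete call. The sketch also swaps raw threads for a fixed thread pool, which is a substitution rather than what the code above does.

```scala
import java.util.concurrent.Executors

object ParallelDelete {
  /** Slice size so that `workers` slices cover `total` items (ceiling division, at least 1). */
  def sliceSize(total: Int, workers: Int): Int =
    math.max(1, (total + workers - 1) / workers)

  /**
   * Splits `ids` into contiguous slices and runs `deleteOne` on every id,
   * one slice per pool task. Blocks until every slice has finished.
   * `deleteOne` is a hypothetical stand-in for the real per-document delete.
   */
  def deleteAll(ids: Array[String], workers: Int)(deleteOne: String => Unit): Unit = {
    if (ids.nonEmpty) {
      val pool = Executors.newFixedThreadPool(workers)
      try {
        val futures = ids.grouped(sliceSize(ids.length, workers)).toList.map { slice =>
          pool.submit(new Runnable {
            override def run(): Unit = slice.foreach(deleteOne)
          })
        }
        // Wait for each slice; a failure in a worker surfaces here as an ExecutionException
        futures.foreach(_.get())
      } finally {
        pool.shutdown()
      }
    }
  }
}
```

With 10 ids and 4 workers, sliceSize gives 3, so the slices hold 3, 3, 3 and 1 ids. Using grouped avoids hand-written begin and end indexes, which is where the off-by-one mistakes tend to creep in.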
3 I found a plugin online, delete-by-query, that can delete in bulk.

    "org.elasticsearch.plugin" % "delete-by-query" % "2.4.1" % Test

    // Match every document in the index
    val queryBuilder = QueryBuilders.boolQuery()
    queryBuilder.must(QueryBuilders.matchAllQuery())
    val start = new Date().getTime
    // Delete everything the query matches in one request
    val response = DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
      .filter(queryBuilder)
      .source("ods_wj_scenes_detail")
      .get()
    val deleted = response.getDeleted
    val end = new Date().getTime
    // Print the number of deleted documents and the elapsed milliseconds
    println(s"=================$deleted=====${end-start}==============")
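To compare the three methods fairly, each one should be timed the same way. The snippet above takes two wall-clock readings with new Date().getTime; a small helper based on System.nanoTime is one possible alternative. This is only a sketch, and the names Timing and timed are hypothetical, not part of the original code.

```scala
object Timing {
  /** Runs `block` once and returns its result together with the elapsed milliseconds. */
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

The delete call would then be wrapped as Timing.timed { ... }, yielding both the response and the duration without separate start and end variables.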
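Results: Method 1 is relatively slow. Method 2 is faster: with 4 threads, deleting 760,000 documents took 182,883 ms. Method 3 deleted 740,000 documents in 51,410 ms. Efficiency improves from one method to the next, and method 3 is the fastest. One problem also showed up: raising the thread count in method 2 does not reduce the run time, because the bottleneck is mainly the query. Measurement showed that loading 10,000 documents takes about 2 s, so if the query could be made faster, method 2 would speed up as well.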
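The reported figures can be turned into throughput numbers to make the comparison concrete. The sketch below uses only the numbers from the text; the names DeleteStats and docsPerSecond are hypothetical. The query-time estimate rests on one assumption: that loading costs a constant 2 s per 10,000 documents over the whole run. Under that assumption the query alone accounts for roughly 152 s of method 2's 182.9 s, about 83%, which fits the observation that extra threads do not help.

```scala
object DeleteStats {
  /** Documents deleted per second, given a document count and a duration in milliseconds. */
  def docsPerSecond(docs: Long, millis: Long): Double = docs * 1000.0 / millis

  // Figures reported in the text
  val method2Rate: Double = docsPerSecond(760000L, 182883L) // 4 threads
  val method3Rate: Double = docsPerSecond(740000L, 51410L)  // delete-by-query

  // Assumption: loading costs a constant ~2 s per 10,000 documents for the whole run
  val estimatedQuerySeconds: Double = 760000L / 10000.0 * 2.0

  // Fraction of method 2's total run time spent on querying, under that assumption
  val queryShareOfMethod2: Double = estimatedQuerySeconds * 1000.0 / 182883L
}
```

Method 2 comes out at roughly 4,156 documents per second and method 3 at roughly 14,394, so method 3 is about 3.5 times faster on these runs.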
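Also, this plugin is currently a beta version, and its compatibility with other components is not very good. In use I ran into a jar problem: a jar it depends on, log4j-slf4j-impl 2.8, is missing a key class, ReflectUtil. The fix is to use the latest version of that jar. I downloaded 2.9, which solved the problem completely.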
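If the project is built with sbt, as the dependency line for the plugin suggests, the jar upgrade could be expressed in the build file instead of swapping the jar by hand. This is a sketch under two assumptions: that "2.9" refers to the 2.9.0 release, and that the older version arrives as a transitive dependency.

```scala
// build.sbt fragment (sketch). Assumes "2.9" in the text means the 2.9.0 release.
libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.9.0"

// Force the same version in case another dependency pulls in 2.8 transitively.
dependencyOverrides += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.9.0"
```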
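This article covers three ways to delete data in ES: querying for ids and deleting the documents one by one, doing the same with multiple threads, and bulk deletion with the delete-by-query plugin. It also gives a concrete fix for the plugin's dependency problem: replacing the log4j-slf4j-impl jar with a newer version.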