Benchmark of mainstream open-source distributed graph databases

wey · May 8, 2021 01:50

Hello Yangmeng,

You can refer to these two posts, in each of which Yee gave some explanation:

https://discuss.nebula-graph.io/t/nebula-importer-optimize-performance/875/7

Let me quote what Yee wrote in the two posts:

For a CSV file, the importer reads it with a single thread and starts `concurrency` clients; each client has a queue of size `channelBufferSize` to buffer queries.
The batchSize is the number of vertices or edges sent to nebula at the same time. Nebula Storaged will store these data in batch to avoid too many RPCs. The speed of inserting data is affected by your disk. So you can do some tests to find a suitable value.
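
To make the roles of the three parameters concrete, here is a minimal Python sketch of the pipeline described above: one reader, `concurrency` clients, and a bounded queue of size `channelBufferSize` per client. It is only an illustration, not the importer's actual code. The function name `run_pipeline`, the round-robin dispatch, and the choice to batch rows in the reader are all assumptions made for the example.

```python
import queue
import threading


def run_pipeline(rows, concurrency, channel_buffer_size, batch_size):
    """Simulate one reader feeding `concurrency` clients through bounded queues.

    Returns, per client, the list of batches it would have sent.
    """
    # One bounded queue per client; a full queue blocks the reader (backpressure).
    queues = [queue.Queue(maxsize=channel_buffer_size) for _ in range(concurrency)]
    sent = [[] for _ in range(concurrency)]

    def client(idx):
        while True:
            batch = queues[idx].get()
            if batch is None:  # sentinel: the reader has finished
                break
            # A real client would execute one INSERT statement per batch here.
            sent[idx].append(batch)

    workers = [threading.Thread(target=client, args=(i,)) for i in range(concurrency)]
    for w in workers:
        w.start()

    # The single reader: group rows into batches and hand them out round-robin
    # (round-robin is an assumption of this sketch).
    batch, dispatched = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            queues[dispatched % concurrency].put(batch)
            dispatched += 1
            batch = []
    if batch:  # last, possibly smaller, batch
        queues[dispatched % concurrency].put(batch)

    for q in queues:
        q.put(None)
    for w in workers:
        w.join()
    return sent
```

In this model, a larger `batch_size` means fewer requests for the same number of rows, while `concurrency` controls how many requests can be in flight at once. This is why both are worth tuning together.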

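It looks like your 10/10 run did not reach the processing limit of storaged, and it is unclear whether 50/50 reached it either. You could try setting concurrency to the number of CPU cores and increasing batchSize further (above 50), then evaluate the speed. I suggest truncating the CSV to a size that gives you a result within about ten minutes, iterating toward the best settings, and then sharing your results here.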
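
For the truncation step, a small helper like the following might be enough. It is a rough sketch: `truncate_csv` and `rows_per_second` are hypothetical names, and the line-based copy assumes that no CSV field contains an embedded newline.

```python
import itertools


def truncate_csv(src_path, dst_path, max_rows):
    """Copy at most `max_rows` lines from src_path to dst_path.

    Assumes one record per line (no quoted fields with embedded newlines).
    Returns the number of lines actually copied.
    """
    copied = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        for line in itertools.islice(src, max_rows):
            dst.write(line)
            copied += 1
    return copied


def rows_per_second(rows, elapsed_seconds):
    """Throughput of one import run, for comparing settings."""
    return rows / elapsed_seconds
```

Run the import on the truncated file once per concurrency/batchSize combination, time each run, and compare the `rows_per_second` values. When raising the settings no longer improves throughput, storaged or the disk is likely the bottleneck.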