How can I speed up reading edges with the Python client in Nebula 2.5.0?

zhang · January 10, 2022, 04:50

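Question template:

Nebula version: 2.5.0
Deployment: single machine
Installation method: RPM
Detailed description of the problem:
Following the example program, I use the Nebula 2.5.0 Python client to scan edges. Each iteration of the while loop reads 10,000 edges by default, which is very slow for large data sets: in my test, reading 5 million edges from a local graph database took about half an hour. Is there a way to speed up the read?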

steam · January 10, 2022, 04:52

Please post your hardware details and the read configuration you are using with the Python client.

zhang · January 10, 2022, 05:06

Hardware:
Number of machines: 1 (single-machine test)
Server model: Sugon X740-H30
OS version: CentOS 7.6 (x86)
CPU model: Hygon C86 7155 16-core Processor
Number of CPUs: 2
CPU cores: 16
Memory: 128 GB

Python 3.6.8

from nebula2.mclient import MetaCache, HostAddr
from nebula2.sclient.GraphStorageClient import GraphStorageClient
import pandas as pd
import numpy as np
import os 
import time 

# Connect to the local Nebula database and iterate over the edge data.

def scan_space_edge(graph_storage_client):
    # scan edge
    resp = graph_storage_client.scan_edge(
                space_name = "facebook",
                edge_name = "relation")
    edge_id = []

    # Iterate over the edge data, one batch per loop iteration
    while resp.has_next():
        cnt = 0
        result = resp.next()
        print('Py reading nebula data .........')
        for edge_data in result:
            # srcId = edge_data.get_src_id().as_string()
            # dscId = edge_data.get_dst_id().as_string()
            # srcId, dscId = int(srcId), int(dscId)
            # edge_id.append([srcId, dscId])
            cnt += 1
        print('while cnt:',cnt)

def link_nebula():
    meta_cache = None
    graph_storage_client = None
    result_edge = []
    
    try:
        meta_cache = MetaCache([('127.0.0.1', 9559)], 50000)
        graph_storage_client = GraphStorageClient(meta_cache)
        # print("--- Py link nebula success ---\n")
        start_t = time.time()
        scan_space_edge(graph_storage_client)
        print("Py read nebula edges data cost: ", round(time.time() - start_t, 2), "s")

    except Exception:
        import traceback
        print(traceback.format_exc())
        # The finally block below still runs on exit(1) and closes the clients,
        # so closing here as well would call close() twice.
        exit(1)
    finally:
        if graph_storage_client is not None:
            graph_storage_client.close()
        if meta_cache is not None:
            meta_cache.close()

# link_nebula()

Here, result = resp.next() returns 10,000 edges at a time. Is there a parameter I can change so that each call reads more?
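Before changing the batch size, it may help to check where the half hour actually goes. The sketch below splits the loop time into the part spent in resp.next() and the part spent in per-edge Python work. `profile_scan` and `FakeScanResult` are hypothetical names introduced only for this illustration; the fake class just imitates the has_next()/next() interface used in the code above so the example runs without a database. If the installed client's scan_edge accepts a batch-size argument (some versions document a `limit` keyword, which is an assumption to verify with help(GraphStorageClient.scan_edge)), the same helper can be used to compare settings.

```python
import time


def profile_scan(resp, decode):
    """Return (edge_count, fetch_seconds, decode_seconds) for one full scan.

    fetch_seconds is the time spent inside resp.next();
    decode_seconds is the time spent iterating each batch and calling decode().
    """
    fetch_s = 0.0
    decode_s = 0.0
    edges = 0
    while resp.has_next():
        t0 = time.perf_counter()
        batch = resp.next()
        t1 = time.perf_counter()
        for edge in batch:
            decode(edge)
            edges += 1
        t2 = time.perf_counter()
        fetch_s += t1 - t0
        decode_s += t2 - t1
    return edges, fetch_s, decode_s


class FakeScanResult:
    """Hypothetical stand-in for the object returned by scan_edge (demo only)."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._batches)

    def next(self):
        batch = self._batches[self._pos]
        self._pos += 1
        return batch


if __name__ == "__main__":
    # Two fake batches of (src, dst) string pairs.
    fake = FakeScanResult([[("1", "2"), ("2", "3")], [("3", "4")]])
    pairs = []
    count, fetch_s, decode_s = profile_scan(
        fake, lambda e: pairs.append((int(e[0]), int(e[1]))))
    print("edges:", count, "fetch s:", fetch_s, "decode s:", decode_s)
```

With the real client, the decode callback would do what the commented-out lines in the loop do (get_src_id(), get_dst_id(), int conversion). If most of the time turns out to be decode time, a larger batch is unlikely to help much, which would be consistent with the replies below.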

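zhang · January 10, 2022, 05:09

As for the C++ client: first, after compiling it I get this error at run time:
./session_example: error while loading shared libraries: libnebula_graph_client.so: cannot open shared object file: No such file or directory

Second, I don't yet know how to fetch data from Nebula with C++ (I'm a C++ beginner), so for now I'm reading with Python.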

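Aiee · January 10, 2022, 06:04

Python itself is slow at deserialization, so there is no good way to speed this up. I'd recommend reading the data with the C++ client or the Exchange tool instead.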

HarrisChu · January 10, 2022, 06:14

With Python, decoding is slow, and because of the GIL, multithreaded code can effectively use only one core.
You could also use the Java client: nebula-java/StorageClientExample.java at master · vesoft-inc/nebula-java · GitHub

zhang · January 10, 2022, 06:40

How efficient is the C++ client at reading large data sets?

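nicole · January 10, 2022, 07:02

I don't think there are performance figures for the C++ scan interface, but you can read the data concurrently at the application layer. If you know Java or Scala, you can use the Spark Connector to read large volumes of data; its performance is described in the README: https://github.com/vesoft-inc/nebula-spark-connector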
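For the Python client, one way to read concurrently at the application layer despite the GIL is to use processes instead of threads, with each process scanning a different storage partition. The following is only a sketch of that fan-out pattern under stated assumptions: `scan_one_part` and `scan_parts` are hypothetical helpers, the worker body fabricates edges so the example runs without a database, and the partition count of 10 is made up. In real use, each worker would open its own MetaCache and GraphStorageClient and call a per-partition scan (the client appears to offer scan_edge_with_part, but treat that name and its arguments as an assumption to check against the installed nebula2 package).

```python
from concurrent.futures import ProcessPoolExecutor


def scan_one_part(part_id):
    """Scan a single partition and return its (src, dst) pairs.

    Stand-in body: it fabricates three edges per partition so the sketch
    runs without a database. A real worker would create its own client
    connection inside this function and scan only partition part_id.
    """
    return [(part_id * 10 + i, part_id * 10 + i + 1) for i in range(3)]


def scan_parts(part_ids, worker=scan_one_part, max_workers=4,
               executor_cls=ProcessPoolExecutor):
    """Run one worker call per partition and concatenate results in part order."""
    edges = []
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as part_ids.
        for part_edges in pool.map(worker, part_ids):
            edges.extend(part_edges)
    return edges


if __name__ == "__main__":
    # Assumption: the space has 10 partitions, numbered from 1.
    all_edges = scan_parts(range(1, 11))
    print("edges read:", len(all_edges))
```

Separate processes each have their own interpreter and GIL, so the decoding work can spread across cores. The trade-off is that results are pickled back to the parent process, so returning compact data (for example integer pairs rather than full edge objects) keeps that overhead down.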

zhang · January 10, 2022, 07:26

OK, thanks.

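zhang · January 10, 2022, 08:05

When I run the C++ client example, it fails because libnebula_graph_client.so cannot be found. There are two files with that name on the system. Which one should I use?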

[root@11ceeffa88ce nebula-cpp]# cd examples/
[root@11ceeffa88ce examples]# ls
CMakeLists.txt  SessionExample.cpp
[root@11ceeffa88ce examples]# LIBRARY_PATH=/usr/local/nebula/lib64:$LIBRARY_PATH g++ -std=c++11 SessionExample.cpp -I/usr/local/nebula/include -lnebula_graph_client -o session_example
[root@11ceeffa88ce examples]# ls
CMakeLists.txt  SessionExample.cpp  session_example
[root@11ceeffa88ce examples]# ./session_example 
./session_example: error while loading shared libraries: libnebula_graph_client.so: cannot open shared object file: No such file or directory
[root@11ceeffa88ce examples]# find / -name "libnebula_graph_client.so"
/usr/local/nebula/lib64/libnebula_graph_client.so
/nebula-cpp/build/lib/libnebula_graph_client.so

nicole · January 10, 2022, 09:43

I'm not familiar with the C++ client either; let's wait for someone else to answer.

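zhang · January 10, 2022, 09:49

The C++ problem is solved now, but I still don't really know how to use the client. Could you share sample code that scans all the edges of a space with C++, for reference? I'm new to C++ as well. Thanks.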

nicole · January 10, 2022, 09:51

https://github.com/vesoft-inc/nebula-cpp/blob/master/examples/StorageClientExample.cpp

zhang · January 10, 2022, 09:57

Thanks! The 2.5 version I cloned earlier doesn't have this file.

zhouhuhq · January 9, 2023, 03:22

I've run into the same problem.