Version: Nebula Exchange 2.0, Nebula Graph 2.0.1, Spark 2.4
OS: Ubuntu 18.04
I wrote neo4j_application.conf following the documentation and ran the command:
${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange /root/nebula-spark-utils/nebula-exchange/target/nebula-exchange-2.0.0.jar -c /root/nebula-spark-utils/nebula-exchange/target/classes/neo4j_application.conf
It failed with the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.neo4j.driver.exceptions.ClientException: Database name parameter for selecting database is not supported in Bolt Protocol Version 3. Database name: 'graph.db'
I went back to the documentation, which says that Exchange uses Neo4j Driver 4.0.1 to read data from Neo4j. I could not find a "Neo4j Driver" on the official Neo4j site, only neo4j-spark-connector. Could you confirm whether that is the right one? If not, please give the download location of Neo4j Driver 4.0.1.
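For context, the "Neo4j Driver 4.0.1" in the documentation refers to the Neo4j Java Driver rather than neo4j-spark-connector; its Maven coordinates are presumably org.neo4j.driver:neo4j-java-driver:4.0.1. The sketch below is only an illustration (not Exchange's actual code) of how a 4.x driver selects a database when it opens a session; a Neo4j 3.x server only speaks Bolt protocol v3 and rejects that parameter with exactly the ClientException shown above. The connection values mirror the config further down in this thread and are assumptions.

import org.neo4j.driver.{AuthTokens, GraphDatabase, SessionConfig}

object Neo4jReadSketch {
  def main(args: Array[String]): Unit = {
    // Connection details copied from the neo4j_application.conf below (assumed values).
    val driver = GraphDatabase.driver("bolt://127.0.0.1:7687",
      AuthTokens.basic("neo4j", "111111"))
    // Selecting a named database requires Bolt protocol v4, i.e. a Neo4j 4.x server.
    // Against a Neo4j 3.x server (Bolt v3) the driver throws the ClientException
    // "Database name parameter for selecting database is not supported in Bolt Protocol Version 3."
    val session = driver.session(SessionConfig.forDatabase("graph.db"))
    try {
      val result = session.run("MATCH (n:Disease) RETURN count(n) AS cnt")
      println(result.single().get("cnt").asLong())
    } finally {
      session.close()
      driver.close()
    }
  }
}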
wey
Jun 22, 2021, 03:28
#3
{
  name: Disease
  type: {
    source: neo4j
    sink: client
  }
  server: "bolt://127.0.0.1:7687"
  user: neo4j
  password: 111111
  database: graph.db
  exec: "match (n:Disease) with id(n) as _id, n.prevent as prevent, n.cureWay as cure_way, n.name as name, n.cure_lasttime as cure_last_time, n.cure_prob as cure_prob, n.cause as cause, n.cureDepartment as cure_department,n.desc as desc, n.easy_get as easy_get return _id, prevent, cure_way, name, cure_last_time, cure_prob, cause, cure_department, desc, easy_get order by _id"
  fields: [prevent,cureWay,name,cure_lasttime,cure_prob,cause,cureDepartment,desc,easy_get]
  nebula.fields: [prevent,cure_way,name,cure_last_time,cure_prob,cause,cure_department,desc,easy_get]
  vertex: {
    field: _id
  }
  partition: 10
  batch: 1000
  check_point_path: /tmp/medical
}
...
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=:7687
sataliulan:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.neo4j.driver.exceptions.ClientException: Database name parameter for selecting database is not supported in Bolt Protocol Version 3. Database name: 'graph.db'
Judging from this error message, if the connection and the database itself are fine, it looks like a version incompatibility between the Neo4j server and the client.
I have only installed the Neo4j server; is the current Nebula Exchange incompatible with it?
The Neo4j client version used by Exchange is 4.0.1; the incompatibility is between this 4.0.1 client and the server you installed.
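As a quick way to confirm the mismatch (an assumption on my part, not something from this thread): print the server version on the Neo4j host; a 3.x server only speaks Bolt protocol v3, which matches the error above.

# Run on the Neo4j host; the install path is an assumption and may differ.
$NEO4J_HOME/bin/neo4j version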
Hi, I have now successfully imported the data into Nebula Graph and created the corresponding indexes, but when I run a query from the console it prints: Storage Error: part: 10, error: E_RPC_FAILURE(-3).
I don't understand what this means; please help.
PS: my query is: match p=(a:BasePrescription)-[r]->(b:Disease) return p limit 1
nicole
Jun 22, 2021, 11:18
#10
cc @yee, please take a look at the E_RPC_FAILURE issue.
@sataliulan, you may need to paste the graphd and storaged logs.
Here are the latest logs. 192.168.10.17 is the address I used to replace the original 127.0.0.1 in the config files.
(base) hcl@hcl-OMEN-by-HP-Desktop-873-0xxx:/usr/local/nebula/logs$ tail graphd-stderr.log
E0622 17:48:59.516737 4853 QueryInstance.cpp:103] Storage Error: part: 10, error: E_RPC_FAILURE(-3).
E0623 08:54:53.574343 4753 ThriftClientManager.inl:65] Failed to resolve address for '192.168.l0.17': Name or service not known (error=-2): Unknown error -2
E0623 08:54:53.574565 4753 StorageClientBase.inl:214] Request to "192.168.l0.17":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Channel is !good()
E0623 08:54:53.574651 4853 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 10
E0623 08:54:53.574672 4853 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 8
E0623 08:54:53.574688 4853 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 6
E0623 08:54:53.574698 4853 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 4
E0623 08:54:53.574707 4853 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 2
E0623 08:54:53.574734 4853 StorageAccessExecutor.h:112] Storage Error: part: 10, error: E_RPC_FAILURE(-3).
E0623 08:54:53.574774 4853 QueryInstance.cpp:103] Storage Error: part: 10, error: E_RPC_FAILURE(-3).
(base) hcl@hcl-OMEN-by-HP-Desktop-873-0xxx:/usr/local/nebula/logs$
nebula-graphd.conf configuration:
########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=192.168.10.17:9559,192.168.10.211:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=127.0.0.1
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# Seconds before the idle connections are closed, 0 for never closed
--client_idle_timeout_secs=0
# Seconds before the idle sessions are expired, 0 for no expiration
--session_idle_timeout_secs=60000
Below is the nebula-storaged.conf configuration:
########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=192.168.10.17:9559,192.168.10.211:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.l0.17
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# HTTP2 service port
--ws_h2_port=19780
# heartbeat with meta service
--heartbeat_interval_secs=10
Below is the nebula-metad.conf configuration:
########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=192.168.10.17:9559,192.168.10.211:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.10.17
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560
Below is the graphd.conf configuration on the other node:
########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=192.168.10.17:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.10.211
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# Seconds before the idle connections are closed, 0 for never closed
--client_idle_timeout_secs=0
# Seconds before the idle sessions are expired, 0 for no expiration
--session_idle_timeout_secs=0
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19669
# HTTP2 service port
--ws_h2_port=19670
Below is the nebula-storaged.conf configuration on the other node:
########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=192.168.10.17:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.10.211
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# HTTP2 service port
--ws_h2_port=19780
# heartbeat with meta service
--heartbeat_interval_secs=10
Below is the nebula-metad.conf configuration on the other node:
########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=192.168.10.17:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.10.211
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560
wey
Jun 23, 2021, 09:17
#13
The direct cause of this error code is that the RPC connection from the storage client on graphd was broken. For a query that is close to a full scan, there are two likely causes:
1. The connection timed out.
2. storaged could not keep the connection alive (crash, OOM).
For 1, you can increase storage_client_timeout_ms (see the sketch after the reference link below).
For 2, if you deployed with containers, check the storaged container's events and uptime (docker ps) to see whether it crashed; for a binary deployment, check dmesg | grep nebula on the OS where storaged runs.
If it is case 2 and you cannot scale up the hardware, try to optimize the query. If the statement cannot be changed (to save memory), post it here and we will take a look; that statement (e.g. a match) may have room for optimization in nebula core, and you can open an issue (we are actively optimizing match queries).
Reference documentation:
https://docs.nebula-graph.com.cn/2.0.1/2.quick-start/0.FAQ/#storage_error_e_rpc_failure
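A minimal sketch of the first remedy, assuming a binary installation under /usr/local/nebula (the path and the timeout value are only illustrative):

# In nebula-graphd.conf, raise the storage client RPC timeout (milliseconds):
--storage_client_timeout_ms=120000
# Restart graphd so the new flag takes effect:
/usr/local/nebula/scripts/nebula.service restart graphd
# For possibility 2 on a binary deployment, look for crashes/OOM kills of storaged:
dmesg | grep nebula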
I found in the logs that when configuring the IP address I had accidentally typed 192.168.10.17 as 192.168.l0.17. I have corrected it, but after restarting both nodes the same query still produces the same error in the log, even though I have double-checked all the config files and they are now correct. Is there a way to clear the old cache so the database picks up the new IP address?
The query I ran: fetch prop on Disease 1121 yield Disease.name
The error log is as follows:
E0623 18:20:06.761965 28924 ThriftClientManager.inl:65] Failed to resolve address for '192.168.l0.17': Name or service not known (error=-2): Unknown error -2
E0623 18:20:06.762097 28924 StorageClientBase.inl:214] Request to "192.168.l0.17":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Channel is !good()
E0623 18:20:06.762162 28945 StorageAccessExecutor.h:35] GetVerticesExecutor failed, error E_RPC_FAILURE, part 2
E0623 18:20:06.762178 28945 StorageAccessExecutor.h:112] Storage Error: part: 2, error: E_RPC_FAILURE(-3).
E0623 18:20:06.762197 28945 QueryInstance.cpp:103] Storage Error: part: 2, error: E_RPC_FAILURE(-3).
E0623 18:20:07.173538 28925 ThriftClientManager.inl:65] Failed to resolve address for '192.168.l0.17': Name or service not known (error=-2): Unknown error -2
E0623 18:20:07.173787 28925 StorageClientBase.inl:214] Request to "192.168.l0.17":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Channel is !good()
E0623 18:20:07.173828 28943 StorageAccessExecutor.h:35] GetVerticesExecutor failed, error E_RPC_FAILURE, part 2
E0623 18:20:07.173848 28943 StorageAccessExecutor.h:112] Storage Error: part: 2, error: E_RPC_FAILURE(-3).
E0623 18:20:07.173868 28944 QueryInstance.cpp:103] Storage Error: part: 2, error: E_RPC_FAILURE(-3).
root@hcl-OMEN-by-HP-Desktop-873-0xxx:/usr/local/nebula/logs#
Note: I have already changed 192.168.l0.17 to 192.168.10.17 in the config files and restarted the servers, but it still does not work.
This is the updated nebula-storaged configuration:
########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=192.168.10.17:9559,192.168.10.211:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.10.17
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# HTTP2 service port
--ws_h2_port=19780
# heartbeat with meta service
--heartbeat_interval_secs=10
########## Raft ##########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=50000
@wey I added --local_config=true at the top of the config file and started it with ./scripts/nebula.service -c etc/nebula-storaged.conf start storaged, but it had no effect; the graphd service log still shows the old, incorrect configuration.
yee
Jun 24, 2021, 03:17
#17
After adding --local_config=true to the config file, you no longer need to specify the conf file in the start script:
./scripts/nebula.service start storaged
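For completeness, a hedged sketch of the full sequence, assuming the default install path and that --local_config=true has been added to the graphd, storaged, and metad conf files on both nodes so that each daemon reads its local configuration file at startup:

cd /usr/local/nebula
./scripts/nebula.service stop all
./scripts/nebula.service start all
./scripts/nebula.service status all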