spark reader timeout error

Deployment: single machine, compiled and deployed from source
Data volume: 16 million vertices, 15 million edges

Using the spark reader produces the following errors:

ERROR [main] - List Spaces Error: java.net.SocketTimeoutException: Read timed out
 INFO [main] - return prop set for label N100 : [StructField(_vertexId,StringType,false)]
 INFO [main] - vertex course schema: 
root
 |-- _vertexId: string (nullable = false)

 INFO [main] - vertex course schema111: 
 INFO [main] - Code generated in 261.686111 ms
 INFO [main] - Code generated in 23.007125 ms
 INFO [main] - Starting job: show at NebulaReaderExample.java:51
 INFO [dag-scheduler-event-loop] - Got job 0 (show at NebulaReaderExample.java:51) with 1 output partitions
 INFO [dag-scheduler-event-loop] - Final stage: ResultStage 0 (show at NebulaReaderExample.java:51)
 INFO [dag-scheduler-event-loop] - Parents of final stage: List()
 INFO [dag-scheduler-event-loop] - Missing parents: List()
 INFO [dag-scheduler-event-loop] - Submitting ResultStage 0 (MapPartitionsRDD[4] at show at NebulaReaderExample.java:51), which has no missing parents
 INFO [dag-scheduler-event-loop] - Block broadcast_0 stored as values in memory (estimated size 5.7 KB, free 2004.6 MB)
 INFO [dag-scheduler-event-loop] - Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.2 KB, free 2004.6 MB)
 INFO [dispatcher-event-loop-5] - Added broadcast_0_piece0 in memory on rrcs-172-254-221-193.nyc.biz.rr.com:55903 (size: 3.2 KB, free: 2004.6 MB)
 INFO [dag-scheduler-event-loop] - Created broadcast 0 from broadcast at DAGScheduler.scala:1161
 INFO [dag-scheduler-event-loop] - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at NebulaReaderExample.java:51) (first 15 tasks are for partitions Vector(0))
 INFO [dag-scheduler-event-loop] - Adding task set 0.0 with 1 tasks
 INFO [dispatcher-event-loop-6] - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7651 bytes)
 INFO [Executor task launch worker for task 0] - Running task 0.0 in stage 0.0 (TID 0)
ERROR [Executor task launch worker for task 0] - List Spaces Error: java.net.SocketTimeoutException: Read timed out
ERROR [Executor task launch worker for task 0] - Get Parts failed: java.net.SocketTimeoutException: Read timed out
 INFO [Executor task launch worker for task 0] - partition index: 0, scanPart: []
zz_1021
 INFO [Executor task launch worker for task 0] - Finished task 0.0 in stage 0.0 (TID 0). 1069 bytes result sent to driver
 INFO [task-result-getter-0] - Finished task 0.0 in stage 0.0 (TID 0) in 2110 ms on localhost (executor driver) (1/1)
 INFO [task-result-getter-0] - Removed TaskSet 0.0, whose tasks have all completed, from pool 
 INFO [dag-scheduler-event-loop] - ResultStage 0 (show at NebulaReaderExample.java:51) finished in 2.640 s
 INFO [main] - Job 0 finished: show at NebulaReaderExample.java:51, took 2.832551 s
 INFO [main] - Starting job: show at NebulaReaderExample.java:51
 INFO [dag-scheduler-event-loop] - Got job 1 (show at NebulaReaderExample.java:51) with 4 output partitions
 INFO [dag-scheduler-event-loop] - Final stage: ResultStage 1 (show at NebulaReaderExample.java:51)
 INFO [dag-scheduler-event-loop] - Parents of final stage: List()
 INFO [dag-scheduler-event-loop] - Missing parents: List()
 INFO [dag-scheduler-event-loop] - Submitting ResultStage 1 (MapPartitionsRDD[4] at show at NebulaReaderExample.java:51), which has no missing parents
 INFO [dag-scheduler-event-loop] - Block broadcast_1 stored as values in memory (estimated size 5.7 KB, free 2004.6 MB)
 INFO [dag-scheduler-event-loop] - Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.2 KB, free 2004.6 MB)
 INFO [dispatcher-event-loop-1] - Added broadcast_1_piece0 in memory on rrcs-172-254-221-193.nyc.biz.rr.com:55903 (size: 3.2 KB, free: 2004.6 MB)
 INFO [dag-scheduler-event-loop] - Created broadcast 1 from broadcast at DAGScheduler.scala:1161
 INFO [dag-scheduler-event-loop] - Submitting 4 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at show at NebulaReaderExample.java:51) (first 15 tasks are for partitions Vector(1, 2, 3, 4))
 INFO [dag-scheduler-event-loop] - Adding task set 1.0 with 4 tasks
 INFO [dispatcher-event-loop-2] - Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7651 bytes)
 INFO [dispatcher-event-loop-2] - Starting task 1.0 in stage 1.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7651 bytes)
 INFO [dispatcher-event-loop-2] - Starting task 2.0 in stage 1.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7651 bytes)
 INFO [dispatcher-event-loop-2] - Starting task 3.0 in stage 1.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7651 bytes)
 INFO [Executor task launch worker for task 1] - Running task 0.0 in stage 1.0 (TID 1)
 INFO [Executor task launch worker for task 2] - Running task 1.0 in stage 1.0 (TID 2)
 INFO [Executor task launch worker for task 3] - Running task 2.0 in stage 1.0 (TID 3)
 INFO [Executor task launch worker for task 4] - Running task 3.0 in stage 1.0 (TID 4)
ERROR [Executor task launch worker for task 1] - List Spaces Error: java.net.SocketTimeoutException: Read timed out
ERROR [Executor task launch worker for task 4] - List Spaces Error: java.net.SocketTimeoutException: Read timed out
ERROR [Executor task launch worker for task 2] - List Spaces Error: java.net.SocketTimeoutException: Read timed out
ERROR [Executor task launch worker for task 3] - List Spaces Error: java.net.SocketTimeoutException: Read timed out
ERROR [Executor task launch worker for task 4] - Get Parts failed: java.net.SocketTimeoutException: Read timed out
ERROR [Executor task launch worker for task 2] - Get Parts failed: java.net.SocketTimeoutException: Read timed out
 INFO [Executor task launch worker for task 4] - partition index: 4, scanPart: []

The meta client connection timed out. I suggest setting a longer timeout.

泥抠, where do I need to set this?

https://docs.nebula-graph.com.cn/manual-CN/3.build-develop-and-administration/3.configurations/1.config-persistency-and-priority/
My guess is that you need to edit the configuration files. The default path is:
/usr/local/nebula/etc/

OK, I'll give it a try. Thanks, amber.

I changed storage.conf, but that didn't help.
metad.conf has no timeout setting.

Give me a moment. I'll add an option to spark-connector for setting the timeout.

OK, thanks for the effort.

https://github.com/vesoft-inc/nebula-java/pull/204 You can use this for now to set a larger timeout. The timeout behind your error has a default value of 1000.
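
For anyone wondering where `java.net.SocketTimeoutException: Read timed out` comes from, here is a minimal sketch using only plain `java.net` sockets. It does not use the nebula-java client or the option added in the PR; the class `ReadTimeoutDemo`, the method `tryRead`, and the delay values are made up for illustration. A local server delays its reply, and the client's read either gives up or succeeds depending on how long it is allowed to wait.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a throwaway local server that waits replyDelayMs before
     * sending one byte (42). Returns that byte, or -1 if the client's read
     * timed out first.
     */
    static int tryRead(int replyDelayMs, int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread slowServer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    Thread.sleep(replyDelayMs);      // simulate a slow service
                    s.getOutputStream().write(42);
                    s.getOutputStream().flush();
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have given up and disconnected.
                }
            });
            slowServer.setDaemon(true);
            slowServer.start();

            try (Socket socket = new Socket(loopback, server.getLocalPort())) {
                // Upper bound, in milliseconds, on how long read() may block.
                socket.setSoTimeout(timeoutMs);
                return socket.getInputStream().read();
            } catch (SocketTimeoutException e) {
                System.out.println("Read failed: " + e);
                return -1;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Reply takes 500 ms, client waits only 100 ms: the read times out.
        System.out.println("short timeout -> " + tryRead(500, 100));
        // Same kind of slow reply, but the client waits up to 2000 ms.
        System.out.println("long timeout  -> " + tryRead(500, 2000));
    }
}
```

The server is equally slow in both calls; only the client's willingness to wait changes, which is why raising the client-side timeout can fix the error without touching the server configuration.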

OK, I'll try it.

泥抠, you're awesome~~ :rofl: