Vertices can't be queried after importing via SST

I'm not that familiar with Spark, to be honest.
I imported the data via SST with the Spark partition number set to 15 and repartitionWithNebula: true; each generated SST folder contains 24 files.
Are you sure the number of Spark partitions is the same as the number of parts in the space?

One more question: after I set repartitionWithNebula: false, the number of generated SST files ballooned to 360, even though the partition number is set to 15. The data volume is small, only a bit over a million rows, yet the import is very slow. Besides changing the repartitionWithNebula setting, is there any other way to bring that down?
Before the repartitionWithNebula change:
(screenshot)
After the repartitionWithNebula change:
(screenshot)


This import is way too slow…

This one needs the colleague who wrote the Spark tool to answer.

Doesn't the number of Spark partitions have nothing to do with the number of parts? I ask because I saw other posts mention this. The final SST output does contain 15 folders, matching the number of parts, but each folder holds 24 files, and I don't know why.

After setting repartitionWithNebula to true, clearing the space, and re-importing, the vertices are gone…

(root@nebula) [req_test1]> show stats
+---------+--------------------+---------+
| Type    | Name               | Count   |
+---------+--------------------+---------+
| "Tag"   | "address"          | 0       |
| "Tag"   | "company"          | 0       |
| "Tag"   | "email"            | 0       |
| "Tag"   | "idcard"           | 0       |
| "Tag"   | "ip"               | 0       |
| "Tag"   | "phone"            | 0       |
| "Tag"   | "reqno"            | 0       |
| "Edge"  | "addressid"        | 100000  |
| "Edge"  | "addressreqno"     | 100000  |
| "Edge"  | "companyid"        | 100000  |
| "Edge"  | "companyreqno"     | 100000  |
| "Edge"  | "emailid"          | 100000  |
| "Edge"  | "emailreqno"       | 100000  |
| "Edge"  | "idreqno"          | 100000  |
| "Edge"  | "ipid"             | 100000  |
| "Edge"  | "ipreqno"          | 100000  |
| "Edge"  | "phoneid"          | 100000  |
| "Edge"  | "phonereqno"       | 100000  |
| "Edge"  | "workaddressid"    | 100000  |
| "Edge"  | "workaddressreqno" | 100000  |
| "Edge"  | "workphoneid"      | 100000  |
| "Edge"  | "workphonereqno"   | 100000  |
| "Space" | "vertices"         | 740065  |
| "Space" | "edges"            | 1500000 |
+---------+--------------------+---------+
Got 24 rows (time spent 1218/2376 us)

Yet the Space "vertices" count has a value, and it happens to be the correct one…

The schema statements are as follows:

DROP SPACE IF EXISTS req_test1;
CREATE SPACE IF NOT EXISTS req_test1(partition_num=15, replica_factor=1, vid_type=FIXED_STRING(200));
USE req_test1;
CREATE TAG idcard(name string, word string, num int64, isblack int64);
CREATE TAG reqno(word string, name string, num int64);
CREATE TAG email(word string, name string, num int64);
CREATE TAG phone(word string, name string, num int64);
CREATE TAG address(word string, name string, num int64);
CREATE TAG company(word string, name string, num int64);
CREATE TAG ip(word string, name string, num int64);

CREATE EDGE idreqno(word string, eventtime int64);
CREATE EDGE emailid(word string, eventtime int64);
CREATE EDGE phoneid(word string, eventtime int64);
CREATE EDGE workphoneid(word string, eventtime int64);
CREATE EDGE addressid(word string, eventtime int64);
CREATE EDGE workaddressid(word string, eventtime int64);
CREATE EDGE companyid(word string, eventtime int64);
CREATE EDGE ipid(word string, eventtime int64);
CREATE EDGE emailreqno(word string, eventtime int64);
CREATE EDGE phonereqno(word string, eventtime int64);
CREATE EDGE workphonereqno(word string, eventtime int64);
CREATE EDGE addressreqno(word string, eventtime int64);
CREATE EDGE workaddressreqno(word string, eventtime int64);
CREATE EDGE companyreqno(word string, eventtime int64);
CREATE EDGE ipreqno(word string, eventtime int64);

The vids in the data include fixed strings such as email1, email2 … email900; the rest of the vids are random strings.
Random strings are hard to verify, so I checked the fixed ones: the vids from email100 onward can't be found.
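For the spot check I used a small helper along these lines (not part of Exchange; the function names are mine, and it assumes the headerless, comma-separated layout from the config above, with the vid in column _c0):

```python
import csv
import io
import re

def fixed_email_vids(csv_text):
    """Collect the deterministic 'emailN' vids from column _c0 of the
    headerless, comma-separated source data, skipping the random vids."""
    vids = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if row and re.fullmatch(r"email\d+", row[0]):
            vids.add(row[0])
    return sorted(vids, key=lambda v: int(v[len("email"):]))

def check_queries(vids):
    """Emit one MATCH statement per vid; any empty result in the console
    marks a vertex that went missing during the import."""
    return ['match (v) where id(v)=="%s" return v' % v for v in vids]

# Tiny inline sample standing in for email.csv (the real file also has
# random vids and many more rows).
sample = "email1,w1,n1,10\nrandomabc,w2,n2,11\nemail10,w3,n3,12\n"
for q in check_queries(fixed_email_vids(sample)):
    print(q)
```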


Were the SST files generated under 3.x? Or were they generated under 2.x and then ingested into 3.1? The data formats of the two versions are different.

I'm using the latest Exchange, version 3.0. Let me attach the data:
email.csv (4.4 MB)

One thing to double-check: is it with repartitionWithNebula set to false that all the tag counts come out as 0? If so, that's a known issue which will be fixed later; see:

With repartitionWithNebula set to true: fewer SST files, statistics are correct, but some vertices can't be queried.
With repartitionWithNebula set to false: more SST files, and the vertex statistics are 0.

We tried to reproduce the earlier missing-data issue with repartitionWithNebula: true but couldn't hit it, so please describe your steps again and we'll see whether we can reproduce it on our side.
repartitionWithNebula: true
(screenshot)

repartitionWithNebula: false
(screenshot)

How about trying it with my data and schema?

Exchange configuration:

    {
      name: email
      type: {
        source: csv
        sink: sst
      }
      path: "hdfs://10.100.2.90:8020/user/graph_dev/nodeedge/email.csv"
      fields: [_c1,_c2,_c3]
      nebula.fields: [word,name,num]
      vertex: {
        field:_c0
        # policy:hash
      }
      separator: ","
      header: false
      batch: 2560
      partition: 32
      repartitionWithNebula: true
    }

I download the generated files myself with a command and put them under the directory for the given space id, then run ingest manually.
After the import, query whether the vertices email1, email10, email500, and email900 exist.
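Concretely, the sequence is roughly the following; every path here is an illustrative stand-in (check your own storaged --data_path and HDFS output directory), not copied from my environment:

```shell
# Rough sketch of the manual download-and-ingest step; adjust all paths.

SPACE_ID=89                                # from `desc space` (ID column)
HDFS_OUT=hdfs://10.100.2.90:8020/sst_out   # Exchange SST output dir (assumption)
DATA_PATH=/usr/local/nebula/data/storage   # storaged data_path (assumption)

# 1. Pull the generated SST directories (one per part) off HDFS.
hdfs dfs -get "${HDFS_OUT}" /tmp/sst

# 2. Place them under the space's download directory on EVERY storaged host.
mkdir -p "${DATA_PATH}/nebula/${SPACE_ID}/download"
cp -r /tmp/sst/* "${DATA_PATH}/nebula/${SPACE_ID}/download/"

# 3. From nebula-console, trigger the ingest (as in the transcript below):
#    (root@nebula) [req_email]> submit job ingest
```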

Judging from your raw file, shouldn't the third column be the vid and the first column be name? Your config has the first column as vid and the third as name; is the column mapping misconfigured?

No, the first column really is the vid and the third column is name. A vid can appear more than once here, with different property values.

The repartition option was added later. When set to true, after reading the source data it shuffles and sorts, then repartitions the data according to Nebula's partitioning scheme: however many partitions Nebula has, that many SST files are generated, and it guarantees the keys of different SST files do not overlap.
When set to false, it generates n*m SST files based on the default shuffle partition count n of your Spark environment and the partition count m of the Nebula space, i.e. m directories with n SST files in each.
If the statistics are 0 after ingest, check the Exchange logs from SST generation to see whether the data for those tags failed during generation, leaving no SST data produced for them.
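The file-count arithmetic above can be sketched as follows (the hash is a deterministic placeholder, not Nebula's actual Murmur-based partitioner):

```python
import zlib

def nebula_part(vid, num_parts):
    """Stand-in for Nebula's vid -> part mapping. The real storaged uses a
    Murmur-style hash; what matters here is only that it is deterministic,
    so routing one Spark partition per Nebula part yields exactly one SST
    file per part, with no key overlap across files."""
    return zlib.crc32(vid.encode()) % num_parts + 1

def sst_file_count(nebula_parts, spark_shuffle_parts, repartition_with_nebula):
    """File counts as described above: true -> one file per Nebula part;
    false -> spark_shuffle_parts files in each of nebula_parts directories."""
    if repartition_with_nebula:
        return nebula_parts
    return nebula_parts * spark_shuffle_parts

print(sst_file_count(15, 24, True))    # 15 files, one per part
print(sst_file_count(15, 24, False))   # 360 files, matching what was observed
```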


This seems to be a known issue: the SST files generated by Exchange 3.0 only contain data without the tagID.
As things stand, there's no real reason to set repartition to false.


I've read those posts. The repartitionWithNebula: false case is a known bug, so let's set it aside.
I rebuilt the space and went through it again following the steps I gave above; it reproduces reliably.

(root@nebula) [req_test]> CREATE SPACE IF NOT EXISTS req_email(partition_num=15, replica_factor=1, vid_type=FIXED_STRING(200)); 
Execution succeeded (time spent 1681/2074 us)

Wed, 08 Jun 2022 18:17:56 CST

(root@nebula) [req_test]> USE req_email;
[ERROR (-1005)]: SpaceNotFound: 

Wed, 08 Jun 2022 18:17:58 CST

(root@nebula) [req_test]> USE req_email;
Execution succeeded (time spent 1182/1652 us)

Wed, 08 Jun 2022 18:18:15 CST

(root@nebula) [req_email]> CREATE TAG email(word string, name string, num int64);
Execution succeeded (time spent 1637/2067 us)

Wed, 08 Jun 2022 18:18:18 CST

(root@nebula) [req_email]> desc space req_email
+----+-------------+------------------+----------------+---------+------------+---------------------+---------+
| ID | Name        | Partition Number | Replica Factor | Charset | Collate    | Vid Type            | Comment |
+----+-------------+------------------+----------------+---------+------------+---------------------+---------+
| 89 | "req_email" | 15               | 1              | "utf8"  | "utf8_bin" | "FIXED_STRING(200)" |         |
+----+-------------+------------------+----------------+---------+------------+---------------------+---------+
Got 1 rows (time spent 1040/1848 us)

Wed, 08 Jun 2022 18:18:41 CST

(root@nebula) [req_email]> submit job ingest
+------------+
| New Job Id |
+------------+
| 90         |
+------------+
Got 1 rows (time spent 1901/2656 us)

Wed, 08 Jun 2022 18:23:23 CST

(root@nebula) [req_email]> 
(root@nebula) [req_email]> 
(root@nebula) [req_email]> show jobs
+--------+----------+------------+----------------------------+----------------------------+
| Job Id | Command  | Status     | Start Time                 | Stop Time                  |
+--------+----------+------------+----------------------------+----------------------------+
| 90     | "INGEST" | "FINISHED" | 2022-06-08T18:23:23.000000 | 2022-06-08T18:23:52.000000 |
+--------+----------+------------+----------------------------+----------------------------+
Got 1 rows (time spent 269436/270079 us)

Wed, 08 Jun 2022 18:23:57 CST

(root@nebula) [req_email]> 
(root@nebula) [req_email]> 
(root@nebula) [req_email]> match (v) where id(v)=="email900" return v 
+---+
| v |
+---+
+---+
Empty set (time spent 10651/11010 us)

Wed, 08 Jun 2022 18:24:23 CST

(root@nebula) [req_email]> match (v) where id(v)=="email100" return v 
+---+
| v |
+---+
+---+
Empty set (time spent 2231/2716 us)

Wed, 08 Jun 2022 18:24:26 CST

(root@nebula) [req_email]> submit job compact 
+------------+
| New Job Id |
+------------+
| 91         |
+------------+
Got 1 rows (time spent 1816/2398 us)

Wed, 08 Jun 2022 18:24:33 CST

(root@nebula) [req_email]> show jobs
+--------+-----------+------------+----------------------------+----------------------------+
| Job Id | Command   | Status     | Start Time                 | Stop Time                  |
+--------+-----------+------------+----------------------------+----------------------------+
| 91     | "COMPACT" | "FINISHED" | 2022-06-08T18:24:33.000000 | 2022-06-08T18:24:37.000000 |
| 90     | "INGEST"  | "FINISHED" | 2022-06-08T18:23:23.000000 | 2022-06-08T18:23:52.000000 |
+--------+-----------+------------+----------------------------+----------------------------+
Got 2 rows (time spent 1443/2154 us)

Wed, 08 Jun 2022 18:24:38 CST

(root@nebula) [req_email]> submit job stats
+------------+
| New Job Id |
+------------+
| 92         |
+------------+
Got 1 rows (time spent 1547/2103 us)

Wed, 08 Jun 2022 18:24:43 CST

(root@nebula) [req_email]> show jobs
+--------+-----------+------------+----------------------------+----------------------------+
| Job Id | Command   | Status     | Start Time                 | Stop Time                  |
+--------+-----------+------------+----------------------------+----------------------------+
| 92     | "STATS"   | "FINISHED" | 2022-06-08T18:24:43.000000 | 2022-06-08T18:24:43.000000 |
| 91     | "COMPACT" | "FINISHED" | 2022-06-08T18:24:33.000000 | 2022-06-08T18:24:37.000000 |
| 90     | "INGEST"  | "FINISHED" | 2022-06-08T18:23:23.000000 | 2022-06-08T18:23:52.000000 |
+--------+-----------+------------+----------------------------+----------------------------+
Got 3 rows (time spent 1444/2216 us)

Wed, 08 Jun 2022 18:24:46 CST

(root@nebula) [req_email]> show stats
+---------+------------+-------+
| Type    | Name       | Count |
+---------+------------+-------+
| "Tag"   | "email"    | 81825 |
| "Space" | "vertices" | 81825 |
| "Space" | "edges"    | 0     |
+---------+------------+-------+
Got 3 rows (time spent 1049/1661 us)

Wed, 08 Jun 2022 18:24:49 CST

(root@nebula) [req_email]> match (v) where id(v)=="email100" return v 
+---+
| v |
+---+
+---+
Empty set (time spent 2183/2778 us)

Wed, 08 Jun 2022 18:25:00 CST
(root@nebula) [req_email]> match (v) where id(v)=="email10" return v 
+----------------------------------------------------------------------+
| v                                                                    |
+----------------------------------------------------------------------+
| ("email10" :email{name: "email75213", num: 6397, word: "但是历史."}) |
+----------------------------------------------------------------------+
Got 1 rows (time spent 1989/2732 us)

Wed, 08 Jun 2022 18:30:50 CST

Spark execution log; I looked through it and there's nothing abnormal:

22/06/08 10:17:29 INFO cluster.YarnClientSchedulerBackend: Application application_1654594985832_0109 has started running.
22/06/08 10:17:29 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44220.
22/06/08 10:17:29 INFO netty.NettyBlockTransferService: Server created on 10.100.2.189:44220
22/06/08 10:17:29 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/06/08 10:17:30 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.100.2.189, 44220, None)
22/06/08 10:17:30 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.100.2.189:44220 with 366.3 MB RAM, BlockManagerId(driver, 10.100.2.189, 44220, None)
22/06/08 10:17:30 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.100.2.189, 44220, None)
22/06/08 10:17:30 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.100.2.189, 44220, None)
22/06/08 10:17:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28ee882c{/metrics/json,null,AVAILABLE,@Spark}
22/06/08 10:17:30 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> nd1,nd2, PROXY_URI_BASES -> http://nd1:8088/proxy/application_1654594985832_0109,http://nd2:8088/proxy/application_1654594985832_0109, RM_HA_URLS -> nd1:8088,nd2:8088), /proxy/application_1654594985832_0109
22/06/08 10:17:30 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
22/06/08 10:17:35 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.100.2.90:59980) with ID 1
22/06/08 10:17:35 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
22/06/08 10:17:35 INFO exchange.Exchange$: Processing Tag email
22/06/08 10:17:35 INFO exchange.Exchange$: field keys: _c1, _c2, _c3
22/06/08 10:17:35 INFO exchange.Exchange$: nebula keys: word, name, num
22/06/08 10:17:35 INFO exchange.Exchange$: Loading CSV files from hdfs://10.100.2.90:8020/user/graph_dev/nodeedge/email.csv
22/06/08 10:17:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager nd1:40011 with 912.3 MB RAM, BlockManagerId(1, nd1, 40011, None)
22/06/08 10:17:35 INFO internal.SharedState: loading hive config file: file:/home/frms/hadoop-2.6.0-cdh5.14.2/etc/hadoop/hive-site.xml
22/06/08 10:17:35 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
22/06/08 10:17:35 INFO internal.SharedState: Warehouse path is '/user/hive/warehouse'.
22/06/08 10:17:35 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
22/06/08 10:17:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ca3648{/SQL,null,AVAILABLE,@Spark}
22/06/08 10:17:35 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
22/06/08 10:17:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71c0b742{/SQL/json,null,AVAILABLE,@Spark}
22/06/08 10:17:35 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
22/06/08 10:17:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25ce435{/SQL/execution,null,AVAILABLE,@Spark}
22/06/08 10:17:35 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
22/06/08 10:17:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19ad75e5{/SQL/execution/json,null,AVAILABLE,@Spark}
22/06/08 10:17:35 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
22/06/08 10:17:35 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9690008{/static/sql,null,AVAILABLE,@Spark}
22/06/08 10:17:35 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
22/06/08 10:17:36 INFO datasources.InMemoryFileIndex: It took 71 ms to list leaf files for 1 paths.
22/06/08 10:17:36 INFO datasources.InMemoryFileIndex: It took 4 ms to list leaf files for 1 paths.
22/06/08 10:17:36 INFO datasources.FileSourceStrategy: Pruning directories with: 
22/06/08 10:17:36 INFO datasources.FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
22/06/08 10:17:36 INFO datasources.FileSourceStrategy: Output Data Schema: struct<value: string>
22/06/08 10:17:36 INFO execution.FileSourceScanExec: Pushed Filters: 
22/06/08 10:17:37 INFO codegen.CodeGenerator: Code generated in 183.06651 ms
22/06/08 10:17:37 INFO codegen.CodeGenerator: Code generated in 17.466741 ms
22/06/08 10:17:37 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 291.3 KB, free 366.0 MB)
22/06/08 10:17:37 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 27.3 KB, free 366.0 MB)
22/06/08 10:17:37 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.100.2.189:44220 (size: 27.3 KB, free: 366.3 MB)
22/06/08 10:17:37 INFO spark.SparkContext: Created broadcast 0 from csv at FileBaseReader.scala:86
22/06/08 10:17:37 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
22/06/08 10:17:37 INFO spark.SparkContext: Starting job: csv at FileBaseReader.scala:86
22/06/08 10:17:37 INFO scheduler.DAGScheduler: Got job 0 (csv at FileBaseReader.scala:86) with 1 output partitions
22/06/08 10:17:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (csv at FileBaseReader.scala:86)
22/06/08 10:17:37 INFO scheduler.DAGScheduler: Parents of final stage: List()
22/06/08 10:17:37 INFO scheduler.DAGScheduler: Missing parents: List()
22/06/08 10:17:37 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at csv at FileBaseReader.scala:86), which has no missing parents
22/06/08 10:17:37 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.9 KB, free 366.0 MB)
22/06/08 10:17:37 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.6 KB, free 366.0 MB)
22/06/08 10:17:37 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.100.2.189:44220 (size: 4.6 KB, free: 366.3 MB)
22/06/08 10:17:37 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
22/06/08 10:17:37 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at csv at FileBaseReader.scala:86) (first 15 tasks are for partitions Vector(0))
22/06/08 10:17:37 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
22/06/08 10:17:37 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, nd1, executor 1, partition 0, NODE_LOCAL, 8284 bytes)
22/06/08 10:17:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on nd1:40011 (size: 4.6 KB, free: 912.3 MB)
22/06/08 10:17:41 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on nd1:40011 (size: 27.3 KB, free: 912.3 MB)
22/06/08 10:17:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4098 ms on nd1 (executor 1) (1/1)
22/06/08 10:17:42 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
22/06/08 10:17:42 INFO scheduler.DAGScheduler: ResultStage 0 (csv at FileBaseReader.scala:86) finished in 4.183 s
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Job 0 finished: csv at FileBaseReader.scala:86, took 4.222972 s
22/06/08 10:17:42 INFO datasources.FileSourceStrategy: Pruning directories with: 
22/06/08 10:17:42 INFO datasources.FileSourceStrategy: Post-Scan Filters: 
22/06/08 10:17:42 INFO datasources.FileSourceStrategy: Output Data Schema: struct<value: string>
22/06/08 10:17:42 INFO execution.FileSourceScanExec: Pushed Filters: 
22/06/08 10:17:42 INFO codegen.CodeGenerator: Code generated in 7.195269 ms
22/06/08 10:17:42 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 291.3 KB, free 365.7 MB)
22/06/08 10:17:42 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 27.3 KB, free 365.7 MB)
22/06/08 10:17:42 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.100.2.189:44220 (size: 27.3 KB, free: 366.2 MB)
22/06/08 10:17:42 INFO spark.SparkContext: Created broadcast 2 from csv at FileBaseReader.scala:86
22/06/08 10:17:42 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
22/06/08 10:17:42 INFO datasources.FileSourceStrategy: Pruning directories with: 
22/06/08 10:17:42 INFO datasources.FileSourceStrategy: Post-Scan Filters: 
22/06/08 10:17:42 INFO datasources.FileSourceStrategy: Output Data Schema: struct<_c0: string, _c1: string, _c2: string, _c3: string ... 2 more fields>
22/06/08 10:17:42 INFO execution.FileSourceScanExec: Pushed Filters: 
22/06/08 10:17:42 INFO codegen.CodeGenerator: Code generated in 16.786698 ms
22/06/08 10:17:42 INFO codegen.CodeGenerator: Code generated in 12.06962 ms
22/06/08 10:17:42 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 291.2 KB, free 365.4 MB)
22/06/08 10:17:42 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 27.2 KB, free 365.4 MB)
22/06/08 10:17:42 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.100.2.189:44220 (size: 27.2 KB, free: 366.2 MB)
22/06/08 10:17:42 INFO spark.SparkContext: Created broadcast 3 from rdd at Processor.scala:241
22/06/08 10:17:42 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
22/06/08 10:17:42 INFO codegen.CodeGenerator: Code generated in 27.116243 ms
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 17
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 26
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 23
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 25
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 30
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 24
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 16
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 15
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 35
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 11
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 34
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 12
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 33
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 38
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 27
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 13
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 19
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 29
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 10
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 28
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 5
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 0
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 40
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 6
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 32
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 4
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 1
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 31
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 8
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 3
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 14
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 39
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 7
22/06/08 10:17:42 INFO spark.SparkContext: Starting job: foreachPartition at VerticesProcessor.scala:145
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Registering RDD 13 (rdd at Processor.scala:241) as input to shuffle 0
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Registering RDD 17 (rdd at Processor.scala:241) as input to shuffle 1
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Registering RDD 25 (rdd at Processor.scala:241) as input to shuffle 2
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Got job 1 (foreachPartition at VerticesProcessor.scala:145) with 15 output partitions
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (foreachPartition at VerticesProcessor.scala:145)
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3)
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[13] at rdd at Processor.scala:241), which has no missing parents
22/06/08 10:17:42 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 10.100.2.189:44220 in memory (size: 27.3 KB, free: 366.2 MB)
22/06/08 10:17:42 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 12.1 KB, free 365.7 MB)
22/06/08 10:17:42 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 6.6 KB, free 365.6 MB)
22/06/08 10:17:42 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.100.2.189:44220 (size: 6.6 KB, free: 366.2 MB)
22/06/08 10:17:42 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
22/06/08 10:17:42 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[13] at rdd at Processor.scala:241) (first 15 tasks are for partitions Vector(0, 1))
22/06/08 10:17:42 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
22/06/08 10:17:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, nd1, executor 1, partition 0, NODE_LOCAL, 8273 bytes)
22/06/08 10:17:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, nd1, executor 1, partition 1, NODE_LOCAL, 8273 bytes)
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 22
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 18
22/06/08 10:17:42 INFO spark.ContextCleaner: Cleaned accumulator 2
22/06/08 10:17:42 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 10.100.2.189:44220 in memory (size: 4.6 KB, free: 366.2 MB)
22/06/08 10:17:42 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on nd1:40011 in memory (size: 4.6 KB, free: 912.3 MB)
22/06/08 10:17:43 INFO spark.ContextCleaner: Cleaned accumulator 20
22/06/08 10:17:43 INFO spark.ContextCleaner: Cleaned accumulator 9
22/06/08 10:17:43 INFO spark.ContextCleaner: Cleaned accumulator 21
22/06/08 10:17:43 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 10.100.2.189:44220 in memory (size: 27.3 KB, free: 366.3 MB)
22/06/08 10:17:43 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on nd1:40011 in memory (size: 27.3 KB, free: 912.3 MB)
22/06/08 10:17:43 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on nd1:40011 (size: 6.6 KB, free: 912.3 MB)
22/06/08 10:17:43 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on nd1:40011 (size: 27.2 KB, free: 912.3 MB)
22/06/08 10:17:43 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 567 ms on nd1 (executor 1) (1/2)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1021 ms on nd1 (executor 1) (2/2)
22/06/08 10:17:44 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
22/06/08 10:17:44 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (rdd at Processor.scala:241) finished in 1.036 s
22/06/08 10:17:44 INFO scheduler.DAGScheduler: looking for newly runnable stages
22/06/08 10:17:44 INFO scheduler.DAGScheduler: running: Set()
22/06/08 10:17:44 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 2, ShuffleMapStage 3, ResultStage 4)
22/06/08 10:17:44 INFO scheduler.DAGScheduler: failed: Set()
22/06/08 10:17:44 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[17] at rdd at Processor.scala:241), which has no missing parents
22/06/08 10:17:44 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 19.9 KB, free 366.0 MB)
22/06/08 10:17:44 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 10.0 KB, free 365.9 MB)
22/06/08 10:17:44 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.100.2.189:44220 (size: 10.0 KB, free: 366.3 MB)
22/06/08 10:17:44 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1184
22/06/08 10:17:44 INFO scheduler.DAGScheduler: Submitting 32 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[17] at rdd at Processor.scala:241) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
22/06/08 10:17:44 INFO cluster.YarnScheduler: Adding task set 2.0 with 32 tasks
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3, nd1, executor 1, partition 0, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 4, nd1, executor 1, partition 1, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 2.0 (TID 5, nd1, executor 1, partition 2, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 2.0 (TID 6, nd1, executor 1, partition 3, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 2.0 (TID 7, nd1, executor 1, partition 4, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 2.0 (TID 8, nd1, executor 1, partition 5, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 2.0 (TID 9, nd1, executor 1, partition 6, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 2.0 (TID 10, nd1, executor 1, partition 7, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 2.0 (TID 11, nd1, executor 1, partition 8, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 2.0 (TID 12, nd1, executor 1, partition 9, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on nd1:40011 (size: 10.0 KB, free: 912.3 MB)
22/06/08 10:17:44 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.100.2.90:59980
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 10.0 in stage 2.0 (TID 13, nd1, executor 1, partition 10, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 11.0 in stage 2.0 (TID 14, nd1, executor 1, partition 11, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 2.0 (TID 15, nd1, executor 1, partition 12, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 2.0 (TID 8) in 746 ms on nd1 (executor 1) (1/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 4) in 748 ms on nd1 (executor 1) (2/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 2.0 (TID 6) in 748 ms on nd1 (executor 1) (3/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 2.0 (TID 16, nd1, executor 1, partition 13, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 2.0 (TID 9) in 753 ms on nd1 (executor 1) (4/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 2.0 (TID 17, nd1, executor 1, partition 14, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 15.0 in stage 2.0 (TID 18, nd1, executor 1, partition 15, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 2.0 (TID 7) in 757 ms on nd1 (executor 1) (5/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 2.0 (TID 11) in 755 ms on nd1 (executor 1) (6/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 16.0 in stage 2.0 (TID 19, nd1, executor 1, partition 16, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 2.0 (TID 12) in 772 ms on nd1 (executor 1) (7/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 17.0 in stage 2.0 (TID 20, nd1, executor 1, partition 17, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 2.0 (TID 10) in 773 ms on nd1 (executor 1) (8/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 18.0 in stage 2.0 (TID 21, nd1, executor 1, partition 18, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 19.0 in stage 2.0 (TID 22, nd1, executor 1, partition 19, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 2.0 (TID 5) in 778 ms on nd1 (executor 1) (9/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 3) in 780 ms on nd1 (executor 1) (10/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 20.0 in stage 2.0 (TID 23, nd1, executor 1, partition 20, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 2.0 (TID 14) in 109 ms on nd1 (executor 1) (11/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 21.0 in stage 2.0 (TID 24, nd1, executor 1, partition 21, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 2.0 (TID 15) in 113 ms on nd1 (executor 1) (12/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 22.0 in stage 2.0 (TID 25, nd1, executor 1, partition 22, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 10.0 in stage 2.0 (TID 13) in 123 ms on nd1 (executor 1) (13/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 23.0 in stage 2.0 (TID 26, nd1, executor 1, partition 23, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 15.0 in stage 2.0 (TID 18) in 130 ms on nd1 (executor 1) (14/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 24.0 in stage 2.0 (TID 27, nd1, executor 1, partition 24, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 16.0 in stage 2.0 (TID 19) in 121 ms on nd1 (executor 1) (15/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 25.0 in stage 2.0 (TID 28, nd1, executor 1, partition 25, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 13.0 in stage 2.0 (TID 16) in 141 ms on nd1 (executor 1) (16/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 26.0 in stage 2.0 (TID 29, nd1, executor 1, partition 26, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 14.0 in stage 2.0 (TID 17) in 146 ms on nd1 (executor 1) (17/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 27.0 in stage 2.0 (TID 30, nd1, executor 1, partition 27, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 28.0 in stage 2.0 (TID 31, nd1, executor 1, partition 28, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 29.0 in stage 2.0 (TID 32, nd1, executor 1, partition 29, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 18.0 in stage 2.0 (TID 21) in 126 ms on nd1 (executor 1) (18/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 17.0 in stage 2.0 (TID 20) in 130 ms on nd1 (executor 1) (19/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 19.0 in stage 2.0 (TID 22) in 126 ms on nd1 (executor 1) (20/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 30.0 in stage 2.0 (TID 33, nd1, executor 1, partition 30, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 22.0 in stage 2.0 (TID 25) in 64 ms on nd1 (executor 1) (21/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Starting task 31.0 in stage 2.0 (TID 34, nd1, executor 1, partition 31, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 20.0 in stage 2.0 (TID 23) in 82 ms on nd1 (executor 1) (22/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 21.0 in stage 2.0 (TID 24) in 78 ms on nd1 (executor 1) (23/32)
22/06/08 10:17:44 INFO scheduler.TaskSetManager: Finished task 23.0 in stage 2.0 (TID 26) in 60 ms on nd1 (executor 1) (24/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 26.0 in stage 2.0 (TID 29) in 73 ms on nd1 (executor 1) (25/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 25.0 in stage 2.0 (TID 28) in 80 ms on nd1 (executor 1) (26/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 27.0 in stage 2.0 (TID 30) in 75 ms on nd1 (executor 1) (27/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 28.0 in stage 2.0 (TID 31) in 75 ms on nd1 (executor 1) (28/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 24.0 in stage 2.0 (TID 27) in 84 ms on nd1 (executor 1) (29/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 29.0 in stage 2.0 (TID 32) in 75 ms on nd1 (executor 1) (30/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 30.0 in stage 2.0 (TID 33) in 51 ms on nd1 (executor 1) (31/32)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Finished task 31.0 in stage 2.0 (TID 34) in 46 ms on nd1 (executor 1) (32/32)
22/06/08 10:17:45 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool 
22/06/08 10:17:45 INFO scheduler.DAGScheduler: ShuffleMapStage 2 (rdd at Processor.scala:241) finished in 0.995 s
22/06/08 10:17:45 INFO scheduler.DAGScheduler: looking for newly runnable stages
22/06/08 10:17:45 INFO scheduler.DAGScheduler: running: Set()
22/06/08 10:17:45 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 3, ResultStage 4)
22/06/08 10:17:45 INFO scheduler.DAGScheduler: failed: Set()
22/06/08 10:17:45 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[25] at rdd at Processor.scala:241), which has no missing parents
22/06/08 10:17:45 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 38.6 KB, free 365.9 MB)
22/06/08 10:17:45 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 19.0 KB, free 365.9 MB)
22/06/08 10:17:45 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.100.2.189:44220 (size: 19.0 KB, free: 366.2 MB)
22/06/08 10:17:45 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1184
22/06/08 10:17:45 INFO scheduler.DAGScheduler: Submitting 15 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[25] at rdd at Processor.scala:241) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
22/06/08 10:17:45 INFO cluster.YarnScheduler: Adding task set 3.0 with 15 tasks
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 35, nd1, executor 1, partition 0, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 36, nd1, executor 1, partition 1, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 3.0 (TID 37, nd1, executor 1, partition 2, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 3.0 (TID 38, nd1, executor 1, partition 3, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 3.0 (TID 39, nd1, executor 1, partition 4, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 3.0 (TID 40, nd1, executor 1, partition 5, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 3.0 (TID 41, nd1, executor 1, partition 6, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 3.0 (TID 42, nd1, executor 1, partition 7, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 3.0 (TID 43, nd1, executor 1, partition 8, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 3.0 (TID 44, nd1, executor 1, partition 9, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:45 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on nd1:40011 (size: 19.0 KB, free: 912.2 MB)
22/06/08 10:17:45 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.100.2.90:59980
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 10.0 in stage 3.0 (TID 45, nd1, executor 1, partition 10, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 36) in 2066 ms on nd1 (executor 1) (1/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 11.0 in stage 3.0 (TID 46, nd1, executor 1, partition 11, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 3.0 (TID 43) in 2068 ms on nd1 (executor 1) (2/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 3.0 (TID 47, nd1, executor 1, partition 12, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 3.0 (TID 37) in 2072 ms on nd1 (executor 1) (3/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 3.0 (TID 48, nd1, executor 1, partition 13, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 3.0 (TID 44) in 2090 ms on nd1 (executor 1) (4/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 3.0 (TID 49, nd1, executor 1, partition 14, NODE_LOCAL, 7767 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 3.0 (TID 39) in 2106 ms on nd1 (executor 1) (5/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 35) in 2108 ms on nd1 (executor 1) (6/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 3.0 (TID 41) in 2108 ms on nd1 (executor 1) (7/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 3.0 (TID 42) in 2124 ms on nd1 (executor 1) (8/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 3.0 (TID 40) in 2140 ms on nd1 (executor 1) (9/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 3.0 (TID 38) in 2146 ms on nd1 (executor 1) (10/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 10.0 in stage 3.0 (TID 45) in 252 ms on nd1 (executor 1) (11/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 3.0 (TID 47) in 246 ms on nd1 (executor 1) (12/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 13.0 in stage 3.0 (TID 48) in 233 ms on nd1 (executor 1) (13/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 14.0 in stage 3.0 (TID 49) in 222 ms on nd1 (executor 1) (14/15)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 3.0 (TID 46) in 261 ms on nd1 (executor 1) (15/15)
22/06/08 10:17:47 INFO cluster.YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool 
22/06/08 10:17:47 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (rdd at Processor.scala:241) finished in 2.356 s
22/06/08 10:17:47 INFO scheduler.DAGScheduler: looking for newly runnable stages
22/06/08 10:17:47 INFO scheduler.DAGScheduler: running: Set()
22/06/08 10:17:47 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 4)
22/06/08 10:17:47 INFO scheduler.DAGScheduler: failed: Set()
22/06/08 10:17:47 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[32] at foreachPartition at VerticesProcessor.scala:145), which has no missing parents
22/06/08 10:17:47 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 34.0 KB, free 365.9 MB)
22/06/08 10:17:47 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 15.0 KB, free 365.8 MB)
22/06/08 10:17:47 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.100.2.189:44220 (size: 15.0 KB, free: 366.2 MB)
22/06/08 10:17:47 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1184
22/06/08 10:17:47 INFO scheduler.DAGScheduler: Submitting 15 missing tasks from ResultStage 4 (MapPartitionsRDD[32] at foreachPartition at VerticesProcessor.scala:145) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
22/06/08 10:17:47 INFO cluster.YarnScheduler: Adding task set 4.0 with 15 tasks
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 50, nd1, executor 1, partition 0, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 51, nd1, executor 1, partition 1, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 4.0 (TID 52, nd1, executor 1, partition 2, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 4.0 (TID 53, nd1, executor 1, partition 3, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 4.0 (TID 54, nd1, executor 1, partition 4, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 4.0 (TID 55, nd1, executor 1, partition 5, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 4.0 (TID 56, nd1, executor 1, partition 6, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 4.0 (TID 57, nd1, executor 1, partition 7, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 4.0 (TID 58, nd1, executor 1, partition 8, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 4.0 (TID 59, nd1, executor 1, partition 9, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:47 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on nd1:40011 (size: 15.0 KB, free: 912.2 MB)
22/06/08 10:17:47 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.100.2.90:59980
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Starting task 10.0 in stage 4.0 (TID 60, nd1, executor 1, partition 10, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 4.0 (TID 58) in 786 ms on nd1 (executor 1) (1/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Starting task 11.0 in stage 4.0 (TID 61, nd1, executor 1, partition 11, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 4.0 (TID 52) in 787 ms on nd1 (executor 1) (2/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 4.0 (TID 62, nd1, executor 1, partition 12, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 4.0 (TID 54) in 787 ms on nd1 (executor 1) (3/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 4.0 (TID 63, nd1, executor 1, partition 13, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 4.0 (TID 57) in 788 ms on nd1 (executor 1) (4/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 4.0 (TID 64, nd1, executor 1, partition 14, NODE_LOCAL, 7673 bytes)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 4.0 (TID 53) in 790 ms on nd1 (executor 1) (5/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 50) in 792 ms on nd1 (executor 1) (6/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 51) in 791 ms on nd1 (executor 1) (7/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 4.0 (TID 59) in 791 ms on nd1 (executor 1) (8/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 4.0 (TID 55) in 791 ms on nd1 (executor 1) (9/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 4.0 (TID 56) in 791 ms on nd1 (executor 1) (10/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 4.0 (TID 61) in 196 ms on nd1 (executor 1) (11/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 4.0 (TID 62) in 196 ms on nd1 (executor 1) (12/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 14.0 in stage 4.0 (TID 64) in 196 ms on nd1 (executor 1) (13/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 10.0 in stage 4.0 (TID 60) in 199 ms on nd1 (executor 1) (14/15)
22/06/08 10:17:48 INFO scheduler.TaskSetManager: Finished task 13.0 in stage 4.0 (TID 63) in 199 ms on nd1 (executor 1) (15/15)
22/06/08 10:17:48 INFO cluster.YarnScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool 
22/06/08 10:17:48 INFO scheduler.DAGScheduler: ResultStage 4 (foreachPartition at VerticesProcessor.scala:145) finished in 1.002 s
22/06/08 10:17:48 INFO scheduler.DAGScheduler: Job 1 finished: foreachPartition at VerticesProcessor.scala:145, took 5.430672 s
22/06/08 10:17:48 INFO exchange.Exchange$: import for tag email cost time: 6.26 s
22/06/08 10:17:48 INFO exchange.Exchange$: SST-Import: failure.email: 0
22/06/08 10:17:48 WARN exchange.Exchange$: Edge is not defined
22/06/08 10:17:48 INFO server.AbstractConnector: Stopped Spark@31447150{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/06/08 10:17:48 INFO ui.SparkUI: Stopped Spark web UI at http://10.100.2.189:4040
22/06/08 10:17:48 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/06/08 10:17:48 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/06/08 10:17:48 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/06/08 10:17:48 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/06/08 10:17:48 INFO cluster.YarnClientSchedulerBackend: Stopped
22/06/08 10:17:48 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/06/08 10:17:48 INFO memory.MemoryStore: MemoryStore cleared
22/06/08 10:17:48 INFO storage.BlockManager: BlockManager stopped
22/06/08 10:17:48 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/06/08 10:17:48 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/06/08 10:17:48 INFO spark.SparkContext: Successfully stopped SparkContext
22/06/08 10:17:48 INFO util.ShutdownHookManager: Shutdown hook called
22/06/08 10:17:48 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5007ddfb-c0af-4ae8-94c5-1a05a79d58d1
22/06/08 10:17:48 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d88830f6-d805-4311-8561-1e65cab67da8

The counts from `show stats` look correct, but some vids cannot be queried.

We've reproduced this on our side and are investigating the cause.


@codelone This has been located. The algorithm Exchange uses to compute the partition id from the vid when generating SST files has a bug: when the string vid is exactly 8 bytes long, the calculation is wrong. As a result, the data is written into Nebula, but fetch or match may fail to find it, while stats and lookup can still see it because they scan the data directly.
I'll file an issue and fix this in Exchange later. For now, you can use client mode instead.
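To make the failure mode concrete, here is a minimal sketch of how a wrong partition-id calculation in the SST writer breaks point reads but not scans. This is not Exchange's actual code: the FNV-1a hash and the "reinterpret 8-byte vids as int64" branch are stand-ins for illustration (Nebula actually uses MurmurHash2), but the mechanism is the same — if the writer and the server disagree on `hash(vid) % partition_num`, the row lands in a partition the server never looks at during a fetch, while a full scan (stats/lookup) still sees it.

```python
NUM_PARTS = 15  # matches partition_num in the space


def fnv1a64(data: bytes) -> int:
    """Stand-in string hash (the real system uses MurmurHash2)."""
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def server_part(vid: str) -> int:
    """Partition id as the storage server computes it on every fetch."""
    return fnv1a64(vid.encode()) % NUM_PARTS + 1


def buggy_writer_part(vid: str) -> int:
    """Hypothetical buggy SST-writer path: an 8-byte string vid is
    mistakenly reinterpreted as an int64 instead of being hashed."""
    b = vid.encode()
    if len(b) == 8:
        return int.from_bytes(b, "little") % NUM_PARTS + 1
    return server_part(vid)


# Any 8-char vid where the two disagree is written to the wrong
# partition: stats/lookup (scan) see it, fetch/match (point read) miss.
mismatched = [v for v in ("u%07d" % i for i in range(50))
              if buggy_writer_part(v) != server_part(v)]
```

Vids of any other length take the same code path in both functions, which is why only some vids "disappeared" in the thread above.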
Thank you very much!
