Exchange data import fails

Thericecookers · June 15, 2021, 05:09

I ran into this problem too, and traced it to the following code in exchange:

// $ nebula-spark-utils/nebula-exchange/src/main/scala/com/vesoft/nebula/exchange/Exchange.scala

    // reimport for failed tags and edges
    if (ErrorHandler.existError(configs.errorConfig.errorPath)) {
      val batchSuccess = spark.sparkContext.longAccumulator(s"batchSuccess.reimport")
      val batchFailure = spark.sparkContext.longAccumulator(s"batchFailure.reimport")
      val data         = spark.read.text(configs.errorConfig.errorPath)
      val processor    = new ReloadProcessor(data, configs, batchSuccess, batchFailure)
      processor.process()
      LOG.info(s"batchSuccess.reimport: ${batchSuccess.value}")
      LOG.info(s"batchFailure.reimport: ${batchFailure.value}")
    }

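This code appears to be what re-runs failed batches. I haven't read the logic inside it yet, but judging from my tests, information about failed tasks is cached: even if you kill the corresponding process, restarting nebula-exchange picks up the previously failed tasks again. Whether that is really how it works still needs to be checked against the code.
So my guess is that your problem arose because exchange was run once before application.conf had been modified, against some OCR data (possibly the default?). That task failed (most likely because the space was missing), and the failed batch information was cached. Every later invocation of exchange therefore finishes the currently configured tags/edges and then re-runs that earlier failed task, which is where Space was not chosen. comes from.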
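#Error when importing with spark exchange
It also seems this problem can be worked around by commenting out the code segment that re-runs failed batches.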
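
If the guess above is right, a less invasive check than commenting out the re-import block would be to look at what is left under the configured error path before starting exchange. Below is a minimal sketch of that idea. It assumes the error path is a plain local directory; `StaleErrorCheck` and `leftoverFiles` are hypothetical names of my own, not part of nebula-exchange, and the real `ErrorHandler` may read from a different filesystem.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper, not part of nebula-exchange: it only illustrates how
// leftovers from an earlier failed run could be detected before a new run.
object StaleErrorCheck {

  // Returns the entries found in errorDir. listFiles() yields null when the
  // path is missing or is not a directory, so that case maps to an empty list.
  def leftoverFiles(errorDir: File): List[File] =
    Option(errorDir.listFiles()).map(_.toList).getOrElse(Nil)

  def main(args: Array[String]): Unit = {
    // Assumption: a temporary directory stands in for the configured error path.
    val errorDir = Files.createTempDirectory("exchange-errors").toFile

    // Simulate an earlier run that left one failed batch behind.
    val failed = new File(errorDir, "failed-batch-0")
    Files.write(failed.toPath, "placeholder for a failed statement".getBytes("UTF-8"))

    val stale = leftoverFiles(errorDir)
    println(s"leftover error files before cleanup: ${stale.size}")

    // Removing the leftovers means a re-import step would find nothing to replay.
    stale.foreach(f => Files.delete(f.toPath))
    println(s"leftover error files after cleanup: ${leftoverFiles(errorDir).size}")
  }
}
```

If the directory is non-empty before a run, that would be consistent with the old failed task being replayed after the current tags/edges finish. Inspect those files before deleting them, since they may be the only record of which statements failed.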