Question template:
- nebula version: 1.0.1
- Deployment (distributed / standalone / Docker / DBaaS): distributed
- How the problem Space was created: CREATE SPACE alecliu_test (partition_num = 100, replica_factor = 1, charset = utf8, collate = utf8_bin)
- Description of the problem:
  Importing with Spark Writer in YARN-client mode completes without any exception, but no data shows up in the graph database. The data source is Hive.
- Config file:
{
  spark: {
    app: {
      name: Spark Writer
    }
    driver: {
      cores: 8
      maxResultSize: 8G
    }
    cores {
      max: 64
    }
  }
  nebula: {
    # List of query engine (graphd) addresses
    addresses: ["xxxxxxx"]
    # Username and password for connecting to the Nebula Graph service
    user: xxxx
    pswd: xxxxxx
    # Nebula Graph space name
    space: alecliu_test
    # Thrift timeout and retry count;
    # if unset, they default to 3000 and 3 respectively
    connection {
      timeout: 3000
      retry: 3
    }
    execution {
      retry: 3
    }
  }
  tags: {
    # Similar to the above
    # Loading from Hive runs the statement ${exec} to produce the dataset
    player: {
      type: hive
      exec: "select cast(dst_dataset_id as bigint) as pid, hive_user, dst_dataset_id from imd_datamap_safe.lineage_spark_etl_job where dt='2020-07-15'"
      fields: {
        hive_user: name,
        dst_dataset_id: age
      }
      vertex: pid
      batch: 160
    }
    team: {
      type: hive
      exec: "select cast(dst_db_id as bigint) as tid, dst_cluster from imd_datamap_safe.lineage_spark_etl_job where dt='2020-07-15'"
      fields: {
        dst_cluster: team
      }
      vertex: tid
      batch: 160
    }
  }
  edges: {
    serve: {
      type: hive
      exec: "select cast(dst_dataset_id as bigint) as pid, cast(dst_db_id as bigint) as tid, start_time, end_time from imd_datamap_safe.lineage_spark_etl_job where dt='2020-07-15'"
      fields: {
        start_time: start_year,
        end_time: end_year
      }
      source: pid
      target: tid
      batch: 160
    }
  }
}
- Command:
spark-submit \
--class com.vesoft.nebula.tools.generator.v2.SparkClientGenerator \
--master yarn \
--deploy-mode client --num-executors 50 --executor-memory 2G --executor-cores 4 \
./sst.generator-1.0.0-beta.jar -c ./import_test.conf -h -d
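How I checked that nothing was imported (a sketch from the nebula console, Nebula Graph 1.x; the vertex ID `123` is only a placeholder, in practice I used real `pid` values returned by the Hive exec query above):

```ngql
-- Switch to the target space and list the schema
USE alecliu_test;
SHOW TAGS;
-- Fetch a vertex by an ID that the exec query should have inserted;
-- 123 is a placeholder for an actual pid from the Hive source
FETCH PROP ON player 123;
```

The FETCH returns an empty result set for every ID I tried, which is why I believe no data was written.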
- Logs: