Joined-table query problem when importing through generic JDBC with Exchange

1萨达 · August 26, 2024, 05:31

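I am using Exchange to import data from a Dameng (DM) database through the generic JDBC source. I need to put a joined-table query in sentence and write its result to a tag. According to the official documentation, with Spark 2.4 and later the sentence and table options cannot both be set, so I commented out table. Exchange then fails with an error saying the table option does not exist. Can anyone help me solve this?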

MuYi-方扬 · August 26, 2024, 18:40

Could you post your configuration so we can take a look?

1萨达 · August 28, 2024, 13:19

{
  # Spark configuration
  spark: {
    app: {
      name: NebulaGraph Exchange 3.5.0
    }
    driver: {
      cores: 1
      maxResultSize: 1G
    }
    executor: {
      memory: 1G
    }
    cores: {
      max: 16
    }
  }
  # NebulaGraph configuration
  nebula: {
    address: {
      # IP addresses and ports of the Graph service and of all Meta services.
      # If there are multiple servers, separate the addresses with commas (,).
      # Format: "ip1:port","ip2:port","ip3:port"
      graph: ["XXX:9669","XXX:9669","XXX:9669"]
      # Address of any one Meta service.
      # If NebulaGraph runs in a virtual network such as k8s, use the address of the Leader Meta.
      meta: ["XXX:9559"]
    }
    # User name and password of an account with write permission on NebulaGraph.
    user: user
    pswd: "pass"
    # Name of the graph space.
    space: test-space
    connection: {
      timeout: 3000
      retry: 3
    }
    execution: {
      retry: 3
    }
    error: {
      max: 32
      output: /tmp/errors
    }
    rate: {
      limit: 1024
      timeout: 1000
    }
  }
  # Vertex processing
  tags: [
    # Tag settings.
    {
      name: bg_syfw
      type: {
        source: jdbc
        sink: client
      }
      url: "jdbc:dm://xx.xx.xx.x:xxxx"
      driver: "dm.jdbc.driver.DmDriver"
      user: "user"
      password: "password"
      #table: "table1,table2"
      sentence: "select t1.field1,t1.field2,t1.field3 from table_1 t1 where not exists (select 1 from table_2 t2 where t2.xx=t1.xx )"
      fetchSize: 2 # Number of rows to read from the database per request.
      # In fields, specify the column names of the player table; their values become the specified properties in NebulaGraph.
      # The entries in fields and nebula.fields must correspond one to one.
      # To specify multiple column names, separate them with commas (,).
      fields: [field1,field2,field3]
      nebula.fields: [field1,field2,field3]
      # Column of the table that supplies the vertex VID in NebulaGraph.
      vertex: {
        field: field1
        # udf: {
        #   separator: "_"
        #   oldColNames: [field-0,field-1,field-2]
        #   newColName: new-field
        # }
      }
      # Number of records written to NebulaGraph in a single batch.
      batch: 256
      # Number of Spark partitions
      partition: 32
    }
  ]
}
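That is my configuration.
I am using spark2.4-3.5.0.jar.
If I comment out table, it fails with: No configuration setting found for key 'table'
If I leave table in and set it to the joined tables table1,table2, it fails with: Found duplicate column(s) in the data schema: field1 …, because some columns in the two tables have the same name.
However, the official documentation says that for Spark 2.4 the table and sentence options cannot both be set, and that a joined-table query only needs sentence. Yet commenting out table produces an error.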
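
The duplicate-column error can be illustrated outside Exchange. The sketch below is only an illustration under stated assumptions: it uses SQLite from the Python standard library as a stand-in for the Dameng database, the two table schemas are hypothetical (the thread does not show them), and it assumes that listing two tables in table ends up selecting every column of both. It does not reproduce Exchange's own behaviour. It shows that a query over both tables exposes repeated column names, while the explicit column list from the sentence query does not.

```python
import sqlite3


def column_names(conn, sql):
    """Return the result-set column names of a query, in order."""
    return [d[0] for d in conn.execute(sql).description]


def duplicates(names):
    """Return the sorted names that occur more than once."""
    return sorted({n for n in names if names.count(n) > 1})


# Hypothetical schemas: both tables share the columns field1 and xx.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_1 (field1 TEXT, field2 TEXT, field3 TEXT, xx INTEGER);
    CREATE TABLE table_2 (field1 TEXT, xx INTEGER);
    INSERT INTO table_1 VALUES ('a', 'b', 'c', 1), ('d', 'e', 'f', 2);
    INSERT INTO table_2 VALUES ('z', 1);
    """
)

# Selecting everything from both tables yields repeated column names.
star_cols = column_names(conn, "SELECT * FROM table_1, table_2")

# The query from the sentence option names its columns explicitly.
sentence_sql = (
    "SELECT t1.field1, t1.field2, t1.field3 FROM table_1 t1 "
    "WHERE NOT EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.xx = t1.xx)"
)
sentence_cols = column_names(conn, sentence_sql)
rows = conn.execute(sentence_sql).fetchall()

print("duplicates with both tables:", duplicates(star_cols))
print("duplicates with sentence query:", duplicates(sentence_cols))
print("rows kept by the anti-join:", rows)
```

If a joined query does need same-named columns from both tables, giving each one a distinct alias with AS keeps the result schema free of duplicates.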

system · September 27, 2024, 13:20

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.