## Problem description

I ran into a few problems while importing data from `clickhouse` into `nebula` with `nebula-exchange-2.6.1.jar`.

### Problem 1

In `clickhouse_application.conf`:

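- If all `metad` addresses are configured, the import fails with `UnknownHostException`.
- If only the address of the `metad` node named in the error message (`metad0`) is configured, no error occurs.
- If only one of the other two `metad` addresses is configured, the same `UnknownHostException` is raised.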

The error output:

```text
22/02/21 08:07:21 ERROR MetaClient: Get Space Error: java.net.UnknownHostException: metad0
Exception in thread "main" com.facebook.thrift.transport.TTransportException: java.net.UnknownHostException: metad0
        at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
        at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:145)
        at com.vesoft.nebula.client.meta.MetaClient.freshClient(MetaClient.java:165)
        at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:227)
        at com.vesoft.nebula.client.meta.MetaClient.getTags(MetaClient.java:255)
        at com.vesoft.nebula.exchange.MetaProvider.getLabelType(MetaProvider.scala:93)
        at com.vesoft.nebula.exchange.utils.NebulaUtils$.getDataSourceFieldType(NebulaUtils.scala:31)
        at com.vesoft.nebula.exchange.processor.VerticesProcessor.process(VerticesProcessor.scala:111)
        at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:150)
        at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:126)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at com.vesoft.nebula.exchange.Exchange$.main(Exchange.scala:126)
        at com.vesoft.nebula.exchange.Exchange.main(Exchange.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: metad0
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
        ... 24 more
```
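The trace shows `MetaClient.getClient` opening a socket to the host name `metad0`, which is not in the configured `meta` list (only `192.168.0.4:<port>` entries are configured). A quick way to check whether such a name resolves from the machine that runs the Spark driver is a plain lookup. The Python sketch below is a hypothetical diagnostic, not part of Exchange: it assumes Python 3 is available where `spark-submit` runs (for example in `buckets-nebula-spark-master`), `can_resolve` and `report` are illustrative helpers, and `metad0`/`metad1`/`metad2` are assumed to be the Compose service names behind the containers `nebula-docker-compose-metad0-1` and so on.

```python
import socket
from typing import Dict, Iterable


def can_resolve(host: str) -> bool:
    """Return True if `host` resolves to at least one address on this machine.

    A False result corresponds to the java.net.UnknownHostException in the
    Exchange log: the name cannot be turned into an IP address.
    """
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        return False


def report(hosts: Iterable[str]) -> Dict[str, bool]:
    """Map each (hypothetical) host name to whether it resolves from here."""
    return {host: can_resolve(host) for host in hosts}
```

For example, calling `report(["metad0", "metad1", "metad2"])` inside the Spark container shows which names are resolvable there; a `False` entry suggests the driver JVM in that container would most likely fail on that name in the same way.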



### Problem 2

All Chinese text in the imported data is displayed as `?` (for example, the `name` value `姚明` of player `109`).
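One common way for Chinese text to turn into `?` is that some step on the path encodes the string with a charset that cannot represent it, and the encoder substitutes `?` for every unmappable character (Java's `String.getBytes(Charset)` replaces unmappable characters with the charset's default replacement, which is `?` for common charsets). This is only a hypothesis, not a confirmed cause for this setup. The Python sketch below reproduces the symptom; `lossy_roundtrip` is a hypothetical helper, and ASCII and ISO-8859-1 are example charsets, not ones known to be involved here.

```python
def lossy_roundtrip(text: str, charset: str) -> str:
    """Encode `text` with `charset`, replacing unmappable characters with '?',
    then decode it back. This imitates a single non-UTF-8 encoding step
    somewhere in an import pipeline."""
    return text.encode(charset, errors="replace").decode(charset)
```

With UTF-8 the value survives, `lossy_roundtrip("姚明", "utf-8")` returns `"姚明"`, while `lossy_roundtrip("姚明", "ascii")` returns `"??"`. ASCII-only values such as `"Tim Duncan"` are unaffected, which matches the observation that only the Chinese text is damaged.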



## Environment

### `Docker` environment

```bash
λ docker --version
Docker version 20.10.12, build e91ed57

λ docker-compose --version
Docker Compose version v2.2.3
```

![docker-version](./images/docker-version.PNG)

![docker-compose-version](./images/docker-compose-version.PNG)

### `Clickhouse` environment

Version: 22.1.3.7

![clickhouse-version](./images/clickhouse-version.PNG)

Deployment: docker-compose

```yaml
version: '2'

services:
  server1:
    image: clickhouse/clickhouse-server:latest
    container_name: buckets-nebula-clickhouse
    environment:
      - TZ=UTC
    ports:
      - "20044:8123"
      - "20045:9000"
    volumes:
      - ./conf/server1:/etc/clickhouse-server
      - ./data/server1/clickhouse:/var/lib/clickhouse
    networks:
      - nebula-net-clickhouse

networks:
  nebula-net-clickhouse:
    name: nebula-net-clickhouse
```

Data: the official `basketballplayer` dataset, plus one extra `player` vertex with a Chinese property value.

```sql
DROP DATABASE IF EXISTS basketballplayer;
CREATE DATABASE IF NOT EXISTS basketballplayer ENGINE = Ordinary;

DROP TABLE IF EXISTS basketballplayer.player;
CREATE TABLE IF NOT EXISTS basketballplayer.player(playerid int, name String, age int) ENGINE = Log;
INSERT INTO basketballplayer.player (playerid, name, age) VALUES
    (100, 'Tim Duncan', 42), 
    (101, 'Tony Parker', 36),
    (102, 'LaMarcus Aldridge', 33),
    (103, 'Rudy Gay', 32),
    (104, 'Marco Belinelli', 32),
    (105, 'Danny Green', 31),
    (106, 'Kyle Anderson', 25),
    (107, 'Aron Baynes', 32),
    (108, 'Boris Diaw', 36),
    (109, '姚明', 40);
SELECT playerid, name, age FROM basketballplayer.player;

DROP TABLE IF EXISTS basketballplayer.team;
CREATE TABLE IF NOT EXISTS basketballplayer.team(teamid int, name String) ENGINE = Log;
INSERT INTO basketballplayer.team (teamid, name) VALUES
    (200, 'Warriors'),
    (201, 'Nuggets'),
    (202, 'Rockets'),
    (203, 'Trail'),
    (204, 'Spurs'),
    (205, 'Thunders'),
    (206, 'Jazz'),
    (207, 'Clippers'),
    (208, 'Kings');
SELECT teamid, name FROM basketballplayer.team;

DROP TABLE IF EXISTS basketballplayer.follow;
CREATE TABLE IF NOT EXISTS basketballplayer.follow(src_player int, dst_player int, degree int) ENGINE = Log;
INSERT INTO basketballplayer.follow (src_player, dst_player, degree) VALUES
    (100, 101, 95),
    (100, 102, 90),
    (101, 100, 95),
    (102, 101, 75),
    (102, 100, 75),
    (103, 102, 70),
    (104, 101, 50),
    (104, 105, 60),
    (105, 104, 83);
SELECT src_player, dst_player, degree FROM basketballplayer.follow;


DROP TABLE IF EXISTS basketballplayer.serve;
CREATE TABLE IF NOT EXISTS basketballplayer.serve(playerid int, teamid int, start_year int, end_year int) ENGINE = Log;
INSERT INTO basketballplayer.serve (playerid, teamid, start_year, end_year) VALUES
    (100, 200, 1997, 2016),
    (101, 201, 1999, 2018),
    (102, 203, 2006, 2015),
    (102, 204, 2015, 2019),
    (103, 204, 2017, 2019),
    (104, 200, 2007, 2009),
    (109, 202, 2002, 2011);
SELECT playerid, teamid, start_year, end_year FROM basketballplayer.serve;
```
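To narrow down whether the `?` already exists in ClickHouse or appears later, the stored bytes can be checked directly at the source. The sketch below queries the ClickHouse HTTP interface (container port 8123, mapped to host port 20044 in the compose file above) and compares `hex(name)` for player `109` with the UTF-8 encoding of `姚明`. The host `192.168.0.4`, the `default` user with no password, and the helper names `build_query_url`, `run_query` and `utf8_hex` are assumptions based on this setup; the snippet does not perform any network call by itself.

```python
import urllib.parse
import urllib.request


def build_query_url(host: str, port: int, sql: str) -> str:
    """Build a ClickHouse HTTP-interface URL that runs `sql` as a GET request."""
    return f"http://{host}:{port}/?" + urllib.parse.urlencode({"query": sql})


def run_query(host: str, port: int, sql: str, timeout: float = 5.0) -> str:
    """Run a read-only query and return the raw (TabSeparated) response text."""
    with urllib.request.urlopen(build_query_url(host, port, sql), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def utf8_hex(text: str) -> str:
    """Uppercase hex of the UTF-8 bytes, the same format ClickHouse's hex() returns."""
    return text.encode("utf-8").hex().upper()


# Query for the row with the Chinese name (table created by the SQL above).
CHECK_SQL = "SELECT name, hex(name) FROM basketballplayer.player WHERE playerid = 109"
```

Running `run_query("192.168.0.4", 20044, CHECK_SQL)` from the host and comparing the second column with `utf8_hex("姚明")` (`E5A79AE6988E`) shows whether the source value is stored as UTF-8; if it matches, the loss most likely happens after ClickHouse.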



### `Spark` environment

`java` version: `OpenJDK 64-Bit Server VM, Java 1.8.0_252`

`scala` version: `Scala version 2.11.12`

`spark` version: `version 2.4.6`

![spark-version](./images/spark-version.PNG)

Deployment: docker-compose

```yaml
version: '2'

services:
  master:
    image: docker.io/bitnami/spark:2.4.6
    container_name: buckets-nebula-spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '20050:8080'
      - '20051:7077'
    networks:
      - nebula-net-spark

  worker:
    image: docker.io/bitnami/spark:2.4.6
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    networks:
      - nebula-net-spark

networks:
  nebula-net-spark:
    name: nebula-net-spark
```



### `Nebula` environment

Deployed with docker-compose:

```bash
$ git clone -b v2.6 https://github.com/vesoft-inc/nebula-docker-compose.git
```

Changes:

- `--v=0` changed to `--v=3` (more verbose logging)

### Service status

```shell
docker-compose ps

docker exec -it nebula-docker-compose-graphd-1 bash ./scripts/nebula.service status graphd
docker exec -it nebula-docker-compose-graphd1-1 bash ./scripts/nebula.service status graphd
docker exec -it nebula-docker-compose-graphd2-1 bash ./scripts/nebula.service status graphd
docker exec -it nebula-docker-compose-metad0-1 bash ./scripts/nebula.service status metad
docker exec -it nebula-docker-compose-metad1-1 bash ./scripts/nebula.service status metad
docker exec -it nebula-docker-compose-metad2-1 bash ./scripts/nebula.service status metad
docker exec -it nebula-docker-compose-storaged0-1 bash ./scripts/nebula.service status storaged
docker exec -it nebula-docker-compose-storaged1-1 bash ./scripts/nebula.service status storaged
docker exec -it nebula-docker-compose-storaged2-1 bash ./scripts/nebula.service status storaged
```

![nebula-graph-ps](./images/nebula-graph-ps.PNG)



### Creating the `Schema`

Connected through the `studio` UI and created the `schema` from its console:

```sql
## Create the graph space
CREATE SPACE IF NOT EXISTS basketballplayer (partition_num = 10, replica_factor = 1, vid_type = FIXED_STRING(30));

## Create Tag player
CREATE TAG player(name string, age int);

## Create Tag team
CREATE TAG team(name string);

## Create Edge type follow
CREATE EDGE follow(degree int);

## Create Edge type serve
CREATE EDGE serve(start_year int, end_year int);
```

### Configuration

1. All `metad` addresses configured (`address` section)

> Note: some of these are ephemeral ports and must be updated after docker-compose is restarted.

```conf
{
    # IP addresses and ports of the hosts running the Nebula Graph Graph and Meta services.
    graph:["192.168.0.4:9669", "192.168.0.4:63264", "192.168.0.4:63271"]
    meta:["192.168.0.4:63237", "192.168.0.4:63241", "192.168.0.4:63244"]
}
```

2. Only the specified `metad` address configured (`address` section)

> Note: some of these are ephemeral ports and must be updated after docker-compose is restarted.

```conf
{
    # IP addresses and ports of the hosts running the Nebula Graph Graph and Meta services.
    graph:["192.168.0.4:9669"]
    meta:["192.168.0.4:63237"]
}
```
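Because the `meta` entries are host-mapped (and partly ephemeral) ports, a quick way to see which of them accept connections from the Spark side is a plain TCP connect. This hypothetical Python sketch parses `host:port` strings like those in the `meta` list and tries to open a socket to each. It only checks reachability of the configured addresses; it does not check host names the client may contact afterwards, such as `metad0` in the trace above. `parse_address`, `is_reachable` and `META_ADDRESSES` are illustrative names.

```python
import socket
from typing import Tuple


def parse_address(addr: str) -> Tuple[str, int]:
    """Split a "host:port" string as used in the Exchange `address` section."""
    host, _, port = addr.rpartition(":")
    return host, int(port)


def is_reachable(addr: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to `addr` opens within `timeout` seconds."""
    host, port = parse_address(addr)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Addresses copied from configuration 1 above; the ports change after a restart.
META_ADDRESSES = ["192.168.0.4:63237", "192.168.0.4:63241", "192.168.0.4:63244"]
```

Evaluating `{a: is_reachable(a) for a in META_ADDRESSES}` inside the Spark container shows whether all three configured ports are open. If they are, the `UnknownHostException` is not caused by the configured addresses themselves.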

3. Full configuration file

```conf
{
  # Spark settings
  spark: {
    app: {
      name: Nebula Exchange 2.6.1
    }
    driver: {
      cores: 1
      maxResultSize: 1G
    }
    cores {
      max: 16
    }
  }

  # Nebula Graph settings
  nebula: {
    address:{
      # IP addresses and ports of the hosts running the Nebula Graph Graph and Meta services.

      # # test 1
      # graph:["192.168.0.4:9669", "192.168.0.4:63264", "192.168.0.4:63271"]
      # meta:["192.168.0.4:63237", "192.168.0.4:63241", "192.168.0.4:63244"]

      # test 2
      graph:["192.168.0.4:9669"]
      meta:["192.168.0.4:63237"]

    }
    user: root
    pswd: nebula
    space: basketballplayer
    connection {
      timeout: 3000
      retry: 3
    }
    execution {
      retry: 3
    }
    error: {
      max: 32
      output: /tmp/errors
    }
    rate: {
      limit: 1024
      timeout: 1000
    }
  }
  tags: [
    # Settings for Tag player.
    {
      name: player
      type: {
        source: clickhouse
        sink: client
      }
      url:"jdbc:clickhouse://192.168.0.4:20044/basketballplayer?characterEncoding=UTF-8"
      user:"default"
      password:""
      numPartition:"5"
      sentence:"select * from player"
      fields: [name,age]
      nebula.fields: [name,age]
      vertex: {
        field:playerid
      }
      batch: 256
      partition: 32
    }
    {
      name: team
      type: {
        source: clickhouse
        sink: client
      }
      url:"jdbc:clickhouse://192.168.0.4:20044/basketballplayer?characterEncoding=UTF-8"
      user:"default"
      password:""
      numPartition:"5"
      sentence:"select * from team"
      fields: [name]
      nebula.fields: [name]
      vertex: {
        field:teamid
      }
      batch: 256
      partition: 32
    }
  ]
  edges: [
    {
      name: follow
      type: {
        source: clickhouse
        sink: client
      }
      url:"jdbc:clickhouse://192.168.0.4:20044/basketballplayer?characterEncoding=UTF-8"
      user:"default"
      password:""
      numPartition:"5"
      sentence:"select * from follow"
      fields: [degree]
      nebula.fields: [degree]
      source: {
        field:src_player
      }
      target: {
        field:dst_player
      }
      batch: 256
      partition: 32
    }
    {
      name: serve
      type: {
        source: clickhouse
        sink: client
      }
      url:"jdbc:clickhouse://192.168.0.4:20044/basketballplayer?characterEncoding=UTF-8"
      user:"default"
      password:""
      numPartition:"5"
      sentence:"select * from serve"
      fields: [start_year,end_year]
      nebula.fields: [start_year,end_year]
      source: {
        field:playerid
      }
      target: {
        field:teamid
      }
      batch: 256
      partition: 32
    }
  ]
}

```



### Running the import

Upload the configuration file and `nebula-exchange-2.6.1.jar` to `buckets-nebula-spark-master:/tmp`, then run:

```bash
docker exec -it buckets-nebula-spark-master bash ./bin/spark-submit --master spark://master:7077 --class com.vesoft.nebula.exchange.Exchange  /tmp/nebula-exchange-2.6.1.jar  -c /tmp/clickhouse_application.conf
```



## Additional information

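```text
nebula-graph-issue-xsh
|--- `images`: screenshots
|--- `logs-0-before-load1`: logs backed up after creating the schema
|--- `logs-0-after-load1`: logs backed up after the failed import with all `metad` addresses configured
|--- `logs-0-after-load2`: logs backed up after the successful import with only the `metad` node named in the error message configured
|--- `issue.md`: this issue write-up
|--- `load1-info.txt`: console output of the import with all addresses configured (failed)
|--- `load2-info.txt`: console output of the import with only the `metad` node named in the error message configured (succeeded)
```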

