Alibaba Canal Use Case


Canal is a tool for parsing database logs; it can be used to sync data from MySQL to Elasticsearch (ES).

Introduction

Canal disguises itself as a MySQL slave: the MySQL master pushes its binlog to this slave, and Canal parses the binlog.

canal.png


Before using Canal, note the following (a quick JDBC check that verifies both points is sketched after this list):

  • MySQL binlog must be enabled:
    [mysqld]
    log-bin=mysql-bin #enable the binlog
    binlog-format=ROW #Canal requires ROW format
    server_id=1 #needed for MySQL replication; must not clash with Canal's slaveId
    
  • Create a MySQL user with sufficient privileges for Canal:
    CREATE USER canal IDENTIFIED BY 'canal';
    GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
    -- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
    FLUSH PRIVILEGES;
    

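Both prerequisites can be verified with the new canal account before going further. A minimal JDBC sketch (it assumes a local MySQL and MySQL Connector/J on the classpath; adjust the URL to your setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CheckCanalPrereqs {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true", "canal", "canal");
                 Statement stmt = conn.createStatement()) {
                // Should print binlog_format = ROW if the [mysqld] settings above took effect.
                try (ResultSet rs = stmt.executeQuery("SHOW VARIABLES LIKE 'binlog_format'")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " = " + rs.getString(2));
                    }
                }
                // Requires REPLICATION CLIENT, so it also confirms the grants work.
                try (ResultSet rs = stmt.executeQuery("SHOW MASTER STATUS")) {
                    while (rs.next()) {
                        System.out.println("binlog file = " + rs.getString("File")
                                + ", position = " + rs.getLong("Position"));
                    }
                }
            }
        }
    }
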
The current Canal release page ships three components:

  • canal.adapter: used for syncing data.
  • canal.deployer: deploys the canal server.
  • canal.example: a sample client.

These components are meant to work out of the box (but in my testing most of them could not be used directly and had to be built from source).


This article uses version 1.1.2. The Canal source had a few compilation issues, so I made some small changes; with them, this version compiles and runs directly. Here we need to build the deployer and the client-adapter.

Using Canal in a project requires both a server and a client:

  • The server watches the MySQL binlog; it can be deployed directly with canal.deployer.
  • The client parses the data; you can develop one by following canal.example, or use canal.adapter as-is for syncing data.

The main modules of the source code are:

  • protocol: parses the MySQL protocol.
  • server: the server side; watches MySQL.
  • client: the client side; supports custom parsing (see the sketch after this list).
  • adapter: the main code for syncing data; ES is supported as a target (loaded via SPI).
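
A custom client is just a consumer of the server's entry stream. A minimal sketch modeled on canal.example's SimpleCanalClientExample (the address, destination, and batch size match the defaults used later in this article):

    import java.net.InetSocketAddress;

    import com.alibaba.otter.canal.client.CanalConnector;
    import com.alibaba.otter.canal.client.CanalConnectors;
    import com.alibaba.otter.canal.protocol.Message;

    public class SimpleClient {
        public static void main(String[] args) throws InterruptedException {
            CanalConnector connector = CanalConnectors.newSingleConnector(
                    new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
            try {
                connector.connect();
                connector.subscribe(".*\\..*");   // all schemas and tables
                connector.rollback();             // re-deliver anything not yet acked
                while (true) {
                    Message message = connector.getWithoutAck(100); // fetch up to 100 entries
                    long batchId = message.getId();
                    if (batchId == -1 || message.getEntries().isEmpty()) {
                        Thread.sleep(1000);
                    } else {
                        // Each entry carries the schema/table and the row-change payload.
                        message.getEntries().forEach(entry ->
                                System.out.println(entry.getHeader().getSchemaName() + "."
                                        + entry.getHeader().getTableName() + " -> "
                                        + entry.getEntryType()));
                    }
                    connector.ack(batchId);       // confirm the batch has been handled
                }
            } finally {
                connector.disconnect();
            }
        }
    }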

Usage

Preparation

Prepare the same table structure and the same ES index/mapping structure as in the NiFi case.
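
For reference, here is a hedged JDBC sketch that creates a book table with the columns that book.yml selects later on (column types are assumptions; the credentials are the ones from application.yml below):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateBookTable {
        public static void main(String[] args) throws Exception {
            String ddl = "CREATE TABLE IF NOT EXISTS book ("
                    + " id BIGINT PRIMARY KEY,"
                    + " title VARCHAR(255),"
                    + " `desc` TEXT,"          // desc is a MySQL reserved word, hence the backticks
                    + " path VARCHAR(512),"
                    + " create_time DATETIME,"
                    + " update_time DATETIME)";
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true", "root", "123456");
                 Statement stmt = conn.createStatement()) {
                stmt.executeUpdate(ddl);
            }
        }
    }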

server

Directory structure of the server:

server.png

The main directory to cover is conf:

  • canal.properties: configuration for the canal server; the default instance to watch is example (the important settings carry inline comments):
    #################################################
    #########       common argument     #############
    #################################################
    canal.id = 1
    canal.ip =
    canal.port = 11111
    canal.metrics.pull.port = 11112
    # must be set in HA mode
    canal.zkServers =
    # flush data to zk
    canal.zookeeper.flush.period = 1000
    canal.withoutNetty = false
    # tcp, kafka, RocketMQ
    canal.serverMode = tcp
    # flush meta cursor/parse position to file
    canal.file.data.dir = ${canal.conf.dir}
    canal.file.flush.period = 1000
    ## memory store RingBuffer size, should be Math.pow(2,n)
    canal.instance.memory.buffer.size = 16384
    ## memory store RingBuffer used memory unit size , default 1kb
    canal.instance.memory.buffer.memunit = 1024
    ## memory store gets mode used MEMSIZE or ITEMSIZE
    canal.instance.memory.batch.mode = MEMSIZE
    canal.instance.memory.rawEntry = true
      
    ## detecting config
    canal.instance.detecting.enable = false
    #canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
    canal.instance.detecting.sql = select 1
    canal.instance.detecting.interval.time = 3
    canal.instance.detecting.retry.threshold = 3
    canal.instance.detecting.heartbeatHaEnable = false
      
    # support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
    canal.instance.transaction.size =  1024
    # mysql fallback connected to new master should fallback times
    canal.instance.fallbackIntervalInSeconds = 60
      
    # network config
    canal.instance.network.receiveBufferSize = 16384
    canal.instance.network.sendBufferSize = 16384
    canal.instance.network.soTimeout = 30
      
    # binlog filter config
    # which binlog statement types should be captured
    canal.instance.filter.druid.ddl = true
    canal.instance.filter.query.dcl = false
    canal.instance.filter.query.dml = false
    canal.instance.filter.query.ddl = false
    canal.instance.filter.table.error = false
    canal.instance.filter.rows = false
    canal.instance.filter.transaction.entry = false
      
    # binlog format/image check
    canal.instance.binlog.format = ROW,STATEMENT,MIXED
    canal.instance.binlog.image = FULL,MINIMAL,NOBLOB
      
    # binlog ddl isolation
    canal.instance.get.ddl.isolation = false
      
    # parallel parser config
    canal.instance.parser.parallel = true
    ## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
    #canal.instance.parser.parallelThreadSize = 16
    ## disruptor ringbuffer size, must be power of 2
    canal.instance.parser.parallelBufferSize = 256
      
    # table meta tsdb info
    canal.instance.tsdb.enable = true
    canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
    canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
    canal.instance.tsdb.dbUsername = canal
    canal.instance.tsdb.dbPassword = canal
    # dump snapshot interval, default 24 hour
    canal.instance.tsdb.snapshot.interval = 24
    # purge snapshot expire , default 360 hour(15 days)
    canal.instance.tsdb.snapshot.expire = 360
      
    # aliyun ak/sk , support rds/mq
    canal.aliyun.accesskey =
    canal.aliyun.secretkey =
      
    #################################################
    #########       destinations        #############
    #################################################
    # instances to watch
    canal.destinations = example
    # conf root dir
    canal.conf.dir = ../conf
    # auto scan instance dir add/remove and start/stop instance
    canal.auto.scan = true
    canal.auto.scan.interval = 5
      
    canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
    #canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml
      
    canal.instance.global.mode = spring
    canal.instance.global.lazy = false
    #canal.instance.global.manager.address = 127.0.0.1:1099
    #canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
    canal.instance.global.spring.xml = classpath:spring/file-instance.xml
    #canal.instance.global.spring.xml = classpath:spring/default-instance.xml
      
    ##################################################
    #########            MQ              #############
    ##################################################
    canal.mq.servers = 127.0.0.1:6667
    canal.mq.retries = 0
    canal.mq.batchSize = 16384
    canal.mq.maxRequestSize = 1048576
    canal.mq.lingerMs = 1
    canal.mq.bufferMemory = 33554432
    canal.mq.canalBatchSize = 50
    canal.mq.canalGetTimeout = 100
    canal.mq.flatMessage = true
    canal.mq.compressionType = none
    canal.mq.acks = all
    
  • example: configuration for the default instance; it contains instance.properties (the important settings carry inline comments):
    #################################################
    ## mysql serverId , v1.0.26+ will autoGen
    # the slave id Canal uses when it registers with MySQL
    # canal.instance.mysql.slaveId=1234
      
    # enable gtid use true/false
    canal.instance.gtidon=false
      
    # position info
    canal.instance.master.address=127.0.0.1:3306
    canal.instance.master.journal.name=
    canal.instance.master.position=
    canal.instance.master.timestamp=
    canal.instance.master.gtid=
      
    # rds oss binlog
    canal.instance.rds.accesskey=
    canal.instance.rds.secretkey=
    canal.instance.rds.instanceId=
      
    # table meta tsdb info
    canal.instance.tsdb.enable=true
    #canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
    #canal.instance.tsdb.dbUsername=canal
    #canal.instance.tsdb.dbPassword=canal
      
    #canal.instance.standby.address =
    #canal.instance.standby.journal.name =
    #canal.instance.standby.position =
    #canal.instance.standby.timestamp =
    #canal.instance.standby.gtid=
      
    # username/password
    # the MySQL username/password created for canal
    canal.instance.dbUsername=canal
    canal.instance.dbPassword=canal
    canal.instance.connectionCharset=UTF-8
    canal.instance.defaultDatabaseName=test
    # enable druid Decrypt database password
    canal.instance.enableDruid=false
    #canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
      
    # table regex
    canal.instance.filter.regex=.*\\..*
    # table black regex
    canal.instance.filter.black.regex=
      
    # mq config
    canal.mq.topic=example
    canal.mq.partition=0
    # hash partition config
    #canal.mq.partitionsNum=3
    #canal.mq.partitionHash=mytest.person:id,mytest.role:id
    #################################################
    
  • metrics: monitoring information.
  • spring: the Spring configuration files for the server.

To start the server, use the deployer built above and run this in its bin directory:

sh ./startup.sh

client

Directory structure of the client:

client.png

  • application.yml under conf is the client's configuration file (the client is a Spring Boot 2.0 project):

    server:
      port: 8081
    logging:
      level:
        org.springframework: DEBUG
        com.alibaba.otter.canal.client.adapter.hbase: DEBUG
        com.alibaba.otter.canal.client.adapter.es: DEBUG
        com.alibaba.otter.canal.client.adapter.rdb: DEBUG
    spring:
      jackson:
        date-format: yyyy-MM-dd HH:mm:ss
        time-zone: GMT+8
        default-property-inclusion: non_null
    canal.conf:
      canalServerHost: 127.0.0.1:11111
    #  zookeeperHosts: slave1:2181
    #  mqServers: slave1:6667 #or rocketmq
    #  flatMessage: true
      batchSize: 500
      syncBatchSize: 1000
      retries: 3
      timeout:
      accessKey:
      secretKey:
      mode: tcp # kafka rocketMQ
      srcDataSources:
        defaultDS:
          url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
          username: root
          password: 123456
      canalAdapters:
      - instance: example # canal instance Name or mq topic name
        groups:
        - groupId: g1
          outerAdapters:
          - name: logger
          - name: es
            hosts: 127.0.0.1:9300
            properties:
              cluster.name: elasticsearch
    
  • book.yml under es holds the ES mapping configuration; it tells the client which ES index, type, and other settings to use:

    dataSourceKey: defaultDS
    destination: example
    esMapping:
      _index: book
      _type: idx
      # _id: _id
      pk: id
      sql: "select id, title, desc, path, create_time, update_time from book"
    #  objFields:
    #    _labels: array:;
      etlCondition: "where update_time>='{0}'"
      commitBatch: 3000
    

To start the client, use the client-adapter built above and run this in its bin directory:

sh ./startup.sh

Result

The client-adapter subscribes to the instance (example in this case):

server-log.png

Insert, update, or delete rows in the book table and the changes are synced (a JDBC sketch of such a write follows the screenshot):

client-log.png
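
To exercise the pipeline, any ordinary write to the book table will do. A hedged JDBC sketch of an insert (credentials come from application.yml; the column values are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;

    public class InsertBook {
        public static void main(String[] args) throws Exception {
            String sql = "INSERT INTO book (id, title, `desc`, path, create_time, update_time)"
                    + " VALUES (?, ?, ?, ?, ?, ?)";
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true", "root", "123456");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                Timestamp now = new Timestamp(System.currentTimeMillis());
                ps.setLong(1, 1L);
                ps.setString(2, "canal test");
                ps.setString(3, "a row inserted to trigger the sync");
                ps.setString(4, "/books/canal-test");
                ps.setTimestamp(5, now);
                ps.setTimestamp(6, now);
                ps.executeUpdate();   // the change lands in the binlog and is pushed on to ES
            }
        }
    }

Once the adapter has processed the change, the document should be searchable in ES. A sketch that queries the book index over the same transport endpoint and cluster name configured in application.yml (it assumes an ES 6.x cluster, matching the 9300 transport port used by the adapter):

    import java.net.InetAddress;

    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.TransportAddress;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.transport.client.PreBuiltTransportClient;

    public class CheckBookIndex {
        public static void main(String[] args) throws Exception {
            Settings settings = Settings.builder()
                    .put("cluster.name", "elasticsearch").build();
            try (TransportClient client = new PreBuiltTransportClient(settings)
                    .addTransportAddress(new TransportAddress(
                            InetAddress.getByName("127.0.0.1"), 9300))) {
                SearchResponse resp = client.prepareSearch("book").setTypes("idx")
                        .setSize(5).get();
                for (SearchHit hit : resp.getHits()) {
                    System.out.println(hit.getId() + " -> " + hit.getSourceAsString());
                }
            }
        }
    }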