Redis High Availability with Sentinel

Download the latest Redis release. At the time of writing it is 2.8.19 (released 2014-12-30).

Installing Redis

Home page: http://redis.io/

Latest stable release: http://download.redis.io/releases/redis-2.8.19.tar.gz

# tar -xvf redis-2.8.19.tar.gz

# cd redis-2.8.19

# make install

# make test

Error:

# make test

cd src && make test

make[1]: Entering directory `/data/software/redis-2.8.19/src'

You need tcl 8.5 or newer in order to run the Redis test

make[1]: *** [test] Error 1

make[1]: Leaving directory `/data/software/redis-2.8.19/src'

make: *** [test] Error 2

Fix:

Install tcl

http://www.linuxfromscratch.org/blfs/view/cvs/general/tcl.html

yum install tcl tcl-devel -y

Command usage overview

redis-server -h
Usage: ./redis-server [/path/to/redis.conf] [options]
       ./redis-server - (read config from stdin)
       ./redis-server -v or --version
       ./redis-server -h or --help
       ./redis-server --test-memory <megabytes>
 
Examples:
       ./redis-server (run the server with default conf)
       ./redis-server /etc/redis/6379.conf
       ./redis-server --port 7777
       ./redis-server --port 7777 --slaveof 127.0.0.1 8888
       ./redis-server /etc/myredis.conf --loglevel verbose
 
Sentinel mode:
       ./redis-server /etc/sentinel.conf --sentinel

Configuring master/slave replication

# mkdir /etc/redis
# cp -pa sentinel.conf redis.conf /etc/redis

Configuration options explained

### Options set in italics are new in 2.8 compared with 2.4

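daemonize yes         ### run as a background daemon
pidfile /var/run/redis.pid   ### PID file
port 6379   ### listening port (6379 is the default)
tcp-backlog 511   ### listen() backlog: maximum length of the queue of completed TCP connections waiting to be accepted
bind 192.168.65.128 127.0.0.1  ### addresses to listen on; several may be given; if omitted, Redis listens on all interfaces (0.0.0.0)
timeout 0  ### close idle client connections after this many seconds; 0 disables
tcp-keepalive 0 ### interval in seconds for TCP keepalive probes; 0 disables
loglevel notice  ### log level notice (only the essential messages)
logfile "/var/log/redis/redis.log" ### log file path
databases 16  ### number of databases
### enable RDB with the snapshot policy below; AOF and RDB can be enabled together, and enabling AOF is strongly recommended
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes  ### if a BGSAVE fails for any reason, stop accepting writes
rdbcompression yes  ### compress the RDB file
rdbchecksum yes  ### checksum the RDB file
dbfilename dump.rdb  ### RDB file name (any name may be used)
dir /var/lib/redis/     ### working directory where the RDB file is written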
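# slaveof <masterip> <masterport>    ### on a slave, names its master
# masterauth <master-password>    ### password for authenticating to the master (of limited value in the author's view, since Redis is memory-based)
slave-serve-stale-data no  ### whether a slave keeps serving requests when the link to the master is down; no -> it replies with an error instead of serving stale data
slave-read-only yes    ### whether the slave is read-only (read-only by default)
repl-diskless-sync no   ### two sync methods. Disk-backed: the master forks a child that writes the RDB to disk, and the file is then transferred incrementally to the slaves. Diskless: the child writes the RDB directly to the slave sockets without touching the disk; suggested when the network is fast. Because this setup is master/slave plus Sentinel and a failover may happen at any time, diskless sync is not used for now
repl-diskless-sync-delay 5  ## wait 5 s before starting a diskless transfer so that more slaves can join it; only effective when diskless sync is enabled
repl-ping-slave-period 3   ### PING every 3 s over the replication link to check that the peer is alive
repl-timeout 10      ### replication timeout in seconds; it must be greater than repl-ping-slave-period, otherwise a timeout is detected every time there is low traffic between the master and the slave
repl-disable-tcp-nodelay no   ### yes: data is coalesced (up to about 40 milliseconds on Linux), which means fewer TCP packets and less bandwidth to the slaves, at the cost of a corresponding delay on the slave side. no: the opposite. Low latency is usually what we want, so no is a good choice; on a complicated network, or with many hops between master and slave, yes is suggested instead
slave-priority 100  ### slave priority; the lower the number, the higher the priority when a slave is picked to replace a failed master
# min-slaves-to-write 3   ### together with min-slaves-max-lag: the master accepts writes only while at least 3 slaves have a lag <= 10 s
# min-slaves-max-lag 10  ### see min-slaves-to-write above
maxmemory  1536000000   #1.5G   ### in bytes; the most memory Redis may use; above roughly 1.5 GB Redis evicts keys according to the policy below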
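############# six eviction policies
# volatile-lru -> remove the key with an expire set using an LRU algorithm ## recommended
# allkeys-lru -> remove any key according to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys-random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
maxmemory-policy volatile-lru  ### evict keys that have an expire set, using the LRU algorithm
# maxmemory-samples 3  ### LRU and minimal TTL are approximated algorithms, not exact ones; with this setting Redis samples 3 keys and evicts the least recently used one; the author does not recommend setting it
appendonly yes   ### AOF persistence: writes are appended to disk incrementally; slower than RDB but more durable. The usual choice in production; some sites enable both
appendfilename "appendonly.aof"  ### AOF file name
# appendfsync always
appendfsync everysec   ### recommended
# appendfsync no
no-appendfsync-on-rewrite no  ## recommended: no. The option exists to ease latency caused by long fsync() calls: with yes, fsync() is not called in the main process while BGSAVE or BGREWRITEAOF is running, which is equivalent to appendfsync no during that time
auto-aof-rewrite-percentage 100   ### together with auto-aof-rewrite-min-size: rewrite the AOF once it has grown by 100% since the last rewrite and is at least 64mb
auto-aof-rewrite-min-size 64mb   ### see auto-aof-rewrite-percentage above
aof-load-truncated yes   ### yes: a truncated AOF is loaded as far as possible and the server starts, logging the event; no: Redis reports an error and exits
lua-time-limit 5000   ### maximum execution time of a Lua script in milliseconds; when exceeded, Redis logs it and starts replying to queries with an error
slowlog-log-slower-than 10000 ### in microseconds (1000000 microseconds = 1 s); commands that take longer than 10000 microseconds are logged
slowlog-max-len 128  ### keep at most 128 slow-log entries
latency-monitor-threshold 0   ### latency monitoring; 0 turns it off
notify-keyspace-events ""   ### keyspace event notifications published to clients; an empty string disables them
###### advanced settings below; the author keeps the defaults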
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
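
The rule noted above for repl-timeout (it must be greater than repl-ping-slave-period) is easy to check before starting the server. The following is a minimal sketch, not part of Redis: conf_value and check_repl_timeout are hypothetical helper names, and the fallback values 10 and 60 are assumed to be the Redis defaults for the two directives.

```shell
# Sketch only: conf_value and check_repl_timeout are hypothetical helpers.
# Print the first value of a directive from a redis.conf-style file.
# Commented-out directives ("# repl-timeout 60") are ignored because their
# first field is "#".
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Succeeds when repl-timeout is greater than repl-ping-slave-period.
# Falls back to the assumed defaults (10 and 60) when a directive is absent.
check_repl_timeout() {
    ping=$(conf_value "$1" repl-ping-slave-period)
    timeout=$(conf_value "$1" repl-timeout)
    ping=${ping:-10}
    timeout=${timeout:-60}
    if [ "$timeout" -gt "$ping" ]; then
        echo "ok: repl-timeout=$timeout > repl-ping-slave-period=$ping"
    else
        echo "bad: repl-timeout=$timeout must be greater than repl-ping-slave-period=$ping"
        return 1
    fi
}
```

Usage, assuming the file layout used in this article: check_repl_timeout /etc/redis/redis.conf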

Running Redis

# redis-server /etc/redis/redis.conf

INFO output fields explained:

They correspond to the settings in redis.conf

# redis-cli -h Mrds
Mrds:6379> info
# Server
redis_version:2.8.19     ### Redis version
redis_git_sha1:00000000  ### Git SHA1
redis_git_dirty:0   ### Git dirty flag
redis_build_id:78796c63e58b72dc
redis_mode:standalone   ### running mode
os:Linux 2.6.32-431.el6.x86_64 x86_64   ### operating system
arch_bits:64  ### 64-bit architecture
multiplexing_api:epoll  ### event loop mechanism (epoll)
gcc_version:4.4.7   ### GCC version used to build Redis
process_id:25899   ### PID of the server process
run_id:eae356ac1098c13b68f2b00fd7e1c9f93b1c6a2c   ### random identifier of this Redis server (used by Sentinel and Cluster)
tcp_port:6379   ### listening port
uptime_in_seconds:6419 ### uptime in seconds
uptime_in_days:0  ### uptime in days
hz:10
lru_clock:10737922  ### clock incrementing every minute, used for LRU management
config_file:/etc/redis/redis.conf   ### configuration file in use
 
# Clients
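connected_clients:1   ### number of connected clients (excluding connections from slaves)
client_longest_output_list:0   ### longest output list among the current client connections
client_biggest_input_buf:0   ### biggest input buffer among the current client connections
blocked_clients:0  ### number of clients waiting on a blocking command (BLPOP, BRPOP, BRPOPLPUSH); worth monitoring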
 
# Memory
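used_memory:2281560   ### total memory allocated by the Redis allocator, in bytes
used_memory_human:2.18M   ### the same value in human-readable form
used_memory_rss:2699264   ### memory allocated to Redis as seen by the operating system (the resident set size); this matches what top, ps and similar tools report
used_memory_peak:22141272  ### peak memory consumed by Redis, in bytes
used_memory_peak_human:21.12M  ### the peak in human-readable form
used_memory_lua:35840  ### memory used by the Lua engine
mem_fragmentation_ratio:1.18  ### ratio between used_memory_rss and used_memory
mem_allocator:jemalloc-3.6.0
 
### Ideally used_memory_rss is only slightly higher than used_memory. When rss > used and the gap is large, there is (internal or external) memory fragmentation; its extent can be read from mem_fragmentation_ratio.
When used > rss, part of Redis's memory has been swapped out by the operating system, and operations may incur noticeable latency.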
 
 
# Persistence
loading:0  ### whether the server is currently loading a persistence file
rdb_changes_since_last_save:0   ### number of changes since the last successful dump
rdb_bgsave_in_progress:0   ### whether an RDB save is in progress
rdb_last_save_time:1420023749  ### UNIX timestamp of the last successful RDB save
rdb_last_bgsave_status:ok   ### whether the last RDB save succeeded or failed
rdb_last_bgsave_time_sec:0  ### duration of the last RDB save, in seconds
rdb_current_bgsave_time_sec:-1  ### if an RDB save is in progress, the seconds it has taken so far
aof_enabled:1   ### whether AOF is enabled
aof_rewrite_in_progress:0   ### whether an AOF rewrite is in progress
aof_rewrite_scheduled:0   ### whether an AOF rewrite is scheduled to run once the RDB save in progress completes
aof_last_rewrite_time_sec:-1  ### duration of the last AOF rewrite, in seconds
aof_current_rewrite_time_sec:-1  ### if an AOF rewrite is in progress, the seconds it has taken so far
aof_last_bgrewrite_status:ok  ### whether the last AOF rewrite succeeded or failed
aof_last_write_status:ok
aof_current_size:176265  ### current size of the AOF file
aof_base_size:176265  ### size of the AOF file at startup or after the latest rewrite
aof_pending_rewrite:0  ### whether an AOF rewrite is waiting for an RDB save to complete
aof_buffer_length:0   ### size of the AOF buffer
aof_rewrite_buffer_length:0  ### size of the AOF rewrite buffer
aof_pending_bio_fsync:0  ### number of fsync calls pending in the background I/O queue
aof_delayed_fsync:0    ### number of delayed fsync calls
 
# Stats
total_connections_received:8466  ### number of connections accepted by the server
total_commands_processed:900668   ### number of commands processed by the server
instantaneous_ops_per_sec:1   ### commands processed per second
total_net_input_bytes:82724170
total_net_output_bytes:39509080
instantaneous_input_kbps:0.07
instantaneous_output_kbps:0.02
rejected_connections:0  ### connections rejected because of the maxclients limit
sync_full:2
sync_partial_ok:0
sync_partial_err:0
expired_keys:0   ### number of keys removed automatically because they expired
evicted_keys:0   ### number of keys evicted because of the maxmemory limit
keyspace_hits:0  ### number of successful key lookups
keyspace_misses:500000   ### number of failed key lookups
pubsub_channels:0  ### number of channels with subscribers
pubsub_patterns:0  ### number of patterns with subscribers
latest_fork_usec:402  ### duration of the latest fork() in microseconds
 
# Replication
role:master   ### master if this server is not replicating any other server, otherwise slave. Note that in a replication chain a slave can itself be the master of another server
connected_slaves:2   ### 2 slaves connected
slave0:ip=192.168.65.130,port=6379,state=online,offset=1639,lag=1
slave1:ip=192.168.65.129,port=6379,state=online,offset=1639,lag=0
master_repl_offset:1639
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1638
 
# CPU
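used_cpu_sys:41.87  ### system CPU consumed by the Redis server
used_cpu_user:17.82  ### user CPU consumed by the Redis server
used_cpu_sys_children:0.01  ### system CPU consumed by background processes
used_cpu_user_children:0.01  ### user CPU consumed by background processes
 
# Keyspace
db0:keys=3101,expires=0,avg_ttl=0   ### the keyspace section holds per-database statistics, such as the number of keys and the number of keys with an expire set; one line in this format is added for each database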
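
The relation between used_memory and used_memory_rss described in the Memory section can be computed directly from INFO output. This is a small sketch under the assumption that the input is INFO-style text of the form name:value; frag_ratio is a hypothetical helper name, not a Redis command.

```shell
# Sketch only: frag_ratio is a hypothetical helper.
# Reads INFO-style "name:value" lines on stdin and prints
# used_memory_rss / used_memory with two decimals.
frag_ratio() {
    # redis-cli output ends lines with CR LF, so strip the CR first
    tr -d '\r' | awk -F: '
        $1 == "used_memory"     { used = $2 + 0 }
        $1 == "used_memory_rss" { rss  = $2 + 0 }
        END { if (used > 0) printf "%.2f\n", rss / used }'
}
```

Usage, with the host name used in this article: redis-cli -h Mrds info memory | frag_ratio. A value well above 1 points to fragmentation; a value below 1 means used_memory exceeds the resident set, which suggests swapping.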

Create the Redis runtime directory:

# mkdir /var/run/redis
# chown redis: /var/run/redis

Test 1: automatic high-availability failover with Sentinel

Sentinel configuration

Before startup:

port 26379
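dir "/var/lib/redis/tmp"   ### working directory
sentinel monitor mymaster 192.168.65.128 6379 2     ### monitor mymaster (the name is free-form but may only contain A-z, 0-9 and "._-"); the final 2 is the quorum
sentinel down-after-milliseconds mymaster 30000   ### how long mymaster may go without answering before it is considered SDOWN
sentinel parallel-syncs mymaster 1   ### maximum number of slaves reconfigured to sync from the new master at the same time
sentinel failover-timeout mymaster 180000  ### failover timeout; a Sentinel waits twice this long before retrying a failover of the same master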

After startup (redis.conf is now controlled entirely by Sentinel; do not edit it by hand any more):

port 26379
dir "/var/lib/redis/tmp"
sentinel monitor mymaster 192.168.65.128 6379 2
sentinel config-epoch mymaster 18   ### epoch (version) of the current configuration of mymaster
sentinel leader-epoch mymaster 18  ### epoch in which this Sentinel last voted for a failover leader
sentinel known-slave mymaster 192.168.65.129 6379   ### known slave
sentinel known-slave mymaster 192.168.65.130 6379   ### known slave
sentinel known-sentinel mymaster 192.168.65.130 26379 be964e6330ee1eaa9a6b5a97417e866448c0ae40    ### known Sentinel and its run ID
sentinel known-sentinel mymaster 192.168.65.129 26379 3e468037d5dda0bbd86adc3e47b29c04f2afe9e6  ### known Sentinel and its run ID
sentinel current-epoch 18  #### current epoch known to this Sentinel

Starting Sentinel

Once Sentinel is running, redis.conf is managed by Sentinel

# redis-server /etc/redis/sentinel.conf --sentinel &> /var/log/redis/sentinel.log &

Output of each server when Sentinel starts

The output on Mrds, Srds1 and Srds2 should be essentially the same.

Mrds output:

                _._                                                 
           _.-``__ ''-._                                            
      _.-``    `.  `_.  ''-._           Redis 2.8.19 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 30176
  `-._    `-._  `-./  _.-'    _.-'                                  
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |           http://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                  
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                  
      `-._    `-.__.-'    _.-'                                      
          `-._        _.-'                                          
              `-.__.-'                                              
 
[30176] 06 Jan 20:26:03.709 # Sentinel runid is 61b67b4ee4bbd9835f1713d8e900f2abb9fb5658
[30176] 06 Jan 20:26:03.710 # +monitor master mymaster 192.168.65.128 6379 quorum 2
[30176] 06 Jan 20:26:04.711 * +slave slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.128 6379
[30176] 06 Jan 20:26:04.711 * +slave slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379
[30176] 06 Jan 20:27:38.824 * +sentinel sentinel 192.168.65.129:26379 192.168.65.129 26379 @ mymaster 192.168.65.128 6379
[30176] 06 Jan 20:27:43.062 * +sentinel sentinel 192.168.65.130:26379 192.168.65.130 26379 @ mymaster 192.168.65.128 6379

redis-cli> info output:

Srds1 output

[4993] 06 Jan 20:27:36.788 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                 
           _.-``__ ''-._                                            
      _.-``    `.  `_.  ''-._           Redis 2.8.19 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 4993
  `-._    `-._  `-./  _.-'    _.-'                                  
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |           http://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |                                 
  `-._    `-._`-.__.-'_.-'    _.-'                                  
      `-._    `-.__.-'    _.-'                                      
          `-._        _.-'                                          
              `-.__.-'                                              
 
[4993] 06 Jan 20:27:36.790 # Sentinel runid is 157f2df2543470ecd35d92fba75eaa8069c3f1a0
[4993] 06 Jan 20:27:36.790 # +monitor master mymaster 192.168.65.128 6379 quorum 2
[4993] 06 Jan 20:27:36.797 * +slave slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.128 6379
[4993] 06 Jan 20:27:36.797 * +slave slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379
[4993] 06 Jan 20:27:38.427 * +sentinel sentinel 192.168.65.128:26379 192.168.65.128 26379 @ mymaster 192.168.65.128 6379
[4993] 06 Jan 20:27:43.064 * +sentinel sentinel 192.168.65.130:26379 192.168.65.130 26379 @ mymaster 192.168.65.128 6379

redis-cli> info output

Srds2 output

                _._                                                 
           _.-``__ ''-._                                            
      _.-``    `.  `_.  ''-._           Redis 2.8.19 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 22660
  `-._    `-._  `-./  _.-'    _.-'                                  
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |           http://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                  
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |                                 
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                      
          `-._        _.-'                                          
              `-.__.-'                                              
 
[22660] 06 Jan 20:27:41.032 # Sentinel runid is e4b5faa87975a4b8e82a93476f1675a62511cfba
[22660] 06 Jan 20:27:41.032 # +monitor master mymaster 192.168.65.128 6379 quorum 2
[22660] 06 Jan 20:27:41.036 * +slave slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.128 6379
[22660] 06 Jan 20:27:41.036 * +slave slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379
[22660] 06 Jan 20:27:42.586 * +sentinel sentinel 192.168.65.128:26379 192.168.65.128 26379 @ mymaster 192.168.65.128 6379
[22660] 06 Jan 20:27:42.895 * +sentinel sentinel 192.168.65.129:26379 192.168.65.129 26379 @ mymaster 192.168.65.128 6379

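redis-cli> info output

Latest redis.conf contents:

Test 1: automatic failover

1. Set slave-priority to 100 on Srds1 and 99 on Srds2. Since a lower slave-priority means a higher priority, this lets us predict that the new master will be Srds2 (192.168.65.130)

2. Stop the redis-server process on Mrds

3. Check the failover messages

Srds1 output: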

[4993] 06 Jan 21:16:31.080 # +sdown master mymaster 192.168.65.128 6379 // this Sentinel marks the master SDOWN
[4993] 06 Jan 21:16:31.165 # +odown master mymaster 192.168.65.128 6379 #quorum 2/2 // quorum reached (2/2): the master is confirmed down (ODOWN)
[4993] 06 Jan 21:16:31.165 # +new-epoch 1
[4993] 06 Jan 21:16:31.165 # +try-failover master mymaster 192.168.65.128 6379 // attempt a failover of 192.168.65.128
[4993] 06 Jan 21:16:31.170 # +vote-for-leader 157f2df2543470ecd35d92fba75eaa8069c3f1a0 1   // votes for itself as the leader Sentinel that will run the failover
[4993] 06 Jan 21:16:31.178 # 192.168.65.128:26379 voted for 157f2df2543470ecd35d92fba75eaa8069c3f1a0 1    // the Sentinel on 128 gives its vote
[4993] 06 Jan 21:16:31.180 # 192.168.65.130:26379 voted for 157f2df2543470ecd35d92fba75eaa8069c3f1a0 1   // the Sentinel on 130 gives its vote
[4993] 06 Jan 21:16:31.271 # +elected-leader master mymaster 192.168.65.128 6379
[4993] 06 Jan 21:16:31.272 # +failover-state-select-slave master mymaster 192.168.65.128 6379  // failover state: selecting the slave to promote
[4993] 06 Jan 21:16:31.331 # +selected-slave slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379  // slave 130 is selected as the new master
[4993] 06 Jan 21:16:31.331 * +failover-state-send-slaveof-noone slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379
[4993] 06 Jan 21:16:31.410 * +failover-state-wait-promotion slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379 // failover state: waiting for 130 to be promoted
[4993] 06 Jan 21:16:32.182 # +promoted-slave slave 192.168.65.130:6379 192.168.65.130 6379 @ mymaster 192.168.65.128 6379  // 130 has been promoted successfully
[4993] 06 Jan 21:16:32.182 # +failover-state-reconf-slaves master mymaster 192.168.65.128 6379  // failover state: reconfiguring the remaining slaves to follow the new master
[4993] 06 Jan 21:16:32.232 * +slave-reconf-sent slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.128 6379
[4993] 06 Jan 21:16:33.198 * +slave-reconf-inprog slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.128 6379 // 129 is applying the new configuration
[4993] 06 Jan 21:16:33.198 * +slave-reconf-done slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.128 6379 // 129 has applied the new configuration
[4993] 06 Jan 21:16:33.282 # +failover-end master mymaster 192.168.65.128 6379 // failover finished
[4993] 06 Jan 21:16:33.282 # +switch-master mymaster 192.168.65.128 6379 192.168.65.130 6379   // master switched from 128 to 130
[4993] 06 Jan 21:16:33.283 * +slave slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.130 6379  // 129 is now a slave of 130
[4993] 06 Jan 21:16:33.293 * +slave slave 192.168.65.128:6379 192.168.65.128 6379 @ mymaster 192.168.65.130 6379 // 128 is registered as a slave of 130
[4993] 06 Jan 21:17:03.337 # +sdown slave 192.168.65.128:6379 192.168.65.128 6379 @ mymaster 192.168.65.130 6379  // 128 is SDOWN

Srds2 output:

[22660] 06 Jan 21:16:31.115 # +sdown master mymaster 192.168.65.128 6379  // mymaster (128) is SDOWN
[22660] 06 Jan 21:16:31.175 # +new-epoch 1
[22660] 06 Jan 21:16:31.179 # +vote-for-leader 157f2df2543470ecd35d92fba75eaa8069c3f1a0 1 // votes for the Sentinel that will lead the failover
[22660] 06 Jan 21:16:31.200 # +odown master mymaster 192.168.65.128 6379 #quorum 3/2 // all 3 Sentinels confirm the old master 128 is down: sdown -> odown
[22660] 06 Jan 21:16:31.200 # Next failover delay: I will not start a failover before Tue Jan  6 21:22:31 2015 // this Sentinel will not start a failover of its own before that time
[22660] 06 Jan 21:16:32.239 # +config-update-from sentinel 192.168.65.129:26379 192.168.65.129 26379 @ mymaster 192.168.65.128 6379  // picks up the new configuration from the Sentinel on 129
[22660] 06 Jan 21:16:32.239 # +switch-master mymaster 192.168.65.128 6379 192.168.65.130 6379 // master switched from 128 to 130
[22660] 06 Jan 21:16:32.240 * +slave slave 192.168.65.129:6379 192.168.65.129 6379 @ mymaster 192.168.65.130 6379 // 129 is a slave of 130
[22660] 06 Jan 21:16:32.249 * +slave slave 192.168.65.128:6379 192.168.65.128 6379 @ mymaster 192.168.65.130 6379 // 128 is a slave of 130
[22660] 06 Jan 21:17:02.286 # +sdown slave 192.168.65.128:6379 192.168.65.128 6379 @ mymaster 192.168.65.130 6379 // 128 is SDOWN

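redis.conf is controlled by Sentinel

Automatic configuration update: Mrds

Automatic configuration update: Srds1

redis.conf is automatically changed to:

sentinel.conf is automatically changed to:

Automatic configuration update: Srds2

redis.conf is automatically changed to:

sentinel.conf is automatically changed to:

************************ Automatic failover succeeded ************************

Start Redis on Mrds again and observe

On Srds2 you can see that 128 is automatically added back as a slave.

A line is forcibly appended to the end of redis.conf on the old master

Test 2: can Mrds be elected master again?

1. Start the Redis process on Mrds

2. Stop the Redis process on Srds2 and check whether the master moves to Srds1 (it should)

Within 10 s the master switches automatically to Srds1

3. Stop the Redis process on Srds1 and check whether the master moves to Mrds (it should)

OK, that concludes the failover tests. Congratulations!

End of the automatic failover tests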

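What tcp-backlog means

Redis 2.8 adds the configuration option tcp-backlog, set to 511 here (the kernel's somaxconn limit discussed below commonly defaults to 128).

For the listening socket, Send-Q shows 100, which is the tcp-backlog value we configured. To understand what this value means, it helps to look at the queues involved in the TCP three-way handshake. As the handshake diagram shows, when the server receives a SYN the connection enters the SYN queue, and when the server finally receives the ACK it moves to the accept queue. For a socket in the LISTEN state, Send-Q is the maximum length of this accept queue. A connection leaves the queue only when the server calls accept. The size is also bounded by somaxconn, because the kernel takes the smaller of the two values, so to raise it you must also raise the kernel's somaxconn.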
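
The "smaller of the two values" rule can be written down in a couple of lines. This is a sketch only; effective_backlog is a hypothetical helper name, and the path /proc/sys/net/core/somaxconn in the usage note assumes a Linux host.

```shell
# Sketch only: effective_backlog is a hypothetical helper.
# $1 = tcp-backlog from redis.conf, $2 = the kernel's net.core.somaxconn.
# Prints the accept-queue length that actually applies: the smaller value.
effective_backlog() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```

Usage on Linux: effective_backlog 511 "$(cat /proc/sys/net/core/somaxconn)". With tcp-backlog 511 and somaxconn 128 the result is 128, which is why raising tcp-backlog alone has no effect.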

Sentinel overview:

Monitoring. Sentinel constantly checks if your master and slave instances are
working as expected. (real-time monitoring)
Notification. Sentinel can notify the system administrator, or another computer
program, via an API, that something is wrong with one of the monitored
Redis instances. (notification)
Automatic failover. If a master is not working as expected, Sentinel can start a
failover process where a slave is promoted to master, the other additional
slaves are reconfigured to use the new master, and the applications using
the Redis server informed about the new address to use when connecting. (automatic failover)
Configuration provider. Sentinel acts as a source of authority for
clients service discovery: clients connect to Sentinels in order to ask
for the address of the current Redis master responsible for a given
service. If a failover occurs, Sentinels will report the new address.

The biggest difference from a plain master/slave setup:

redis.conf is entirely supervised and rewritten by Sentinel.

Mini Example:

Command format:

sentinel <option_name> <master_name> <option_value>
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
 
sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5

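Quorum voting

The quorum is configurable. A single Sentinel can only mark a master as SDOWN; the master is marked ODOWN only once enough Sentinels agree.

For the failover itself, suppose there are 5 Sentinels and the quorum is 2. Once 2 of them see the master as down, one of the two attempts the failover, but it only goes ahead if a majority of the Sentinels (at least 3 of the 5) authorizes it.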
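
The two thresholds in the 5-Sentinel example can be checked with a little arithmetic. The sketch below is an illustration, not Sentinel's code: majority and failover_possible are hypothetical helper names, and it assumes the leader needs votes from both a majority and at least quorum Sentinels.

```shell
# Sketch only: hypothetical helpers illustrating the thresholds.
# Smallest number of Sentinels that forms a majority of $1.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# $1 = total Sentinels, $2 = quorum,
# $3 = Sentinels reporting the master down, $4 = Sentinels able to vote.
# Succeeds when ODOWN is reached and the failover can be authorized.
failover_possible() {
    total=$1 quorum=$2 down_reports=$3 voters=$4
    need=$(majority "$total")
    # assumption: when the quorum exceeds the majority, the quorum applies
    [ "$quorum" -gt "$need" ] && need=$quorum
    [ "$down_reports" -ge "$quorum" ] && [ "$voters" -ge "$need" ]
}
```

For example, failover_possible 5 2 2 3 succeeds (2 reports meet the quorum and 3 voters form a majority), while failover_possible 5 2 2 2 fails because only 2 of 5 Sentinels can vote.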

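Configuration options explained:

## port the Sentinel instances use to talk to each other
## redis-0
port 26379
## master to monitor: <mastername> <masterIP> <masterPort> <quorum>
## <quorum> should not exceed the number of Sentinel instances; the master is considered
## O_DOWN ("objectively" down) only when at least <quorum> Sentinels report it as failed
sentinel monitor def_master 127.0.0.1 6379 2

sentinel auth-pass def_master 012_345^678-90

## how long before this Sentinel considers the master failed
## if the master does not reply, or replies with an error, within this time,
## this Sentinel marks it as failed (SDOWN, "subjectively" down)
## <mastername> <milliseconds>
## default is 30 seconds
sentinel down-after-milliseconds def_master 30000

## whether this Sentinel is allowed to perform a failover
## no makes this Sentinel an "observer" (it only votes and never performs the failover);
## at least one Sentinel overall must be set to yes
## (an option of the earlier Sentinel implementation; the sentinel.conf shipped with 2.8 does not list it)
sentinel can-failover def_master yes

## when a new master is elected, the number of slaves that run slaveof against it and SYNC at the same time
## default is 1; keeping the default is recommended
## a slave stops serving client requests while it runs slaveof and resynchronizes
## a larger value means more of the "cluster" refuses client requests at once
## a smaller value means that during the failover more slaves keep serving old data to clients
sentinel parallel-syncs def_master 1

## failover timeout: if no failover step has been triggered within this time after a failover started,
## this Sentinel considers the failover failed
sentinel failover-timeout def_master 900000

## a notification script can be given to tell the administrator about the state of the cluster during a failover
## the script may run for at most 60 seconds; after that it is terminated (KILL)
## script exit codes:
##  1    -> retry later, up to 10 retries
##  2    -> finished, no retry needed
##sentinel notification-script mymaster /var/redis/notify.sh

## script that reconfigures clients after a failover; it is called with many arguments, see the Sentinel documentation
# sentinel client-reconfig-script <master-name> <script-path>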
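
A notification script following the exit-code convention above might look like the sketch below. It assumes Sentinel calls the script with two arguments, the event type and the event description; the function name notify, the NOTIFY_LOG variable and the default log path are hypothetical choices for illustration.

```shell
# Sketch only: a possible body for a notification script.
# $1 = event type (for example +sdown), $2 = event description.
notify() {
    event_type=$1
    event_desc=$2
    log=${NOTIFY_LOG:-/tmp/sentinel-notify.log}   # hypothetical default path
    # append one line per event; if the log cannot be written, return 1
    # so that the caller can ask Sentinel to retry later
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$event_type" "$event_desc" >> "$log" || return 1
    return 0
}
```

To use it as a standalone script, end the file with notify "$@" followed by exit $? so that the return value becomes the script's exit code. Appending to a log file keeps the script well inside the 60-second limit; sending mail or calling a web service would be slower and is left out of this sketch.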

The configuration with the higher version number wins:

Because every configuration has a different version number, the greater version always wins over smaller versions.
Set slave-priority to zero on a slave that must never be selected by Sentinels as the new master.
stanley 2014/12/28 22:23:50

    PING This command simply returns PONG.
    SENTINEL masters Show a list of monitored masters and their state.
    SENTINEL master <master name> Show the state and info of the specified master.
    SENTINEL slaves <master name> Show a list of slaves for this master, and their state.
    SENTINEL get-master-addr-by-name <master name> Return the ip and port number of the master with that name. If a failover is in progress or terminated successfully for this master it returns the address and port of the promoted slave.
    SENTINEL reset <pattern> This command will reset all the masters with matching name. The pattern argument is a glob-style pattern. The reset process clears any previous state in a master (including a failover in progress), and removes every slave and sentinel already discovered and associated with the master.
    SENTINEL failover <master name> Force a failover as if the master was not reachable, and without asking for agreement to other Sentinels (however a new version of the configuration will be published so that the other Sentinels will update their configurations).

Adding several Sentinels:

If you need to add multiple Sentinels at once, it is suggested to add it one after the other, waiting for all the other Sentinels to already know about the first one before adding the next.

Removing Sentinels:

Stop the Sentinel process of the Sentinel you want to remove.
Send a SENTINEL RESET * command to all the other Sentinel instances (instead of * you can use the exact master name if you want to reset just a single master). One after the other, waiting at least 30 seconds between instances.
Check that all the Sentinels agree about the number of Sentinels currently active, by inspecting the output of SENTINEL MASTER mastername of every Sentinel.

Dependence on the system clock:

Redis Sentinel is heavily dependent on the computer time:

TILT mode:

If the time difference is negative or unexpectedly big (2 seconds or more), TILT mode is entered.
If everything appears to be normal for 30 seconds, TILT mode is exited.
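
The TILT entry condition can be expressed as a simple comparison of two timestamps. This is a sketch of the rule as worded above, not Sentinel's implementation; should_enter_tilt is a hypothetical helper name and both arguments are assumed to be in milliseconds.

```shell
# Sketch only: should_enter_tilt is a hypothetical helper.
# $1 = time of the previous timer tick (ms), $2 = time of the current tick (ms).
# Succeeds when the difference is negative or is 2 seconds or more.
should_enter_tilt() {
    delta=$(( $2 - $1 ))
    [ "$delta" -lt 0 ] || [ "$delta" -ge 2000 ]
}
```

For example, should_enter_tilt 1000 1100 fails (a normal 100 ms tick), while should_enter_tilt 5000 4000 succeeds because the clock moved backwards.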

Appendix:

Redis start/stop script

#!/bin/sh
#
# redis        init file for starting up the redis daemon
#
# chkconfig:   - 20 80
# description: Starts and stops the redis daemon.

# Source function library.
. /etc/rc.d/init.d/functions

name="redis-server"
exec="/usr/local/bin/$name"
pidfile="/var/run/redis/redis.pid"
REDIS_CONFIG="/etc/redis/redis.conf"

[ -e /etc/sysconfig/redis ] && . /etc/sysconfig/redis

lockfile=/var/lock/subsys/redis

start() {
    [ -f $REDIS_CONFIG ] || exit 6
    [ -x $exec ] || exit 5
    echo -n $"Starting $name: "
    daemon --user ${REDIS_USER-redis} "$exec $REDIS_CONFIG"
    retval=$?
    echo
    [ $retval -eq 0 ] && touch $lockfile
    return $retval
}

stop() {
    echo -n $"Stopping $name: "
    killproc -p $pidfile $name
    retval=$?
    echo
    [ $retval -eq 0 ] && rm -f $lockfile
    return $retval
}

restart() {
    stop
    start
}

reload() {
    false
}

rh_status() {
    status -p $pidfile $name
}

rh_status_q() {
    rh_status >/dev/null 2>&1
}


case "$1" in
    start)
        rh_status_q && exit 0
        $1
        ;;
    stop)
        rh_status_q || exit 0
        $1
        ;;
    restart)
        $1
        ;;
    reload)
        rh_status_q || exit 7
        $1
        ;;
    force-reload)
        restart
        ;;
    status)
        rh_status
        ;;
    condrestart|try-restart)
        rh_status_q || exit 0
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart|try-restart}"
        exit 2
esac
exit $?

Original article by stanley. If you repost it, please credit the source: http://www.178linux.com/672