by Uwe Meding

I am running a simple cluster with two server nodes and a DRBD setup that mirrors a user file system and a database. The systems are set up and configured to run unattended for as long as there is power (and as long as the hardware holds up, obviously). After a recent power outage one mirrored device (data) recovered just fine and began resynchronizing on its own, but the other (db) came back up in the following inconsistent state:

Primary node: the db device started up StandAlone, with the secondary device in an unknown state

# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs          ro                   ds                     p      mounted  fstype
...     sync'ed:    51.1%                (8720/17804)M
1:db    StandAlone  Primary/Unknown      UpToDate/DUnknown      r----
2:data  SyncSource  Secondary/Secondary  UpToDate/Inconsistent  C

Secondary node: the db device is waiting for a connection (WFConnection) from an unknown primary device

# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs            ro                   ds                     p  mounted  fstype
...     sync'ed:      24.5%                (13448/17804)M
1:db    WFConnection  Secondary/Unknown    UpToDate/DUnknown      C
2:data  SyncTarget    Secondary/Secondary  Inconsistent/UpToDate  C
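
The two listings above can also be checked mechanically. The following is a minimal sketch, assuming the init script prints its status in the layout shown here (resource lines start with `N:name`, and the connection state is the second column). The function name `drbd_unconnected` is hypothetical and not part of DRBD.

```shell
#!/bin/sh
# Hypothetical helper: print "resource state" for every resource whose
# connection state is StandAlone or WFConnection.
# Reads the output of `/etc/init.d/drbd status` on stdin.
drbd_unconnected() {
    awk '$1 ~ /^[0-9]+:/ && ($2 == "StandAlone" || $2 == "WFConnection") {
        sub(/^[0-9]+:/, "", $1)   # strip the minor number, e.g. "1:db" -> "db"
        print $1, $2
    }'
}

# Usage on a live node:
#   /etc/init.d/drbd status | drbd_unconnected
```

Any resource this prints is not talking to its peer and is a candidate for the manual recovery described next.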

To resolve this, we must invalidate the bad data on the secondary device, which is obviously confused about its state, and then reconnect the devices. The steps are:

On node with bad data:

# drbdadm disconnect <resource>
# drbdadm -- --discard-my-data connect <resource>

On node with good data:

# drbdadm connect <resource>
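
The steps above can be wrapped in a small script so they are easy to rehearse before running them for real. This is only a sketch: the `DRY_RUN` variable and the function names are hypothetical, and only the `drbdadm` invocations themselves come from the steps above. With `DRY_RUN=1` (the default here) the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the manual recovery steps. DRY_RUN and the function names are
# assumptions for illustration; only the drbdadm commands come from the article.
: "${DRY_RUN:=1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the node whose data is to be thrown away.
discard_local_data() {
    res=$1
    run drbdadm disconnect "$res" &&
    run drbdadm -- --discard-my-data connect "$res"
}

# Run on the node that holds the good data.
reconnect_survivor() {
    res=$1
    run drbdadm connect "$res"
}
```

For the cluster in this article, that would be `discard_local_data db` on the node with bad data, followed by `reconnect_survivor db` on the node with good data. Set `DRY_RUN=0` only once you are certain which node holds the data you want to keep, because the discarded side is overwritten.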

1. Node with bad data:

# drbdadm disconnect db
# drbdadm -- --discard-my-data connect db
# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs            ro                   ds                     p  mounted  fstype
...     sync'ed:      41.8%                (10376/17804)M
1:db    WFConnection  Secondary/Unknown    UpToDate/DUnknown      C
2:data  SyncTarget    Secondary/Secondary  Inconsistent/UpToDate  C

Notice that the data device is already resynchronizing, while db is now waiting for the node with the good data to connect.

2. Node with good data:

# drbdadm connect db
# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs          ro                   ds                     p  mounted  fstype
...     sync'ed:    12.1%                (1254008/1422724)K
...     sync'ed:    64.2%                (6380/17804)M
1:db    SyncSource  Primary/Secondary    UpToDate/Inconsistent  C
2:data  SyncSource  Secondary/Secondary  UpToDate/Inconsistent  C
# drbdadm primary data
# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs          ro                 ds                     p  mounted  fstype
...     sync'ed:    32.2%              (968504/1422724)K
...     sync'ed:    67.4%              (5820/17804)M
1:db    SyncSource  Primary/Secondary  UpToDate/Inconsistent  C
2:data  SyncSource  Primary/Secondary  UpToDate/Inconsistent  C

3. After some time …

# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs         ro                 ds                 p  mounted  fstype
1:db    Connected  Primary/Secondary  UpToDate/UpToDate  C
2:data  Connected  Primary/Secondary  UpToDate/UpToDate  C
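
Instead of checking the status by hand, the wait can be scripted. A minimal sketch follows, assuming `drbdadm dstate <resource>` prints the local/peer disk states as a single string such as `UpToDate/Inconsistent`. The `DSTATE_CMD` variable and the function name `wait_for_sync` are hypothetical, introduced only so the loop can be tried on a machine without DRBD.

```shell
#!/bin/sh
# Hypothetical helper: block until both disks of a resource report UpToDate.
# DSTATE_CMD is an assumption for testability; by default it calls
# `drbdadm dstate`, which prints e.g. "UpToDate/Inconsistent".
: "${DSTATE_CMD:=drbdadm dstate}"

wait_for_sync() {
    res=$1
    interval=${2:-10}       # seconds between polls
    while :; do
        # Give up if the state cannot be queried at all.
        state=$($DSTATE_CMD "$res") || return 1
        [ "$state" = "UpToDate/UpToDate" ] && return 0
        sleep "$interval"
    done
}
```

A call such as `wait_for_sync db && wait_for_sync data` returns once both devices in this article are fully synchronized. Note that the loop has no timeout, so it polls for as long as the resynchronization takes.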

4. Wrap-up
At this point the devices are running synchronously again and we are ready to kick off whatever startup or mount procedure we have. If you are in a hurry, you can do this as soon as you see the devices synchronizing, because at that point we already have an operational DRBD configuration. The final status should look something like this:

# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-01-21 17:26:47
m:res   cs         ro                 ds                 p  mounted  fstype
1:db    Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4
2:data  Connected  Primary/Secondary  UpToDate/UpToDate  C  /db      ext4
