Troubleshooting Ceph After Proxmox Node Restarts
Date: 2025-11-09
Environment: Proxmox VE Cluster (node1, node2, node3)
Main Problem
Symptoms
- Unable to log in to the GUI on node1 and node2 (only node3 worked)
- "login failed" error for all user accounts
- Proxmox API not responding on node1 and node2
- Problem returned after machine restarts
- 27 VMs/containers using Ceph RBD storage were inaccessible
Error Logs
proxy detected vanished client connection
command 'lxc-info -n 10025 -p' failed: got signal 9
Diagnosis and Resolution - Phase 1: GUI Issues
Problem 1.1: GUI/API Not Responding
Diagnosis:
# Check cluster status
pvecm status # All 3 nodes in quorum
# Check services
systemctl status pveproxy # running
systemctl status pvedaemon # running
# Test API
curl -k https://node1:8006/api2/json/access/ticket # timeout
Root cause: lxc-info processes were blocking pvedaemon while checking container status.
Solution:
# Kill stuck processes
pkill -9 lxc-info
# Restart pvedaemon
systemctl restart pvedaemon
# Verification
curl -k https://node1:8006/api2/json/access/ticket # OK
Status: ✅ Temporarily fixed (problem returned)
Problem 1.2: Problem Returns After Restart/VM Start
Diagnosis:
# Check stuck processes
ps aux | grep lxc-info
# Multiple lxc-info processes in hung state
# Identify containers
# CT 10025 on node1 - es-coord-1
# CT 10026 on node2 - es-coord-2
Root cause: Containers using Ceph RBD were blocking during status checks via lxc-info.
Diagnosis and Resolution - Phase 2: Automatic Workaround
Problem 2.1: Need for Continuous Manual Process Cleanup
Solution - Automatic cleanup:
Script: /usr/local/bin/lxc-cleanup.sh
#!/bin/bash
# Kill lxc-info processes that have been running for more than 30 seconds
# (etimes = elapsed seconds, comm = command name without arguments)
for pid in $(ps -eo pid=,etimes=,comm= | awk '$3 == "lxc-info" && $2 > 30 {print $1}'); do
kill -9 "$pid" 2>/dev/null
done
Cron job: /etc/cron.d/lxc-cleanup
* * * * * root /usr/local/bin/lxc-cleanup.sh
Deployment:
# Deploy on all nodes
for node in node1 node2 node3; do
ssh $node "cat > /usr/local/bin/lxc-cleanup.sh << 'EOF'
[script as above]
EOF
chmod +x /usr/local/bin/lxc-cleanup.sh"
ssh $node "cat > /etc/cron.d/lxc-cleanup << 'EOF'
* * * * * root /usr/local/bin/lxc-cleanup.sh
EOF"
done
Status: ✅ Workaround working, but not solving root cause
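Before letting cron kill processes every minute, it can help to see which PIDs the filter would select. The following is a minimal sketch, not part of the deployed workaround: select_stale and report_stale are hypothetical helper names, and the selection logic reads "PID ELAPSED_SECONDS COMMAND" lines from stdin so it can be tried on sample data.

```bash
#!/bin/bash
# Sketch: select PIDs of lxc-info processes older than a threshold.
# Input on stdin: lines of "PID ELAPSED_SECONDS COMMAND_NAME",
# for example from: ps -eo pid=,etimes=,comm=
# select_stale and report_stale are hypothetical names, not Proxmox or LXC tools.
select_stale() {
    max_age="$1"
    # Match the command name exactly so unrelated processes are never selected
    awk -v max="$max_age" '$3 == "lxc-info" && $2 > max { print $1 }'
}

# Dry run: print the candidates instead of killing them.
report_stale() {
    ps -eo pid=,etimes=,comm= | select_stale "${1:-30}" | while read -r pid; do
        echo "would kill lxc-info pid $pid"
    done
}
```

Calling `report_stale 30` on a node prints one line per candidate and changes nothing.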
Diagnosis and Resolution - Phase 3: Root Cause - Ceph
Problem 3.1: Identifying Source of Blocks
Diagnosis:
# Check container configurations
grep -l 'ceph-rbd' /etc/pve/nodes/*/lxc/*.conf /etc/pve/nodes/*/qemu-server/*.conf
# List containers/VMs using Ceph
# Found 27 VMs/containers with rootfs/disks on ceph-rbd
# Check Ceph status
ceph -s
Result:
cluster:
health: HEALTH_WARN
1 osds down
OSD count 1 < osd_pool_default_size 3
services:
mon: 3 daemons, quorum node1,node2,node3
mgr: node1(active)
osd: 1 osds: 0 up, 1 in
data:
usage: 0 B used, 0 B / 0 B avail
pgs: 32 unknown
Root cause: OSD.0, the cluster's only OSD, was down. All 32 PGs were therefore in an unknown state, and containers/VMs on Ceph could not access their disks.
Problem 3.2: OSD.0 Cannot Start
Diagnosis:
# Check OSD status
systemctl status ceph-osd@0
# Failed - exit code 1
# Logs
journalctl -u ceph-osd@0
# auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring
# no keyring found, disabling cephx
Root cause: Missing keyring for OSD.0
Solution Attempt 1 - Generate keyring:
# Generate new key
ceph auth get-or-create osd.0 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *'
# [osd.0]
# key = <key>
# Save to file
ceph auth get osd.0 -o /var/lib/ceph/osd/ceph-0/keyring
chown ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
chmod 600 /var/lib/ceph/osd/ceph-0/keyring
Next error:
# Start attempt
systemctl start ceph-osd@0
# Failed
# New logs
journalctl -u ceph-osd@0
# missing 'type' file and unable to infer osd type
Status: ❌ Keyring not sufficient, more metadata missing
Problem 3.3: Missing OSD Metadata
Diagnosis:
# OSD directory contents
ls -la /var/lib/ceph/osd/ceph-0/
# Only keyring - no other files
Root cause: OSD directory is nearly empty - missing:
- type - backend type (bluestore/filestore)
- whoami - OSD ID
- block - symlink to block device
- fsid - OSD UUID
- ceph_fsid - Cluster UUID
- ready - ready marker
Status: ❌ Deeper problem - check backend storage
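The same check can be scripted so it is repeatable on other OSD directories. This is a minimal sketch under the assumption that the files listed above, plus the keyring from Problem 3.2, are the complete set to look for; missing_osd_files is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: print which expected files are absent from an OSD data directory.
# missing_osd_files is a hypothetical helper; the file list is taken from
# this section plus the keyring discussed in Problem 3.2.
missing_osd_files() {
    dir="$1"
    for f in type whoami block fsid ceph_fsid ready keyring; do
        # -e follows symlinks, so a dangling "block" link is reported as missing too
        [ -e "$dir/$f" ] || echo "$f"
    done
}
```

For example, `missing_osd_files /var/lib/ceph/osd/ceph-0` prints one missing file name per line and prints nothing when all of them are present.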
Diagnosis and Resolution - Phase 4: Storage Backend
Problem 4.1: Missing Device for OSD
Diagnosis:
# Check OSD metadata from Ceph
ceph osd metadata 0
Result - Key Information:
{
"osd_objectstore": "bluestore",
"bluestore_bdev_dev_node": "/dev/dm-13",
"bluestore_bdev_devices": "sdf,sdg,sdh,sdi,sdj,sdk,sdl,sdm",
"bluestore_bdev_partition_path": "/dev/dm-13",
"bluestore_bdev_type": "ssd",
"bluestore_bdev_size": "6597065572352",
"osd_data": "/var/lib/ceph/osd/ceph-0"
}
Key Discovery:
- OSD uses BlueStore (not FileStore)
- Backend: /dev/dm-13 (device mapper multipath)
- Physical devices: 8x iSCSI disks (sdf-sdm)
- Size: about 6.0 TiB (6,597,065,572,352 bytes)
Check device:
ls -la /dev/dm-13
# ls: cannot access '/dev/dm-13': No such file or directory
Root cause: Device mapper /dev/dm-13 doesn't exist - no storage connection!
Problem 4.2: No iSCSI Sessions
Diagnosis:
# Check iSCSI sessions
iscsiadm -m session
# iscsiadm: No active sessions.
# Check disks
lsblk | grep 'sd[f-m]'
# sdf-sdm exist but are tiny (16K)
# Check iSCSI configuration
ls -la /etc/iscsi/nodes/
# 8 configured targets to storage array
Root cause: After the node restart, the iSCSI sessions were not re-established automatically (no automatic login).
Diagnosis and Resolution - Phase 5: Restore iSCSI and Multipath
Solution 5.1: Restore iSCSI Connections
Restore sessions:
# Login to all targets
iscsiadm -m node -L all
Result:
Logging in to [iface: default, target: iqn.yyyy-mm.tld.vendor:target:..., portal: 10.0.1.10,3260]
...
Login to [...] successful. (x8 targets)
Verification:
# Check active sessions
iscsiadm -m session
# tcp: [1] 10.0.1.10:3260,13 iqn.yyyy-mm.tld.vendor:target:...
# ... (8 sessions)
# Check multipath device
ls -la /dev/dm-13
# brw-rw---- 1 root disk 252, 13 Nov 9 01:54 /dev/dm-13
# Verify multipath
multipath -ll | grep -A15 dm-13
Multipath result:
ceph-lun-1 (36xxxxxxxxxxxxxxxxxxxx) dm-13 VENDOR,MODEL
size=6.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 18:0:0:1 sdo 8:224 active ready running
| |- 22:0:0:1 sds 65:32 active ready running
| |- 20:0:0:1 sdr 65:16 active ready running
| `- 24:0:0:1 sdu 65:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
|- 17:0:0:1 sdn 8:208 active ready running
|- 19:0:0:1 sdp 8:240 active ready running
|- 21:0:0:1 sdq 65:0 active ready running
`- 23:0:0:1 sdt 65:48 active ready running
Status: ✅ Device dm-13 exists and is active (8 multipath paths)
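Counting healthy paths by eye is error-prone when this has to be repeated after every reboot. The sketch below counts lines in the "active ready running" state, as seen in the multipath output above; count_ready_paths and paths_ok are hypothetical helper names, and the expected path count of 8 is specific to this environment.

```bash
#!/bin/bash
# Sketch: count healthy paths in `multipath -ll` output read from stdin.
# count_ready_paths and paths_ok are hypothetical helper names.
count_ready_paths() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that status
    grep -c 'active ready running' || true
}

# paths_ok EXPECTED: succeed only if at least EXPECTED paths are healthy
paths_ok() {
    n=$(count_ready_paths)
    [ "$n" -ge "$1" ]
}
```

A possible use is `multipath -ll ceph-lun-1 | paths_ok 8 || echo "multipath degraded"`.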
Problem 5.2: Device Exists But OSD Still Doesn't Work
Diagnosis:
# Try reading BlueStore label
ceph-bluestore-tool show-label --dev /dev/dm-13
# unable to decode label
# No valid bdev label found
Root cause: /dev/dm-13 exists but doesn't contain BlueStore data - likely OSD uses LVM on this device.
Check LVM:
# Read beginning of disk
dd if=/dev/dm-13 bs=4096 count=10 | hexdump -C | head -50
Discovery:
00000200 4c 41 42 45 4c 4f 4e 45 01 00 00 00 00 00 00 00 |LABELONE........|
00000210 3a c4 ab 25 20 00 00 00 4c 56 4d 32 20 30 30 31 |:..% ...LVM2 001|
...
00001200 63 65 70 68 2d 39 37 36 33 62 30 34 62 2d 61 38 |ceph-9763b04b-a8|
...
device = "/dev/mapper/ceph-lun-1"
Key discovery: OSD uses LVM!
- Volume Group: ceph-9763b04b-a846-47bc-984c-3c9da95d7329
- Physical Volume: /dev/mapper/ceph-lun-1 (alias for dm-13)
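The hexdump inspection above can be reduced to a yes/no check. This is a rough sketch that looks for the LABELONE signature in the first four 512-byte sectors, which is where LVM2 places its label (the dump above shows it at offset 0x200). has_lvm_label is a hypothetical helper name; on a healthy system `pvs` is the more usual way to ask the same question.

```bash
#!/bin/bash
# Sketch: detect an LVM2 physical-volume label on a device or image file.
# has_lvm_label is a hypothetical helper name.
has_lvm_label() {
    # Read only the first four 512-byte sectors; -a makes grep treat binary data as text
    dd if="$1" bs=512 count=4 2>/dev/null | grep -a -q 'LABELONE'
}
```

For example, `has_lvm_label /dev/dm-13 && echo "LVM PV detected"` would point to LVM rather than a raw BlueStore device.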
Solution 5.3: Activate LVM OSD
Scan and activate:
# Scan LVM
pvscan
vgscan
lvscan | grep ceph
# ACTIVE '/dev/ceph-9763b04b-a846-47bc-984c-3c9da95d7329/osd-block-b59f609a-4c09-4795-89f6-29b30800a3c7' [<6.00 TiB]
# Activate VG
vgchange -ay ceph-9763b04b-a846-47bc-984c-3c9da95d7329
# 1 logical volume(s) in volume group now active
# Check
ls -la /dev/ceph-9763b04b-a846-47bc-984c-3c9da95d7329/
# osd-block-b59f609a-4c09-4795-89f6-29b30800a3c7 -> ../dm-14
Verify data:
# Read BlueStore label from LV
ceph-bluestore-tool show-label --dev /dev/ceph-9763b04b-a846-47bc-984c-3c9da95d7329/osd-block-b59f609a-4c09-4795-89f6-29b30800a3c7
Result:
{
"osd_uuid": "b59f609a-4c09-4795-89f6-29b30800a3c7",
"size": 6597065572352,
"description": "main",
"bluefs": "1",
"ceph_fsid": "79e21f45-2ab5-4caa-8848-30d0ee790bc8",
"osd_key": "<key>",
"ready": "ready",
"type": "bluestore",
"whoami": "0"
}
Status: ✅ OSD data is complete and readable! LVM works, data exists.
Diagnosis and Resolution - Phase 6: OSD Metadata Reconstruction
Solution 6.1: Create Symlink to Block Device
Create symlink:
cd /var/lib/ceph/osd/ceph-0
ln -sf /dev/ceph-9763b04b-a846-47bc-984c-3c9da95d7329/osd-block-b59f609a-4c09-4795-89f6-29b30800a3c7 block
# Verify
ls -la block
# block -> /dev/ceph-.../osd-block-b59f609a-...
Status: ✅ Symlink created
Solution 6.2: Add Metadata Files
Create all required files:
# OSD type
echo bluestore > /var/lib/ceph/osd/ceph-0/type
# OSD ID
echo 0 > /var/lib/ceph/osd/ceph-0/whoami
# Ready marker
echo ready > /var/lib/ceph/osd/ceph-0/ready
# OSD UUID (NOT cluster UUID!)
echo 'b59f609a-4c09-4795-89f6-29b30800a3c7' > /var/lib/ceph/osd/ceph-0/fsid
# Ceph cluster UUID
echo '79e21f45-2ab5-4caa-8848-30d0ee790bc8' > /var/lib/ceph/osd/ceph-0/ceph_fsid
# Fix permissions
chown ceph:ceph /var/lib/ceph/osd/ceph-0/*
chmod 644 /var/lib/ceph/osd/ceph-0/type /var/lib/ceph/osd/ceph-0/whoami /var/lib/ceph/osd/ceph-0/ready
Verify contents:
ls -la /var/lib/ceph/osd/ceph-0/
# block -> /dev/ceph-.../osd-block-...
# ceph_fsid
# fsid
# keyring
# ready
# type
# whoami
Status: ✅ All metadata created
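The files created above mirror fields of the BlueStore label shown in Solution 5.3, and the two UUIDs (OSD UUID for fsid, cluster UUID for ceph_fsid) are easy to swap when typed by hand. One way to avoid that is to derive the files from the label output. The sketch below assumes the label JSON has one key per line, as in the output above; label_field and write_osd_metadata are hypothetical helper names. On OSDs deployed with ceph-volume, `ceph-volume lvm activate --all` is the usual way to repopulate this directory and may make manual reconstruction unnecessary; whether that applies to this OSD is an assumption.

```bash
#!/bin/bash
# Sketch: build OSD metadata files from saved `ceph-bluestore-tool show-label` output.
# label_field and write_osd_metadata are hypothetical helper names.

# label_field KEY: print the value of KEY from label JSON on stdin
# (assumes one "key": value pair per line, quoted or unquoted)
label_field() {
    key="$1"
    sed -n -E "s/^[[:space:]]*\"$key\":[[:space:]]*\"?([^\",]*)\"?,?[[:space:]]*\$/\1/p"
}

# write_osd_metadata LABEL_FILE OSD_DIR
write_osd_metadata() {
    label_file="$1"
    osd_dir="$2"
    # fsid holds the OSD UUID (osd_uuid); ceph_fsid holds the cluster UUID
    label_field osd_uuid  < "$label_file" > "$osd_dir/fsid"
    label_field ceph_fsid < "$label_file" > "$osd_dir/ceph_fsid"
    label_field type      < "$label_file" > "$osd_dir/type"
    label_field whoami    < "$label_file" > "$osd_dir/whoami"
    label_field ready     < "$label_file" > "$osd_dir/ready"
}
```

A possible use is to save the label with `ceph-bluestore-tool show-label --dev <LV path> > /tmp/label.json` and then run `write_osd_metadata /tmp/label.json /var/lib/ceph/osd/ceph-0`. The block symlink, keyring, and ownership still need to be handled separately.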
Solution 6.3: Start OSD
First attempt - error:
systemctl start ceph-osd@0
journalctl -u ceph-osd@0
# bluestore(...) _read_multi_bdev_label label correct, but osd_uuid=b59f609a... need=79e21f45...
# No valid bdev label found
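Problem: The fsid file contained the cluster UUID (79e21f45...) instead of the OSD UUID (b59f609a...). BlueStore compares fsid with the osd_uuid stored in the device label, so the two must match.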
Correction:
# Fix - fsid should be OSD UUID, not cluster UUID
echo 'b59f609a-4c09-4795-89f6-29b30800a3c7' > /var/lib/ceph/osd/ceph-0/fsid
Restart:
systemctl reset-failed ceph-osd@0
systemctl start ceph-osd@0
systemctl status ceph-osd@0
Result:
● ceph-osd@0.service - Ceph object storage daemon osd.0
Active: active (running)
Main PID: 28977 (ceph-osd)
Memory: 293.6M
Status: ✅ OSD started successfully!
Diagnosis and Resolution - Phase 7: Ceph Verification
Verification 7.1: Cluster Status
Check after 10 seconds:
sleep 10
ceph -s
Result:
cluster:
id: 79e21f45-2ab5-4caa-8848-30d0ee790bc8
health: HEALTH_WARN
mons are allowing insecure global_id reclaim
1 pool(s) have no replicas configured
OSD count 1 < osd_pool_default_size 3
services:
mon: 3 daemons, quorum node1,node2,node3 (age 23m)
mgr: node1(active, since 23m)
osd: 1 osds: 1 up (since 27s), 1 in (since 4M)
data:
pools: 1 pools, 32 pgs
objects: 970.87k objects, 3.7 TiB
usage: 3.7 TiB used, 2.3 TiB / 6.0 TiB avail
pgs: 29 active+clean
2 active+clean+scrubbing
1 active+clean+scrubbing+deep
Key Metrics:
- ✅ OSD: 1 up, 1 in - ACTIVE
- ✅ Objects: 970.87k objects, 3.7 TiB - DATA AVAILABLE!
- ✅ PGs: 29 active+clean - placement groups healthy
- ✅ Scrubbing: automatic data integrity verification
Status: ✅ Ceph fully functional!
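A fixed `sleep 10` works when the OSD comes up quickly, but polling with a bounded number of retries is more robust. The sketch below is one way to do that: retry, osd_up_count, and osd_up are hypothetical helper names, and the parser assumes the "N osds: M up" wording seen in the status output above also appears in `ceph osd stat`.

```bash
#!/bin/bash
# Sketch: poll until the OSD is reported up instead of sleeping a fixed time.
# retry, osd_up_count and osd_up are hypothetical helper names.

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out
retry() {
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then return 0; fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# osd_up_count: extract M from "... N osds: M up ..." text on stdin
osd_up_count() {
    sed -n -E 's/.*osds: ([0-9]+) up.*/\1/p'
}

# osd_up: succeed if at least one OSD is reported up (assumed output format)
osd_up() {
    n=$(ceph osd stat 2>/dev/null | osd_up_count)
    [ "${n:-0}" -ge 1 ]
}
```

With these helpers, `retry 30 2 osd_up && ceph -s` waits up to about a minute before printing the cluster status.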
Verification 7.2: RBD Image List
Check image availability:
rbd ls rbd | head -30
Result:
base-template-disk-0
vm-10025-disk-0
vm-10026-disk-0
...
vm-10061-cloudinit
vm-10061-disk-0
vm-10061-disk-1
vm-10061-disk-2
...
vm-10068-disk-0
...
vm-10230-disk-0
Status: ✅ All RBD images available (30+ images)
Diagnosis and Resolution - Phase 8: Storage Restoration
Solution 8.1: Enable ceph-rbd Storage in Proxmox
Edit storage configuration:
# Fix /etc/pve/storage.cfg
cat > /etc/pve/storage.cfg << 'EOF'
dir: local
    path /var/lib/vz
    content vztmpl,iso,snippets
    shared 0

zfspool: local-zfs
    pool rpool/data
    content rootdir
    sparse 1

dir: shared-nfs
    path /mnt/shared/proxmox
    content import,rootdir,snippets,images
    nodes node2,node1,node3
    prune-backups keep-all=1
    shared 1

lvmthin: ssd-thin
    thinpool thin_pool
    vgname vg_data
    content images,rootdir
    nodes node2,node1,node3

pbs: backup-server
    datastore local-backup
    server 10.0.0.100
    content backup
    fingerprint xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
    namespace proxmox-cluster
    nodes node1,node3,node2
    prune-backups keep-all=1
    username backup-user@pbs

rbd: ceph-rbd
    content rootdir,images
    nodes node3,node1,node2
    pool rbd
EOF
Status: ✅ Storage ceph-rbd active (removed disable 1 flag)
Diagnosis and Resolution - Phase 9: VM/Container Testing
Test 9.1: Start Container 10068
Start container:
pct start 10068
sleep 5
pct status 10068
Result:
status: running
Status: ✅ Container running!
Test 9.2: Verify VM 10061
Check status:
qm status 10061
Result:
status: running
Status: ✅ VM running (was already started earlier)
Final Summary
State After Repair
Ceph cluster:
Health: HEALTH_WARN (non-critical warnings)
OSD: 1 osds: 1 up, 1 in
Data: 970.88k objects, 3.7 TiB
PGs: 30 active+clean, 2 scrubbing+deep
I/O: active operations (25 KiB/s rd, 153 KiB/s wr)
VMs/Containers:
- All 27 VMs/containers using Ceph have access to data
- Critical machines (10061, 10068) tested and working
Warnings (non-critical):
- insecure global_id reclaim - security issue, can be fixed later
- no replicas configured - only 1 OSD, no replication (normal for single-OSD setup)
- OSD count 1 < osd_pool_default_size 3 - default config requires 3 OSDs
Complete List of Executed Steps
1. Disable Autostart for Ceph Containers
# Intended for the 27 Ceph-backed guests; note that these globs match every guest config on every node
sed -i 's/^onboot: 1$/onboot: 0/' /etc/pve/nodes/*/lxc/*.conf
sed -i 's/^onboot: 1$/onboot: 0/' /etc/pve/nodes/*/qemu-server/*.conf
2. Restart All Nodes
# Sequentially: node3 -> node2 -> node1
ssh node3 "reboot"
# Wait for boot
ssh node2 "reboot"
# Wait for boot
ssh node1 "reboot"
3. Restore iSCSI
# Login to all targets
iscsiadm -m node -L all
# Verify 8 sessions
iscsiadm -m session
4. Activate Multipath
# Automatically created dm-13
multipath -ll
5. Activate LVM
# Scan and activate VG
pvscan
vgscan
vgchange -ay ceph-9763b04b-a846-47bc-984c-3c9da95d7329
6. Create OSD Metadata
cd /var/lib/ceph/osd/ceph-0
# Symlink to device
ln -sf /dev/ceph-9763b04b-a846-47bc-984c-3c9da95d7329/osd-block-b59f609a-4c09-4795-89f6-29b30800a3c7 block
# Metadata files
echo bluestore > type
echo 0 > whoami
echo ready > ready
echo 'b59f609a-4c09-4795-89f6-29b30800a3c7' > fsid
echo '79e21f45-2ab5-4caa-8848-30d0ee790bc8' > ceph_fsid
# Keyring (already existed)
# /var/lib/ceph/osd/ceph-0/keyring
# Permissions
chown ceph:ceph /var/lib/ceph/osd/ceph-0/*
chmod 644 type whoami ready fsid ceph_fsid
7. Start OSD
systemctl reset-failed ceph-osd@0
systemctl start ceph-osd@0
systemctl enable ceph-osd@0
8. Enable Proxmox Storage
# Edit /etc/pve/storage.cfg
# Remove 'disable 1' line from ceph-rbd section
9. Test VMs/Containers
pct start 10068 # Container
qm status 10061 # VM (already running)
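The sed commands in step 1 act on every guest config, including guests that do not use Ceph. A narrower variant could first select only the configs that mention the ceph-rbd storage, as was done with grep -l during diagnosis. This is a sketch under that assumption; ceph_guest_configs and disable_onboot are hypothetical helper names.

```bash
#!/bin/bash
# Sketch: disable autostart only for guests whose config references ceph-rbd.
# ceph_guest_configs and disable_onboot are hypothetical helper names.

# ceph_guest_configs BASE: list config files under BASE that mention ceph-rbd
ceph_guest_configs() {
    base="$1"   # e.g. /etc/pve/nodes
    grep -l 'ceph-rbd' "$base"/*/lxc/*.conf "$base"/*/qemu-server/*.conf 2>/dev/null
}

# disable_onboot BASE: set onboot to 0 in the selected files only
disable_onboot() {
    ceph_guest_configs "$1" | while read -r conf; do
        sed -i 's/^onboot: 1$/onboot: 0/' "$conf"
    done
}
```

Running `ceph_guest_configs /etc/pve/nodes` on its own first shows which files `disable_onboot /etc/pve/nodes` would modify.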
Key Error Logs and Solutions
Error 1: Missing Keyring
Error: auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring
Solution: ceph auth get osd.0 -o /var/lib/ceph/osd/ceph-0/keyring
Error 2: Missing type File
Error: missing 'type' file and unable to infer osd type
Solution: echo bluestore > /var/lib/ceph/osd/ceph-0/type
Error 3: Missing fsid
Error: bluestore(/var/lib/ceph/osd/ceph-0) _open_fsid (2) No such file or directory
Solution: echo 'b59f609a-4c09-4795-89f6-29b30800a3c7' > /var/lib/ceph/osd/ceph-0/fsid
Error 4: Incorrect UUID in fsid
Error: osd_uuid=b59f609a... need=79e21f45... (mixed OSD UUID with cluster UUID)
Solution: Use OSD UUID (not cluster) in fsid file
Error 5: Missing dm-13 Device
Error: ls: cannot access '/dev/dm-13': No such file or directory
Solution: Restore iSCSI sessions (iscsiadm -m node -L all)
Ceph Storage Architecture in This Environment
SAN Storage Array (10.0.1.x, 10.0.2.x)
|
|-- iSCSI Targets (8 LUNs)
| |
| |-- controller-a: 4 paths
| `-- controller-b: 4 paths
|
v
Proxmox node1
|
|-- iSCSI Initiator (8 sessions)
| |
| `-- Disks: sdf, sdg, sdh, sdi, sdj, sdk, sdl, sdm
|
|-- Multipath (dm-13)
| |
| `-- Device: /dev/mapper/ceph-lun-1 (6.0 TiB)
|
|-- LVM
| |
| |-- VG: ceph-9763b04b-a846-47bc-984c-3c9da95d7329
| `-- LV: osd-block-b59f609a-4c09-4795-89f6-29b30800a3c7
| |
| `-- /dev/dm-14
|
|-- Ceph OSD.0
| |
| |-- BlueStore backend
| |-- Data: 3.7 TiB used
| `-- Objects: 970k
|
`-- Ceph RBD Pool 'rbd'
|
`-- 30+ VM/container images
Conclusions and Recommendations
What Went Wrong
- No automatic iSCSI login - after node restart, iSCSI sessions weren't restored automatically
- Empty OSD directory - OSD metadata was previously deleted (probably during some repair attempt)
- Autostart of Ceph-backed containers - containers tried to start during boot, before Ceph was available
Recommendations
- Enable automatic iSCSI login:
# For each target in /etc/iscsi/nodes/
sed -i 's/^node.startup = manual$/node.startup = automatic/' /etc/iscsi/nodes/*/*/default
- Back up OSD metadata:
# Regular backups of the directory
tar czf /root/ceph-osd-0-metadata-$(date +%Y%m%d).tar.gz /var/lib/ceph/osd/ceph-0/
- Monitor iSCSI and Ceph:
# Cron entry that checks iSCSI sessions every 5 minutes
*/5 * * * * /usr/local/bin/check-iscsi-sessions.sh
- Document dependencies:
  - Ceph-backed containers should have a delayed start
  - Or a systemd dependency on ceph-osd
- Keep autostart disabled for Ceph-backed containers, or add a startup delay:
# In the container config add:
startup: order=100,up=300
# order=100 starts this guest late; up=300 waits 300 seconds after it starts before the next guest is started
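The monitoring recommendation refers to /usr/local/bin/check-iscsi-sessions.sh, whose contents are not shown. One possible shape for it is sketched below. The expected count of 8 comes from this environment; the function names are hypothetical, and the parser assumes every session line starts with "tcp: " as in the iscsiadm output earlier in this document.

```bash
#!/bin/bash
# Sketch of a possible check-iscsi-sessions.sh (the real script is not shown here).
# count_sessions and sessions_ok are hypothetical helper names.
EXPECTED_SESSIONS=8   # assumption: 8 sessions, as observed in this environment

# count_sessions: count session lines in `iscsiadm -m session` output on stdin
count_sessions() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that status
    grep -c '^tcp: ' || true
}

# sessions_ok [EXPECTED]: succeed if at least EXPECTED sessions are present
sessions_ok() {
    n=$(count_sessions)
    [ "$n" -ge "${1:-$EXPECTED_SESSIONS}" ]
}

# Possible main line when installed as the cron script:
#   iscsiadm -m session 2>/dev/null | sessions_ok || logger -t check-iscsi "fewer than $EXPECTED_SESSIONS iSCSI sessions"
```

Logging through `logger` is only one option; sending mail or calling `iscsiadm -m node -L all` from the same branch would follow the same pattern.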
Automated Recovery Script After Restart
Location: /usr/local/bin/ceph-recovery.sh
#!/bin/bash
# Automatic Ceph recovery after restart
echo "=== Ceph Recovery Script ==="
date
# 1. Check and restore iSCSI sessions
echo "Checking iSCSI sessions..."
if [ "$(iscsiadm -m session 2>/dev/null | wc -l)" -lt 8 ]; then
echo "Restoring iSCSI sessions..."
iscsiadm -m node -L all
sleep 5
fi
# 2. Check multipath
echo "Checking multipath..."
if [ ! -e /dev/mapper/ceph-lun-1 ]; then
echo "ERROR: Multipath device not found!"
multipath -r
sleep 5
fi
# 3. Activate LVM
echo "Activating Ceph LVM..."
vgchange -ay ceph-9763b04b-a846-47bc-984c-3c9da95d7329
# 4. Check OSD
echo "Checking OSD status..."
if ! systemctl is-active --quiet ceph-osd@0; then
echo "Starting OSD..."
systemctl start ceph-osd@0
fi
# 5. Wait for Ceph
echo "Waiting for Ceph cluster..."
for i in {1..30}; do
if timeout 10 ceph health &>/dev/null; then
echo "Ceph cluster is responding"
break
fi
sleep 2
done
# 6. Display status
echo "=== Final Status ==="
ceph -s
echo "=== Recovery Complete ==="
Add to systemd:
# /etc/systemd/system/ceph-recovery.service
[Unit]
Description=Ceph Recovery After Reboot
After=network-online.target iscsid.service
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/ceph-recovery.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Enable the service:
systemctl daemon-reload
systemctl enable ceph-recovery.service
Checklists
Checklist: Single Node Restart
- [ ] Check active VMs/containers on node
- [ ] Migrate critical machines to other nodes
- [ ] Perform restart
- [ ] After boot check iSCSI sessions: iscsiadm -m session
- [ ] Check multipath: multipath -ll
- [ ] Check LVM: lvs | grep ceph
- [ ] Check OSD: systemctl status ceph-osd@0
- [ ] Check Ceph: ceph -s
Checklist: Full Cluster Restart
- [ ] Disable autostart for VMs/CTs with Ceph (onboot: 0)
- [ ] Restart node3
- [ ] Wait for node3 boot + check Ceph
- [ ] Restart node2
- [ ] Wait for node2 boot + check Ceph
- [ ] Restart node1 (MON+MGR+OSD)
- [ ] Wait for node1 boot
- [ ] Execute recovery: /usr/local/bin/ceph-recovery.sh
- [ ] Check: ceph -s
- [ ] Start critical machines manually
Checklist: OSD Problem Diagnosis
- [ ] Check iSCSI sessions: iscsiadm -m session
- [ ] Check disks: lsblk | grep 'sd[f-m]'
- [ ] Check multipath: ls -la /dev/dm-13
- [ ] Check LVM: pvs; vgs; lvs
- [ ] Check OSD directory: ls -la /var/lib/ceph/osd/ceph-0/
- [ ] Check logs: journalctl -u ceph-osd@0 -n 50
- [ ] Check status: systemctl status ceph-osd@0
- [ ] Check cluster: ceph -s