EC2 stuck in "Initializing"
Encountered after Stratis lab
After manually mounting a Stratis filesystem, I unmounted it and tried to mount it persistently by adding an entry to /etc/fstab. However, after a reboot the EC2 instance got stuck at 'Initializing'.
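For context, the lab setup looked roughly like this. It's only a sketch: the pool name, filesystem name, and backing device are placeholders; the mount point is the one that shows up in the fstab entry later on.
# Placeholder names and device; roughly what the Stratis lab did
stratis pool create mypool /dev/xvdf             # create a Stratis pool on a spare EBS volume
stratis filesystem create mypool myfs            # create a filesystem in the pool
mkdir /mnt/diskmyfs
mount /dev/stratis/mypool/myfs /mnt/diskmyfs     # manual mount works fine
umount /mnt/diskmyfs                             # then a UUID entry was added to /etc/fstab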
I tried stopping and starting the instance through the console. It now showed '1/2 checks passed'.
After some searching, I found these two suggestions online:
Checked the system logs
Nothing in them seemed to show what caused the issue. I also searched specifically for 'xvd' to list all lines pertaining to the block devices, but I didn't see anything wrong.
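The same search can be done from the AWS CLI instead of scrolling through the console (the instance ID below is a placeholder):
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text > console.log
grep -i xvd console.log    # every line mentioning the xvd* block devices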
Launched a 2nd EC2 and attached 1st EC2's block device
I first stopped tst-rhel-1 and launched a new EC2 in the same zone, tst-rhel-a2. I then detached the first EC2's root volume and attached it to the second as root, to check whether I could get the second instance up using tst-rhel-1's volume.
It too returned '1/2 checks passed'. I checked the system logs, and again I didn't see anything wrong.
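I did the volume swap through the console, but the equivalent CLI calls would look roughly like this (instance and volume IDs are placeholders; the target instance has to be stopped, with its root device slot free, before the attach):
aws ec2 stop-instances --instance-ids i-11111111111111111     # stop tst-rhel-1
aws ec2 detach-volume --volume-id vol-0123456789abcdef0       # detach its root volume
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-22222222222222222 --device /dev/sda1      # attach it as root on tst-rhel-a2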
Troubleshooting the correct way
After a day, I came back to the problem, returned the root volume to tst-rhel-1, and started again. This time I followed the proper troubleshooting guide: Troubleshoot an unreachable instance
System Logs
I opened the system logs again and searched for 'unknown', 'error', and 'invalid'. None seemed to reflect the root issue (or maybe I just didn't know which line was telling me something).
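A quick way to do that keyword sweep on the console output saved earlier:
grep -iE 'unknown|error|invalid' console.log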
In any case, here's the relevant part of the system logs:
[ 1.453285] Freeing initrd memory: 68360K
[ 1.478357] Initialise system trusted keyrings
[ 1.491708] Key type blacklist registered
[ 1.497091] workingset: timestamp_bits=36 max_order=18 bucket_order=0
Give root password for maintenance
(or press Control-D to continue):
Instance screenshot
The next suggestion from the guide was to take a screenshot of the instance while it was booting.
A few lines in the screenshot caught my eye.
After some searching, I found this: Timed out waiting for device dev-disk-by\x2duuid-C829\x2dC4C1.device, which showed almost exactly the same lines:
[ TIME ] Timed out waiting for device dev-disk-by\x2duuid-C829\x2dC4C1.device.
[DEPEND] Dependency failed for file system check on /dev/disk/by-uuid/C829-C4C1.
[DEPEND] Dependency failed for /boot/efi.
[DEPEND] Dependency failed for Local File System.
From the same link:
As a filesystem with the old UUID no longer exists udev fails to find it and you get this error. Update the UUID in /etc/fstab using blkid and your system will manage to correctly boot again as this error will be gone.
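In my case the UUID turned out not to be stale, but this is the check the quote describes: compare the UUIDs the devices actually report against the ones /etc/fstab expects.
blkid                      # UUIDs the block devices actually have
grep -v '^#' /etc/fstab    # UUIDs (and paths) the boot is waiting for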
Launched a new instance and attached the bad host's root volume
I launched a new instance for the second time around and attached the bad host's root volume, but this time as a secondary volume.
Using /dev/sdx as the device name:
Both EBS volumes showed "Attached" with no issues.
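That attachment state can also be confirmed from the CLI (placeholder instance ID):
aws ec2 describe-volumes \
    --filters Name=attachment.instance-id,Values=i-22222222222222222 \
    --query 'Volumes[].Attachments[].{Device:Device,State:State}'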
Created a mount point and mounted the added block device
[ec2-user@ip-172-31-47-163 ~]$ sudo su
[root@ip-172-31-47-163 ec2-user]#
[root@ip-172-31-47-163 ec2-user]# ll /mnt/
total 0
[root@ip-172-31-47-163 ec2-user]# mkdir /mnt/dummy
[root@ip-172-31-47-163 ec2-user]# lsblk
NAME    MAJ:MIN  RM SIZE RO TYPE MOUNTPOINT
xvda    202:0     0   8G  0 disk
└─xvda1 202:1     0   8G  0 part /
xvdx    202:5888  0  50G  0 disk
├─xvdx1 202:5889  0   1M  0 part
└─xvdx2 202:5890  0  50G  0 part
[root@ip-172-31-47-163 ec2-user]# mount /dev/xvdx2 /mnt/dummy/
[root@ip-172-31-47-163 ec2-user]# ll /mnt/
total 0
dr-xr-xr-x. 18 root root 251 Dec 10 13:05 dummy
[root@ip-172-31-47-163 ec2-user]# ll /mnt/dummy/etc/fstab
-rw-r--r--. 1 root root 959 Dec 31 14:09 /mnt/dummy/etc/fstab
I proceeded to edit the /etc/fstab under the dummy mount point and commented out the line for Stratis.
[root@ip-172-31-47-163 ec2-user]# vim /mnt/dummy/etc/fstab
#
# /etc/fstab
# Created by anaconda on Tue May 4 17:21:38 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=d35fe619-1d06-4ace-9fe3-169baad3e421 / xfs defaults 0 0
UUID=e6bcc068-628c-4555-b06e-9cda9563cf8c swap swap defaults 0 0
# SWAP
/dev/xvdb2 swap swap defaults 0 0
# LVM
/dev/vgdata/lvdata /mnt/diskblvm xfs defaults 0 0
# STRATIS
# UUID="7c73f271-8b0a-4aa2-957e-9686b3733f3a" /mnt/diskmyfs xfs defaults 0 0
After this, I unmounted the volume so it could be attached back to the bad host.
[root@ip-172-31-47-163 ec2-user]# umount /mnt/dummy/
[root@ip-172-31-47-163 ec2-user]# ll /mnt/
total 0
drwxr-xr-x 2 root root 6 Jan 2 10:38 dummy
Attached volume back to the bad host
Attached the volume back to the bad host as /dev/sda1
While waiting for it to boot up, I monitored the instance screenshot again. It looked promising because it reached the login prompt!
And finally!
EDIT: This actually didn't work. There was nothing wrong with what I had added to /etc/fstab. I tried both the UUID and the '/dev' path, but both returned the same error.
Slow boot: "a start job is running for dev-disk-by..."
After hours spent trying stuff and searching online, I found the solution that worked: Slow boot - "a start job is running for dev-disk-by..."
From the link:
If you have an external device configured to automount (usually with a nofail option in it), add this option to the device's entry: x-systemd.device-timeout=1ms. This sets how long the boot waits for the device to 1 ms instead of the default 90 seconds.
What I did:
$ vim /etc/fstab
#
# /etc/fstab
# Created by anaconda on Tue May 4 17:21:38 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=d35fe619-1d06-4ace-9fe3-169baad3e421 / xfs defaults 0 0
UUID="49e5d8a1-78e5-4fde-ad6d-b5692f697058" /mnt/diskfs xfs nofail,x-systemd.device-timeout=1ms 0 0
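Before rebooting, the edited entry can be sanity-checked. findmnt --verify lints /etc/fstab, and a daemon-reload regenerates the mount units, as the header comment in the file suggests:
findmnt --verify           # check /etc/fstab for broken or suspicious entries
systemctl daemon-reload    # regenerate systemd mount units from the edited fstab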
I unmounted the block device and rebooted.
$ mount | grep diskfs
/dev/mapper/stratis-1-1f2ae191f25d4715bf14e80448521775-thin-fs-49e5d8a178e54fdead6db5692f697058 on /mnt/diskfs type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=128k,sunit=256,swidth=2048,noquota)
$ umount /mnt/diskfs
$ mount | grep diskfs
$ reboot
While it rebooted, I monitored the instance screenshot through the EC2 console. The dependency errors didn't appear anymore.
I logged in to the instance and checked whether the Stratis filesystem was mounted:
$ mount | grep diskfs
/dev/mapper/stratis-1-1f2ae191f25d4715bf14e80448521775-thin-fs-49e5d8a178e54fdead6db5692f697058 on /mnt/diskfs type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=128k,sunit=256,swidth=2048,noquota)
Noice!
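For good measure, the Stratis side can be checked as well; the pool and filesystem names will be whatever the lab created:
stratis pool list          # the pool should be listed with its size and usage
stratis filesystem list    # the filesystem backing /mnt/diskfs should show up here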