Saturday, June 5, 2010

Linux Logical Volume Management

There are a few reasons to use Logical Volume Management (LVM): extending a file system beyond the capacity of a single physical disk by spanning disks, gaining more dynamic control over disk capacity (for example, by adding or removing a drive), and creating backups in the form of snapshots. LVM can be applied to any block device, such as a physical drive, software RAID, or an external hardware RAID device. The file system remains separate, however, and must be managed in conjunction with LVM to make proper use of the available blocks.

In general there are three basic components:

Physical Disk
  • Initially, each drive is simply marked as available for use in a volume group. This writes a Universally Unique Identifier (UUID) to the initial sectors of the disk and prepares it to receive a volume group

Volume Group
  • A collection of physical disks (or partitions if desired). When created, the volume group divides each of its member disks into physical extents, 4MB each by default. It also records information about all the other physical disks in the group and any logical volumes present.

Logical Volume
  • Most of the work happens at this layer. A logical volume is a mapping between a set of physical extents (PE) on the disks and a set of logical extents (LE). PEs and LEs are always the same size, and generally they map one to one. However, two PEs can map to one LE if mirroring is used.

In the example shown, there is one volume group with two physical drives and two logical volumes mapped. Physical extents that are not assigned to a logical volume are free and can be used to expand either logical volume at a later time.



Creating a Logical Drive
I am not going to bother with mirrored or striped volumes. You could make a case for striping to increase performance; in general, however, I believe it is better to use the hardware or software RAID functions available, as they are better suited to that purpose. The steps are fairly simple: mark each device with pvcreate, create a volume group, and then create a logical volume. Depending on how big your volume group is, you may want to consider altering the default physical extent size. The man page for vgcreate states, "if the volume group metadata uses lvm2 format those restrictions [65534 extents in each logical volume] do not apply, but having a large number of extents will slow down the tools but have no impact on I/O performance to the logical volume." So if I were creating a terabyte or larger volume, it's probably a good idea to increase this to 64MB or even 128MB.
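To get a feel for how the extent size changes the number of extents the tools have to track, here is a minimal sketch. The function name extent_count is made up for illustration, and sizes are given in whole MiB; it only does arithmetic and never touches LVM.

```shell
#!/bin/sh
# extent_count SIZE_MIB PE_MIB (hypothetical helper, not an LVM command)
# Prints how many extents of PE_MIB are needed to cover SIZE_MIB, rounding up.
extent_count() {
    size_mib=$1
    pe_mib=$2
    echo $(( (size_mib + pe_mib - 1) / pe_mib ))
}

# Compare extent counts for a 1 TiB (1048576 MiB) volume at several extent sizes.
for pe in 4 16 64 128; do
    printf '%4s MiB extents: %s\n' "$pe" "$(extent_count 1048576 "$pe")"
done
```

With the default 4MB extent a terabyte needs 262144 extents, while 128MB extents bring that down to 8192.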

# pvcreate /dev/sdb
No physical volume label read from /dev/sdb
Physical volume "/dev/sdb" successfully created
# pvcreate /dev/sdc
No physical volume label read from /dev/sdc
Physical volume "/dev/sdc" successfully created
# vgcreate -s 16M datavg /dev/sdb /dev/sdc
Volume group "datavg" successfully created
# pvdisplay /dev/sdb
--- Physical volume ---
PV Name /dev/sdb
VG Name datavg
PV Size 10.00GB / not Usable 16.00MB
Allocatable Yes
PE Size (KByte) 16384
Total PE 639
Free PE 639
Allocated PE 0
PV UUID apk7wQ-V9B2-vHVo-L5Yz-81U0-orx7-F8J0MI
# vgdisplay datavg
--- Volume group ---
VG Name datavg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 19.97GB
PE Size 16.00MB
Total PE 1278
Alloc PE / Size 0 / 0
Free PE / Size 1278 / 19.97GB
VG UUID Glyv9C-qRog-YZVk-08nR-csMe-quMp-A3Ksby

As you can see in this example, the volume group named datavg has two member disks, each 10GB in size. I selected a different physical extent size not because I had to, but just to show how it is done. You will also notice that each disk's Total PE count (639) is one less than the 640 extents of 16MB that 10GB would hold. The missing extent accommodates the volume group metadata mentioned earlier. You can actually read this data yourself if you like.
# dd if=/dev/sdb of=vg_metadata bs=16M count=1
# strings vg_metadata
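
The count of 639 extents can be reproduced with a little arithmetic. This is only a sketch: usable_extents is a hypothetical helper, and the 192 KiB reserved for the label and metadata is an assumed figure, not something read from the disk. Any reservation larger than zero and smaller than one extent gives the same answer here.

```shell
#!/bin/sh
# usable_extents DEVICE_KIB PE_KIB RESERVED_KIB (hypothetical helper)
# Whole extents that fit after RESERVED_KIB is set aside at the start of the disk.
usable_extents() {
    echo $(( ($1 - $3) / $2 ))
}

# 10 GiB disk, 16 MiB extents, assumed 192 KiB reserved for LVM metadata.
usable_extents 10485760 16384 192   # prints 639
```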

The last step is to create the logical volume itself. There are myriad options available depending on what you want to accomplish; the important ones are:

-L size[KMGTPE]
  • Specifies a size in kilobytes, megabytes, gigabytes, terabytes, petabytes, or exabytes. Let me know if you actually use the last two.

-l size
  • Specifies the size in extents, in this case 16MB each. You can also specify a percentage of the volume group, of the free space in the volume group, or of the free space on the given physical volumes with %VG, %FREE, or %PVS respectively.

-n string
  • Gives a name to your logical volume

-i Stripes
  • Number of stripes to use. As I mentioned earlier, you should probably use RAID for this, but if you must, this should equal the number of spindles present in the volume group.

-I stripeSize
  • The stripe depth in KB to use for each disk
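
Since -L takes a size and -l takes extents, it can help to convert between the two before running lvcreate. The sketch below assumes whole-number sizes with an M, G, or T suffix; size_to_extents is a made-up name, and the round-up mirrors how a size is fitted to whole extents.

```shell
#!/bin/sh
# size_to_extents SIZE PE_MIB (hypothetical helper)
# SIZE is a whole number followed by M, G, or T, for example 5G.
size_to_extents() {
    size=$1
    pe_mib=$2
    number=${size%[MGTmgt]}      # strip the unit suffix
    unit=${size#"$number"}       # whatever is left is the unit
    case $unit in
        M|m) mib=$number ;;
        G|g) mib=$(( number * 1024 )) ;;
        T|t) mib=$(( number * 1024 * 1024 )) ;;
        *) echo "unsupported size: $size" >&2; return 1 ;;
    esac
    # Round up to a whole number of extents.
    echo $(( (mib + pe_mib - 1) / pe_mib ))
}

size_to_extents 5G 16    # prints 320
size_to_extents 10G 16   # prints 640
```

So with 16MB extents, -L 5G and -l 320 ask for the same amount of space.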

Here is an example of a simple volume, followed by a striped volume:

# lvcreate -L 5G -n datalv datavg
Logical volume "datalv" created
# lvdisplay
--- Logical volume ---
LV Name /dev/datavg/datalv
VG Name datavg
LV UUID fCoaFl-7aQY-CX5U-zDwO-at52-udkI-ke6CZn
LV Write Access read/write
# open 0
LV size 5.00GB
Current LE 320
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 1024
Block device 253:0
# lvcreate -L 10G -i 2 -I 64 -n stripedlv datavg
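
A striped volume takes an equal share of extents from each physical volume it stripes across, so it is worth checking the free extents per disk first. The following is a rough sketch with a hypothetical stripe_fits function. The free-extent figures in the example assume datalv was allocated entirely on /dev/sdb, which the output above does not confirm.

```shell
#!/bin/sh
# stripe_fits TOTAL_LE STRIPES FREE_PE... (hypothetical helper)
# Succeeds when at least STRIPES physical volumes each have room for
# an equal share of TOTAL_LE extents.
stripe_fits() {
    total_le=$1
    stripes=$2
    shift 2
    # Each stripe needs the same number of extents, rounded up.
    per_pv=$(( (total_le + stripes - 1) / stripes ))
    enough=0
    for free in "$@"; do
        if [ "$free" -ge "$per_pv" ]; then
            enough=$(( enough + 1 ))
        fi
    done
    echo "need $per_pv extents on each of $stripes PVs; $enough PVs qualify"
    [ "$enough" -ge "$stripes" ]
}

# 10G at 16MB extents is 640 LE. Assumed free extents: 319 on sdb, 639 on sdc.
stripe_fits 640 2 319 639 || echo "not enough room for a 2-way stripe"
```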

If you are going to use striped volumes, you should probably use only striped volumes in that volume group, as striping requires enough free extents on each physical volume. Once we have a volume, we need a file system. For this exercise I am going to use ext4, but you can use whatever you like.

# mkfs.ext4 /dev/datavg/datalv
# mkdir /data
# mount /dev/datavg/datalv /data

Expanding a Logical Volume
# pvcreate /dev/sdd
# vgextend datavg /dev/sdd
Volume group "datavg" successfully extended
# lvresize -L 15G /dev/datavg/datalv
Extending logical volume datalv to 15.00 GB
Logical volume datalv successfully resized
# resize2fs /dev/datavg/datalv
resize2fs 1.41.9 (22-Aug-2009)
Resizing the filesystem on /dev/datavg/datalv to 3932160 (4k) blocks.
The filesystem on /dev/datavg/datalv is now 3932160 blocks long.

Depending on the state of your file system, you may not be able to expand it online. You can check the output of tune2fs (for example, tune2fs -l /dev/datavg/datalv) to ensure GDT blocks have been set aside; without those you will definitely have to take the file system offline. You may also get a warning to run e2fsck first. The man page warns against running e2fsck on a mounted file system, so again you are probably best served by unmounting the file system first. If this is a system disk, that generally means dropping down to single-user mode.
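
Two quick checks go with this step: confirming the block count resize2fs reports, and looking for reserved GDT blocks in the tune2fs output. This is a sketch only; fs_blocks and reserved_gdt_blocks are hypothetical helper names, and the second one just reads text on standard input, so it assumes the usual "Reserved GDT blocks:" line of tune2fs -l.

```shell
#!/bin/sh
# fs_blocks SIZE_GIB BLOCK_KIB (hypothetical helper)
# Number of file system blocks in a volume of SIZE_GIB.
fs_blocks() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# reserved_gdt_blocks (hypothetical helper)
# Reads `tune2fs -l` output on stdin and prints the reserved GDT block
# count, or 0 when the line is missing.
reserved_gdt_blocks() {
    awk -F: '/^Reserved GDT blocks/ { gsub(/[[:space:]]/, "", $2); n = $2 }
             END { print n + 0 }'
}

fs_blocks 15 4   # prints 3932160, matching the resize2fs output above

# Intended use on a real system (needs root):
#   tune2fs -l /dev/datavg/datalv | reserved_gdt_blocks
```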

Reducing a Logical Volume
Before embarking on this journey, make sure you shrink the file system first, which for the ext series anyway means it has to be unmounted. Once that is done, you can go ahead and shrink the logical volume as shown here.

# umount /data
# resize2fs /dev/datavg/datalv 10g
resize2fs 1.41.9 (22-Aug-2009)
Resizing the filesystem on /dev/datavg/datalv to 2621440 (4k) blocks.
The filesystem on /dev/datavg/datalv is now 2621440 blocks long.
# lvreduce -L 10g /dev/datavg/datalv
WARNING: Reducing active and open logical volume to 10.00 GB
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce datalv? [y/n]: y
Reducing logical volume datalv to 10.00 GB
Logical volume datalv successfully resized

Again, you may be prompted to check your file system, but it's unmounted anyway, so that shouldn't be a problem. If the file system is highly fragmented, the resize can take quite a while, so be prepared.
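
The order of operations is the part that is easy to get backwards: when growing, the logical volume comes first and the file system second; when shrinking, it is the other way around. A small dry-run sketch of that rule follows. resize_plan is a hypothetical function that only prints the commands and runs nothing, and the e2fsck -f step is included on the assumption that resize2fs will ask for a check before shrinking.

```shell
#!/bin/sh
# resize_plan LV MOUNTPOINT CURRENT_GIB TARGET_GIB (hypothetical helper)
# Prints, without running them, the commands in a safe order.
resize_plan() {
    lv=$1; mnt=$2; cur=$3; new=$4
    if [ "$new" -gt "$cur" ]; then
        # Growing: enlarge the logical volume, then the file system.
        echo "lvresize -L ${new}G $lv"
        echo "resize2fs $lv"
    elif [ "$new" -lt "$cur" ]; then
        # Shrinking: unmount, check, shrink the file system, then the volume.
        echo "umount $mnt"
        echo "e2fsck -f $lv"
        echo "resize2fs $lv ${new}G"
        echo "lvreduce -L ${new}G $lv"
    else
        echo "# nothing to do"
    fi
}

resize_plan /dev/datavg/datalv /data 5 15
resize_plan /dev/datavg/datalv /data 15 10
```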

Snapshots
Another benefit of LVM is the ability to take point-in-time images of your file system. Snaps use copy-on-write technology, where a block that is about to be overwritten or changed is first copied to a new location and then allowed to be altered. This can cause a performance problem on writes, which can compound as more snaps are added, so bear that in mind. You will also have to set aside some space within the volume group for this purpose. The amount really depends on how many changes you are making, but 10-20% of the original volume's size is probably a good starting point. For this example I am going to use 1G as I don't expect many changes.
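
As a rough guide to that 10-20% figure, the reserve can be worked out from the origin size and rounded up to a whole extent. This is only a sketch; snap_reserve_mib is a hypothetical helper and the 10 GiB origin is an example value.

```shell
#!/bin/sh
# snap_reserve_mib ORIGIN_MIB PERCENT PE_MIB (hypothetical helper)
# Snapshot reserve as a percentage of the origin, rounded up to whole extents.
snap_reserve_mib() {
    raw=$(( $1 * $2 / 100 ))
    echo $(( (raw + $3 - 1) / $3 * $3 ))
}

# Example: a 10 GiB origin with 16 MiB extents.
snap_reserve_mib 10240 10 16   # prints 1024
snap_reserve_mib 10240 20 16   # prints 2048
```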

# lvcreate -L 1g -s -n datasnap1 /dev/datavg/datalv
Logical volume "datasnap1" created

Notice the -s flag for snapshot, and that the target isn't the volume group but rather the logical volume to snapshot. There appears to be a bug in OpenSuSE that may be present in other distributions. It prevents the snap from being registered with the event monitor, which alerts when the snap is full or nearing capacity. If you get this message, you will have to upgrade both the lvm2 and device-mapper packages, as they were compiled against the wrong library versions.

OpenSuSE error:
datavg-datasnap: event registration failed: 10529:3 libdevmapper-event-lvm2snapshot.so.2.02 dlopen failed: /lib64/libdevmapper-event-lvm2snapshot.so.2.02: undefined symbol: lvm2_run
datavg/snapshot0: snapshot segment monitoring function failed.


To use your new snap, simply mount it like you would any other file system with mount /dev/datavg/datasnap1 /datasnap. You can view the snap usage through lvdisplay in the "Allocated to snapshot" field.

# lvdisplay /dev/datavg/datasnap1
--- Logical volume ---
LV Name /dev/datavg/datasnap1
VG Name datavg
LV UUID 82IA4M-Md6s-MEI6-iNPW-6wFb-8pzD-eCQqmS
LV Write Access read/write
LV snapshot status active destination for /dev/datavg/datalv
LV Status available
# open 0
LV Size 20.00 GB
Current LE 5120
COW-table size 1.00 GB
COW-table LE 256
Allocated to snapshot 68.68%
Snapshot chunk size 4.00 KB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0

If the snap reserve space fills completely, the snap will not be deleted, but it will be marked invalid and can no longer be read, even if it is currently mounted. Snaps aren't good forever, but as point-in-time images they can be invaluable for specific backup scenarios, like quick reference points for database backups. Instead of moving the active file system to tape, you can quiesce the database, snap it, return it to normal operations, and then perform the backup from the snapshot.
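
Because a full snap becomes invalid, it is worth watching the "Allocated to snapshot" figure. Here is a minimal sketch that reads lvdisplay output on standard input and warns past a threshold. The function names snapshot_usage and snapshot_check and the threshold values are made up for illustration.

```shell
#!/bin/sh
# snapshot_usage (hypothetical helper)
# Reads lvdisplay output on stdin, prints the snapshot usage without the % sign.
snapshot_usage() {
    awk '/Allocated to snapshot/ { sub(/%/, "", $NF); print $NF }'
}

# snapshot_check THRESHOLD (hypothetical helper)
# Returns 1 when the usage read from stdin is at or above THRESHOLD percent.
snapshot_check() {
    threshold=$1
    usage=$(snapshot_usage)
    [ -n "$usage" ] || { echo "no snapshot usage found" >&2; return 2; }
    # awk handles the floating point comparison.
    if awk -v u="$usage" -v t="$threshold" 'BEGIN { exit !(u >= t) }'; then
        echo "WARNING: snapshot ${usage}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: snapshot ${usage}% full"
}

# Sample line taken from the lvdisplay output above.
printf '  Allocated to snapshot  68.68%%\n' | snapshot_check 80

# Intended use on a real system (needs root):
#   lvdisplay /dev/datavg/datasnap1 | snapshot_check 80
```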

Moving Volume Groups
A handy capability that I have used many times under AIX is also available under Linux: moving a volume group from one system to another.

# umount /data
# vgchange -an datavg
0 logical volume(s) in volume group "datavg" now active
# vgexport datavg
Volume group "datavg" successfully exported

Shut down the machine before removing the disks and attaching them to the other machine. Then, on the new machine:
# pvscan
PV /dev/sdb is in exported VG datavg [10.00 GB / 0 free]
PV /dev/sdc is in exported VG datavg [10.00 GB / 1.99 GB free]
Total: 2 [19.99 GB] / in use: 2 [19.99 GB] / in no VG: 0 [0 ]
# vgscan
Reading all physical volumes. This may take a while...
Found exported volume group "datavg" using metadata type lvm2
# vgimport datavg
Volume group "datavg" successfully imported

You should now be able to mount your file system on the new machine.
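
Before running vgimport on the new machine, it can be handy to confirm that every disk of the exported group has actually shown up. The sketch below filters pvscan output fed on standard input. exported_pvs is a hypothetical helper, and it assumes the line layout shown in the pvscan output above.

```shell
#!/bin/sh
# exported_pvs VG (hypothetical helper)
# Reads pvscan output on stdin and prints the devices that belong to
# the exported volume group VG.
exported_pvs() {
    awk -v vg="$1" '$1 == "PV" && /is in exported VG/ && $7 == vg { print $2 }'
}

# Sample lines taken from the pvscan output above.
printf '%s\n' \
    'PV /dev/sdb is in exported VG datavg [10.00 GB / 0 free]' \
    'PV /dev/sdc is in exported VG datavg [10.00 GB / 1.99 GB free]' |
    exported_pvs datavg
```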

Other Commands
Some other important commands for volume management. The last three, run in this order, retire a disk from a volume group:
# lvremove logical_volume_path
removes a logical volume or snapshot
e.g. lvremove /dev/datavg/datasnap1
# pvmove device
moves data from an existing drive to free extents on other disks in the volume group
e.g. pvmove /dev/sdc
# vgreduce volume_group device
removes a device from a volume group
e.g. vgreduce datavg /dev/sdc
# pvremove device
removes a physical device from LVM
e.g. pvremove /dev/sdc
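
Since pvmove relies on free extents elsewhere in the volume group, a quick check of the numbers before starting can save a failed attempt. This is a sketch with a hypothetical can_evacuate function; the extent counts in the example are illustrative and would come from pvdisplay on a real system.

```shell
#!/bin/sh
# can_evacuate ALLOCATED_PE FREE_PE_ON_OTHER_PVS... (hypothetical helper)
# Succeeds when the other physical volumes together have enough free
# extents to take everything allocated on the disk being removed.
can_evacuate() {
    needed=$1
    shift
    available=0
    for free in "$@"; do
        available=$(( available + free ))
    done
    echo "need $needed, available $available"
    [ "$available" -ge "$needed" ]
}

# Illustrative numbers: 320 extents in use on the disk to remove,
# 319 and 639 extents free on the two remaining disks.
can_evacuate 320 319 639 && echo "pvmove should have room"
```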
