using PCIe direct IO with ldoms

Recent versions of Logical Domains (now Oracle VM Server for SPARC) allow you to assign individual PCIe devices to a guest ldom so that IO from that ldom does not have to go through the primary domain. I am setting this up for two FC HBAs on a T4-2 system with two domains, one for production and one for test.

Assigning DIO devices to a guest domain (which then becomes an IO domain) prevents live migration of that domain and also adds a dependency on the primary domain: if the primary goes down or reboots, so does the PCI bus, and with it the access to the HBA. But since we also boot from a ZFS volume provided by the primary domain, that dependency was already there anyway. Another option would be to assign a whole PCIe bus to a guest domain (making it a so-called root domain), but extra caution is needed if the primary domain boots from a disk controller attached to the bus being given away, and some more thought needs to go into the networking configuration as well.
The whole process is well documented; this post basically repeats the steps I have taken and adds the multipath configuration in the guest domain.

The first step is to identify the device names of these FC adapters using ldm list-io from the primary domain (abbreviated output below).

root@primary:~# ldm list-io -l
NAME                                      TYPE   BUS      DOMAIN   STATUS   
----                                      ----   ---      ------   ------   
pci_0                                     BUS    pci_0    primary           
[pci@400]
niu_0                                     NIU    niu_0    primary           
[niu@480]
pci_1                                     BUS    pci_1    primary           
[pci@500]
niu_1                                     NIU    niu_1    primary           
[niu@580]
/SYS/MB/PCIE0                             PCIE   pci_0    primary  OCC      
[pci@400/pci@2/pci@0/pci@8]
    SUNW,qlc@0/fp/disk
    SUNW,qlc@0/fp@0,0
    SUNW,qlc@0,1/fp/disk
    SUNW,qlc@0,1/fp@0,0
/SYS/MB/PCIE1                             PCIE   pci_1    primary  OCC      
[pci@500/pci@2/pci@0/pci@a]
    SUNW,qlc@0/fp/disk
    SUNW,qlc@0/fp@0,0
    SUNW,qlc@0,1/fp/disk
    SUNW,qlc@0,1/fp@0,0
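On a system with more cards, a small filter makes it easier to pick the PCIE endpoint slots out of this listing. The sketch below runs against a saved sample that mirrors the abbreviated output above; on a live system you would redirect the real ldm list-io output into the file instead.

```shell
#!/bin/sh
# Sample of the ldm list-io listing (mirrors the abbreviated output above);
# on a live system: ldm list-io > /tmp/listio.txt
cat <<'EOF' > /tmp/listio.txt
pci_0                                     BUS    pci_0    primary
/SYS/MB/PCIE0                             PCIE   pci_0    primary  OCC
/SYS/MB/PCIE1                             PCIE   pci_1    primary  OCC
EOF

# Print the slot name and bus of every PCIE endpoint device
awk '$2 == "PCIE" {print $1, $3}' /tmp/listio.txt
```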

So in my case, this is /SYS/MB/PCIE0 and /SYS/MB/PCIE1, one on each of the two PCIe buses. Next we'll enable IO virtualization on both buses and remove the devices from the primary ldom. The primary ldom will need to be rebooted afterwards.

root@primary:~# ldm start-reconf primary
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.
root@primary:~# ldm set-io iov=on pci_0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
root@primary:~# ldm set-io iov=on pci_1
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
root@primary:~# ldm remove-io /SYS/MB/PCIE0 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
root@primary:~# ldm remove-io /SYS/MB/PCIE1 primary
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
root@primary:~# reboot -- -r 

After the reboot the device(s) will show up as unassigned.

root@primary:~# ldm list-io -l /SYS/MB/PCIE0
NAME                                      TYPE   BUS      DOMAIN   STATUS   
----                                      ----   ---      ------   ------   
/SYS/MB/PCIE0                             PCIE   pci_0             OCC      
[pci@400/pci@2/pci@0/pci@8]
    SUNW,assigned-device@0
    SUNW,assigned-device@0,1

And we can now assign these devices to the guest domains. They need to be stopped first (ldom-test was not yet installed at this point, hence the warning about missing graceful shutdown below). The last steps set up the dependency relationship to the primary ldom so that the guests are also reset if the primary reboots.

root@primary:~# ldm stop-domain ldom-prod
LDom ldom-prod stopped
root@primary:~# ldm stop-domain ldom-test
Remote graceful shutdown or reboot capability is not available on ldom-test
LDom ldom-test stopped
root@primary:~# ldm add-io /SYS/MB/PCIE0 ldom-prod
root@primary:~# ldm add-io /SYS/MB/PCIE1 ldom-test
root@primary:~# ldm set-domain failure-policy=reset primary
root@primary:~# ldm set-domain master=primary ldom-prod
root@primary:~# ldm set-domain master=primary ldom-test
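To double-check that the dependency settings stuck, ldm list -o domain shows the per-domain properties. I am writing the exact output layout from memory here, so the sketch below only parses a saved sample; verify the format against your own system.

```shell
#!/bin/sh
# Assumed/abbreviated sample of the relevant lines from
# "ldm list -o domain <name>"; the real layout may differ by ldm version.
# failure-policy=reset belongs to the primary, master=primary to each guest.
cat <<'EOF' > /tmp/domain.txt
master=primary
failure-policy=reset
EOF

# Both properties must be present for the reset dependency to work
grep -E '^(master|failure-policy)=' /tmp/domain.txt
```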

The last step is to boot the guest back up, verify that the device is available there, and set up multipathing.

root@primary:~# ldm start-domain ldom-prod
LDom ldom-prod started
root@primary:~# telnet localhost 5001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "ldom-prod" in group "ldom-prod" ....
Press ~? for control options ..
[... console login ...]
root@ldom-prod:~# prtdiag -v
System Configuration:  Oracle Corporation  sun4v SPARC T4-2
Memory size: 204800 Megabytes

================================ Virtual CPUs ================================


CPU ID Frequency Implementation         Status
------ --------- ---------------------- -------
0      2848 MHz  SPARC-T4               on-line  
1      2848 MHz  SPARC-T4               on-line  
2      2848 MHz  SPARC-T4               on-line  
3      2848 MHz  SPARC-T4               on-line  

================================ IO Devices ================================
Slot +            Bus   Name +                            Model      Speed   
Status            Type  Path                                                 
----------------------------------------------------------------------------
PCIE0             PCIE  SUNW,qlc-pciex1077,2532           QLE2562    5.0GTx4
                        /pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0       
PCIE0             PCIE  SUNW,qlc-pciex1077,2532           QLE2562    5.0GTx4
                        /pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0,1     

root@ldom-prod:~# stmsboot -e

WARNING: stmsboot operates on each supported multipath-capable controller
         detected in a host. In your system, these controllers are

/pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0/fp@0,0
/pci@400/pci@2/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0

If you do NOT wish to operate on these controllers, please quit stmsboot
and re-invoke with -D { fp | mpt | mpt_sas | pmcs} to specify which controllers you wish
to modify your multipathing configuration for.

Do you wish to continue? [y/n] (default: y) y
WARNING: This operation will require a reboot.
Do you want to continue ? [y/n] (default: y) y
The changes will come into effect after rebooting the system.
Reboot the system now ? [y/n] (default: y) y

And after that we can use the FC HBA directly from our ldom with multipathing.
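Once the guest is back up, mpathadm can confirm that each LUN really has two operational paths, one per qlc port. The device name and output layout below are a made-up sample (check mpathadm list lu on your own system); the sketch just flags any LUN whose operational path count has dropped below two.

```shell
#!/bin/sh
# Sample "mpathadm list lu" output (the device name is fictitious);
# on a live system: mpathadm list lu > /tmp/mpath.txt
cat <<'EOF' > /tmp/mpath.txt
        /dev/rdsk/c0t600000000000000000000000000000AAd0s2
                Total Path Count: 2
                Operational Path Count: 2
EOF

# Print a warning line for any LUN with fewer than 2 operational paths
awk '/Operational Path Count/ && $4 < 2 {print "degraded:", $0}' /tmp/mpath.txt
```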

8 thoughts on “using PCIe direct IO with ldoms”

  1. hi
    IMHO, you use the term PCIe bus too loosely.
    In your example, you assign PCIe slots (endpoint devices) to the IO domain, not the PCIe bus.

  2. yes, in this example, I only assigned a single endpoint device to my ldom. One could also assign a whole PCIe bus (that feature has been included for a while) but I did not want to do this in this case. The primary ldom boots from disks on an HBA that is also on one of those busses and I also preferred to virtualize the NICs in the primary domain.

  3. Hi,
    nice work.
    I have a T4-2. As you said, there are 2 PCIe buses.
    I want to divide the two PCIe buses between 2 domains,
    i.e. pcie0 to the control domain and pcie1 to an alternate IO domain.
    I just need to know which PCIe slots are connected to PCIe bus 0 and which slots are connected to PCIe bus 1?

  4. Pingback: T4-2 PCIe physical slot layout | portrix systems

  5. does this work with Solaris 10, or do you need Solaris 11 to enable this functionality? Please HELP, many thanks

  6. How to find the boot device of a secondary control domain, when the secondary cdom is not booting up?
