Virtual Routing and Forwarding (VRF)
====================================
The VRF device combined with ip rules provides the ability to create virtual
routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
Linux network stack. One use case is the multi-tenancy problem where each
tenant has its own unique routing tables and, at the very least, needs a
different default gateway.

Processes can be "VRF aware" by binding a socket to the VRF device. Packets
through the socket then use the routing table associated with the VRF
device. An important feature of the VRF device implementation is that it
impacts only Layer 3 and above, so L2 tools (e.g., LLDP) are not affected
(i.e., they do not need to be run in each VRF). The design also allows
the use of higher priority ip rules (Policy Based Routing, PBR) to take
precedence over the VRF device rules, directing specific traffic as desired.

In addition, VRF devices allow VRFs to be nested within namespaces. For
example, network namespaces provide separation of network interfaces at L1
(Layer 1 separation), VLANs on the interfaces within a namespace provide
L2 separation, and then VRF devices provide L3 separation.

Design
------
A VRF device is created with an associated route table. Network interfaces
are then enslaved to a VRF device:

    +-----------------------------+
    |           vrf-blue          |  ===> route table 10
    +-----------------------------+
       |        |            |
    +------+ +------+     +-------------+
    | eth1 | | eth2 | ... |    bond1    |
    +------+ +------+     +-------------+
                             |       |
                         +------+ +------+
                         | eth8 | | eth9 |
                         +------+ +------+

Packets received on an enslaved device are switched to the VRF device
using an rx_handler, which gives the impression that packets flow through
the VRF device. Similarly, on egress, routing rules are used to send packets
to the VRF device driver before they are sent out the actual interface. This
allows tcpdump on a VRF device to capture all packets into and out of the
VRF as a whole.[1] Similarly, netfilter [2] and tc rules can be applied
using the VRF device to specify rules that apply to the VRF domain as a whole.

[1] Packets in the forwarded state do not flow through the device, so those
    packets are not seen by tcpdump. Will revisit this limitation in a
    future release.

[2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only, with skb->dev
    set to the real ingress device, and egress is limited to
    NF_INET_POST_ROUTING. Will revisit this limitation in a future release.


Setup
-----
1. VRF device is created with an association to a FIB table.
   e.g., ip link add vrf-blue type vrf table 10
         ip link set dev vrf-blue up

2. Rules are added that send lookups to the associated FIB table when the
   iif or oif is the VRF device. e.g.,
       ip ru add oif vrf-blue table 10
       ip ru add iif vrf-blue table 10

   Set the default route for the table (and hence default route for the VRF).
   e.g., ip route add table 10 prohibit default

3. Enslave L3 interfaces to a VRF device.
   e.g., ip link set dev eth1 master vrf-blue

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with the VRF device. Any additional routes depending
   on the enslaved device will need to be reinserted following the
   enslavement.

4. Additional VRF routes are added to the associated table.
   e.g., ip route add table 10 ...


Applications
------------
Applications that are to work within a VRF need to bind their socket to the
VRF device:

    setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);

or specify the output device using cmsg and IP_PKTINFO.


Limitations
-----------
The VRF device currently works only for IPv4. Support for IPv6 is under
development.

The index of the original ingress interface is not available via cmsg. Will
address soon.