Linux experts Seravo background Linux Debian SUSE
Seravo blog: Linux and open source – technology and strategy

Virtualized bridged networking with MacVTap

Introduction

A virtual machine typically needs to be connected to a network to be useful. Because a virtual machine runs as an application inside the host computer, connecting it to the outside world needs support from the host operating system. There are a number of options for networking a virtual machine, both on the Link Layer and the Network layer. Please refer to the documentation of the virtualization system you are using (e.g. QEMU, KVM, etc.) The references list below also contains pointers to additional information.

MacVTap

In this article we’ll focus on a relatively new Linux device driver designed to ease the task of networking virtual machines: Mavtap. Macvtap is essentially a combination of the Macvlan driver and a Tap device. This probably does not say much to the uninitiated, so let’s see what it all means.

The Macvlan driver is a separate Linux kernel driver that the Macvtap driver depends on. Macvlan makes it possible to create virtual network interfaces that “cling on” a physical network interface. Each virtual interface has its own MAC address distinct from the physical interface’s MAC address. Frames sent to or from the virtual interfaces are mapped to the physical interface, which is called the lower interface.

Tap interfaces

 

A Tap interface is a software-only interface. Instead of passing frames to and from a physical Ethernet card, the frames are read and written by a user space program. The kernel makes the Tap interface available via the /dev/tapN device file, where N is the index of the network interface.

A Macvtap interface combines the properties of  these two; it is an virtual interface with a tap-like software interface. A Macvtap interface can be created using the ip command:

$ sudo ip link add link eth0 name macvtap0 type macvtap

This adds a new interface called macvtap0 as can be seen in the following listing:

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN mode DEFAULT
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
 link/ether 00:1f:d0:15:7b:e6 brd ff:ff:ff:ff:ff:ff
3: macvtap0@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
 link/ether 42:96:80:ee:2d:23 brd ff:ff:ff:ff:ff:ff

The device file corresponding to the new macvtap interface with index 3 is /dev/tap3. This device file is created by udev.

$ ls -l /dev/tap3
 crw------- 1 root root 252, 1 Oct 21 12:10 /dev/tap3

A user space program can open this device file and use it to send and receive Ethernet frames over it. When the kernel transmits a frame via the interface macvtap0, instead of sending it to a physical Ethernet card,  it makes it available for reading from this file by the user space program. Correspondingly, when the user space program writes the content of an Ethernet frame to the file /dev/tap3, the kernel’s networking code sees the frame as if it had been received via the device macvtap0.

The user space program is normally an emulator like QEMU, which virtualizes network cards to the guest operating systems. When QEMU reads an Ethernet frame using the file descriptor, it emulates what a real network card would do. Typically it triggers an interrupt in the virtual machine, and the guest operating system can then read the frame from the emulated network card. The exact details on how this is done is dependent on the emulator and the guest operating system, and is not the focus of this article.

Macvtap is implemented in the Linux kernel, and must be configured when compiling the kernel, either as a module or as a built-in feature. The setting can be found under Device Drivers → Network device support → MAC-VLAN based tap driver. The tap driver is dependent on ‘MAC-VLAN support’ in the same category, so you need to enable that too.

A Macvtap device can function in one of three modes: Virtual Ethernet Port Aggregator (VEPA) mode, Bridge mode, and Private mode. The modes determine how the tap endpoints communicate between each other.

1. Virtual Ethernet Port Aggregator mode

In this mode, which is the default, data between endpoints on the same lower device are sent via the lower device (Ethernet card) to the physical switch the lower device is connected to. This mode requires that the switch supports ‘Reflective Relay’ mode, also known as ‘Hairpin’ mode. Reflective Relay means the switch can send back a frame on the same port it received it on. Unfortunately, most switches today do not yet support this mode.

Hairpin mode

 

2. Bridge mode

When the MacVTap device is in Bridge mode, the endpoints can communicate directly without sending the data out via the lower device. When using this mode, there is no need for the physical switch to support Reflective Relay mode.

3. Private mode

In Private mode the nodes on the same MacVTap device can never talk to each other, regardless if the physical switch supports Reflective Relay mode or not. Use this mode when you want to isolate the virtual machines connected to the endpoints from each other, but not from the outside network.

 

At a first glance, the VEPA mode seems a bit odd. What makes it a good idea to send out frames on the physical wire, only to be sent back to the Ethernet card via the same port on the switch? VEPA mode simplifies the task of the host computer by letting the physical switch do the switching, which the switch is very good at. A further advantage is that network administrators can monitor traffic between virtual machines using familiar tools on a managed switch, which would not be possible if the data never entered the switch.

Switches have not traditionally supported Reflective Relay mode, because the Spanning Tree Protocol (STP) has prevented it, and before the advent of virtualization it made no sense for a frame to be passed back through the same port.

Using MacVTap with libvirt

If you are using the libvirt (libvirt.org) toolkit to manage your virtual machines, add a network interface definition like the following in your domain XML file:

<devices>
   <interface type='direct'>
      <mac address='d0:0f:d0:0f:00:01'/>
       <source dev='eth0' mode='vepa'/>
   </interface>
   <!-- More devices here... -->
</devices>

Change the mode to ‘bridge’ if you don’t have a VEPA capable switch. Also make sure each tap interface has a unique and sensible value for the MAC  address.

This directive causes libvirt to create a Macvtap device associated with the specified source device. Libvirt also opens the corresponding device file (as described above) and passes the file descriptor to QEMU. Thus, when using libvirt, there is no need to create the tap interfaces by hand, as was shown in the example above.

Conclusion

Connecting virtual machines to a virtual switch as described above makes them present on the local network just as if they were physical machines connected to the LAN. They belong to the same subnet as the physical machines and their IP addresses can be configured by the same DHCP server as the physical machines. Note that the connection is at the data link layer (L2) and is thus independent of which network layer protocol is used on top of it. The network protocol can be IPv4, IPv6 or even IPX, if you wish.

References

Kernelnewbies.org Linux Virtualization Wiki — MacVTap

Linux information for IBM systems — Virtualization blueprints

Libvirt Domain XML format

Tun/Tap interface tutorial (background information on the tap interface)

Wikibon — Edge Virtual Bridging

 

7 thoughts on “Virtualized bridged networking with MacVTap

  1. Luca Francesca says:

    Very interesting, thanks ;)

  2. arl says:

    Nice article.

  3. Ilkka Tengvall says:

    Thanks for clear explanation!

  4. Anand says:

    Great Explanation – Gives you just the right overview of what you want to know.
    I was wondering if you could add the newer modes (Passthrough) to this article,
    along with some more information like ‘promiscuous’ mode.
    Regards

  5. anon says:

    Brilliant – thanks!!!
    Everything makes so much more sense now.

Leave a Reply

Your email address will not be published.