All About Netgraph

Part I: What is Netgraph?

The motivation

Imagine the following scenario: you are developing a TCP/IP router product based on
FreeBSD. The product needs to support bit-synchronous serial WAN connections, i.e.,
dedicated high speed lines that run up to T1 speeds, where the basic framing is done via
HDLC. You need to support the following protocols for the transmission of IP packets over
the wire: 
 
  - IP frames delivered over HDLC (the simplest way to transmit IP)
  - IP frames delivered over ``Cisco HDLC'' (basically, packets are prepended with a
    two-byte Ethertype, and there are also periodic keep-alive packets)
  - IP delivered over frame relay (frame relay provides for up to 1000 virtual
    point-to-point links which are multiplexed over a single physical wire)
  - IP inside RFC 1490 encapsulation over frame relay (RFC 1490 is a way to multiplex
    multiple protocols over a single connection, and is often used in conjunction with
    frame relay)
  - Point-to-Point Protocol (PPP) over HDLC
  - PPP over frame relay
  - PPP inside RFC 1490 encapsulation over frame relay
  - PPP over ISDN
  - There are even rumors you might have to support frame relay over ISDN (!)

Figure 1 graphically indicates all of the possible combinations:

Figure 1: Ways to talk IP over synchronous serial and ISDN WAN connections
 This was the situation faced by Julian Elischer <julian@freebsd.org>
and myself back in 1996 while we were working on the Whistle
InterJet. At that time FreeBSD had very limited support for synchronous serial
hardware and protocols. We looked at OEMing from Emerging
Technologies, but decided instead to do it ourselves.

The answer was netgraph. Netgraph is an in-kernel networking subsystem that
follows the UNIX principle of achieving power and flexibility through combinations of
simple tools, each of which is designed to perform a single, well defined task. The basic
idea is straightforward: there are nodes (the tools) and edges that connect
pairs of nodes (hence the ``graph'' in ``netgraph''). Data packets flow
bidirectionally along the edges from node to node. When a node receives a data packet, it
performs some processing on it, and then (usually) forwards it to another node. The
processing may be something as simple as adding/removing headers, or it may be more
complicated or involve other parts of the system. Netgraph is vaguely similar to System V
Streams, but is designed for better speed and more flexibility.

Netgraph has proven very useful for networking, and is currently used in the Whistle
InterJet for all of the above protocol configurations (except frame relay over ISDN), plus
normal PPP over asynchronous serial (i.e., modems and TAs) and Point-to-Point Tunneling
Protocol (PPTP), which includes encryption. With all of these protocols, the data packets
are handled entirely in the kernel. In the case of PPP, the negotiation packets are
handled separately in user-mode (see the FreeBSD port for mpd-3.0b5).

Nodes and edges

Looking at the picture above, it is obvious what the nodes and edges might be. Less
obvious is the fact that a node may have an arbitrary number of connections to other
nodes. For example, it is entirely possible to have IP, IPX, and PPP all running inside
RFC 1490 encapsulation at the same time; indeed, multiplexing multiple protocols is
exactly what RFC 1490 is for. In this case, there would be three edges connecting into the
RFC 1490 node, one for each protocol stack. There is no requirement that data flow in any
particular direction or that a node have any limits on what it can do with a data packet.
A node can be a source/sink for data, e.g., associated with a piece of hardware, or it can
just modify data by adding/removing headers, multiplexing, etc.

Netgraph nodes live in the kernel and are semi-permanent. Typically, a node will
continue to exist until it is no longer connected to any other nodes. However, some nodes
are persistent, e.g., nodes associated with a piece of hardware; when the number of
edges goes to zero, typically the hardware is shut down. Since they live in the kernel,
nodes are not associated with any particular process.

Control messages

This picture is still oversimplified. In real life, a node may need to be configured,
queried for its status, etc. For example, PPP is a complicated protocol with lots of
options. For this kind of thing netgraph defines control messages. A control
message is ``out of band data.'' Instead of flowing from node to node like data packets,
control messages are sent asynchronously and directly from one node to another. The two
nodes don't have to be (even indirectly) connected. To allow for this, netgraph provides a
simple addressing scheme by which nodes can be identified
using simple ASCII strings.

Control messages are simply C structures with a fixed header (a struct ng_mesg)
and a variable length payload. There are some control messages that all nodes understand;
these are called the generic control messages and are implemented in the base system.
For example, a node can be told to destroy itself or to make or break an edge. Nodes can
also define their own type-specific control messages. Each node type that defines its own
control messages must have a unique typecookie. The combination of the typecookie
and command fields in the control message header determines how to interpret it.

Control messages often elicit responses in the form of a reply control message.
For example, to query a node's status or statistics you might send the node a ``get
status'' control message; it then sends you back a response (with the identifying token
copied from the original request) containing the requested information in the payload
area. The response control message header is usually identical to the original header, but
with the reply flag set.

Netgraph provides a way to convert these structures to and from ASCII strings, making
human interaction easier.

Hooks

In netgraph, edges don't really exist per se. Instead, an edge is simply an
association of two hooks, one from each node. A node's hooks define how that node
can be connected. Each hook has a unique, statically defined name that often indicates
what the purpose of the hook is. The name is significant only in the context of that node;
two nodes may have similarly named hooks.

For example, consider the Cisco HDLC node. Cisco HDLC is a very simple protocol
multiplexing scheme whereby each frame is prepended with its Ethertype before transmission
over the wire. Cisco HDLC supports simultaneous transmission of IP, IPX, AppleTalk, etc.
Accordingly, the netgraph Cisco HDLC node (see ng_cisco(8)) defines hooks
named inet, atalk, and ipx.
These hooks are intended to connect to the corresponding upper layer protocol engines. It
also defines a hook named downstream which connects to the lower
layer, e.g., the node associated with a synchronous serial card. Packets received on
inet, atalk, and ipx have the appropriate two byte header prepended,
and then are forwarded out the downstream hook. Conversely, packets received
on downstream have the header stripped off, and are forwarded out the
appropriate protocol hook. The node also handles the periodic ``tickle'' and query packets
defined by the Cisco HDLC protocol.

Hooks are always either connected or disconnected; the operation of connecting or
disconnecting a pair of hooks is atomic. When a data packet is sent out a hook, if that
hook is disconnected, the data packet is discarded.

Some examples of node types

Some node types are fairly obvious, such as Cisco HDLC. Others are less obvious but
provide for some interesting functionality, for example the ability to talk directly to a
device or open a socket from within the kernel.

Here are some examples of netgraph node types that are currently implemented in
FreeBSD. All of these node types are documented in their corresponding man pages. 
 
  Echo node type: ng_echo(8)
    This node type accepts connections on any hook. Any data packets it receives are simply
    echoed back out the hook they came in on. Any non-generic control messages are likewise
    echoed back as replies.

  Discard node type: ng_disc(8)
    This node type accepts connections on any hook. Any data packets and control messages it
    receives are silently discarded.

  Tee node type: ng_tee(8)
    This node type is like a bidirectional version of the tee(1) utility. It
    makes a copy of all data passing through it in either direction (``right'' or ``left''),
    and is useful for debugging. Data packets arriving on ``right'' are sent out ``left'' and
    a copy is sent out ``right2left''; similarly for data packets going from ``left'' to
    ``right''. Packets received on ``right2left'' are sent out ``left'' and packets received
    on ``left2right'' are sent out ``right''.

    Figure 2: Tee node type
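
The forwarding rule of the tee node can be written down as a small lookup. The following is only a toy model of the behavior described above, not the kernel implementation; the names `tee_route` and `tee_route_for` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the tee node's forwarding rule (not the kernel code). */
struct tee_route {
    const char *forward;  /* hook that receives the packet itself */
    const char *copy;     /* hook that receives a duplicate, or NULL */
};

/*
 * Return where a packet arriving on hook `in` is sent.
 * An unknown hook name yields { NULL, NULL }.
 */
struct tee_route tee_route_for(const char *in)
{
    struct tee_route r = { NULL, NULL };

    assert(in != NULL);
    if (strcmp(in, "right") == 0) {
        r.forward = "left";
        r.copy = "right2left";
    } else if (strcmp(in, "left") == 0) {
        r.forward = "right";
        r.copy = "left2right";
    } else if (strcmp(in, "right2left") == 0) {
        r.forward = "left";       /* injected traffic, no duplicate */
    } else if (strcmp(in, "left2right") == 0) {
        r.forward = "right";      /* injected traffic, no duplicate */
    }
    return r;
}
```

In this sketch a packet arriving on ``right'' maps to ``left'' with a copy on ``right2left'', while a packet injected on ``right2left'' maps to ``left'' with no copy.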
  Interface node type: ng_iface(8)
    This node type is both a netgraph node and a point-to-point system networking interface.
    It has (so far) three hooks, named ``inet'', ``atalk'', and ``ipx''. These hooks represent
    the protocol stacks for IP, AppleTalk, and IPX respectively. The first time you create an
    interface node, interface ng0 shows up in the output of ifconfig -a.
    You can then configure the interface with addresses like any other point-to-point
    interface, ping the remote side, etc. Of course, the node must be connected to something
    or else your ping packets will go out the inet hook and disappear.

    Unfortunately, FreeBSD currently cannot handle removing interfaces, so once you create an
    ng_iface(8) node, it remains persistent until the next reboot (however, this will be
    fixed soon).

    Figure 3: Interface node type
  TTY node type: ng_tty(8)
    This node type is both a netgraph node and an asynchronous serial line discipline
    (see tty(4)). You create the node by installing the NETGRAPHDISC line discipline on a
    serial line. The node has one hook called ``hook''. Packets received on ``hook'' are
    transmitted (as serial bytes) out the corresponding serial device; data received on the
    device is wrapped up into a packet and sent out ``hook''. Normal reads and writes to the
    serial device are disabled.

    Figure 4: TTY node type
  Socket node type: ng_socket(8)
    This node type is very important, because it allows user-mode programs to participate in
    the netgraph system. Each node is both a netgraph node and a pair of sockets in the family
    PF_NETGRAPH. The node is created when a user-mode program creates the
    corresponding sockets via the socket(2) system call. One socket is used for
    transmitting and receiving netgraph data packets, while the other is used for control
    messages. The node supports hooks with arbitrary names, e.g. ``hook1,'' ``hook2,'' etc.

    Figure 5: Socket node type
  BPF node type: ng_bpf(8)
    This node type performs bpf(4) pattern matching and filtering on packets as
    they flow through it.

  Ksocket node type: ng_ksocket(8)
    This node type is the reverse of ng_socket(8). Each node is both a node and
    a socket that is completely contained in the kernel. Data received by the node is written
    to the socket, and vice-versa. The normal bind(2), connect(2),
    etc. operations are effected instead using control messages. This node type is useful for
    tunneling netgraph data packets within a socket connection (for example, tunneling IP over
    UDP).

  Ethernet node type: ng_ether(8)
    If you compiled your kernel with options NETGRAPH, then every Ethernet
    interface is also a netgraph node with the same name as the interface. Each Ethernet node
    has two hooks, ``orphans'' and ``divert''; only one hook may be connected at a time. If
    ``orphans'' is connected, the device continues to work normally, except that all received
    Ethernet packets that have an unknown or unsupported Ethertype are delivered out that hook
    (normally these frames would simply be discarded). When the ``divert'' hook is connected,
    then all incoming packets are delivered out this hook. Packets received on either
    of these hooks are transmitted on the wire. All packets are raw Ethernet frames with the
    standard 14 byte header (but no checksum). This node type is used, for example, for PPP
    over Ethernet (PPPoE).

  Synchronous drivers: ar(4) and sr(4)
    If you compiled your kernel with options NETGRAPH, the ar(4) and sr(4) drivers will have
    their normal functionality disabled and instead
    will operate as simple persistent netgraph nodes (with the same name as the device
    itself). Raw HDLC frames can be read from and written to the ``rawdata'' hook.

Meta information

In some cases, a data packet may have associated meta-information which needs
to be carried along with the packet. Though rarely used so far, netgraph provides a
mechanism to do this. An example of meta-information is priority information: some packets
may have higher priority than others. Node types may define their own type-specific
meta-information, and netgraph defines a struct ng_meta for this purpose.
Meta-information is treated as opaque information by the base netgraph system.

Addressing

Every netgraph node is addressable via an ASCII string called a node address or path.
Node addresses are used exclusively for sending control messages.

Many nodes have names. For example, a node associated with a device will
typically give itself the same name as the device. When a node has a name, it can always
be addressed using the absolute address consisting of the name followed by a colon.
For example, if you create an interface node named ``ng0'' its address will be ``ng0:''.

If a node does not have a name, you can construct an address from the node's unique ID
number by enclosing the number in square brackets (every node has a unique ID number).
So if node ng0: has ID number 1234, then ``[1234]:'' is also a
valid address for that node. Finally, the address ``.:'' or ``.'' always refers to the
local (source) node.

Relative addressing is also possible in netgraph when two nodes are indirectly
connected. A relative address uses the names of consecutive hooks on the path from the
source node to the target node. Consider this picture:

Figure 6: Sample node configuration
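
To make the idea of a relative address concrete, here is a minimal sketch that walks a dot-separated list of hook names across a toy graph shaped like Figure 6 (node1 and node2 joined by hooks ``hook1a'' and ``hook2a'', node2 and node3 joined by ``hook2b'' and ``hook3a''). All types and functions here (`toy_node`, `toy_connect`, `toy_resolve`, `toy_build_figure6`) are hypothetical and are not part of netgraph; the sketch handles only relative addresses, not ``name:'' or ``[id]:'' prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_HOOKS 4

struct toy_node;

struct toy_hook {
    const char *name;        /* hook name, meaningful only on its own node */
    struct toy_node *peer;   /* node at the other end of the edge */
};

struct toy_node {
    const char *name;
    struct toy_hook hooks[TOY_MAX_HOOKS];
    size_t nhooks;
};

/* Join hook `ha` of node `a` to hook `hb` of node `b`. */
void toy_connect(struct toy_node *a, const char *ha,
                 struct toy_node *b, const char *hb)
{
    assert(a->nhooks < TOY_MAX_HOOKS && b->nhooks < TOY_MAX_HOOKS);
    a->hooks[a->nhooks].name = ha;
    a->hooks[a->nhooks].peer = b;
    a->nhooks++;
    b->hooks[b->nhooks].name = hb;
    b->hooks[b->nhooks].peer = a;
    b->nhooks++;
}

/*
 * Follow a relative address such as "hook1a.hook2b" starting at `start`.
 * A leading ".:" is skipped and "." alone means the starting node.
 * Returns NULL if some hook along the path does not exist.
 */
struct toy_node *toy_resolve(struct toy_node *start, const char *path)
{
    struct toy_node *cur = start;

    if (strcmp(path, ".") == 0)
        return start;
    if (strncmp(path, ".:", 2) == 0)
        path += 2;
    while (cur != NULL && *path != '\0') {
        size_t len = strcspn(path, ".");   /* length of next hook name */
        struct toy_node *next = NULL;
        size_t i;

        for (i = 0; i < cur->nhooks; i++) {
            const char *hn = cur->hooks[i].name;
            if (strlen(hn) == len && strncmp(hn, path, len) == 0) {
                next = cur->hooks[i].peer;
                break;
            }
        }
        cur = next;
        path += len;
        if (*path == '.')
            path++;
    }
    return cur;
}

/* Build the three-node chain of Figure 6 in caller-provided storage. */
void toy_build_figure6(struct toy_node n[3])
{
    memset(n, 0, 3 * sizeof(n[0]));
    n[0].name = "node1";
    n[1].name = "node2";
    n[2].name = "node3";
    toy_connect(&n[0], "hook1a", &n[1], "hook2a");
    toy_connect(&n[1], "hook2b", &n[2], "hook3a");
}
```

With the chain built by `toy_build_figure6()`, resolving ``hook1a.hook2b'' from node1 ends at node3, and resolving ``hook3a.hook2a'' from node3 ends at node1.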
 If node1 wants to send a control message to node2, it can use the address
``.:hook1a'' or simply ``hook1a''. To address node3, it
could use the address ``.:hook1a.hook2b'' or just ``hook1a.hook2b''.
Conversely, node3 could address node1 using the address ``.:hook3a.hook2a''
or just ``hook3a.hook2a''.

Relative and absolute addressing can be combined, e.g., ``node1:hook1a.hook2b''
refers to node3.

Part II: Using Netgraph

Netgraph comes with command line utilities and a user library that allow interaction
with the kernel netgraph system. Root privileges are required in order to perform netgraph
operations from user-land.

From the command line

There are two command line utilities for interacting with netgraph, nghook(8) and
ngctl(8). nghook(8) is fairly simple: it connects to any
unconnected hook of any existing node and lets you transmit and receive data packets via
standard input and standard output. The output can optionally be decoded into human
readable hex/ASCII format. On the command line you supply the node's absolute address and
the hook name. For example, if your kernel was compiled with options NETGRAPH and you
have an Ethernet interface fxp0, this command will redirect all packets
received by the Ethernet card and dump them to standard output in hex/ASCII format:

    nghook -a fxp0: divert

ngctl(8) is a more elaborate program that allows you to do most things
possible in netgraph from the command line. It works in batch or interactive mode, and
supports several commands that do interesting work, among them: 
  
    
      | connect  | Connect a pair of hooks to join two nodes         |
      | list     | List all nodes in the system                      |
      | mkpeer   | Create and connect a new node to an existing node |
      | msg      | Send an ASCII formatted message to a node         |
      | name     | Assign a name to a node                           |
      | rmhook   | Disconnect two hooks that are joined together     |
      | show     | Show information about a node                     |
      | shutdown | Remove/reset a node, breaking all connections     |
      | status   | Get human readable status from a node             |
      | types    | Show all currently installed node types           |
      | quit     | Exit program                                      |

These commands can be combined into a script that does something useful. For example,
suppose you have two private networks that are separated but both connected to the
Internet via an address translating FreeBSD machine. Network A has internal address range
192.168.1.0/24 and external IP address 1.1.1.1, while network B has internal address range
192.168.2.0/24 and external IP address 2.2.2.2. Using netgraph you can easily set up a UDP
tunnel for IP traffic between your two private networks. Here is a simple script that
would do this (this script is also found in /usr/share/examples/netgraph): 
#!/bin/sh
# This script sets up a virtual point-to-point WAN link between
# two subnets, using UDP packets as the ``WAN connection.''
# The two subnets might be non-routable addresses behind a
# firewall.
#
# Here define the local and remote inside networks as well
# as the local and remote outside IP addresses and the UDP
# port number that will be used for the tunnel.
#
LOC_INTERIOR_IP=192.168.1.1
LOC_EXTERIOR_IP=1.1.1.1
REM_INTERIOR_IP=192.168.2.1
REM_EXTERIOR_IP=2.2.2.2
REM_INSIDE_NET=192.168.2.0
UDP_TUNNEL_PORT=4028
# Create the interface node ``ng0'' if it doesn't exist already,
# otherwise just make sure it's not connected to anything.
#
if ifconfig ng0 >/dev/null 2>&1; then
	ifconfig ng0 inet down delete >/dev/null 2>&1
	ngctl shutdown ng0:
else
	ngctl mkpeer iface dummy inet
fi
# Attach a UDP socket to the ``inet'' hook of the interface node
# using the ng_ksocket(8) node type.
#
ngctl mkpeer ng0: ksocket inet inet/dgram/udp
# Bind the UDP socket to the local external IP address and port
#
ngctl msg ng0:inet bind inet/${LOC_EXTERIOR_IP}:${UDP_TUNNEL_PORT}
# Connect the UDP socket to the peer's external IP address and port
#
ngctl msg ng0:inet connect inet/${REM_EXTERIOR_IP}:${UDP_TUNNEL_PORT}
# Configure the point-to-point interface
#
ifconfig ng0 ${LOC_INTERIOR_IP} ${REM_INTERIOR_IP}
# Add a route to the peer's interior network via the tunnel
#
route add ${REM_INSIDE_NET} ${REM_INTERIOR_IP}

Here is an example of playing around with ngctl(8) in interactive mode.
User input follows the ``$'' or ``+'' prompt.

Start up ngctl in interactive mode. It lists the available commands...
  $ ngctl
Available commands:
  connect    Connects hook <peerhook> of the node at <relpath> to <hook>
  debug      Get/set debugging verbosity level
  help       Show command summary or get more help on a specific command
  list       Show information about all nodes
  mkpeer     Create and connect a new node to the node at "path"
  msg        Send a netgraph control message to the node at "path"
  name       Assign name <name> to the node at <path>
  read       Read and execute commands from a file
  rmhook     Disconnect hook "hook" of the node at "path"
  show       Show information about the node at <path>
  shutdown   Shutdown the node at <path>
  status     Get human readable status information from the node at <path>
  types      Show information about all installed node types
  quit       Exit program
 
ngctl creates an ng_socket(8) type node when it starts. This is our
local netgraph node which is used to interact with other nodes in the system. Let's look
at it. We see that it has a name ``ngctl652'' assigned to it by ngctl, it is
of type ``socket,'' it has ID number 45, and has zero connected hooks, i.e., it's not
connected to any other nodes...
  + show .
  Name: ngctl652        Type: socket          ID: 00000045   Num hooks: 0
 Now we will create and attach a ``tee'' node to our local node. We will connect the
``right'' hook of the tee node to a hook named ``myhook'' on our local node. We can use
any name for our hook that we want to, as ng_socket(8) nodes support
arbitrarily named hooks. After doing this, we inspect our local node again to see that it
has an unnamed ``tee'' neighbor... 
  + help mkpeer
Usage:    mkpeer [path] <type> <hook> <peerhook>
Summary:  Create and connect a new node to the node at "path"
Description:
  The mkpeer command atomically creates a new node of type "type" 
  and connects it to the node at "path". The hooks used for the 
  connection are "hook" on the original node and "peerhook" on 
  the new node. If "path" is omitted then "." is assumed.
+ mkpeer . tee myhook right
+ show .
  Name: ngctl652        Type: socket          ID: 00000045   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  myhook          <unnamed>       tee          00000046        right
Similarly, if we check the tee node, we see that it has our local node as its neighbor
connected to the ``right'' hook. The ``tee'' node is still an unnamed node. However we
could always refer to it using the absolute address ``[46]:'' or the relative
addresses ``.:myhook'' or ``myhook''... 
  + show .:myhook
  Name: <unnamed>       Type: tee             ID: 00000046   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  right           ngctl652        socket       00000045        myhook         
 Now let's assign our tee node a name and make sure that we can refer to it that way...  
  + name .:myhook mytee
+ show mytee:
  Name: mytee           Type: tee             ID: 00000046   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  right           ngctl652        socket       00000045        myhook         
 Now let's connect a Cisco HDLC node to the other side of the ``tee'' node and inspect
the ``tee'' node again. We are connecting to the ``downstream'' hook of the Cisco HDLC
node, so it will act like the tee node is the WAN connection. The Cisco HDLC is to the
``left'' of the tee node while our local node is to the ``right'' of the tee node...  
  + mkpeer mytee: cisco left downstream
+ show mytee:
  Name: mytee           Type: tee             ID: 00000046   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  left            <unnamed>       cisco        00000047        downstream
  right           ngctl652        socket       00000045        myhook         
+ 
Rec'd data packet on hook "myhook":
0000:  8f 00 80 35 00 00 00 02 00 00 00 00 00 00 00 00  ...5............
0010:  ff ff 00 20 8c 08 40 00                          ... ..@.        
+ 
Rec'd data packet on hook "myhook":
0000:  8f 00 80 35 00 00 00 02 00 00 00 00 00 00 00 00  ...5............
0010:  ff ff 00 20 b3 18 00 17                          ... ....        
Hey, what's that?! It looks like we received some data packets on our ``myhook'' hook.
The Cisco node is generating periodic keep-alive packets every 10 seconds. These packets
are passing through the tee node (from ``left'' to ``right'') and ending up being received
on ``myhook'', where ngctl is displaying them on the console.

Now let's take inventory of all the nodes currently in the system. Note that our two
Ethernet interfaces show up as well, because they are persistent nodes and we compiled our
kernel with options NETGRAPH...
  + list
There are 5 total nodes:
  Name: <unnamed>       Type: cisco           ID: 00000047   Num hooks: 1
  Name: mytee           Type: tee             ID: 00000046   Num hooks: 2
  Name: ngctl652        Type: socket          ID: 00000045   Num hooks: 1
  Name: fxp1            Type: ether           ID: 00000002   Num hooks: 0
  Name: fxp0            Type: ether           ID: 00000001   Num hooks: 0
+ 
Rec'd data packet on hook "myhook":
0000:  8f 00 80 35 00 00 00 02 00 00 00 00 00 00 00 00  ...5............
0010:  ff ff 00 22 4d 40 40 00                          ..."M@@.        
OK, let's shut down (i.e., delete) the Cisco HDLC node so we'll stop receiving that
data...  
  + shutdown mytee:left
+ show mytee:
  Name: mytee           Type: tee             ID: 00000046   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  right           ngctl652        socket       00000045        myhook         
 Now let's get the statistics from the tee node. Here we send it a control message and
it sends back an immediate reply. The command and reply are converted to/from ASCII
automatically for us by ngctl, as control messages are binary structures...  
  + help msg
Usage:    msg path command [args ... ]
Aliases:  cmd
Summary:  Send a netgraph control message to the node at "path"
Description:
  The msg command constructs a netgraph control message from the 
  command name and ASCII arguments (if any) and sends that 
  message to the node.  It does this by first asking the node to 
  convert the ASCII message into binary format, and re-sending the 
  result. The typecookie used for the message is assumed to be 
  the typecookie corresponding to the target node's type.
+ msg mytee: getstats
Rec'd response "getstats" (1) from "mytee:":
Args:   { right={ outOctets=72 outFrames=3 } left={ inOctets=72 inFrames=3 }
  left2right={ outOctets=72 outFrames=3 } }
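
The exchange above is one request and one reply. As described in Part I, a reply carries the identifying token copied from the request and has a reply flag set. The following minimal sketch shows that matching logic in isolation. The struct `toy_msg_header`, its field layout, and the constant `TOY_FLAG_REPLY` are hypothetical stand-ins chosen for this illustration; they are not the real struct ng_mesg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_REPLY 0x0001u   /* hypothetical "this is a reply" bit */

/* Hypothetical stand-in for the fixed control message header. */
struct toy_msg_header {
    uint32_t typecookie;  /* identifies the node type that defines cmd */
    uint32_t cmd;         /* command number within that type */
    uint32_t token;       /* chosen by the sender to match up replies */
    uint32_t flags;
    uint32_t arglen;      /* length of the payload that follows */
};

/* Build a reply header: same cookie, command and token, reply flag set. */
struct toy_msg_header toy_make_reply(const struct toy_msg_header *req,
                                     uint32_t reply_arglen)
{
    struct toy_msg_header resp;

    assert(req != NULL);
    resp = *req;
    resp.flags |= TOY_FLAG_REPLY;
    resp.arglen = reply_arglen;
    return resp;
}

/* Return 1 if `resp` is a reply to `req`, otherwise 0. */
int toy_is_reply_to(const struct toy_msg_header *req,
                    const struct toy_msg_header *resp)
{
    assert(req != NULL && resp != NULL);
    return (resp->flags & TOY_FLAG_REPLY) != 0 &&
        resp->typecookie == req->typecookie &&
        resp->cmd == req->cmd &&
        resp->token == req->token;
}
```

A sender that has several requests outstanding could use a check like `toy_is_reply_to()` to pair each incoming reply with the request that caused it.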
The reply is simply an ASCII version of the struct ng_tee_stats returned
in the control message reply (this structure is defined in ng_tee.h).
We see that three frames (and 72 octets) passed through the tee node from left to right.
Each frame was duplicated and passed out the ``left2right'' hook (but since this hook was
not connected those duplicates were dropped).

OK, now let's play with an ng_ksocket(8) node...
  + mkpeer ksocket myhook2 inet/stream/tcp
+ msg .:myhook2 connect inet/127.0.0.1:13
ngctl: send msg: Operation now in progress
Rec'd data packet on hook "myhook":
0000:  54 75 65 20 46 65 62 20 20 31 20 31 31 3a 30 32  Tue Feb  1 11:02
0010:  3a 32 38 20 32 30 30 30 0d 0a                    :28 2000..      
Here we created a TCP socket in the kernel using an ng_ksocket(8) node and
connected it to the ``daytime'' service on the local machine, which spits out the current
time. How did we know we could use ``inet/127.0.0.1:13'' as an argument to the ``connect''
command? It's documented in the ng_ksocket(8) man page.

OK, enough playing...
  + quit

libnetgraph(3)

There is also a user library libnetgraph(3) for use by netgraph programs.
It supplies many useful routines which are documented in the man page. See the source code
in /usr/src/usr.sbin/ngctl for an example of using it.

Part III: The Implementation

Functional nature

How is netgraph implemented? One of the main goals of netgraph is speed, which
is why it runs entirely in the kernel. Another design decision is that netgraph is
entirely functional. That is, no queuing is involved as packets traverse from node to
node. Instead, direct function calls are used. Data packets are packet header mbufs,
while meta-data and control messages are heap-allocated C structures (using malloc type
M_NETGRAPH).

Object oriented nature

Netgraph is somewhat object-oriented in its design. Each node type is defined by
an array of pointers to the methods, or C functions, that implement the specific
behavior of nodes of that type. Each method may be left NULL to fall back to
the default behavior.

Similarly, there are some control messages that are understood by all node types and
which are handled by the base system (these are called generic control messages).
Each node type may in addition define its own type-specific control messages. Control
messages always contain a typecookie and a command, which together identify how to
interpret the message. Each node type must define its own unique typecookie if it wishes
to receive type-specific control messages. The generic control messages have a predefined
typecookie.

Memory

Netgraph uses reference counting for node and hook structures. Each pointer to a node
or a hook should count for one reference. If a node has a name, that also counts as a
reference. All netgraph-related heap memory is allocated and freed using malloc type
M_NETGRAPH.

Synchronization

Running in the kernel requires attention to synchronization. Netgraph nodes normally
run at splnet() (see spl(9)). For most node types, no
special attention is necessary. Some nodes, however, interact with other parts of the
kernel that run at different priority levels. For example, serial ports run at spltty()
and so ng_tty(8) needs to deal with this. For these cases netgraph provides
alternate data transmission routines that handle all the necessary queuing auto-magically
(see ng_queue_data() below).

How to implement a node type

To implement a new node type, you only need to do two things:

  1. Define a struct ng_type.
  2. Link it in using the NETGRAPH_INIT() macro.

Step 2 is easy, so we'll focus on step 1. Here is struct ng_type, taken
from netgraph.h:

  /*
 * Structure of a node type
 */
struct ng_type {
    u_int32_t       version;        /* must equal NG_VERSION */
    const char      *name;          /* Unique type name */
    modeventhand_t  mod_event;      /* Module event handler (optional) */
    ng_constructor_t *constructor;  /* Node constructor */
    ng_rcvmsg_t     *rcvmsg;        /* control messages come here */
    ng_shutdown_t   *shutdown;      /* reset, and free resources */
    ng_newhook_t    *newhook;       /* first notification of new hook */
    ng_findhook_t   *findhook;      /* only if you have lots of hooks */
    ng_connect_t    *connect;       /* final notification of new hook */
    ng_rcvdata_t    *rcvdata;       /* data comes here */
    ng_rcvdata_t    *rcvdataq;      /* or here if being queued */
    ng_disconnect_t *disconnect;    /* notify on disconnect */
    const struct    ng_cmdlist *cmdlist;    /* commands we can convert */
    /* R/W data private to the base netgraph code DON'T TOUCH! */
    LIST_ENTRY(ng_type) types;              /* linked list of all types */
    int                 refs;               /* number of instances */
};
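
The structure above is essentially a table of function pointers, and a slot left NULL falls back to default behavior. As a rough, self-contained illustration of that dispatch pattern (a toy model, not the kernel code), the sketch below uses a cut-down table with a single optional method. The names `toy_type`, `toy_newhook_t`, `toy_default_newhook`, `toy_strict_newhook`, and `toy_call_newhook` are hypothetical, and the default chosen here (accept every hook name) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down method table with one optional method. */
typedef int toy_newhook_t(const char *hookname);

struct toy_type {
    const char *name;          /* type name, e.g. "tee" */
    toy_newhook_t *newhook;    /* NULL means "use the default" */
};

/* Default behavior in this sketch: accept any hook name. */
int toy_default_newhook(const char *hookname)
{
    (void)hookname;
    return 0;
}

/* Override for a type that supports only two hook names. */
int toy_strict_newhook(const char *hookname)
{
    if (strcmp(hookname, "left") == 0 || strcmp(hookname, "right") == 0)
        return 0;
    return -1;                 /* reject unknown hook names */
}

/* Dispatch through the table, falling back when the slot is NULL. */
int toy_call_newhook(const struct toy_type *type, const char *hookname)
{
    assert(type != NULL && hookname != NULL);
    if (type->newhook != NULL)
        return type->newhook(hookname);
    return toy_default_newhook(hookname);
}
```

A type declared as `{ "loose", NULL }` accepts any hook name through the fallback, while `{ "strict", toy_strict_newhook }` rejects everything except ``left'' and ``right''.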

The version field should be equal to NG_VERSION. This is to
prevent linking in incompatible types. The name is the unique node type name,
e.g., ``tee''. The mod_event is an optional module event handler (for when
the node type is loaded and unloaded) -- similar to a static initializer in C++ or Java.

Next are the node type methods, described in detail below. The cmdlist provides
(optional) information for converting control messages to/from ASCII (see below), and the
rest is private to the base netgraph code.

Node type methods

Each node type implements its behavior through the following methods, which are the
function pointers in its struct ng_type. Each method has a default implementation, which
is used if the node type doesn't define one.

  int constructor(node_p *node);Purpose: Initialize a new node by calling ng_make_node_common()and
    settingnode->privateif appropriate. Per-node initialization and memory
    allocation should happen here.ng_make_node_common()should be called first;
    it creates the node and sets the reference count to one.Default action: Just
    calls ng_make_node_common(). When to override: If you require node-specific initialization or resource
    allocation. int rcvmsg(node_p node, struct ng_mesg *msg,const char *retaddr, struct ng_mesg **resp);
    Purpose: Receive and handle a control message. The address of the sender is in retaddr.
    The rcvmsg() function is responsible for freeing msg. The
    response, if any, may be returned synchronously if resp != NULL by setting *resp
    to point to it. Generic control messages (except for NGM_TEXT_STATUS) are
    handled by the base system and need not be handled here.
    Default action: Handles all generic control messages; otherwise returns EINVAL.
    When to override: If you define any type-specific control messages, or you want
    to implement control messages defined by some other node type.

  int shutdown(node_p node);
    Purpose: Shut down the node. Should disconnect all hooks by calling ng_cutlinks(),
    free all private per-node memory, release the assigned name (if any) via ng_unname(),
    and release the node itself by calling ng_unref() (this call releases the
    reference added by ng_make_node_common()). In the case of persistent
    nodes, all hooks should be disconnected and the associated device (or whatever) reset, but
    the node should not be removed (i.e., only call ng_cutlinks()).
    Default action: Calls ng_cutlinks(), ng_unname(), and ng_unref().
    When to override: When you need to undo the stuff you did in the constructor
    method.

  int newhook(node_p node, hook_p hook, const char *name);
    Purpose: Validate the connection of a hook and initialize any per-hook resources.
    The node should verify that the hook name is in fact one of the hook names supported by
    this node type. The uniqueness of the name will have already been verified (but it doesn't
    hurt to double-check). If the hook requires per-hook information, this method should
    initialize hook->private accordingly.
    Default action: Does nothing; the hook connection is always accepted.
    When to override: Always, unless you plan to allow arbitrarily named hooks, have
    no per-hook initialization or resource allocation, and treat all hooks the same upon
    connection.

  hook_p findhook(node_p node, const char *name);
    Purpose: Find a connected hook on this node. It is not necessary to override this
    method unless the node supports a large number of hooks, where a linear search would be
    too slow.
    Default action: Performs a linear search through the list of hooks
    connected to this node.
    When to override: When your node supports a large number of simultaneously
    connected hooks (say, more than 50).

  int connect(hook_p hook);
    Purpose: Final verification of hook connection. This method gives the node a last
    chance to validate a newly connected hook. For example, the node may actually care who
    it's connected to. If this method returns an error, the connection is aborted.
    Default action: Does nothing; the hook connection is accepted.
    When to override: I've never had an occasion to override this method.

  int rcvdata(hook_p hook, struct mbuf *m, meta_p meta);
    Purpose: Receive an incoming data packet on a connected hook. The node is
    responsible for freeing the mbuf if it returns an error, or wishes to discard the data
    packet. Although not currently the case, in the future it could be that sometimes
    m == NULL (for example, if there is only a meta to be sent), so node
    types should handle this possibility.
    Default action: Drops the data packet and meta-information.
    When to override: Always, unless you intend to discard all received data
    packets.

  int rcvdataq(hook_p hook, struct mbuf *m, meta_p meta);
    Purpose: Queue an incoming data packet for reception on a connected hook. The
    node is responsible for freeing the mbuf if it returns an error, or wishes to discard the
    data packet. The intention here is that some nodes may want to send data using a
    queuing mechanism instead of a functional mechanism. This requires cooperation of the
    receiving node type, which must implement this method in order for it to do anything
    different from rcvdata().
    Default action: Calls the rcvdata() method.
    When to override: Never, unless you have a reason to treat incoming ``queue''
    data differently from incoming ``non-queue'' data.

  int disconnect(hook_p hook);
    Purpose: Notification to the node that a hook is being disconnected. The node
    should release any per-hook resources allocated during newhook() or connect(). Although
    this function returns int, it should really return void because
    the return value is ignored; hook disconnection cannot be blocked by a node. This
    function should check whether the last hook has been disconnected
    (hook->node->numhooks == 0) and if so, call ng_rmnode() to self-destruct, as is the custom.
    This helps avoid completely unconnected nodes that linger around in the system after their
    job is finished.
    Default action: Does nothing.
    When to override: Almost always.

  int mod_event(module_t mod, int what, void *arg);
    Purpose: Handle the events of loading and unloading the node type. Note that both
    events are handled through this one method, distinguished by what being
    either MOD_LOAD or MOD_UNLOAD. The arg parameter is
    a pointer to the struct ng_type defining the node type. This method will
    never be called for MOD_UNLOAD when there are any nodes of this type
    currently in existence. Currently, netgraph will only ever try to MOD_UNLOAD
    a node type when kldunload(2) is explicitly called. However, in the future more
    proactive unloading of node types may be implemented as a ``garbage collection'' measure.
    Default action: Does nothing. If not overridden, MOD_LOAD and MOD_UNLOAD
    will succeed normally.
    When to override: If your type needs to do any type-specific initialization or
    resource allocation upon loading, or undo any of that upon unloading. Also, if your type
    does not support unloading (perhaps because of unbreakable associations with other parts
    of the kernel) then returning an error in the MOD_UNLOAD case will prevent
    the type from being unloaded.

Netgraph header files

There are two header files all node types include. The netgraph.h header file defines
the basic netgraph structures (good object-oriented design would
dictate that the definitions of struct ng_node and struct ng_hook really don't belong
here; instead, they should be private to the base netgraph code). Node
structures are freed when the reference counter drops to zero after a call to ng_unref().
If a node has a name, that counts as a reference; to remove the name (and the reference),
call ng_unname(). Of particular interest is struct ng_type,
since every node type must supply one of these.

The ng_message.h header file defines structures
and macros relevant to handling control messages. It defines the struct ng_mesg
which every control message has as a prefix. It also serves as the ``public header file''
for all of the generic control messages, which all have typecookie NGM_GENERIC_COOKIE.
The following summarizes the generic control messages:
  
    
      | NGM_SHUTDOWN | Disconnect all target node hooks and remove the node (or just reset if persistent) |  
      | NGM_MKPEER | Create a new node and connect to it |  
      | NGM_CONNECT | Connect a target node's hook to another node |  
      | NGM_NAME | Assign the target node a name |  
      | NGM_RMHOOK | Break a connection between the target node and another node |  
      | NGM_NODEINFO | Get information about the target node |  
      | NGM_LISTHOOKS | Get a list of all connected hooks on the target node |  
      | NGM_LISTNAMES | Get a list of all named nodes * |  
      | NGM_LISTNODES | Get a list of all nodes, named and unnamed * |  
      | NGM_LISTTYPES | Get a list of all installed node types * |  
      | NGM_TEXT_STATUS | Get a human readable status report from the target node (optional) |  
      | NGM_BINARY2ASCII | Convert a control message from binary to ASCII |  
      | NGM_ASCII2BINARY | Convert a control message from ASCII to binary |  
      | * Not node specific |

For most of these commands, there are corresponding C structure(s) defined in ng_message.h.

The netgraph.h and ng_message.h header files also define several commonly used functions and macros:
  int ng_send_data(hook_p hook, struct mbuf *m, meta_p meta);
    What it does: Delivers the mbuf m and associated meta-data meta out the hook
    hook and returns the resulting error code.
    Either or both of m and meta may be NULL. In all
    cases, the responsibility for freeing m and meta is lifted when
    this function is called (even if there is an error), so these variables should be set
    to NULL after the call (this is done automatically if you use the NG_SEND_DATA()
    macro instead).

  int ng_send_dataq(hook_p hook, struct mbuf *m, meta_p meta);
    What it does: Same as ng_send_data(), except the recipient node
    receives the data via its rcvdataq() method instead of its rcvdata() method. If the
    node type does not override rcvdataq(), then calling this is
    equivalent to calling ng_send_data().

  int ng_queue_data(hook_p hook, struct mbuf *m, meta_p meta);
    What it does: Same as ng_send_data(), except this is safe to call
    from a non-splnet() context. The mbuf and meta-information will be queued and
    delivered later at splnet().

  int ng_send_msg(node_p here, struct ng_mesg *msg, const char *address, struct ng_mesg **resp);
    What it does: Sends the netgraph control message pointed to by msg from the local
    node here to the node found at address, which may
    be an absolute or relative address. If resp is non-NULL, and the
    recipient node wishes to return a synchronous reply, it will set *resp to
    point at it. In this case, it is the calling node's responsibility to process and
    free *resp.

  int ng_queue_msg(node_p here, struct ng_mesg *msg, const char *address);
    What it does: Same as ng_send_msg(), except this is safe to call
    from a non-splnet() context. The message will be queued and delivered later
    at splnet(). No synchronous reply is possible.

  NG_SEND_DATA(error, hook, m, meta)
    What it does: Slightly safer version of ng_send_data(). This simply
    calls ng_send_data(), stores the result in error, and then sets m and meta to NULL.
    Either or both of m and meta may be NULL, though
    they must be actual variables (they can't be the constant NULL due to the way
    the macro works).

  NG_SEND_DATAQ(error, hook, m, meta)
    What it does: Slightly safer version of ng_send_dataq(). This simply
    calls ng_send_dataq(), stores the result in error, and then sets m and meta to NULL.
    Either or both of m and meta may be NULL, though
    they must be actual variables (they can't be the constant NULL due to the way
    the macro works).

  NG_FREE_DATA(m, meta)
    What it does: Frees m and meta and sets them to NULL.
    Either or both of m and meta may be NULL, though
    they must be actual variables (they can't be the constant NULL due to the way
    the macro works).

  NG_FREE_META(meta)
    What it does: Frees meta and sets it to NULL. meta may be NULL, though it must be
    an actual variable (it can't be the constant NULL due to the way the macro works).

  NG_MKMESSAGE(msg, cookie, cmdid, len, how)
    What it does: Allocates and initializes a new netgraph control message with len
    bytes of argument space (len should be zero if there are no arguments).
    msg should be of type struct ng_mesg *. The cookie and cmdid are the message
    typecookie and command ID. how is one of M_WAIT or M_NOWAIT (it's safer to use
    M_NOWAIT). Sets msg to NULL if memory allocation fails. Initializes the message
    token to zero.

  NG_MKRESPONSE(rsp, msg, len, how)
    What it does: Allocates and initializes a new netgraph control message that is
    intended to be a response to msg. The response will have len bytes of argument
    space (len should be zero if there are no arguments). msg should be a pointer to an
    existing struct ng_mesg while rsp should be of type struct ng_mesg *. how is one of
    M_WAIT or M_NOWAIT (it's safer to use M_NOWAIT). Sets rsp to NULL if memory
    allocation fails.

  int ng_name_node(node_p node, const char *name);
    What it does: Assigns the global name name to node node.
    The name must be unique. This is often called from within node constructors for nodes that
    are associated with some other named kernel entity, e.g., a device or interface. Assigning
    a name to a node increments the node's reference count.

  void ng_cutlinks(node_p node);
    What it does: Breaks all hook connections for node. Typically this
    is called during node shutdown.

  void ng_unref(node_p node);
    What it does: Decrements a node's reference count, and frees the node if that
    count goes to zero. Typically this is called in the shutdown() method to
    release the reference created by ng_make_node_common().

  void ng_unname(node_p node);
    What it does: Removes the global name assigned to the node and decrements the
    reference count. If the node does not have a name, this function has no effect. This
    should be called in the shutdown() method before freeing the node (via ng_unref()).

A real life example

Enough theory, let's see an example. Here is the implementation of the tee node
type. As is the custom, the implementation consists of a public header file, a C file, and
a man page. The header file is ng_tee.h and the C file
is ng_tee.c.

Here are some things to notice about the header file:

Here are some things to notice about the C file:
 
  Nodes typically store information private to the node or to each hook. For the
    ng_tee(8) node type, this information is stored in a struct privdata for each node, and
    a struct hookdata for each hook.
  The ng_tee_cmds array defines how to convert the type specific control
    messages from binary to ASCII and back. See below.
  The ng_tee_typestruct at the beginning actually defines the node type for
    tee nodes. This structure contains the netgraph system version (to avoid
    incompatibilities), the unique type name (NG_TEE_NODE_TYPE), pointers to the
    node type methods, and a pointer to the ng_tee_cmds array. Some methods don't
    need to be overridden because the default behavior is sufficient.
  The NETGRAPH_INIT() macro is required to link in the type. This macro works
    whether the node type is compiled as a KLD or directly into the kernel (in this case,
    using options NETGRAPH_TEE).
  Netgraph node structures (type struct ng_node) contain reference counts to
    ensure they get freed at the right time. A hidden side effect of calling
    ng_make_node_common() in the node constructor is that one reference is created. This
    reference is released by the ng_unref() call in the shutdown method ngt_rmnode().
  Also in ngt_rmnode() is a call to ng_bypass(). This is a bit
    of a kludge that joins two edges by disconnecting the node in between them (in this case,
    the tee node).
  Note that in the function ngt_disconnect() the node destroys itself when
    the last hook is disconnected. This keeps nodes from lingering around after they have
    nothing left to do.
  No spl synchronization calls are necessary; the entire thing runs at splnet().

Netgraph provides an easy way to convert control messages (indeed, any C structure)
between binary and ASCII formats. A detailed explanation is beyond the scope of this
article, but here we'll give an overview.

Recall that control messages have a fixed header (struct ng_mesg) followed
by a variable length payload having arbitrary structure and contents. In addition, the
control message header contains a flag bit indicating whether the message is a command or
a reply. Usually the payload will be structured differently in the command and the
response. For example, the ``tee'' node has an NGM_TEE_GET_STATS control
message. When sent as a command ((msg->header.flags & NGF_RESP) == 0),
the payload is empty. When sent as a response to a command ((msg->header.flags
& NGF_RESP) != 0), the payload contains a struct ng_tee_stats that
contains the node statistics.

So for each control message that a node type understands, the node type defines how to
convert the payload area of that control message (in both cases, command and response)
between its native binary representation and a human-readable ASCII version. These
definitions are called netgraph parse types.

The cmdlist field in the struct ng_type that defines a node
type is a pointer to an array of struct ng_cmdlist entries. Each element in this
array corresponds to a type-specific control message understood by this node. Along with
the typecookie and command ID (which uniquely identify the control message) are an ASCII
name and two netgraph parse types that define how the payload area data is structured,
i.e., one for each direction (command and response).

Parse types are built up from the predefined parse types defined in ng_parse.h. Using
these parse types, you can describe
any arbitrarily complicated C structure, even one containing variable length arrays and
strings. The ``tee'' node type has an example of doing this for the struct
ng_tee_stats returned by the NGM_TEE_GET_STATS control message (see ng_tee.h and ng_tee.c).

You can also define your own parse types from scratch if necessary. For example, the
``ksocket'' node type contains special code for converting a struct sockaddr in the
address families AF_INET and AF_LOCAL, to make them more
human friendly. The relevant code can be found in ng_ksocket.h and ng_ksocket.c,
specifically the section labeled ``STRUCT SOCKADDR PARSE TYPE''.

Parse types are a convenient and efficient way to effect binary/ASCII conversion in the
kernel without a lot of manual parsing code and string manipulation. When performance is a
real issue, binary control messages can always be used directly to avoid any conversion.

The gory details about parse types are available in ng_parse.h and ng_parse.c.

Programming gotchas

Some things to look out for if you plan on implementing your own netgraph node type:
Part IV: Future Directions

Netgraph is still a work in progress, and contributors are welcome!
Here are some ideas for future work.

Node types

There are many node types yet to be written:
 
  A ``slip'' node type that implements the SLIP protocol. This should be pretty easy and
    may be done soon.
  More PPP compression and encryption nodes that can connect to a ng_ppp(8) node, e.g.,
    PPP Deflate compression, PPP 3DES encryption, etc.
  An implementation of ipfw(4) as a netgraph node.
  An implementation of the Dynamic
    Packet Filter as a netgraph node. DPF is sort of a hyper-speed JIT compiling version
    of BPF.
  A generic ``mux'' node type, where each hook could be configured with a unique header to
    prepend to/strip from data packets.

FreeBSD currently has four PPP implementations: sppp(4), pppd(8), ppp(8), and the MPD
port. This is pretty silly. Using netgraph, these can all be collapsed into a single
user-land daemon that handles all the configuration and negotiation, while routing all
data strictly in the kernel via ng_ppp(8) nodes. This combines the
flexibility and configuration benefits of the user-land daemons with the speed of the
kernel implementations. Right now MPD is the only implementation that has been fully
``netgraphified'' but plans are in the works for ppp(8) as well.

Control message ASCII-fication

Not all node types that define their own control messages support
conversion between binary and ASCII.  One project is to finish this
work for those nodes that still need it.
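To give a feel for what such a conversion involves for a single payload, here is a userland sketch of a table-driven binary/ASCII converter. It does not use the ng_parse.h interface; struct sketch_stats, the field table, both functions, and the ``{ name=value ... }'' text format are assumptions made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload, loosely modeled on per-hook packet statistics. */
struct sketch_stats {
    uint32_t in_octets;
    uint32_t in_frames;
    uint32_t out_octets;
    uint32_t out_frames;
};

/* One table entry per field: its ASCII name and its offset in the struct. */
struct sketch_field {
    const char *name;
    size_t      offset;
};

static const struct sketch_field sketch_fields[] = {
    { "inOctets",  offsetof(struct sketch_stats, in_octets) },
    { "inFrames",  offsetof(struct sketch_stats, in_frames) },
    { "outOctets", offsetof(struct sketch_stats, out_octets) },
    { "outFrames", offsetof(struct sketch_stats, out_frames) },
};
#define SKETCH_NFIELDS (sizeof(sketch_fields) / sizeof(sketch_fields[0]))

/* Binary -> ASCII. Returns 0 on success, -1 if buf is too small. */
int
sketch_stats_unparse(const struct sketch_stats *st, char *buf, size_t buflen)
{
    size_t used, i;
    int n;

    assert(st != NULL && buf != NULL);
    n = snprintf(buf, buflen, "{");
    if (n < 0 || (size_t)n >= buflen)
        return (-1);
    used = (size_t)n;
    for (i = 0; i < SKETCH_NFIELDS; i++) {
        uint32_t v;

        memcpy(&v, (const char *)st + sketch_fields[i].offset, sizeof(v));
        n = snprintf(buf + used, buflen - used, " %s=%" PRIu32,
            sketch_fields[i].name, v);
        if (n < 0 || (size_t)n >= buflen - used)
            return (-1);
        used += (size_t)n;
    }
    n = snprintf(buf + used, buflen - used, " }");
    if (n < 0 || (size_t)n >= buflen - used)
        return (-1);
    return (0);
}

/* ASCII -> binary. Returns 0 on success, -1 on malformed input. */
int
sketch_stats_parse(const char *ascii, struct sketch_stats *st)
{
    const char *p = ascii;

    memset(st, 0, sizeof(*st));
    while (*p != '\0') {
        const char *eq;
        char *end;
        unsigned long long val;
        uint32_t v;
        size_t i, len;

        if (*p == ' ' || *p == '{' || *p == '}') {  /* skip separators */
            p++;
            continue;
        }
        if ((eq = strchr(p, '=')) == NULL)
            return (-1);
        len = (size_t)(eq - p);
        for (i = 0; i < SKETCH_NFIELDS; i++)
            if (strlen(sketch_fields[i].name) == len &&
                strncmp(sketch_fields[i].name, p, len) == 0)
                break;
        if (i == SKETCH_NFIELDS)
            return (-1);                /* unknown field name */
        val = strtoull(eq + 1, &end, 10);
        if (end == eq + 1 || val > UINT32_MAX)
            return (-1);                /* missing or out-of-range value */
        v = (uint32_t)val;
        memcpy((char *)st + sketch_fields[i].offset, &v, sizeof(v));
        p = end;
    }
    return (0);
}
```

The point of the table is that both directions share one description of the structure, so supporting another payload mostly means writing another table rather than another pair of hand-written converters.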
Control flow

One issue that may need addressing is control flow. Right now when you send a data
packet, if the ultimate recipient of that packet can't handle it because of a full transmit
queue or something, all it can do is drop the packet and return ENOBUFS.
Perhaps we can define a new return code ESLOWDOWN or something that means
``data packet not dropped; queue full; slow down and try again later.'' Another
possibility would be to define meta-data types for the equivalents of XOFF (stop flow) and
XON (restart flow).

Code cleanups

Netgraph is somewhat object oriented, but could benefit from a more rigorous object
oriented design without suffering too much in performance. There are still too many
visible structure fields that shouldn't be accessible, etc., as well as other
miscellaneous code cleanups.  Also, all of the node type man pages (e.g., ng_tee(8))
really belong in section 4 rather than section 8.

Electrocution

It would be nice to have a new generic control message NGM_ELECTROCUTE,
which when sent to a node would shut down that node as well as every node it was connected
to, and every node those nodes were connected to, etc. This would allow for a quick
cleanup of an arbitrarily complicated netgraph graph in a single blow. In addition, there
might be a new socket option (see setsockopt(2)) that you could set on a ng_socket(8)
socket that would cause an NGM_ELECTROCUTE to be automatically generated when
the socket was closed.

Together, these two features would lead to more reliable avoidance of netgraph ``node
leak.''

Infinite loop detection

It would be easy to include ``infinite loop detection'' in the base netgraph code. That
is, each node would have a private counter. The counter would be incremented before each
call to a node's rcvdata() method, and decremented afterwards. If the counter
reached some insanely high value, then we've detected an infinite loop (and avoided a
kernel panic).

New node types

There are lots of new and improved node types that could be created,
for example:
 
  A routing node type. Each connected hook would correspond to a route destination, i.e.,
    an address and netmask combination. The routes would be managed via control messages.
  A stateful packet filtering/firewall/address translation node type (replacement for ipfw
    and/or ipfirewall).
  A node type for bandwidth limiting and/or bandwidth accounting.
  Adding VLAN support to the existing Ethernet nodes.

If you really wanted to get crazy

In theory, the BSD networking subsystem could be entirely replaced by netgraph. Of
course, this will probably never happen, but it makes for a nice thought experiment. Each
networking device would be a persistent netgraph node (like Ethernet devices are now). On
top of each Ethernet device node would be an ``Ethertype multiplexor.'' Connected to this
would be IP, ARP, IPX, AppleTalk, etc. nodes. The IP node would be a simple ``IP protocol
multiplexor'' node on top of which would sit TCP, UDP, etc. nodes. The TCP and UDP nodes
would in turn have socket-like nodes on top of them. Etc, etc.

Other crazy ideas (disclaimer: these are crazy ideas):
 
  Make all devices appear as netgraph nodes. Convert between ioctl(2)'s
    and control messages. Talk directly to your SCSI disk with ngctl(8)! Seamless
    integration between netgraph and DEVFS.
  A netgraph node that is also a VFS layer? A filesystem view of the space of netgraph
    nodes?
  If NFS can work over UDP, it can work over netgraph. You could have NFS disks remotely
    mounted via an ATM link, or simply do NFS over raw Ethernet and cut out the UDP middleman.
  A ``programmable'' node type whose implementation would depend on its configuration
    using some kind of node pseudo-code.

Surely there are lots more crazy ideas we haven't thought of yet.  |  |