November 3, 2022

BSD TCP/IP for Kyu - Bind - accepting a connection

I am beginning to play with the TCP code. An easy experiment is to use telnet on my Linux machine to try to make a connection to my Kyu machine, with something like "telnet 192.168.0.138 13".

No doubt it will just reject the attempted connection unless I have done something to set up a server on the Kyu side, so I am interested in how "bind" works. Ultimately I will be looking at how BSD handled socket, bind, accept, and listen, but we will start with bind.

The man page for "bind" says that it binds a name to a socket. The call looks like: "bind ( socket, addr, addrlen )". bind() can be used for other protocol families besides AF_INET. Here is a quick skeleton of what an INET server would look like:

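    int sockfd, newsock;
    struct sockaddr_in serv_addr, cli_addr;
    socklen_t cli_len;

    sockfd = socket ( AF_INET, SOCK_STREAM, 0 );
    bzero ( &serv_addr, sizeof(serv_addr) );
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl ( INADDR_ANY );   /* any local interface */
    serv_addr.sin_port = htons ( MY_TCP_PORT );         /* network byte order */
    bind ( sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr) );
    listen ( sockfd, 10 );                  /* limit on pending connections */
    for ( ;; ) {
        cli_len = sizeof(cli_addr);         /* in/out: buffer size, then actual size */
        newsock = accept ( sockfd, (struct sockaddr *) &cli_addr, &cli_len );
        /* ... talk to the client on newsock ... */
        close ( newsock );
    }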
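
To see what the htons() and htonl() calls in the skeleton actually do to the address, here is a small sketch that builds the same kind of sockaddr_in and exposes the bytes of sin_port. The names make_server_addr() and port_bytes() are made up for illustration and are not part of any API.

```c
#include <arpa/inet.h>    /* htons, htonl, ntohs */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET */

/* Hypothetical helper: fill in an AF_INET server address the way the
 * skeleton does (wildcard local address, port in network byte order). */
struct sockaddr_in make_server_addr(uint16_t port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));             /* also clears sin_zero */
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);
    return sa;
}

/* Hypothetical helper: copy out the two bytes of sin_port in memory order. */
void port_bytes(const struct sockaddr_in *sa, unsigned char out[2])
{
    memcpy(out, &sa->sin_port, 2);
}
```

With port 13 (the daytime port from the telnet example), port_bytes() gives 0x00 then 0x0d on any host. Without htons(), a little-endian x86 machine would store 0x0d 0x00, and the socket would really be bound to port 3328 (0x0d00).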

The system call is handled by sys_bind() in kern/uipc_syscalls.c. This calls sobind() in kern/uipc_socket.c, which makes the following call:

    error = (*so->so_proto->pr_usrreq)(so, PRU_BIND, (struct mbuf *)0, nam, (struct mbuf *)0, p);

Before chasing down that call, let's investigate "nam", which is an mbuf with interesting stuff in it. It is generated back in sys_bind() by the following call:

    sockargs(&nam, SCARG(uap, name), SCARG(uap, namelen), MT_SONAME);

sockargs() is also in uipc_syscalls.c, with an interesting comment about passing socket control arguments around in mbufs. It is simple: it just copies the user argument into an mbuf.
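
The copy that sockargs() performs is easy to model. Below is a toy sketch, not the BSD code: struct toy_mbuf, TOY_MLEN and toy_sockargs() are hypothetical names, memcpy() stands in for copyin(), and the size limit is an assumed value. It does mimic one detail the 4.4BSD-derived routine has for MT_SONAME arguments: it stores the length into the sa_len byte at the front of the address.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MLEN 108   /* hypothetical capacity of one small mbuf */

/* Toy stand-in for an mbuf: a length plus a data area. */
struct toy_mbuf {
    int           m_len;
    unsigned char m_dat[TOY_MLEN];
};

/* Copy a caller-supplied socket address into a toy mbuf.
 * memcpy() stands in for copyin() from user space. */
int toy_sockargs(struct toy_mbuf *m, const void *buf, size_t buflen)
{
    if (buflen > TOY_MLEN)
        return EINVAL;                        /* will not fit in one mbuf */
    memcpy(m->m_dat, buf, buflen);
    m->m_len = (int)buflen;
    if (buflen > 0)
        m->m_dat[0] = (unsigned char)buflen;  /* sa_len: first byte of a BSD sockaddr */
    return 0;
}
```

Passing a 16-byte sockaddr_in yields a 16-byte "mbuf" whose first byte is 16, so whatever the user put in the length byte is overwritten with the length actually copied.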

The call through the pr_usrreq pointer takes us to tcp_usrreq() in tcp_usrreq.c. This is a long function that is essentially a big switch on its second argument ("req", which here is PRU_BIND). For PRU_BIND it calls in_pcbbind ( inp, nam, p ). Here "inp" is a pointer to a struct inpcb, which comes from:

    inp = sotoinpcb(so);

The function in_pcbbind() is a good-sized routine in in_pcb.c.

It ends up setting inp_lport in the struct inpcb passed as its first argument (the "inp" obtained from sotoinpcb() above). Let's look at struct inpcb (see netinet/in_pcb.h). Chapter 22 in the book is devoted to this.

    struct inpcb {
            LIST_ENTRY(inpcb) inp_hash;
            CIRCLEQ_ENTRY(inpcb) inp_queue;
            struct    inpcbtable *inp_table;
            int       inp_state;            /* bind/connect state */
            u_int16_t inp_fport;            /* foreign port */
            u_int16_t inp_lport;            /* local port */
            struct    socket *inp_socket;   /* back pointer to socket */
            caddr_t   inp_ppcb;             /* pointer to per-protocol pcb */
            struct    route inp_route;      /* placeholder for routing entry */
            int       inp_flags;            /* generic IP/datagram flags */
            struct    ip inp_ip;            /* header prototype; should have more */
            struct    mbuf *inp_options;    /* IP options */
            struct    ip_moptions *inp_moptions; /* IP multicast options */
            int       inp_errormtu;         /* MTU of last xmit status = EMSGSIZE */
    };
    #define inp_faddr       inp_ip.ip_dst
    #define inp_laddr       inp_ip.ip_src

The inpcb is set up by the socket (or accept) call. It holds everything needed for UDP, but a TCP (SOCK_STREAM) socket also requires a TCP control block (struct tcpcb), which is also set up by the socket call.
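
The pointer relationships are easier to see stripped down. This sketch keeps only the linking fields: the socket's so_pcb points at the inpcb, the inpcb's inp_ppcb points at the tcpcb, and each has a back pointer. The mini_* struct names, the mini_attach() helper and the macro names are hypothetical; the macros have the same shape as BSD's sotoinpcb() and intotcpcb().

```c
struct mini_socket;
struct mini_inpcb;

struct mini_tcpcb {
    struct mini_inpcb *t_inpcb;     /* back pointer to the inpcb */
    int                t_state;     /* TCP state would live here */
};

struct mini_inpcb {
    struct mini_socket *inp_socket; /* back pointer to the socket */
    void               *inp_ppcb;   /* per-protocol pcb: a tcpcb for TCP */
    unsigned short      inp_lport;
};

struct mini_socket {
    void *so_pcb;                   /* the protocol's pcb: an inpcb here */
};

/* Same shape as BSD's sotoinpcb() and intotcpcb(); toy struct names. */
#define mini_sotoinpcb(so)  ((struct mini_inpcb *)(so)->so_pcb)
#define mini_intotcpcb(ip)  ((struct mini_tcpcb *)(ip)->inp_ppcb)

/* Hypothetical helper: link the three structures as a TCP attach would. */
void mini_attach(struct mini_socket *so, struct mini_inpcb *inp,
                 struct mini_tcpcb *tp)
{
    so->so_pcb = inp;
    inp->inp_socket = so;
    inp->inp_ppcb = tp;
    inp->inp_lport = 0;
    tp->t_inpcb = inp;
    tp->t_state = 0;                /* 0 = CLOSED in BSD's numbering */
}
```

Going from a socket to its TCP state is then two casts, mini_intotcpcb(mini_sotoinpcb(so)), which is exactly the sotoinpcb()/intotcpcb() pair that tcp_usrreq() and tcp_input() use.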

The function in_pcbbind() attempts to look up an existing inpcb structure using the in_pcblookup_port() function; if it finds one already set up with the requested port, it returns an error. It also checks requests for reserved ports, those below IPPORT_RESERVED (set to 1024 in netinet/in.h), which only the superuser may bind.
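
Putting the bind path together, here is a toy model of sobind() dispatching through the pr_usrreq pointer into a tcp_usrreq()-style switch, which then runs an in_pcbbind()-style check. Everything named toy_* is hypothetical. It deliberately ignores local addresses, SO_REUSEADDR and ephemeral port selection, all of which the real in_pcbbind() handles, and the errno values (EACCES, EADDRINUSE, EINVAL) are my assumption of what the BSD code returns.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_IPPORT_RESERVED 1024   /* same value as IPPORT_RESERVED */
#define TOY_MAXPCB          16     /* hypothetical fixed table size */

enum toy_req { TOY_PRU_ATTACH, TOY_PRU_BIND };   /* hypothetical subset of PRU_* */

struct toy_inpcb {
    uint16_t inp_lport;            /* local port; 0 means not bound yet */
};

struct toy_socket;

/* One entry point per protocol, like pr_usrreq in struct protosw. */
struct toy_protosw {
    int (*pr_usrreq)(struct toy_socket *so, enum toy_req req,
                     uint16_t port, int superuser);
};

struct toy_socket {
    const struct toy_protosw *so_proto;
    struct toy_inpcb         *so_pcb;
};

/* All attached pcbs; a fixed array instead of BSD's linked list. */
static struct toy_inpcb *toy_tcb[TOY_MAXPCB];
static int toy_npcb;

/* Simplified in_pcbbind(): ports only, no addresses or SO_REUSEADDR. */
static int toy_in_pcbbind(struct toy_inpcb *inp, uint16_t lport, int superuser)
{
    if (inp->inp_lport != 0)
        return EINVAL;                      /* already bound */
    if (lport == 0)
        return EINVAL;                      /* toy: no ephemeral port choice */
    if (lport < TOY_IPPORT_RESERVED && !superuser)
        return EACCES;                      /* reserved port needs root */
    for (int i = 0; i < toy_npcb; i++)
        if (toy_tcb[i] != inp && toy_tcb[i]->inp_lport == lport)
            return EADDRINUSE;              /* another pcb already has it */
    inp->inp_lport = lport;
    return 0;
}

/* Simplified tcp_usrreq(): a switch on the request code. */
static int toy_tcp_usrreq(struct toy_socket *so, enum toy_req req,
                          uint16_t port, int superuser)
{
    struct toy_inpcb *inp = so->so_pcb;     /* role of sotoinpcb(so) */

    switch (req) {
    case TOY_PRU_ATTACH:
        if (toy_npcb == TOY_MAXPCB)
            return ENOBUFS;
        inp->inp_lport = 0;
        toy_tcb[toy_npcb++] = inp;
        return 0;
    case TOY_PRU_BIND:
        return toy_in_pcbbind(inp, port, superuser);
    }
    return EOPNOTSUPP;
}

static const struct toy_protosw toy_tcpsw = { toy_tcp_usrreq };

/* Hypothetical stand-in for socket(): attach a pcb to a socket. */
int toy_socket_attach(struct toy_socket *so, struct toy_inpcb *inp)
{
    so->so_proto = &toy_tcpsw;
    so->so_pcb = inp;
    return (*so->so_proto->pr_usrreq)(so, TOY_PRU_ATTACH, 0, 0);
}

/* Simplified sobind(): dispatch through the protocol switch pointer. */
int toy_sobind(struct toy_socket *so, uint16_t port, int superuser)
{
    return (*so->so_proto->pr_usrreq)(so, TOY_PRU_BIND, port, superuser);
}
```

In the toy, an unprivileged bind to port 13 fails with EACCES, a privileged one succeeds, and a second socket asking for port 13 then gets EADDRINUSE.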

What about sotoinpcb()? It is a macro in netinet/in_pcb.h

    #define sotoinpcb(so)           ((struct inpcb *)(so)->so_pcb)

But hold on -- let's look at a sockaddr_in structure (see netinet/in.h):

    struct sockaddr_in {
            u_int8_t  sin_len;
            u_int8_t  sin_family;
            u_int16_t sin_port;
            struct    in_addr sin_addr;
            int8_t    sin_zero[8];
    };

    struct in_addr {
            u_int32_t s_addr;
    };

So, all told, this is 1 + 1 + 2 + 4 + 8 = 16 bytes, which is worth keeping in mind. The final 8 bytes (sin_zero) are padding so that a sockaddr_in is the same size as the generic 16-byte struct sockaddr that bind() and friends are declared to take.
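
The 16-byte arithmetic can be checked by the compiler. Since Linux's struct sockaddr_in has no sin_len field, this sketch redeclares the BSD layout under made-up names (bsd_sockaddr_in, bsd_in_addr) and uses C11 _Static_assert to pin the size and field offsets; two small helpers report the host's own sizeof values for comparison.

```c
#include <netinet/in.h>   /* host struct sockaddr_in */
#include <stddef.h>       /* offsetof, size_t */
#include <stdint.h>
#include <sys/socket.h>   /* host struct sockaddr */

/* The BSD layout from netinet/in.h, under hypothetical names. */
struct bsd_in_addr {
    uint32_t s_addr;
};

struct bsd_sockaddr_in {
    uint8_t            sin_len;
    uint8_t            sin_family;
    uint16_t           sin_port;
    struct bsd_in_addr sin_addr;
    int8_t             sin_zero[8];
};

/* 1 + 1 + 2 + 4 + 8 = 16: every field already sits on a boundary
 * matching its size, so the compiler inserts no hidden padding. */
_Static_assert(sizeof(struct bsd_sockaddr_in) == 16, "16 bytes total");
_Static_assert(offsetof(struct bsd_sockaddr_in, sin_port) == 2, "port at 2");
_Static_assert(offsetof(struct bsd_sockaddr_in, sin_addr) == 4, "address at 4");
_Static_assert(offsetof(struct bsd_sockaddr_in, sin_zero) == 8, "padding at 8");

/* The host's real structures, for comparison. */
size_t host_sockaddr_in_size(void) { return sizeof(struct sockaddr_in); }
size_t host_sockaddr_size(void)    { return sizeof(struct sockaddr); }
```

On Linux, where sin_len is absent and sin_family is 16 bits wide instead, sizeof(struct sockaddr_in) still comes out to 16, matching struct sockaddr.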

So, what goes on in tcp_input()?

The question here is how this function decides whether TCP has a socket bound and listening on the destination port of an incoming segment. There is a list of inpcb structures, with "tcb" as the head of the list. The protocol maintains a one-entry "hint" called tcp_last_inpcb, and first checks whether it matches the received packet. If not, it calls in_pcblookup() to scan the entire list for a match; if it finds one, the match goes into tcp_last_inpcb for next time. If the search fails, the protocol drops the packet, but it does so via "dropwithreset", which sends a RST. That RST is what telnet reports as "Connection refused" when nothing is listening.

Note that the tcpcb structure is pointed to by the inpcb (through its inp_ppcb field). The TCP structures do not form a linked list of their own; they are effectively extensions of the inpcb structure. This makes things simpler than one might have feared. The code looks like this:

    struct  inpcb *tcp_last_inpcb = &tcb;

    findpcb:
            inp = tcp_last_inpcb;
            if (inp->inp_lport != ti->ti_dport ||
                inp->inp_fport != ti->ti_sport ||
                inp->inp_faddr.s_addr != ti->ti_src.s_addr ||
                inp->inp_laddr.s_addr != ti->ti_dst.s_addr) {
                    inp = in_pcblookup(&tcb, ti->ti_src, ti->ti_sport,
                        ti->ti_dst, ti->ti_dport, INPLOOKUP_WILDCARD);
                    if (inp)
                            tcp_last_inpcb = inp;
                    ++tcpstat.tcps_pcbcachemiss;
            }

            if (inp == 0)
                    goto dropwithreset;
            tp = intotcpcb(inp);
            if (tp == 0)
                    goto dropwithreset;
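
To see how the hint and the wildcard lookup interact, here is a toy model of the findpcb logic. toy_pcb, toy_pcblookup() and toy_findpcb() are hypothetical; the lookup is a simplification in the spirit of in_pcblookup() with INPLOOKUP_WILDCARD (it assumes the incoming segment always carries non-zero addresses), where a zero address in a pcb means "wildcard".

```c
#include <stddef.h>
#include <stdint.h>

/* Toy pcb: addresses and ports only; 0 means wildcard/unset. */
struct toy_pcb {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
    struct toy_pcb *next;
};

static struct toy_pcb *toy_last_pcb;   /* one-entry hint, like tcp_last_inpcb */
int toy_cache_misses;                  /* like tcps_pcbcachemiss */

/* Full scan: the local port must match; an exact match wins, otherwise
 * the entry with the fewest wildcarded fields. */
static struct toy_pcb *toy_pcblookup(struct toy_pcb *head,
    uint32_t faddr, uint16_t fport, uint32_t laddr, uint16_t lport)
{
    struct toy_pcb *best = NULL;
    int best_wild = 3;                 /* worse than the 2 possible wildcards */

    for (struct toy_pcb *p = head; p != NULL; p = p->next) {
        int wild = 0;

        if (p->lport != lport)
            continue;
        if (p->laddr != 0) {
            if (p->laddr != laddr)
                continue;
        } else
            wild++;
        if (p->faddr != 0) {
            if (p->faddr != faddr || p->fport != fport)
                continue;
        } else
            wild++;
        if (wild < best_wild) {
            best = p;
            best_wild = wild;
            if (wild == 0)
                break;                 /* exact match: stop looking */
        }
    }
    return best;
}

/* Front end modeled on findpcb: try the hint, then scan the list. */
struct toy_pcb *toy_findpcb(struct toy_pcb *head,
    uint32_t src, uint16_t sport, uint32_t dst, uint16_t dport)
{
    struct toy_pcb *p = toy_last_pcb;

    if (p == NULL || p->lport != dport || p->fport != sport ||
        p->faddr != src || p->laddr != dst) {
        p = toy_pcblookup(head, src, sport, dst, dport);
        if (p != NULL)
            toy_last_pcb = p;
        toy_cache_misses++;
    }
    return p;   /* NULL is where tcp_input would "goto dropwithreset" */
}
```

One thing the model makes visible, and which follows from the four-field test in the real code too: a listening pcb has a zero foreign address, so it can never satisfy the hint, and every SYN to a listening port counts as a cache miss.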


Have any comments? Questions? Drop me a line!

Kyu