January 7, 2023

BSD TCP/IP for Kyu - Testing on the BBB

I have done all this development (over the past 3 months) on a little Nano Pi Neo board. I often call it an Orange Pi because it uses the Allwinner H3 chip, just like some of the Orange Pi boards. Now that I am feeling good about the testing and how everything is working there, I thought it would be interesting to test it on the BBB (Beaglebone Black). I haven't actually run Kyu on the BBB for about 4 years, so there is every chance that Kyu won't build the first time (and it doesn't).

Set up a BBB to run Kyu

I have a fair sized brown cardboard box labeled "BBB" that should have all the hardware and a board all set up and ready to go. Let's see. Nope, things have gotten scattered, so it will take a bit of work.

Power - we need a source of 5 volt power. I find a 5V, 2A supply with a "fat" coaxial connector that plugs into the BBB.

Serial console - we need a USB to serial dongle with 3.3 volt levels (everything I have uses 3.3 volts, so this is not much of an issue). The pin connections are unique though. We have 6 pins, but only need to connect to 3 of them. If you hold the board so you can read "J1", then pin 1 is to the left and marked with a dot. The pins (L to R) are:

Gnd - - Rx Tx -
Somewhere I have a cable all set up for this, but it may as well be on Mars since it isn't in the box marked "BBB". I have a bag with 10 or so SIL2104 USB to serial gizmos, and with short jumper wires provided to boot. Linux detects this as a CP2104 device and sets it up as ttyUSB0, so this ought to work just fine (and it does).

On the BBB I connect brown to Gnd, red to Rx, and green to Tx. At the dongle end the data lines cross over, so red goes to Tx and green goes to Rx.

I connect a network cable also, apply power, and it boots Xinu!

Fiddle with U-Boot

This is my unit "C" which is labeled (by me) as booting Xinu and "running slow". The "running slow" business has to do with U-Boot not enabling caches. Later versions of U-Boot do enable the cache and run properly. Of course Kyu ought to enable the caches rather than relying on U-Boot, but apparently, just like Xinu, it simply doesn't. But we are getting ahead of ourselves.

This version of U-Boot identifies itself as:

U-Boot 2013.04-dirty (Jul 10 2013 - 14:02:53)
I cycle power and hit a key and I am at the U-Boot prompt. There is nothing here about booting Xinu. But I remember how this was done back in the day.

This board is now booting from eMMC (though it could boot from an SD card). The way I handled this in the past was that I hacked on the file /uEnv.txt inside the filesystem. In nicer systems, I can modify the U-Boot environment variables and then use "saveenv", but on the BBB, this gives me:

Saving Environment to NAND...
Erasing Nand...
Attempt to erase non block-aligned data
Happily this has not bricked my board or even disturbed the setup to boot Xinu. The root problem is that the BBB doesn't have NAND and U-Boot is not set up to save environment settings on the eMMC.

Boot "Angstrom rescue"

I dig through my collection of various SD cards for various systems and find one labeled "BBB Angstrom rescue". I stick it in the SD slot, cycle power, and it boots! No holding down buttons or anything. I type "root" and I am in with no password. Typing "mount" shows:
/dev/mmcblk0p1 on /media/BEAGLE_BONE type vfat ...
/dev/mmcblk1p2 on /media/Angstrom type ext4 (rw ...
/dev/mmcblk1p1 on /media/BEAGLEBONE type vfat (rw,nosuid ...
I do this:
cd /media/BEAGLEBONE
cat uEnv.txt

optargs=quiet
uenvcmdx=echo Booting via tftp (Xinu); setenv saloadaddr 0x81000000; setenv ipaddr 192.168.0.54; setenv serverip 192.168.0.5; tftpboot ${saloadaddr} xinu.bin; go ${saloadaddr}
uenvcmd=run uenvcmdx
So, this is the file I would like to overhaul. One problem is that host "54" is now one of my routers here at the house, so I would like to change the IP number. The other is the name of the executable. And I just had an idea. I could have just put "kyu.bin" into the file xinu.bin and been able to boot Kyu without fussing with the board, albeit causing untold confusion in months and years to come. A better idea is to make this boot a file like "bbb-c.bin" and then make that a link to whatever I want to boot (kyu or xinu or whatever). Xinu wants to boot at 0x81000000, but Kyu wants 0x80000000, so that address also needs to be changed. The file ends up like this.
optargs=quiet
uenvcmdx=echo Booting via tftp (BBBC); setenv saloadaddr 0x80000000; setenv ipaddr 192.168.0.37; setenv serverip 192.168.0.5; tftpboot ${saloadaddr} bbb-c.bin; go ${saloadaddr}
uenvcmd=run uenvcmdx
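
Since each board needs the same file with only the image name, IP address and load address changed, the file can be generated rather than edited by hand. The following is a minimal sketch, not something from the Kyu tree: the function name render_uenv and its parameters are made up for illustration, and the defaults simply repeat the values used above.

```python
def render_uenv(tag, image, ipaddr, serverip="192.168.0.5", loadaddr=0x80000000):
    """Return uEnv.txt text that TFTP-boots `image` at `loadaddr`.

    Hypothetical helper: the names and defaults are assumptions taken
    from the values shown in this write-up.
    """
    steps = [
        f"echo Booting via tftp ({tag})",
        f"setenv saloadaddr 0x{loadaddr:08x}",
        f"setenv ipaddr {ipaddr}",
        f"setenv serverip {serverip}",
        # Doubled braces emit a literal ${saloadaddr} for U-Boot to expand.
        f"tftpboot ${{saloadaddr}} {image}",
        "go ${saloadaddr}",
    ]
    lines = [
        "optargs=quiet",
        "uenvcmdx=" + "; ".join(steps),
        "uenvcmd=run uenvcmdx",
    ]
    return "\n".join(lines) + "\n"


# Unit "C" booting Kyu (load address 0x80000000).
print(render_uenv("BBBC", "bbb-c.bin", "192.168.0.37"), end="")
```

Passing loadaddr=0x81000000 and image="xinu.bin" would give back the Xinu flavor of the file.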

Rearrange things in /var/lib/tftpboot

cd /var/lib/tftpboot
ln -s kyu.bin bbb-c.bin
This boots and runs (although I never see a console prompt). It displays this line while starting up:
Kyu version 0.8.0 for bbb, Compiled by tom: Sun Nov  4 14:02:15 MST 2018 starting
We clearly need to build a fresh new version of Kyu.

Rebuilding Kyu

I commit all my changes as per the OrangePi/H3 and then do this:
make clean
./config bbb
make clean
make

It is important to remember that the "config" command installs a Makefile specific to the bbb or orangepi. I get surprisingly few errors, only these missing items at link time:

arm-linux-gnu-ld.bfd: board.o: in function `i2c_error':
(.text+0x211c): undefined reference to `i2c_hw_error'
arm-linux-gnu-ld.bfd: main.o: in function `sys_init':
main.c:(.text+0x210): undefined reference to `console_use_ints'
arm-linux-gnu-ld.bfd: thread.o: in function `change_thread':
thread.c:(.text+0x20e4): undefined reference to `wang_hook1'
arm-linux-gnu-ld.bfd: thread.c:(.text+0x2120): undefined reference to `wang_hook2'
arm-linux-gnu-ld.bfd: net.o: in function `kyu_tcp_init':
(.text+0x45ac): undefined reference to `tcp_bsd_init'
arm-linux-gnu-ld.bfd: tcp_xinu.o: in function `tcp_xinu_init':
(.text+0x5c00): undefined reference to `net_timer_hookup'
The "wang_hook" things are cruft in thread.c from some of the TCP debugging.
I need to copy some things from the orangepi Makefile (to include tcp_bsd).
After this, I just need to supply: console_use_ints(). On the Orange Pi this is in serial.c. I provide a stub and see how far I get booting this up. This really is slow!! Maybe I do need to enable the caches myself (or just go to a different board for now).

PRI_SHELL is defined in user.c. I have it set to 11 (since the orange pi was using interrupts to receive characters). I think I will set it to 60 and reboot. I also disable "WANT_NET" in kyu.h and rebuild. It tells me "CPU clock 1000 Mhz" (this is from board.c board_init()). I get this disturbing message:

32K timer running at ~ 2420665644 Hz
Enabling interrupts
Then there is quite a long delay and finally I get the prompt.
Kyu, ready> l
  Thread:       name (  &tp   )    state     sem       pc       sp     pri
  Thread:      shell (8008e1b0)   READY C          8000b59c 80550000   60
  Thread:       idle (8008e0bc)   READY C          800114d4 80552000 1234

Switch to unit "E"

I pull another unit out of my drawer. Many times I thank myself for buying several boards like this. This one indicates that I installed a Debian console image onto it. That is what I ought to do to unit C: try reflashing it and getting rid of the ancient U-Boot.

Since it boots Debian from eMMC, all I need to do is to create the file /uEnv.txt (not /boot/uEnv.txt as I explain elsewhere).
I make it look like this:

optargs=quiet
uenvcmdx=echo Booting via tftp (bbb-e.bin); setenv saloadaddr 0x80000000; setenv ipaddr 192.168.0.38; setenv serverip 192.168.0.5; tftpboot ${saloadaddr} bbb-e.bin; go ${saloadaddr}
uenvcmd=run uenvcmdx
We do this in /var/lib/tftpboot:
ln -s kyu.bin bbb-e.bin
The reboot is now quite snappy, and it seems like I am running Kyu in under a second. So that board C really does need some work.

I reenable WANT_NET in kyu.h, rebuild, reboot and in a flash I am up and looking at:

RAM 505M+ available starting at 806a0000
Kyu version 0.8.0 for bbb, Compiled by tom: Sat Jan  7 09:35:51 PM MST 2023 running
Kyu, ready> l
  Thread:       name (  &tp   )    state     sem       pc       sp     pri
  Thread:     net-in (8008e610)     SEM J  net-inq 80011824 80558000   20
  Thread:    net-out (8008e51c)     SEM I net-outq 80011cc4 8055a000   21
  Thread:   net-slow (8008e428)  REPEAT C          80017104 8055c000   22
  Thread:  tcp-input (8008e7f8)     SEM J  tcp-inq 80011824 80554000   24
* Thread:  tcp-timer (8008e704)  REPEAT C          800256a8 80556000   25
  Thread:      shell (8008e9e0)   READY I          80006fec 80550000   60
  Thread:       idle (8008e8ec)   READY C          80011d18 80552000 1234
Kyu, ready>
Along the way I noticed:
Host info obtained via DHCP
My IP = 192.168.0.127, netmask = ffffff00
And indeed this is the address I can ping it on. This is in the range of pool DHCP addresses on my network, and entirely OK. The mystery is why the OrangePi build did not do DHCP like this.
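
Kyu prints the netmask in hex, which is easy to misread. As a quick sanity check, here is a small sketch using Python's standard ipaddress module to decode it and confirm that the TFTP server at 192.168.0.5 sits on the same subnet as the DHCP address. The helper name describe is made up for illustration.

```python
import ipaddress


def describe(ip, netmask_hex):
    """Decode a hex netmask (as Kyu prints it) and return (mask, network)."""
    mask = ipaddress.IPv4Address(int(netmask_hex, 16))
    iface = ipaddress.ip_interface(f"{ip}/{mask}")
    return mask, iface.network


mask, net = describe("192.168.0.127", "ffffff00")
print(mask, net)

# Is the TFTP server on the same subnet as the board?
print(ipaddress.ip_address("192.168.0.5") in net)
```

The mask decodes to 255.255.255.0, so the board is on 192.168.0.0/24 along with the server and the static addresses (.37 and .38) used in the uEnv.txt files.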

Run some TCP tests on the BBB

Typing "t 7" yields a time from the linux timeserver. This is a nice start. I edit my /etc/hosts file and give it the name "tequila".

I use "t 5" to start up the wangdoodle server on port 114 then I do:

[tom@trona tcp]$ echo 'big' | ncat -v tequila 114 >try.out
Ncat: Version 7.93 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.0.127:114.
Ncat: 4 bytes sent, 200000 bytes received in 0.07 seconds.
The shocker is that the transfer takes 0.07 seconds on the BBB. On the OrangePi it took 1.14 seconds. That is 16 times faster! What is/was wrong with the Orange Pi? My first guess is simply that the OrangePi is running the network at 10 Mbit rather than 100. Either that or there is some drastic problem with the Orange Pi ethernet driver.

Some simple math. We are moving (ignoring all network overhead) 200,000 bytes or 1,600,000 bits in 0.07 seconds. That is just short of 23 megabits per second. The Orange Pi was moving the same information at 1.4 megabits per second.
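
The same arithmetic as a few lines of Python, for anyone who wants to plug in other numbers. This is only a sketch: it uses the byte count and elapsed times that ncat reported and, like the figures above, ignores all protocol overhead.

```python
def mbit_per_s(nbytes, seconds):
    """Payload throughput in megabits per second (10**6 bits)."""
    return nbytes * 8 / seconds / 1e6


bbb = mbit_per_s(200_000, 0.07)   # BBB: 200,000 bytes in 0.07 s
opi = mbit_per_s(200_000, 1.14)   # Orange Pi: same payload in 1.14 s

print(f"BBB {bbb:.1f} Mbit/s, Orange Pi {opi:.1f} Mbit/s, ratio {bbb / opi:.1f}x")
```

The BBB figure is above what a 10 Mbit link could carry, so that board is clearly running at 100. The Orange Pi figure is well below even 10 Mbit, so link speed by itself may not be the whole story.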

So I will run an overnight test. Just learning about the 100 megabit issue has made this quite worthwhile. I bumped up the transfer size to a nice round 1M. It is now Sunday afternoon and I am looking at it after church.
Each transfer now takes 0.16 seconds, and it has done 300,000 of them, so I will bring it to a halt.
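
To put the overnight run in perspective, here is a rough tally. It is a sketch under one stated assumption: that "1M" means 1,000,000 bytes. If it means 2^20 bytes, the numbers come out about 5 percent higher.

```python
TRANSFER_BYTES = 1_000_000   # assumption: "1M" means 10**6 bytes
SECONDS_EACH = 0.16          # per-transfer time reported by ncat
TRANSFERS = 300_000          # transfers completed when the test was stopped

rate_mbit = TRANSFER_BYTES * 8 / SECONDS_EACH / 1e6
total_gb = TRANSFER_BYTES * TRANSFERS / 1e9
busy_hours = SECONDS_EACH * TRANSFERS / 3600

print(f"{rate_mbit:.0f} Mbit/s per transfer")
print(f"{total_gb:.0f} GB moved in total")
print(f"{busy_hours:.1f} hours spent transferring")
```

That works out to roughly 50 Mbit/s, 300 GB in all, and a bit over 13 hours of pure transfer time, not counting whatever gap there was between transfers.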

Typing "t 1" gets a data abort now (during and/or after the test is stopped).

Socket: 80151298 -inactive- pcb = 00000000, state = data abort in thread n_wrapper

CPU: 0
pc : [<800129ec>]	   lr : [<800129e8>]
sp : 80565e80  ip : 03264f7a	 fp : 80565e94
r10: 9ffa56c4  r9 : 9ef40ed8	 r8 : 9ef4f528
r7 : 9ff63c99  r6 : 00000002	 r5 : 80000000  r4 : 9ef4f52c
r3 : 00000000  r2 : 00000064	 r1 : 0000026a  r0 : 60000113
cpsr: 60000013  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32

80565e80  80565e9c 80003fb0 00000003 00000000
80565e90  80565e9c 8000a954 80565eb4 80003fb0
80565ea0  00000000 00000044 80565f9c 48200000
80565eb0  80565ec4 80001128 ff10210f 00000044
80565ec0  80565edc 80000434 80038c85 00000030
80565ed0  80565f84 44e09000 80565eec 800070e8
80565ee0  9ef4f52c 80565f1e 80565efc 8000f0a0
80565ef0  9ef4f528 80565f04 80565f8c 80013408
80565f00  80565f94 62637020 30203d20 30303030
80565f10  2c303030 61747320 3d206574 30303020
80565f20  80560030 2078616d 3320203d 202c000a
80565f30  636f6c00 69203a6b 0a656c64 000a6500
80565f40  ffffffff 80092de7 00000000 00000000
80565f50  9ef4f52c 80000000 00000002 9ff63c99
80565f60  9ef4f528 9ef40ed8 80565f88 80021748
80565f70  80565f94 80092de6 80565f90 00000001
80565f80  80565f94 0000001d 80565fac 800261f0
80565f90  80038c70 00000000 00000000 00000000
80565fa0  00000000 80151298 80565fbc 8002634c
80565fb0  80038d00 00000014 80565fcc 80026440
80565fc0  0000000d 00000004 80565fd4 80025234
80565fd0  80565fe4 800336d8 4a100a00 00000001
80565fe0  80565ffc 8000b378 8008e704 8008c264
80565ff0  80565ffc 8008c264 00000000 80010b34
80566000  ae000000 324d5750 5f344a5f 00444542
80566010  03000000 04000000 b8000000 00000000
80566020  03000000 04000000 c0000000 00000000
80566030  03000000 05000000 56000000 79616b6f
80566040  00000000 02000000 02000000 02000000
80566050  01000000 67617266 746e656d 00313240
80566060  03000000 04000000 2d000000 efbeadde
80566070  01000000 766f5f5f 616c7265 005f5f79
80566080  01000000 6d6e6970 625f7875 706f6265
80566090  74735f72 65707065 69705f72 0000736e
805660a0  03000000 05000000 56000000 79616b6f
805660b0  00000000 03000000 60000000 34000000
805660c0  b0000000 05000000 ac000000 05000000
805660d0  a8000000 05000000 bc000000 05000000
805660e0  b8000000 05000000 b4000000 05000000
805660f0  e8000000 05000000 e4000000 05000000
80566100  e0000000 05000000 84000000 05000000
80566110  80000000 05000000 ec000000 05000000
80566120  03000000 04000000 48000000 0a000000
80566130  03000000 04000000 4e000000 0a000000
80566140  02000000 02000000 02000000 01000000
80566150  67617266 746e656d 00323240 03000000
80566160  04000000 2d000000 efbeadde 01000000
80566170  766f5f5f 616c7265 005f5f79 01000000
80566180  6f626562 735f7270 70706574 00737265
80566190  03000000 0f000000 00000000 6f697067
805661a0  2d666f2d 706c6568 00007265 03000000
805661b0  05000000 56000000 79616b6f 00000000
805661c0  03000000 08000000 5d000000 61666564
805661d0  00746c75 03000000 04000000 6b000000
805661e0  0a000000 01000000 74735f78 00000070
805661f0  03000000 0d000000 75000000 6f626562
80566200  783a7270 7074735f 00000000 03000000
80566210  0c000000 7f000000 efbeadde 08000000
80566220  00000000 03000000 00000000 84000000
80566230  03000000 00000000 c5000000 02000000
80566240  01000000 69645f78 00000072 03000000
80566250  0d000000 75000000 6f626562 783a7270
80566260  7269645f 00000000 03000000 0c000000
80566270  7f000000 efbeadde 09000000 00000000
PC = 800129ec ( thread_tick+1a4 )
Called from timer_tick+c8 -- 8000a954
Called from dmtimer_int+54 -- 80003fb0
Called from do_irq+b8 -- 80001128
Called from irq+34 -- 80000434
Called from serial_puts+2c -- 800070e8
Called from console_puts+18 -- 8000f0a0
Called from printf+3c -- 80013408
Called from socket_show_one+70 -- 800261f0
Called from socket_show_all+60 -- 8002634c
Called from tcp_statistics+cc -- 80026440
Called from bsd_debug_info+14 -- 80025234
Called from bsd_test_show+14 -- 800336d8
Called from n_wrapper+30 -- 8000b378
Called from thr_exit -- 80010b34
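
Some of the words in the stack dump are plainly text. A small sketch for turning dump words back into characters follows. It assumes the words are printed as 32-bit little-endian values, which is the usual arrangement on this ARM, and the helper name words_to_text is made up for illustration.

```python
import struct


def words_to_text(words):
    """Unpack hex words as little-endian bytes, showing non-printables as '.'."""
    raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Words from 80565f04 through 80565f1c in the dump above.
row = "62637020 30203d20 30303030 2c303030 61747320 3d206574 30303020".split()
print(repr(words_to_text(row)))
```

These words decode to " pcb = 00000000, state = 000", which is the tail of the very line that printf was in the middle of printing when the abort hit.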
Everything seems to be working (other than typing "t 1"). It is interesting that a timer interrupt apparently took place in the midst of serial_puts(). I can try this over and over and always get the same result. It is worth doing a clean reboot to see if the bug is still there. It is not, but other bad things happen. I use "t 5" to start the server on port 114, and after one transfer it is locked up and will no longer accept new connections.
Things look like this:
 t 1
INPCB: 80150348    SYN received -- local, foreign: 192.168.0.127 114 .. 192.168.0.5 40106
INPCB: 801500a8          Listen -- local, foreign: 0.0.0.0 114 .. 0.0.0.0 0
locker count: 0
Input thread: idle
Timer thread: idle
User lock: wait (idle)
  mbuf: alloc =  1, free =   2, max =  3
mbufcl: alloc =  0, free = 256, max =  0
  sock: alloc =  2, free =   0, max =  2
 inpcb: alloc =  2, free =   0, max =  2
Socket: 80150018   ACTIVE   pcb = 801500a8, state = 4000, rcv =     0, snd =     0
Socket: 801502b8   ACTIVE   pcb = 80150348, state = 4001, rcv =     0, snd =     0 NOFDREF
Kyu input queue size: 0
TCP input queue size: 0
Kyu output queue size: 0
Netbuf head: 8069d320
511 netbuf available
511 netbuf on free list
512 netbuf configured
Clock: 190
Kyu, ready> l
  Thread:       name (  &tp   )    state     sem       pc       sp     pri
  Thread:     net-in (8008e610)     SEM J  net-inq 80011824 80558000   20
  Thread:    net-out (8008e51c)     SEM J net-outq 80011824 8055a000   21
  Thread:   net-slow (8008e428)  REPEAT C          80017104 8055c000   22
  Thread:  tcp-input (8008e7f8)     SEM J  tcp-inq 80011824 80554000   24
* Thread:  tcp-timer (8008e704)  REPEAT C          800256a8 80556000   25
  Thread: wangdoodle (8008e240)     SEM J   socket 80011824 80560000   31
  Thread:      shell (8008e9e0)   READY I          80006fec 80550000   60
  Thread:       idle (8008e8ec)   READY C          80011d18 80552000 1234
So, we are not out of the woods yet. I can make a client connection via "t 7". Using "t 2" to start a server on port 111 also works, but I am unable to connect to it. Now things look like the following. Note that both server threads are blocked on "socket" -- this may well be the same semaphore that should be letting us know that a connection has completed.
l
  Thread:       name (  &tp   )    state     sem       pc       sp     pri
  Thread:     net-in (8008e610)     SEM J  net-inq 80011824 80558000   20
  Thread:    net-out (8008e51c)     SEM J net-outq 80011824 8055a000   21
  Thread:   net-slow (8008e428)  REPEAT C          80017104 8055c000   22
  Thread:  tcp-input (8008e7f8)     SEM J  tcp-inq 80011824 80554000   24
* Thread:  tcp-timer (8008e704)  REPEAT C          800256a8 80556000   25
  Thread:    tcp-111 (8008e334)     SEM J   socket 80011824 8055e000   30
  Thread: wangdoodle (8008e240)     SEM J   socket 80011824 80560000   31
  Thread:      shell (8008e9e0)   READY I          80006fec 80550000   60
  Thread:       idle (8008e8ec)   READY C          80011d18 80552000 1234

t 1
INPCB: 80150560          Listen -- local, foreign: 0.0.0.0 111 .. 0.0.0.0 0
INPCB: 80150348    SYN received -- local, foreign: 192.168.0.127 114 .. 192.168.0.5 40106
INPCB: 801500a8          Listen -- local, foreign: 0.0.0.0 114 .. 0.0.0.0 0
It looks like the server on port 114 has launched a connection, which thinks it is waiting to complete. That being hung prevents the port 111 server from even starting a new connection. Wireshark might add interesting details.
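
When this output gets long, it helps to pick out the stuck connections mechanically. The sketch below parses the INPCB lines from a captured "t 1" listing and reports any that are sitting in "SYN received". The regular expression is an assumption based only on the three lines shown above, not on the Kyu source, so it may need adjusting for other states or layouts.

```python
import re

# Pattern inferred from the sample lines only (an assumption).
INPCB_RE = re.compile(
    r"INPCB:\s+(?P<pcb>[0-9a-f]+)\s+(?P<state>.+?)\s+--\s+local, foreign:\s+"
    r"(?P<lip>\S+)\s+(?P<lport>\d+)\s+\.\.\s+(?P<fip>\S+)\s+(?P<fport>\d+)"
)


def parse_inpcb(text):
    """Return one dict per INPCB line found in `text`."""
    return [m.groupdict() for m in INPCB_RE.finditer(text)]


sample = """\
INPCB: 80150560          Listen -- local, foreign: 0.0.0.0 111 .. 0.0.0.0 0
INPCB: 80150348    SYN received -- local, foreign: 192.168.0.127 114 .. 192.168.0.5 40106
INPCB: 801500a8          Listen -- local, foreign: 0.0.0.0 114 .. 0.0.0.0 0
"""

half_open = [p for p in parse_inpcb(sample) if p["state"] == "SYN received"]
for p in half_open:
    print(f'{p["pcb"]}: {p["state"]} on port {p["lport"]} from {p["fip"]}:{p["fport"]}')
```

A connection in "SYN received" has seen the client's SYN and is waiting for the handshake to finish, so comparing the foreign port (40106 here) against a Wireshark capture should show which packet went missing.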

