DHCP, dagnabbit!

Posted by: mlord

DHCP, dagnabbit! - 17/07/2003 08:24

Using DHCP with my Empegs has never worked reliably here, so a long time ago I switched over to static IP addresses for all of them.

But today, I finally took time to delve deeper into the situation, and still no resolution.

First, I upgraded my 1997-era DHCP server to a more recent DHCP v3.01 reference implementation. Once again, all of my various 10mb and 100mb clients work perfectly (as before) with it, except for the Empegs (actually, they are all RioCar Mk2a units).

The network here is very simple: one 16-way SureCom 10/100 switch (model EP-816X), into which everything is plugged directly, including the DHCP server.

I have tried two other DHCP servers as well, both plugged into that same switch, and again, everything except the Empegs works just fine with them.

What happens is, the empeg sends the bootp request, the server replies with the requested info (IP address etc..), and the empeg then retries, as if it never received the reply.
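
For reference, a minimal sketch of the exchange described above, in Python. It builds a DHCPDISCOVER using the fixed BOOTP header layout from RFC 2131 and shows how a client could decide whether an incoming reply belongs to it. The function names, MAC address and transaction ID are made up for illustration; this is not the empeg's client code. The point is only that if the reply never reaches the client, a check like reply_matches() never gets to run, and the client simply retransmits.

```python
import struct

BOOTREQUEST, BOOTREPLY = 1, 2
MAGIC_COOKIE = bytes([99, 130, 83, 99])      # marks the start of the DHCP options
BOOTP_HEADER = "!BBBBIHH4s4s4s4s16s64s128s"  # fixed 236-byte BOOTP header
BROADCAST_FLAG = 0x8000                      # asks the server to broadcast its reply


def build_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER for a 6-byte Ethernet MAC address."""
    zero_ip = b"\x00" * 4
    header = struct.pack(
        BOOTP_HEADER,
        BOOTREQUEST, 1, 6, 0,                 # op, htype=Ethernet, hlen, hops
        xid, 0, BROADCAST_FLAG,               # transaction id, secs, flags
        zero_ip, zero_ip, zero_ip, zero_ip,   # ciaddr, yiaddr, siaddr, giaddr
        mac, b"", b"",                        # chaddr, sname, file (NUL padded)
    )
    options = bytes([53, 1, 1, 255])          # option 53 = DISCOVER, then END
    return header + MAGIC_COOKIE + options


def reply_matches(packet: bytes, mac: bytes, xid: int) -> bool:
    """True if packet is a BOOTREPLY carrying our transaction id and MAC."""
    size = struct.calcsize(BOOTP_HEADER)
    if len(packet) < size + len(MAGIC_COOKIE):
        return False
    fields = struct.unpack(BOOTP_HEADER, packet[:size])
    op, reply_xid, chaddr = fields[0], fields[4], fields[11]
    return (
        op == BOOTREPLY
        and reply_xid == xid
        and chaddr[:6] == mac
        and packet[size:size + 4] == MAGIC_COOKIE
    )
```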

So.. I tried connecting the empeg directly to the DHCP server, bypassing the switch. Works just fine that way, so the server seems innocent. Same deal for the other two (different) servers I have at my disposal.

Next step: I connected a 10mb shared hub to the switch, and plugged the empeg into that, along with a laptop so I could capture a trace of the messages. Bummer -- never fails with this setup. Just the presence of the 10mb hub, with the empeg connected to it, "fixes" the problem.

But I don't really want a 10mb shared hub as a permanent fixture in my otherwise simple and fast network here.

So, one step left to try: I'll patch the empeg kernel to dump a trace of all messages in/out to the serial port, I suppose. Messy, but at least that way it oughta be possible to trace this issue further.

Any other suggestions?

In the past, I used to have an 8-port shared 10/100 hub instead of the switch, but the same problem occurred with that. I think my Empegs just don't play nicely with 10/100 hubs and switches.

Cheers
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 08:27

I suppose I ought to already know this, but..

Where is the DHCP client code on the Empeg? Is this part of the player app?

And is that code exactly the same in the v3alpha series (same as the v2final)?

Thanks
Posted by: wfaulk

Re: DHCP, dagnabbit! - 17/07/2003 08:35

There's a tcpdump that I compiled for the empeg on riocar.org.
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 08:36

I don't believe I can use tcpdump on the Empeg until AFTER it does the DHCP negotiation.. or have you tried this and gotten a trace?

Thanks
Posted by: wfaulk

Re: DHCP, dagnabbit! - 17/07/2003 08:38

Dammit. Good point.

You could try to start it with Hijack and have it write its output (the -w binary output) to an unused partition. I don't know that that would work, but it's worth a shot.
Posted by: genixia

Re: DHCP, dagnabbit! - 17/07/2003 09:02

I thought that tcpdump bound to the device, and not the address. So as long as the device has been plumbed...
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 09:03

Okay, here is what the SMC driver's "DEBUG" setting generates.
I have attached a FAILED trace to this message, and a SUCCESSful trace to the next message.
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 09:04

And here is the successful trace, which happens only when I insert a 10mb/s shared hub between the Mk2a and the 10/100 switch that everything else uses:
Posted by: genixia

Re: DHCP, dagnabbit! - 17/07/2003 09:09

Oh, for what it's worth, DHCP rarely fails completely for me (Netgear 100BaseTX 8port switch). It sometimes takes a retry or 2.

I reckon that it's got something to do with the switch autosensing the line speed taking longer than the empeg is willing to wait. In most other platforms the time between plumbing the interface and requesting an IP is probably longer.
If this is the case, I wonder if hijack could plumb the interface and send a dummy packet out earlier, thus forcing the switch to autosense earlier.
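
A rough sketch of the "dummy packet" idea, in Python for readability. On the player itself this would have to live in the kernel or in Hijack, so treat it purely as an illustration. It builds a minimum-size broadcast Ethernet frame and, on a Linux box with root, pushes it out of an interface that is up but has no address yet. The function names are hypothetical, and the EtherType is the IEEE "local experimental" value, chosen here only so that other hosts ignore the frame.

```python
import socket

ETHERTYPE_LOCAL_EXPERIMENTAL = 0x88B5  # IEEE local experimental EtherType
MIN_FRAME_LEN = 60                     # minimum Ethernet frame, excluding the FCS


def build_dummy_frame(src_mac: bytes) -> bytes:
    """Build a zero-padded broadcast frame that carries no real payload."""
    dst_mac = b"\xff" * 6
    header = dst_mac + src_mac + ETHERTYPE_LOCAL_EXPERIMENTAL.to_bytes(2, "big")
    return header.ljust(MIN_FRAME_LEN, b"\x00")


def send_dummy_frame(ifname: str, src_mac: bytes) -> None:
    """Linux only, needs root: send one frame out of an interface that is up."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(build_dummy_frame(src_mac))
```

Whether one early frame is enough to make a given switch settle is exactly the open question here; the sketch only shows that sending one does not require an IP address.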
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 09:09

Another interesting tidbit:

When I hardcode (static) IP address on the player, the VERY FIRST few packets sent to it (when I FTP or PING later on) are lost, and then it "wakes up" and begins responding..

Probably the exact same problem as with the DHCP (not receiving packets initially).

??
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 09:18

Here is a log, showing what the player dumps out when pinged 4 times immediately after booting with a static IP address.

The first three ping packets are not successfully replied to -- actually, it appears that the player never sees at least two of them. All is fine after this initial loss of data, though..
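
To put a number on that initial loss from the PC side, something like the following Python sketch could be used. It sends one ping at a time and counts how many fail before the first reply. The ping flags are the Linux iputils ones (-c count, -W timeout in seconds), and both function names are made up for this example.

```python
import subprocess


def ping_once(host: str) -> bool:
    """Send one ICMP echo with a 1 s timeout; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def count_initial_losses(probe, max_tries: int = 10) -> int:
    """Return how many probes fail before the first success.

    Returns max_tries if no probe succeeds at all.
    """
    for lost in range(max_tries):
        if probe():
            return lost
    return max_tries
```

Usage would be along the lines of count_initial_losses(lambda: ping_once(player_ip)), where player_ip is whatever static address the player was given.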

Posted by: genixia

Re: DHCP, dagnabbit! - 17/07/2003 09:30

2.2.25 only has one change in drivers/net/smc9194.c
08/20/00 Arnaldo Melo fix kfree(skb) in smc_hardware_send_packet

2.4.3 also has;
12/15/00 Christian Jullien fix "Warning: kfree_skb on hard IRQ"

Haven't checked any other 2.4 versions...
Posted by: mlord

Re: DHCP, dagnabbit! - 17/07/2003 09:46

"I reckon that it's got something to do with the switch autosensing the line speed taking longer"
I agree that there must be a defect in the autosense mechanism -- but how do I trigger autosense? Well, it happens when the Empeg is connected to the hub, right? Cuz that's when the hub lights up and says "10mb/sec" instead of 100 on the front panel LEDs.

So.. I just assigned static IP, rebooted the player, waited 10 minutes, and then pinged it. It always misses (doesn't see) the first ping.

???
Posted by: wfaulk

Re: DHCP, dagnabbit! - 17/07/2003 10:17

As I understand it, some data must be passed in order for the switch to sense speed and duplex. What kind of switch do you have? I know that even some of the residential-grade switches are forceable to 10/half.
Posted by: genixia

Re: DHCP, dagnabbit! - 17/07/2003 10:46

I'm no layer2 expert either. But as I understand it from what I have read, what basically happens at the HW level (100BaseTX) is:

some_timed_IRQ_handler() {
    if (Autonegotiation_Enabled) autonegotiate();
    else send_periodic_100BaseTX_link_pulse();
    send_link_pulse();
}

received_something_handler() {
    if (received_autonegotiate) negotiate();
    else if (received_100BaseTX_link_pulse) set_100BaseTX_link();
    else if (received_link_pulse) set_10BaseT_link();
}


Now a 10BaseT chipset such as the empeg's knows nothing about the 100BaseTX stuff, so it is much simpler:

some_timed_IRQ_handler(){ send_link_pulse(); }

received_something_handler() { if (received_link_pulse) set_link; }

Now, I'm assuming that autonegotiate() actually involves sending some layer2 data and doing a full handshake.

So from this and what you posted earlier (getting link, later ping failing), the switch has already done what it is supposed to. Unless it has mysteriously forgotten and does it again the first time data is sent. I can't see how the empeg can foul this up - it _only_ knows about 10BaseT link pulses.

The other possibility is that the switch has a buffer problem (possibly in HW) whereby the first X bytes get lost.
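
The handlers above, restated as a small runnable Python model. This follows the simplified picture in this post, not the 802.3 spec, and the enum and function names are invented for the example. It only makes the argument concrete: whatever the switch port is prepared to receive, a 10BaseT-only device ever sends one kind of signal, so the port should always end up at 10BaseT.

```python
from enum import Enum


class Signal(Enum):
    AUTONEGOTIATE = "autonegotiation"
    PULSE_100 = "100BaseTX link pulse"
    PULSE_10 = "10BaseT link pulse"


def switch_port_link(received: Signal) -> str:
    """Link state a 10/100 port settles on, per the simplified model above."""
    if received is Signal.AUTONEGOTIATE:
        return "negotiated"
    if received is Signal.PULSE_100:
        return "100BaseTX"
    return "10BaseT"


def empeg_emits() -> Signal:
    """A 10BaseT-only chipset only ever sends plain link pulses."""
    return Signal.PULSE_10
```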
Posted by: tfabris

Re: DHCP, dagnabbit! - 17/07/2003 11:19

"Next step: I connected a 10mb shared hub to the switch, and plugged the empeg into that, along with a laptop so I could capture a trace of the messages. Bummer -- never fails with this setup. Just the presence of the 10mb hub, with the empeg connected to it, "fixes" the problem."
Right, that's been a long-known problem. The empeg isn't compatible with some switches, and daisy chaining into a plain 10mb hub fixes it. We don't know why. Maybe we can get to the bottom of the problem through your research?
Posted by: TheAmigo

Re: DHCP, dagnabbit! - 26/07/2003 16:33

My LinkSys 16port 10/100 autosensing switch also works flawlessly with my Mk2a for DHCP.

"I reckon that it's got something to do with the switch autosensing the line speed taking longer than the empeg is willing to wait. In most other platforms the time between plumbing the interface and requesting an IP is probably longer."

I remember learning that lesson the hard way with Cisco switches... setting spanning-tree portfast was the only way my PCs would get DHCP.

With the Empeg booting so fast, I wouldn't be surprised to see that be a problem with non-Cisco switches.

Now that I think about it, I'm running TTSClock. That runs before the player boots so I've got several extra seconds of link before DHCP requests are sent... maybe that's why it always works for me.
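
A back-of-envelope model of that timing argument, as a Python sketch. Without portfast, a port running classic spanning tree sits in listening and then learning for the forward delay each (15 seconds by default in 802.1D), so roughly 30 seconds pass before it forwards anything. The retry intervals below are placeholders, since nobody here has posted the empeg client's real retry schedule, and the function name is made up.

```python
STP_FORWARD_DELAY = 15.0  # seconds; 802.1D default, applied to listening and to learning


def first_heard_attempt(port_ready_at, retry_intervals, initial_delay=0.0):
    """Return (attempt number, send time) of the first request sent at or after
    the moment the port starts forwarding, or None if every attempt is too early.

    retry_intervals holds the waits between consecutive attempts, in seconds.
    """
    send_time = float(initial_delay)
    attempts = len(retry_intervals) + 1
    for attempt in range(1, attempts + 1):
        if send_time >= port_ready_at:
            return attempt, send_time
        if attempt < attempts:
            send_time += retry_intervals[attempt - 1]
    return None
```

With placeholder waits of 4, 8 and 16 seconds, the attempts go out at 0, 4, 12 and 28 seconds, so all of them land inside a 30 second window and first_heard_attempt(2 * STP_FORWARD_DELAY, [4, 8, 16]) returns None. A few extra seconds of link before the first request (the initial_delay argument) only help if the port needs less time than that to settle.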
Posted by: benjammin

Re: DHCP, dagnabbit! - 27/07/2003 22:54

FYI,

I use my Empeg Mk2a with ISC-DHCP (and IBM's goofed up DHCP as well) and they both work dandy.

I have found MORE than one case of improper link-pulses sent by some brands of 10/100 switches that cause link negotiation problems.

I've used my empeg with the little 5port D-Link 10/100's with success, my Cisco 2924 works dandy (although I think I have the port set for PortFast so it brings up the link speedy without freaking SpanningTree getting in the way)

I would speculate it's NOT the DHCP client the empeg has. I've always had good luck with it. I would suspect the hub/switch. I've seen other problems with other devices and various brands of 10/100 autosensing switches.
Posted by: benjammin

Re: DHCP, dagnabbit! - 27/07/2003 22:57

BTW, if you'd like packet traces, I can happily provide them. I use my Empeg at work and recently was working on a DHCP problem... so using tcpdump, I can easily give you a .cap file to look at.

-Ben

tcpdump -i <interface> -e -w <filename> port 67 and port 68
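
If anyone wants to go through such a capture without extra tools, here is a minimal Python sketch that reads the classic pcap format tcpdump writes with -w and labels each frame as a DHCP request or reply. It assumes an Ethernet capture with no VLAN tags and plain IPv4; the function names are made up. Requests with no matching "server->client" frame after them would be the failure case described in this thread.

```python
import struct

PCAP_HEADER_LEN = 24    # global file header
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len


def read_pcap_records(data: bytes):
    """Return [(timestamp_seconds, frame_bytes), ...] from a classic pcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    records = []
    offset = PCAP_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += RECORD_HEADER_LEN
        records.append((ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]))
        offset += incl_len
    return records


def dhcp_direction(frame: bytes):
    """Classify an Ethernet/IPv4/UDP frame by its DHCP ports, or return None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00" or frame[23] != 17:
        return None  # too short, not IPv4, or not UDP
    udp_start = 14 + (frame[14] & 0x0F) * 4  # skip Ethernet and IP headers
    if len(frame) < udp_start + 4:
        return None
    ports = struct.unpack_from("!HH", frame, udp_start)
    return {(68, 67): "client->server", (67, 68): "server->client"}.get(ports)
```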

Posted by: lothar

Re: DHCP, dagnabbit! - 29/07/2003 11:48

"Any other suggestions? "

Purchase a new non-90's era switch. Cmon, you can pick one up for less than $100 now.
Posted by: mlord

Re: DHCP, dagnabbit! - 29/07/2003 14:42

My present switch is a 2001 unit.

Cheers
Posted by: Daria

Re: DHCP, dagnabbit! - 29/07/2003 15:29

Actually, you should be able to tcpdump as soon as whatever sends the DHCP requests starts doing so, as it must UP the interface to do anything useful. I routinely dump on a device with no address... every time my cable modem loses sync, it eventually loses its address, and tcpdump is useful for "well, is it just me, or what?"

And unless this is a boot-time-only problem, you shouldn't have to make hijack do anything more weird than fork tcpdump at the right time; you can have the player up and some partition mounted read/write, and then plug in the ethernet cable.
Posted by: wfaulk

Re: DHCP, dagnabbit! - 30/07/2003 06:42

"Actually, you should be able to tcpdump ..."
The interface isn't the problem. It's how to start tcpdump and see its output. The player must be running, since it's the DHCP client, which means that you can't do it over the serial port, and if it doesn't yet have an IP address, you can't do it via telnet.

Which leaves your later suggestion, which is essentially the same as the one I made previously, except you logically point out that you could conceivably mount a filesystem r/w before then and write to a regular file.

Regardless, Mark seems to have found where the problem lies. I suppose he could use this to verify that the empeg never sees the initial DHCP packet(s).
Posted by: wfaulk

Re: DHCP, dagnabbit! - 30/07/2003 06:42

What model switch is it, Mark? Maybe we can figure out something if we can look at its documentation.
Posted by: mlord

Re: DHCP, dagnabbit! - 30/07/2003 08:40

The current switch is a Surecom EP-816X. But the same problem existed long before I installed this switch (in hope of it fixing things.. blah!) with my 5-port and 8-port Linksys shared 10/100 hubs.

I have literally dozens of other devices around here, some 10mb/s and many 100mb/s, and NONE of them EVER have any such troubles. Just the Empegs.

-ml
Posted by: wfaulk

Re: DHCP, dagnabbit! - 30/07/2003 08:57

I can't seem to find a manual on their site. The only two things I can think of are speed/duplex negotiation and STP, both already mentioned. See if you can force the speed/duplex and/or disable STP for that port somehow.
Posted by: lothar

Re: DHCP, dagnabbit! - 30/07/2003 18:20

Buy something name brand.

I have a netgear 8 port at home with my linksys soho providing DHCP and NEVER have a problem with mine!
Posted by: tman

Re: DHCP, dagnabbit! - 30/07/2003 18:47

Buying name brand doesn't really help. It seems to be luck of the draw about whether you get a specific hub that works or not. The exact chipset used inside the hub/switch must be making the difference. People have reported problems with Linksys hubs whilst others have it working perfectly. Also I believe somebody did have problems with 3Com hubs.

I've never had any problems and I've tried it on 3Com, NetGear, DynaMode, LinkSys and Buffalo equipment.