Unoffical empeg BBS

#298281 - 09/05/2007 20:40 Odd problem with networking on CentOS 5 (Linux)
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
I figured I'd throw this odd one out there to see if anyone has seen similar and has any ideas.

Software: CentOS 5 running kernel 2.6.18
Hardware: Multiple Dell PowerEdge 1950s and a Precision 490
Network: Broadcom 5708 in the servers, 5752 in the Precision, also an Intel card in the Precision.

The problem is that whenever these boxes talk on the LAN, they show extremely slow performance: about 30k/s to a remote office, from which other boxes can easily pull down files at 1000k/s. At first we thought it was a networking problem with the switches the servers were on, so they tried a different switch. Then a different patch panel. Then running network cables directly to different switches in the server room. Then the Precision was set up on yet another switch, and it had the same problem.

The idea came up that it could be the Broadcom NICs somehow, possibly due to the tg3 module. So we put the Intel NIC into the Precision with the onboard NIC turned off, and got the same result, even with the NIC now using the e100 module. We also talked with Dell about the servers; they suggested turning off a feature called "TOE" that offloads some network processing from the CPU to the NIC. We pulled the RJ11 enabler key off the motherboard: same issue.

A Fedora Core 3 box running 2.4 on the same network port sees the expected megabit speeds to the remote office, and near 100mbit speeds on the local LAN.

Currently, the server is being loaded with CentOS 4.4, and I'm trying to put Red Hat Enterprise 4 back on the Precision to see what happens next.

#298282 - 09/05/2007 23:32 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Are these new systems, or were they working okay before you upgraded the OS?
_________________________
Bitt Faulk

#298283 - 10/05/2007 00:18 Re: Odd problem with networking on CentOS 5 (Linux) [Re: wfaulk]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
They are all new and haven't been loaded with any other OS. To add to what I have tried now, I booted a Knoppix disc on the Precision running 2.6.17 and it too had the same issue.

Just really puzzled why only these boxes are having this issue on multiple switches. Might try Windows on the Precision just as a curiosity tomorrow.

#298284 - 10/05/2007 00:28 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
The only thing that immediately comes to mind is that some network admins like to force speed and duplex on their switches, disabling Ethernet negotiation. But if the NIC isn't also forced, it falls back to 10/half, and the resulting mismatch unsurprisingly causes problems.

If you can get ahold of your net admin, have him check the ports you're plugging into to see if they are forced to something. If so, have him turn negotiation back on, or, failing that, force your cards to the speed the switch is forced to. (Under Linux, that would be done using the "ethtool" utility.) If you can't get a net admin to tell you, just try forcing your NIC to each possible setting and see if any of them work. If they are forced, they're likely to be forced to 100/full, so start there.
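
A rough sketch of that check, assuming the interface is called eth0 (yours may differ). `link_summary` is a made-up helper that boils ethtool's report down to the three fields that matter here. The commented commands at the end are the forcing step itself and need root.

```shell
#!/bin/sh
# Reduce `ethtool <iface>` output (read from stdin) to one line:
# negotiated speed, duplex, and whether auto-negotiation is on.
link_summary() {
    awk -F': ' '
        /Speed:/            { speed = $2 }
        /Duplex:/           { duplex = $2 }
        /Auto-negotiation:/ { autoneg = $2 }
        END { printf "%s %s autoneg=%s\n", speed, duplex, autoneg }
    '
}

# Inspect the current link (eth0 is an assumed interface name):
#   ethtool eth0 | link_summary
# Force 100/full with negotiation off, the most likely forced setting:
#   ethtool -s eth0 speed 100 duplex full autoneg off
# Go back to auto-negotiation afterwards:
#   ethtool -s eth0 autoneg on
```

A result of "Half" on a port that the switch reports as full duplex would point at exactly the mismatch described above.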
_________________________
Bitt Faulk

#298285 - 10/05/2007 00:51 Re: Odd problem with networking on CentOS 5 (Linux) [Re: wfaulk]
jimhogan
carpal tunnel

Registered: 06/10/1999
Posts: 2591
Loc: Seattle, WA, U.S.A.
In line with Bitt's thoughts, it would be interesting to go to Best Buy and snag a dumb little $80 Netgear gig switch like a GS108 and hook two of the Centos5 machines to it and then run (iperf) back and forth. I don't see any on-line complaints about RHEL5/Centos5/2.6.1x. You'd think that if it were kernel-specific or even Dell-specific, there would be more howling.

I'm interested. I have some hope of loading Centos 5 on a brand new box with Broadcoms on Friday. We'll see.
_________________________
Jim


'Tis the exceptional fellow who lies awake at night thinking of his successes.

#298286 - 10/05/2007 00:53 Re: Odd problem with networking on CentOS 5 (Linux) [Re: wfaulk]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Nothing is forced on the switches we tried. Two switches were gigabit, while two others were 100. On each switch, the NICs auto-negotiated to either gigabit full or 100 full.

The net admins couldn't find anything odd error-wise on the switches either.

Really puzzling problem so far. At first we had just figured it was a tg3 issue, but once the Intel NIC was thrown in, it really confused us.

#298287 - 10/05/2007 05:01 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
The only other times I've seen severe slowdowns like this are when there are duplicate MAC addresses on the LAN and when routing relies on ICMP redirects, but I can't imagine either one of those is your issue.
_________________________
Bitt Faulk

#298288 - 10/05/2007 12:07 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Try comparing the values from these "files" on the two kernels:

/proc/sys/net/ipv4/tcp_window_scaling
/proc/sys/net/ipv4/tcp_wmem


Then try toggling the value of /proc/sys/net/ipv4/tcp_window_scaling
(eg. just do echo 0 > /proc/sys/net/ipv4/tcp_window_scaling).

Or set it to 1, and change the other one to something like this:
echo "4096 16384 65536" > /proc/sys/net/ipv4/tcp_wmem

Experiment like that. If nothing much changes, then we can eliminate them
from the possible issues.
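
One way to do the comparison, as a minimal sketch: `dump_tcp_tunables` is a made-up helper name, and tcp_rmem is included alongside the two files named above because it is the usual companion to tcp_wmem. The optional directory argument exists only so the function can be tried against a copy of the files.

```shell
#!/bin/sh
# Print "name = value" for the TCP settings under discussion.
# $1 is the directory to read; it defaults to the live kernel settings.
dump_tcp_tunables() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in tcp_window_scaling tcp_wmem tcp_rmem; do
        if [ -r "$dir/$name" ]; then
            # tr flattens the tabs the kernel puts between the numbers
            printf '%s = %s\n' "$name" "$(tr '\t' ' ' < "$dir/$name")"
        else
            printf '%s = (not readable)\n' "$name"
        fi
    done
}

# Run on each machine, then compare the two files by eye or with diff:
#   dump_tcp_tunables > "tcp-$(uname -r).txt"
```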

-ml


Edited by mlord (10/05/2007 12:07)

#298289 - 10/05/2007 12:17 Re: Odd problem with networking on CentOS 5 (Linux) [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Quote:
Try comparing the values from these "files" on the two kernels:

/proc/sys/net/ipv4/tcp_window_scaling
/proc/sys/net/ipv4/tcp_wmem


Then try toggling the value of /proc/sys/net/ipv4/tcp_window_scaling
(eg. just do echo 0 > /proc/sys/net/ipv4/tcp_window_scaling).

Or set it to 1, and change the other one to something like this:
echo "4096 16384 65536" > /proc/sys/net/ipv4/tcp_wmem

Experiment like that. If nothing much changes, then we can eliminate them
from the possible issues.

-ml


Mmm.. bad recollection here. It's actually the tcp_rmem setting we should be playing with, rather than the tcp_wmem setting.

In particular, I have used this to work around slowness problems ever since Linux kernel 2.6.17 was released.

echo 4096 87380 174760 > /proc/sys/net/ipv4/tcp_rmem
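
For anyone wondering why the third tcp_rmem number has anything to do with window scaling: this is a reading of the kernel's behaviour rather than something stated in the thread, but Linux appears to derive the scale factor it offers from the largest receive buffer it might use, halving that value until it fits the 16-bit window field. The sketch below mimics that loop. `wscale_for` is a made-up name, the real kernel also takes net.core.rmem_max into account, and the 4 MB figure is only a hypothetical example of a large maximum.

```shell
#!/bin/sh
# Approximate the window scale factor implied by a maximum receive buffer:
# halve the buffer until it fits in 16 bits (65535), counting the shifts,
# capped at 14 (the largest shift TCP window scaling allows).
wscale_for() {
    space=$1
    shift_count=0
    while [ "$space" -gt 65535 ] && [ "$shift_count" -lt 14 ]; do
        space=$((space / 2))
        shift_count=$((shift_count + 1))
    done
    echo "$shift_count"
}

wscale_for 174760     # the maximum suggested above; prints 2
wscale_for 4194304    # hypothetical 4 MB maximum; prints 7
```

A smaller maximum therefore means a smaller scale factor, which leaves less to go wrong if something in the path mishandles the scaling option.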

#298290 - 10/05/2007 14:53 Re: Odd problem with networking on CentOS 5 (Linux) [Re: mlord]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Quote:
Then try toggling the value of /proc/sys/net/ipv4/tcp_window_scaling
(eg. just do echo 0 > /proc/sys/net/ipv4/tcp_window_scaling).


Initially it was 1, set it to 0, and the speeds went to what was expected on the next attempt. Looks like it only works for new connections, won't change existing ones. No big deal though since we can just toss this in a startup script and be done with it.

Thank you so much for the help. The only reason I got involved was because this issue had been going on long enough to now impact my development schedule for this part of the scrum. Now I can get back to my development tasks.

#298291 - 10/05/2007 15:00 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Quote:
Quote:
Then try toggling the value of /proc/sys/net/ipv4/tcp_window_scaling
(eg. just do echo 0 > /proc/sys/net/ipv4/tcp_window_scaling).


Initially it was 1, set it to 0, and the speeds went to what was expected on the next attempt. Looks like it only works for new connections, won't change existing ones. No big deal though since we can just toss this in a startup script and be done with it.

Thank you so much for the help. The only reason I got involved was because this issue had been going on long enough to now impact my development schedule for this part of the scrum. Now I can get back to my development tasks.


Peachy. A better solution than simply disabling it is to try the other fix: setting the tcp_rmem numbers back to their pre-2.6.17 values. The numbers I posted earlier should work, but if not, try halving the last number repeatedly until it does work.

This will give the best throughput.

Cheers


Edited by mlord (10/05/2007 15:01)

#298292 - 10/05/2007 15:05 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Quote:
Quote:
Then try toggling the value of /proc/sys/net/ipv4/tcp_window_scaling
(eg. just do echo 0 > /proc/sys/net/ipv4/tcp_window_scaling).


Initially it was 1, set it to 0, and the speeds went to what was expected on the next attempt. Looks like it only works for new connections, won't change existing ones. No big deal though since we can just toss this in a startup script and be done with it.


I think that most distros have a "proper" place for setting stuff like this, for what it's worth.

The file is /etc/sysctl.conf, into which you could place this line:

net/ipv4/tcp_window_scaling=0

Or this better alternative:

net/ipv4/tcp_rmem=4096 87380 174760

Doing it this way (via that file) is somewhat preferable, because the /proc/sys/* interface is supposed to go away in a few years, in favour of a binary socket "netlink" style of setting the same value. The tools that use the sysctl.conf file are supposed to "just work" either way.
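
A minimal sketch of making that change without ending up with duplicate lines after a few rounds of experimenting. `set_sysctl_conf` is a made-up helper, and it assumes the file uses dotted key names; the `sysctl -p` and `sysctl -w` commands in the comments are the standard way to load the file or try a value immediately, and need root.

```shell
#!/bin/sh
# Set key = value in a sysctl.conf-style file, replacing any existing line
# for the same key so repeated runs do not pile up duplicates.
# Arguments: file, key (dotted form), value.
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # Match existing assignments of this key; dots are escaped for the regex.
    pattern="^[[:space:]]*$(printf '%s' "$key" | sed 's/\./\\./g')[[:space:]]*="
    # grep -v exits non-zero when nothing is left over, which is fine here.
    grep -v -E "$pattern" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file to keep its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# As root:
#   set_sysctl_conf /etc/sysctl.conf net.ipv4.tcp_rmem "4096 87380 174760"
#   sysctl -p                                        # load /etc/sysctl.conf now
# Or try a value without touching the file:
#   sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"
```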

Cheers


Edited by mlord (10/05/2007 15:06)

#298293 - 10/05/2007 15:09 Re: Odd problem with networking on CentOS 5 (Linux) [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Oh, and the whole reason for this problem in the first place is that somewhere between you and the remote office there is probably a buggy OpenBSD router that cannot deal with negotiation of large TCP window scale factors: it just truncates away the upper bits, leaving a very tiny TCP window size that just crawls along.
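
A rough illustration of why a mangled scale factor hurts so much. TCP carries the receive window in a 16-bit field; with window scaling the receiver shifts its real window down by the agreed factor before sending it, and the sender shifts it back up. If the factor gets lost along the way, the sender sees only the shifted-down number, and throughput is bounded by roughly window / round-trip time. The 87380-byte window, scale factor of 7, and 20 ms round trip below are assumed figures for illustration, not measurements from this thread, and the helper names are made up.

```shell
#!/bin/sh
# Window the sender believes it has when the scale factor is lost:
# the 16-bit field holds (real window >> scale) and nobody shifts it back.
seen_window() {
    echo $(( $1 >> $2 ))
}

# Upper bound on throughput in bytes/s for a window (bytes) and RTT (ms).
max_rate() {
    echo $(( $1 * 1000 / $2 ))
}

real=87380   # assumed real receive window, bytes
scale=7      # assumed negotiated scale factor
rtt_ms=20    # assumed round-trip time to the remote office

seen=$(seen_window "$real" "$scale")
echo "scaling honoured: $real bytes, up to $(max_rate "$real" "$rtt_ms") bytes/s"
echo "scaling lost:     $seen bytes, up to $(max_rate "$seen" "$rtt_ms") bytes/s"
```

With these assumed numbers the usable window drops from 87380 bytes to 682, which caps the transfer at a few tens of kilobytes per second.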

-ml

#298294 - 10/05/2007 15:47 Re: Odd problem with networking on CentOS 5 (Linux) [Re: mlord]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Quote:

The file is /etc/sysctl.conf, into which you could place this line:

net.ipv4.tcp_window_scaling=0

Or this better alternative:

net.ipv4.tcp_rmem=4096 87380 174760


Thanks again. I ended up going with the rmem adjustment you gave here. It seems to be doing about as well as turning the whole window_scaling option to 0. Going lower on the third number started to slow things down.

I also tweaked the quoted examples above to use periods as separators, which seems to be the way everything else in the file was written.

#298295 - 10/05/2007 18:44 Re: Odd problem with networking on CentOS 5 (Linux) [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Quote:

I also tweaked the quoted examples above to use periods as separators, seems to be the way everything else was working in the file.


Great!

From the sysctl(8) manpage:
Quote:
The ’/’ separator is also accepted in place of a ’.’.


Cheers
