AIX Process Question

Posted by: phi144

AIX Process Question - 26/10/2005 17:30

Does anyone know of a command to check which processes are running on which CPU? We have a 12-CPU system and the load balance seems to be way off. Certain CPUs are getting hammered while others are barely being used. I'm curious to see which processes are using which CPU. Can it even be done?

Thanks
Posted by: wfaulk

Re: AIX Process Question - 26/10/2005 17:47

I'm sure the answer lies somewhere in SMIT.

But that's a stock answer.

I'm not an AIX expert, but it seems to me that the way Unix timesharing works, even a single-threaded, CPU-intensive process doesn't get parked on a single CPU; it uses whichever CPU becomes available. It will still use at most one-nth of the total CPU power in the machine, though, where "n" is the number of CPUs in the box.
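To put numbers on the one-nth point, here is a small sketch: a workload can occupy at most one CPU per runnable thread at any instant, so its share of the whole machine is capped at min(threads, n)/n. The function name is made up for illustration.

```python
def max_share_of_machine(runnable_threads: int, ncpus: int) -> float:
    """Upper bound on the fraction of total machine CPU capacity a workload
    can use with `runnable_threads` CPU-bound threads on `ncpus` processors.
    Each thread can occupy at most one CPU at a time."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return min(runnable_threads, ncpus) / ncpus

# A single-threaded CPU hog on the 12-CPU box from the original question:
print(f"{max_share_of_machine(1, 12):.1%}")  # 8.3%
```

No matter which CPUs the scheduler rotates that one thread across, the total it can consume stays the same.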

If you're seeing particular CPUs overutilized, either AIX works very differently from other Unices (which is, of course, true in so many other cases) or there's something odd going on.

You might check out the "bindprocessor" command. It might lead you somewhere.
Posted by: phi144

Re: AIX Process Question - 26/10/2005 17:56

I haven't seen anything that jumps out at me in SMIT. You're right, Unix seems to spread the load evenly across whatever is available. I don't actually think this is an AIX issue but an application issue somewhere. This past weekend I did an Oracle install and a hardware upgrade; some developers came to me at the last minute to test it, and then they reconfigured the environment. I ran topas and noticed that paging space usage was a bit high, then ran some reports in TeamQuest and noticed one CPU that seemed to be utilized much more than the rest. It seemed odd, and I wanted to pin down the process(es) they were testing. I actually think this is a problem with the way they set up their JVMs, but I don't want to point fingers without more evidence.
Posted by: drakino

Re: AIX Process Question - 26/10/2005 18:31

Could it be something like a certain processor dedicated to handling all the interrupts? I'd imagine a 12-processor IBM box would spread that out, but maybe something is misconfigured. I'm only taking a wild guess based on my understanding of SMP on x86.
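One way to test the interrupt theory, sketched against Linux's /proc/interrupts as an assumption (AIX has no such file, and its equivalent isn't covered here): that file has one count column per CPU, so summing each column shows whether a single processor is fielding most interrupts. The sample text is made-up input in that layout, not data from the machine in question.

```python
def interrupts_per_cpu(text: str) -> list[int]:
    """Sum interrupt counts per CPU from text in the Linux /proc/interrupts
    layout: a header row 'CPU0 CPU1 ...', then 'LABEL: count count ... desc'."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())          # one header token per CPU
    totals = [0] * ncpus
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        for i, tok in enumerate(rest.split()[:ncpus]):
            if not tok.isdigit():          # description text on a short row
                break
            totals[i] += int(tok)
    return totals

# Made-up sample in the /proc/interrupts format (four CPUs).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:     912345          0          0          0   IO-APIC   2-edge      timer
  8:          1          0          0          0   IO-APIC   8-edge      rtc0
 24:     450210         12          3          0   PCI-MSI   eth0
ERR:          0
"""

if __name__ == "__main__":
    totals = interrupts_per_cpu(SAMPLE)
    for cpu, count in enumerate(totals):
        print(f"CPU{cpu}: {count} ({count / sum(totals):.1%} of all interrupts)")
    # On a real Linux box: interrupts_per_cpu(open("/proc/interrupts").read())
```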
Posted by: wfaulk

Re: AIX Process Question - 26/10/2005 18:35

Well, the thing is that it's the OS that schedules which processes run on which processors. As I've said, I don't know much about AIX, but I doubt that an incorrect application installation could "trick" the OS into scheduling only on a subset of processors.

In Solaris, there is a command called psrset (and a stripped-down companion called pbind) that allow the user to control which processors are available to specified processes. bindprocessor seems to be the closest thing I can find in AIX, but it seems much more limited. I was hoping it might lead you to a more advanced command in AIX, but maybe one doesn't exist.
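For comparison, this is roughly what processor binding looks like on Linux, where the underlying call is sched_setaffinity(2), exposed in Python as os.sched_setaffinity. It's a Linux-only sketch, not AIX's bindprocessor or Solaris's pbind/psrset, and the helper name pin_to_one_cpu is made up.

```python
import os

def pin_to_one_cpu(pid: int = 0) -> int:
    """Restrict `pid` (0 = the calling process) to a single CPU taken from
    its current affinity mask, and return that CPU number. Linux-only."""
    allowed = os.sched_getaffinity(pid)
    cpu = min(allowed)                 # any CPU we're already allowed on
    os.sched_setaffinity(pid, {cpu})
    return cpu

if __name__ == "__main__":
    print("allowed before:", sorted(os.sched_getaffinity(0)))
    cpu = pin_to_one_cpu()
    print("pinned to CPU", cpu, "->", sorted(os.sched_getaffinity(0)))
```

Picking the CPU from the existing mask avoids asking for a processor the process isn't permitted to use.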

Actually, now that I look, top gives me the CPU ID of running processes in Solaris. Maybe it'll give it to you in AIX, too.
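Once a tool reports a CPU ID per process, grouping the listing by CPU answers the original question directly. This sketch assumes a header row followed by PID, CPU and command columns; on Linux, `ps -eo pid,psr,comm` produces that layout (PSR is the processor the process is currently assigned to), but whether AIX's ps or top offers an equivalent column isn't confirmed here. It is a single snapshot, and processes can move between CPUs from one sample to the next. The sample listing is made up.

```python
from collections import defaultdict

def processes_by_cpu(listing: str) -> dict[int, list[str]]:
    """Group a 'PID CPU COMMAND' listing (header row first) by CPU number.
    Returns {cpu: ["pid:command", ...]} in the order lines appear."""
    groups = defaultdict(list)
    for line in listing.strip().splitlines()[1:]:   # skip the header row
        pid, cpu, command = line.split(None, 2)      # command may contain spaces
        groups[int(cpu)].append(f"{pid}:{command}")
    return dict(groups)

# Made-up sample in the layout of Linux `ps -eo pid,psr,comm`.
SAMPLE = """\
  PID PSR COMMAND
 4711   3 java
 4712   3 java
 4713   3 java
  820   0 oracle
  821   7 oracle
"""

if __name__ == "__main__":
    groups = processes_by_cpu(SAMPLE)
    # Busiest CPUs (by process count) first.
    for cpu in sorted(groups, key=lambda c: len(groups[c]), reverse=True):
        print(f"CPU {cpu}: {len(groups[cpu])} process(es): {', '.join(groups[cpu])}")
```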
Posted by: phi144

Re: AIX Process Question - 26/10/2005 18:42

Wild guesses are more than welcome at this point. I doubt one CPU is handling that, although something is odd.

I wonder if I can associate a PID with an hdisk and then associate the hdisk with a specific cpu (thinking out loud).
Posted by: phi144

Re: AIX Process Question - 26/10/2005 18:48

True. I only think the way I do because I never noticed anything like this before today. Your point is very valid and makes the most sense.

Again, you're right, bindprocessor seems very limited in AIX, although I haven't done much with it in the past, so I'll have to dig deeper.

Thank you for your efforts. If anyone else has any thoughts I'd like to hear them.
Posted by: peter

Re: AIX Process Question - 26/10/2005 18:53

What's your load average? What is the box mainly running (Oracle, Apache, sendmail)? If your load average, N, is always less than 12, then there's no reason for the OS to use N/12 of each processor when it can just fill N processors and leave the others idle. Also check that your application is actually capable of doing 12 things at once (e.g., that you haven't configured Apache to use only 6 threads, or whatever).
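Peter's point can be shown with a toy model: for the same amount of work, a per-CPU report looks very different depending on whether busy threads stay put or rotate. The "pack" and "spread" policies are idealized assumptions for illustration, not a description of how the AIX scheduler actually places threads. Passing 6 for `runnable` models an application capped at 6 threads.

```python
def per_cpu_utilization(runnable: int, ncpus: int, policy: str) -> list[float]:
    """Toy model: per-CPU busy fraction when `runnable` CPU-bound threads
    share `ncpus` processors under one of two idealized policies.
      "pack"   - each thread stays on its own CPU; the rest sit idle
      "spread" - threads rotate so every CPU carries an equal share"""
    busy = min(runnable, ncpus)  # a thread can use at most one CPU at a time
    if policy == "pack":
        return [1.0] * busy + [0.0] * (ncpus - busy)
    if policy == "spread":
        return [busy / ncpus] * ncpus
    raise ValueError(f"unknown policy: {policy!r}")

# Six busy threads (say, a pool capped at 6) on the 12-CPU box:
print(per_cpu_utilization(6, 12, "pack"))    # six CPUs at 1.0, six at 0.0
print(per_cpu_utilization(6, 12, "spread"))  # every CPU at 0.5
```

Both layouts add up to the same six CPUs' worth of work, so a lopsided per-CPU graph on its own doesn't prove that anything is wrong.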

Peter
Posted by: wfaulk

Re: AIX Process Question - 26/10/2005 19:03

While it's true that there's no reason it has to swap processors for a CPU-bound thread, my experience is that the way SVR4 timesharing works is that even on an otherwise unloaded machine, the thread will move amongst all of the processors. The computer is always doing something else, so there will always be context switches. On the other hand, this is probably less true on a 12-CPU machine than on a 2-CPU machine, and AIX is about as distant from "standard" SVR4 as you can get, so my assumptions may be wrong.
Posted by: Mataglap

Re: AIX Process Question - 26/10/2005 21:12

I've seen (am living with) similar problems with Java applications. I think there are issues where children of some Java threads are implicitly bound to the same processor as their parent, probably due to the way resources and objects are passed to the child processes.
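Whether the JVM does this internally is a guess that isn't confirmed here, but the operating system can produce the same symptom on its own: on Linux, a child process inherits its parent's CPU affinity mask, and the mask is preserved across exec (both documented in sched_setaffinity(2)). This Linux-only sketch pins itself to one CPU, starts a child Python interpreter, and reads back the child's mask. Whether AIX bindings propagate to children the same way isn't covered here; the function name is made up.

```python
import os
import subprocess
import sys

# The child just prints its own allowed CPUs, space-separated.
CHILD_SNIPPET = "import os; print(*sorted(os.sched_getaffinity(0)))"

def child_affinity_after_pinning() -> tuple[int, list[int]]:
    """Pin this process to one CPU, spawn a child interpreter, and return
    (the CPU we pinned to, the CPUs the child reports it may run on)."""
    cpu = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})
    result = subprocess.run([sys.executable, "-c", CHILD_SNIPPET],
                            capture_output=True, text=True, check=True)
    return cpu, [int(tok) for tok in result.stdout.split()]

if __name__ == "__main__":
    parent_cpu, child_cpus = child_affinity_after_pinning()
    print(f"parent pinned to CPU {parent_cpu}; child allowed on {child_cpus}")
```

If a parent is bound early (by an admin, a wrapper script, or the runtime), every worker it starts afterwards ends up crowded onto the same processor.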

This is one of the things that makes multi-threaded and/or parallel processing harder. Cf. Pfister.

--Nathan
Posted by: Mataglap

Re: AIX Process Question - 26/10/2005 21:17

Quote:
If your load average, N, is always less than 12, then there's no reason for the OS to use N/12 of each processor when it can just fill N processors and leave the others idle.


I don't know how AIX works, but both Solaris and Irix normalize the load average by the number of processors, whereas Linux doesn't. In other words, a four-processor Linux box is working as hard as it can when the load average is 4, but a four-processor Sun box would report the same workload as 1. Enough work to max out a single CPU would be reported as 1 on Linux and 0.25 on Solaris.
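The two conventions differ only by a division by the CPU count. Here is a sketch of the conversion using the post's own numbers; which operating systems report which form is exactly what the next reply disputes, so the function doesn't tie either convention to a particular OS.

```python
def normalized_load(raw_load: float, ncpus: int) -> float:
    """Convert a raw (unnormalized) load average to the per-CPU normalized
    form, where 1.0 means every processor is busy."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return raw_load / ncpus

# The post's example: one CPU's worth of work on a four-processor box.
print(normalized_load(1, 4))   # 0.25
# The same box fully busy.
print(normalized_load(4, 4))   # 1.0
```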

--Nathan
Posted by: wfaulk

Re: AIX Process Question - 26/10/2005 21:39

Actually, IIRC, Solaris doesn't normalize the load number, but does normalize the CPU utilization percentage.