Unofficial empeg BBS

#267715 - 21/10/2005 19:51 shared scsi storage in cluster
muzza
Pooh-Bah

Registered: 21/07/1999
Posts: 1765
Loc: Brisbane, Queensland, Australi...
Work is implementing Exchange Server 2k3, and we're putting together a hardware list for installation. There are nearly as many theories on setting this up as there are books describing them. One system we're going with is a clustered server back end to handle the database.
This arrangement has a shared SCSI storage system.
I'd like to know how this works at the SCSI level.
I thought that a SCSI bus had one controller and many slave devices. How does the control function work with two controllers?
_________________________
-- Murray I What part of 'no' don't you understand? Is it the 'N', or the 'Zero'?

#267716 - 21/10/2005 23:44 Re: shared scsi storage in cluster [Re: muzza]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
The controller, or "initiator" in SCSI-speak, is assigned a SCSI ID just like every other device in the chain. It is virtually always ID 7. However, you can give your controller a different SCSI ID. If you do that, you can have one controller at ID 7 and the other at a different ID on the same chain. As long as they don't both try to control the devices at the same time, you should be okay. That said, there will still be some downtime in the event of a changeover. In a catastrophic failover, the new machine will at the very least have to repair the filesystems left damaged by the sudden shutdown of the other machine, if not repair the Exchange store. It might even have to reset the whole SCSI chain. It's still better than having to move cables manually, though. Even for a scheduled changeover, the filesystems will have to be sync'd on the primary machine before the other machine can take over, and a reset might have to be performed there, too.
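
The "every device needs its own ID" rule is easy to check mechanically. Here's a minimal Python sketch, assuming you can list every device on one chain with its ID; the `check_bus_ids` function and the device names are hypothetical, and moving the second adapter to ID 6 is just an example choice. It flags the classic mistake of leaving both host adapters at ID 7.

```python
def check_bus_ids(devices, wide=True):
    """Report SCSI ID problems on a single chain.

    devices: dict of device name -> SCSI ID (hypothetical inventory).
    wide: True for a 16-bit bus (IDs 0-15), False for narrow (IDs 0-7).
    """
    max_id = 15 if wide else 7
    seen = {}       # ID -> first device found using it
    problems = []
    for name, scsi_id in devices.items():
        if not 0 <= scsi_id <= max_id:
            problems.append(f"{name}: ID {scsi_id} out of range 0-{max_id}")
        elif scsi_id in seen:
            problems.append(f"{name}: ID {scsi_id} already used by {seen[scsi_id]}")
        else:
            seen[scsi_id] = name
    return problems


# Both host adapters left at ID 7: a conflict.
print(check_bus_ids({"hostA": 7, "hostB": 7, "array": 0}))
# ['hostB: ID 7 already used by hostA']

# Second adapter moved to ID 6: clean.
print(check_bus_ids({"hostA": 7, "hostB": 6, "array": 0}))
# []
```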

It's a pretty unstable configuration, but many people have put it into production. You have to be very, very careful to make sure both computers don't access the storage at the same time. This is more difficult than it sounds. You wouldn't want a simple network failure to make both machines think the other is down and cause them both to start writing. Maybe folks have implemented a check at the SCSI level these days.
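
SCSI-2 does offer one such check: the RESERVE and RELEASE commands, which let one initiator claim a logical unit so that commands from other initiators get RESERVATION CONFLICT status. Below is a toy Python model of that behaviour, not a driver; the class and method names are made up, and it leaves out SCSI-3 persistent reservations entirely. Note the `bus_reset` method: a reset clears a SCSI-2 reservation, which is one reason a takeover can involve resetting the chain.

```python
class LogicalUnit:
    """Toy model of a shared disk honouring SCSI-2 style RESERVE/RELEASE."""

    GOOD = "GOOD"
    CONFLICT = "RESERVATION CONFLICT"

    def __init__(self):
        self.holder = None  # SCSI ID of the initiator holding the reservation

    def _blocked(self, initiator):
        return self.holder is not None and self.holder != initiator

    def reserve(self, initiator):
        if self._blocked(initiator):
            return self.CONFLICT
        self.holder = initiator
        return self.GOOD

    def release(self, initiator):
        # A RELEASE from a non-holder leaves the reservation in place.
        if self.holder == initiator:
            self.holder = None
        return self.GOOD

    def write(self, initiator):
        return self.CONFLICT if self._blocked(initiator) else self.GOOD

    def bus_reset(self):
        self.holder = None  # a reset drops SCSI-2 reservations


lun = LogicalUnit()
print(lun.reserve(7))   # GOOD: the active node (ID 7) claims the disk
print(lun.write(6))     # RESERVATION CONFLICT: the standby (ID 6) is fenced off
lun.bus_reset()         # e.g. during a takeover
print(lun.reserve(6))   # GOOD: the standby now owns the disk
```

The catch is that a reset lets anyone grab the disk, so the cluster software above still has to decide which node *should* be grabbing it.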

So, to sum up, it's electrically possible and protocol-possible, but, I think, only as a side effect of the specification, not as a design point. In addition, it's much harder under Windows, where the OS wants to grab every storage device it can see right off the bat. But it's doable.
_________________________
Bitt Faulk

#267717 - 21/10/2005 23:46 Re: shared scsi storage in cluster [Re: wfaulk]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
It also might be more likely to use a FibreChannel-based SAN where this sort of thing is expected. And there are SCSI-to-SAN adapters out there if you had some existing SCSI storage you wanted to use. This is a much more expensive option, but less hacky.
_________________________
Bitt Faulk

#267718 - 22/10/2005 00:06 Re: shared scsi storage in cluster [Re: wfaulk]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Apparently I'm a post-whore tonight.

I was just doing a little research, as it's been quite a while since I've done this (the proper term is "multi-initiator SCSI", BTW), and it seems that the SCSI spec explicitly allows for multiple initiators. In fact, it seems that they are actually allowed to both be active at the same time. I didn't realize that. The upshot of that, if it's true, is that you could conceivably have something set up to allow both computers to function at the same time, probably not accessing the same disks, but allowing one to take over the functions and storage of the other if and when the other goes down. Pretty snazzy: performance and redundancy.
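
The "both active, each takes over the other's storage" arrangement boils down to bookkeeping: every disk has exactly one owner at any moment, and failover just reassigns the dead node's disks. A minimal Python sketch, with hypothetical node and disk names:

```python
def take_over(ownership, failed_node, survivor):
    """Return a new disk -> node map with failed_node's disks given to survivor.

    Using a dict guarantees each disk has exactly one owner, so the two
    nodes can both be active without ever sharing a disk.
    """
    return {disk: (survivor if owner == failed_node else owner)
            for disk, owner in ownership.items()}


# Normal running: both machines busy, each on its own disks.
owners = {"disk0": "nodeA", "disk1": "nodeA", "disk2": "nodeB"}

# nodeA dies; nodeB picks up its storage.
print(take_over(owners, "nodeA", "nodeB"))
# {'disk0': 'nodeB', 'disk1': 'nodeB', 'disk2': 'nodeB'}
```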
_________________________
Bitt Faulk

#267719 - 22/10/2005 00:12 Re: shared scsi storage in cluster [Re: muzza]
Mataglap
enthusiast

Registered: 11/06/2003
Posts: 384
Without knowing the specifics of the hardware and software config and components, it's really hard to say. As weird as it sounds, the hardware level is the easiest part to do. Making the application work is what's hard.

Generally, in that configuration each controller can access all of the disks equally well. A SCSI bus (probably LVD SCSI) isn't like the IDE/ATA bus, and it's normally not a problem to have multiple controllers on the same bus -- at a base functional level -- as long as they have different SCSI IDs, or the storage device exposes the same SCSI ID on two different buses. Each controller will be able to do whatever it needs to.

The problem is that each controller knows =nothing= about what the other controller is doing. And so, commonly, you only ever see one controller on a bus.

In a single-controller setup, like just about every computer you've ever used, the operating system has complete control of the disk controller and the content that is on the disk. So there's coherency between all processes running on that computer and the contents of the disk. If one user deletes a file, the operating system actually does the delete, and any cached knowledge about the contents of the disk is consistent with what's actually on the disk.

In a shared storage situation, each controller (and the operating system that has access to that controller) can access the disks equally. So here's a scenario:

Code:
    Computer A                              Computer B
    ==========                              ==========
1.  Email account created                   -
2.  Email received and stored for user      -
3.  -                                       User reads email and deletes it
4.  Another email received, and         \   -
    written to the disk location         \
    immediately after the previous email  \
5.  BOOM! Neither server really knows what's going on
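
To make the "BOOM" concrete, here is a toy Python model, assuming each server caches its own map of free blocks at mount time and never re-reads it. The classes are hypothetical, the extra writes by B after step 4 are my addition to finish the story, and Exchange and real filesystems are far more involved.

```python
class SharedDisk:
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks


class Server:
    """A node with a private, never-refreshed cache of which blocks are free."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.free = set(range(len(disk.blocks)))  # snapshot taken at mount

    def store(self, message):
        block = min(self.free)       # first block *this* server thinks is free
        self.free.discard(block)
        self.disk.blocks[block] = message
        return block

    def delete(self, block):
        self.disk.blocks[block] = None
        self.free.add(block)


disk = SharedDisk(4)
a, b = Server("A", disk), Server("B", disk)

a.store("mail 1")   # step 2: lands in block 0
b.delete(0)         # step 3: B deletes it; A still thinks block 0 is in use
a.store("mail 2")   # step 4: A writes right after, in block 1
b.store("mail 3")   # B's stale cache still lists block 1 as free...
b.store("mail 4")   # ...so this overwrites mail 2 in block 1
print(disk.blocks)  # ['mail 3', 'mail 4', None, None]
```

Neither server got an error; mail 2 simply vanished, and A's cache still believes block 0 is holding its data.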



There are a number of solutions to prevent these kinds of race conditions.
1. Fine-grained locking internal to each instance of the application that is stored on the shared disk. Even then, it's really damn hard to prevent a race for creating the lock.

2. Have both/all instances of the application that accesses the disk communicate with each other constantly about who's accessing what part of the shared storage. Oracle's Distributed Lock Manager is a form of this, which is what makes Oracle Parallel Server/RAC work. Active-active cluster.

3. Only one node is actively running the application at a time, and the disks are only mounted on the active node. (Both nodes are up and running other applications that talk to each other with heartbeats and all sorts of stuff.) Active-passive/standby cluster. When there's a problem with one machine, the other machine notices, maybe because it doesn't get the heartbeat message or whatever, and its state transitions from passive to active: it forcibly takes control of the disks, and the application starts up and recovers the shared data store. Veritas Cluster Server frequently works like this, and the application doesn't need to be cluster-aware.
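
Option 2 is easiest to see with a single lock table. The Python sketch below is centralized for simplicity, whereas a real distributed lock manager typically spreads this state across the nodes and has to survive one of them dying; the `LockManager` class and region names are hypothetical. The interesting part is the atomic check-and-set in `acquire`, which removes the "race for creating the lock" from option 1.

```python
import threading


class LockManager:
    """Grants exclusive locks on named regions of the shared storage."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}  # region -> node currently holding it

    def acquire(self, node, region):
        # Check and set under one mutex, so two nodes can't both "win".
        with self._mutex:
            owner = self._owners.get(region)
            if owner is None or owner == node:
                self._owners[region] = node
                return True
            return False

    def release(self, node, region):
        with self._mutex:
            if self._owners.get(region) == node:
                del self._owners[region]


dlm = LockManager()
print(dlm.acquire("nodeA", "mailbox/alice"))  # True
print(dlm.acquire("nodeB", "mailbox/alice"))  # False: nodeA holds it
dlm.release("nodeA", "mailbox/alice")
print(dlm.acquire("nodeB", "mailbox/alice"))  # True
```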

That's a very basic description. It's reasonably easy to write a cluster solution that works 80%-90% of the time; writing one that you can count on 100% of the time is very, very hard.

Generally, installations of this type will also have a high level of redundancy in the disk storage. (You're more likely to have a disk failure than a server failure that wouldn't corrupt data.) RAID 5 with multiple hot spares if performance isn't a major concern, or two disk arrays & two servers that are connected to each other.

Code:
serverA      serverB
   |  \      /  |
   |   \    /   |
   |     \/     |
   |     /\     |
   |   /    \   |
 diskA       diskB



And do RAID 1 (either software with two separate SCSI controllers OR hardware on a dual-port controller) so that when either server does a disk write, it's dispatched to both arrays.
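
The mirroring rule (every write goes to both arrays, and any surviving array can serve reads) fits in a few lines. A toy Python sketch with dicts standing in for the two arrays; the `Mirror` class is hypothetical, and real software or controller RAID also has to resynchronise an array after it's replaced.

```python
class Mirror:
    """RAID 1 over two arrays: writes go to both, reads need only one."""

    def __init__(self, array_a, array_b):
        self.arrays = [array_a, array_b]  # each a dict: block -> data

    def write(self, block, data):
        for array in self.arrays:
            if array is not None:         # skip an array that has failed
                array[block] = data

    def read(self, block):
        for array in self.arrays:
            if array is not None and block in array:
                return array[block]
        raise KeyError(block)

    def fail(self, index):
        self.arrays[index] = None         # simulate losing one array


disk_a, disk_b = {}, {}
m = Mirror(disk_a, disk_b)
m.write(5, "mailbox data")
print(disk_a[5] == disk_b[5])  # True: dispatched to both
m.fail(0)                      # diskA goes away
print(m.read(5))               # mailbox data, served from diskB
```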

edit: And it's probably worth mentioning that the disk arrays you'd use in a situation like this aren't just simple external SCSI disks. They tend to have several SCSI buses: one that the disks are actually on, which is separate from the SCSI bus(es) that the external connectors use to connect to the servers.

--Nathan


Edited by Mataglap (22/10/2005 00:16)

#267720 - 22/10/2005 00:27 Re: shared scsi storage in cluster [Re: wfaulk]
Mataglap
enthusiast

Registered: 11/06/2003
Posts: 384
Quote:
You wouldn't want a simple network failure to cause both machines to think the other is down and cause them both to start writing. Maybe folks have implemented a check on the SCSI level these days.


In real-world implementations you have dedicated network connections between the nodes that use special protocols and are both high-bandwidth and low-latency. Back in the 10/100 days this was frequently SCI (Scalable Coherent Interface), which was more like Token Ring than Ethernet. Now, gigE is generally sufficient, though not running TCP/IP.

A best practices Veritas Cluster Server install will require two dedicated network interfaces running GAB/LLT plus a third (the normal network NIC) running TCP/IP. Node-to-node chatter including the heartbeat is sent over all three interfaces -- two of them dedicated to intra-cluster comms via x-over cables, and a lighter weight backup over your normal network connection and switch.
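
The point of three heartbeat paths is that a node should only be presumed dead when every path has gone quiet, not just one. A minimal Python sketch of that rule; the link names, timestamps and 5-second timeout are made-up examples, not VCS defaults or VCS's actual algorithm.

```python
def peer_is_down(last_heard, now, timeout):
    """Presume the peer dead only if *every* link has been silent too long.

    last_heard: dict of link name -> time (seconds) of the last heartbeat.
    """
    if not last_heard:
        raise ValueError("no heartbeat links configured")
    return all(now - t > timeout for t in last_heard.values())


# One crossover cable unplugged: the other two links still hear the peer.
print(peer_is_down({"llt0": 90.0, "llt1": 109.5, "public": 109.8},
                   now=110.0, timeout=5.0))   # False

# All three links silent: time to fail over.
print(peer_is_down({"llt0": 90.0, "llt1": 91.0, "public": 92.0},
                   now=110.0, timeout=5.0))   # True
```

Even this isn't bulletproof: if all three links die while both nodes are healthy, each will decide the other is gone, which is exactly the split-brain case raised earlier in the thread.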

--Nathan


Edited by Mataglap (22/10/2005 00:36)

#267721 - 22/10/2005 00:36 Re: shared scsi storage in cluster [Re: wfaulk]
Mataglap
enthusiast

Registered: 11/06/2003
Posts: 384
Quote:
It also might be more likely to use a FibreChannel-based SAN where this sort of thing is expected. And there are SCSI-to-SAN adapters out there if you had some existing SCSI storage you wanted to use. This is a much more expensive option, but less hacky.


And much, much easier to scale past two machines. Direct Attached SCSI clustering is almost always just two nodes, though there are a =few= disk arrays that can support more SCSI connections.

FibreChannel is actually much more like a network protocol than a peripheral bus, so it's much easier to do >2 node clusters with it; you have devices that are exactly like Ethernet switches and hubs for FC connections. Once again, though, you want hardware redundancy on these devices too (hence the multiple GAB/LLT interfaces in a VCS configuration, each connecting to a separate switch).

#267722 - 23/10/2005 10:29 Re: shared scsi storage in cluster [Re: Mataglap]
muzza
Pooh-Bah

Registered: 21/07/1999
Posts: 1765
Loc: Brisbane, Queensland, Australi...
The SCSI to SATA controller we're considering is this one from Areca, which sits in this case from Chenbro. Do you know if it supports multiple slave nodes on the bus? I would have thought that in the arrangement you described, the passive node simply takes over the whole bus.
However, the way the system was described to us, both servers were active. Was their description wrong?
Thanks for your help here btw!
_________________________
-- Murray I What part of 'no' don't you understand? Is it the 'N', or the 'Zero'?

#267723 - 24/10/2005 01:39 Re: shared scsi storage in cluster [Re: muzza]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
The array controller you are looking at suggests that this may be a solution similar to the MSA500. I can't confirm that this is the case, as the web site isn't letting me get to any product documentation. I can, however, share how the MSA500 works.

Basically, it's a poor man's SAN. Each server needs an Ultra320 SCSI card (or rather a Smart Array card), and the MSA500 has a Smart Array-based controller in it. The SCSI connections from the servers to the MSA500 are isolated from each other, so each chain carries only two devices, avoiding contention issues in high-load environments. The disks are then attached to the MSA500 on their own SCSI chain. The configuration we allow has a maximum of four computers running Linux or Novell clustering solutions, or a two-node Windows cluster. There are options for redundant controllers in the MSA for automatic failover, and also an option for two-node, two-cable/HBA setups with some "multipath" software to keep Windows from having issues when the storage path changes.

Generally, the classic setup of two controllers on a single SCSI chain is a bad idea in any environment that might see high load, because one of the two controllers would have a higher-priority ID and could monopolize the bus whenever it got busy. As far as I know, Compaq and Digital never sold such a solution, instead always going with setups like the MSA500 that isolated the two servers' SCSI controllers from each other.
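
Here's why the higher ID wins. In parallel SCSI arbitration, the highest-priority ID arbitrating at the same moment takes the bus: on a narrow bus that's simply the highest number, and on a wide bus IDs 7 down to 0 outrank 15 down to 8. A Python sketch of that ordering; the function names are hypothetical, and it ignores any fairness mechanism a particular bus revision may add.

```python
def arbitration_priority(scsi_id):
    """Rank a SCSI ID for arbitration; a higher rank wins the bus."""
    if 0 <= scsi_id <= 7:
        return scsi_id + 8   # 7 -> 15 (highest) ... 0 -> 8
    if 8 <= scsi_id <= 15:
        return scsi_id - 8   # 15 -> 7 ... 8 -> 0 (lowest)
    raise ValueError(f"invalid SCSI ID {scsi_id}")


def arbitrate(contenders):
    """Return the ID that wins when several devices arbitrate at once."""
    return max(contenders, key=arbitration_priority)


print(arbitrate([7, 6]))   # 7: the ID 7 host beats the ID 6 host every time
print(arbitrate([15, 0]))  # 0: even ID 0 outranks the wide-only IDs
```

Under sustained load, the ID 6 controller only gets the bus when the ID 7 controller isn't arbitrating at that instant, which is the monopolizing described above.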

#267724 - 24/10/2005 12:22 Re: shared scsi storage in cluster [Re: wfaulk]
schofiel
carpal tunnel

Registered: 25/06/1999
Posts: 2993
Loc: Wareham, Dorset, UK
Quote:
it's electrically possible and protocol-possible, but, I think, only as a side effect of the specification, not as a design point.


Multi-master was one of the original concepts in SCSI-1, following on from the original SASI single-master spec. The concept was properly codified in the SCSI-2 spec. I operated a multi-master setup in a medical imager, but the targets were not block (i.e. disk) device classes (at least not with modifiable device attributes).

A lot of the access control mechanisms are actually built into SCSI at the protocol level (i.e. in the devices and in the transport), but due to the massively Windows-centric nature of the user world since the mid-90s, single-controller, master-to-disk setups have become prevalent. About 90% of the SCSI spec remains unused, and many devices don't use a significant proportion of the bus's available capabilities. Our system talked to a CPU, a co-processor and an ultrasound scanner (as a block class device) in parallel between two host controllers. It was a really impressive, rapid system, able to do a lot of parallelisation just by exploiting more of SCSI's feature set.

Shame. SCSI had enormous potential. Never exploited, and under-used: now it's been committed to the graveyard.
_________________________
One of the few remaining Mk1 owners... #00015

#267725 - 24/10/2005 12:30 Re: shared scsi storage in cluster [Re: wfaulk]
schofiel
carpal tunnel

Registered: 25/06/1999
Posts: 2993
Loc: Wareham, Dorset, UK
There are many, many features built into SCSI that are not used. Disk to disk copying and mirroring without host intervention. Command list sequencing between multiple devices to get the devices to carry out the work without host intervention. Multiple device classes (block, serial, CPU, Co-pro, sink, scanner, all sorts). Multi-Master operation. Host to host rapid data transfer (remember IP over SCSI? pre-dated gigabit and was DAMN fast over optical). File system support primitives for device locking down to single-sector operations. Multiple overlapping access requests queued by the target device itself, calling back to the initiating device (a host, or even another device).

The concept of a "Host" on SCSI never really existed, it was always Initiator and Target, and this means that any device, not just computer host adaptors, could initiate control actions on any other, in any order, at any time. Wonderful system.

If you haven't guessed yet, I was seriously into SCSI at one time
_________________________
One of the few remaining Mk1 owners... #00015

#267726 - 24/10/2005 19:08 Re: shared scsi storage in cluster [Re: schofiel]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Quote:
calling back to the initiating device


This gets used quite a bit, actually, in newer devices. For example, a disk with a good chunk of cache can accept the data that needs to be written from the initiator, then disconnect from the bus, freeing it up for others. Once it flushes the cache out to the disk, it calls back to the initiator and reports that the write is complete, allowing more data to be sent.
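
A toy Python model of that disconnect/reselect sequence, assuming a write-back cache and in-order completion (real drives with tagged queuing may complete out of order); the class and the tag numbers are hypothetical. The log shows the bus being free for the second initiator while the first write is still sitting in cache.

```python
from collections import deque


class CachingTarget:
    """Disk that disconnects after taking data into cache, then reselects."""

    def __init__(self):
        self.cache = deque()  # (initiator ID, tag) of writes not yet on media
        self.log = []

    def accept_write(self, initiator, tag):
        self.cache.append((initiator, tag))
        self.log.append(f"ID {initiator} tag {tag}: data cached, disconnect")

    def flush_one(self):
        initiator, tag = self.cache.popleft()  # oldest write hits the media
        self.log.append(f"reselect ID {initiator}: tag {tag} complete")


disk = CachingTarget()
disk.accept_write(7, tag=1)  # host at ID 7 sends data, bus is released
disk.accept_write(6, tag=1)  # host at ID 6 uses the free bus meanwhile
disk.flush_one()
disk.flush_one()
print("\n".join(disk.log))
```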

Quite a few interesting features of SCSI did get moved into Fibre Channel as well, allowing a tape library to back up a disk array without talking to a server. Ironically, instead of IP over SCSI though, things are moving towards SCSI over IP on fibre lines, to help integrate and simplify high speed data and TCP/IP networks.

#267727 - 25/10/2005 19:15 Re: shared scsi storage in cluster [Re: muzza]
Mataglap
enthusiast

Registered: 11/06/2003
Posts: 384
Quote:
the SCSI to SATA controller we're considering is this one from Areca which sits in this case from Chenbro. Do you know if it supports multiple slave nodes on the bus? I would have thought that in the arrangement you described, the passive node simply takes over the whole bus.
However, the way the system was described to us was that both servers were active. Was their description wrong?
Thanks for your help here btw!


Simultaneous multiple controllers on the same bus aren't a problem, as others who know far more about SCSI than I do have explained. There is no concept of active/active or active/passive nodes on the SCSI bus; that's a higher-level application thing.

I've only glanced at the parts you mention. If you don't trust your consultants to recommend the right parts, you have other, larger problems.

As the controller is a SCSI-SATA device, it should be clear that the individual disks aren't on the SCSI bus that is connected to the servers. The SCSI device (the SCSI side of the controller) in your array is going to be an abstract block device (logical disks), not individual disks. The SATA side is going to manage the individual (physical) disks, and there's non-trivial logic in between that will "translate" between the two.
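
As a taste of that "translation", here is the simplest possible case in Python: plain striping (RAID 0) of one logical disk across several physical ones. This is purely an illustrative assumption; a real array would typically add parity (e.g. RAID 5), spares and caching on top, and the 4-disk, 128-block stripe values are made up.

```python
def map_lba(lba, disks, stripe_blocks):
    """Map a logical block on the exported LUN to (disk index, physical block).

    Plain RAID 0 striping: the logical disk is cut into stripe_blocks-sized
    chunks that are dealt out to the disks round-robin.
    """
    chunk = lba // stripe_blocks     # which chunk of the logical disk
    offset = lba % stripe_blocks     # position inside that chunk
    disk = chunk % disks             # chunks rotate across the disks
    row = chunk // disks             # full rows of chunks before this one
    return disk, row * stripe_blocks + offset


# 4 disks, 128-block stripe unit (example values):
print(map_lba(0, 4, 128))    # (0, 0)
print(map_lba(130, 4, 128))  # (1, 2)
print(map_lba(600, 4, 128))  # (0, 216)
```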

--Nathan

#267728 - 26/10/2005 10:58 Re: shared scsi storage in cluster [Re: Mataglap]
muzza
Pooh-Bah

Registered: 21/07/1999
Posts: 1765
Loc: Brisbane, Queensland, Australi...
Quote:
The SCSI device (SCSI side of the controller) in your array is going to be an abstract block device -- logical disks --, and not individual disks. The SATA side is going to manage the individual -- physical -- disks, and there's non-trivial logic in between the two that will "translate" between the two.


That's what I thought.
I found out today that it really is just a single 'main' server with a 'secondary' hot spare. If the main dies, the secondary takes over seamlessly.
Because the RAID SCSI<->SATA controller is one, maybe two, IDs on the bus, is it conceivable that other similar units could be added if needed? Wouldn't the computers just see several individual SCSI units on the bus?
_________________________
-- Murray I What part of 'no' don't you understand? Is it the 'N', or the 'Zero'?

#267729 - 26/10/2005 14:43 Re: shared scsi storage in cluster [Re: muzza]
Mataglap
enthusiast

Registered: 11/06/2003
Posts: 384
Conceivable? Yes. Practical? Not so much.
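
On the "conceivable" side, the ceiling is just ID arithmetic. A Python sketch, assuming a wide bus (16 IDs) with both servers' controllers taking one ID each; `max_units` is a hypothetical helper.

```python
def max_units(ids_per_unit, initiators=2, wide=True):
    """How many storage units fit on one bus, given the IDs each one uses."""
    total_ids = 16 if wide else 8
    return (total_ids - initiators) // ids_per_unit


print(max_units(1))              # 14
print(max_units(2))              # 7
print(max_units(1, wide=False))  # 6
```

That's only the addressing limit. Every unit on the chain still shares the one bus's bandwidth, and the cluster software has to manage each extra logical disk, which is presumably where "practical" comes in.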
