Information Redundancy


Information can be defined as a group of bits organised in a well-defined manner. In a real-time system, a large amount of information has to be transferred in a very short time and in a secure manner. There are many hazards during the transfer of information: for example, the information can be corrupted by interference with another stream of bits, or bits can be duplicated. There are many forms of information redundancy; of these, let us discuss four of the most important.

Duplication

Duplication is the process by which extra copies of bits are created and passed along with the stream that was actually sent, so the data received at the receiver end does not match what was transmitted. For example, consider 11001010 as the stream of bits forwarded by the user; if in the middle of the transmission the last bit 0 is duplicated, the stream arrives as 110010100, an invalid stream of bits. This duplication of bits results in data loss and causes integrity problems.

Parity Bits

A parity bit is a bit that is added to ensure that the number of bits with value one in a given set of bits is always even or odd. Parity bits are the simplest form of error-detecting code. There are two variants: the even parity bit and the odd parity bit. An even parity bit is set to 1 if the number of ones in the given set of bits is odd (making the total number of ones, including the parity bit, even). An odd parity bit is set to 1 if the number of ones in the given set of bits is even (making the total number of ones, including the parity bit, odd).

Even parity is actually a special case of a cyclic redundancy check (CRC), where the 1-bit CRC is generated by the polynomial x+1. If the parity bit is present but not used, it may be referred to as mark parity, where the parity bit is always 1, or as space parity, where the bit is always 0. If an odd number of bits (including the parity bit) are changed in the transmission of a set of bits, the parity bit will be incorrect and will thus indicate that an error in transmission has occurred.

Therefore, a parity bit is an error-detecting code, but not an error-correcting code: there is no way to determine which particular bit is corrupted. The data must be discarded entirely and re-transmitted from scratch. On a noisy transmission medium a successful transmission could take a long time, or may never occur.
Parity does have the advantage, however, that it is about the best possible code that uses only a single bit of space, and it requires only a handful of XOR gates to generate. Parity checking is often used for transmission of ASCII characters, since they have 7 bits and the 8th bit can conveniently serve as the parity bit. The duplication problem described earlier can also be caught this way: the sender counts the ones in the stream and appends a bit that makes the total even (or odd), and the receiver recomputes the count and compares. So, consider 1001 as the information bit stream; if the transmission is interrupted and the stream is modified to 10010, this can be detected using the parity bit.

A wants to transmit : 1001

A computes even parity value: 1^0^0^1 = 0

A sends: 10010

*** TRANSMISSION ERROR ***

B receives: 10011

B computes overall parity: 1^0^0^1^1 = 1

B reports incorrect transmission after observing the unexpected odd result. The drawback of this method is that it cannot detect an error when an even number of bits are flipped: for example, 1001 and 1100 differ in two bit positions but contain the same number of ones, so they produce the same parity bit and the corruption goes unnoticed.
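The parity computation in the exchange above can be sketched in Python (a minimal illustration; the function names are my own):

```python
def even_parity_bit(bits: str) -> str:
    """Even parity: the appended bit is 1 exactly when the count of ones is odd."""
    return str(bits.count("1") % 2)

def check_even_parity(frame: str) -> bool:
    """A frame (data + parity bit) is valid if its total number of ones is even."""
    return frame.count("1") % 2 == 0

data = "1001"
frame = data + even_parity_bit(data)   # "10010", as A sends above
assert check_even_parity(frame)        # intact frame passes the check
assert not check_even_parity("10011")  # single-bit error: detected
assert check_even_parity("11000")      # two-bit error (1001 -> 1100): undetected
```

The last assertion shows the drawback from the text: an even number of flipped bits leaves the parity unchanged.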

Checksum

A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and comparing it with the stored one. If the checksums do not match, the data was certainly altered. The procedure that yields the checksum from the data is called a checksum function or checksum algorithm. A good checksum algorithm will, with high probability, yield a different result when the data is accidentally corrupted; if the checksums match, the data is very likely to be free of accidental errors. There are many checksum algorithms; of these, three are important, namely single precision, double precision and Honeywell. Let us see each in brief.

Single Precision:

Let us consider 1011 1000 1001 1000 as the information stream. In the single-precision algorithm we add the stream word by word, 4 bits at a time, as shown below.

1011

1000

1001

1000

Adding these words gives 10 0100; we discard the carry out of the 4-bit word, leaving the checksum 0100. The checksum is sent along with the data, and the receiver recomputes it on its side; if the receiver's checksum differs, it is certain that the data has been corrupted.
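A small Python sketch of the single-precision checksum described above (the function name is illustrative; discarding the carry at each addition is equivalent to taking the final sum modulo 2**width):

```python
def single_precision_checksum(words, width=4):
    """Add the n-bit words, discarding any carry out of the top bit."""
    return sum(words) % (1 << width)

words = [0b1011, 0b1000, 0b1001, 0b1000]
checksum = single_precision_checksum(words)
print(format(checksum, "04b"))  # 0100 -- 11+8+9+8 = 36, and 36 mod 16 = 4
```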

Double Precision:

This is similar to the single precision described above. The only difference is that in double precision the sum is accumulated in a register twice the word width, so the carry is not discarded: the carry bits are sent as part of the checksum, and the receiver computes the same double-width checksum and expects it to match, carry included.
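A minimal Python sketch of double precision, assuming 4-bit words as in the example above (the function name is mine): the sum is kept in a register twice the word width, so no carry is lost.

```python
def double_precision_checksum(words, width=4):
    """Accumulate the sum of n-bit words in a 2n-bit register (carries retained)."""
    return sum(words) % (1 << (2 * width))

words = [0b1011, 0b1000, 0b1001, 0b1000]
print(format(double_precision_checksum(words), "08b"))  # 00100100 -- 36 kept in 8 bits
```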

Honeywell:

In the Honeywell method we again add the information words, but first concatenate them into 8-bit words; the addition then generates an 8- or 9-bit checksum (the carry is kept), which is forwarded to the receiver. The format is as specified:

1011 1000

1001 1000

————-

1 0101 0000

————-

This is beneficial when a large amount of information has to be transferred in a secure manner. There are many other checksum functions, such as parity byte or parity word, modular sum, and position-dependent checksums.
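The Honeywell scheme above can be sketched as follows, assuming 4-bit words are paired into 8-bit words before summing and a single carry bit is kept (the function name is illustrative):

```python
def honeywell_checksum(words, width=4):
    """Concatenate consecutive n-bit words into 2n-bit words, then add them,
    keeping the carry bit (an 8- or 9-bit result for 4-bit input words)."""
    doubles = [(words[i] << width) | words[i + 1] for i in range(0, len(words), 2)]
    return sum(doubles) % (1 << (2 * width + 1))

words = [0b1011, 0b1000, 0b1001, 0b1000]
# 1011 1000 + 1001 1000 = 1 0101 0000
print(format(honeywell_checksum(words), "b"))  # 101010000
```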

AN Codes (Arithmetic Codes):

AN codes are arithmetic codes used to detect errors during transmission: each data word N is multiplied by a constant A before transmission, and the receiver accepts a received word only if it is an exact multiple of A. Let us discuss one of the common AN codes.

ANCODE x3:

4 x 3 = 0100 x 3 = 1100

5 x 3 = 0101 x 3 = 1111

9 x 3 = 1001 x 3 = 11011

Similarly, many other arithmetic codes (other choices of A) can be used. Whenever the transmitter uses an arithmetic code, the receiver must be informed which code is in use so that it can perform the corresponding check. All of these are forms of information redundancy used to protect information during transmission.
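The x3 AN code can be sketched in Python (illustrative names; the receiver's check is simply divisibility by 3):

```python
A = 3  # the "x3" AN code from the examples above

def an_encode(n: int) -> int:
    """Encode data word n as the codeword A*n."""
    return A * n

def an_check(codeword: int) -> bool:
    """A received word is valid only if it is an exact multiple of A."""
    return codeword % A == 0

assert an_encode(9) == 0b11011        # 9 x 3 = 27
assert an_check(an_encode(5))         # 15 = 1111: a valid codeword
assert not an_check(0b11101)          # 29: a corrupted word is rejected
```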

A special thanks to Rayabhagi Srikanth for his contribution. (The contents have not been checked for originality.)


Hardware Redundancy


  • Use of additional hardware to compensate for failures.
  • This can be done in two ways:

    • Fault detection, correction and masking: multiple hardware units may be assigned to do the same task in parallel and their results compared. If one or more units are faulty, the failure shows up as a disagreement in the results.
    • The second way is to replace the malfunctioning units.
  • Redundancy is expensive; duplicating or triplicating the hardware is justified only in the most critical applications.

Two methods of hardware redundancy are given below:

  • Static Pairing
  • N modular Redundancy (NMR)

Static Pairing

  • Hardwire processors in pairs and discard the entire pair if one of the processors fails; this is a very simple scheme.
  • The pair runs identical software with identical inputs and should generate identical outputs. If the outputs are not identical, the pair is non-functional, so the entire pair is discarded.
  • This approach is depicted in the following figure. It will work only as long as the interface is working correctly and the two processors do not fail identically at around the same time.
(Figure: Static Pairing)
Static Pairing with Monitor

  • The interface itself can fail, so it is watched by a monitor. If the interface fails, the monitor catches it, and if the monitor fails, the interface catches it; only if both the interface and the monitor fail does the system go down. The monitor block is added as a dotted box in the above figure.
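The pairing-and-compare behaviour can be sketched as a toy Python model (the function names are mine; the two functions stand in for the paired processors):

```python
def static_pair(proc_a, proc_b, inputs):
    """Run the same computation on both processors of a hardwired pair.
    Any disagreement means the pair is non-functional and is discarded."""
    out_a = proc_a(inputs)
    out_b = proc_b(inputs)
    if out_a != out_b:
        return None  # outputs disagree: discard the entire pair
    return out_a

def square(x):           # the identical software both processors run
    return x * x

def faulty_square(x):    # models a processor that has failed
    return x * x + 1

assert static_pair(square, square, 5) == 25           # healthy pair: output accepted
assert static_pair(square, faulty_square, 5) is None  # disagreement: pair discarded
```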

N Modular Redundancy

  • N modular redundancy (NMR) is a scheme for forward error recovery.
  • It uses N processors instead of one and votes on their outputs; N is usually odd.
  • NMR can be organised in either of the following two ways:

    • There are N voters and the entire cluster produces N outputs
    • There is just one voter

    (Figure: N Modular Redundancy)
  • NMR clusters are designed to allow the purging of malfunctioning units. That is, when a failure is detected, the failed unit is checked to see whether or not the failure is transient. If it is not, it must be electrically isolated from the rest of the cluster and a replacement unit is switched on. The faster the unit is replaced, the more reliable the cluster.

    (Figure: Self Purging)
  • Purging can be done either by hardware or by the operating system. Self-purging consists of a monitor at each unit comparing the unit's output against the voted output; if there is a difference, the monitor disconnects the unit from the system. The monitor can be described as a finite state machine with two states, connect and isolate, and two signals: diff, which is set to 1 whenever the module output disagrees with the voter output, and reconnect, which is a command from the system to reconnect the module.
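The voter and the self-purging monitor can be sketched as a toy Python model (names and structure are my own; the monitor is the two-state connect/isolate machine described above):

```python
from collections import Counter

def majority_vote(outputs):
    """Single-voter NMR: return the most common output of the N modules (N odd)."""
    value, _count = Counter(outputs).most_common(1)[0]
    return value

class PurgingMonitor:
    """Per-module monitor: a finite state machine with states connect/isolate.
    diff (module output != voted output) drives it to isolate; a reconnect
    command from the system drives it back to connect."""
    def __init__(self):
        self.state = "connect"

    def observe(self, module_out, voted_out):
        if module_out != voted_out:   # diff signal = 1
            self.state = "isolate"    # disconnect the module from the system

    def reconnect(self):              # reconnect command from the system
        self.state = "connect"

outputs = [42, 42, 7]                 # triplex cluster; third module is faulty
voted = majority_vote(outputs)        # masked result: 42
monitors = [PurgingMonitor() for _ in outputs]
for m, out in zip(monitors, outputs):
    m.observe(out, voted)
assert [m.state for m in monitors] == ["connect", "connect", "isolate"]
```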

Fault and Error Containment

A fault in one part of the system can cause, for example, large voltage swings in other parts of the system. It is therefore necessary to prevent faults and errors from spreading through the system. This is called containment.

This can be divided into

  • Fault Containment Zone (FCZ) and
    • A failure of some part of the computer outside an FCZ cannot cause any element inside that FCZ to fail
    • Hardware inside the FCZ must be isolated from the outside system.
    • Each FCZ should have an independent power supply and its own clock (which may be synchronized with the other clocks).
    • Typically, the FCZ consists of a whole computer, which includes processors, memory, I/O and control interfaces.
  • Error Containment Zone (ECZ)
    • Prevent errors from propagating across zone boundaries. This is achieved by voting on redundant outputs, where the redundancy may be:
      • Hardware Redundancy
      • Software Redundancy
      • Time Redundancy
      • Information Redundancy

Introduction to Fault Tolerance

Fault Tolerance Techniques

  • Introduction
    • Hardware faults – occur due to a physical defect in the system, like a broken wire or a logic value stuck at 0 in a gate.
    • Software faults – occur due to a bug introduced into the system, so the software misbehaves for a given set of inputs.
    • Error – the manifestation of a fault (a fault may occur at any time, but it is the error that manifests the fault).
    • Fault latency – the time between the onset of a fault and its manifestation as an error.
    • Error Recovery
      • Forward Error Recovery – the error is masked without any computations having to be redone.
      • Backward Error Recovery – the system is rolled back to a moment in time before the error is believed to have occurred.

What Causes Failures?

There are three main causes of failures:

  • Errors in the specification or design
    • Mistakes in the specification and design are very difficult to guard against.
    • Many hardware failures and all software failures occur due to such mistakes.
    • It is difficult to ensure that the specification is completely right.
  • Defects in the components
    • Hardware components can develop defects.
    • Wear and tear of components
  • Environmental effects
    • Devices can be subjected to a whole array of stresses, depending on the application.
    • High ambient temperatures can melt components or otherwise damage them.

Fault Types

Faults are classified according to their temporal and output behavior.

  • Temporal behavior classification

    (Figure: fault state-transition diagram. A(t) is the rate at which a fault becomes active, B(t) the rate at which an active fault becomes benign, C(t) the rate at which a fault disappears, and D(t) the rate at which a benign fault becomes active again.)

  • Permanent faults
    • Does not die away with time, remains until it is repaired
    • Ex. Broken wires
    • From the above diagram: A(t) > 0; B(t) = C(t) = D(t) = 0
  • Intermittent Fault
    • It cycles between the fault active and fault benign states.
    • Ex. caused by loose wires
    • From the above diagram: A(t) > 0; B(t) > 0; D(t) > 0; C(t) = 0
  • Transient Fault
    • Dies away after some time
    • Ex: environmental effects
    • From the above diagram: A(t) > 0; C(t) > 0; B(t) = D(t) = 0
  • Output behavior classification
    • Malicious
    • Non malicious