Representation of File System in the Kernel

The representation of data on a floppy disk and on a hard disk may differ considerably, but their representation in the kernel is almost the same.

A file system implementation registers itself with the VFS through the function

  • register_filesystem(struct file_system_type *fs),
  • Example

register_filesystem(&ext2_fs_type);
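As a rough user-space model of what registration amounts to, the VFS can be pictured as keeping a list of known file system types. This is only a sketch: the names fs_type, fs_find and fs_register are hypothetical and are not kernel APIs, and the kernel's real struct file_system_type carries more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct file_system_type. */
struct fs_type {
    const char *name;        /* for example "ext2" */
    struct fs_type *next;    /* next registered type */
};

static struct fs_type *fs_list = NULL;   /* head of the registry */

/* Look up a registered type by name; returns NULL if it is unknown. */
static struct fs_type *fs_find(const char *name)
{
    for (struct fs_type *p = fs_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Add a type to the registry; returns 0 on success, -1 if the name is taken. */
static int fs_register(struct fs_type *fs)
{
    assert(fs != NULL && fs->name != NULL);
    if (fs_find(fs->name) != NULL)
        return -1;
    fs->next = fs_list;
    fs_list = fs;
    return 0;
}
```

In this model, mounting would start with a call such as fs_find("ext2") to locate the code responsible for the requested file system type.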

Mounting

  • Before a file can be accessed, the file system containing the file must be mounted.
  • This is done with the mount system call, or with the function mount_root() for the root file system.
  • Every mounted file system is represented by a super_block structure.
  • The VFS function read_super() is used to initialize the superblock.
    • The superblock is used for managing the file system.
    • Flags can be passed during mounting, such as

MS_RDONLY //File system is read only

MS_REMOUNT //flags have been changed

 

Superblock Operations

The superblock provides the functions needed to access and process the file system. Some of these operations are

read_inode(); // mandatory; fills in the inode structure passed to it

write_inode(); // mandatory; stores the information held in the inode structure

put_inode(); // optional; releases the blocks occupied by the inode and any other resources

delete_inode(); // optional; deletes the inode so that the file system no longer references it

 

Inode Operation

  • The inode is used for file management
  • Each inode has a unique number that identifies a file
  • Inode operations

create();//fills the mode attributes of the inode dentry

lookup();

link(); //used to create a hard link

unlink(); //deletes the file indicated by dentry.

mkdir();

rmdir();//deletes the subdirectory entry

rename(); // moves a file or changes its name

permission(); // checks the access rights

File and its Operation

The file structure provides

  • Access rights such as reading and writing
  • The current position
  • Access flags and the number of accesses

File Operations

read(); //read the data and put in the user address space

write();//copy data from user address space to the file

mmap(); // the file is mapped to user address space

open(); //create a file structure

flush(file); //called when the file is closed

lock(); // lock is called if file locks are set

Linux File System

Linux supports a large number of file systems, which is one reason it gained acceptance so quickly.

Linux has a unified interface called the Virtual File System (VFS), which sits between the OS kernel and the different file systems.

[Figure: the VFS between the OS kernel and the individual file systems]

The Virtual file System

  • Supplies the applications with the system calls for file management
  • Maintains the internal structures
  • Passes tasks on to the appropriate actual file system
  • Performs default actions

Basic Principles

  • Two main factors are taken into consideration when designing any file system
    • Speed of data access
    • Facility for random access
      • Random access is made possible by block-oriented devices, which are divided into a specific number of equal-sized blocks.
      • When using these blocks, Linux uses a buffer cache to speed up access.
  • In Linux/Unix, data is stored in a hierarchical file system containing not only files and directories but also device files, FIFOs (named pipes), symbolic links and sockets.
  • Every file is represented by a file structure and an inode structure. A particular file can always be accessed through its inode, which is identified by its unique inode number.
  • Directories
    • Directories give the file system its hierarchical structure. They are also implemented as files, but the kernel assumes that they contain pairs consisting of a filename and its inode number.
    • In older versions of Unix it was possible to modify directory files with a simple text editor, but for the sake of consistency this is no longer allowed in newer versions.
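The pairing of filename and inode number can be observed from user space. The following is a minimal sketch using the standard opendir()/readdir() interface; the helper name dir_lookup_inode is made up for this illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/*
 * Scan directory `dir` for an entry called `name`.
 * Returns the inode number stored with that name, or 0 if there is no
 * such entry or the directory cannot be opened.
 */
static ino_t dir_lookup_inode(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;

    ino_t found = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, name) == 0) {
            found = e->d_ino;   /* inode number paired with this filename */
            break;
        }
    }
    closedir(d);
    return found;
}
```

A call such as dir_lookup_inode("/etc", "passwd") would return the inode number that the directory /etc stores next to the name passwd, assuming that entry exists.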

Structure of UNIX File system

A UNIX file system is divided into the following areas

[Figure: layout of a UNIX file system]

  • Boot block
    • Each file system starts with a boot block. This block is reserved for the code required to boot the operating system.
  • Superblock
    • All information that is essential for managing the file system is held in the superblock.
  • Inode blocks
    • Contain the inode structures for the files.
  • Data blocks
    • Contain ordinary files along with the directory entries and the indirect blocks.

In Unix, separate file systems are not accessed via device identifiers, as is the case in some other operating systems, but are combined in a single hierarchical tree.

This is done by mounting, which adds another file system to the existing directory tree. A new file system can be mounted on any directory; that directory is then called the mount point. Unmounting the file system releases the original directory structure again.

Sockets for Inter process communication

The socket programming interface provides communication via a network as well as locally on a single computer. An example is the INET daemon (inetd), which waits for incoming network service requests and then calls the appropriate service program, using the socket file descriptor as its standard input and output.

Implementation of Unix domain sockets

  • Represented by the kernel data structure socket
  • Socket-specific functions such as socket() and setsockopt()
  • These functions are implemented with a single system call, socketcall, which dispatches to the required function according to its first parameter. The file operations read(), write(), poll(), ioctl(), lseek() and close() are called directly.
  • Operations on Unix domain sockets
    • long sys_socket(int family, int type, int protocol); // creates a socket file descriptor
    • long sys_connect(int fd, struct sockaddr *uservaddr, int addrlen); // connects the socket to the Unix domain address of the specified length
    • long sys_listen(int fd, int backlog); // the server informs the kernel that connections are accepted from now on
    • long sys_accept(int fd, struct sockaddr *upeer_sockaddr, int *upeer_addrlen); // the server takes a pending connection and receives a new file descriptor for it
    • long sys_getsockname(int fd, struct sockaddr *usockaddr, int *usockaddr_len); // returns the address bound to the socket
    • long sys_shutdown(int fd, int how); // shuts the connection down; how states whether sending, receiving or both are no longer allowed
    • Sending and receiving on sockets
      • Messages can be sent either as datagrams or as a stream of bytes
      • long sys_send(int fd, void *buff, int len, unsigned flags);
      • long sys_sendmsg(int fd, struct msghdr *msg, unsigned int flags);
      • long sys_recv(int fd, void *buff, int len, unsigned flags);
      • long sys_recvmsg(int fd, struct msghdr *msg, unsigned int flags);
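From user space these kernel entry points are reached through the usual library calls. The sketch below runs a server and a client socket inside one process to show the order of the calls; unix_roundtrip is a hypothetical helper name, and the socket path is chosen by the caller.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Send `msg` from a client socket to a server socket bound at `path`.
 * Returns the number of bytes the server side received into `out`,
 * or -1 on error.
 */
static ssize_t unix_roundtrip(const char *path, const char *msg,
                              char *out, size_t outlen)
{
    struct sockaddr_un addr;
    ssize_t n = -1;
    int srv = -1, cli = -1, conn = -1;

    assert(strlen(path) < sizeof(addr.sun_path));
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                         /* remove a stale socket file */

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || cli < 0)
        goto done;
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    if (listen(srv, 1) < 0)               /* connections accepted from now on */
        goto done;
    /* The server is listening, so the connection is queued right away. */
    if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    conn = accept(srv, NULL, NULL);       /* take the queued connection */
    if (conn < 0)
        goto done;

    if (send(cli, msg, strlen(msg), 0) < 0)
        goto done;
    shutdown(cli, SHUT_WR);               /* the client sends nothing more */
    n = recv(conn, out, outlen, 0);

done:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    unlink(path);
    return n;
}
```

In a real application the server and the client would be separate processes; keeping both ends in one process only keeps the example short.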

Debugging using ptrace

ptrace is a system call provided in Unix that lets one process take control of another, typically in order to debug it.

The process under control can be run step by step, and its memory can be read and modified.

int sys_ptrace(long request, long pid, long addr, long data);

The function handles the various requests defined in the parameter request; pid is the process ID of the process to be controlled.

Using the request PTRACE_TRACEME, a process can specify that its parent process controls it via ptrace().
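A minimal sketch of this mechanism from user space follows. It assumes a Linux system where ptrace() is permitted; the helper peek_child_word and the variable traced_value are invented for the illustration. The child requests tracing with PTRACE_TRACEME and stops itself, and the parent then reads one word of the child's memory and lets it continue.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this variable has the same address in parent and child. */
static volatile long traced_value = 0;

/*
 * Fork a child that asks to be traced, wait until it stops, read one word
 * of its memory and let it run to completion.
 * Returns the word read from the child, or -1 on error.
 */
static long peek_child_word(void)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                        /* child: the traced process */
        traced_value = 42;                 /* changes only the child's copy */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);                    /* stop and hand control to the parent */
        _exit(0);
    }

    /* parent: the controlling process */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)&traced_value, NULL);
    if (word == -1 && errno != 0)
        word = -1;                         /* the read itself failed */

    ptrace(PTRACE_CONT, pid, NULL, NULL);  /* resume the child */
    waitpid(pid, &status, 0);
    assert(pid > 0);
    return word;
}
```

The parent's own traced_value stays 0, so a result of 42 can only have come from the child's address space.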

System V IPC

Three forms of System V IPC

  • Semaphores

  • Message Queues

  • Shared Memory

System V IPC differs from the POSIX API, but both are available in the Linux kernel.

As of kernel version 2.2, the GNU C library includes the POSIX interfaces for shared memory and semaphores.

Access Rights and Numbers

struct kern_ipc_perm
{
    key_t key;          // key
    uid_t uid;          // owner user ID
    gid_t gid;          // owner group ID
    uid_t cuid;         // creator user ID
    gid_t cgid;         // creator group ID
    mode_t mode;        // access mode
    unsigned long seq;  // counter used to calculate the identifier
};

User and group IDs need 32 bits on the Intel 32-bit architecture, so the kernel supports both the IPC_OLD and the IPC_64 variants.

Semaphores (System V)

  • Arrays of semaphores can be set up using the system calls.
  • Several semaphores of an array can be modified in a single operation.
  • They can be incremented or decremented in steps greater than 1.

Semaphore arrays are represented by the following structures

struct sem_array
{
    struct kern_ipc_perm sem_perm;        // access permissions
    time_t sem_otime;                     // time of the last semaphore operation
    time_t sem_ctime;                     // time of the last change
    struct sem *sem_base;                 // pointer to the first semaphore
    struct sem_queue *sem_pending;        // pending operations to be carried out
    struct sem_queue **sem_pending_last;  // last pending operation
    struct sem_undo *undo;                // undo operations for this array
    unsigned long sem_nsems;              // number of semaphores in this array
};

struct sem
{
    int semval;  // current value of the semaphore
    int sempid;  // process ID of the last operation
};
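The two properties mentioned above, changing several semaphores at once and using steps greater than 1, can be tried out with the user-space calls semget(), semop() and semctl(). This is a sketch only; sem_demo is a hypothetical helper, and on Linux the union semun has to be defined by the program itself.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Not provided by the system headers on Linux; see semctl(2). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Create a private array of two semaphores, raise both in one semop()
 * call (by 3 and by 5) and return the value of semaphore `which`.
 * Returns -1 on error. The array is removed again before returning.
 */
static int sem_demo(int which)
{
    assert(which == 0 || which == 1);

    int id = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    unsigned short init[2] = { 0, 0 };
    union semun arg;
    arg.array = init;

    int val = -1;
    if (semctl(id, 0, SETALL, arg) == 0) {
        struct sembuf ops[2] = {
            { .sem_num = 0, .sem_op = 3, .sem_flg = 0 },
            { .sem_num = 1, .sem_op = 5, .sem_flg = 0 },
        };
        /* Both changes are applied together or not at all. */
        if (semop(id, ops, 2) == 0)
            val = semctl(id, which, GETVAL);
    }
    semctl(id, 0, IPC_RMID);               /* free the array */
    return val;
}
```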

Message Queues

  • A message consists of a sequence of bytes and a type code.
  • Processes send messages to the queue and can receive messages from it.
  • Messages are read in the same order in which they were entered in the message queue.

struct msg_queue
{
    struct kern_ipc_perm q_perm;  // access rights
    time_t q_stime;               // time of last send
    time_t q_rtime;               // time of last receive
    time_t q_ctime;               // time of last change
    unsigned long q_cbytes;       // number of bytes in the queue
    unsigned long q_qnum;         // number of messages in the queue
    unsigned long q_qbytes;       // maximum number of bytes in the queue
    pid_t q_lspid;                // pid of the last sender
    pid_t q_lrpid;                // pid of the last receiver
};

To send and receive messages, processes use these functions

int sys_msgsnd(int msgid, struct msgbuf *msgp, size_t msgsz, int msgflg);

int sys_msgrcv(int msgid, struct msgbuf *msgp, size_t msgsz, long msgtyp, int msgflg);
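The user-space counterparts msgsnd() and msgrcv() can be used as in the following sketch, which sends two messages to a private queue and reads the first one back. The names demo_msg and msg_demo are made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Layout expected by msgsnd()/msgrcv(): a type code followed by the bytes. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/*
 * Send "one" and then "two" to a private queue and receive the oldest
 * message into `first` (which must hold at least 64 bytes).
 * Returns 0 on success, -1 on error. The queue is removed afterwards.
 */
static int msg_demo(char *first)
{
    assert(first != NULL);

    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    const char *texts[2] = { "one", "two" };
    struct demo_msg m;
    int rc = -1;
    int i;

    for (i = 0; i < 2; i++) {
        m.mtype = 1;                       /* the type must be greater than 0 */
        strcpy(m.mtext, texts[i]);
        if (msgsnd(id, &m, strlen(m.mtext) + 1, 0) < 0)
            goto out;
    }

    /* msgtyp 0 asks for the oldest message, whatever its type. */
    if (msgrcv(id, &m, sizeof(m.mtext), 0, 0) < 0)
        goto out;
    strcpy(first, m.mtext);
    rc = 0;

out:
    msgctl(id, IPC_RMID, NULL);            /* remove the queue */
    return rc;
}
```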

Shared Memory

  • Shared memory is the fastest form of interprocess communication.
  • Processes exchange data using the ordinary machine instructions for reading and writing memory.
  • The main drawback is that the processes need an additional synchronization mechanism to avoid race conditions.
  • A shared segment of memory is identified by a number.
  • The structure shmid_kernel describes a segment inside the kernel. A process maps the segment into the user segment of its virtual address space with the attach function and removes the mapping again with the detach function.

struct shmid_kernel
{
    struct kern_ipc_perm shm_perm;  // access rights
    struct file *shm_file;          // file in the shared memory
    int id;
    unsigned long shm_nattch;       // number of attachments
    unsigned long shm_segsz;        // size of segment
    time_t shm_atim;                // time of last attach
    time_t shm_dtim;                // time of last detach
    time_t shm_ctim;                // time of last change
    pid_t shm_cprid;                // creator process id
    pid_t shm_lprid;                // process id of the last operation
};
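Attach and detach correspond to the library calls shmat() and shmdt(). In the sketch below a child process writes into a shared segment and the parent reads the value afterwards; shm_demo is a hypothetical helper, and waiting for the child to exit stands in for the synchronization mechanism mentioned above.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a shared segment, let a child process store `value` in it and
 * return what the parent reads afterwards. Returns -1 on error, so
 * `value` must not be negative.
 */
static int shm_demo(int value)
{
    assert(value >= 0);

    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    int *p = shmat(id, NULL, 0);           /* attach the segment */
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *p = 0;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {                        /* child: the writer */
        *p = value;                        /* an ordinary memory store */
        shmdt(p);
        _exit(0);
    }
    if (pid > 0) {
        /* Read only after the writer has exited, to avoid a race. */
        if (waitpid(pid, NULL, 0) == pid)
            result = *p;
    }

    shmdt(p);                              /* detach the segment */
    shmctl(id, IPC_RMID, NULL);            /* and remove it */
    return result;
}
```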

Information for Semaphores, Message Queues and Shared Memory

seminfo                                                            Value
semmni (maximum number of semaphore arrays)                        128
semmns (maximum number of semaphores in the system)                32000
semmsl (maximum number of semaphores per array)                    250
semvmx (maximum value of a semaphore)                              32767

msginfo                                                            Value
msgmni (maximum number of message queues)                          16
msgmax (maximum size of a message in bytes)                        8192
msgmnb (default maximum size of a message queue in bytes)          16384

shminfo                                                            Value
shmmni (maximum number of shared memory segments)                  4096
shmmax (maximum size of a shared memory segment in bytes)          33,554,432
shmmin (minimum size of a shared memory segment in bytes)          1
shmseg (maximum number of segments per process)                    4096

Pipes and Named Pipes (FIFO)

Pipes are the classical method of interprocess communication.

For example

# ls -l | more

The symbol | denotes a pipe. In the example above the shell runs the processes ls and more and links them with a pipe: ls writes data to the pipe and more reads it.

Named pipes, also called FIFOs (First In, First Out), are the other variant of pipes.

They can be created like this

# mkfifo pathname

Example

# mkfifo hello

# ls -l hello

prw-r--r-- 1 temp users 0 Aug 28 10:45 hello

There are many similarities between pipes and FIFOs; in particular, the inode specification is more or less the same for both.

The following is the inode specification for a pipe

struct pipe_inode_info
{
    wait_queue_head_t wait;        // wait queue
    char *base;                    // address of the FIFO buffer
    unsigned int readers;          // number of processes reading at this moment
    unsigned int writers;          // number of processes writing at this moment
    unsigned int waiting_readers;  // number of blocked processes reading at this moment
    unsigned int waiting_writers;  // number of blocked processes writing at this moment
    unsigned int r_counter;        // number of read processes that have opened the pipe
    unsigned int w_counter;        // number of write processes that have opened the pipe
};

The length of the pipe's data area is managed in the i_size field of the inode. The system call pipe creates a pipe by setting up a temporary inode and allocating one page of memory, the size of which depends on the architecture.
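What the shell does for ls | more can be sketched with pipe() and fork(): one process writes into the pipe and the other reads from it. The helper name pipe_demo is made up for this example, and the message stands in for the output of ls.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * A child process writes `msg` into a pipe and the parent reads it into
 * `out`. Returns the number of bytes read, or -1 on error.
 */
static ssize_t pipe_demo(const char *msg, char *out, size_t outlen)
{
    assert(msg != NULL && out != NULL);

    int fd[2];                             /* fd[0]: read end, fd[1]: write end */
    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                        /* child: the writer */
        close(fd[0]);
        if (write(fd[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fd[1]);
        _exit(0);
    }

    close(fd[1]);                          /* parent only reads; needed to see EOF */
    ssize_t total = 0;
    ssize_t n = 0;
    while ((size_t)total < outlen &&
           (n = read(fd[0], out + total, outlen - (size_t)total)) > 0)
        total += n;

    close(fd[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```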

Opening a FIFO

                                                Blocking     Non-blocking
For reading              No writing processes   Block        Open FIFO
                         Writing processes      Open FIFO    Open FIFO
For writing              No reading processes   Block        Error ENXIO
                         Reading processes      Open FIFO    Open FIFO
For reading and writing                         Open FIFO    Open FIFO
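The non-blocking column of the table can be checked with a short experiment. The sketch below creates a FIFO and opens it with O_NONBLOCK while no other process has it open; fifo_open_nonblock is a hypothetical helper and the path is supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a FIFO at `path` and open it non-blocking in the given access
 * mode (O_RDONLY or O_WRONLY) while nobody else has it open.
 * Returns 0 if the open succeeded, the errno value if it failed, and
 * -1 if the FIFO could not be created.
 */
static int fifo_open_nonblock(const char *path, int access_mode)
{
    assert(access_mode == O_RDONLY || access_mode == O_WRONLY);

    unlink(path);                          /* start from a clean state */
    if (mkfifo(path, 0600) < 0)
        return -1;

    int result = 0;
    int fd = open(path, access_mode | O_NONBLOCK);
    if (fd < 0)
        result = errno;                    /* remember why the open failed */
    else
        close(fd);

    unlink(path);
    return result;
}
```

Opening for writing without a reader is expected to fail with ENXIO, while opening for reading without a writer succeeds.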

Communication Via Files (IPC)

This is one of the oldest ways of exchanging data: the information is passed in a file.

Files support two types of locking

Mandatory locking

  • Reading and writing are blocked for the entire duration of the lock

Advisory locking

  • Reading and writing remain possible even after the lock has been set
  • Processes accessing the file for reading or writing have to lock it and release it again themselves

Locking Entire Files

There are two methods for locking an entire file

The first method uses an auxiliary lock file and relies on one of several system calls

  • link, a system call
  • creat, another system call
  • open (a system call) with the flag combination O_CREAT | O_EXCL
  • open() with O_CREAT | O_WRONLY | O_TRUNC, but this method does not work for processes running as superuser

Second Method

fcntl is a system call that can lock an entire file as well as areas of a file. Since Linux 2.0, flock() is also available as a system call for locking entire files, but its use is not advisable.
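The O_CREAT | O_EXCL variant can be sketched as follows: the lock is taken by creating a lock file, which fails with EEXIST if the file already exists, and is released by deleting it. The helper names lockfile_acquire and lockfile_release are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Try to take the lock by creating the lock file `path`.
 * Returns 0 on success. Returns -1 on failure; errno is EEXIST when
 * another process already holds the lock.
 */
static int lockfile_acquire(const char *path)
{
    assert(path != NULL);

    /* O_EXCL makes "check that it is absent" and "create it" one atomic step. */
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Release the lock by removing the lock file. Returns 0 on success. */
static int lockfile_release(const char *path)
{
    assert(path != NULL);
    return unlink(path);
}
```

A process that finds the lock taken typically waits for a while and then calls lockfile_acquire() again.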

Locking file areas

int sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg);

int sys_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg);

where

  • fd is the file descriptor
  • cmd is one of F_GETLK, F_SETLK and F_SETLKW
  • arg points to a struct flock describing the area to be locked
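A sketch of area locking from user space follows. The parent write-locks the first ten bytes of a file with F_SETLK, and a child process uses F_GETLK to ask whether it could place a read lock on a given area. The helpers make_lock and region_is_blocked are hypothetical, and the probe runs in a child because these locks belong to a process and never conflict with the owner's own requests.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Describe a lock of the given type on bytes [start, start + len). */
static struct flock make_lock(short type, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fl;
}

/*
 * The parent write-locks bytes 0..9 of `path`. A child then asks whether
 * a read lock on the 10 bytes starting at `probe_start` would be possible.
 * Returns 1 if the area is blocked, 0 if it is free, -1 on error.
 */
static int region_is_blocked(const char *path, off_t probe_start)
{
    assert(probe_start >= 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    struct flock wr = make_lock(F_WRLCK, 0, 10);
    if (fcntl(fd, F_SETLK, &wr) == 0) {
        pid_t pid = fork();
        if (pid == 0) {                    /* child: owns no locks itself */
            struct flock probe = make_lock(F_RDLCK, probe_start, 10);
            if (fcntl(fd, F_GETLK, &probe) < 0)
                _exit(2);
            /* F_UNLCK means no conflicting lock was found. */
            _exit(probe.l_type == F_UNLCK ? 0 : 1);
        }
        int status;
        if (pid > 0 && waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) < 2)
            result = WEXITSTATUS(status);
    }

    close(fd);                             /* closing the file drops the lock */
    unlink(path);
    return result;
}
```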

Semantics of fcntl locks

Existing lock            Setting a read lock    Setting a write lock
None                     Possible               Possible
One or more read locks   Possible               Not allowed
A write lock             Not allowed            Not allowed

Locking files can also lead to deadlock, as the following example shows.

Assume two processes each want to lock two blocks of the same file:

P1 -> 1 -> 2  (P1 locks the 1st block and then tries to lock the 2nd block)

P2 -> 2 -> 1  (P2 locks the 2nd block and then tries to lock the 1st block)

Each process now holds the block the other one needs: P1 waits for block 2 and P2 waits for block 1, so both are deadlocked.

Communication via files is therefore not always a good option, although there are methods for avoiding such deadlocks.