Wednesday, 14 January 2015

File handling in C - Theory


File - File is a resource for storing and retrieving information. Most of the operating systems store files as one dimensional arrays. There are many types of files like text files, program files, data files etc.  

Why Files? - Simply because primary memory is volatile and ceases data when program execution is finished. For using data later in future for further processing or just reading data needs to be stored in files.

File System - Method of organizing files on the disk so that those can be retrieved later. Different operating systems use different file systems to manage files on disk. For example Windows OS uses FAT (File Allocation Table) and NTFS (New Technology File System) file systems.

Buffer - Buffer is a block in main memory (RAM) that is used to store temporary data (input or output). Since accessing the disk every time for reading or writing back is slow buffer is used. Whenever we try to read data or write data from or to disk buffer is used. Data is first moved to buffer while reading and written back to it first before writing it back to the disk.  

Note - File pointer in C is location of pointer in buffer at any particular time. 

Text File - Text file is a file which is stored on the disk encoded with any character coding standard like ASCII or UTF.

If we use ASCII code it uses 7 bit code stored in a byte on disk. So if you want to store a character 'a' it will be stores as 97 ( ASCII code for character a is 97). There are 128 different ASCII codes for storing different characters on disk.

At disk level these files are still binary files only difference is these are binary files which are encoded with ASCII or UTF codes i.e. each byte is written using these codes.

On important aspect of these files is End of File character. In most cases CTRL+Z is assumed to be the EOF character. 

Binary File - General binary files do not have any restriction on how these are stored on disk. Each 256 bit pattern can be used in any byte of binary file. Each byte of a binary file can be one of the 256 bit patterns. Executable files, object files, sound files, image files all are binary files.
End Of File in binary files is last byte of the file.

File Operations: 

1) Create

2) Open

3) Write

4) Read

5) Move

6) Close 

File Modes:  Purpose of opening the file. Following modes are supported by C:

1) r -  read

2) w -  write

3) a -  append

4) r+ - read and write

5) w+ - read and write

6) a+ - read and write

7) rb - binary file read

8) wb - create or open a binary file

9) ab - binary file append

10) rb+ - binary file read write

11) wb+ - binary file read write

12)ab+ - binary file read write  

Functions for handling basic file operations in C: 

1)fopen() - creates a new file or open an existing file.

2)fclose() - closes a file.

3)fread() - read from a binary file.

4)fwrite() - write to a binary file.
 

Other functions used for handling files in C: 

1)fseek() - set position of file pointer to the desired location in the file.

2)ftell() - returns the current position of the file pointer in the file.

3)rewind() - set the position of the file pointer to the beginning of the file.

4)getc(), fscanf(), getw() - read a character, set of data and an integer from a file respectively.

5)putc(), fprintf(), putw() - write a character, set of data and an integer to a file. 

Functions explained - 

1) fopen() - This function opens the file name which is specified in the specified mode. This function will return a pointer that can be used to manipulate the file. In case function call fail due to some issues it will return a null pointer. 

Declaration - FILE *fopen(const char *filename, const char *mode)

Example -  

FILE * ptr   

ptr = fopen("myfile.txt","w");   

// File type pointer is declared which will hold the pointer to the first byte address when the file is opened. 

// fopen() function is called with myfile.txt (file which is to be opened or created) argument and w ( this is the mode in which we want to open the file, w specifies that we just want to write the contents of the file) argument. 

Note - If myfile.txt file does not exist it will be created. 

If the file is not opened due to some reasons fopen() will return a NULL. This may be due to many reasons like file is write protected, file does not exist etc. So in most of the programs we check whether fopen() call was successful or not. To do this you will see code like following after every fopen() call. 

if(!ptr)
{
return 1;
}

This if condition will return a 1 when fopen() is not successful indicating something wrong with the call. 

2) fclose() - This function will close the stream opened by fopen() function. 

Declaration - int fclose(FILE *ptr) 

As we see in declaration for fclose(, it can be called with an file type pointer as argument. This pointer will be the same pointer with which the fopen() function was called.
If the file is closed successfully this function will return 0 otherwise it will return EOF (End Of File). 

3)fread() - This function reads the specified number of elements from a input stream ( file in general terms) and stored these in a block of memory and return the number of items read. 

In other words this function reads unformatted data from a stream into a buffer.

Declaration - size_t fread( void *buffer, size_t size, size_t count, FILE *ptr) 

Here, 

*buffer - Pointer to the buffer to where we will store data we read.

size - size of each element to be read. 

count - number of bytes to read. 

ptr - pointer to the file stream from which to data will be read.

This function will return the number of items read. If number of items read is different from requested amount (count parameter) we may needs to check if this is because of some error or EOF is reached. We can do this with the help of feof() or ferror() functions. 

What is Stream? - Stream is just a block of data which is coming from some source. Difference between file and stream is that file is complete set of data and stream is subset of that data which is read into the buffer. 

Hence when we want to create, copy, delete, move or open file we use file. But when we want to just do read and write operations we use stream to do that. 

Example - 
Note - In following example I have not checked the error conditions while opening the file with fopen() or while reading the file with fread().  

int count;

char buffer[1000]; // Buffer where data is stored while reading

long file_stream;  // Stream from which data will be read from

char *filename = "c:\\myfile.txt"; // file path and name

file_stream = fopen(filename,"r");  

count = fread(buffer, sizeof(char),1000, file_stream)


This line of code will read 1000 bytes from the file_stream into a array pointed by buffer.

count will return number of bytes read.

If count is not equal to 1000 bytes this means either we reached EOF before 100o bytes or some error occured. To check what happened we can use feof() or ferror() functions described later.

4) fwrite() - This function writes unformatted data from a buffer to a stream. 

Declaration - size_t fwrite( void *buffer, size_t size, size_t count, FILE *ptr) 

Here, 

*buffer - Pointer to the buffer containing the data. 

size - size of each element to be written. 

count - number of bytes to store in buffer. 

ptr - pointer to the file to which data will be written to. 

Example -

FILE *fp;

char str[] = "My name is KBanyal"; // String which we want to write on file.
fp = fopen( "file.txt" , "w" ); // File is the pointer to file we need to write data to.
fwrite(str , 1 , sizeof(str) , fp );
 
Note - fwrite() will return the total number of elements successfully written. If number of records written is different than count then a error is thrown. ferror() function can be used to validate if any error occured while writting to the file.

5)fscanf() - This function reads formatted data from the stream. It reads bytes, interprets them according to a format and store the results in its arguments. This function returns number of items successfully read.If some specifier is specified to ignore some element while reading those elements are not included in the count.

Declaration - int fscanf(FILE *fp, const char *format, ...)

Here,

fp = pointer to a FILE object which is the stream to read from.

format = A sequence formed by an initial % sign indicates a format specifier. This is used to specify the type and format of the data to be retrieved from the stream and stored      into the locations pointed by additional arguments.

The prototype for fscanf() format specifier is something like this - %[*][width][length]specifier

Here specifier specifies which characters are extracted from the stream. For example following specifiers can be specified to extract corresponding element from the stream.

%d reads an integer

%f reads a float    

%lf reads a double             

%c reads a character, including white space. If more than 1 character needs to be read at a time specify the width.          

%s reads a string up to first white space

%[...] string, up to first character not in brackets

Example %[abk] will read 'kbanyal' as 'kba'.

%[0123456789] would read in digits

%[^...] string, up to first character in brackets

%[^\n] would read everything up to a newline

* is used to ignore particular elements in the stream.

Example %*d will ignore all the integers in input stream.

Examples:

fscanf(infile, "%d,%c", &x, &c); // read an int & char from file where int and  char are separated by a comma

fscanf(infile,"%s", array);  // read a string from file into array stops at white space

fscanf(infile, "%lf %24s", &d, array);  // read a double and a string upto 24 chars from infile

fscanf(infile, "%20[012345]",array);  // read a string of at most 20 chars consisting of only chars in set

fscanf(infile, "%ld %d%c", &x, &b, &c);  // read in two integer values store first in long, second in int read in end of line char into c

6) fprintf() - This function is used to send formatted output to a file stream. This function returns total number of characters printed. If error occurs it will return a negative number.

Declaration - int fprintf(FILE *fp, const char *fs, ...)

fp = Pointer to a file (actually stream)

fs = Formatted string we want to write to the file (actually stream)  

Return Value - On success - Total number of characters printed.

               On error - Negative number

fs or the format string can be formed using different format tags.

Prototype = %[flags][width][.precision][length]specifier

Most common specifiers are following:  

   %d = Displays and integer

   %f = Displays a floating-point number in fixed decimal format

   %e = Displays a floating-point number in exponential notation

   %s = Displays string of characters

   %u = Displays unsigned decimal integer

   %c = Displays a character     

   Since most of the format tags will never be used and there are lots I am just giving one or two examples for all the tags below.  

   Flags = flags can be used for various purposes. Main is to format data as per requirement. For example -

   1) - = is used to left justify the field.

     Example - fprintf(fp,"%-10d %c",143,'k');

     Output: 143        k

     7 blanks after 143.

   2) 0 = to pad field with zeros rather than blanks.

      Example - fprintf(fp,"%010d %c",143,'k')

      Output: 0000000143 k

      Right side blanks padded with 7 0’s.  

   Width = every data format is provided with minimum required width  to hold the same. Width format tag can be used to increase it further. 

    1) %[width]d - will increase the field width of given integer.

      Example - fprintf(fp,"%4d",7)

      Output:     7

      3 blanks to the right of 7 to increase its width.

    2) %[width]s - will increase the width of given string.

      Example - fprintf(fp,"%15s","kbanyal")

      Output:         kbanyal

      8 blank spaces before kbanyal to increase its width.

  

   Precision = This format tag takes different meanings for different format types.  

     1) %[total].[decimal]f - Here total field length will be [total] and [decimal] of these will hold the decimal part for a float value.

   Example - fprintf(fp,"%10.3f",546666.7)

   Output: 546666.700

     2) %[minimum].[maximum]s - Here minimum width is [minimum] and maximum width is [maximum] for a string value. If string is more than [maximum] value it will be cropped to [maximum] value.

   Example - fprintf(fp,"%3.4s","kbanyal");

   Output: kban  

   Length = length modifier is used to let fprintf know that we want to print very big or very small variables. For example -  

   1) short int - Modifier to be used in this case is 'h'. Short int is always takes less or equal bits as int. that is short int <= int <= long.

   Example - fprintf(fp,"%hd",1)

   Output: 1

   2) long double - Modifier to be used in this case is 'L".

   Example - fprintf(fp,"%Lf",10.000000001223);

   Output: 10.000000

7) fseek() – This function sets the current position in a file to a new location.

When we perform read and write from or to a file, operating system keep track of our location in file using file pointer. At any time during read or write if we want to change our location to any other location in file we can use fseek() function for this.

Declaration - int fseek( FILE *ptr, long offset, int origin);

Here, 

*ptr = Pointer to the file 

offset = The offset within the file (in byte) 

Origin = The starting point. We can set it using following values: 

SEEK_SET – Beginning of the file 

SEEK_CUR – Current position of the file pointer. 

SEEK_END – End of file. 

Return value

On success – zero

On failure – negative number 

Examples:

1) fseek(fp, 100, SEEK_SET); // Move to 100th byte from start of file.

2) fseek(fp, 100, SEEK_CUR); // Move to 100th byte from current position.

3) fseek(fp, -100, SEEK_END); // Move to 100th byte before end of file.

4)ftell() – This function tells us the location in the file from where it will be read from or where it will be written to. Note – The location is relative to the beginning of the file. 

If we want to know where we are in the file at any particular time we can use this function to get that value. 

Declaration – long int ftell(FILE *fp) 

Return value

On success – current offset relative to beginning of file.

On failure – negative number 

Can be used are follows: 

long position;

FILE *fp;

fp = fopen(“xxxx”,”r”);

position = ftell(fp); 

9)rewind() – This function re positions the file pointer to the start of the file. 

Declaration – void rewind(FILE *fp)

 

 

 

 

 

 

 

 

Sunday, 4 January 2015

Linear Search in C

Search - Searching is finding any item from a set of items which meets a specified condition.
For example - searching tallest student of a class from a provide list of students with their hight details.

Searching is very important aspect of most of the computer applications. Different types of data structures are created overtime to facilitate fast searches.

Examples are linear data structures like arrays or link lists and some more recent data structures like hash table or binary search trees.

Since we want to learn the algorithms for search hence it is in our own interest to apply those on data structures which are not designed for efficient data searching like binary search trees or hash tables.
Types of search algorithms which can be applied on linear data structures like array or linked list are linear search and binary search.

Linear Search - It is the most basic search algorithm which will search for the required element by traversing each element in the set sequentially.If the item which we are looking for is found search is stopped else proceeds to the next element until all the elements are traversed.

Linear Search in C -

#include <stdio.h>
void main()
{
     void linear_search(int *a, int b, int c);
    
     int i,array_size,array[20],element_to_search; 
    
     printf(" Enter the number of elements you want to enter in the array : \t ");
     scanf("%d",&array_size);
    
     for(i=0;i<array_size;i++)
     {
     printf("\n Enter the %d element : \t",i+1);
     scanf("%d",&array[i]);                        
     }
    
     printf("\n The set of elements in which we need to search the element is :  ");
     for(i=0;i<array_size;i++)
     printf("%d  ",array[i]);
    
     printf("\n Enter the element to be searched :  ");
     scanf("%d",&element_to_search);
    
     linear_search(&array,array_size,element_to_search); 
       
     getch();    
 }
// Below is the actual function which do the linear search. Note - This function returns all the occurrences of the element in the data set. With very small modifications it can be modified to stop execution when the required element is found for the first time. Just uncomment the code if condition and comment the existing code in there.

 void linear_search(int *array, int array_size, int element_to_search)
 {
      int i;
      int found = 0;
     
      for(i=0;i<array_size;i++)
      {
       if(element_to_search==array[i])
       {
       printf("\n Entered element's postion in the set of elements entered is :\t %d",i+1);
       found = 1;


      //  printf("\n Entered element's first occurrence in the set of elements entered is at :\t %d",i+1);
     //   found = 1;
     //   break;



       }
       }
       if(found==0)
       {
       printf("\n Entered element is not found in the set");
       }                       
       
  }
 
  Analysis of Linear Search -
 
  Worst case - When the element we want to search is not present in the given set. Linear search will compare element to be searched with all the
  elements in the array or list. Therefore the worst case complexity in this case will be O(n).
 
  Average case - Average can be considered when the element we want to search is present at the middle position of the array. Complexity in that
  case too will be O(n/2) ~ O(n). This can also be derived by simply considering average case as mean of worst case complexity and best case
  complexity.
 
  Best case - When the element we want to search for is present at the first location of the array. This will take O(1) time every time. This is a
  constant and do not depend on the length of array. 
 

Saturday, 3 January 2015

Insertion sort in C

Sorting - Sorting refers to different techniques or methods of arranging or ordering things based on a criteria.

For example - We have list of student names and their age. Sorting techniques can be applied to sort this list based on student's names in decreasing order or in increasing order of their age or vice versa.

Sorting algorithms are the fundamental algorithm problems in computer science. The sorting algorithms are also embedded into other algorithms like search and merge algorithms.

Some of the sorting algorithms are bubble sort, selection sort, insertion sort, heap sort, quick sort, merge sort etc.

Insertion Sort -  Simple sorting algorithm which works the same way as we play bridge. Cards are laid down on the table facing down and we pick one card at a time compare it with the other cards (from left to right or vice versa) and insert it into the correct position. This sorting algorithm is very easy to implement and is quite efficient for small data set. In fact this algorithm is more efficient than selection sort and bubble sort.

Basic Principle - In every iteration this algorithm removes one element from the data set, compare it with already sorted list of elements and insert it into the correct position. The element to be picked from the input data set can be decided arbitrarily.

How it works with example -

Suppose we have an input array of five elements - 4 2 5 9 1

After each iteration this input array will be divided into two sub arrays. One will be sorted array and other unsorted one. In the beginning our sorted array will have only one element i.e. first element (4 here). Since sorted array has only 1 element it is sorted. Now in the next step we will pick next element i.e. 2 (from the unsorted array). Please follow following steps -

Step 1 :Sorted array in the beginning - 4     Unsorted array in the beginning - 2 5 9 1

Step 2 :Sorted array will become - 2 4        Unsorted array will become - 5 9 1 // here 2 is picked  from  unsorted array and inserted into sorted array after comparing it with already present element 4 in that array.

Step 3 :sorted array will become - 2 4 5      Unsorted array will become - 9 1   //here 5 is picked from unsorted array and inserted into sorted array after comparing it with already present elements 2 and 4 in that array.

Step 4 :sorted array will become - 2 4 5 9    Unsorted array will become - 1    //here 9 is picked from unsorted array and inserted into sorted array after comparing it with already present elements 2, 4 and 5 in that array.

Step 5 :sorted array will become - 1 2 4 5 9   //here 1 is picked from unsorted array and inserted into sorted array after comparing it with already present elements 2, 4, 5 and 9 in that array.

C Code (Insertion Sort) -

#include<stdio.h>
int main(){
  int i,current_position,back_position,array_size,temp,array[20];
 
  printf("\nEnter number of elements to sort:\t ");
  scanf("%d",&array_size);
 
  printf("\nEnter Elements one by one press enter after every element:");
  for(i=0;i<array_size;i++)
  {
  scanf("%d",&array[i]);
  }
 
  printf("\nArray before sorting:");
  for(i=0;i<array_size;i++)
  {
  printf("  %d\t",array[i]);
  }
 
  for(current_position=1;current_position<array_size;current_position++)
  {
  back_position = current_position;
 
  while(back_position > 0 && array[back_position] < array[back_position-1])
  {
    temp = array[back_position];
    array[back_position] = array[back_position-1];
    array[back_position-1] = temp;
    back_position--;                 
  }
                                                                      
  }
 
  printf("\n Array after sorting:");
  for(i=0;i<array_size;i++)
  {
  printf("  %d\t",array[i]);
  }
  getche(); 
  return 0;
}

Analysis of insertion sort.

Best case - Already sorted data. Best efficiency in this case (o(n)). Just one pass through the data. No insertion happens.

Worst case - Data is sorted in reverse order. Efficiency in this case in (o(n^2)). Every pass of inner loop (while loop in our code above) will scan and shift the entire sorted part of the array and then insert the next element. The average case also has the efficiency of O(n^2).

Calculating worst case complexity -

To insert the last element from the unsorted array to a sorted array we need n-1 comparisons and n-1 shifts. Similarly to insert n-1 st element we need n-2 comparisons and n-2 shifts and so on.

Number of comparisons = n-1 + n-2 + .... + 1 = n(n-1)/2 = o(n^2)
Number of shifts = n-1 + n-2 + ...... + 1 = n(n-1)/2 = o(n^2)
So the complexity in worst case is of the order o(n^2).