Wednesday 14 January 2015

File handling in C - Theory


File - A file is a resource for storing and retrieving information. Most operating systems present a file as a one-dimensional array of bytes. There are many types of files, such as text files, program files and data files.

Why Files? - Primary memory is volatile, so a program's data is lost when its execution finishes. Data that is needed later, for further processing or just for reading, has to be stored in files.

File System - A method of organizing files on the disk so that they can be retrieved later. Different operating systems use different file systems to manage files on disk. For example, Windows uses the FAT (File Allocation Table) and NTFS (New Technology File System) file systems.

Buffer - A buffer is a block of main memory (RAM) that temporarily holds input or output data. Accessing the disk for every read or write is slow, so data passes through a buffer instead. When reading, a block of data is first moved from the disk into the buffer and the program reads from there. When writing, data is first placed in the buffer and written to the disk later, typically when the buffer fills up or the file is flushed or closed.

Note - The file position (often loosely called the file pointer) is the current location in the file at which the next read or write will take place.

Text File - A text file is a file stored on disk using a character encoding standard such as ASCII or UTF-8.

ASCII is a 7-bit code, and each code is stored in one byte on disk. So the character 'a' is stored as 97 (the ASCII code for 'a'). There are 128 different ASCII codes.

At disk level these files are still binary files. The only difference is that each byte is meant to be interpreted as an ASCII or UTF-8 code.

One important aspect of these files is the end of file (EOF) marker. On DOS and Windows, CTRL+Z (byte value 26) is treated as the end of a file opened in text mode. Unix-like systems store no EOF character in the file; the end is simply where the data stops.

Binary File - Binary files have no restriction on how their contents are stored on disk. Each byte of a binary file can hold any of the 256 possible bit patterns. Executable files, object files, sound files and image files are all binary files.
The end of a binary file is simply its last byte.

File Operations: 

1) Create

2) Open

3) Write

4) Read

5) Move

6) Close 

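File Modes: The mode states the purpose of opening the file. The following modes are supported by C:

1) r - read; the file must already exist

2) w - write; the file is created if it does not exist and emptied if it does

3) a - append; writes go to the end of the file, which is created if it does not exist

4) r+ - read and write; the file must already exist

5) w+ - read and write; the file is created or emptied

6) a+ - read and append; the file is created if it does not exist

7) rb - binary file read

8) wb - binary file write; creates the file or empties an existing one

9) ab - binary file append

10) rb+ - binary file read and write; the file must already exist

11) wb+ - binary file read and write; the file is created or emptied

12) ab+ - binary file read and append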
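
The difference between w and a is easiest to see by writing to the same file twice. The following is only a small sketch: write_and_measure is a made-up helper name, and it reports the file size with fseek() and ftell(), which are covered further down.

```c
#include <stdio.h>

/* Opens path in the given mode, writes text, closes the file and
   returns the resulting file size in bytes, or -1 on any failure.
   write_and_measure is a hypothetical helper used only for this sketch. */
long write_and_measure(const char *path, const char *mode, const char *text)
{
    long size;
    FILE *fp = fopen(path, mode);

    if (fp == NULL) {
        return -1;
    }
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) == EOF) {
        return -1;
    }

    /* Reopen in binary read mode just to measure the size. */
    fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}
```

Calling it with mode w and the text "abc", then with mode a and "de", should leave a 5 byte file, while a further call with mode w and "z" empties the file first and leaves 1 byte.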

Functions for handling basic file operations in C: 

1) fopen() - creates a new file or opens an existing file.

2) fclose() - closes a file.

3) fread() - reads a block of data from a file (typically a binary file).

4) fwrite() - writes a block of data to a file (typically a binary file).
 

Other functions used for handling files in C: 

1) fseek() - sets the position of the file pointer to the desired location in the file.

2) ftell() - returns the current position of the file pointer in the file.

3) rewind() - sets the position of the file pointer to the beginning of the file.

4) getc(), fscanf(), getw() - read a character, formatted data and an integer from a file respectively. Note that getw() is not part of standard C.

5) putc(), fprintf(), putw() - write a character, formatted data and an integer to a file respectively. Note that putw() is not part of standard C.
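
As a quick illustration of getc() and putc(), here is a minimal sketch that copies one open stream to another a character at a time. The name copy_stream is my own, not a library function, and both streams are assumed to be open already.

```c
#include <stdio.h>

/* Copies every remaining character from in to out.
   Returns the number of characters copied, or -1 on a read or write error.
   copy_stream is a hypothetical helper name used only for this sketch. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int ch;  /* int, not char, so that EOF can be told apart from real data */

    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF) {
            return -1;
        }
        copied++;
    }
    /* getc() returns EOF both at end of file and on error. */
    return ferror(in) ? -1 : copied;
}
```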

Functions explained - 

1) fopen() - This function opens the named file in the specified mode. It returns a pointer that can be used to manipulate the file. If the call fails, it returns a null pointer.

Declaration - FILE *fopen(const char *filename, const char *mode)

Example -  

FILE *ptr;

ptr = fopen("myfile.txt","w");

// A pointer of type FILE * is declared. It holds the value returned by fopen(), which identifies the opened file in all later calls.

// fopen() is called with two arguments: myfile.txt (the file to be opened or created) and w (the mode; w specifies that we only want to write to the file).

Note - If myfile.txt does not exist it will be created. If it already exists, its old contents are discarded.

If the file cannot be opened, fopen() returns NULL. This can happen for many reasons, for example the file is write protected, or a file opened for reading does not exist. So most programs check whether the fopen() call was successful. To do this you will see code like the following after every fopen() call.

if(!ptr)
{
return 1;
}

This if statement makes the enclosing function return 1 when fopen() is not successful, indicating that something went wrong with the call.
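
Putting the pieces together, a complete open, write and close sequence with the check might look like the sketch below. The function name write_greeting and the text it writes are made up for illustration.

```c
#include <stdio.h>

/* Writes one line to the file at path.
   Returns 0 on success and 1 on failure.
   write_greeting is a hypothetical name used only for this sketch. */
int write_greeting(const char *path)
{
    FILE *ptr = fopen(path, "w");

    if (!ptr) {
        perror("fopen");  /* prints the reason reported by the system */
        return 1;
    }
    if (fputs("hello\n", ptr) == EOF) {
        fclose(ptr);
        return 1;
    }
    /* fclose() returns EOF if the buffered data could not be written out. */
    return fclose(ptr) == EOF ? 1 : 0;
}
```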

2) fclose() - This function closes the stream opened by the fopen() function.

Declaration - int fclose(FILE *ptr)

As the declaration shows, fclose() is called with a FILE pointer as its argument. This must be the pointer that was returned by fopen().
If the file is closed successfully this function returns 0, otherwise it returns EOF.

3) fread() - This function reads the specified number of elements from an input stream (a file in general terms), stores them in a block of memory and returns the number of elements read.

In other words this function reads unformatted data from a stream into a buffer.

Declaration - size_t fread( void *buffer, size_t size, size_t count, FILE *ptr) 

Here, 

*buffer - Pointer to the buffer where the data we read will be stored.

size - size in bytes of each element to be read.

count - number of elements to read, each of size bytes.

ptr - pointer to the file stream from which data will be read.

This function returns the number of elements read. If that number is different from the requested amount (the count parameter), we need to check whether this is because of an error or because EOF was reached. We can do this with the help of the feof() and ferror() functions.

What is Stream? - A stream is a flow of data coming from or going to some source, such as a file. The difference between a file and a stream is that the file is the complete set of data on disk, while the stream is the channel through which parts of that data pass, via the buffer, to and from the program.

Hence when we want to create, copy, delete, move or open a file we refer to the file itself. When we just want to read and write its contents we use a stream, which is what fopen() gives us.

Example - 
Note - In the following example I have not checked the error conditions while opening the file with fopen() or while reading the file with fread().

size_t count; // Number of elements actually read

char buffer[1000]; // Buffer where data is stored while reading

FILE *file_stream;  // Stream from which data will be read

char *filename = "c:\\myfile.txt"; // file path and name

file_stream = fopen(filename,"r");

count = fread(buffer, sizeof(char), 1000, file_stream);


The fread() call reads up to 1000 bytes from file_stream into the array buffer.

count holds the number of bytes actually read.

If count is not equal to 1000, either we reached EOF before 1000 bytes were read or some error occurred. To check what happened we can use the feof() and ferror() functions.
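
A short read can be told apart from an error roughly as follows. This is a sketch only: read_block and its two flag parameters are names I made up, and the stream is assumed to be open for reading already.

```c
#include <stdio.h>

/* Reads up to capacity bytes from fp into buffer.
   Returns the number of bytes read. If fewer than capacity bytes were
   read, *hit_eof or *had_error is set to 1 to say why.
   read_block is a hypothetical helper name used only for this sketch. */
size_t read_block(FILE *fp, char *buffer, size_t capacity,
                  int *hit_eof, int *had_error)
{
    size_t count = fread(buffer, sizeof(char), capacity, fp);

    *hit_eof = 0;
    *had_error = 0;
    if (count < capacity) {
        if (ferror(fp)) {
            *had_error = 1;   /* the read failed part way through */
        } else if (feof(fp)) {
            *hit_eof = 1;     /* the file simply had less data */
        }
    }
    return count;
}
```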

4) fwrite() - This function writes unformatted data from a buffer to a stream.

Declaration - size_t fwrite(const void *buffer, size_t size, size_t count, FILE *ptr)

Here,

*buffer - Pointer to the buffer containing the data.

size - size in bytes of each element to be written.

count - number of elements to write, each of size bytes.

ptr - pointer to the file stream to which data will be written.

Example -

FILE *fp;

char str[] = "My name is KBanyal"; // String which we want to write to the file.
fp = fopen( "file.txt" , "w" ); // fp is the pointer to the file we need to write data to.
fwrite(str , 1 , sizeof(str) - 1 , fp ); // sizeof(str) - 1 leaves out the terminating '\0'.
fclose(fp);
 
Note - fwrite() returns the total number of elements successfully written. If this number is different from count, an error has occurred. The ferror() function can be used to check whether an error occurred while writing to the file.
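
fwrite() and fread() are most useful with elements larger than one byte. Below is a minimal sketch that stores and loads an array of structures. The struct record type and both function names are invented for this example, and the file should be opened in a binary mode such as wb+.

```c
#include <stdio.h>

/* A made-up record type, used only for this sketch. */
struct record {
    int id;
    double score;
};

/* Writes n records to fp; returns how many were written in full. */
size_t write_records(FILE *fp, const struct record *recs, size_t n)
{
    return fwrite(recs, sizeof recs[0], n, fp);
}

/* Reads up to n records from fp; returns how many were read in full. */
size_t read_records(FILE *fp, struct record *recs, size_t n)
{
    return fread(recs, sizeof recs[0], n, fp);
}
```

Here size is sizeof recs[0] and count is the number of records, so the return value counts whole records rather than bytes. A file written this way is only guaranteed to be readable by a program built for the same platform, since the layout of a structure in memory can differ between compilers and machines.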

5) fscanf() - This function reads formatted data from a stream. It reads bytes, interprets them according to a format and stores the results in the locations given by its arguments. It returns the number of items successfully read and stored, or EOF if the input ends or fails before any item is read. Items that a specifier tells fscanf() to skip are not included in the count.

Declaration - int fscanf(FILE *fp, const char *format, ...)

Here,

fp = pointer to a FILE object which is the stream to read from.

format = A string in which each sequence starting with a % sign is a format specifier. A specifier gives the type and format of the data to be retrieved from the stream and stored in the location pointed to by the corresponding additional argument.

The general form of an fscanf() format specifier is - %[*][width][length]specifier

Here specifier determines which characters are extracted from the stream. For example, the following specifiers extract the corresponding elements from the stream.

%d reads an integer

%f reads a float

%lf reads a double

%c reads a character, including white space. If more than 1 character needs to be read at a time specify the width.

%s reads a string up to the first white space

%[...] reads a string up to the first character not in the brackets

Example %[abk] will read 'kbanyal' as 'kba'.

%[0123456789] would read in digits

%[^...] reads a string up to the first character in the brackets

%[^\n] would read everything up to a newline

* is used to read an element from the stream and discard it without storing it.

Example %*d will read one integer from the input stream and ignore it.

Examples:

fscanf(infile, "%d,%c", &x, &c); // read an int and a char from the file, where the int and char are separated by a comma

fscanf(infile,"%s", array);  // read a string from the file into array; stops at white space (no length limit, so array must be large enough)

fscanf(infile, "%lf %24s", &d, array);  // read a double and a string of up to 24 chars from infile

fscanf(infile, "%20[012345]",array);  // read a string of at most 20 chars consisting only of chars in the set

fscanf(infile, "%ld %d%c", &x, &b, &c);  // read two integer values, storing the first in a long (x must be a long here) and the second in an int, then read the following character (for example the end of line) into c
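
The return value is what tells you whether a read worked, so it is worth checking. Here is a small sketch with two made-up helper functions: one reads a line of the assumed form number,name using a scanset with a width, the other uses %*d to skip an integer.

```c
#include <stdio.h>

/* Reads "<int>,<name>" from in, where name runs to the end of the line.
   name must have room for 25 chars (24 plus the terminating '\0').
   Returns the fscanf() result: 2 when both items were read.
   Both function names here are hypothetical, used only for this sketch. */
int read_id_and_name(FILE *in, int *id, char *name)
{
    return fscanf(in, "%d,%24[^\n]", id, name);
}

/* Skips one integer with %*d and stores the next one in *value.
   Returns 1 on success, because the skipped item is not counted. */
int read_second_int(FILE *in, int *value)
{
    return fscanf(in, "%*d %d", value);
}
```

The width of 24 in the scanset keeps fscanf() from writing past the end of a 25 character array, which a plain %s or %[^\n] would not do.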

6) fprintf() - This function is used to send formatted output to a file stream. This function returns total number of characters printed. If error occurs it will return a negative number.

Declaration - int fprintf(FILE *fp, const char *fs, ...)

fp = Pointer to a file (actually stream)

fs = Formatted string we want to write to the file (actually stream)  

Return Value - On success - Total number of characters printed.

               On error - Negative number

fs or the format string can be formed using different format tags.

Prototype = %[flags][width][.precision][length]specifier

Most common specifiers are following:  

   %d = Displays an integer

   %f = Displays a floating-point number in fixed decimal format

   %e = Displays a floating-point number in exponential notation

   %s = Displays string of characters

   %u = Displays unsigned decimal integer

   %c = Displays a character     

   There are many format tags and most of them are rarely used, so I am giving just one or two examples for each kind of tag below.

   Flags = flags adjust how the data is laid out in its field. For example -

   1) - = is used to left justify the field.

     Example - fprintf(fp,"%-10d %c",143,'k');

     Output: 143        k

     7 padding blanks after 143, followed by the blank from the format string.

   2) 0 = to pad field with zeros rather than blanks.

      Example - fprintf(fp,"%010d %c",143,'k');

      Output: 0000000143 k

      The left side is padded with 7 zeros instead of blanks.

   Width = sets the minimum width of the field. If the data needs fewer characters than the width, it is padded with blanks, on the left by default. If it needs more, the field grows to fit.

    1) %[width]d - sets the minimum field width of the given integer.

      Example - fprintf(fp,"%4d",7);

      Output:    7

      3 blanks to the left of 7 to bring the field to 4 characters.

    2) %[width]s - sets the minimum field width of the given string.

      Example - fprintf(fp,"%15s","kbanyal");

      Output:         kbanyal

      8 blank spaces before kbanyal to bring the field to 15 characters.

  

   Precision = This format tag takes different meanings for different format types.

     1) %[total].[decimal]f - Here the minimum field width is [total] characters, and [decimal] digits are printed after the decimal point of a float value.

   Example - fprintf(fp,"%10.3f",546666.7);

   Output: 546666.700

     2) %[minimum].[maximum]s - Here the minimum field width is [minimum] and at most [maximum] characters of the string are printed. A string longer than [maximum] is cropped to [maximum] characters.

   Example - fprintf(fp,"%3.4s","kbanyal");

   Output: kban

   Length = the length modifier tells fprintf() the size of the argument's type, for example that an integer is a short or a long rather than a plain int. For example -

   1) short int - Modifier to be used in this case is 'h'. A short int never takes more bits than an int, that is short int <= int <= long.

   Example - fprintf(fp,"%hd",1);

   Output: 1

   2) long double - Modifier to be used in this case is 'L'. The argument must really be a long double, hence the L suffix on the constant.

   Example - fprintf(fp,"%Lf",10.000000001223L);

   Output: 10.000000
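
To see the padding clearly it helps to print each field between square brackets. The sketch below writes the flag, width, precision and short examples from above to a stream, one per line, and adds up the fprintf() return values. The function name write_format_samples is my own.

```c
#include <stdio.h>

/* Writes one bracketed sample line per format tag to fp.
   Returns the total number of characters written, or -1 on error.
   write_format_samples is a hypothetical name used only for this sketch. */
int write_format_samples(FILE *fp)
{
    int total = 0;
    int n;

    if ((n = fprintf(fp, "[%-10d]\n", 143)) < 0) return -1;      /* [143       ] */
    total += n;
    if ((n = fprintf(fp, "[%010d]\n", 143)) < 0) return -1;      /* [0000000143] */
    total += n;
    if ((n = fprintf(fp, "[%4d]\n", 7)) < 0) return -1;          /* [   7] */
    total += n;
    if ((n = fprintf(fp, "[%15s]\n", "kbanyal")) < 0) return -1; /* 8 blanks, then kbanyal */
    total += n;
    if ((n = fprintf(fp, "[%10.3f]\n", 546666.7)) < 0) return -1;/* [546666.700] */
    total += n;
    if ((n = fprintf(fp, "[%3.4s]\n", "kbanyal")) < 0) return -1;/* [kban] */
    total += n;
    if ((n = fprintf(fp, "[%hd]\n", (short)1)) < 0) return -1;   /* [1] */
    total += n;

    return total;
}
```

Passing stdout as fp shows the lines on the screen. Each return value counts every character written, including the brackets and the newline.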

7) fseek() - This function sets the current position in a file to a new location.

When we read from or write to a file, the stream keeps track of our location in the file using the file pointer (the file position). If at any time we want to move to another location in the file, we can use the fseek() function.

Declaration - int fseek( FILE *ptr, long offset, int origin);

Here, 

*ptr = Pointer to the file stream

offset = The offset within the file, in bytes, measured from origin

origin = The starting point. We can set it using the following values:

SEEK_SET – Beginning of the file

SEEK_CUR – Current position of the file pointer.

SEEK_END – End of file.

Return value

On success – zero

On failure – non-zero value

Examples:

1) fseek(fp, 100, SEEK_SET); // Move to the position 100 bytes from the start of the file.

2) fseek(fp, 100, SEEK_CUR); // Move 100 bytes forward from the current position.

3) fseek(fp, -100, SEEK_END); // Move to the position 100 bytes before the end of the file.

8) ftell() – This function tells us the location in the file at which the next read or write will take place. Note – The location is relative to the beginning of the file.

If we want to know where we are in the file at any particular time we can use this function to get that value.

Declaration – long int ftell(FILE *fp)

Return value

On success – current offset relative to beginning of file.

On failure – -1L

It can be used as follows:

long position;

FILE *fp;

fp = fopen("xxxx","r");

position = ftell(fp);

9) rewind() – This function repositions the file pointer to the start of the file.

Declaration – void rewind(FILE *fp)
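
A common use of fseek(), ftell() and rewind() together is finding the size of a file. The following is a sketch under a couple of assumptions: stream_size is a name I made up, the stream is opened in binary mode, and the platform supports seeking to SEEK_END on it, which the common desktop systems do.

```c
#include <stdio.h>

/* Returns the size of an open stream in bytes and leaves the file
   pointer at the start of the file. Returns -1 on failure.
   stream_size is a hypothetical helper name used only for this sketch. */
long stream_size(FILE *fp)
{
    long size;

    if (fseek(fp, 0L, SEEK_END) != 0) {
        return -1;
    }
    size = ftell(fp);   /* offset of the end, relative to the beginning */
    rewind(fp);         /* back to the start; rewind() returns nothing */
    return size;
}
```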

 

 

 

 

 

 

 

 
