Broad Network


String Literal in the C Computer Language

Part 17 of Complete C Course

Foreword: String Literal in the C Computer Language

By: Chrysanthus Date Published: 4 Jun 2024

Introduction

A string is a contiguous sequence of char objects in memory, ending with an extra terminating character (the null character, explained below). Note: a space typed with the spacebar does not create a gap in a string. It inserts a char with its own char value. This char is like any other, except that it is displayed as a blank space on screen instead of a visible character. So, as far as the computer is concerned, a space is a character (char). C has object types such as int, bool, float, char and void, but it has no object type for string. So, a way had to be worked out to store strings in memory and retrieve them.

Array of Characters
An example of a string is “the man”. There is an object type for characters, which is char. To store a string in memory, chars (characters) need to be stored, that represent the string as consecutive objects in memory. A good way to do this is to have the chars in an array. Elements (objects) of an array are stored consecutively. So this is the beginning of the solution to have a string. Consider the following string:

    "the man"

This string can be stored in an array as follows:

        char txt[] = {'t','h','e',' ','m','a','n'};

When an array is initialized like this, all the objects of the array are stored in memory consecutively. Note that each character in the string is now an object of type, char, in the array. Also note that the space between the words “the” and “man” is also stored in the array in an object, as ‘ ’. Remember that in the initialization of an array, all the array elements are separated by commas.

In order to print (retrieve) the elements in the array, so that they appear as a string that can be typed (characters in a group), the characters would have to be printed one by one. The following code illustrates this:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char txt[] = {'t','h','e',' ','m','a','n'};
                        
        printf("%c", txt[0]); printf("%c", txt[1]); printf("%c", txt[2]); printf("%c", txt[3]); printf("%c", txt[4]); printf("%c", txt[5]); printf("%c", txt[6]);

        printf("\n");

        return 0;
    }

The output display is “the man” in one line. The reader should read and test the code, if he/she has not already done so.

All that looks good, but it is not a convenient way of handling strings. Here, a phrase (string) has been handled character by character. There should be a way of referring to a phrase (string) using one identifier, and not many identifiers (the array elements) as in this case. To achieve that, the inventors of C decided that the null character, \0, is added at the end of the array. With that, C considers the set of characters in the array as a string, and one identifier (the array identifier) is used to refer to the string. The null character is written as a backslash followed by zero, that is ‘\0’. Internally, its value is zero (00 in hexadecimal). The identifier that identifies the resulting array is the identifier for the string. Read and test the following program that illustrates this:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char str[] = {'t','h','e',' ','m','a','n','\0'};

        printf("%s\n", str);

        return 0;
    }

The output is "the man", without the quotes. Note that the first argument of the printf() function has %s for string and not %c for character. The second argument is the identifier of the null terminated character array, without the square brackets. In the code, the last element in the array is the null character. It is in single quotes like the rest of the characters. These are single quotes of the text editor and not single quotes of the word processor.

The array name, used on its own in an expression such as the printf() call above, gives the address of the first element (object) of the array; it behaves like a constant pointer to that element. This first element, and all the other elements of the array, are characters, with the last character being the null character, ‘\0’.

Normally, a pointer on its own gives access only to the object it points to. In the above code, the printf() function has been designed in such a way that, if it receives a pointer to the first element of an array of chars ending with \0, it prints all the characters up to, but excluding, \0. Such a pointer still points to the first element of the array, but a function (such as printf()) can use it to reach all the characters in the array.

Still, coding a string by filling an array with elements (chars) and ending it with '\0' is not convenient for the programmer. So the inventors of C decided to allow the char array block that ends with the null character to be replaced by a string in double quotes.

When a string typed in double quotes is assigned to an array declaration of chars, the compiler turns it into an array of chars, still ending with the null character, ‘\0’. This conversion is not seen by the programmer. The array name then serves as the string pointer, referring to the first character of the string.

The following program illustrates these ideas:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char herStr[] = "the woman";

        printf("%s\n", herStr);

        return 0;
    }

The output is, "the woman". The double quotes used here, are those of the text editor and not of a word processor. The reader should read and test the above code, if he/she has not already done so. Remember, the compiler will turn the content within the double quotes, into an array of characters ending with ‘\0’; without informing the programmer.

Note: By the design of C, the pointer to the string cannot be modified, but the characters can be modified, unless the declaration of the string is preceded by the reserved word, const.

Pointer to a String

Consider again the string definition,

    char str[] = {'t','h','e',' ','m','a','n','\0'};

str is a constant pointer to the first character of the string. As with all arrays, it cannot be changed; that is, it cannot be incremented or decremented. In order to have a pointer that can be incremented, with the intention of pointing it to a particular character in the string, a non-constant pointer has to be made that also points to the first character of the string. This is done as follows:

    char *ptr = str;

just as it would be done with an array of any type. Here, it is an array of chars; in the previous section, it was an array of integers. The array name, str, already gives the address of the first character of the string, so it does not need to be preceded by &. An ordinary (non-array) identifier does not give the address of its object, which is why & is needed there. Read and test the following code, which increments a non-constant pointer to a string:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char str[] = {'t','h','e',' ','m','a','n','\0'};
        char *ptr = str;
        ptr++;
        ptr++;
        printf("%s\n", ptr);

        return 0;
    }

The output is "e man" and not just 'e', as might have been expected. The reason for "e man" is the terminating ‘\0’ in the array, together with the first argument of printf() being "%s\n" and not "%c\n". So all the characters from the third character up to ‘\0’ are printed. This is how string behavior is designed in C. Remember that the second argument to the printf() function is the pointer, and not the dereferenced pointer (i.e. not the pointer preceded by the indirection operator, *). This second argument is not a char; it is a pointer to a char. If the first argument is given %c, and the second argument is changed to *ptr, then the character 'e' would be printed. That is, if the printf() function call were,

        printf("%c\n", *ptr);

the character 'e' would be printed. However, that is not string handling or string behavior. For string behavior, the printf() function should be “printf("%s\n", ptr);”.

Non Constant Pointer, Constant Pointer and Double Quoted String Literal

"the man" with double quotes is a string literal. In order to have a non-constant pointer for a string, just assign the string literal to a non-constant char pointer, as in:

        char *ptr = "the man";

without employing an array. The following program shows a string whose pointer to the first element is not constant:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *ptr = "the man";
        ptr++;
        ptr++;
        printf("%s\n", ptr);

        return 0;
    }

This program behaves the same as the last one above. The difference is in the declaration of the string: in the last program the string was an array, whose name is a constant pointer, and ptr was initialized from that array; here the non-constant pointer, ptr, is initialized directly from a string literal. The pointer is incremented two times, and with the use of “printf("%s\n", ptr);”, the output is “e man”. That is, all the characters from the third character to ‘\0’ are printed. That is string behavior: to print from the first or any character pointed to, up to ‘\0’, but excluding ‘\0’.

In order to use a string literal (character sequence in double quotes) and make the pointer to the first character constant, do

    char * const str = "the man";

which, for reading and printing, behaves the same as,

    char str[] = {'t','h','e',' ','m','a','n','\0'};

which is the same as,

    char str[] = "the man";

Note the position and use of the reserved word, “const” in the first statement. Under this condition, str cannot be incremented, and “printf("%s\n", str);” would print the complete string, “the man”. The compiler always changes the string literal (double quoted sequence of characters) into a null (‘\0’) terminated array, whose characters can be read using the constant pointer and the [] subscript operator. There is one important difference, though: the two array forms create an array of the programmer's own, whose characters can be modified, while the pointer form points directly at the literal, and attempting to modify the characters of a string literal is undefined behavior in C. So, use the pointer form for strings that will only be read, and the array form for strings that will be edited.

When a non-constant string pointer (or any non-constant array pointer) is increased two times, it points to the third character (element). In the case of a constant pointer, instead of incrementing the constant pointer (which is not possible), use the subscript operator [2], with the index 2. Remember, index counting begins from zero. So the third position is index 2. The following program illustrates this:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char * const ptr = "the man";
        printf("%c\n", ptr[2]);

        return 0;
    }

With the use of “printf("%c\n", ptr[2]);”, having %c in the first argument and having the second argument as ptr[2], the output is ‘e’.

Constant Content

The above statement “char * const ptr = "the man";”, makes the pointer, ptr constant, and not the content (characters) of the string constant. In order to make the content constant, and the pointer non-constant, do,

    const char * ptr = "the man";

changing the position of const. Read and test the following program:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const char * ptr = "the man";
        ptr++; ptr++; ptr++;ptr++;
        //*ptr = 'p';

        printf("%s\n", ptr);

        return 0;
    }

The output is “man”, indicating that the pointer was not constant (it was successfully incremented). Now, if the comment symbol, // is removed from the statement it comments, the program will not compile, issuing the error message, that an assignment has been wrongly made to a read-only location.

Constant Pointer and Constant Content for String
To make both the pointer to a string and content of the string constant, do

    const char *const str = "the string";

with const in both places. This statement is the same as:

        const char str[] = "the string";

By the definition of an array, the pointer is already constant, but the content is not constant. The preceding const makes the content constant as well.

Issue of Editing a String in C

In C, the two ways of defining a string are exemplified as follows:

        char str1[] = "the man";
        char *const str2 = "the pap";

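Note that the second string identifier does not have the square brackets. In both cases, the pointer to the first character of the string is constant. In the first case, the content (each character) of the string is not constant: the array holds its own copy of the characters, which can be changed. In the second case, the declaration does not make the characters const, but the pointer refers directly to the string literal, and modifying a string literal is undefined behavior in C (on many systems the program crashes), so such a string should be treated as read-only. In the first case, a character in the string is changed using the subscript operator, [], with the correct index. The following program illustrates this: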

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char str[] = "the man";

        str[4] = 'p';

        printf("%s\n", str);

        return 0;
    }

The output is, “the pan”, including the change. Without using the subscript operator [], the content of a string can be modified, character-by-character, by a pointer, if the pointer is not the identifier of the string, and if the pointer is a new non-constant pointer, that still points to the first character of the string. The following program illustrates this:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char str[] = "the man";
        
        char *ptr = str; ptr++; ptr++; ptr++; ptr++;

        *ptr = 'p';

        printf("%s\n", str);

        return 0;
    }

The output is, “the pan” with change. Note that the printf() function call, used str as its second argument, and not ptr. If it had used ptr, then only the right portion of the string would have been displayed. Again, to prevent the content of a string from being modified, do

        const char str1[] = "the man";

or

        const char *const str2 = "the pap";

In either statement, both the string pointer and each character are constant. A new independent non-constant pointer can still be made to point to such a string, but only with an explicit cast, and the compiler normally warns if the cast is left out. Modifying the characters through such a pointer is undefined behavior in C, so it should not be done. The const is a promise the programmer is expected to keep, not a lock on the memory: a string is just consecutive characters in memory, with the last character being ‘\0’.

To conclude this sub-section, define a string for editing as follows:

        char str[] = "the man";

then use the identifier and subscript, to replace any character.

Issue with Length of String

The size (length) of an array is fixed once its declaration is complete. A string is stored in an array. So, once defined, the size of the array cannot be increased or decreased. However, the length of the string can be decreased indirectly, by placing ‘\0’ at a position before the existing ‘\0’. The printf() function reads a string by moving from the first character of the string, until it sees ‘\0’. Consider the following declaration:

    char str[] = "the woman";

In this case, index 9 would have the ‘\0’ character. The string can now be changed to “the man” as follows:

    str[4] = 'm'; str[5] = 'a'; str[6] = 'n'; str[7] = '\0';

Index 8 still has ‘n’ and index 9 would have ‘\0’. The characters at index 8 and 9 are ignored since index 7 now has ‘\0’, and a string ends at ‘\0’. So, printf("%s\n", str); will output “the man”.

Coding Very Long Strings
It is possible to have a string that is very long, and coding it would mean, it has to take more than one line. It is coded as illustrated in the following example. Read and test the program.

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *const longStrPtr = "This is a very long string "
                                 "that takes more than one line "
                                 "to type in the source code.";
        printf("%s\n", longStrPtr);

        return 0;
    }

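Each part of the string that is on a line is in double quotes. Only the last part of the string is followed by the semicolon. The parts before the last part are not followed by semicolons. Adjacent string literals, separated only by whitespace, are joined by the compiler into one string literal; that is how string literals are concatenated in C. No space is added between the parts, which is why the first two parts above end with a space inside the quotes. The output is:

    This is a very long string that takes more than one line to type in the source code.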

The different parts of the string can be typed without pressing the Enter key on the keyboard, unlike in the above case where the Enter key was pressed twice. Read and test the following program:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *const longStrPtr = "This is a very long string " "that takes more than one line " "to type in the source code.";
        printf("%s\n", longStrPtr);

        return 0;
    }

The output is the same.

Array of Strings
An array of strings is an array of pointers to char. Each value (element) of the array is a pointer that points to the first character of one string. The array is declared as:

    char * arr[] = {pointer1, pointer2, pointer3, etc. };

or

    char * arr[noOfElements];

and then the pointers (or string literals) are assigned to the array elements one by one.

The strings are defined in the normal way. The following program illustrates this for the first approach:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *const one = "the first";
        char *const two = "the second";
        char *const three = "a third";

        char *myStrings[] = {one, two, three};  //the block has const pointers
        
        printf("%s\n", myStrings[0]);
        printf("%s\n", myStrings[1]);
        printf("%s\n", myStrings[2]);

        return 0;
    }

The output is:

    the first
    the second
    a third

The following program, does the illustration for the second approach:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *myStrings[3];
        
        myStrings[0] = "the first";
        myStrings[1] = "the second";
        myStrings[2] = "a third";
        
        printf("%s\n", myStrings[0]);
        printf("%s\n", myStrings[1]);
        printf("%s\n", myStrings[2]);

        return 0;
    }

A string literal in double quotes, when used in an assignment like this, gives the address of the first character of the string. The output is the same as before. Instead of assigning string literals to the array elements, the constant pointers of the strings themselves can be assigned, as follows:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *myStrings[3];
        
        char *const one = "the first";
        char *const two = "the second";
        char *const three = "a third";

        myStrings[0] = one;
        myStrings[1] = two;
        myStrings[2] = three;
        
        printf("%s\n", myStrings[0]);
        printf("%s\n", myStrings[1]);
        printf("%s\n", myStrings[2]);

        return 0;
    }

The output is the same as before. Note that the number of elements in an array is one more than the highest index.

Before leaving this section of the chapter, remember that a string is a sequence of characters in memory, ending with the null character, and the first character is pointed to, by a constant pointer.

Converting Strings to Numbers
There are predefined functions in the stdlib.h header, to convert a string to a particular number. The following are syntaxes for converting a string to a number type:

    int atoi(const char *nptr);              // string to integer
    long int atol(const char *nptr);         // string to long integer
    long long int atoll(const char *nptr);   // string to long long integer
    double atof(const char *nptr);           // string to double


