CS1010 Notes
Lec 11 - Struct & Standard I/O


Structure

Declaration

Just like a variable, a struct declaration has a scope, and it follows the same rule as the scope of a variable (i.e., it is valid within the block it is declared in). Unlike variables, however, a struct is commonly declared in the global scope, i.e., outside of any function, so that it is usable within the whole program.

struct module {
  char *code;
  char *title;
  long mc;
};

Initialization

Using the struct we have just declared, here's an example that declares a struct variable and then sets its members one by one:

struct module cs1010;
cs1010.code = "CS1010";
cs1010.title = "Programming Methodology";
cs1010.mc = 4;

Assigning a Structure Variable

We can assign one structure variable to another:

struct module cs1010s = cs1010;

The assignment above is equivalent to assigning each member of the struct individually.

Declare Struct as a Type

Using typedef on struct frees us from typing the word struct every time. We can do so with either:

typedef struct module {
  char *code;
  char *title;
  long mc;
} module;

or

typedef struct {
  char *code;
  char *title;
  long mc;
} module;

In either case, we can just use module like any other type:

module cs1010e;

Standard I/O

printf

printf() takes a variable number of arguments. The first argument is a format string containing zero or more format specifiers, such as %s. The general form of a format specifier is:

%[flags][field_width][.precision][length_modifier]specifier

The specifier controls how the corresponding argument is interpreted: s for a string, c for a character, d for an int (base 10), f for a floating-point number, and p for a pointer (printed in base 16 on common systems). We can additionally prepend a length modifier: ld for long, lld for long long, and lf for double (for printf, %f and %lf both print a double).

To format the output, we can prepend a number to indicate the field width, i.e., the minimum number of characters used when printing. E.g., %3d pads the printed number with spaces if it has fewer than 3 digits. Adding the flag 0 in front, as in %03d, pads the number with 0s instead. For floating-point numbers, we can additionally control the precision, i.e., the number of digits printed after the decimal point. E.g., %3.4lf prints a double to four decimal places. The 3 is the field width: if the whole printed number (integer part, decimal point and fractional digits) is shorter than 3 characters, spaces are padded in front; otherwise, nothing is padded.

Some examples:

printf("%10.4lf\n", 10.0);
//   10.0000
printf("%3.4lf\n", 10.0);
//10.0000
printf("%3d\n", 10);
// 10
printf("%3d\n",10000);
//10000

Mismatch Number of Arguments

Since printf accepts a variable number of arguments, you can pass it fewer arguments than the format string expects and the code still compiles (usually with a warning). If you push ahead and run it anyway, the behavior is undefined: printf fetches the missing arguments from wherever they would have been (registers or the stack), pretending that they are there, causing weird things to happen.

Printing User Input

We should also never do this:

char *str = cs1010_read_word();
printf(str);

We should always print a string using:

printf("%s", str);

scanf

Rule of Thumb: Don't use scanf(). (Unless you know exactly what you are doing.)

Rule 1: scanf() is not for reading input, it's for parsing input.

Like printf, scanf takes in one or more arguments, with the first argument being a format string containing one or more format specifiers. The format specifier for scanf is simpler and has the following pattern:

%[*][field_width][length_modifier]specifier

For instance, to read an integer, a floating-point number, and a string of at most 10 characters:

long l;
double d;
char s[11];
scanf("%ld %lf %10s", &l, &d, s);

scanf scans the standard input and tries to match it against the format. A space between format specifiers matches zero or more white-space characters (space, tab, newline). Scanning stops when an input character does not match the corresponding ordinary character in the format string, or when an input conversion fails.

Return value of scanf(): on success, scanf() returns the number of input items successfully matched and assigned; this can be fewer than provided for, or even zero, in the event of an early matching failure. If the input ends before the first conversion, scanf() returns EOF.

Adding a * to the format specifier tells scanf to consume the matching input without storing it in any variable. Combined with a scanset such as %[^\n] (explained below), this is useful for discarding whatever remains on the current line of the standard input. Its usage is shown as follows:

long a;
int result = scanf("%ld", &a);
if (result == 1) {
  printf("%ld\n", a);
} else {
  // not a number: discard the rest of the line (up to, but not including, '\n')
  scanf("%*[^\n]");
}

Invalid Pointers

Since scanf expects the caller to pass in pointers to the variables in which it stores the results, we need to be careful about what we pass in. It is easy to write something like this by mistake:

long *a;
scanf("%ld", a);

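The compiler will not complain about the type, since long * is exactly what %ld expects (with warnings enabled, it may at best warn that a is uninitialized). But a is an uninitialized pointer that does not point to a valid memory location accessible by the program, so scanf writes to an arbitrary address: the program may crash, or silently corrupt other data.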

Buffer Overflow

When we use scanf with %s to read a string, it keeps reading until it reaches white space, and stores everything it reads into an array. The problem is that we do not know in advance when it will stop reading, and therefore how big the array we allocate for the input needs to be!

#include <stdio.h>
int main(void)
{
    char name[10];
    printf("What's your name? ");
    scanf("%s", name);  // no field width: nothing limits how much is written
    printf("Hello %s!\n", name);
}

As %s is for strings, this should work with any input:

$ ./example3
What's your name? Paul
Hello Paul!

$ ./example3
What's your name? Christopher-Joseph-Montgomery
Segmentation fault

Well, now we have a buffer overflow. You might get Segmentation fault on a Linux system, any other kind of crash, maybe even a "correctly" working program, because, once again, the program has undefined behavior.

Buffer overflows in C

A buffer overflow is a specific kind of undefined behavior resulting from a program that tries to write more data to an (array) variable than this variable can hold. Although this is undefined, in practice it will result in overwriting some other data (that happens to be placed after the overflowed buffer in memory) and this can easily crash the program.

One particularly dangerous result of a buffer overflow is overwriting the return address of a function. The return address is used when a function exits, to jump back to the calling function. Being able to overwrite this address ultimately means that a person with enough knowledge about the system can cause the running program to execute any other code supplied as input. This problem has led to many security vulnerabilities; imagine being able to make, for example, a web server written in C execute your own code just by sending it a specially crafted request...

So, here's the next rule:

Rule 2: scanf() can be dangerous when used carelessly. Always use field widths with conversions that parse to a string (like %s).

The field width is a number preceding the conversion specifier. It tells scanf() to consider at most that many characters from the input when parsing for this conversion. Let's demonstrate it in a fixed program:

#include <stdio.h>

int main(void)
{
    char name[40];
    printf("What's your name? ");
    scanf("%39s", name);
    printf("Hello %s!\n", name);
}

We also increased the buffer size, because there might be really long names.

There's an important thing to notice: Although our name has room for 40 characters, we instruct scanf() not to read more than 39. This is because a string in C always needs a 0 byte appended to mark the end. When scanf() is finished parsing into a string, it appends this byte automatically, and there must be space left for it.

So, the fixed program is now safe from buffer overflows. Let's try something different:

$ ./example4
What's your name? Martin Brown
Hello Martin!

Well, that's ... not quite what we wanted. What happens here? Reading a scanf() manual, we find that %s parses a word, not a string. For example, I found the following wording:

s: Matches a sequence of non-white-space characters

White space here means characters such as space, tab (\t) and newline (\n); the full set, as classified by isspace(), also includes \r, \v and \f.

Rule 3: Although scanf() format strings can look quite similar to printf() format strings, they often have slightly different semantics. (Make sure to read the fine manual)

The general problem with parsing "a string" from an input stream is: Where does this string end? With %s, the answer is at the next white-space. If you want something different, you can use %[:

  • %[a-z]: parse as long as the input characters are in the range a - z.

  • %[ny]: parse as long as the input characters are y or n.

  • %[^.]: The ^ negates the list, so this means parse as long as there is no . in the input.

We could change the program, so anything until a newline will be parsed into our string:

#include <stdio.h>

int main(void)
{
    char name[40];
    printf("What's your name? ");
    scanf("%39[^\n]", name);
    printf("Hello %s!\n", name);
}

It might get a bit frustrating, but this is again a program with possible undefined behavior. See what happens when we just press Enter:

$ ./example5
What's your name? 
Hello ÿ¦e!

Here's another sentence from a scanf() manual, from the section describing the [ conversion:

The usual skip of leading white space is suppressed.

With many conversions, scanf() automatically skips whitespace characters in the input, but with some, it doesn't. Here, the newline from just pressing Enter isn't skipped, and it doesn't match our conversion, which explicitly excludes newlines. The result: scanf() doesn't parse anything, name remains uninitialized, and printing it is undefined behavior.

One way around this is to tell scanf() to skip whitespace: If the format string contains any whitespace, it matches any number of whitespace characters in the input, including no whitespace at all. Let's use this to skip whitespace the user might enter before entering his name:

#include <stdio.h>

int main(void)
{
    char name[40];
    printf("What's your name? ");
    scanf(" %39[^\n]", name);
    //     ^ note the space here, matching any whitespace
    printf("Hello %s!\n", name);
}

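Yes, this program works for ordinary input and avoids the undefined behavior above (one caveat: if the input ends before anything is typed, e.g. by pressing Ctrl+D on Linux, scanf() returns EOF and name is still left uninitialized). But I guess you don't like very much that nothing at all happens when you just press Enter, because scanf() skips it and continues to wait for input that can be matched.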

Read a number with scanf

Rule 4: scanf() is a very powerful function. (and with great power comes great responsibility ...)

A lot of parsing work can be done with scanf() in a very concise way, which can be very nice, but it also has many pitfalls and there are tasks (such as reading a line of input) that are much simpler to accomplish with a simpler function. Make sure you understand the rules presented here, and if in doubt, read the scanf() manual precisely.

That being said, here's an example of how to read a number with retries using scanf():

#include <stdio.h>

int main(void)
{
    int a;
    int rc;
    printf("enter a number: ");
    while ((rc = scanf("%d", &a)) == 0)  // Neither success (1) nor EOF
    {
        // clear what is left, the * means only match and discard:
        scanf("%*[^\n]");
        // input was not a number, ask again:
        printf("enter a number: ");
    }
    if (rc == EOF)
    {
        printf("Nothing more to read - and no number found\n");
    }
    else
    {
        printf("You entered %d.\n", a);
    }
}

fgets

There are several functions in C for reading input. Let's have a look at one that's probably most useful to you: fgets().

fgets() does a simple thing: it reads at most n - 1 characters (n being its second argument), but stops after a newline, which is stored as well. In other words: it reads a line of input.

This is the function signature:

char *fgets(char *str, int n, FILE *stream)

There are two very nice things about this function for what we want to do:

  • The parameter for the maximum length accounts for the necessary 0 byte, so we can just pass the size of our variable.

  • The return value is either a pointer to str or NULL if, for any reason, nothing was read.

So let's rewrite this program again:

#include <stdio.h>

int main(void)
{
    char name[40];
    printf("What's your name? ");
    if (fgets(name, 40, stdin))
    {
        printf("Hello %s!\n", name);
    }
}

I assure you this is safe, but it has a little flaw:

$ ./example7
What's your name? Bob
Hello Bob
!

Of course, this is because fgets() also reads the newline character itself. But the fix is simple as well: We use strcspn() to get the index of the newline character if there is one and overwrite it with 0. strcspn() is declared in string.h, so we need a new #include:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[40];
    printf("What's your name? ");
    if (fgets(name, 40, stdin))
    {
        name[strcspn(name, "\n")] = 0;
        printf("Hello %s!\n", name);
    }
}

size_t strcspn(const char *s, const char *reject);

The strcspn() function calculates the length of the initial segment of s which consists entirely of bytes not in reject.

Let's test the fixed program:

$ ./example8
What's your name? Bob Belcher
Hello Bob Belcher!

Credits

Finally, I really want to thank this amazing website, A beginners' guide away from scanf(). It's legit awesome and useful!

Notes

On Printing User Input: the reason is that we have no control over what the user types as input. The user may type "%s", so str now points to the text %s, which printf treats as a format specifier: it then fetches an argument that was never passed (e.g., from the stack) and prints memory contents or crashes. This is a huge security risk, known as the externally-controlled format string vulnerability.

On Read a number with scanf: this is not as nice as a version using strtol(), because there is no way to tell scanf() not to skip whitespace for %d, so if you just hit Enter, it will still wait for your input. But it works, and it's a really short program.

Lecture slides: CS1010 AY24/25 S1 Lecture 11