Lec 11 - Struct & Standard I/O
Last updated
Just like a variable, a struct has a scope, and it follows the same rule as the scope of a variable (i.e., it is valid within the block it is declared in). Unlike a variable, however, a struct is commonly declared in the global scope, i.e., outside of any function, so that it is usable within the whole program.
Using the struct we have just declared, here's an example showing how to initialize a struct:
We can assign one structure variable to another. Such an assignment is equivalent to assigning each member of the struct individually (array members included, even though an array on its own cannot be assigned with =).
Using typedef on a struct frees us from typing the word struct every time. We can do so with either:
or
In either case, we can just use module like any other type:
printf() takes a variable number of arguments. The first argument must be a format string, which may contain format specifiers such as %s. The general form of a format specifier is:
%[flags][width][.precision][length]specifier
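The specifier controls how the argument is interpreted: s for a string, c for a character, d for an integer (base 10), f for a floating-point number, and p for a pointer (usually printed in base 16; the exact format is implementation-defined). We can additionally prepend a length modifier: ld for a long integer, lld for a long long, and lf for a double. (In printf(), a float argument is promoted to double, so %f and %lf print the same thing; the distinction matters for scanf().)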
To format the output, we can put a number in front of the specifier to set its field width, i.e., the minimum number of characters printed. E.g., %3d pads the number with spaces at the front if it takes fewer than 3 characters. Adding the flag 0 in front, %03d, pads the number with 0s instead. For floating-point numbers, we can additionally control the precision, i.e., the number of digits printed after the decimal point. E.g., %3.4lf prints a double with four decimal places. The 3 is the field width: if the whole printed number (integer part + 1 for the . + fractional part) is shorter than 3 characters, spaces are padded at the front; otherwise, nothing is padded. (With four decimal places the number is always at least 6 characters long, such as 0.0000, so this particular width never adds padding.)
Some examples:
Since printf expects a variable number of arguments, you can pass it fewer arguments than the format string asks for and the code still compiles (usually with warnings, e.g. gcc's -Wformat, which -Wall enables). If you push ahead and run it anyway, printf will fetch the missing arguments from wherever they would have been (e.g., the stack), pretending that they are there. This is undefined behavior, and weird things happen.
We should also never pass a string read from the user directly as the format string, as in printf(str).
The reason is that we have no control over what the user types as input: the user may type "%s" into the standard input, so str now holds the text %s, which printf treats as a format specifier. printf then reads an argument that was never passed and may output the contents of the stack (or crash)! This is a huge security risk and is known as the externally-controlled format string vulnerability.
We should always print a string using:
Rule of Thumb: Don't use scanf() (unless you know exactly what you are doing).
Rule 1: scanf() is not for reading input, it's for parsing input.
Like printf, scanf takes one or more arguments, the first being a format string containing one or more format specifiers. A scanf format specifier is simpler and has the following pattern:
%[*][width][length]specifier
For instance, to read an integer, a floating-point number, and a string of at most 10 characters:
scanf scans the standard input and tries to match it against the format specified. A space between format specifiers matches zero or more whitespace characters (space, tab, newline). Scanning stops when an input character does not match the format or when an input conversion fails.
Return values of scanf(): On success, scanf() returns the number of input items successfully matched and assigned; this can be fewer than provided for, or even zero, in the event of an early matching failure. If the input ends (or a read error occurs) before the first conversion, it returns EOF instead.
Adding a * to the format specifier means that scanf should consume the matching input but not store it in any variable. Combined with the %[ conversion (described further below), this is useful for clearing any remaining data from the standard input. Its usage is shown as follows:
Since scanf expects the caller to pass pointers to variables for it to store the results in, we need to be careful about what we pass in. It is easy to pass in something like this:
The compiler may not warn us, since the type matches perfectly. But the program may crash, because the pointer does not point to a valid memory location accessible by the program.
When we use scanf with %s to read a string, it keeps reading until it reaches whitespace, and stores everything that it reads into an array. The problem is that we do not know in advance when it will stop reading, and therefore how big an array we need to allocate for the input!
As %s is for strings, this should work with any input:
Well, now we have a buffer overflow. You might get Segmentation fault on a Linux system, any other kind of crash, maybe even a "correctly" working program, because, once again, the program has undefined behavior.
Buffer overflows in C
A buffer overflow is a specific kind of undefined behavior resulting from a program that tries to write more data to an (array) variable than this variable can hold. Although the behavior is undefined, in practice it often overwrites some other data (whatever happens to be placed after the overflowed buffer in memory), and this can easily crash the program.
One particularly dangerous result of a buffer overflow is overwriting the return address of a function. The return address is used when a function exits, to jump back to the calling function. Being able to overwrite this address ultimately means that a person with enough knowledge about the system can cause the running program to execute any other code supplied as input. This problem has led to many security vulnerabilities; imagine you could make, for example, a webserver written in C execute your own code by submitting a specially tailored request...
So, here's the next rule:
Rule 2: scanf() can be dangerous when used carelessly. Always use field widths with conversions that parse to a string (like %s).
The field width is a number preceding the conversion specifier. It causes scanf() to consider at most that many characters from the input when parsing this conversion. Let's demonstrate it in a fixed program:
We also increased the buffer size, because there might be really long names.
There's an important thing to notice: although our name has room for 40 characters, we instruct scanf() not to read more than 39. This is because a string in C always needs a 0 byte appended to mark the end. When scanf() is finished parsing into a string, it appends this byte automatically, and there must be space left for it.
So, this program is now safe from buffer overflows. Let's try something different:
Well, that's ... outspoken. What happens here? Reading a scanf() manual, we find that %s parses a word, not a string. For example, I found the following wording:
s: Matches a sequence of non-white-space characters
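A whitespace character in C is, for example, a space, a tab (\t) or a newline (\n); isspace() also counts vertical tab (\v), form feed (\f) and carriage return (\r).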
Rule 3: Although scanf() format strings can look quite similar to printf() format strings, they often have slightly different semantics. (Make sure to read the fine manual.)
The general problem with parsing "a string" from an input stream is: where does this string end? With %s, the answer is: at the next whitespace. If you want something different, you can use %[:
%[a-z]: parse as long as the input characters are in the range a - z.
%[ny]: parse as long as the input characters are y or n.
%[^.]: The ^ negates the list, so this means parse as long as there is no . in the input.
We could change the program so that anything up to a newline is parsed into our string:
It might get a bit frustrating, but this is again a program with possible undefined behavior. See what happens when we just press Enter:
Here's another sentence from a scanf() manual, from the section describing the [ conversion:
The usual skip of leading white space is suppressed.
With most conversions, scanf() automatically skips leading whitespace in the input, but %[, %c and %n do not. Here, the newline from just pressing Enter isn't skipped, and it doesn't match our conversion, which explicitly excludes newlines. The result: scanf() doesn't parse anything, and our name remains uninitialized.
One way around this is to tell scanf() to skip whitespace: if the format string contains any whitespace, it matches any number of whitespace characters in the input, including none at all. Let's use this to skip whitespace the user might enter before entering their name:
Yes, this program works and doesn't have any undefined behavior, but I guess you don't like very much that nothing at all happens when you just press Enter, because scanf() skips the newline and continues to wait for input that can be matched.
Rule 4: scanf() is a very powerful function. (And with great power comes great responsibility ...)
A lot of parsing work can be done with scanf() in a very concise way, which can be very nice, but it also has many pitfalls, and there are tasks (such as reading a line of input) that are much simpler to accomplish with a simpler function. Make sure you understand the rules presented here, and if in doubt, read the scanf() manual precisely.
That being said, here's an example of how to read a number with retries using scanf():
It's not as nice as a version using strtol(), because there is no way to tell scanf() not to skip whitespace for %d (so if you just hit Enter, it will still wait for your input), but it works and it's a really short program.
There are several functions in C for reading input. Let's have a look at the one that's probably most useful to you: fgets().
fgets() does a simple thing: it reads up to a given maximum number of characters, but stops at a newline, which is read as well. In other words: it reads a line of input.
This is the function signature:
char *fgets(char *str, int count, FILE *stream);
There are two very nice things about this function for what we want to do:
The parameter for the maximum length accounts for the necessary 0 byte, so we can just pass the size of our variable.
The return value is either a pointer to str or NULL if, for any reason, nothing was read.
So let's rewrite this program again:
I assure you this is safe, but it has a little flaw:
Of course, this is because fgets() also reads the newline character itself. But the fix is simple as well: we use strcspn() to get the index of the newline character, if there is one, and overwrite it with 0. strcspn() is declared in string.h, so we need a new #include:
#include <string.h>
size_t strcspn(const char *s, const char *reject);
The strcspn() function calculates the length of the initial segment of s which consists entirely of bytes not in reject.
Let's test it:
Lastly, I really want to thank this amazing website; it's legit awesome and useful!