Lec 01 - Compiler, Types, Classes, Objects

Program and Compiler

Compiled vs. Interpreted Programs

  • Compiler: The compiler reads in the entire program written in a higher-level programming language and translates it into machine code. The machine code is then saved into an executable file, which can be executed later. e.g., C/C++

  • Interpreter: The interpreter reads in the program one statement at a time, interprets what the statement means, and executes it directly. e.g., Python, JavaScript

Java programs, on the other hand, can be executed in two ways:

  1. The Java program can first be compiled into bytecode. During execution, the bytecode is interpreted and compiled on-the-fly by the Java Virtual Machine (JVM) into machine code. See Compiling and Running Java Programs

  2. The Java program can be interpreted by the Java interpreter. See Interpreting a Java Program

Compiling and Running Java Programs

Suppose we have a Java program called Hello.java.

1. Compile the Java program into bytecode

To compile the program, we type javac Hello.java into the command line. javac is the Java compiler. This step either creates the bytecode file Hello.class or reports compilation errors. In this step, the Hello.java program is compiled from Java to the JVM language (bytecode).

2. Interpret/Execute the compiled bytecode

Assuming that there is no error in compilation, we can now run java Hello to invoke the JVM (the java command) and execute the bytecode contained in Hello.class.

The JVM acts as an interpreter for the bytecode.
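The contents of Hello.java are not shown in these notes, so the following is only a minimal sketch of what such a file might contain. The message and the greeting() helper are assumptions made for illustration.

```java
// Hello.java: a hypothetical minimal program for the two steps above.
class Hello {
    // Hypothetical helper: builds the message so it can be checked without printing.
    static String greeting() {
        return "Hello, world!";
    }

    // Entry point that the JVM looks for when we run "java Hello".
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

With this file, javac Hello.java should create Hello.class, and java Hello should then print the greeting.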

Interpreting a Java Program

Java (version 9 or later) comes with an interpreter called jshell that can read in Java statements, evaluate them, and print the results. It can be started by typing jshell into the command line, optionally followed by the name of a file to load.

Files intended to be run on jshell typically use the .jsh extension, while files intended to be compiled and run use the .java extension. However, this difference is merely a convention. You can still interpret a .java program on jshell.

Compiler

The compiler does more than just translate source code into machine code or bytecode. It also parses the source code and checks whether it follows the precise specification of the programming language used (called its grammar), and produces a syntax error if the grammar is violated. It can therefore detect any syntax error before the program is run. This kind of error is called a compilation error.

For the difference between Compilation Error and Runtime Error, please see my CS1010 Notes here.

Workflow

A typical workflow in a compiled language is the edit, compile, execute loop: we edit the source code, compile it, execute the result, and repeat.

Variable and Type

Data Abstraction: Type

A variable is an abstraction that allows us to give a user-friendly name to a piece of data in memory. We use the variable name whenever we want to access the value in that location, and a pointer to the variable or reference to the variable whenever we wish to refer to the address of the location.

Type

As the program gets more complex, our variables might be an abstraction over different types of data: some variables might refer to a number, some to a string, some to a list of numbers, etc. Not all operations are meaningful over all types of data.

To help mitigate the complexity, we can assign a type to a variable. The type communicates:

  1. to the readers what data type the variable is an abstraction over,

  2. and to the compiler/interpreter what operations are valid on this variable and how the operation behaves.

Dynamic vs. Static Type

In dynamically typed programming languages, like Python and JavaScript, the type is associated with the values, and the type of the variable changes depending on the value it holds. For example, we can do the following:

However, in a statically typed language, like Java, we need to declare every variable we use in the program and specify its type. Once a variable is declared with a particular type, the type of the variable cannot be changed. In other words, the variable can only hold values of that declared type.

The type that a variable is assigned when we declare the variable is also known as the compile-time type. During the compilation, this is the only type that the compiler is aware of. The compiler will check if the compile-time type matches when it parses the variables, expressions, values, and function calls, and throw an error if there is a type mismatch. This type-checking step helps to catch errors in the code early.

An important distinction between dynamic and static typing is what the type is attached to. In static typing, the type is attached to the variable, so the variable can only store values of that particular type (or its subtype, as you will see later). In fact, in Java, the type that is attached to a variable is the declared type (i.e., the type written in the variable declaration, also commonly known as the compile-time type).

On the other hand, in a dynamically typed language, the type is attached to the value. In other words, a variable can store anything, but we can still know what the type is because the type can be queried from the value.
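As a small illustration of a type being attached to a variable in Java, consider the sketch below. The class and method names are hypothetical, and the line that would be rejected is left as a comment so that the example still compiles.

```java
class StaticTypeDemo {
    static int demo() {
        int i = 5;        // i is declared with compile-time type int
        i = 7;            // fine: 7 is also an int
        // i = "seven";   // would not compile: a String cannot be stored in an int variable
        return i;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Uncommenting the rejected line makes javac report an error before the program ever runs, which is the point of static typing.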

Strong Typing vs. Weak Typing

A type system of a programming language is a set of rules that governs how the types can interact with each other.

Generally, a strongly typed programming language enforces strict rules in its type system to ensure type safety, i.e., to ensure that if there are any problems with the program, they are not due to the types. For instance, it would catch an attempt at multiplying two strings. One way to ensure type safety is to catch type errors during compile time rather than leaving them to run time.

This concept of type safety is very important in Java: catch errors during compile time instead of run time!

On the other hand, a weakly typed (or loosely typed) programming language is more permissive in terms of type checking. C is an example of a static, weakly typed language. In C, the following is possible:

In contrast, if we try the following in Java:

we will get the following compile-time error message:

because the compiler enforces a stricter rule and allows typecasting only if it makes sense. More specifically, we will get a compilation error if the compiler can determine with certainty that such conversion can never happen successfully.

Type Checking with a compiler

In addition to checking for syntax errors, the compiler can check for type compatibility according to the compile-time type, to catch possible errors as early as possible. Such type-checking is made possible with static typing. Consider the following Python program:

Since Python does not allow adding a string to an integer, there is a type mismatch error on Line 5. The type mismatch error is only caught when Line 5 is executed after the program is run for a long time. Since the type of the variable i can change during run time, Python (and generally, dynamically typed languages) cannot tell if Line 5 will lead to an error until it is evaluated during run time.

In contrast, a statically typed language like Java can detect a type mismatch during compile time since the compile-time type of a variable is fixed. As you will see later, Java allows "addition" of a string and an integer, but doesn't allow multiplication of a string and an integer. If we have the following code, Java can confidently produce compilation errors without even running the program:

For String objects in Java, the * operator is not allowed, and thus will generate a compilation-error.
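The contrast between + and * on a String can be sketched as follows. This is an illustrative example with hypothetical names, not the listing referred to above, and the rejected line is kept as a comment so that the rest compiles.

```java
class TypeCheckDemo {
    static String label() {
        String s = "CS";
        int n = 2030;
        // String t = s * n;   // would not compile: * is not defined for a String operand
        return s + n;          // allowed: + concatenates the String with the int
    }

    public static void main(String[] args) {
        System.out.println(label());   // prints CS2030
    }
}
```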

Primitive Types in Java

There are two categories of types in Java, the primitive types and the reference types. We will first look at primitive types in this unit.

Primitive types are types that hold numeric values (integers, floating-point numbers) as well as boolean values (true and false).

Kinds            Types     Sizes (in bits)
Boolean          boolean   1
Character        char      16
Integral         byte      8
                 short     16
                 int       32
                 long      64
Floating-Point   float     32
                 double    64

Long and Float Constants

By default, an integer literal (e.g., 888) is assigned an int type. To differentiate between a long and an int constant, you can use the suffix L to denote that the value is expected to be of long type (e.g., 888L is a long). This is important for large values beyond the range of int. Also, to make large numbers easier to read, you can add underscores _ between the digits (e.g., 888_888_888_888L).

On the other hand, if the constant is a floating-point constant, by default it is treated as type double. You need to add the suffix f to indicate that the value is to be treated as a float type.
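The suffix rules above can be tried out with a short sketch. The class, method names and values are chosen for illustration only, and the lines that javac would reject are left as comments.

```java
class LiteralDemo {
    static long bigValue() {
        // long tooBig = 888_888_888_888;   // would not compile: the int literal is out of range
        return 888_888_888_888L;            // the L suffix makes this a long literal
    }

    static float smallFloat() {
        // float f = 3.14;                  // would not compile: 3.14 is a double literal
        return 3.14f;                       // the f suffix makes this a float literal
    }

    public static void main(String[] args) {
        System.out.println(bigValue());
        System.out.println(smallFloat());
    }
}
```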

Default Values

Fields that are declared but not initialized will be set to a reasonable default by the compiler. Generally speaking, this default will be zero or null, depending on the data type. Relying on such default values, however, is generally considered bad programming style.

Data Type                Default Value (for fields)
byte                     0
short                    0
int                      0
long                     0L
float                    0.0f
double                   0.0d
char                     '\u0000'
String (or any object)   null
boolean                  false

Local variables are slightly different; the compiler never assigns a default value to an uninitialized local variable. If you cannot initialize your local variable where it is declared, make sure to assign it a value before you attempt to use it. Accessing an uninitialized local variable will result in a compile-time error.
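The difference between fields and local variables can be seen in the following sketch. The class and field names are hypothetical, and the use of the uninitialized local variable is commented out because the compiler would reject it.

```java
class DefaultsDemo {
    // Fields that are declared but never assigned receive default values.
    int count;
    double ratio;
    boolean ready;
    char letter;
    String name;

    public static void main(String[] args) {
        DefaultsDemo d = new DefaultsDemo();
        System.out.println(d.count + " " + d.ratio + " " + d.ready + " " + d.name);

        // int local;
        // System.out.println(local);   // would not compile: local might not have been initialized
    }
}
```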

Subtypes

Let $S$ and $T$ be two types. We say that $T$ is a subtype of $S$ if a piece of code written for variables of type $S$ can also safely be used on variables of type $T$.

We use the notation $T <: S$ or $S :> T$ to denote that $T$ is a subtype of $S$. The subtyping relationship in general must satisfy two properties:

  1. Reflexive: For any type $S$, we have $S <: S$ (i.e., $S$ is a subtype of itself).

  2. Transitive: If $S <: T$ and $T <: U$, then $S <: U$. In other words, if $S$ is a subtype of $T$ and $T$ is a subtype of $U$, then $S$ is a subtype of $U$.

Additionally, in Java, you will find that the subtyping relationship also satisfies anti-symmetry. However, this is often omitted as it is enforced by design.

  • Anti-Symmetry: If $S <: T$ and $T <: S$, then $S$ must be the same type as $T$.

Related to the subtype relationship,

  • We use the term supertype to denote the reversed relationship: if $T$ is a subtype of $S$, then $S$ is a supertype of $T$.

  • In specific scenarios, we use the term proper subtype (or $<$) to denote a stricter subtyping: if $T <: S$ and $T \neq S$, then $T$ is a proper subtype of $S$, denoted as $T < S$.

Subtype is nothing but a subset! For more information, see here.

Subtyping Between Java Primitive Types

The subtyping between Java primitive types can be summarised as the chain byte <: short <: int <: long <: float <: double, together with char <: int.

long <: float?

Why is long a subtype of float? More specifically, long is 64-bit, and float is only 32-bit. There are more values in long than in float.

The resolution lies in the range of values that can be represented with float and long. long can represent every integer between $-2^{63}$ and $2^{63}-1$, a 19-digit number. float, however, can represent floating-point numbers as large as about $3.4 \times 10^{38}$ (although it cannot represent every floating-point number or every integer value within that range).

Thus, a piece of code written to handle float can also handle long (since all long values can be represented with a float, albeit with possible loss of precision).
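The "possible loss of precision" can be observed directly. The sketch below uses a hypothetical roundTrip helper and a value just above $2^{24}$, beyond which float can no longer represent every integer exactly.

```java
class LongToFloatDemo {
    // Converts a long to a float and back again.
    static long roundTrip(long n) {
        float f = n;        // widening long -> float: allowed without a cast
        return (long) f;    // narrowing float -> long: needs an explicit cast
    }

    public static void main(String[] args) {
        long n = 16_777_217L;               // 2^24 + 1, not exactly representable as a float
        System.out.println(n);              // prints 16777217
        System.out.println(roundTrip(n));   // prints 16777216
    }
}
```

The assignment to f is accepted by the compiler, but the value it stores is only the nearest float to the original long.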

A valid subtype relationship is part of what the Java compiler checks for when it compiles. This means a narrowing type conversion without explicit casting is not allowed, and a compilation error will be generated by the compiler. Consider the following example:

Line 4 above would lead to an error:

But Line 3 is OK.

Using the terminology that you just learned, double is a supertype of int, and converting a double to an int is known as a narrowing type conversion. Since it is done without explicit casting, this is not allowed in Java. However, after we add the explicit cast as follows, the code should be OK:

Some of the readers might notice that, in the example above, the value of d is 5.0, so, we can store the value as 5 in i, without any loss. Or, in Line 3, we already copied the value stored in i to d, and we are just copying it back to i? Since the value in d now can be represented by i, what is wrong with copying it back? Why doesn't the compiler allow Line 4 to proceed?

The reason is that the compiler does not execute the code (which is when assigning 5.0 to d happens) and it (largely) looks at the code, statement-by-statement. Thus, the line i = d is considered independently from the earlier code shown in the example. In practice, Line 4 might appear thousands of lines away from earlier lines, or may even be placed in a different source file. The values stored in d might not be known until run time (e.g., it might be an input from the user).
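The widening and narrowing conversions discussed here can be sketched as below. This is an illustrative reconstruction with hypothetical names rather than the original listing, so its lines do not correspond to the line numbers quoted above.

```java
class ConversionDemo {
    static int roundTrip(int i) {
        double d = i;       // widening int -> double: allowed without a cast
        // i = d;           // would not compile: narrowing double -> int without a cast
        i = (int) d;        // allowed: the explicit cast states that the narrowing is intended
        return i;
    }

    static int truncate(double d) {
        return (int) d;     // the cast discards the fractional part
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(5));    // prints 5
        System.out.println(truncate(5.7));   // prints 5
    }
}
```

The second call shows why the compiler insists on the cast: in general a double does not fit into an int without losing information.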

Functions

Functions as an Abstraction

In this course, we treat a function as an abstraction over computation. This abstraction allows programmers to group a set of instructions and give it a name. The named set of instructions may take one or more variables as input parameters, and return zero or one values.

Defining a Function in Java

This is very similar to C, which is covered in CS1010. Below is an example of a function in Java:

Note that the return type is not optional. If the function does not return anything, we use the type called void. Note that, unlike Python, Java does not allow returning more than one value.
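As a minimal sketch of a function definition (the names square and printSquare are hypothetical), note that when compiling with javac a function has to live inside a class, so a small wrapper class is used here.

```java
class FunctionDemo {
    // Return type int, name square, one parameter of type int.
    static int square(int x) {
        return x * x;
    }

    // A function that returns nothing must be declared with return type void.
    static void printSquare(int x) {
        System.out.println(square(x));
    }

    public static void main(String[] args) {
        printSquare(4);   // prints 16
    }
}
```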

Abstraction Barrier

We can imagine an abstraction barrier between the code that calls a function and the code that defines the function body. Above the barrier, the concern is about what task a function performs, while below the barrier, the concern is about how the function performs the task.

The abstraction barrier separates the role of the programmer into two:

  1. an implementer, who provides the implementation of the function, and

  2. a client, who uses the function to perform the task.

Part of the aim of CS2030/S is to switch your mindset into thinking in terms of these two roles. In fact, in CS2030/S, you will be both but may be restricted to just being either a client or an implementer on specific functionality.

Encapsulation

Object

If you look around, you will find that real-world objects share two characteristics: they all have state and behavior. For example, bicycles have state (current gear, current pedal cadence, current speed) and behavior (changing gear, changing pedal cadence, applying brakes).

Software objects are conceptually similar to real-world objects: they too consist of state and related behavior. An object stores its state in fields (variables in some programming languages) and exposes its behavior through methods (functions in some programming languages). Methods operate on an object's internal state and serve as the primary mechanism for object-to-object communication. Hiding internal state and requiring all interaction to be performed through an object's methods is known as data encapsulation, a fundamental principle of object-oriented programming.

Class

In the real world, you'll often find many individual objects all of the same kind. There may be thousands of other bicycles in existence, all of the same make and model. Each bicycle was built from the same set of blueprints and therefore contains the same components. In object-oriented terms, we say that your bicycle is an instance of the class of objects known as bicycles. A class is the blueprint from which individual objects are created.

For example, below is how we can define a Bicycle class:

After creating our blueprint (the class), we can use the new keyword to create an object of this class. For instance, to create a Bicycle object, we can use

  1. If a method is not associated with and does not utilize the fields in the class, it should not be specific to a class and should exist outside.

  2. If your class includes a constructor with parameters (like the one in the Bicycle class), you are required to provide arguments when creating an object using that constructor.
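The original Bicycle listing is not reproduced in these notes, so the following is only a sketch of what such a class might look like. The fields and methods are assumptions based on the bicycle description above (gear, cadence, speed; changing cadence, applying brakes).

```java
// A hypothetical Bicycle class: the fields hold the state, the methods the behavior.
class Bicycle {
    int cadence;
    int speed;
    int gear;

    // Constructor with parameters: arguments must be supplied to new.
    Bicycle(int cadence, int speed, int gear) {
        this.cadence = cadence;
        this.speed = speed;
        this.gear = gear;
    }

    void changeCadence(int newCadence) {
        this.cadence = newCadence;
    }

    void applyBrakes(int decrement) {
        this.speed -= decrement;
    }
}

class BicycleDemo {
    public static void main(String[] args) {
        Bicycle b = new Bicycle(30, 10, 1);   // create an object from the blueprint
        b.applyBrakes(4);
        System.out.println(b.speed);          // prints 6
    }
}
```

Because this Bicycle only declares a constructor with three parameters, writing new Bicycle() with no arguments would not compile.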

Reference Types in Java

We mentioned in Variable and Type that there are two kinds of types in Java. You have been introduced to the primitive types. Everything else in Java is a reference type.

The Bicycle class is an example of a reference type. Unlike primitive variables, which never share their values, a reference variable stores only the reference to the value, and therefore two reference variables can share the same value. For instance,

The behavior above is due to the variables b1 and b2 referring to the same Bicycle object in memory. Therefore, changing the field cadence of b1 causes the field cadence of b2 to change as well.
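The sharing described above can be reproduced with a small sketch. The Bicycle class here is cut down to a single cadence field for brevity, and the demo class and its method names are hypothetical.

```java
// A minimal, hypothetical Bicycle with only the field needed for this demonstration.
class Bicycle {
    int cadence;

    Bicycle(int cadence) {
        this.cadence = cadence;
    }
}

class AliasDemo {
    static int sharedCadence() {
        Bicycle b1 = new Bicycle(30);
        Bicycle b2 = b1;     // copies the reference, not the object
        b1.cadence = 50;     // changes the one object that both variables refer to
        return b2.cadence;
    }

    static int primitiveCopy() {
        int x = 1;
        int y = x;           // copies the value itself
        x = 2;               // y is unaffected
        return y;
    }

    public static void main(String[] args) {
        System.out.println(sharedCadence());   // prints 50
        System.out.println(primitiveCopy());   // prints 1
    }
}
```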

Special Reference Value: null

Any field of a reference type that is not initialized will have the special reference value null. So, remember to always assign an object to a reference variable before using it.
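What goes wrong when a null reference is used can be seen in the sketch below. The class and method names are hypothetical, and it relies on Java's NullPointerException, which is raised at run time when we access a member through null.

```java
class NullDemo {
    String name;   // an uninitialized field of a reference type holds null

    // Returns true if using the uninitialized field fails at run time.
    static boolean failsOnNull() {
        NullDemo d = new NullDemo();
        try {
            d.name.length();   // d.name is null, so there is no String to ask
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnNull());   // prints true
    }
}
```

Note that this is a run-time error: the compiler accepts the code, and the problem only shows up when the line is executed.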

This idea of reference type in Java is similar to the idea of pointers in C, which is covered in CS1010.

QnA

1. Is it possible to instantiate an object twice?

The question is whether the following code is correct in Java. If not, what kind of error will we get?

Note that we have not yet learned how to write a complete, compilable Java program. This code snippet is for demonstration only.

Ans: We will get a compilation error. However, if you run this code in jshell, c1 is simply overwritten with a new object.
