Data-Centric Computing

1. Policies
- 1.1. Evaluation
- 1.2. Resources
2. Lectures
3. Labs

1. Policies

1.1. Evaluation

Attend lectures and labs only if you want to. It is expected that you will keep up with the discussions during lectures and labs even if you don't attend them.

The evaluation policy is as follows:

Assessment	Percentage	Remarks
Quiz x10	60	Best 6 out of 10. Normalized.
Mid-semester	20
End-semester	20

Quizzes will be conducted during lectures or labs. There will be no surprise quizzes.
There will be no retakes if you miss a quiz. We will simply take best 6 of your scores.
Any violation of the honor code of the institute is automatically an "F" grade.
The syllabus for all assessments is whatever that has been discussed or suggested until and including the previous meeting.

There are no prescribed textbooks for this course. The topics we cover in this course are spread across many books, research papers, and other technical documents. The following list of links will collect the external resources.

You are expected to use these in addition to the following lecture notes. Your TAs are also a resource from which to learn. Use them.

2. Lectures

2.1. Overview

We call presentation of values as data. Data has various forms: numbers, text, audio, video, lists, tables, dictionaries, etc. Data can be classified loosely into two categories as follows:

Low-level: Convenient for consumption by computers.
High-level: Convenient for consumption by humans.

As programmers, we have to learn both. We write programs which help computers consume data. We also use programs that help us consume data, for example, spreadsheet programs.

For presenting the same value, there are usually many choices. For example, we represent by 42 and LXII the same value. The place-value system won because it was compact and it was convenient to do arithmetic on it. As programmers, we must learn to evaluate various representations and choose the best one for our requirements.

If I have to store scores of various quizzes, should I use this table?

Name	Q1	Q2
Raj	5	7
Rahul	6	3

or this one?

Name	Assessment	Score
Raj	Q1	5
Raj	Q2	7
Rahul	Q1	6
Rahul	Q2	3

How do you determine which is better?

2.2. Low-level data

Data has types. The type determines meaning and valid operations on data. In a computer, everything is represented as a sequence of bits. A bit is a type of data that has two values – 0 and 1. A group of 8 bits is a byte.

We will see some of the low-level types supported by the C programming language.

2.2.1. Unsigned Integers

A sequence of 8 bits can represent unsigned integers from 0 to 255. We use a place-value system with 2 as the base to interpret these numbers. The least significant bit has place value 1, the next one 2, the next 4, and so on.

The usual arithmetic operations are supported and results wrap to fit. Adding 1 to 255 yields 0 for 8-bit unsigned integers.

Bitwise operators operate bit-by-bit. The AND (&) operation outputs 1 if both bits are 1, the OR (|) operation outputs 1 when at least one of the bits are 1, the XOR (^) operation outputs 1 when exactly one of the bits are 1. The unary NOT (~) operation changes the bit.

#include <stdio.h>
#include <stdint.h>

int main()
{
    uint8_t a = 0xa7, b = 0x14;
    printf("%x\n", a&b);
    printf("%x\n", a|b);
    printf("%x\n", a^b);
    printf("%x\n", ~a);
    return 0;
}

4
b7
b3
ffffff58

To understand why there is a 32-bit output, see Integer Promotions in Beej's guide.

See Bitwise Operations in Beej's guide.

2.2.2. Signed Integers

An 8-bit signed integer represents values from -128 to 127. This is achieved by negating the place-value of the most significant bit.

Arithmetic operations should not overflow or underflow. The programmer should ensure this.

There are types like int8_t that have specified bit-width. Types like int have a fixed, unspecified width.

#include <stdio.h>
#include <stdint.h>

int main()
{
    int8_t i8 = 0xa3;
    int i = 25;
    printf("%d\n", i8);
    printf("%x\n", i);
    printf("%d\n", sizeof(i));
    return 0;
}

-93
19
4

See Operations in Beej's guide.

2.2.3. Characters

A character is an 8-bit type that represents characters of the English alphabet.

#include <stdio.h>

int main()
{
    char c = 'a';
    printf("%c (%x)\n", c, c);
    c = 'A';
    printf("%c (%x)\n", c, c);
    return 0;
}

a (61)
A (41)

See Character Types in Beej's guide.

2.2.4. Products

A product type holds values of possibly distinct types simultaneously.

#include <stdio.h>

struct point
{
    int x;
    int y;
};

int main()
{
    struct point p = { .x = 1, .y = 2 };
    printf("The square of distance of p: %d\n", p.x*p.x + p.y*p.y);
    p.x += 2;
    p.y += 3;
    printf("The square of distance of p: %d\n", p.x*p.x + p.y*p.y);
    return 0;
}

The square of distance of p: 5
The square of distance of p: 34

#include <stdio.h>

struct employee
{
    int salary;
    char designation;
};

int main()
{
    struct employee p = { .salary = 40000, .designation = 'M' };
    printf (
        "%s earning %d.\n",
        p.designation == 'M' ? "Manager" : "Worker",
        p.salary
    );
    return 0;
}

Manager earning 40000.

2.2.5. Unions

A union holds values of multiple types but not simultaneously. All values are stored at the same place. So only one of them can exist at a time.

#include <stdio.h>

union ioc
{
    int i;
    char c;
};

int main()
{
    union ioc u;
    u.i = 42;
    printf("%d %x\n", u.i, u.c);
    u.c = 10;
    printf("%d %x\n", u.i, u.c);
    return 0;
}

42 2a
10 a

See Unions in Beej's guide.

2.2.6. Sums

We can combine struct and union to create a sum. A sum holds one value at a time along with a tag to identify which one is being held.

#include <stdio.h>

struct animal
{
    int tag; // 0 - cat, 1 - fish.
    union {
        struct {
            int whiskers;
        } cat;
        struct {
            int scales;
        } fish;
    };
};

void describe(struct animal a)
{
    switch (a.tag) {
    case 0:
        printf("Cat with %d whiskers.\n", a.cat.whiskers);
        break;
    case 1:
        printf("Fish with %d scales.\n", a.fish.scales);
        break;
    default:
        printf("I don't recognize this animal.\n");
    }
}

int main()
{
    struct animal a = {
        .tag = 0,
        .cat.whiskers = 10
    };
    struct animal b = {
        .tag = 1,
        .fish.scales = 100
    };
    describe(a);
    describe(b);
    return 0;
}

Cat with 10 whiskers.
Fish with 100 scales.

2.2.7. Arrays

An array is a sequence of values of the same type. We can access individual elements using indexing. Indices start at zero.

#include <stdio.h>

int main()
{
    int primes[] = { 2, 3, 5, 7, 9, 13 };
    printf("The fifth prime is %d.\n", primes[4]);
    primes[4] = 11;
    printf("The fifth prime is %d.\n", primes[4]);
    return 0;
}

The fifth prime is 9.
The fifth prime is 11.

We usually traverse arrays in C using a three-clause for loop. Here's an implementation of the linear-search algorithm:

#include <stdio.h>

ssize_t linsearch(int haystack[], int n, int needle)
{
    for (size_t i = 0; i < n; ++i) {
        if (haystack[i] == needle)
            return i;
    }

    return -1;
}

int main()
{
    int primes[] = { 2, 3, 5, 7, 11, 13 };
    printf("%d found at %d.\n", 5, linsearch(primes, 6, 5));
    printf("%d found at %d.\n", 12, linsearch(primes, 6, 12));
    return 0;
}

5 found at 2.
12 found at -1.

2.2.8. Pointers

Every memory location has an address. A pointer is a type where values are addresses. The value held at the address can be changed through a pointer. Changing the pointer makes it point to a different location.

#include <stdio.h>

int main()
{
    int i = 0, j = 5;
    int *p = &i;
    printf("%d@%x %d@%x\n", i, &i, j, &j);
    *p = 3;
    printf("%d@%x %d@%x\n", i, &i, j, &j);
    p = &j;
    *p = 2;
    printf("%d@%x %d@%x\n", i, &i, j, &j);
    return 0;
}

0@1ed95954 5@1ed95950
3@1ed95954 5@1ed95950
3@1ed95954 2@1ed95950

Here are some relationships pointers in C have to other concepts.

You can use the address-of operator, denoted &, to obtain the address of variables, struct or union members, and array members.
The special value NULL is valid for any pointer and means that the pointer is not pointing to any object.
If a is an array variable, then the expression a is a pointer to the first element of the array. This is called pointer-to-array decay in C.
The operator s->a is equivalent to (*s).a.

2.2.9. Functions and data

The local variables of a function are created when the function begins and destroyed once it returns. Local variables include both parameters and any variable declared within the function.

int *min(int x, int y)
{
    return (x < y) ? &x : &y;
}

The above code has an error that won't be caught by the compiler. The function returns the address of a local variable. However, the caller of the function has no way of using that address as by the time the caller obtains the address, the object at that location has been destroyed.

The following code works.

int *min(int *x, int *y)
{
    return (*x < *y) ? x : y;
}

int main()
{
    int i = 5, j = 10;
    printf("%d\n", *min(&i, &j));
    return 0;
}

The return value of min() is an address of a variable in main(). So main() receives the address of an object that is local to itself.

In C, arrays are pointers.

int second(int *a, int n)
{
    assert(n >= 2);
    return a[1];
}

int main()
{
    int primes[] = { 2, 3, 5 };
    printf("%d\n", second(primes, 3));
    return 0;
}

When we pass primes as an argument to second(), only the address of the array is passed. The parameter n is the number of integers present contiguously from the base address. Notice that within second(), we can use a like an array. An expression of the form base[offset] is interpreted by C as the offset-th object from the base address. It does not care whether base is an array or a pointer.

We can also use address-passing as a mechanism to modify the local variables of a function from the called function.

void set_second(int *a, int n, int v)
{
    assert(n >= 2);
    a[1]= v;
}

int main()
{
    int primes[] = { 2, 4, 5 };
    set_second(primes, 2, 3);
    printf("%d\n", primes[1]);
    return 0;
}

Observe that modifying a[1] changes primes[1]. This is because a and primes are the same base address.

And, the modified objects need not be arrays.

void minmaxify(int *x, int *y)
{
    if (*x < *y) return;
    int t = *y;
    *y = *x;
    *x = t;
}

int main()
{
    int i = 10, j = 5;
    minmaxify(&i, &j);
    printf("%d %d\n", i, j);
    return 0;
}

5 10

You can observe the similar behaviour in python. Somewhat unpredictably.

def modify1(x):
    x = x + 1

def modify2(xs):
    xs[1] = 3

y = 5
ys = [2, 4, 5]

modify1(y)
modify2(ys)

print(y)
print(ys)

5
[2, 3, 5]

Observe that y remained the same while ys got modified. This is because x and y refer to different objects with the same value initially while xs and ys refer to the same object.

2.2.10. Representing contiguous sequences

As we saw earlier, arrays when passed to functions become just pointers to the first element. So the receiver has no way of knowing how many elements are there in the array. One way to solve this problem is to pass around the length of the array along with this pointer.

Another approach is the sentinel technique, where a special element is used to signal that the sequence has ended. In C, strings use this data representation.

char *s = "Hello";
for (size_t i = 0; i <= 5; ++i) {
    printf("%c (%d)\n", s[i] ? s[i] : '?', s[i]);
}

H (72)
e (101)
l (108)
l (108)
o (111)
? (0)

2.2.11. Recursive types

We can represent sequences using non-contiguous data in memory. This is used in situations where the sequence grows over time and we want to be able to place new elements at arbitrary positions in the sequence. This data structure is called a linked-list.

2.2.11.1. Lists

Here's an implementation of a linked-list. A linked-list stores a sequence. Each element of the sequence is stored in a node. The node also stores the address of the next node. The last node's next address is NULL to indicate that the sequence has ended.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

struct node
{
    int data;
    struct node *next;
};

struct node *empty()
{
    return NULL;
}

struct node *cons(int a, struct node *rest)
{
    struct node *n = malloc(sizeof(struct node));
    assert(n);
    n->data = a;
    n->next = rest;
    return n;
}

void print_list(struct node *n)
{
    printf("(");
    while (n) {
        printf("%d, ", n->data, n);
        n = n->next;
    }
    printf(")\n");
}

int main()
{
    struct node *xs = empty();
    for (int i = 2; i < 1000; i += 2 * i) {
        xs = cons(i, xs);
    }
    print_list(xs);
    return 0;
}

(486, 162, 54, 18, 6, 2, )

Observe that given a pointer to the first node and a number i, there is no way to calculate the address of the i th node. We have to read the pointers in first i-1 nodes. What about arrays?

What are some advantages of storing sequences this way?

Here's a function to insert a new node at some arbitrary position in the list. Note that it only takes a few operations. What if I wanted to do this in an array?

void insert_after(struct node *here, struct node *what)
{
    assert(here);
    assert(what);
    struct node *save = here->next;
    here->next = what;
    what->next = save;
}

We can also write a function to remove a node with a few operations. We should get a pointer to the predecessor to do this.

struct node *skip(struct node *here)
{
    assert(here);
    struct node *skipped = here->next;
    here->next = skipped ? skipped->next ? NULL;
    return skipped;
}

2.3. Fundamental algorithms

2.3.1. Time complexity

Every algorithm has an associated time complexity. First, we define the size of the input for the problem and then measure the rate of growth of the running time wrt the size. A few examples:

Consider is_prime(n). We can consider magnitude of input n to be the size of the input. Then, the straight-forward algorithm executes for at most \(a\sqrt{n} + b\) time for \(a, b\neq 0\).
Consider the problem of adding two n-digit numbers. The addition with carry algorithm takes \(a'n+b'\) time for some \(a', b'\neq 0\).
Consider multiplying two n-digit numbers. It takes \(a''n^2+b''n+c''\) for some non-zero \(a'', b'', c''\).

We are not really interested in finding the values of \(a, b, c\) etc. What we really want to understand is the rate of change. We can understand this by computing certain limits. If \(f(n)\) is the time, then limit \(n\) tends to \(\infty\) \(f(2n)/f(n)\) tells us how time increases when we double the input size. We take \(n\) tends to \(\infty\) as we don't really care how long it takes for small inputs. Computing this limit for the three examples yield constants 1.42, 2, and 4 which tells us how much we can expect time to increase if we double the input size. Notice that the values of \(a, b, c\) etc. are irrelevant for this calculation.

Let us do some back of the hand calculations to understand the absolute growth too. A computer can do roughly \(10^8\) basic operations per second. So, it can add two million-digit numbers in under a second. But, to multiply two million-digit numbers, it will take more than two hours. This tells us that we need better methods if we want to multiply million-digit numbers on a computer.

Consider the problem of computing \(\lceil\log_2(n)\rceil\) given \(n\). It takes only \(a\log(n)+b\) steps for some \(a, b\). This function grows slower than the previous examples. Doubling the input size makes the algorithm do \(a\) more basic operations. Or, put differently to double the running-time, you have to square the input.

Some common functions that appear as time complexity of algorithms:

Logarithms: \(\log(n)\)
Polynomials: \(n^k\) for constant \(k\).
Exponentials: \(k^n\) for constant \(k\).
Products: \(fg\) where \(f\) and \(g\) are functions.
Compositions: \(f\circ g\) where \(f\) and \(g\) are functions.

Notice that sums are absent from the list. This is because \(f+g\) has the same rate of growth as the "larger" function. Consider \(f(n)=n^2+n\) and \(g(n)=n^2\) and observe that the above limit is two for both functions. For the same reason, time complexities \(n\) and \(n\log(n)\) or \(2^n\) and \(2^nn^2\) are also considered similar (This is less common in theory, but practically there is rarely any difference).

2.3.2. Searching

We have seen linear search earlier. A binary search works when the input is a sorted array. It is faster, taking only \(\log(n)\) time for arrays of size \(n\). Instead of comparing elements from first to last, we pick the middle element of the array first, if it is less than the element being searched for, the needle, we can narrow our search to the second half of the array. If it is more, then narrow to the left half. Therefore, at each step, we halve our search space. So the total number of steps is logarithmic.

#include <stdio.h>

int *binsearch(int haystack[], size_t n, int needle)
{
    ssize_t low = 0, high = n-1;
    ssize_t mid;
    while (low <= high) {
        mid = low + (high - low)/2;
        if      (haystack[mid] < needle) low  = mid+1;
        else if (haystack[mid] > needle) high = mid-1;
        else                             return haystack+mid;
    }
    return NULL;
}

int main()
{
    int haystack[] = { 1, 2, 3, 4, 6, 7, 8 };
    int needles[]  = { 4, 1, 7, 0, 9, 5 };

    for (size_t i = 0; i < sizeof(needles)/sizeof(needles[0]); ++i)
        printf("%p\n", binsearch(haystack, 7, needles[i]));

    return 0;
}

0x7ffdbde77dfc
0x7ffdbde77df0
0x7ffdbde77e04
(nil)
(nil)
(nil)

2.3.3. Sorting

There are many algorithms to sort a sequence of elements based on some order. The following algorithm is called insertion sort:

#include <stdio.h>

void swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

void print_array(int xs[], size_t n)
{
    printf("(");
    for (size_t i = 0; i < n; ++i)
        printf("%d, ", xs[i]);
    printf(")\n");
}

void sort(int xs[], size_t n)
{
    if (!n) return;

    for (size_t i = 0; i < n-1; ++i) {
        size_t j = i+1;
        while (j > 0 && xs[j] < xs[j-1]) {
            swap(&xs[j], &xs[j-1]);
            --j;
        }
    }
}

int main()
{
    int xs[] = { 5, 1, 9, 4, 6, 7, 3, 2 };
    size_t n = sizeof(xs)/sizeof(xs[0]);
    sort(xs, n);
    print_array(xs, n);
    return 0;
}

(1, 2, 3, 4, 5, 6, 7, 9, )

It takes \(n^2\) time to sort an array of \(n\) elements. This is not the fastest algorithm for this problem. But it works quite well when the array is almost sorted.

3. Labs

3.1. C datatypes

One of the first goals is to understand precisely how fundamental types behave in C. Here are some exercises that will help.

Compile and run all C programs you saw in the lecture.
Write a program to print the binary representation of a uint8_t value.
Can you implement unary negation of an int8_t using only bitwise operations and addition?
Given an int8_t, how do you compute a uint8_t that holds the absolute value of the given integer?
Write a program to print the binary representation of a int8_t value.
There is a datatype called long, how many bytes is it on your machine?
Read and understand the shift operators in C.
Implement right rotation of uint8_t using bitwise and shift operators. A right rotate operation takes a uint8_t and a value between 0 and 7 inclusive and rotates the bits towards the right. For example, the eight-bit number 10110101 right rotated by three gives 10110110.
Implement left rotation similarly.
Swap two half-bytes of a uint8_t. For example, 0xca should become 0xac.
Write code to check whether an unsigned int is a power of two. You can do this with one simple expression!
Compute the absolute value of a int8_t without using any conditional statements (manipulate the bits).
Write code to find the largest power of two smaller than a given uint32_t. Do it in the most efficient way possible.

3.1.1. Solutions

You can print the binary representation by doing a bitwise AND with the appropriate values.

The first question is just an exercise to make sure that the setup on your machine is working.

You can print the binary representation by doing a bitwise AND with the appropriate values. Since we know our input has exactly eight bits, we don't need a loop either.

#include <stdio.h>
#include <stdint.h>

int main()
{
    uint8_t u = 0xa3;

    if (u & 0x80) printf("1"); else printf("0");
    if (u & 0x40) printf("1"); else printf("0");
    if (u & 0x20) printf("1"); else printf("0");
    if (u & 0x10) printf("1"); else printf("0");
    if (u & 0x08) printf("1"); else printf("0");
    if (u & 0x04) printf("1"); else printf("0");
    if (u & 0x02) printf("1"); else printf("0");
    if (u & 0x01) printf("1"); else printf("0");

    return 0;
}

10100011

Why does an expression like u & 0x80 work as a condition? In C, all integer values other than zero are considered true and zero is considered false.

Another way to do this is to iterate exactly eight times.

#include <stdio.h>
#include <stdint.h>

int main()
{
    uint8_t u = 0xa3;

    for (int i = 0; i < 8; ++i) {
        if (u & (0x80 >> i))
            printf("1");
        else
            printf("0");
    }

    return 0;
}

10100011

You can implement unary negation by using the equation -a == ~a + 1.

You can get absolute value by simply comparing with 0 and taking the negation if necessary.

int8_t i8 = -128;
uint8_t u8;

printf("%d\n", i8);

if (i8 >= 0)
    u8 = i8;
else
    u8 = -i8;

printf("%d\n", u8);

-128
128

Notice that it correctly worked with -128 too. But why? When we negate -128, we get 128 which is the same as 0 as a signed integer. But it still printed 128.

The same code for uint8_t works for printing bits of int8_t. Just change types.

The size of long can be printed using an expression such as:

printf("%d\n", sizeof(long));

We can write a function to implement right rotation. Let s be the amount to rotate. We first extract the lower-order s bits, then right-shift, and then place the extracted bits at the beginning.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

uint8_t rrot(uint8_t u, size_t s)
{
    uint8_t lower = u & ((0x1 << s) - 1); // extract
    u >>= s; // shift
    u |= lower << (8-s); // put back
    return u;
}

int main()
{
    printf("%x\n", rrot(0xb5, 3));
    return 0;
}

b6

To understand the above code, take a pen and paper and try and simulate what the computer did by hand. You can also ask chatgpt to ask how this works. But don't forget to verify that its answer is correct by verifying with authoritative sources.

Left rotation is left as an exercise.

To swap the two bytes, one could use rotation. But, it is simpler to write the flags directly.

#include <stdio.h>
#include <stdint.h>

int main()
{
    uint8_t u = 0xac;

    printf("%x\n", (u & 0xf0) >> 4 | (u & 0x0f) << 4);

    return 0;
}

ca

To check whether a number is a power of two, remember the quiz question.

#include <stdio.h>
#include <stdbool.h>

bool is_pow2(unsigned int u)
{
    return !(u & (u - 1));
}

int main()
{
    printf("%d %d %d\n", is_pow2(16), is_pow2(17), is_pow2(0));
}

1 0 1

The expression gets it wrong only when u is 0. Zero is not a power of two. Recall that Booleans are integers in C. So we can do the following:

#include <stdio.h>
#include <stdbool.h>

bool is_pow2(unsigned int u)
{
    return u * !(u & (u - 1));
}

int main()
{
    printf("%d %d %d\n", is_pow2(16), is_pow2(17), is_pow2(0));
}

1 0 0

To compute the absolute value:

uint8_t abs(int8_t i)
{
    return (i ^ (i >> 7)) - (i >> 7);
}

int main()
{
    printf("%u %u %u %u\n", abs(-1), abs(1), abs(-128), abs(-127));
    return 0;
}

1 1 128 127

Why does this work? If i is non-negative, i >> 7 is 0 and just i is returned. If i is negative, then i >> 7 is just -1 (right-shifting shifts in the most significant bit value). So i ^ (i >> 7) flips all the bits of i. The result is off by one. So we add one back.

Finding the largest power of two:

uint32_t pow2(uint32_t n) {
    if (n == 0) return 0;

    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;

    return n - (n >> 1);
}

int main()
{
    printf("%u %u %u\n", pow2(16), pow2(13), pow2(100));
    return 0;
}

16 8 64

Analyze the code and try and come up with an explanation of why it works.

3.2. Structs, arrays, and pointers

The double type in C can approximate rational numbers. Use it to represent coordinates. Write a struct to represent lines in a plane and write code to check whether or not two lines intersect. Use the following code as a starting point.

#include <stdio.h>

struct line {
    // Fill in the members to represent a line.
    // Can you represent horizontal, vertical, and slanted lines?
    // SOLUTION:
    // We will use values of a, b, c from ax + by + c = 0.
    double a, b, c;
};

int main()
{
    // Fill these so that these become valid lines.
    // Notice that both a and b should not be 0.
    struct line L1 = { .a = 0, .b = 1, .c = 3 };
    struct line L2 = { .a = 0, .b = 2, .c = 6 };

    // Write code here to check whether L1 and L2 intersect.
    // Change L1 and L2 and test various cases.
    // And, print whether or not they do.
    double det = L1.a * L2.b - L2.a * L1.b;
    if (det != 0) {
        printf("The lines intersect at a unique point.\n");
    } else {
        if (L1.a * L2.c == L2.a * L1.c && L1.b * L2.c == L2.b * L1.c) {
            printf("The lines are coincident.\n");
        } else {
            printf("The lines are parallel.\n");
        }
    }

    return 0;
}

The lines are coincident.

Can you simplify the code? Did I cover all cases correctly? Can you print the point of intersection if any?

We can now write struct to represent rectangles and circles. Also, write code to check whether or not these shapes overlap.

#include <stdio.h>

struct point {
    double x;
    double y;
};

struct rectangle {
    // Fill here.
};

struct circle {
    // Fill here.
};

int main()
{

    // Fill in members so that you get valid rectangles and circles.
    struct rectangle R1 = {};
    struct rectangle R2 = {};

    struct circle C1 = {};
    struct circle C2 = {};

    // Write code here to check whether R1 and R2 intersect and print
    // the result.

    // Write code here to check whether C1 and C2 intersect and print
    // the result.

    return 0;
}

Finally, let us write some code to find the farthest point from the origin.

#include <stdio.h>

struct point {
    double x;
    double y;
};

int main()
{
    // Fill the array so that you have at least five points.
    struct point ps[] = {
        { .x = 0, .y = 0 },
        { .x = 0, .y = 1 },
        { .x = 1, .y = 0 },
        { .x = 2, .y = 0 },
        { .x = 1, .y = 1 },
    };
    struct point farthest;

    // Write code here to compute the farthest point from origin in the
    // array ps into the variable farthest.
    farthest = ps[0];
    for (size_t i = 1; i < sizeof(ps)/sizeof(ps[0]); ++i) {
        if (ps[i].x*ps[i].x + ps[i].y*ps[i].y
            > farthest.x*farthest.x + farthest.y*farthest.y)
            farthest = ps[i];
    }

    printf("The farthest point is (%.2lf, %.2lf).\n", farthest.x, farthest.y);

    return 0;
}

The farthest point is (2.00, 0.00).

An exercise on initializing pointers.

#include <stdio.h>

int main()
{
    int a = 14, b = 25;
    int *min;
    int *max;

    // Write code here so that the following printf works as
    // expected. Do not change the values of a and b.
    if (a < b) {
        min = &a;
        max = &b;
    } else {
        min = &b;
        max = &a;
    }
    printf("min = %d, max = %d\n", *min, *max);

    return 0;
}

min = 14, max = 25

Some array manipulation.

#include <stdio.h>

int main()
{
    // Fill the array with some elements.
    int xs[] = { 3, 9, -5, 6, 4, -1, 0, 4, 0, 2 };

    // Write code here so that the elements in xs are reversed.
    for (
         size_t i = 0, j = sizeof(xs)/sizeof(xs[0]) - 1;
         i < j;
         ++i, --j
         ) {
        int t = xs[j];
        xs[j] = xs[i];
        xs[i] = t;
    }

    // Use size_t for indexing into arrays.
    for (size_t i = 0; i < sizeof(xs)/sizeof(xs[0]); ++i) {
        printf("%d ", xs[i]);
    }

    return 0;
}

2 0 4 0 -1 4 6 -5 9 3

More array processing.

#include <stdio.h>

int main()
{
    // Fill the array with some elements.
    int xs[] = { 3, 9, -5, 6, 4, -1, 0, 4, 0, 2 };
    int positive = 0, negative = 0, zero = 0;

    // Modift the code below so that everything works as expected.
    for (size_t i = 0; i < sizeof(xs)/sizeof(xs[0]); ++i) {
        if (xs[i] > 0) ++positive;
        else if (xs[i] < 0) ++negative;
        else ++zero;
    }

    printf("The array has %d positive elements, %d negative elements, and %d zeroes.\n", positive, negative, zero);

    return 0;
}

The array has 6 positive elements, 2 negative elements, and 2 zeroes.

Write a struct to store a credit card number and write Luhn algorithm to verify the validity of the credit card number.

Write a struct to store date and time (year, month, day, hour, minute, second). Define the struct so that it has the smallest size. Can you pack all the information in such a struct into a uint32_t? What about a uint64_t? Write code to convert data between those two representations.

3.3. Functions

Fix the following function and test it.

void drop_to_x_axis(struct point p)
{
    p.x = 0;
}

First, it drops to y axis. Second, it cannot modify the point in the caller. The fixed version is:

void drop_to_x_axis(struct point *p)
{
    p->y = 0;
}

or you can return a new point.

Write a function to take an array of struct person with an age member and return their average age.

int average_age(struct person *ps, int n)
{
    double a = 0.0;

    for (size_t i = 0; i < n; ++i) {
        a += ps[i].age;
    }

    return a/n;
}

Write a function to compute the midpoint of a line segment. There are two ways to write this corresponding to the following functions. Write both of them. Discuss when one is better than the other.

struct point midpoint(struct point p1, struct point p2);
void midpoint(struct point p1, struct point p2, struct point *p3);

The first one is:

struct point midpoint(struct point p1, struct point p2)
{
    return (struct point) {
        .x = (p1.x + p2.x) / 2,
        .y = (p1.y + p2.y) / 2
    };
}

The second one is:

void midpoint(struct point p1, struct point p2, struct point *p3)
{
    *p3 = (struct point) {
        .x = (p1.x + p2.x) / 2,
        .y = (p1.y + p2.y) / 2
    };
}

The first one is better as it allows midpoint to be used in expressions naturally. As in, midpoint(midpoint(p, q), midpoint(r, s)). Try and writing the equivalent code in the second version.

Write a function to compute the intersection point of two lines. Implement the following two functions and discuss when one is better than the other.

struct point intersection(struct line l1, struct line l2);
void intersection(struct line l1, struct line l2, struct point *p);

This is same as previous question except for some calculations. There is a crucial difference here. The intersection of two lines need not exist (unlike midpoint of two points). So the functions should assert that the two given lines have an intersection and otherwise error as it cannot return any value. The following function is better:

bool intersection(struct line l1, struct line l2, struct point *p);

Now, the bool return value indicates whether or not an intersection exists. The function should be written so that if (and only if) the return value is true, the intersection should be stored in *p. This function need not ever error out.

Write a program that modifies an integer through a pointer to a pointer to an int.

int i = 0;
int *p = &i;
int **q = &p;

**q = 1;
printf("%d\n", i);

Notice that the expression &&i wouldn't work. Why?

Write a function to swap the values of two integers in memory. What are the parameters? What is the return value?

#include <stdio.h>

void swap(int *x, int *y)
{
    int t = *y;
    *y = *x;
    *x = t;
}

int main()
{
    int i = 10, j = 5;
    swap(&i, &j);
    printf("%d %d\n", i, j);
    return 0;
}

5 10

Write a function to sum the integers in an array. Can your function be used to sum sub-arrays?

#include <stdio.h>

int sum(int *xs, int n)
{
    int s = 0;
    for (size_t i = 0; i < n; ++i) s += xs[i];
    return s;
}

int main()
{
    int xs[] = { 2, 3, 5, 7, 11, 13 };
    printf("%d %d\n", sum(xs, 6), sum(xs + 2, 3));
    return 0;
}

41 23

Implement binary-search on an array of integers. (Left as exercise)

Write the following function. Determine the number of integer comparisons your function will make in the worst-case. The straight-forward algorithm will do 2n comparisons. There is an algorithm that uses at most 3n/2 comparisons.

#include <stdio.h>
#include <assert.h>

void minmax(int xs[], int n, int *min, int *max)
{
    assert(min && max);

    if (n <= 0) return;

    /* initialize */ *min = *max = xs[0];

    size_t i;
    for (i = 1; i < n-1; i += 2) {
        int m, M;

        if (xs[i] < xs[i+1]) {
            m = xs[i];
            M = xs[i+1];
        } else {
            m = xs[i+1];
            M = xs[i];
        }

        if (m < *min) *min = m;
        if (M > *max) *max = M;
    }

    if (i < n) {
      if (xs[i] < *min) *min = xs[i];
      if (xs[i] > *max) *max = xs[i];
    }
}

int main()
{
    int xs[] = { 0, 5, 20, 8, -5, 3, 9, -14, 7 };
    int m, M;

    minmax(xs, 9, &m, &M);
    printf("%d %d\n", m, M);

    minmax(xs, 7, &m, &M);
    printf("%d %d\n", m, M);

    return 0;

}

-14 20
-5 20

Observe that for every two elements in the array, we do three comparisons.

3.4. Dynamic memory allocation

Implement Python's range().

int *range(int start, int stop, int step);

The int * return type is not suitable here. If we return just a pointer to the base of an array, how does the caller know how many elements are there in the array? A solution is to return both a pointer and the number of elements packaged into a struct.

#include <stdio.h>
#include <stdlib.h>

struct Range
{
    int *array;
    size_t size;
};

struct Range range(int start, int stop, int step)
{
    if (step == 0) {
        return (struct Range) { NULL, 0 };
    }

    size_t length = 0;

    if (step > 0) {
        length = (stop > start) ? (stop - start + step - 1) / step : 0;
    } else {
        length = (start > stop) ? (start - stop - step - 1) / -step : 0;
    }

    int *arr = malloc(length * sizeof(int));
    if (arr == NULL) {
        return (struct Range) { NULL, 0 };
    }

    for (size_t i = 0; i < length; ++i) {
        arr[i] = start + i * step;
    }

    return (struct Range) { arr, length };
}

void free_range(struct Range r) {
    free(r.array);
}

void print_range(struct Range r) {
    printf("(");
    for (size_t i = 0; i < r.size; ++i) {
        printf("%d, ", r.array[i]);
    }
    printf(")\n");
}

int main() {
    struct Range r = range(0, 10, 1);
    if (r.array) print_range(r);
    free_range(r);

    r = range(5, -10, -3);
    if (r.array) print_range(r);
    free_range(r);

    return 0;
}

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, )
(5, 2, -1, -4, -7, )

Write a function to compute the prime factors of a given number.

int *prime_factors(int n);

Here, an int * suffices. An int in C can have at most 8 * sizeof(int) - 1 prime factors (why?). This is a small amount of memory. We can put a 0 as the last element in the array to signal that the prime factors has ended. That is, for input 60, we can output the just the base address of the array { 2, 2, 3, 5, 0 }. This works because 0 is never a valid divisor. This data representation of terminating a sequence with a special element is called the sentinel technique. Examples of this include a full-stop (".") to end a sentence in English, the word "over" to end a sentence when speaking over radio.

#include <stdio.h>
#include <stdlib.h>

int* prime_factors(int n)
{
    if (n <= 1)
        return NULL;

    int *factors = malloc(8 * sizeof(int) * sizeof(int));
    int index = 0;

    while (n % 2 == 0) {
        factors[index++] = 2;
        n /= 2;
    }

    for (int i = 3; i * i <= n; i += 2) {
        while (n % i == 0) {
            factors[index++] = i;
            n /= i;
        }
    }

    if (n > 2)
        factors[index++] = n;

    factors[index] = 0;

    return factors;
}

void print_prime_factors(int *factors)
{
    int i;
    for (i = 0; factors[i+1] != 0; i++)
        printf("%d * ", factors[i]);
    printf("%d\n", factors[i]);
}

int main()
{
    int xs[] = { 13, 39, 56, 1000007, 1000000007, 0 };

    for (size_t i = 0; xs[i]; ++i) {
      int *factors = prime_factors(xs[i]);

      if (factors != NULL) {
        printf("%12d = ", xs[i]);
        print_prime_factors(factors);
      }

      free(factors);
    }

    return 0;
}

        13 = 13
        39 = 3 * 13
        56 = 2 * 2 * 2 * 7
   1000007 = 29 * 34483
1000000007 = 1000000007

Implement Sieve of Eratosthenes to figure out the prime numbers and composite numbers below a given number. Use the following skeleton to get started.

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>

int main()
{
    int n = 30;

    bool *elts = malloc(n * sizeof(bool));

    for (size_t i = 0; i < n; ++i) elts[i] = true;

    for (size_t d = 2; d < n; ++d) {
        if (!elts[d]) continue;

        for (size_t m = 2 * d; m < n; m += d) {
            elts[m] = false;
        }
    }

    printf("primes under %d: (", n);
    for (size_t i = 2; i < n; ++i) {
        if (elts[i]) printf("%d, ", i);
    }
    printf(")\n");
}

primes under 30: (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, )

Read about Josephus problem. Implement a program to determine the survivor for a user entered n and k. Use the previous program as a guideline.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

size_t next_alive(bool *persons, size_t n, size_t cur)
{
    cur = (cur + 1) % n;
    while (!persons[cur]) cur = (cur + 1) % n;
    return cur;
}

size_t josephus(size_t n, size_t k)
{
    bool *persons = malloc(n * sizeof(bool));
    for (size_t i = 0; i < n; ++i) persons[i] = true;

    size_t ndead = 0;
    size_t cur = 0;
    while (ndead < n-1) {
        size_t i = 1;
        while (i < k) {
            ++i;
            cur = next_alive(persons, n, cur);
        }
        persons[cur] = false;
        ++ndead;
        cur = next_alive(persons, n, cur);
    }
    return cur+1;
}

int main()
{
    struct { size_t n, k; } xs[] = {
        { .n = 41, .k = 3 },
        { .n = 10, .k = 2 },
        { .n = 15, .k = 2 },
    };
    for (size_t i = 0; i < sizeof(xs)/sizeof(xs[0]); ++i)
        printf (
                "josephus(%zu, %zu) = %zu.\n",
                xs[i].n, xs[i].k,
                josephus(xs[i].n, xs[i].k));
    return 0;
}

josephus(41, 3) = 31.
josephus(10, 2) = 5.
josephus(15, 2) = 15.

3.5. Review

Design a scheme to represent a date in this semester in a uint8_t. Write a function void print_date(uint8_t u) that will show this date in a readable form.
Write a function uint8_t average(uint8_t a, uint8_t b). This function should work for any two uint8_t, even 255 and 255.
Implement a struct angle to represent an angle in a 2D plane. Use the earlier struct point to define this.
Write the function bool is_right(struct angle a) to check whether the angle is a right angle.
Write functions is_acute(), is_obtuse(), is_straight(), is_reflex(), and is_full() for checking for various types of angles.
Define a struct triangle to represent triangles and write a double area(struct triangle t) to calculate its area.
Write a function struct Range gp(int a, int n, int r) to generate the first n terms of a geometric progression with starting number a and common ratio r.
Read the e-spigot algorithm for calculating digits of e. Use it for defining the function int *e(int ndigits).
Define struct bignum { int *digits; size_t ndigits }; to represent arbitrarily large natural numbers. Define: struct bignum bignum_add(struct bignum a, struct bignum b) to add two such numbers. Write a function void print_bignum(struct bignum a) to print such numbers. An example use of these functions is given below.
```
struct bignum a = make_bignum("123451298289357892734587108947235");
struct bignum b = make_bignum("289347508783452346781234534574562");
print_bignum(bignum_add(a, b));
```
Write int bignum_cmp(struct bignum a, struct bignum b). This function should return 0 if a and b are equal, -1 if a < b and 1 if a > b.

The following is an implementation in C. For generating \(n\) digits, this procedure needs roughly \(n^2\) operations. On today's computers, this means that this algorithm is infeasible for generating a million digits of \(e\) due to time constraints.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int *e(int ndigits)
{
    int n = ndigits + 2; // See Lemma 3.

    // We use A[1],...A[n] as in the algorithm.
    // A[0] will store the generated digit.
    int *A = malloc((n + 1) * sizeof(int));
    assert(A);

    int *digits = malloc(n * sizeof(int));
    assert(digits);

    for (size_t i = 1; i <= n; ++i) {
        A[i] = 1;
    }

    for (size_t i = 0; i < n; ++i) {
        A[0] = 0;

        // Multiply by 10.
        for (size_t j = 1; j <= n; ++j) {
            A[j] *= 10;
        }

        // Take the fractional part.
        for (size_t j = n; j >= 1; --j) {
            int q = A[j] / (j + 1);
            A[j] %= j + 1;
            A[j-1] += q;
        }

        // Output the next digit.
        digits[i] = A[0];
    }

    free(A);
    return digits;
}

void show_e(int ndigits)
{
    int *xs = e(ndigits);
    printf("e is approximately 2.");
    for (size_t i = 0; i < ndigits; ++i) {
        printf("%d", xs[i]);
    }
    free(xs);
}

int main()
{
    show_e(10);
    return 0;
}

e is approximately 2.7182818284

We can implement arbitrarily large numbers by storing the digits in an array. We store the lowest digit first as this is how addition accesses the digits. To add two n-digit numbers, it takes roughly n operations. That is, time scales linearly with input size.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

struct bignum {
    int *digits;
    size_t ndigits;
};

struct bignum make(const char *s)
{
    size_t n = strlen(s);
    int *digits = malloc(n * sizeof(int));
    size_t i = n-1;
    size_t j = 0;
    while (j < n) {
        digits[j] = s[i] - '0';
        --i;
        ++j;
    }
    return (struct bignum) {
        .digits = digits,
        .ndigits = n
    };
}

size_t max(size_t a, size_t b)
{
    return a >= b ? a : b;
}

struct bignum add(struct bignum a, struct bignum b)
{
    size_t an = a.ndigits;
    size_t bn = b.ndigits;
    size_t cn = max(an, bn) + 1;

    int *c = malloc(cn * sizeof(int));

    size_t i = 0, j = 0, k = 0;
    int carry = 0;
    while (i < an && j < bn) {
        int d = a.digits[i++] + b.digits[j++] + carry;
        if (d > 9) {
            c[k++] = d - 10;
            carry = 1;
        } else {
            c[k++] = d;
            carry = 0;
        }
    }
    while (i < an) {
        c[k++] = a.digits[i++] + carry;
        carry = 0;
    }
    while (j < bn) {
        c[k++] = b.digits[j++] + carry;
        carry = 0;
    }
    if (carry)
        c[k++] = carry;
    return (struct bignum) {
        .digits = c,
        .ndigits = k
    };
}

bool bignum_eq(struct bignum a, struct bignum b)
{
    if (a.ndigits != b.ndigits) return false;
    for (size_t i = 0; i < a.ndigits; ++i)
        if (a.digits[i] != b.digits[i]) return false;
    return true;
}

bool bignum_le(struct bignum a, struct bignum b)
{
    if (a.ndigits < b.ndigits) return true;
    if (b.ndigits < a.ndigits) return false;
    for (size_t i = 1; i <= a.ndigits; ++i)
            if (a.digits[a.ndigits-i] < b.digits[a.ndigits-i])
                    return true;
            else if (b.digits[a.ndigits-i] < a.digits[a.ndigits-i])
                    return false;
    return true;
}

int bignum_cmp(struct bignum a, struct bignum b)
{
    if (bignum_eq(a, b)) return 0;
    if (bignum_le(a, b)) return -1;
    return 1;
}

void print_bignum(struct bignum a)
{
    for (size_t i = 1; i <= a.ndigits; ++i)
        printf("%d", a.digits[a.ndigits-i]);
}

int main()
{
    struct bignum a = make("9999999999");
    struct bignum b = make("9999999999");
    print_bignum(a); printf(" + "); print_bignum(b);
    printf(" = "); print_bignum(add(a, b)); printf(".\n");

    struct bignum n1 = make("1000"), n2 = make("999");
    printf("1000 cmp  999 = %d.\n", bignum_cmp(n1, n2));
    printf(" 999 cmp 1000 = %d.\n", bignum_cmp(n2, n1));
    printf("1000 cmp 1000 = %d.\n", bignum_cmp(n1, n1));
    return 0;
}

9999999999 + 9999999999 = 19999999998.
1000 cmp  999 = 1. 999 cmp 1000 = -1.1000 cmp 1000 = 0.

Try the same addition with "normal" integers.

int a = 9999999999;
int b = 9999999999;
printf("%d + %d = %d.\n", a, b, a+b);

1410065407 + 1410065407 = -1474836482.

3.6. Some time-complexity exercises

Consider the function int fib(int n) computing Fibonacci numbers:
```
int fib(int n)
{
    if (n <= 1) return 1;
    return fib(n-1) + fib(n-2);
}
```
Try to compute fib(n) for many values of n. How large an n can be reasonably computed? What is the time complexity of this algorithm? Is there a faster way to compute this? (Hint: There is an n time and a log(n) time algorithm this)
Write unsigned int power_of(unsigned int a, unsigned int b). What is your solution's time complexity? Is it optimal?
Write the function void running_sum(int xs[], size_t n, int sums[]). After execution, sums[i] should be xs[0] + ... xs[i] for all i. What is the time complexity of your algorithm?
Write bool triplet(int xs[], size_t n) to find three distinct (by position, not necessarily value) elements in xs such that the sum of first two is the third. Assume that the array xs is sorted in non-decreasing order. What is the time complexity of your algorithm?
Write int max_contiguous_sum(int xs[], size_t n) to return the maximum sum obtained from a sub-array of xs. For example, if the input is { -2, 1, -3, 4, -1, 2, 1, -5, 4 }, the answer is 6. What is the time complexity of your solution? Note that if the contiguous restriction is not present, we can simply pick all positive numbers in the array.

3.7. Linked lists

Implement the following functions and test each of them appropriately.

Write a function void unlink_next(struct node *here) to remove the node after here in constant-time from the list.
A linked list is circular if the next pointer of some node points to an earlier node in the list. Write a function bool exists_cycle(struct node *head) that returns true iff the list head has a cycle.
Modify the linked list node structure to hold pointers to both previous and next nodes. What should be the previous of first node?
Write a function void insert_before(struct node *here, struct node *what); to insert a node what before here in the list. This should be constant-time.
Write a function void unlink(struct node *here) to remove the node here from a list. This should be constant-time.