Data-Centric Computing
Table of Contents
1. Policies
1.1. Evaluation
Attend lectures and labs only if you want to. It is expected that you will keep up with the discussions during lectures and labs even if you don't attend them.
The evaluation policy is as follows:
Assessment | Percentage | Remarks |
---|---|---|
Quiz x10 | 60 | Best 6 out of 10. Normalized. |
Mid-semester | 20 | |
End-semester | 20 |
- Quizzes will be conducted during lectures or labs. There will be no surprise quizzes.
- There will be no retakes if you miss a quiz. We will simply take best 6 of your scores.
- Any violation of the honor code of the institute is automatically an "F" grade.
- The syllabus for all assessments is whatever that has been discussed or suggested until and including the previous meeting.
1.2. Resources
There are no prescribed textbooks for this course. The topics we cover in this course are spread across many books, research papers, and other technical documents. The following list of links will collect the external resources.
You are expected to use these in addition to the following lecture notes. Your TAs are also a resource from which to learn. Use them.
2. Lectures
2.1. Overview
We call presentation of values as data. Data has various forms: numbers, text, audio, video, lists, tables, dictionaries, etc. Data can be classified loosely into two categories as follows:
- Low-level
- Convenient for consumption by computers.
- High-level
- Convenient for consumption by humans.
As programmers, we have to learn both. We write programs which help computers consume data. We also use programs that help us consume data, for example, spreadsheet programs.
For presenting the same value, there are usually many choices. For example, we represent by 42 and LXII the same value. The place-value system won because it was compact and it was convenient to do arithmetic on it. As programmers, we must learn to evaluate various representations and choose the best one for our requirements.
If I have to store scores of various quizzes, should I use this table?
Name | Q1 | Q2 |
---|---|---|
Raj | 5 | 7 |
Rahul | 6 | 3 |
or this one?
Name | Assessment | Score |
---|---|---|
Raj | Q1 | 5 |
Raj | Q2 | 7 |
Rahul | Q1 | 6 |
Rahul | Q2 | 3 |
How do you determine which is better?
2.2. Low-level data
Data has types. The type determines meaning and valid operations on data. In a computer, everything is represented as a sequence of bits. A bit is a type of data that has two values – 0 and 1. A group of 8 bits is a byte.
We will see some of the low-level types supported by the C programming language.
2.2.1. Unsigned Integers
A sequence of 8 bits can represent unsigned integers from 0 to 255. We use a place-value system with 2 as the base to interpret these numbers. The least significant bit has place value 1, the next one 2, the next 4, and so on.
The usual arithmetic operations are supported and results wrap to fit. Adding 1 to 255 yields 0 for 8-bit unsigned integers.
Bitwise operators operate bit-by-bit. The AND (&) operation outputs 1 if both bits are 1, the OR (|) operation outputs 1 when at least one of the bits are 1, the XOR (^) operation outputs 1 when exactly one of the bits are 1. The unary NOT (~) operation changes the bit.
#include <stdio.h> #include <stdint.h> int main() { uint8_t a = 0xa7, b = 0x14; printf("%x\n", a&b); printf("%x\n", a|b); printf("%x\n", a^b); printf("%x\n", ~a); return 0; }
4 b7 b3 ffffff58
To understand why there is a 32-bit output, see Integer Promotions in Beej's guide.
2.2.2. Signed Integers
An 8-bit signed integer represents values from -128 to 127. This is achieved by negating the place-value of the most significant bit.
Arithmetic operations should not overflow or underflow. The programmer should ensure this.
There are types like int8_t
that have specified bit-width. Types
like int
have a fixed, unspecified width.
#include <stdio.h> #include <stdint.h> int main() { int8_t i8 = 0xa3; int i = 25; printf("%d\n", i8); printf("%x\n", i); printf("%d\n", sizeof(i)); return 0; }
-93 19 4
2.2.3. Characters
A character is an 8-bit type that represents characters of the English alphabet.
#include <stdio.h> int main() { char c = 'a'; printf("%c (%x)\n", c, c); c = 'A'; printf("%c (%x)\n", c, c); return 0; }
a (61) A (41)
2.2.4. Products
A product type holds values of possibly distinct types simultaneously.
#include <stdio.h> struct point { int x; int y; }; int main() { struct point p = { .x = 1, .y = 2 }; printf("The square of distance of p: %d\n", p.x*p.x + p.y*p.y); p.x += 2; p.y += 3; printf("The square of distance of p: %d\n", p.x*p.x + p.y*p.y); return 0; }
The square of distance of p: 5 The square of distance of p: 34
#include <stdio.h> struct employee { int salary; char designation; }; int main() { struct employee p = { .salary = 40000, .designation = 'M' }; printf ( "%s earning %d.\n", p.designation == 'M' ? "Manager" : "Worker", p.salary ); return 0; }
Manager earning 40000.
2.2.5. Unions
A union holds values of multiple types but not simultaneously. All values are stored at the same place. So only one of them can exist at a time.
#include <stdio.h> union ioc { int i; char c; }; int main() { union ioc u; u.i = 42; printf("%d %x\n", u.i, u.c); u.c = 10; printf("%d %x\n", u.i, u.c); return 0; }
42 2a 10 a
2.2.6. Sums
We can combine struct
and union
to create a sum. A sum holds one
value at a time along with a tag to identify which one is being held.
#include <stdio.h> struct animal { int tag; // 0 - cat, 1 - fish. union { struct { int whiskers; } cat; struct { int scales; } fish; }; }; void describe(struct animal a) { switch (a.tag) { case 0: printf("Cat with %d whiskers.\n", a.cat.whiskers); break; case 1: printf("Fish with %d scales.\n", a.fish.scales); break; default: printf("I don't recognize this animal.\n"); } } int main() { struct animal a = { .tag = 0, .cat.whiskers = 10 }; struct animal b = { .tag = 1, .fish.scales = 100 }; describe(a); describe(b); return 0; }
Cat with 10 whiskers. Fish with 100 scales.
2.2.7. Arrays
An array is a sequence of values of the same type. We can access individual elements using indexing. Indices start at zero.
#include <stdio.h> int main() { int primes[] = { 2, 3, 5, 7, 9, 13 }; printf("The fifth prime is %d.\n", primes[4]); primes[4] = 11; printf("The fifth prime is %d.\n", primes[4]); return 0; }
The fifth prime is 9. The fifth prime is 11.
We usually traverse arrays in C using a three-clause for
loop. Here's an implementation of the linear-search algorithm:
#include <stdio.h> ssize_t linsearch(int haystack[], int n, int needle) { for (size_t i = 0; i < n; ++i) { if (haystack[i] == needle) return i; } return -1; } int main() { int primes[] = { 2, 3, 5, 7, 11, 13 }; printf("%d found at %d.\n", 5, linsearch(primes, 6, 5)); printf("%d found at %d.\n", 12, linsearch(primes, 6, 12)); return 0; }
5 found at 2. 12 found at -1.
2.2.8. Pointers
Every memory location has an address. A pointer is a type where values are addresses. The value held at the address can be changed through a pointer. Changing the pointer makes it point to a different location.
#include <stdio.h> int main() { int i = 0, j = 5; int *p = &i; printf("%d@%x %d@%x\n", i, &i, j, &j); *p = 3; printf("%d@%x %d@%x\n", i, &i, j, &j); p = &j; *p = 2; printf("%d@%x %d@%x\n", i, &i, j, &j); return 0; }
0@1ed95954 5@1ed95950 3@1ed95954 5@1ed95950 3@1ed95954 2@1ed95950
Here are some relationships pointers in C have to other concepts.
- You can use the address-of operator, denoted
&
, to obtain the address of variables, struct or union members, and array members. - The special value
NULL
is valid for any pointer and means that the pointer is not pointing to any object. - If
a
is an array variable, then the expressiona
is a pointer to the first element of the array. This is called pointer-to-array decay in C. - The operator
s->a
is equivalent to(*s).a
.
2.2.9. Functions and data
The local variables of a function are created when the function begins and destroyed once it returns. Local variables include both parameters and any variable declared within the function.
int *min(int x, int y) { return (x < y) ? &x : &y; }
The above code has an error that won't be caught by the compiler. The function returns the address of a local variable. However, the caller of the function has no way of using that address as by the time the caller obtains the address, the object at that location has been destroyed.
The following code works.
int *min(int *x, int *y) { return (*x < *y) ? x : y; } int main() { int i = 5, j = 10; printf("%d\n", *min(&i, &j)); return 0; }
5
The return value of min()
is an address of a variable in
main()
. So main()
receives the address of an object that is local
to itself.
In C, arrays are pointers.
int second(int *a, int n) { assert(n >= 2); return a[1]; } int main() { int primes[] = { 2, 3, 5 }; printf("%d\n", second(primes, 3)); return 0; }
3
When we pass primes
as an argument to second()
, only the address
of the array is passed. The parameter n
is the number of integers
present contiguously from the base address. Notice that within
second()
, we can use a
like an array. An expression of the form
base[offset]
is interpreted by C as the offset-th object from the
base address. It does not care whether base is an array or a pointer.
We can also use address-passing as a mechanism to modify the local variables of a function from the called function.
void set_second(int *a, int n, int v) { assert(n >= 2); a[1]= v; } int main() { int primes[] = { 2, 4, 5 }; set_second(primes, 2, 3); printf("%d\n", primes[1]); return 0; }
3
Observe that modifying a[1]
changes primes[1]
. This is because a
and primes
are the same base address.
And, the modified objects need not be arrays.
void minmaxify(int *x, int *y) { if (*x < *y) return; int t = *y; *y = *x; *x = t; } int main() { int i = 10, j = 5; minmaxify(&i, &j); printf("%d %d\n", i, j); return 0; }
5 10
You can observe the similar behaviour in python. Somewhat unpredictably.
def modify1(x): x = x + 1 def modify2(xs): xs[1] = 3 y = 5 ys = [2, 4, 5] modify1(y) modify2(ys) print(y) print(ys)
5 [2, 3, 5]
Observe that y
remained the same while ys
got modified. This is
because x
and y
refer to different objects with the same value
initially while xs
and ys
refer to the same object.
2.2.10. Representing contiguous sequences
As we saw earlier, arrays when passed to functions become just pointers to the first element. So the receiver has no way of knowing how many elements are there in the array. One way to solve this problem is to pass around the length of the array along with this pointer.
Another approach is the sentinel technique, where a special element is used to signal that the sequence has ended. In C, strings use this data representation.
char *s = "Hello"; for (size_t i = 0; i <= 5; ++i) { printf("%c (%d)\n", s[i] ? s[i] : '?', s[i]); }
H (72) e (101) l (108) l (108) o (111) ? (0)
2.2.11. Recursive types
We can represent sequences using non-contiguous data in memory. This is used in situations where the sequence grows over time and we want to be able to place new elements at arbitrary positions in the sequence. This data structure is called a linked-list.
2.2.11.1. Lists
Here's an implementation of a linked-list. A linked-list stores a
sequence. Each element of the sequence is stored in a node. The node
also stores the address of the next node. The last node's next address
is NULL
to indicate that the sequence has ended.
#include <stdio.h> #include <stdlib.h> #include <assert.h> struct node { int data; struct node *next; }; struct node *empty() { return NULL; } struct node *cons(int a, struct node *rest) { struct node *n = malloc(sizeof(struct node)); assert(n); n->data = a; n->next = rest; return n; } void print_list(struct node *n) { printf("("); while (n) { printf("%d, ", n->data, n); n = n->next; } printf(")\n"); } int main() { struct node *xs = empty(); for (int i = 2; i < 1000; i += 2 * i) { xs = cons(i, xs); } print_list(xs); return 0; }
(486, 162, 54, 18, 6, 2, )
Observe that given a pointer to the first node and a number i
, there
is no way to calculate the address of the i
th node. We have to read
the pointers in first i-1
nodes. What about arrays?
What are some advantages of storing sequences this way?
Here's a function to insert a new node at some arbitrary position in the list. Note that it only takes a few operations. What if I wanted to do this in an array?
void insert_after(struct node *here, struct node *what) { assert(here); assert(what); struct node *save = here->next; here->next = what; what->next = save; }
We can also write a function to remove a node with a few operations. We should get a pointer to the predecessor to do this.
struct node *skip(struct node *here) { assert(here); struct node *skipped = here->next; here->next = skipped ? skipped->next ? NULL; return skipped; }
2.3. Fundamental algorithms
2.3.1. Time complexity
Every algorithm has an associated time complexity. First, we define the size of the input for the problem and then measure the rate of growth of the running time wrt the size. A few examples:
- Consider
is_prime(n)
. We can consider magnitude of inputn
to be the size of the input. Then, the straight-forward algorithm executes for at most \(a\sqrt{n} + b\) time for \(a, b\neq 0\). - Consider the problem of adding two n-digit numbers. The addition with carry algorithm takes \(a'n+b'\) time for some \(a', b'\neq 0\).
- Consider multiplying two n-digit numbers. It takes \(a''n^2+b''n+c''\) for some non-zero \(a'', b'', c''\).
We are not really interested in finding the values of \(a, b, c\) etc. What we really want to understand is the rate of change. We can understand this by computing certain limits. If \(f(n)\) is the time, then limit \(n\) tends to \(\infty\) \(f(2n)/f(n)\) tells us how time increases when we double the input size. We take \(n\) tends to \(\infty\) as we don't really care how long it takes for small inputs. Computing this limit for the three examples yield constants 1.42, 2, and 4 which tells us how much we can expect time to increase if we double the input size. Notice that the values of \(a, b, c\) etc. are irrelevant for this calculation.
Let us do some back of the hand calculations to understand the absolute growth too. A computer can do roughly \(10^8\) basic operations per second. So, it can add two million-digit numbers in under a second. But, to multiply two million-digit numbers, it will take more than two hours. This tells us that we need better methods if we want to multiply million-digit numbers on a computer.
Consider the problem of computing \(\lceil\log_2(n)\rceil\) given \(n\). It takes only \(a\log(n)+b\) steps for some \(a, b\). This function grows slower than the previous examples. Doubling the input size makes the algorithm do \(a\) more basic operations. Or, put differently to double the running-time, you have to square the input.
Some common functions that appear as time complexity of algorithms:
- Logarithms
- \(\log(n)\)
- Polynomials
- \(n^k\) for constant \(k\).
- Exponentials
- \(k^n\) for constant \(k\).
- Products
- \(fg\) where \(f\) and \(g\) are functions.
- Compositions
- \(f\circ g\) where \(f\) and \(g\) are functions.
Notice that sums are absent from the list. This is because \(f+g\) has the same rate of growth as the "larger" function. Consider \(f(n)=n^2+n\) and \(g(n)=n^2\) and observe that the above limit is two for both functions. For the same reason, time complexities \(n\) and \(n\log(n)\) or \(2^n\) and \(2^nn^2\) are also considered similar (This is less common in theory, but practically there is rarely any difference).
2.3.2. Searching
We have seen linear search earlier. A binary search works when the input is a sorted array. It is faster, taking only \(\log(n)\) time for arrays of size \(n\). Instead of comparing elements from first to last, we pick the middle element of the array first, if it is less than the element being searched for, the needle, we can narrow our search to the second half of the array. If it is more, then narrow to the left half. Therefore, at each step, we halve our search space. So the total number of steps is logarithmic.
#include <stdio.h> int *binsearch(int haystack[], size_t n, int needle) { ssize_t low = 0, high = n-1; ssize_t mid; while (low <= high) { mid = low + (high - low)/2; if (haystack[mid] < needle) low = mid+1; else if (haystack[mid] > needle) high = mid-1; else return haystack+mid; } return NULL; } int main() { int haystack[] = { 1, 2, 3, 4, 6, 7, 8 }; int needles[] = { 4, 1, 7, 0, 9, 5 }; for (size_t i = 0; i < sizeof(needles)/sizeof(needles[0]); ++i) printf("%p\n", binsearch(haystack, 7, needles[i])); return 0; }
0x7ffdbde77dfc 0x7ffdbde77df0 0x7ffdbde77e04 (nil) (nil) (nil)
2.3.3. Sorting
There are many algorithms to sort a sequence of elements based on some order. The following algorithm is called insertion sort:
#include <stdio.h> void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; } void print_array(int xs[], size_t n) { printf("("); for (size_t i = 0; i < n; ++i) printf("%d, ", xs[i]); printf(")\n"); } void sort(int xs[], size_t n) { if (!n) return; for (size_t i = 0; i < n-1; ++i) { size_t j = i+1; while (j > 0 && xs[j] < xs[j-1]) { swap(&xs[j], &xs[j-1]); --j; } } } int main() { int xs[] = { 5, 1, 9, 4, 6, 7, 3, 2 }; size_t n = sizeof(xs)/sizeof(xs[0]); sort(xs, n); print_array(xs, n); return 0; }
(1, 2, 3, 4, 5, 6, 7, 9, )
It takes \(n^2\) time to sort an array of \(n\) elements. This is not the fastest algorithm for this problem. But it works quite well when the array is almost sorted.
3. Labs
3.1. C datatypes
One of the first goals is to understand precisely how fundamental types behave in C. Here are some exercises that will help.
- Compile and run all C programs you saw in the lecture.
- Write a program to print the binary representation of a
uint8_t
value. - Can you implement unary negation of an
int8_t
using only bitwise operations and addition? - Given an
int8_t
, how do you compute auint8_t
that holds the absolute value of the given integer? - Write a program to print the binary representation of a
int8_t
value. - There is a datatype called
long
, how many bytes is it on your machine? - Read and understand the shift operators in C.
- Implement right rotation of
uint8_t
using bitwise and shift operators. A right rotate operation takes auint8_t
and a value between 0 and 7 inclusive and rotates the bits towards the right. For example, the eight-bit number10110101
right rotated by three gives10110110
. - Implement left rotation similarly.
- Swap two half-bytes of a
uint8_t
. For example,0xca
should become0xac
. - Write code to check whether an
unsigned int
is a power of two. You can do this with one simple expression! - Compute the absolute value of a
int8_t
without using any conditional statements (manipulate the bits). - Write code to find the largest power of two smaller than a given
uint32_t
. Do it in the most efficient way possible.
3.1.1. Solutions
You can print the binary representation by doing a bitwise AND with the appropriate values.
The first question is just an exercise to make sure that the setup on your machine is working.
You can print the binary representation by doing a bitwise AND with the appropriate values. Since we know our input has exactly eight bits, we don't need a loop either.
#include <stdio.h> #include <stdint.h> int main() { uint8_t u = 0xa3; if (u & 0x80) printf("1"); else printf("0"); if (u & 0x40) printf("1"); else printf("0"); if (u & 0x20) printf("1"); else printf("0"); if (u & 0x10) printf("1"); else printf("0"); if (u & 0x08) printf("1"); else printf("0"); if (u & 0x04) printf("1"); else printf("0"); if (u & 0x02) printf("1"); else printf("0"); if (u & 0x01) printf("1"); else printf("0"); return 0; }
10100011
Why does an expression like u & 0x80
work as a condition? In C, all
integer values other than zero are considered true and zero is
considered false.
Another way to do this is to iterate exactly eight times.
#include <stdio.h> #include <stdint.h> int main() { uint8_t u = 0xa3; for (int i = 0; i < 8; ++i) { if (u & (0x80 >> i)) printf("1"); else printf("0"); } return 0; }
10100011
You can implement unary negation by using the equation -a == ~a + 1
.
You can get absolute value by simply comparing with 0 and taking the negation if necessary.
int8_t i8 = -128; uint8_t u8; printf("%d\n", i8); if (i8 >= 0) u8 = i8; else u8 = -i8; printf("%d\n", u8);
-128 128
Notice that it correctly worked with -128
too. But why? When we
negate -128
, we get 128
which is the same as 0
as a signed
integer. But it still printed 128
.
The same code for uint8_t
works for printing bits of int8_t
. Just
change types.
The size of long
can be printed using an expression such as:
printf("%d\n", sizeof(long));
8
We can write a function to implement right rotation. Let s
be the
amount to rotate. We first extract the lower-order s
bits, then
right-shift, and then place the extracted bits at the beginning.
#include <stdio.h> #include <stdint.h> #include <stddef.h> uint8_t rrot(uint8_t u, size_t s) { uint8_t lower = u & ((0x1 << s) - 1); // extract u >>= s; // shift u |= lower << (8-s); // put back return u; } int main() { printf("%x\n", rrot(0xb5, 3)); return 0; }
b6
To understand the above code, take a pen and paper and try and simulate what the computer did by hand. You can also ask chatgpt to ask how this works. But don't forget to verify that its answer is correct by verifying with authoritative sources.
Left rotation is left as an exercise.
To swap the two bytes, one could use rotation. But, it is simpler to write the flags directly.
#include <stdio.h> #include <stdint.h> int main() { uint8_t u = 0xac; printf("%x\n", (u & 0xf0) >> 4 | (u & 0x0f) << 4); return 0; }
ca
To check whether a number is a power of two, remember the quiz question.
#include <stdio.h> #include <stdbool.h> bool is_pow2(unsigned int u) { return !(u & (u - 1)); } int main() { printf("%d %d %d\n", is_pow2(16), is_pow2(17), is_pow2(0)); }
1 0 1
The expression gets it wrong only when u
is 0
. Zero is not a power
of two. Recall that Booleans are integers in C. So we can do the following:
#include <stdio.h> #include <stdbool.h> bool is_pow2(unsigned int u) { return u * !(u & (u - 1)); } int main() { printf("%d %d %d\n", is_pow2(16), is_pow2(17), is_pow2(0)); }
1 0 0
To compute the absolute value:
uint8_t abs(int8_t i) { return (i ^ (i >> 7)) - (i >> 7); } int main() { printf("%u %u %u %u\n", abs(-1), abs(1), abs(-128), abs(-127)); return 0; }
1 1 128 127
Why does this work? If i
is non-negative, i >> 7
is 0 and just i
is returned. If i
is negative, then i >> 7
is just -1
(right-shifting shifts in the most significant bit value). So i ^ (i
>> 7)
flips all the bits of i
. The result is off by one. So we add
one back.
Finding the largest power of two:
uint32_t pow2(uint32_t n) { if (n == 0) return 0; n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16; return n - (n >> 1); } int main() { printf("%u %u %u\n", pow2(16), pow2(13), pow2(100)); return 0; }
16 8 64
Analyze the code and try and come up with an explanation of why it works.
3.2. Structs, arrays, and pointers
The double
type in C can approximate rational numbers. Use it to
represent coordinates. Write a struct
to represent lines in a plane
and write code to check whether or not two lines intersect. Use the
following code as a starting point.
#include <stdio.h> struct line { // Fill in the members to represent a line. // Can you represent horizontal, vertical, and slanted lines? // SOLUTION: // We will use values of a, b, c from ax + by + c = 0. double a, b, c; }; int main() { // Fill these so that these become valid lines. // Notice that both a and b should not be 0. struct line L1 = { .a = 0, .b = 1, .c = 3 }; struct line L2 = { .a = 0, .b = 2, .c = 6 }; // Write code here to check whether L1 and L2 intersect. // Change L1 and L2 and test various cases. // And, print whether or not they do. double det = L1.a * L2.b - L2.a * L1.b; if (det != 0) { printf("The lines intersect at a unique point.\n"); } else { if (L1.a * L2.c == L2.a * L1.c && L1.b * L2.c == L2.b * L1.c) { printf("The lines are coincident.\n"); } else { printf("The lines are parallel.\n"); } } return 0; }
The lines are coincident.
Can you simplify the code? Did I cover all cases correctly? Can you print the point of intersection if any?
We can now write struct
to represent rectangles and circles. Also,
write code to check whether or not these shapes overlap.
#include <stdio.h> struct point { double x; double y; }; struct rectangle { // Fill here. }; struct circle { // Fill here. }; int main() { // Fill in members so that you get valid rectangles and circles. struct rectangle R1 = {}; struct rectangle R2 = {}; struct circle C1 = {}; struct circle C2 = {}; // Write code here to check whether R1 and R2 intersect and print // the result. // Write code here to check whether C1 and C2 intersect and print // the result. return 0; }
Finally, let us write some code to find the farthest point from the origin.
#include <stdio.h> struct point { double x; double y; }; int main() { // Fill the array so that you have at least five points. struct point ps[] = { { .x = 0, .y = 0 }, { .x = 0, .y = 1 }, { .x = 1, .y = 0 }, { .x = 2, .y = 0 }, { .x = 1, .y = 1 }, }; struct point farthest; // Write code here to compute the farthest point from origin in the // array ps into the variable farthest. farthest = ps[0]; for (size_t i = 1; i < sizeof(ps)/sizeof(ps[0]); ++i) { if (ps[i].x*ps[i].x + ps[i].y*ps[i].y > farthest.x*farthest.x + farthest.y*farthest.y) farthest = ps[i]; } printf("The farthest point is (%.2lf, %.2lf).\n", farthest.x, farthest.y); return 0; }
The farthest point is (2.00, 0.00).
An exercise on initializing pointers.
#include <stdio.h> int main() { int a = 14, b = 25; int *min; int *max; // Write code here so that the following printf works as // expected. Do not change the values of a and b. if (a < b) { min = &a; max = &b; } else { min = &b; max = &a; } printf("min = %d, max = %d\n", *min, *max); return 0; }
min = 14, max = 25
Some array manipulation.
#include <stdio.h> int main() { // Fill the array with some elements. int xs[] = { 3, 9, -5, 6, 4, -1, 0, 4, 0, 2 }; // Write code here so that the elements in xs are reversed. for ( size_t i = 0, j = sizeof(xs)/sizeof(xs[0]) - 1; i < j; ++i, --j ) { int t = xs[j]; xs[j] = xs[i]; xs[i] = t; } // Use size_t for indexing into arrays. for (size_t i = 0; i < sizeof(xs)/sizeof(xs[0]); ++i) { printf("%d ", xs[i]); } return 0; }
2 0 4 0 -1 4 6 -5 9 3
More array processing.
#include <stdio.h> int main() { // Fill the array with some elements. int xs[] = { 3, 9, -5, 6, 4, -1, 0, 4, 0, 2 }; int positive = 0, negative = 0, zero = 0; // Modift the code below so that everything works as expected. for (size_t i = 0; i < sizeof(xs)/sizeof(xs[0]); ++i) { if (xs[i] > 0) ++positive; else if (xs[i] < 0) ++negative; else ++zero; } printf("The array has %d positive elements, %d negative elements, and %d zeroes.\n", positive, negative, zero); return 0; }
The array has 6 positive elements, 2 negative elements, and 2 zeroes.
Write a struct
to store a credit card number and write Luhn
algorithm to verify the validity of the credit card number.
Write a struct
to store date and time (year, month, day, hour,
minute, second). Define the struct so that it has the smallest
size. Can you pack all the information in such a struct into a
uint32_t
? What about a uint64_t
? Write code to convert data
between those two representations.
3.3. Functions
Fix the following function and test it.
void drop_to_x_axis(struct point p) { p.x = 0; }
First, it drops to y
axis. Second, it cannot modify the point in the
caller. The fixed version is:
void drop_to_x_axis(struct point *p) { p->y = 0; }
or you can return a new point.
Write a function to take an array of struct person
with an age
member and return their average age.
int average_age(struct person *ps, int n) { double a = 0.0; for (size_t i = 0; i < n; ++i) { a += ps[i].age; } return a/n; }
Write a function to compute the midpoint of a line segment. There are two ways to write this corresponding to the following functions. Write both of them. Discuss when one is better than the other.
struct point midpoint(struct point p1, struct point p2); void midpoint(struct point p1, struct point p2, struct point *p3);
The first one is:
struct point midpoint(struct point p1, struct point p2) { return (struct point) { .x = (p1.x + p2.x) / 2, .y = (p1.y + p2.y) / 2 }; }
The second one is:
void midpoint(struct point p1, struct point p2, struct point *p3) { *p3 = (struct point) { .x = (p1.x + p2.x) / 2, .y = (p1.y + p2.y) / 2 }; }
The first one is better as it allows midpoint
to be used in
expressions naturally. As in, midpoint(midpoint(p, q), midpoint(r,
s))
. Try and writing the equivalent code in the second version.
Write a function to compute the intersection point of two lines. Implement the following two functions and discuss when one is better than the other.
struct point intersection(struct line l1, struct line l2); void intersection(struct line l1, struct line l2, struct point *p);
This is same as previous question except for some calculations. There is a crucial difference here. The intersection of two lines need not exist (unlike midpoint of two points). So the functions should assert that the two given lines have an intersection and otherwise error as it cannot return any value. The following function is better:
bool intersection(struct line l1, struct line l2, struct point *p);
Now, the bool
return value indicates whether or not an intersection
exists. The function should be written so that if (and only if) the
return value is true, the intersection should be stored in *p
. This
function need not ever error out.
Write a program that modifies an integer through a pointer to a pointer to an int.
int i = 0; int *p = &i; int **q = &p; **q = 1; printf("%d\n", i);
1
Notice that the expression &&i
wouldn't work. Why?
Write a function to swap the values of two integers in memory. What are the parameters? What is the return value?
#include <stdio.h> void swap(int *x, int *y) { int t = *y; *y = *x; *x = t; } int main() { int i = 10, j = 5; swap(&i, &j); printf("%d %d\n", i, j); return 0; }
5 10
Write a function to sum the integers in an array. Can your function be used to sum sub-arrays?
#include <stdio.h> int sum(int *xs, int n) { int s = 0; for (size_t i = 0; i < n; ++i) s += xs[i]; return s; } int main() { int xs[] = { 2, 3, 5, 7, 11, 13 }; printf("%d %d\n", sum(xs, 6), sum(xs + 2, 3)); return 0; }
41 23
Implement binary-search on an array of integers. (Left as exercise)
Write the following function. Determine the number of integer
comparisons your function will make in the worst-case. The
straight-forward algorithm will do 2n
comparisons. There is an
algorithm that uses at most 3n/2
comparisons.
#include <stdio.h> #include <assert.h> void minmax(int xs[], int n, int *min, int *max) { assert(min && max); if (n <= 0) return; /* initialize */ *min = *max = xs[0]; size_t i; for (i = 1; i < n-1; i += 2) { int m, M; if (xs[i] < xs[i+1]) { m = xs[i]; M = xs[i+1]; } else { m = xs[i+1]; M = xs[i]; } if (m < *min) *min = m; if (M > *max) *max = M; } if (i < n) { if (xs[i] < *min) *min = xs[i]; if (xs[i] > *max) *max = xs[i]; } } int main() { int xs[] = { 0, 5, 20, 8, -5, 3, 9, -14, 7 }; int m, M; minmax(xs, 9, &m, &M); printf("%d %d\n", m, M); minmax(xs, 7, &m, &M); printf("%d %d\n", m, M); return 0; }
-14 20 -5 20
Observe that for every two elements in the array, we do three comparisons.
3.4. Dynamic memory allocation
Implement Python's range()
.
int *range(int start, int stop, int step);
The int *
return type is not suitable here. If we return just a
pointer to the base of an array, how does the caller know how many
elements are there in the array? A solution is to return both a
pointer and the number of elements packaged into a struct.
#include <stdio.h> #include <stdlib.h> struct Range { int *array; size_t size; }; struct Range range(int start, int stop, int step) { if (step == 0) { return (struct Range) { NULL, 0 }; } size_t length = 0; if (step > 0) { length = (stop > start) ? (stop - start + step - 1) / step : 0; } else { length = (start > stop) ? (start - stop - step - 1) / -step : 0; } int *arr = malloc(length * sizeof(int)); if (arr == NULL) { return (struct Range) { NULL, 0 }; } for (size_t i = 0; i < length; ++i) { arr[i] = start + i * step; } return (struct Range) { arr, length }; } void free_range(struct Range r) { free(r.array); } void print_range(struct Range r) { printf("("); for (size_t i = 0; i < r.size; ++i) { printf("%d, ", r.array[i]); } printf(")\n"); } int main() { struct Range r = range(0, 10, 1); if (r.array) print_range(r); free_range(r); r = range(5, -10, -3); if (r.array) print_range(r); free_range(r); return 0; }
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ) (5, 2, -1, -4, -7, )
Write a function to compute the prime factors of a given number.
int *prime_factors(int n);
Here, an int *
suffices. An int
in C can have at most 8 * sizeof(int) - 1
prime factors (why?). This is a small amount of
memory. We can put a 0
as the last element in the array to signal
that the prime factors has ended. That is, for input 60
, we can
output the just the base address of the array { 2, 2, 3, 5, 0
}
. This works because 0
is never a valid divisor. This data
representation of terminating a sequence with a special element is
called the sentinel technique. Examples of this include a full-stop
(".") to end a sentence in English, the word "over" to end a sentence
when speaking over radio.
#include <stdio.h> #include <stdlib.h> int* prime_factors(int n) { if (n <= 1) return NULL; int *factors = malloc(8 * sizeof(int) * sizeof(int)); int index = 0; while (n % 2 == 0) { factors[index++] = 2; n /= 2; } for (int i = 3; i * i <= n; i += 2) { while (n % i == 0) { factors[index++] = i; n /= i; } } if (n > 2) factors[index++] = n; factors[index] = 0; return factors; } void print_prime_factors(int *factors) { int i; for (i = 0; factors[i+1] != 0; i++) printf("%d * ", factors[i]); printf("%d\n", factors[i]); } int main() { int xs[] = { 13, 39, 56, 1000007, 1000000007, 0 }; for (size_t i = 0; xs[i]; ++i) { int *factors = prime_factors(xs[i]); if (factors != NULL) { printf("%12d = ", xs[i]); print_prime_factors(factors); } free(factors); } return 0; }
13 = 13 39 = 3 * 13 56 = 2 * 2 * 2 * 7 1000007 = 29 * 34483 1000000007 = 1000000007
Implement Sieve of Eratosthenes to figure out the prime numbers and composite numbers below a given number. Use the following skeleton to get started.
#include <stdio.h> #include <stdbool.h> #include <stdlib.h> int main() { int n = 30; bool *elts = malloc(n * sizeof(bool)); for (size_t i = 0; i < n; ++i) elts[i] = true; for (size_t d = 2; d < n; ++d) { if (!elts[d]) continue; for (size_t m = 2 * d; m < n; m += d) { elts[m] = false; } } printf("primes under %d: (", n); for (size_t i = 2; i < n; ++i) { if (elts[i]) printf("%d, ", i); } printf(")\n"); }
primes under 30: (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, )
Read about Josephus problem. Implement a program to determine the
survivor for a user entered n
and k
. Use the previous program as a
guideline.
#include <stdio.h> #include <stdlib.h> #include <stdbool.h> size_t next_alive(bool *persons, size_t n, size_t cur) { cur = (cur + 1) % n; while (!persons[cur]) cur = (cur + 1) % n; return cur; } size_t josephus(size_t n, size_t k) { bool *persons = malloc(n * sizeof(bool)); for (size_t i = 0; i < n; ++i) persons[i] = true; size_t ndead = 0; size_t cur = 0; while (ndead < n-1) { size_t i = 1; while (i < k) { ++i; cur = next_alive(persons, n, cur); } persons[cur] = false; ++ndead; cur = next_alive(persons, n, cur); } return cur+1; } int main() { struct { size_t n, k; } xs[] = { { .n = 41, .k = 3 }, { .n = 10, .k = 2 }, { .n = 15, .k = 2 }, }; for (size_t i = 0; i < sizeof(xs)/sizeof(xs[0]); ++i) printf ( "josephus(%zu, %zu) = %zu.\n", xs[i].n, xs[i].k, josephus(xs[i].n, xs[i].k)); return 0; }
josephus(41, 3) = 31. josephus(10, 2) = 5. josephus(15, 2) = 15.
3.5. Review
- Design a scheme to represent a date in this semester in a
uint8_t
. Write a functionvoid print_date(uint8_t u)
that will show this date in a readable form. - Write a function
uint8_t average(uint8_t a, uint8_t b)
. This function should work for any twouint8_t
, even 255 and 255. - Implement a
struct angle
to represent an angle in a 2D plane. Use the earlierstruct point
to define this. - Write the function
bool is_right(struct angle a)
to check whether the angle is a right angle. - Write functions
is_acute()
,is_obtuse()
,is_straight()
,is_reflex()
, andis_full()
for checking for various types of angles. - Define a
struct triangle
to represent triangles and write adouble area(struct triangle t)
to calculate its area. - Write a function
struct Range gp(int a, int n, int r)
to generate the firstn
terms of a geometric progression with starting numbera
and common ratior
. - Read the e-spigot algorithm for calculating digits of e. Use it for
defining the function
int *e(int ndigits)
. Define
struct bignum { int *digits; size_t ndigits };
to represent arbitrarily large natural numbers. Define:struct bignum bignum_add(struct bignum a, struct bignum b)
to add two such numbers. Write a functionvoid print_bignum(struct bignum a)
to print such numbers. An example use of these functions is given below.struct bignum a = make_bignum("123451298289357892734587108947235"); struct bignum b = make_bignum("289347508783452346781234534574562"); print_bignum(bignum_add(a, b));
- Write
int bignum_cmp(struct bignum a, struct bignum b)
. This function should return0
ifa
andb
are equal,-1
ifa < b
and1
ifa > b
.
The following is an implementation in C. For generating \(n\) digits, this procedure needs roughly \(n^2\) operations. On today's computers, this means that this algorithm is infeasible for generating a million digits of \(e\) due to time constraints.
#include <stdio.h> #include <stdlib.h> #include <assert.h> int *e(int ndigits) { int n = ndigits + 2; // See Lemma 3. // We use A[1],...A[n] as in the algorithm. // A[0] will store the generated digit. int *A = malloc((n + 1) * sizeof(int)); assert(A); int *digits = malloc(n * sizeof(int)); assert(digits); for (size_t i = 1; i <= n; ++i) { A[i] = 1; } for (size_t i = 0; i < n; ++i) { A[0] = 0; // Multiply by 10. for (size_t j = 1; j <= n; ++j) { A[j] *= 10; } // Take the fractional part. for (size_t j = n; j >= 1; --j) { int q = A[j] / (j + 1); A[j] %= j + 1; A[j-1] += q; } // Output the next digit. digits[i] = A[0]; } free(A); return digits; } void show_e(int ndigits) { int *xs = e(ndigits); printf("e is approximately 2."); for (size_t i = 0; i < ndigits; ++i) { printf("%d", xs[i]); } free(xs); } int main() { show_e(10); return 0; }
e is approximately 2.7182818284
We can implement arbitrarily large numbers by storing the digits in an array. We store the lowest digit first as this is how addition accesses the digits. To add two n-digit numbers, it takes roughly n operations. That is, time scales linearly with input size.
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <stdbool.h> struct bignum { int *digits; size_t ndigits; }; struct bignum make(const char *s) { size_t n = strlen(s); int *digits = malloc(n * sizeof(int)); size_t i = n-1; size_t j = 0; while (j < n) { digits[j] = s[i] - '0'; --i; ++j; } return (struct bignum) { .digits = digits, .ndigits = n }; } size_t max(size_t a, size_t b) { return a >= b ? a : b; } struct bignum add(struct bignum a, struct bignum b) { size_t an = a.ndigits; size_t bn = b.ndigits; size_t cn = max(an, bn) + 1; int *c = malloc(cn * sizeof(int)); size_t i = 0, j = 0, k = 0; int carry = 0; while (i < an && j < bn) { int d = a.digits[i++] + b.digits[j++] + carry; if (d > 9) { c[k++] = d - 10; carry = 1; } else { c[k++] = d; carry = 0; } } while (i < an) { c[k++] = a.digits[i++] + carry; carry = 0; } while (j < bn) { c[k++] = b.digits[j++] + carry; carry = 0; } if (carry) c[k++] = carry; return (struct bignum) { .digits = c, .ndigits = k }; } bool bignum_eq(struct bignum a, struct bignum b) { if (a.ndigits != b.ndigits) return false; for (size_t i = 0; i < a.ndigits; ++i) if (a.digits[i] != b.digits[i]) return false; return true; } bool bignum_le(struct bignum a, struct bignum b) { if (a.ndigits < b.ndigits) return true; if (b.ndigits < a.ndigits) return false; for (size_t i = 1; i <= a.ndigits; ++i) if (a.digits[a.ndigits-i] < b.digits[a.ndigits-i]) return true; else if (b.digits[a.ndigits-i] < a.digits[a.ndigits-i]) return false; return true; } int bignum_cmp(struct bignum a, struct bignum b) { if (bignum_eq(a, b)) return 0; if (bignum_le(a, b)) return -1; return 1; } void print_bignum(struct bignum a) { for (size_t i = 1; i <= a.ndigits; ++i) printf("%d", a.digits[a.ndigits-i]); } int main() { struct bignum a = make("9999999999"); struct bignum b = make("9999999999"); print_bignum(a); printf(" + "); print_bignum(b); printf(" = "); print_bignum(add(a, b)); printf(".\n"); struct bignum n1 = make("1000"), n2 = make("999"); printf("1000 cmp 999 = %d.\n", bignum_cmp(n1, n2)); printf(" 999 cmp 1000 = %d.\n", bignum_cmp(n2, n1)); printf("1000 cmp 1000 = %d.\n", bignum_cmp(n1, n1)); return 0; }
9999999999 + 9999999999 = 19999999998. 1000 cmp 999 = 1. 999 cmp 1000 = -1.1000 cmp 1000 = 0.
Try the same addition with "normal" integers.
int a = 9999999999; int b = 9999999999; printf("%d + %d = %d.\n", a, b, a+b);
1410065407 + 1410065407 = -1474836482.
3.6. Some time-complexity exercises
Consider the function
int fib(int n)
computing Fibonacci numbers:int fib(int n) { if (n <= 1) return 1; return fib(n-1) + fib(n-2); }
Try to compute
fib(n)
for many values ofn
. How large ann
can be reasonably computed? What is the time complexity of this algorithm? Is there a faster way to compute this? (Hint: There is ann
time and alog(n)
time algorithm this)- Write
unsigned int power_of(unsigned int a, unsigned int b)
. What is your solution's time complexity? Is it optimal? - Write the function
void running_sum(int xs[], size_t n, int sums[])
. After execution,sums[i]
should bexs[0] + ... xs[i]
for alli
. What is the time complexity of your algorithm? - Write
bool triplet(int xs[], size_t n)
to find three distinct (by position, not necessarily value) elements inxs
such that the sum of first two is the third. Assume that the arrayxs
is sorted in non-decreasing order. What is the time complexity of your algorithm? - Write
int max_contiguous_sum(int xs[], size_t n)
to return the maximum sum obtained from a sub-array ofxs
. For example, if the input is{ -2, 1, -3, 4, -1, 2, 1, -5, 4 }
, the answer is6
. What is the time complexity of your solution? Note that if the contiguous restriction is not present, we can simply pick all positive numbers in the array.
3.7. Linked lists
Implement the following functions and test each of them appropriately.
- Write a function
void unlink_next(struct node *here)
to remove the node afterhere
in constant-time from the list. - A linked list is circular if the
next
pointer of some node points to an earlier node in the list. Write a functionbool exists_cycle(struct node *head)
that returnstrue
iff the listhead
has a cycle. - Modify the linked list node structure to hold pointers to both previous and next nodes. What should be the previous of first node?
- Write a function
void insert_before(struct node *here, struct node *what);
to insert a nodewhat
beforehere
in the list. This should be constant-time. - Write a function
void unlink(struct node *here)
to remove the nodehere
from a list. This should be constant-time.