I am tackling algorithms in CS50’s Week 3 lecture.
x————x
This is also my first post, so let me introduce myself and explain a little bit of what I’m doing here! My name is Judson, and I am a recent graduate with a Liberal Arts degree. I’ve been interested in coding off and on for quite some time now, and seeing as I have the time to invest in a career now, my goal is to take some quality courses and learn Python, and potentially react down the line.
I’m starting with Harvard’s CS50, then I’ll move to Python for Everybody, and will likely take Introduction to Python by Harvard’s David Malan (also the author of the CS50 course). I actually began Week 1 on June 1st, so it’s been two weeks since I started learning.
In this Substack, I’ll be essentially documenting my progress throughout the courses I’m taking, along with whatever projects and snippets of code I am working on. I’m doing this to create a sort of ‘journey’ blog, while also helping myself learn and understand the concepts I’m being confronted with. So, it might seem like I’m teaching/explaining different programming concepts as if I’m some expert, but it’s really just for myself. However, if it just so happens that it proves useful to anybody else, well, great!
For reference, everything that I’ve learned up to this point has been in C. I’m using VSCode to compile and run everything. I’m also using ChatGPT to help me understand different concepts and lines of code if I get stuck. As for my study method, as I outlined above, I’m trying to take my time and go through everything I’m learning and write it all down here so that I can have a resource in a language that I understand, and hopefully cement not only muscle memory of these concepts in my head but also practice the Feynman technique—something else which I’ve never done before.
x————x
PLEASE feel free to reach out to me if I make any errors in this, or if I am misunderstanding. I’ll take all the help I can get.
Let’s get into it!
x————x
So far, we’ve covered how computers can efficiently locate data in arrays through a divide-and-conquer process, where they essentially cut certain bits of data away by a simple algorithm.
Linear / Binary searching
For instance, say a program wants to find the number 50 in an array of boxes, and there are 7 possibilities as to where that box could be.
Without an efficient algorithm, that program proceeds from left to right, finally ending up with the desired number 50 in the seventh box, because there is no binary indexing and the values of each box are randomized. This is inefficient because the running time of that program finding its proper value is slow, as it has to run through all of the available options before finding what it wants.
The alternative to this route is the algorithm. The algorithm—in pseudocode, at least—says:
“Boxes (7). If boxes = 7 && i = 50, gotoMiddle. If middle = i, return true. If i = <50, go left: if i = >50, go right.”
That’s the loop. Then, it reruns again, determines whether value #2 is greater or lesser than the desired value, and loops again until it returns true instead of false, meaning that it has discovered the desired variable.
Binary search: fast and good (generally)
Linear search: slow and bad (generally)
However, there are different use-case scenarios within which a linear search may be more efficient, it seems. I am still trying to understand this.
x—x
Example of linear search program in C:
#include <cs50.h>
#include <stdio.h>
int main(void)
{
int numbers[] = {20, 500, 10, 5, 100, 1, 50};
int n = get_int("Number: ");
for (int i = 0; i < 7; i++)
{
if (numbers[i] == n)
{
printf("Found\n");
return 0;
}
}
printf("Not found\n");
return 1;
}
What this does is allow the user to search for a number specified in the given array —> “int numbers[] = {20, 500, 10, 5, 100, 1, 50};” to which the program subsequently provides a ‘found/not found’ output depending on the number that is inputted by the user. For instance, 50/20 will return found, while 1000 or 4 would return not found.
x—x
Now, if we want to search for words instead of numbers as in the above code, we can retain the logic while switching out the array this following line of code:
Old:
int numbers[] = {20, 500, 10, 5, 100, 1, 50}
int n = get_int("Number: ");
for (int i = 0; i < 7; i++)
New:
string strings[] = {"battleship", "boot", "cannon", "iron", "thimble", "tophat"};
string s = get_string("String: ");
for (int i = 0; i < 7; i++)
However, running the program with a term such as ‘cannon’ or ‘iron’ results in found. Suppose I search ‘onion.’ I like onions. In the previous example with the old code, it would have returned ‘Not found’—which is precisely what I specified in the 2nd printf function outside of the loop. However, running the program with a different array returns the result of….
Segmentation fault(core dumped)
Why is that?
Here’s why: for (int i = 0; i < 7; i++)
As you can see here, I hardcoded the number ‘7’ into this line of code, because there were originally 7 integers within the first numerical array. However, the array (of strings) only has 6 strings, so the program ALT-F4’s itself and perishes under the insurmountable guilt due to only being able to search 6 boxes when ordered to search 7. So, it is better to avoid hardcoding where possible, and instead fix the code by entering 6. This is still a hardcode, and a temporary solution for this given example, but I do not know the more permanent workaround. I’ve seen it using ChatGPT, but I don’t know how to implement it yet, so I won’t copy and paste it and pretend that I do.
x—x
So now, I’ve just learned a little something about defining your own datatype, something I’ve been curious about for a while.
The context within which I learned this concept was a program that bore very similar code to the previous linear search program, this time in the context of a phonebook. If we want to search for a name, such as ‘John’ or ‘Henry’, we would like to see the corresponding phone number be output as the result of our search in the program.
However, using multiple arrays would be inefficient and bad design, while it would technically still work. This is what I learned was called the ‘honor system’—where the program takes on faith that you can, essentially, see the code and already know the number of strings (of individual names/numbers) in the array that the program is linearly searching from.
The solution to this is to define your own data type.
Structs
typedef struct
{
string name;
string number;
}
person;
Here, I’ve constructed my own data type. Now, once I run the program, I can search for the name and the number of whoever I’m looking for, and I’ve declared it at the top of my code above int main(void) [void because there is no command line argument necessary for this program] so that anything that follows after the declaration of this data type can access it using ‘person;’.
Ok, let’s take a look at the rest of the code here.
int main(void)
{
person people[2];
people[0].name = "Quindale";
people[0].number = "123-456-7890";
people[1].name = "Dingle";
people[1].number = "098-765-4321";
string name = get_string("Name: ");
for(int i = 0; i < 2; i++)
{
if (strcmp(people[i].name, name) == 0)
{
printf("Found %s\n", people[i].number);
return 0;
}
}
printf("Not found\n");
return 1;
}
Notice that in the first line of code I declare people as the term that I will use to allow my code to address the datatype of person.
I have two people, hence [2], but since we count from 0 in programming, Mr. Quindale is [0] and Mr. Dingle is [2], even though they are really 1 and 2 respectively. Then, I use the ‘.’ annotation to add their names and numbers.
Next, I use string name = get_string("Name: "); to request that the user provide the name of the person whose phone number they’d like.
Then, it’s time to implement the loop that will allow me to search for the people in question. I’ll write for(int i = 0; i < 2; i++).
i = 0 is the return value (I think?), i < 2 is (sadly) the hardcoded value of people in this program, or better said, strings in the above array. Then, i++ will simply tell the program to move on to the next step in the loop. I feel like that’s a really bad way of explaining i++.
Next, the strcmp line will compare the name the user has given (again, honor code) with either of the names it knows: if it matches, it’ll have found it and will return 0. If not, it will return 1, and restart the loop using i++ to check the next string in the array to see if that name is the one being searched for.
The printf lines are pretty self-explanatory: they’ll print found/not found if the strcmp returns 0 or 1, respectively.
Here is what the terminal looks like once the program is compiled and run:
$ ./phonebook
Name: Quindale
Found 123-456-7890
$ ./phonebook
Name: dingle ←——«
Not found ←——«
$ ./phonebook
Name: Dingle
Found 098-765-4321
I’ve actually just noticed that, unless the name is not capitalized as Quindale or Dingle, but instead quindale or dingle, the program gets confused, returns 1, and produces the not found output.
So, though I am not sure, perhaps I need some function that tells the program to isupper or islower the user’s input (%s) and to ‘ignore’ it in a way, since the names (at least at a high level) are still the same. Anyway, this was fun. Time for the next program.
x—x
That’s all for today, folks. Time to eat some leftovers for lunch and get back to copywriting.
Thanks for reading!