Data Structures and Algorithms: Arrays, Linked Lists, Stacks, and Queues, Exercises of Advanced Data Analysis

A comprehensive introduction to fundamental data structures in computer science, including arrays, linked lists, stacks, and queues. It explores the concepts of iteration, invariants, and recursion, illustrating how these principles are applied to efficiently process and manipulate data within these structures. The document also delves into the abstract data types associated with each structure, outlining their constructors, selectors, and conditions, and discusses various implementation approaches. This resource is valuable for students seeking a foundational understanding of data structures and their applications in algorithm design.


Data Structures and Algorithms: Efficient Organization and Processing of Information

Arrays, Iteration, Invariants

Storing and Accessing Data in Arrays

Data in computers is ultimately stored as patterns of bits, but programming languages deal with higher-level objects such as characters, integers, and floating-point numbers. To store an ordered collection of such objects, the obvious structure to use is an array. Arrays store items in a sequence of computer memory locations, and can be written down on paper as a sequence of items enclosed in square brackets, separated by commas. For example, a = [1, 4, 17, 3, 90, 79, 4, 6, 81] is an array of integers with 9 items. In computer science, array indices conventionally start from 0, so the first item of a is a[0] = 1 and the last (ninth) item is a[8] = 81. The index i is used to access the individual elements a[i] of the array a, and algorithms can move sequentially through the array by incrementing or decrementing the index.

Iterating through Arrays

Algorithms that process data stored in arrays typically need to visit all the items in the array and apply appropriate operations on them. This is done by iterating through the array, usually by incrementing the index. For example, the following pseudocode iterates through the array a and prints each element:

for i = 0 to (size of a) - 1: print a[i]

Invariants

When working with arrays and other data structures, it is often useful to identify and maintain certain properties that hold true throughout the execution of an algorithm. These properties are called invariants, and they can help in reasoning about the correctness and efficiency of the algorithm. For example, when iterating through an array, a common invariant is that the index i is always within the valid range of array indices.

Maintaining and proving invariants is an important part of algorithm design and analysis.
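As a small illustration (a sketch of our own, not code from the text), the following C fragment finds the largest value in the example array while recording its loop invariant as comments:

#include <stdio.h>

int main(void) {
    int a[] = {1, 4, 17, 3, 90, 79, 4, 6, 81};
    int n = sizeof a / sizeof a[0];
    int max = a[0];
    /* Invariant: whenever the loop condition is tested, 1 <= i <= n and
       max holds the largest of a[0], ..., a[i-1]. */
    for (int i = 1; i < n; i++) {
        if (a[i] > max)
            max = a[i];
    }
    /* On exit i = n, so the invariant tells us that max is the largest
       value in the whole array. */
    printf("%d\n", max);
    return 0;
}

Checking that each pass of the loop preserves the invariant, and that the invariant together with the exit condition gives the desired result, is exactly the kind of reasoning about correctness that the text describes.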

Conclusion

Arrays are a fundamental data structure for storing and processing ordered collections of data in computer science. Iterating through arrays and maintaining relevant invariants are crucial techniques for developing efficient algorithms that operate on array-based data.

Lists, Recursion, Stacks, Queues

Linked Lists

A list can involve virtually anything, for example, a list of integers [3, 2, 4, 2, 5], a shopping list [apples, butter, bread, cheese], or a list of web pages each containing a picture and a link to the next web page. When considering lists, we can speak about them on different levels - on a very abstract level (on which we can define what we mean by a list), on a level on which we can depict lists and communicate as humans about them, on a level on which computers can communicate, or on a machine level in which they can be implemented.

Graphical Representation

Non-empty lists can be represented by two-cells, in each of which the first cell contains a pointer to a list element and the second cell contains a pointer to either the empty list or another two-cell. We can depict a pointer to the empty list by a diagonal bar or cross through the cell. For instance, the list [3, 1, 4, 2, 5] can be represented as:

[ • | • ]-->[ • | • ]-->[ • | • ]-->[ • | • ]-->[ • | / ]
  |           |           |           |           |
  v           v           v           v           v
  3           1           4           2           5

Abstract Data Type "List"

On an abstract level, a list can be constructed by the two constructors:

EmptyList, which gives you the empty list, and MakeList(element, list), which puts an element at the top of an existing list.

Using those, our last example list can be constructed as MakeList(3, MakeList(1, MakeList(4, MakeList(2, MakeList(5, EmptyList))))). This inductive approach to data structure creation is very powerful, and we shall use it many times throughout these notes. It starts with the "base case", the EmptyList, and then builds up increasingly complex lists by repeatedly applying the "induction step", the MakeList(element, list) operator.

It is also important to be able to get back the elements of a list, and we no longer have an item index to use like we have with an array. The way to do this is with selectors that take a list apart again: first(list), which returns the first element of a non-empty list, and rest(list), which returns the list with its first element removed, together with a condition isEmpty(list) that tells us whether a list is empty.

In some programming languages, notably functional languages such as Lisp, lists are the most important primitive data structure; in other languages, it may be more natural to implement lists as arrays. However, array-based implementations can only approximate the general concept of lists, as lists are conceptually not limited in size.

The text also discusses a pointer-based approach to implementing lists, which is closer to the diagrammatic representation of lists.
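As a rough sketch of that pointer-based approach (our own illustration, with names chosen to mirror the abstract operations above), a list in C can be built from two-cells holding an element and a pointer to the rest of the list:

#include <stdio.h>
#include <stdlib.h>

/* A two-cell: the element and a pointer to the rest of the list.
   The empty list is represented by a NULL pointer. */
typedef struct Cell {
    int element;
    struct Cell *rest;
} Cell;
typedef Cell *List;

#define EmptyList NULL

/* MakeList(element, list): put an element at the top of an existing list. */
List MakeList(int element, List list) {
    List cell = malloc(sizeof *cell);
    cell->element = element;
    cell->rest = list;
    return cell;
}

int  first(List l)   { return l->element; }   /* assumes l is non-empty */
List rest(List l)    { return l->rest; }      /* assumes l is non-empty */
int  isEmpty(List l) { return l == NULL; }

int main(void) {
    /* Build the example list [3, 1, 4, 2, 5] by repeated MakeList calls. */
    List l = MakeList(3, MakeList(1, MakeList(4,
             MakeList(2, MakeList(5, EmptyList)))));
    for (List p = l; !isEmpty(p); p = rest(p))
        printf("%d ", first(p));
    printf("\n");
    return 0;
}

Each call to MakeList allocates a fresh two-cell, so a list of n elements uses n cells, and the empty list needs no storage at all.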

Recursion

The text introduces recursion as a natural way to process linked lists, where there is no index for each item. It provides examples of two important derived procedures on lists: last and append. The last procedure finds the last element of a list, while the append procedure appends one list to another.

The time complexity of these procedures is discussed, with the last procedure having a linear time complexity and the append procedure having a time complexity proportional to the length of the first list.
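A possible sketch of those two procedures, restated on the same pointer-based list type (again our own illustration rather than the text's code):

/* Recursive last and append, using the pointer-based list type
   (re-declared here so the fragment stands alone). */
#include <stdlib.h>

typedef struct Cell { int element; struct Cell *rest; } Cell;
typedef Cell *List;

static int  isEmpty(List l) { return l == NULL; }
static int  first(List l)   { return l->element; }
static List rest(List l)    { return l->rest; }

static List MakeList(int element, List list) {
    List cell = malloc(sizeof *cell);
    cell->element = element;
    cell->rest = list;
    return cell;
}

/* last(l): the last element of a non-empty list.
   Walks down the list once, so it takes time linear in the length of l. */
int last(List l) {
    if (isEmpty(rest(l)))
        return first(l);
    return last(rest(l));
}

/* append(l1, l2): a list consisting of l1 followed by l2.
   Copies l1 cell by cell, so its time is proportional to the length of l1. */
List append(List l1, List l2) {
    if (isEmpty(l1))
        return l2;
    return MakeList(first(l1), append(rest(l1), l2));
}

Note that append as written shares the cells of l2 with the result rather than copying them, which is why only the length of the first list matters for its running time.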

Stacks

The text explains that stacks are, on an abstract level, equivalent to linked lists. Stacks are the ideal data structure to model a First-In-Last-Out (FILO) or Last-In-First-Out (LIFO) strategy in search.

The text provides a graphical representation of a stack and discusses the abstract data type "Stack", including its constructors, selectors, and conditions. It also explores the implementation of stacks, noting that there are two different ways to think about implementing them: one that does not change the original stack and one that destructively changes the original stack.
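To make the two implementation styles concrete, here is an illustrative C sketch of the destructive version, where push and pop modify the stack in place (the function names are ours, not necessarily the text's constructors and selectors):

#include <stdio.h>
#include <stdlib.h>

/* A destructive, pointer-based stack of integers. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct Stack {
    Node *top;            /* NULL when the stack is empty */
} Stack;

void push(Stack *s, int value) {
    Node *n = malloc(sizeof *n);
    n->value = value;
    n->next = s->top;
    s->top = n;
}

int pop(Stack *s) {       /* assumes the stack is non-empty */
    Node *n = s->top;
    int value = n->value;
    s->top = n->next;
    free(n);
    return value;
}

int isEmptyStack(const Stack *s) { return s->top == NULL; }

int main(void) {
    Stack s = { NULL };
    push(&s, 3); push(&s, 1); push(&s, 4);
    /* Items come back in last-in-first-out order: 4, 1, 3. */
    while (!isEmptyStack(&s))
        printf("%d ", pop(&s));
    printf("\n");
    return 0;
}

The non-destructive alternative would instead leave the original stack untouched and return a new stack from each operation, much as MakeList does for lists.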

Queues

The text introduces queues as a data structure used to model a First-In- First-Out (FIFO) strategy. It provides a graphical representation of a queue and discusses the abstract data type "Queue", including its constructors, selectors, and conditions.

The text notes that the arrangement of a queue allows for efficient addition of elements to the back of the queue and removal of elements from the front of the queue, both of which can be done with constant effort, independent of the queue length.
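One common way to achieve that constant-time behaviour, sketched here in C purely for illustration, is to keep pointers to both the front and the back of a singly linked list:

#include <stdio.h>
#include <stdlib.h>

typedef struct QNode {
    int value;
    struct QNode *next;
} QNode;

/* Keeping pointers to both ends makes enqueue (at the back) and
   dequeue (at the front) constant-time, independent of queue length. */
typedef struct Queue {
    QNode *front;
    QNode *back;
} Queue;

void enqueue(Queue *q, int value) {
    QNode *n = malloc(sizeof *n);
    n->value = value;
    n->next = NULL;
    if (q->back == NULL)          /* queue was empty */
        q->front = q->back = n;
    else {
        q->back->next = n;
        q->back = n;
    }
}

int dequeue(Queue *q) {           /* assumes the queue is non-empty */
    QNode *n = q->front;
    int value = n->value;
    q->front = n->next;
    if (q->front == NULL)
        q->back = NULL;
    free(n);
    return value;
}

int main(void) {
    Queue q = { NULL, NULL };
    enqueue(&q, 3); enqueue(&q, 1); enqueue(&q, 4);
    /* Items come back in first-in-first-out order: 3, 1, 4. */
    while (q.front != NULL)
        printf("%d ", dequeue(&q));
    printf("\n");
    return 0;
}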

Doubly Linked Lists

Graphical Representation

Non-empty doubly linked lists can be represented by three-cells, where the first cell contains a pointer to another three-cell or to the empty list, the second cell contains a pointer to the list element, and the third cell contains a pointer to another three-cell or the empty list. The empty list is depicted by a diagonal bar or cross through the appropriate cell. For instance, the doubly linked list [3, 1, 4, 2, 5] would be represented as:

[ / | • | • ]<-->[ • | • | • ]<-->[ • | • | • ]<-->[ • | • | • ]<-->[ • | • | / ]
        |                |                |                |                |
        v                v                v                v                v
        3                1                4                2                5

Abstract Data Type "Doubly Linked List"

On an abstract level, a doubly linked list can be constructed by the following three constructors:

  • EmptyList, the empty list.
  • MakeListLeft(element, list), which takes an element and a doubly linked list and returns a new doubly linked list with the element added to the left of the original doubly linked list.
  • MakeListRight(element, list), which takes an element and a doubly linked list and returns a new doubly linked list with the element added to the right of the original doubly linked list.

It is possible to construct a given doubly linked list in more than one way. For example, the doubly linked list represented above can be constructed by either of:

MakeListLeft(3, MakeListLeft(1, MakeListLeft(4, MakeListLeft(2, MakeListLeft(5, EmptyList)))))

MakeListRight(5, MakeListRight(2, MakeListRight(4, MakeListRight(1, MakeListRight(3, EmptyList)))))

Selectors and Conditions

In the case of doubly linked lists, we have four selectors:

  • firstLeft(list)
  • restLeft(list)
  • firstRight(list)
  • restRight(list)

Additionally, we need a condition that returns whether a list is empty:

isEmpty(list)

This leads to automatically-true relationships such as:

isEmpty(EmptyList)
not isEmpty(MakeListLeft(x, l)) (for any x and l)
not isEmpty(MakeListRight(x, l)) (for any x and l)
firstLeft(MakeListLeft(x, l)) = x
restLeft(MakeListLeft(x, l)) = l

Binary Search

Compared with a simple linear search, binary search on a sorted array is much more scalable for large arrays, as the number of steps grows logarithmically with the size of the array, rather than linearly.

Boundary Conditions and Correctness

The text notes that the correctness of the binary search algorithm is not immediately obvious, particularly regarding the handling of boundary conditions, such as an empty array. The text encourages the reader to convince themselves of the algorithm's correctness and to explain the argument to a colleague, as this process often reveals subtle mistakes or leads to a clearer understanding of the algorithm.
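For concreteness, here is a typical array-based binary search sketch (our formulation, not necessarily identical to the text's pseudocode); the boundary conditions mentioned above show up in the left <= right test, which also handles the empty array correctly:

/* Binary search in a sorted array a[0..n-1].
   Returns the index of key, or -1 if key is not present.
   An empty array (n == 0) falls straight through to the -1 case. */
int binarySearch(const int a[], int n, int key) {
    int left = 0;
    int right = n - 1;
    while (left <= right) {
        int mid = left + (right - left) / 2;   /* avoids overflow of (left + right) */
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            left = mid + 1;    /* key, if present, lies in a[mid+1..right] */
        else
            right = mid - 1;   /* key, if present, lies in a[left..mid-1] */
    }
    return -1;
}

Each iteration discards half of the remaining segment, which is where the logarithmic growth in the number of steps comes from.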

Linked List Considerations

The text also discusses the possibility of implementing the search algorithms on linked lists instead of arrays. While a linear search can be performed on a linked list in a similar manner to an array, the binary search algorithm is not easily adaptable to linked lists, as there is no efficient way to split a linked list into two segments. The text suggests that the array-based approach is the best option with the data structures studied so far, but mentions that more complex data structures, such as trees, can enable efficient recursive search algorithms.

Sorting Overhead

The text raises the important consideration of the overhead required to sort the array before applying the binary search algorithm. It notes that until the cost of sorting is taken into account, it cannot be determined whether the binary search algorithm is more efficient overall than the linear search algorithm on the original unsorted array, especially if the search needs to be performed only a few times.

Time and Space Complexity

The text introduces the concepts of time complexity and space complexity as measures of algorithm efficiency. Time complexity refers to how the execution time of an algorithm depends on the size of the data structure, while space complexity refers to how the memory requirement depends on the size of the data structure. The text notes that there is often a trade-off between time and space complexity, where algorithms may optimize one at the expense of the other.

Worst-Case versus Average-Case Complexity

The text also discusses the distinction between worst-case and average-case complexity, and how different applications may prioritize one over the other. For time-critical applications, the worst-case performance may be more important, while for many applications, the average-case performance is the primary concern. The text suggests that algorithms often involve a trade-off between average-case and worst-case efficiency.

Concrete Measures for Performance

These days, we are mostly interested in time complexity. To measure time complexity, we cannot simply implement the algorithm and run it, as that approach has several problems:

  • If it is a big application with several potential algorithms, they would all have to be programmed first before they can be compared, wasting considerable time on writing programs that may not be used in the final product.
  • The machine on which the program is run, or the compiler used, might influence the running time.
  • Ensuring that the data used for testing is typical for the application is not feasible, particularly with big applications.
  • The empirical method will not tell you anything useful about the next time you are considering a similar problem.

Therefore, complexity is usually best measured in a different way. To avoid being bound to a particular programming language or machine architecture, it is better to measure the efficiency of the algorithm rather than its implementation. This requires the algorithm to be described in a form of pseudocode that comes close to the implementation language.

Determining Time Complexity

To determine the time complexity of an algorithm, we need to count the number of times each operation will occur, which will usually depend on the size of the problem. The size of a problem is typically expressed as an integer, which is usually the number of items that are manipulated.

The complexity of an algorithm will be given by a function that maps the number of items to the (usually approximate) number of time steps the algorithm will take when performed on that many items.

In the early days of computers, the various operations were each counted in proportion to their particular 'time cost', and added up. Nowadays, the differences in time costs have become less important, but we still need to be careful when deciding to consider all operations as being equally costly.
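For instance, for the earlier loop that prints every element of an array of size n, we might count one comparison of the index against the array size, one print, and one increment per pass, plus the final comparison that ends the loop, giving roughly C(n) = 3n + 1 steps; the exact constants depend on what we decide to count as a single operation.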

Big-O Notation for Complexity Class

Very often, we are not interested in the actual function C(n) that describes the time complexity of an algorithm in terms of the problem size n, but just its complexity class. This ignores any constant overheads and small constant factors, and just tells us about the principal growth of the complexity function with problem size, and hence something about the performance of the algorithm on large numbers of items.

Definition: A function g belongs to the complexity class O(f) if there is a number n₀ ∈ N and a constant c > 0 such that for all n ≥ n₀, we have that g(n) ≤ c * f(n). We say that the function g is 'eventually smaller' than the function c * f.

This definition makes it clear that constant factors do not change the growth class (or O-class) of a function. It also allows us to simplify terms when adding functions from different growth classes, as the sum will always be in the larger growth class.

When we say that an algorithm 'belongs to' some class O(f), we mean that it is at most as fast growing as f. For example, linear searching has linear complexity, i.e. it is in growth class O(n), which holds for both the average case and the worst case.
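As a small worked example, a cost function such as C(n) = 3n + 1 satisfies 3n + 1 ≤ 4n for all n ≥ 1, so choosing c = 4 and n₀ = 1 in the definition shows that C belongs to O(n); neither the constant factor 3 nor the additive 1 affects the complexity class.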

The issue of efficiency and complexity class, and their computation, will be a recurring feature throughout the chapters to come. Concentrating on the complexity class rather than finding exact complexity functions can often render the whole process of considering efficiency much easier.

General Specification of Trees

Tree Structure

Generally, a tree can be specified as consisting of:

  • Nodes (also called vertices or points), and
  • Edges (also called lines, or arcs in the case of directed trees),

connected with a tree-like structure.

Trees are often represented pictorially, as shown in the example in Figure 6.1.

More formally, a tree can be defined as either:

  • The empty tree, or
  • A node with a list of successor trees.

Nodes are usually labeled with a data item, such as a number or search key, referred to as the node's value.

Tree Terminology

The unique 'top level' node is known as the root. Nodes connected to a given node via a branch are called the children of that node. The node (at most one) connected to a given node on the level above is its parent. Nodes with the same parent are known as siblings. A node that is a child of a child (of a child, etc.) of another node is a descendant of that node, and the other node is an ancestor. Nodes without any children are known as leaves.

A path is a sequence of connected edges from one node to another. The depth or level of a node is the length of the path from the root to that node. The maximal length of a path in a tree is the height of the tree. The size of a tree is the number of nodes it contains.

Tree Operations

Like most data structures, trees require a set of primitive operators (constructors, selectors, and conditions) to build and manipulate them. The specific details depend on the type and purpose of the tree.

Quad-trees

Quad-tree Definition

A quad-tree is a particular type of tree in which each leaf-node is labeled by a value and each non-leaf node has exactly four children.

Formally, a quad-tree can be defined as either:

  • A root node with a value (e.g., in the range 0 to 255), or
  • A root node without a value but with four quad-tree children: lu, ll, ru, and rl.

Quad-tree Operators

  • isValue(qt): Returns true if the quad-tree qt is a single node.
  • baseQT(value): Constructs a single-node quad-tree with the given value.
  • makeQT(luqt, ruqt, llqt, rlqt): Constructs a quad-tree from four constituent quad-trees.
  • lu(qt), ru(qt), ll(qt), rl(qt): Selectors that return the corresponding sub-trees of a non-leaf quad-tree.

Quad-trees are commonly used to store grey-value pictures, with 0 representing black and 255 white. Algorithms can be developed using the quad-tree operators to perform manipulations such as rotation or computing average values.
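As an illustration of how such an algorithm might look (the struct layout and the averaging procedure are our own assumptions, not definitions from the text), here is a C sketch of the quad-tree operators together with a recursive computation of the average grey value:

#include <stdlib.h>

/* A quad-tree node is either a leaf carrying a grey value (0..255)
   or an internal node with four children: left-upper, right-upper,
   left-lower, right-lower. */
typedef struct QT {
    int isLeaf;
    int value;                       /* used only when isLeaf is true */
    struct QT *lu, *ru, *ll, *rl;    /* used only when isLeaf is false */
} QT;

int isValue(const QT *qt) { return qt->isLeaf; }

QT *baseQT(int value) {
    QT *qt = malloc(sizeof *qt);
    qt->isLeaf = 1;
    qt->value = value;
    qt->lu = qt->ru = qt->ll = qt->rl = NULL;
    return qt;
}

QT *makeQT(QT *luqt, QT *ruqt, QT *llqt, QT *rlqt) {
    QT *qt = malloc(sizeof *qt);
    qt->isLeaf = 0;
    qt->lu = luqt; qt->ru = ruqt; qt->ll = llqt; qt->rl = rlqt;
    return qt;
}

QT *lu(const QT *qt) { return qt->lu; }
QT *ru(const QT *qt) { return qt->ru; }
QT *ll(const QT *qt) { return qt->ll; }
QT *rl(const QT *qt) { return qt->rl; }

/* Average grey value: each of the four quadrants covers a quarter of
   the picture, so the average is the mean of the quadrant averages. */
double averageQT(const QT *qt) {
    if (isValue(qt))
        return qt->value;
    return (averageQT(lu(qt)) + averageQT(ru(qt)) +
            averageQT(ll(qt)) + averageQT(rl(qt))) / 4.0;
}

Since each non-leaf node's four children cover equal quarters of the picture, the average of the whole picture is simply the mean of the four quadrant averages, which is what the recursion computes.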

Binary Trees

Binary Tree Definition

A binary tree is a tree in which every node has at most two children, and can be defined inductively as:

  • The empty tree (EmptyTree), or
  • A node with a value and two binary tree children (a left subtree and a right subtree).

Determining the Size of a Binary Tree

A binary tree may not always be perfectly balanced, so we need an algorithm to determine its size, i.e., the number of nodes it contains. This can be done recursively:

The terminating case is simple: an empty tree has a size of 0. Otherwise, any binary tree is assembled from a root node, a left sub-tree l, and a right sub-tree r. The size of the tree is the sum of the sizes of its components: 1 for the root, plus the size of l, plus the size of r.

We can define the size(t) procedure, which takes a binary tree t and returns its size, as follows:

size(t) {
   if (isEmpty(t))
      return 0;
   else
      return (1 + size(left(t)) + size(right(t)));
}

This recursively processes the entire tree, and the process will terminate because the trees being processed get smaller with each call, eventually reaching an empty tree.

Implementing Binary Trees

The natural way to implement binary trees is using records and pointers, similar to how linked lists were represented. The details depend on the number of children each node can have, but trees can generally be represented as data structures consisting of a pointer to the root-node content (if any) and pointers to the children sub-trees.

A binary tree can be implemented as a data record for each node, consisting of the node value and two pointers to the children nodes. The MakeTree function creates a new data record of this form, and root, left, and right functions simply read the relevant contents of the record. The absence of a child node can be represented by a Null Pointer.
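A sketch of that representation in C (the record layout is the obvious one implied by the text; the exact field names are our choice):

#include <stdlib.h>

/* One record per node: the node value and pointers to the two children.
   A missing child, and the empty tree itself, are represented by NULL. */
typedef struct TreeNode {
    int value;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;
typedef TreeNode *Tree;

#define EmptyTree NULL

Tree MakeTree(int value, Tree leftTree, Tree rightTree) {
    Tree t = malloc(sizeof *t);
    t->value = value;
    t->left = leftTree;
    t->right = rightTree;
    return t;
}

int  isEmpty(Tree t) { return t == NULL; }
int  root(Tree t)    { return t->value; }   /* assumes t is non-empty */
Tree left(Tree t)    { return t->left; }
Tree right(Tree t)   { return t->right; }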

Recursive Algorithms

Some people have difficulties with recursion, as it may appear that the algorithm is "calling itself" and could get confused about what it is operating on. However, this is a misleading way of thinking about it.

The algorithm itself is a passive entity that cannot do anything on its own. What happens is that a processor (a machine or a person) executes the algorithm. When a recursive call is encountered, new processors are given the task with a copy of the same algorithm.

For example, in the size(t) algorithm, if the tree t is not empty, the processor can extract the left and right sub-trees l and r using the left(t) and right(t) selectors. The processor can then ask two other processors, say Steve and Mary, to execute the same algorithm on the sub-trees l and r, respectively. When Steve and Mary finish, the original processor can compute and return 1 + m + n, where m and n are the sizes of the left and right sub-trees.

In this way, the algorithm is not calling itself, but rather there are multiple processors running their own copies of the same algorithm on different trees.

It is also possible for a single processor to impersonate multiple processors by using a stack to keep track of the various positions of the same algorithm that are currently being executed. However, this knowledge is not necessary for our purposes.

Recursive Algorithms with Counters

In some cases, we may want to keep track of the number of recursions by passing integers along with the data structures being operated on. For example:

function(int n, tree t) {
   // terminating condition and return
   // procedure details
   return function(n-1, t2);
}

This allows us to do something n times or look for the nth item, etc. The classic example is the recursive factorial function:

factorial(int n) {
   if (n == 0)
      return 1;
   return n * factorial(n-1);
}

Another example is a direct implementation of the recursive definition of Fibonacci numbers:

F(int n) {
   if (n == 0)
      return 0;
   if (n == 1)
      return 1;
   return F(n-1) + F(n-2);
}

However, this Fibonacci algorithm is extremely inefficient, with a time complexity of O(2^n). There exists a straightforward iterative algorithm that has only O(n) time complexity. It is also possible to create an O(n) recursive algorithm to compute Fibonacci numbers.
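For comparison, a sketch of the straightforward O(n) iterative version (our formulation, not the text's):

/* Iterative Fibonacci: keeps only the last two values, so it runs in
   O(n) time and constant extra space. */
long fib(int n) {
    long previous = 0;   /* F(0) */
    long current = 1;    /* F(1) */
    if (n == 0)
        return 0;
    for (int i = 2; i <= n; i++) {
        long next = previous + current;
        previous = current;
        current = next;
    }
    return current;
}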

In most cases, we won't need to worry about counters, as the relevant data structure will have a natural end point condition, such as isEmpty(x), that will bring the recursion to an end.

Binary Search Trees

Binary search trees are a particular type of binary tree that provide an efficient way of storing data and performing searches. The key idea is that at each tree node, the value of that node either tells us that we have found the required item, or tells us which of its two subtrees we should search for it in.

A binary search tree is a binary tree that is either empty or satisfies the following conditions:

  • All values occurring in the left subtree are smaller than that of the root.
  • All values occurring in the right subtree are larger than that of the root.
  • The left and right subtrees are themselves binary search trees.

To search for a given key, we therefore start at the root and repeatedly compare the key with the value of the current node, moving into the left or right subtree as appropriate, until either the key is found or an empty tree is reached, in which case the key is not in the tree.

In practice, the search may return a pointer to the full record associated with the search key, rather than just a true/false result.
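Using a record-and-pointer representation like the one sketched for binary trees earlier, the search might look as follows (an illustrative sketch; it returns a pointer to the matching node, or NULL when the key is absent, in the spirit of the remark above):

/* Search a binary search tree for a key.  Returns a pointer to the node
   holding the key, or NULL if the key does not occur in the tree. */
typedef struct TreeNode {
    int value;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

TreeNode *search(TreeNode *t, int key) {
    if (t == NULL)              /* empty tree: key is not present */
        return NULL;
    if (key == t->value)
        return t;
    if (key < t->value)
        return search(t->left, key);    /* smaller values are on the left */
    return search(t->right, key);       /* larger values are on the right */
}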

Time Complexity of Insertion and Search

Both item insertion and search in a binary search tree take at most as many comparisons as the height of the tree plus one. The average height of a binary search tree is O(log₂ n), where n is the number of nodes. Therefore, the average number of comparisons needed to search a binary search tree is O(log₂ n), which is the same as the complexity of binary search in a sorted array. Inserting a new node into a binary search tree has an average time complexity of O(log₂ n), which is better than the O(n) complexity of inserting an item into a sorted array.

Deleting Nodes from a Binary Search Tree

To delete a node from a binary search tree:

  • If the node is a leaf, it is simply removed.
  • If the node has only one non-empty subtree, the remaining subtree is "moved up" to replace the node.
  • If the node has two non-empty subtrees, the "left-most" node in the right subtree (i.e. the smallest item in the right subtree) is used to overwrite the node to be deleted, and that left-most node is then removed from the right subtree.

The delete algorithm has an average time complexity of O(log₂ n), the same as the complexity of searching and inserting.

Checking if a Binary Tree is a Binary Search Tree

To check whether a given binary tree t is a binary search tree:

  • If t is empty, it is a binary search tree.
  • Otherwise, t is a binary search tree if all nodes in the left subtree are smaller than the root, all nodes in the right subtree are larger than the root, and both the left and the right subtree are themselves binary search trees.

This can be implemented recursively using the isbst(t) function.
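A possible recursive sketch of isbst that follows the description above literally (the helper names allsmaller and alllarger are ours, not the text's):

typedef struct TreeNode { int value; struct TreeNode *left, *right; } TreeNode;  /* as in the search sketch above */

/* Check whether every value in tree t is smaller than v. */
int allsmaller(TreeNode *t, int v) {
    if (t == NULL)
        return 1;
    return t->value < v && allsmaller(t->left, v) && allsmaller(t->right, v);
}

/* Check whether every value in tree t is larger than v. */
int alllarger(TreeNode *t, int v) {
    if (t == NULL)
        return 1;
    return t->value > v && alllarger(t->left, v) && alllarger(t->right, v);
}

/* isbst(t): the empty tree is a binary search tree; otherwise both
   subtrees must satisfy the ordering condition and be search trees. */
int isbst(TreeNode *t) {
    if (t == NULL)
        return 1;
    return allsmaller(t->left, t->value) &&
           alllarger(t->right, t->value) &&
           isbst(t->left) && isbst(t->right);
}

This literal version re-traverses each subtree for the ordering checks and so is not the most efficient approach; a faster variant passes lower and upper bounds down the recursion, but the version above matches the description in the text.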