Hashtables & Hashsets
A data structure that provides fast insertion, deletion, and lookup of key/value pairs.
Under the hood, a hash table uses a dynamic array of linked lists to efficiently store key/value pairs. When inserting a key/value pair, a hash function first maps the key, which is typically a string (or any data that can be hashed, depending on the implementation of the hash table), to an integer value and, by extension, to an index in the underlying dynamic array. Then, the value associated with the key is added to the linked list stored at that index in the dynamic array, and a reference to the key is also stored with the value
Iteration
Iteration over a key-value collection yields the keys.
To iterate over the key-value pairs, iterate over .items()
;
to iterate over values use .values()
;
the .keys()
method returns an iterator to the keys.
Check for keys
days_set = {"Mon", "Tue", "Wed"}
print("Mon" in days_set) # True
print("Sun" in days_set) # False
days_dict = {"Mon": 1, "Tue": 2, "Wed": 3}
print("Mon" in days_dict) # True
print("Sun" in days_dict) # False
Deleting items
this_dict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
this_dict.pop("model")
foo = {42,41}
foo.remove(42)
# foo.remove(42) -> **throws error**
foo.discard(41)
foo.discard(41) # -> **no error .discard()**
Examples
-
Invalid Transactions **
-
Record all transactions done at a particular time. Recording the person and the location.Example:
{time: {person: {location}, person2: {location1, location2}}, time: {person: {location}}}
['alice,20,800,mtv', 'bob,50,1200,mtv', 'bob,20,100,beijing'] { 20: {'alice': {'mtv'}, 'bob': {'beijing'}}, 50: {'bob': {'mtv'}} }
-
For each transaction, check if the amount is invalid - and add it to the invalid transactions if so.
- For each transaction, go through invalid times (+-60), check if a transaction by the same person happened in a different city - and add it to the invalid transactions if so.
from collections import defaultdict """ https://www.notion.so/paulonteri/Hashtables-Hashsets-220d9f0e409044c58ec6c2b0e7fe0ab5#cf22995975274881a28b544b0fce4716 """ class Solution(object): def invalidTransactions(self, transactions): """ - Record all transactions done at a particular time. Recording the person and the location. Example: `['alice,20,800,mtv','bob,50,1200,mtv','bob,20,100,beijing']` :\n ` { 20: {'alice': {'mtv'}, 'bob': {'beijing'}}, 50: {'bob': {'mtv'}} } ` \n `{time: {person: {location}, person2: {location1, location2}}, time: {person: {location}}}` - For each transaction, check if the amount is invalid - and add it to the invalid transactions if so. - For each transaction, go through invalid times (+-60), check if a transaction by the same person happened in a different city - and add it to the invalid transactions if so. """ invalid = [] # Record all transactions done at a particular time # including the person and the location. transaction_time = defaultdict(dict) for transaction in transactions: name, str_time, amount, city = transaction.split(",") time = int(str_time) if name not in transaction_time[time]: transaction_time[time][name] = {city, } else: transaction_time[time][name].add(city) for transaction in transactions: name, str_time, amount, city = transaction.split(",") time = int(str_time) # # check amount if int(amount) > 1000: invalid.append(transaction) continue # # check if person did transaction within 60 minutes in a different city for inv_time in range(time-60, time+61): if inv_time not in transaction_time: continue if name not in transaction_time[inv_time]: continue trans_by_name_at_time = transaction_time[inv_time][name] # check if transactions were done in a different city if city not in trans_by_name_at_time or len(trans_by_name_at_time) > 1: invalid.append(transaction) break return invalid
-
-
Longest Substring with K Distinct Characters
-
Dot Product of Two Sparse Vectors
""" Dot Product of Two Sparse Vectors: Given two sparse vectors, compute their dot product. Implement class SparseVector: SparseVector(nums) Initializes the object with the vector nums dotProduct(vec) Compute the dot product between the instance of SparseVector and vec A sparse vector is a vector that has mostly zero values, you should store the sparse vector efficiently and compute the dot product between two SparseVector. Follow up: What if only one of the vectors is sparse? Example 1: Input: nums1 = [1,0,0,2,3], nums2 = [0,3,0,4,0] Output: 8 Explanation: v1 = SparseVector(nums1) , v2 = SparseVector(nums2) v1.dotProduct(v2) = 1*0 + 0*3 + 0*0 + 2*4 + 3*0 = 8 Example 2: Input: nums1 = [0,1,0,0,0], nums2 = [0,0,0,0,2] Output: 0 Explanation: v1 = SparseVector(nums1) , v2 = SparseVector(nums2) v1.dotProduct(v2) = 0*0 + 1*0 + 0*0 + 0*0 + 0*2 = 0 Example 3: Input: nums1 = [0,1,0,0,2,0,0], nums2 = [1,0,0,0,3,0,4] Output: 6 https://leetcode.com/problems/dot-product-of-two-sparse-vectors """ class SparseVector: def __init__(self, nums): self.non_zero = {} for idx, num in enumerate(nums): if num != 0: self.non_zero[idx] = num # Return the dotProduct of two sparse vectors def dotProduct(self, vec: 'SparseVector'): total = 0 for idx in self.non_zero: if idx in vec.non_zero: total += self.non_zero[idx] * vec.non_zero[idx] return total # Your SparseVector object will be instantiated and called as such: # v1 = SparseVector(nums1) # v2 = SparseVector(nums2) # ans = v1.dotProduct(v2)
-
Random Pick Index *
""" 398. Random Pick Index Given an integer array nums with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array. Implement the Solution class: Solution(int[] nums) Initializes the object with the array nums. int pick(int target) Picks a random index i from nums where nums[i] == target. If there are multiple valid i's, then each index should have an equal probability of returning. Example 1: Input ["Solution", "pick", "pick", "pick"] [[[1, 2, 3, 3, 3]], [3], [1], [3]] Output [null, 4, 0, 2] Explanation Solution solution = new Solution([1, 2, 3, 3, 3]); solution.pick(3); // It should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning. solution.pick(1); // It should return 0. Since in the array only nums[0] is equal to 1. solution.pick(3); // It should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning. https://leetcode.com/problems/random-pick-index """ from typing import List import collections import random class Solution: def __init__(self, nums): self.store = collections.defaultdict(list) for idx, num in enumerate(nums): self.store[num].append(idx) def pick(self, target: int): indices = self.store[target] return indices[random.randint(0, len(indices)-1)] # Your Solution object will be instantiated and called as such: # obj = Solution(nums) # param_1 = obj.pick(target) # Other solution https://leetcode.com/problems/random-pick-index/discuss/88153/Python-reservoir-sampling-solution. class Solution: def __init__(self, nums: List[int]): self.nums = nums def pick(self, target: int): """ Reservoir sampling - at every valid index, try to see whether the current index can be picked """ count = 0 idx = 0 for i in range(len(self.nums)): if self.nums[i] == target: count += 1 if random.randint(0, count - 1) == 0: idx = i return idx
How to store an class in a hashtable
# it can also work without the __hash__ function lol
class Node:
def __init__(self, x):
self.val = int(x)
x = Node(1)
y = Node(2)
z = Node(2)
store = {x: 1}
store[y] = 2
store[z] = 2
print(store.keys())
# dict_keys([<__main__.Node object at 0x7fe0482a5f50>, <__main__.Node object at 0x7fe0482a5fd0>, <__main__.Node object at 0x7fe0482a8050>])
Sorting by keys/value
store = {'Zebra':2, 'Apple':3, 'Honey Badger':1 }
print(sorted(store)) # ['Apple', 'Honey Badger', 'Zebra']
print(sorted(store, reverse=True)) # ['Zebra', 'Honey Badger', 'Apple']
print(sorted(store, key=lambda x: store[x])) # ['Honey Badger', 'Zebra', 'Apple']
hashtable vs hashset vs hashmap python
HashTable == HashMap == Dictionary
Hashset == Set
You can compare hashtables
>>> x = {'a': 1, 'e': 1, 'b': 1}
>>> y = {'e': 1, 'a': 1, 'b': 1}
>>> x == y
True
Examples:
Hash table libraries
There are multiple hash table-based data structures commonly used in set
, dict
, collections.defaultdict
, collections.OrderedDict
and collections.Counter
.
The difference between set
and the other three is that is set simply stores keys, whereas the others store key-value pairs. All have the property that they do not allow for duplicate keys.
collections.defaultdict
In a dict
, accessing value associated with a key that is not present leads to a KeyError exception. However, a collections.defaultdict
returns the default value of the type that was specified when the collection was instantiated.
from collections import defaultdict
list_dict = defaultdict(list)
print(list_dict['key']) # []
list_dict['key'].append(1) # adding constant 1 to the list
print(list_dict['key']) # [1] -> list containing the constant [1]
int_dict = defaultdict(int)
print(int_dict['key2']) # 0
int_dict['key2'] += 1
print(int_dict['key2']) # 1
ice_cream = defaultdict(lambda: 'Vanilla')
ice_cream['Sarah'] = 'Chunky Monkey'
ice_cream['Abdul'] = 'Butter Pecan'
print(ice_cream['Sarah']) # Chunky Monkey
print(ice_cream['Joe']) # Vanilla
print(ice_cream.values()) # dict_values(['Chunky Monkey', 'Butter Pecan', 'Vanilla'])
# Works like normal dict
for key in list_dict:
print(key, list_dict[key])
when it comes to build hash's hash's hash kind of jobs, defaultdict
is really handy.
The behavior of defaultdict
can be easily mimicked using dict.setdefault
instead of d[key]
in every call.
In other words, the code:
from collections import defaultdict
d = defaultdict(list)
print(d['key']) # empty list []
d['key'].append(1) # adding constant 1 to the list
print(d['key']) # list containing the constant [1]
is equivalent to:
d = dict()
print(d.setdefault('key', list())) # empty list []
d.setdefault('key', list()).append(1) # adding constant 1 to the list
print(d.setdefault('key', list())) # list containing the constant [1]
-
Invalid Transactions **
-
Record all transactions done at a particular time. Recording the person and the location.Example:
{time: {person: {location}, person2: {location1, location2}}, time: {person: {location}}}
['alice,20,800,mtv', 'bob,50,1200,mtv', 'bob,20,100,beijing'] { 20: {'alice': {'mtv'}, 'bob': {'beijing'}}, 50: {'bob': {'mtv'}} }
-
For each transaction, check if the amount is invalid - and add it to the invalid transactions if so.
- For each transaction, go through invalid times (+-60), check if a transaction by the same person happened in a different city - and add it to the invalid transactions if so.
from collections import defaultdict """ https://www.notion.so/paulonteri/Hashtables-Hashsets-220d9f0e409044c58ec6c2b0e7fe0ab5#cf22995975274881a28b544b0fce4716 """ class Solution(object): def invalidTransactions(self, transactions): """ - Record all transactions done at a particular time. Recording the person and the location. Example: `['alice,20,800,mtv','bob,50,1200,mtv','bob,20,100,beijing']` :\n ` { 20: {'alice': {'mtv'}, 'bob': {'beijing'}}, 50: {'bob': {'mtv'}} } ` \n `{time: {person: {location}, person2: {location1, location2}}, time: {person: {location}}}` - For each transaction, check if the amount is invalid - and add it to the invalid transactions if so. - For each transaction, go through invalid times (+-60), check if a transaction by the same person happened in a different city - and add it to the invalid transactions if so. """ invalid = [] # Record all transactions done at a particular time # including the person and the location. transaction_time = defaultdict(dict) for transaction in transactions: name, str_time, amount, city = transaction.split(",") time = int(str_time) if name not in transaction_time[time]: transaction_time[time][name] = {city, } else: transaction_time[time][name].add(city) for transaction in transactions: name, str_time, amount, city = transaction.split(",") time = int(str_time) # # check amount if int(amount) > 1000: invalid.append(transaction) continue # # check if person did transaction within 60 minutes in a different city for inv_time in range(time-60, time+61): if inv_time not in transaction_time: continue if name not in transaction_time[inv_time]: continue trans_by_name_at_time = transaction_time[inv_time][name] # check if transactions were done in a different city if city not in trans_by_name_at_time or len(trans_by_name_at_time) > 1: invalid.append(transaction) break return invalid
-
collections.OrderedDict
An OrderedDict is a dictionary subclass that remembers the order that keys were first inserted. The only difference between dict()
and collections.OrderedDict()
is that:
OrderedDict preserves the order in which the keys are inserted. A regular dict doesn’t track the insertion order, and iterating it gives the values in an arbitrary order. By contrast, the order the items are inserted is remembered by OrderedDict.
from collections import OrderedDict
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
print(od.keys()) # odict_keys(['a', 'b', 'c', 'd'])
od.popitem(last=True)
print(od.keys()) # odict_keys(['a', 'b', 'c'])
sortedcontainers.SortedDict
set
A set is an unordered collection of items. Every set element is unique (no duplicates) and must be immutable (cannot be changed).
However, a set itself is mutable. We can add or remove items from it.
Sets can also be used to perform mathematical set operations like union, intersection, symmetric difference, etc.
A set is a collection that is both unordered and unindexed.
Simple operations
The remove()
method raises an error when the specified element doesn’t exist in the given set, however, the discard()
method doesn’t raise any error if the specified element is not present in the set and the set remains unchanged.
Examples
foo = set()
foo.add(42)
foo.remove(42)
# foo.remove(42) -> **throws error**
foo.add(41)
foo.discard(41)
foo.discard(41) # -> **no error**
foo.add(43)
foo.add(43)
foo.add(44)
print(foo) # {43, 44}
bar = {1, 2, 2, 2, 3, 3}
print(bar) # {1, 2, 3}
# empty set
print(set())
# from string
print(set('Python')) # {'n', 'P', 'o', 't', 'h', 'y'}
# from list
print(set(['a', 'e', 'i', 'o', 'u'])) # {'a', 'o', 'e', 'i', 'u'}
# from set
print(set({'a', 'e', 'i', 'o', 'u'})) # {'u', 'i', 'a', 'o', 'e'}
# from dictionary
# {'u', 'i', 'a', 'o', 'e'}
print(set({'a': 1, 'e': 2, 'i': 3, 'o': 4, 'u': 5}))
# from frozen set
frozen_set = frozenset(('a', 'e', 'i', 'o', 'u')) # {'u', 'i', 'a', 'o', 'e'}
print(set(frozen_set))
Advanced operations
Union of Sets
The union operation on two sets produces a new set containing all the distinct elements from both the sets. In the below example the element “Wed” is present in both the sets.
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Wed","Thu","Fri","Sat","Sun"])
AllDays = DaysA|DaysB
print(AllDays) # set(['Wed', 'Fri', 'Tue', 'Mon', 'Thu', 'Sat'])
Intersection of Sets
The intersection operation on two sets produces a new set containing only the common elements from both the sets. In the below example the element “Wed” is present in both the sets.
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Wed","Thu","Fri","Sat","Sun"])
AllDays = DaysA & DaysB
print(AllDays) # set(['Wed'])
Difference of Sets
The difference operation on two sets produces a new set containing only the elements from the first set and none from the second set. In the below example the element “Wed” is present in both the sets so it will not be found in the result set.
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Wed","Thu","Fri","Sat","Sun"])
AllDays = DaysA - DaysB
print(AllDays) # set(['Mon', 'Tue'])
Compare Sets
We can check if a given set is a subset or superset of another set. The result is True or False depending on the elements present in the sets.
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Mon","Tue","Wed","Thu","Fri","Sat","Sun"])
print(DaysA <= DaysB) # True
print(DaysB >= DaysA) # True
collections.Counter
A collections.Counter
is used for counting the number of occurrences of keys, with a number of setlike operations.
Missing keys will return a count of 0.
import collections
list1 = ['x', 'y', 'z', 'x', 'x', 'x', 'y', 'z']
x = collections.Counter(list1)
print(x) # Counter({'x': 4, 'y': 2, 'z': 2})
x['a'] -= 1
print(x['a']) # -1
foo = collections.Counter(a=3, b=1)
bar = collections.Counter(a=1, b=2)
# add two counters together
print(foo+bar) # Counter({'a': 4, 'b': 3})
# subract -> ignores negative
print(foo - bar) # Counter({'a': 2})
# intersection: min(foo[x], bar[x]),
print(foo & bar) # Counter({'a': 1, 'b': 1})
# union: max(foo[x], bar[x]),
print(foo | bar) # ({'a': 3, 'b': 2})
"""
>>> c
Counter({'5': 2, '3': 2, '.': 1, 'e': 1, '9': 1})
>>> c['E']
0
>>> 'E' in c
False
"""
Find the original version of this page (with additional content) on Notion here.
Created: December 13, 2021 16:05:48