Multikey Dictionary – A multikey index lookup collection.

When querying a large collection which consists of millions of items, you may encounter some serious performance issues. If you are using an unordered collection, In worst case scenario it will take O(n) to fetch an item. Where as is in a sorted collection, it will be search better than that of an unordered collection – with a performance complexity O(log n). Where as with a dictionary it is a constant time O(1) and offers the best deal.

Unfortunately, a dictionary can only accept one key. What if you have to query on other parameters without affecting the performance. Nested dictionaries is “an” answer.(Not always though, if you need perform comparison)

A sample declaration is as follows

   Dim dict As Dictionary(Of Integer, Dictionary(Of Long, Dictionary(Of Integer, Dictionary(Of String, Product))))

Now creating such a nested dictionary is not an easy work. For Each column, For Each order, you have tailor the dictionary and there is no “abstract” silver bullets to ease the creation of such a dictionary.
As for future convenience I just created a simple extension to IEnumerable which will convert to a “MultiKeyDictionary”.

Here is a simple walkthrough:

Dim students As New List(Of Student)
students.Add(New Student() With {.Age = 10, .Batch = 1231456, .Name = "Tony", .Year = 2012})
students.Add(New Student() With {.Age = 11, .Batch = 1231435, .Name = "James", .Year = 2013})
students.Add(New Student() With {.Age = 10, .Batch = 1231456, .Name = "Andy", .Year = 2012})

And all you need to do is.

dict = students.ToMultiKeyDictionary({Function(x) x.Age, Function(x) x.Batch, Function(x) x.Year, Function(x) x.Name, Function(x) x})

I have made this for fun and cant guarantee that this will be bug free. If you find some issues please do report.

The gist to the implementation is available at : https://gist.github.com/1845121

<ExtensionAttribute()>
 Public Function ToMultiKeyDictionary(Of TSource, TSecKey)(ByVal source As IEnumerable(Of TSource), ByVal ParamArray func() As Func(Of TSource, TSecKey)) As IDictionary
 Dim reverseCollection As New Dictionary(Of Integer, Type)
 Dim functionsCount As Integer = func.Count
 Dim dictTypeCollection As New List(Of IDictionary)
Dim instanceOfCurrentDictionaryArgument As New Object
 Dim dictionaryArgQueue As Object = Nothing
 Dim dictionaryTypes As New List(Of Type)
 For i = 0 To functionsCount - 1
 reverseCollection.Add(i, source.Select(func(func.Count() - (i + 1))).FirstOrDefault().GetType())
 Next
 Dim typeOfGenericDictionary As Type
 Dim dictionaryTypeArgs() As Type
 Dim newConstructedType As Type
 Dim currentDictionaryInstance As IDictionary
For i = 0 To reverseCollection.Count - 1
If reverseCollection(i) IsNot GetType(String) Then
instanceOfCurrentDictionaryArgument = Activator.CreateInstance(reverseCollection(i))
Else
 instanceOfCurrentDictionaryArgument = String.Empty
End If

If instanceOfCurrentDictionaryArgument IsNot Nothing AndAlso dictionaryArgQueue IsNot Nothing Then
typeOfGenericDictionary = GetType(Dictionary(Of ,))
 dictionaryTypeArgs = {instanceOfCurrentDictionaryArgument.GetType(), dictionaryArgQueue.GetType()}
 newConstructedType = typeOfGenericDictionary.MakeGenericType(dictionaryTypeArgs)
 currentDictionaryInstance = Activator.CreateInstance(newConstructedType)
 dictionaryTypes.Add(newConstructedType)
 dictionaryArgQueue = currentDictionaryInstance
 Else
 dictionaryArgQueue = instanceOfCurrentDictionaryArgument
 End If
 Next
 Dim dict As IDictionary = Activator.CreateInstance(dictionaryTypes.Last())
 'If dict IsNot Nothing Then
 Dim queueObj As Object
 For Each item In source
Dim currentItem = item

Dim itemAddedFlag As Boolean = False
 Dim currentItemInCollection As IEnumerable(Of TSource)
Dim iterDict As IDictionary = dict 'Reset dictionary
For i = 0 To func.Count - 1
 currentItemInCollection = {currentItem}
Dim currentKey As TSecKey = currentItemInCollection.Select(func(i)).FirstOrDefault()
If iterDict.Contains(currentKey) AndAlso iterDict(currentKey) IsNot Nothing Then
 iterDict = iterDict(currentKey)
 Else
queueObj = Nothing
 For k = 0 To reverseCollection.Count - 1
If queueObj Is Nothing Then
queueObj = currentItemInCollection.Select(func(functionsCount - (k + 1))).FirstOrDefault()
Else
Dim curObject As Object = currentItemInCollection.Select(func(functionsCount - (k + 1))).FirstOrDefault()
 Dim currentDictionaryFromReverse As IDictionary = Activator.CreateInstance(dictionaryTypes(k - 1))
 If iterDict.GetType() = currentDictionaryFromReverse.GetType() Then
iterDict.Add(curObject, queueObj)
 itemAddedFlag = True
 Exit For
End If
 currentDictionaryFromReverse.Add(curObject, queueObj)
 queueObj = currentDictionaryFromReverse
End If
Next
If itemAddedFlag Then
 Exit For
 End If
 End If
Next
 Next
Return dict
End Function

Resizing a Generic List

One of the main benefit of using Generic List to Array is it convenience to Add new items without exposing the re-initialization details. However, if you are a person who counts ticks, and nanoseconds and if you are obsessed about memory usage then it is important to understand what is happening the in background when a new item is added to a generic list.

When you create a new Generic List, with no capacity specified, it will first create an empty array. And when you add the first element the size of the background array is updated to 4. And when no.of items reaches the size of the background array, the array is copied to a new array with a size double to the current size. This process of creation of new array, copying the existing values causes additional pressure to Garbage Collector and to the CPU and is not an ideal approach if the size of the collection is already known.

'Sample code to initialize a generic list with capacity specified.
Dim capacity As Integer = 10000
Dim collection As New List(Of Integer)(capacity)

If you don’t know the size of the collection during the construction of the Generic List. You can set the Capacity property of the Generic List. Whenever you set this property to a higher value than of the size of the background array, the array is copied to a new array with the size specified.

'Sample code to initialize generic list and capacity specified later
Dim collection2 As New List(Of String)
collection2.Capacity = 30000

Hence, if the length of the Generic List can be determined without causing much computation efforts or memory usage, it is always better to specify the capacity of List.


Image Credits goes to : http://www.flickr.com/photos/jemsweb

Blog at WordPress.com.

Up ↑