Memoization technique
Fibonacci example
The below example calculates the 20th element of Fibonacci series. The code takes top-down and recursive approach. But it’s only a demo purpose. Bottom up and iterative approach is a lot more efficient in this case.
function getFib(n) {
if (n == 0) return 0;
if (n == 1) return 1;
return getFib(n - 1) + getFib(n - 2);
}
This algorithm returns the correct result which is 6765, however, the getFib method gets called 21891 times which is highly inefficient in terms of runtime and memory. And the number of the method call increases exponentially. In other words, this algorithm is not scalable for a large input. This is a typical issue with recursive functions.
Look at the second example. It uses a technique called “memoization” or you can simply call it “caching”
// Cache (Memoization) the result to avoid overlapping sub-problems
var cache = new Array();
cache[0] = 0;
cache[1] = 1;
function getFibWithCaching(n) {
if (cache.hasOwnProperty(n)) {
return cache[n];
}
cache[n] = getFibWithCaching(n - 1) + getFibWithCaching(n - 2);
return cache[n];
}
In this example, the number of the function call has been reduced down to 39 which is about the double size of the input. And the number of the function calls increase lineally.
Overlapping sub-problems
The first example contains lots overlapping sub-problems. So the memoization allows you to avoid re-processing the same subroutine, instead it grabs the result from the cache.
Iterative approach is much better
Lastly, below is an iterative version which is most space efficient
function getFibFast(n) {
prevprev = 0;
prev = 1;
var sum;
var temp;
for (var i = 2; i <= n; i++) {
sum = prev + prevprev;
temp = prev;
prev = sum;
prevprev = temp;
}
return sum;
}
Is it scalable?
Is it scalable?
I have been asking this question a lot recently to myself and my coworkers.
Is it scalable?, Is it going to scale?
The scalability has been one of the most important aspects in designing and implementing our system and we have spent a lot of time figuring out how to make the system scalable.
But what does it mean really?
Although we all seem to agree on the term, I wonder if we all think about the same thing when we ask this question to each other? I actually struggle to come up with one clear answer to the question. It means many things and it also covers a lot of areas.
Since I am a software engineer, most likely, I would think from software design perspectives. I would answer the scalability means the capability of growing the system without re-architecting the software. Or the software architecture that accommodates the system growth.
For example, Inproc caching or session is not scalable since http is stateless and a next request can hit any other machine in the cluster. But this example is only limited to scaling out web servers dealing with increased traffic.
How about data and network perspectives?
I think I need more general characteristics to define the scalability.
What is scalability?
In a recent experience with dealing with high traffic, I found the system works fine until the usage reaches a certain point. I know it sounds so obvious but my point is that the system that breaks with a little increased usage is not a scalable system. We didn’t noticed many issues with CPU and memory usage through a lot of load tests. So with the software architecture issues aside, whether the system consists of a single machine or a thousand machines, the first characteristic should be:
The system can accommodate increased usage
CPU, Memory, Network, Disk I/O are usage examples.
Another important characteristic should be about data. Let’s say we expect to double the amount of the data from 2 billion records to 4 billion records this year. Can the system maintain the same performance with increased amount data not to mention the capability of storing the data?. As an example, If the response time of querying the same record increases more than 4 times, I don’t think It’s a scalable system even though it managed to store the double amount of data.
The system can accommodate an increased dataset
The scalability is one of the main factors motivating the recent rise in popularity of NoSql databases.
Software architecture also needs to accomodate the scalability. For example, in Asp.Net, the application layer should be decoupled from the data layer which allows the storage layer to grow without affecting the application layer and vise versa.
Push it til it breaks
We did a lot of load tests to find out if the system is capable of handling increased loads. However, what we missed out is that we didn’t push the load tests until the system finally chokes. So we didn’t know at what point the system starts to break. Also the load test should include requests that cause the system to throw exceptions to see if the system can handle the increased amount of exceptions
Fun with bit shifting
I think bit shifting is one of those things you learn at college and never use once in your life at work. Well, it might not be true for some people, it’s been pretty much the case for me.
I came across some interesting code example where bit shifting is kind of useful as an alternative to the hash table which I
would normally use to find the intersection between two datasets. Let’s say we want to create a method that checks if the given string has duplicate characters in it.
bool HasDuplicateChars(string anyString)
{
// TODO: return true if there are duplicate chars
// otherwise false;
}
Well, since I am a C# programmer, I will go like this in a sec.
bool HasDuplicateChars(string anyString)
{
return anyString.Distinct().Count() != anyString.Length;
}
The end of the story. It exactly takes one line of code and I stop thinking from this point.
Don’t get me wrong. I am a big fan of Linq and I love the high level of abstraction it brings to C# so you can write code in 4GL style. That said, the most fun part of the programming is in the intimate details of how things work which I don’t want to miss either.
One simple approach to find a duplicate character within the string is to compare the every character against the every character in the string one by one.
bool HasDuplicateChars(string anyString)
{
char[] charArray = anyString.ToCharArray();
foreach (char c1 in charArray)
{
foreach(char c2 in charArray)
{
if (c1 == c2)
{
return true;
}
}
}
return false;
}
It’s probably the most straight forward solution and does not require extra storage but definitely not the most efficient one. The worst case senario is n * n, n square times of iterations, which can be pretty large if the string size is big (like 100 * 100 = 10,000).
Instead, we can leverage a data structure like the hash table to make it more efficient.
bool HasDuplicateChars(string anyString)
{
HashSet charHashTable = new HashSet();
foreach (char c in anyString.ToCharArray())
{
if (charHashTable.Contains(c))
{
return true;
}
else
{
charHashTable.Add(c);
}
}
return false;
}
The worst case senario here is n. So if the string length is 100, it takes 100 loops in the worst case senario. So it’s much better than the previous one. However, it requires extra memory for the HashSet object.
Note that the “Contains” method is highly efficient since it does not perform any type of searching. Instead, it uses the hash function to determine if the key exists in the table. So the lookup time is always the constant 1.
Now here is a little tricky one but maybe the most fun one, bit shifting. The code below shows how you can use bit shifting to accomplish the same task (in a hard way)
bool HasDuplicateChars(string anyString)
{
if (anyString.Length > 26)
{
return true;
}
UInt32 board = 0;
UInt32 check = 1;
for (int i = 0; i < length; i++)
{
int shiftCount = anyString[i] - 'A';
if ((board & (check << shiftCount)) > 0)
{
return return;
}
board |= (check << shiftCount);
}
return false;
}
This example assumes the input characters are all ASCII and range only between ‘A’ and ‘B’. The code above basically uses each bit of the 32 bit unsigned integer as place holders. I could just choose the signed 32 bit integer. The signed integer uses the most left bit to hold “+” or “-” so the bits that you can actually use are 31. But it does not matter in this case. I restricted the input to 26 chars which is small enough to be held in 31 bits.
Let’s say you have a byte which has 8 bits which means you have 8 place holders to store 0 or 1. So you can turn each bit on or off like a switch. As an example, let’s say the bye looks like this:
0 0 0 0 0 0 0 1
if you shift a bit to the left by 1, it will look like this
0 0 0 0 0 0 1 0
C# has bit shift operators. “«” and “»”. Using the two operators, you can shift the bit back and forth respectively.
if you have 16 bit signed integer of which value is 1 and if you shift the bit to the left 16 times, you will see the number turns to the negative all of a sudden. That puzzled me for a minute and realized that the most left bit indicates the number is minus, which was interesting to know. So if I use an unsigned 16 bit integer, I won’t have the same issue since the most left bit is just another bit.
Back to the example, if I apply the “&” operator between the first and second byte.
0 0 0 0 0 0 0 1 & 0 0 0 0 0 0 1 0 = 0 0 0 0 0 0 0 0
since there are no intersections between those two. For “|”, the result becomes
0 0 0 0 0 0 1 1
So from this, we can use this to check the duplicate chars.
anyString[i] - ‘A’ will give me the offset in the ASCII table and I can use the offset as the index of the bit place holder. As an example, if anyString[0] == ‘A’, the offset is 0 and I shift the first bit “0” times to the left which means the number remains the same. The next char is ‘B’ and the offset is 2 and I shift the bit 2 times to the left so the number will be:
0 0 0 0 0 0 1 0
This code: checker |= (matcher « val) combines the two result using “|” operator and the result becomes
0 0 0 0 0 0 1 1
So far there are no duplicates. Let’s say third char is ‘A’ again, a duplicate char. I shift 0 times and the result is:
0 0 0 0 0 0 0 1
so for this code: if ((checker & (matcher « val)) > 0) returns true since
the result is
0 0 0 0 0 0 0 1
So it’s pretty good in terms of lookup times and space usage. The big O notation of the algorithm is O(n) where the best case senario is 2 and the worst is N.
References
https://developer.mozilla.org/en/JavaScript/Reference/Operators/Bitwise_Operators
Peer pressure is much more powerful than a concept of a boss
Malcolm Gladwell in Tipping Point
Linq and Simplicity
The Linq can save a lot of lines of your code. But the beauty of it is, instead of making the code more cryptic, it makes the code more self explanatory and easier to read.
The following code checks if the user agent string contains either iphone, ipad or ipod.
if (request.UserAgent.IndexOf("iphone") > -1 || request.UserAgent.IndexOf("ipad") > -1 || request.UserAgent.IndexOf("ipod") > -1 )
{
return true;
}
return false;
The code appears to do the job but what if you have more than 3 iDevices to check?. Cramming 20 conditions in a if statement won’t look very good.
The below loops through the list of the iDevices and check if the user agent string contains each iDevice string.
foreach(string iDevice in new string[] { "iphone", "ipad", "ipod" })
{
if (request.UserAgent.IndexOf(iDevice) > -1)
{
return true;
}
}
return false;
This looks better. We can compress the above code into one line using a Linq expression as below
return new string[] { "iphone", "ipad", "ipod" }.FirstOrDefault(i => request.UserAgent.IndexOf(i) > -1) != null;
Which one would you like better?
Scalability isn’t just a challenge for Facebook or Twitter, nor is performance solely the province of Google. Those qualities are going to be at a premium for all applications going forward
Read more: http://redmonk.com/sogrady/2010/05/13/node-js/#ixzz1nSjjNol4
Korean BBQ (Taken with Instagram at Sam Won BBQ)
Mieun (Taken with instagram)
The tipping point (Taken with instagram)
1 of 15 pages