If you try to show a UIActionSheet from a child of a UITabBarController, your "cancel" button will only partially work. This is due to the tab bar's implementation. You can easily avoid it by giving the window to the action sheet's showInView: method:
[actionSheet showInView:[self.view window]];
Thursday, January 27, 2011
Sunday, January 23, 2011
Lessons from Giant-Scale Services by Eric A. Brewer
This is an "experience" paper from UC Berkeley Professor Eric A. Brewer on giant-scale web services. Nearly 10 years have passed since it was published (2001). Its main contribution is a metric called DQ that addresses the challenges of giant-scale web services such as high availability, evolution, and growth. The paper consists of three parts: the "Basic Model" of giant-scale services, the "High Availability" of these services, and finally their "Online Evolution and Growth".
The paper focuses on "Internet-based systems," and the discussion is limited to single-site, single-owner, well-connected clusters, which may be part of a larger service. Most service issues related to network partitioning, multiple administrative domains, etc. are not covered; the paper specifically focuses on the basic building block of giant-scale web services. This section starts with a set of advantages of this basic block: "access anywhere, anytime"; "availability via multiple devices," including smartphones, tablets, etc.; "groupware support," meaning the possibility of exploiting group-based applications; "lower overall cost" in the sense of utilization; and finally "simplified service updates".
Basic Model
After introducing the advantages of giant-scale services, the paper explains the components of the system: clients, load manager, servers, persistent data store, and backplane. The basic duty of the load manager is to balance load and hide faults from the external world. The original load-management approach is round-robin DNS, in which load is distributed among different nodes in round-robin fashion. Its main disadvantage is that it doesn't hide inactive servers. However, as the author explains, most services now include "layer-4" switches. These switches understand TCP and port numbers and can decide whether a node is down. The author also examines two other load-management approaches: the first uses custom "front-end" nodes that act as service-specific layer-7 routers by tracking session information; the other is to use smart clients.
High Availability
The backbone and main aim of giant-scale web services is "high availability," very close to 100% of the time. I can't imagine Facebook being down for an hour a week; that would be a disaster for them, since users do not like services that disappear for a while, and the economic impact would be huge. Hence high availability is the major requirement of giant-scale services. To evaluate such systems we need metrics, and the traditional metric for availability is uptime, which can be defined as follows:
uptime = (MTBF-MTTR)/MTBF
Hence, uptime is the fraction of time the service is up. Although this is the traditional approach, one can easily see that it is not a good availability metric: the service may be down at a time when no one is using it, in which case there is no real impact. On the other hand, if the downtime coincides with the system's peak usage, it can cause a disaster. Therefore the author suggests two more metrics:
yield = queries completed/queries offered
harvest = data available/complete data
These two metrics capture availability in a more meaningful way; a perfect system would have 100% yield and 100% harvest all the time. However, this is unrealistic given current technology and the demand on such giant-scale services. At this point the author introduces the DQ principle, which is basically:
data per query x queries per second -> constant
The intuition comes from the fact that a system's overall capacity tends to have a particular physical bottleneck, such as total I/O bandwidth. This is a valid assumption, and giant-scale web services are mostly network-bound rather than disk-bound. DQ is the main contribution of this paper, and, as the formula states, it is measurable and tunable. It scales roughly linearly: up as new nodes are added, down as nodes fail. Hence DQ is very valuable for predicting future traffic and for planning hardware and software improvements. One important caveat is that these measurements are for data-intensive sites, so it is not suitable to apply these principles to computation-bound sites, for which yield and harvest would probably be defined differently.
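To make the metrics concrete, here is a minimal sketch (with hypothetical numbers of my own, not from the paper) of uptime, yield, harvest, and DQ:

```python
# The paper's availability metrics as tiny functions. All inputs are
# illustrative; the paper gives definitions, not these numbers.

def uptime(mtbf, mttr):
    """Fraction of time the service is up: (MTBF - MTTR) / MTBF."""
    return (mtbf - mttr) / mtbf

def query_yield(completed, offered):
    """yield = queries completed / queries offered."""
    return completed / offered

def harvest(available, total):
    """harvest = data available / complete data."""
    return available / total

def dq(data_per_query, queries_per_second):
    """DQ = data per query x queries per second (bounded by a constant)."""
    return data_per_query * queries_per_second

# A service that fails once a month (720 hours) and takes 1 hour to repair:
print(uptime(720, 1))  # roughly 0.9986

# Under the DQ principle the product stays constant: if the data moved per
# query doubles, the sustainable query rate halves.
DQ_LIMIT = 1_000_000   # hypothetical capacity, e.g. bytes/second
assert dq(1000, 1000) == DQ_LIMIT
assert dq(2000, 500) == DQ_LIMIT
```

The point of writing them out is that yield and harvest are directly measurable per time window, which is what makes DQ usable for capacity planning.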
Replication vs. Partitioning
Replication is a traditional technique for increasing availability, and this part of the paper compares replication and partitioning under faults with respect to DQ, yield, and harvest. The example in the paper is a two-node cluster in which one node is down due to a fault. For the replicated system, data availability is unchanged because the data is replicated, so harvest is unaffected; however, yield drops by 50%, because all queries are now directed to one node instead of two. For the partitioned system, half of the data is now unavailable, so harvest drops by 50%, while yield is unaffected. As a result, the DQ change is the same in both cases: down by 50%. Note that the real bottleneck is the DQ value, not the replicated data; even with replication, under a fault the surviving nodes carry a higher load than before, which affects the system. Assuming there is enough excess capacity to redirect queries (the load redirection problem) is not realistic under the heavy load of giant-scale services.
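The two-node example generalizes to n nodes; a small sketch (function names mine, not the paper's) of losing nodes under each scheme:

```python
# Losing n_failed of n_nodes: replication preserves harvest and loses
# yield; partitioning preserves yield and loses harvest. Either way the
# available DQ (harvest x yield here) shrinks by the same fraction.

def replicated_loss(n_nodes, n_failed):
    """Full replication: every surviving node holds all the data."""
    alive = n_nodes - n_failed
    harvest = 1.0
    yld = alive / n_nodes   # survivors can only absorb their share of DQ
    return harvest, yld

def partitioned_loss(n_nodes, n_failed):
    """Partitioning: lost partitions are simply missing from results."""
    alive = n_nodes - n_failed
    harvest = alive / n_nodes
    yld = 1.0               # remaining queries still complete
    return harvest, yld

# The paper's two-node, one-fault case:
print(replicated_loss(2, 1))   # (1.0, 0.5)
print(partitioned_loss(2, 1))  # (0.5, 1.0)
```

In both cases the product drops to 0.5, which is exactly the paper's point: replication does not buy you DQ, it only lets you choose where to take the loss.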
Another alternative presented in the paper is to replicate only key data, so that if the main node fails you can use the replica. The last approach presented is random partitioning, in which data is partitioned using a hash function. In this way, the worst and best cases are smoothed out and we obtain average-case losses.
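Random partitioning is easy to sketch; the following (an illustration of the general technique, not code from the paper) shows how a hash spreads keys evenly, so losing any one node loses roughly an average-case 1/n slice of the data rather than a worst-case hot partition:

```python
# Map each key to a node via a stable hash. With a decent hash, all
# partitions end up roughly the same size.
import hashlib

def partition_for(key, n_nodes):
    """Stable key -> node mapping (illustrative; md5 chosen arbitrarily)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_nodes

counts = [0] * 4
for i in range(10_000):
    counts[partition_for(f"user-{i}", 4)] += 1

# Each of the 4 partitions holds roughly 2500 of the 10,000 keys.
print(counts)
```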
Graceful Degradation
Graceful degradation is the process of effectively managing saturation by controlling yield, harvest, and DQ. It can be achieved either through admission control (AC), which reduces Q, or through dynamic database reduction, which reduces D. The paper also describes more sophisticated techniques for graceful degradation, such as cost-based AC, which admits queries based on their DQ cost: at the cost of denying service to expensive queries, we can serve more cheap ones, which increases Q. Another example is priority- or value-based AC, where requests are treated differently according to their priority. Finally, reducing data freshness by increasing expiration times increases yield but reduces harvest (due to stale cached data).
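Cost-based AC can be sketched in a few lines; this is my own toy illustration (query IDs, costs, and the greedy cheapest-first policy are assumptions, not the paper's algorithm):

```python
# Given a DQ budget during saturation, admit cheap queries first so more
# queries complete (higher yield) at the expense of the most expensive ones.

def admit(queries, dq_budget):
    """queries: list of (query_id, dq_cost). Returns ids admitted in budget."""
    admitted, spent = [], 0
    for qid, cost in sorted(queries, key=lambda q: q[1]):  # cheapest first
        if spent + cost <= dq_budget:
            admitted.append(qid)
            spent += cost
    return admitted

offered = [("a", 50), ("b", 10), ("c", 10), ("d", 40), ("e", 10)]
done = admit(offered, dq_budget=70)
print(len(done) / len(offered))  # yield: 4 of 5 queries complete
```

Without cost awareness, admitting in arrival order would spend 50 of the 70 budget on query "a" alone and complete fewer requests overall.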
Online Evolution and Growth
Updates are an inevitable fact of giant-scale web services. Although the traditional approach dictates minimal changes to a running system, giant-scale services regularly need changes for upgrades, maintenance, etc. The paper states three main approaches for online evolution:
Fast Reboot: simply reboot all nodes into the new version at once. This guarantees some downtime, and yield suffers accordingly. One can reduce the impact by rebooting at a convenient time when few people are using the system.
Rolling Upgrade: maybe the most convenient. Nodes are updated one by one in a rolling fashion. Assuming there is enough capacity, this causes no reduction in yield or, if the data is replicated, in harvest.
Big Flip: the most complicated. The first half of the nodes is updated while the layer-4 switches direct traffic to the other half; then the halves swap and the second half is updated. In this scenario we have a 50% reduction in DQ (see the two-node cluster example above).
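The capacity trade-off among the three strategies can be summarized in a toy function (a simplification of mine; it ignores the duration of each window, which also differs between strategies):

```python
# Fraction of full DQ capacity available *during* each upgrade strategy,
# for a cluster of n nodes.

def capacity_during(strategy, n):
    if strategy == "fast_reboot":
        return 0.0           # all nodes reboot at once: total outage
    if strategy == "rolling_upgrade":
        return (n - 1) / n   # only one node is out at any moment
    if strategy == "big_flip":
        return 0.5           # half the cluster is out at a time
    raise ValueError(strategy)

for s in ("fast_reboot", "rolling_upgrade", "big_flip"):
    print(s, capacity_during(s, 10))
```

The rolling upgrade keeps the most capacity online, which matches the paper's observation that the total DQ loss is what you should budget for, however you choose to spread it out.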
In conclusion, here are verbatim copies of the summary points from the paper:
- Get the basics right. Start with a professional data center and layer-7 switches, and use symmetry to simplify analysis and management.
- Decide on your availability metrics. Everyone should agree on the goals and how to measure them daily. Remember that harvest and yield are more useful than just uptime.
- Focus on MTTR at least as much as MTBF. Repair time is easier to affect for an evolving system and has just as much impact.
- Understand load redirection during faults. Data replication is insufficient for preserving uptime under faults; you also need excess DQ.
- Graceful degradation is a critical part of a high-availability strategy. Intelligent admission control and dynamic database reduction are the key tools for implementing the strategy.
- Use DQ analysis on all upgrades. Evaluate all proposed upgrades ahead of time, and do capacity planning.
- Automate upgrades as much as possible. Develop a mostly automatic upgrade method, such as rolling upgrades. Using a staging area will reduce downtime, but be sure to have a fast, simple way to revert to the old version.
- Is the assumption that queries outnumber writes or updates valid for a web site like youtube.com or an application like Picasa by Google? In other words, is it safe to consider only the query side when designing the giant-scale infrastructure for such sites?
Yes, it is valid, but these kinds of sites probably have heavier write/update traffic than the sites mentioned in the paper, so the paper's discussion may not hold for them. For example, considering the DQ value for write/update-heavy sites, replicated sites will have a higher DQ value than partitioned ones.
- If uploading or updates comprise a significant part of some giant-scale application or web site, are the metrics yield, harvest, and DQ enough to capture the design issues, or do additional metrics need to be introduced?
- DQ Principle (p. 6): How do we conclude here that scaling is linear for any given system?
- Rolling Upgrades (p. 9): How are restart delays related to interdependent services accounted for? This should lead to more downtime in a rolling upgrade compared to an ideal upgrade.
- Have any new technologies, techniques, or approaches been developed since this article was written to increase MTBF in a reasonable amount of time? Note that the author describes uptime as (MTBF-MTTR)/MTBF and claims it is easier to reduce the time it takes to fix failures than to reduce the frequency of failures; it appears more effort goes into reducing MTTR than into increasing MTBF (and rightly so). Also, we know that data replication is insufficient for preserving uptime under faults, since this technique reduces yield in terms of availability. Have advancements been made that use data replication to preserve uptime with little effect on yield?
- How does CAP theorem relate to harvest & yield?
- How does a node failure affect DQ limit? How does replication affect DQ?
- Even with replication, can't we reduce harvest, and keep the same yield?
- The DQ principle is the key factor in this paper, but as I understand it, DQ is only relevant to the hardware side. For example: "behind this principle is that the system's overall capacity tends to have a particular physical bottleneck. The DQ value is the total amount of data that has to be moved per second on average, and it is thus bounded by the underlying physical limitation." In the paper's basic model, giant-scale services include a load manager and servers, which obviously involve software. So I don't understand why the whole paper treats the DQ principle as the most significant factor, given that replication vs. partitioning, graceful degradation, disaster tolerance, and online evolution and growth are all evaluated by DQ. Software such as the load manager seems to contribute nothing to the performance of giant-scale services under this metric.
- The paper mentions: "the small test cluster is a good predictor for DQ changes on the production system since DQ normally scales linearly with the number of nodes; it is easy to measure the DQ impact of faults given a metric and a load generator." How can we deduce this conclusion?
Tuesday, January 18, 2011
How to add custom UITableViewCell
Well, my first impression of iPhone development is that there is not as much documentation available for common problems as we have for Java or C. Therefore, I decided to write up some of the basic things I was looking for while developing an app.
This is something you need in almost every iPhone application you build: in one way or another, you end up writing custom table view cell classes. For example, if you want a table view with an image on the left and text on the right, you need to subclass UITableViewCell. The good news is that it is not that difficult. The custom cell is returned from the table view data source method:
- (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath {
Here are the steps:
- First, create a new View XIB file in Interface Builder and name it X.
- Then remove the default view and add a UITableViewCell to your UI.
- Now click on your UITableViewCell element and change its class to X in Interface Builder.
- Now double-click your custom table view cell and add labels, image views, etc. as you need.
- Now wire up all the elements by defining them in your class and connecting them as outlets in Interface Builder.
- You are ready to go. In your table view, you just need to replace the standard UITableViewCell object with your custom table view cell X using the following code:
static NSString *CellIdentifier = @"X";
X *cell = (X *)[tableView dequeueReusableCellWithIdentifier:CellIdentifier];
if (cell == nil) {
    // No reusable cell available: load one from the X.xib created above
    NSArray *topLevelObjects = [[NSBundle mainBundle] loadNibNamed:@"X" owner:self options:nil];
    for (id currentObject in topLevelObjects) {
        if ([currentObject isKindOfClass:[X class]]) {
            cell = (X *)currentObject;
            break;
        }
    }
}
// configure your cell here
return cell;
Monday, January 17, 2011
Crash-Only Software
This paper is from USENIX HotOS IX, 2003. It explores the possibility of designing and implementing crash-only software, which can be defined as software that can safely crash and recover. Since it crashes safely, there is no need to ever shut it down. The paper explores this in the domain of Internet applications.
Crash-only design is depicted as a generalization of the transactional model we already have in database systems. The application is divided into crash-only components, and similar components can be grouped into bigger crash-only components (recursively).
All important non-volatile state is managed by dedicated state stores, and these must themselves be crash-safe; otherwise the system would crash unsafely just one step later. For this purpose, crash-safe state stores such as databases are chosen. The paper also describes how inter-component communication is done, exemplified by a timeout mechanism, and finally explains the restart/recovery mechanism.
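The timeout idea can be sketched as follows; this is my own toy illustration of the general pattern (the class and function names, and the retry policy, are assumptions, not the paper's design):

```python
# Crash-only calling convention: if a component gives no reply, the caller
# assumes it crashed and retries after it has been "recovered" (restarted).

class CrashOnlyComponent:
    """A component whose only stop is a crash and whose only start is recovery."""
    def __init__(self):
        self.alive = True
    def crash(self):
        self.alive = False    # abrupt stop: there is no clean-shutdown path
    def recover(self):
        self.alive = True     # recovery is the same code path as startup
    def handle(self, request):
        if not self.alive:
            return None       # models a dropped request (no response)
        return f"ok:{request}"

def call_with_retry(component, request, retries=3):
    """Treat a missing reply as a crash; retry after recovery."""
    for _ in range(retries):
        reply = component.handle(request)
        if reply is not None:
            return reply
        component.recover()   # stand-in for the platform restarting it
    raise TimeoutError("component did not recover in time")

c = CrashOnlyComponent()
c.crash()
print(call_with_retry(c, "ping"))  # recovers once, then answers "ok:ping"
```

The point is that callers never distinguish "crashed" from "slow": both are handled by the same timeout-and-retry path, which is what makes recovery the normal case rather than the exceptional one.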
References
Hints for Computer System Design by Butler W. Lampson
This is a paper from 1983, published at ACM SOSP '83. It can be regarded as a blueprint paper for computer system design, with lots of hints and proposals on how to build and improve the design of typical computer systems. Although the author suggests taking it "in small doses at bedtime," I suspect it should be read at once and kept as a reference publication by anyone who wants to design a computer system.
The paper rests on three important features: functionality, speed, and fault tolerance. Each feature is examined further under three topics: completeness, interface, and implementation. The sections are decorated with beautiful quotes from different authors, mostly from computer science. Here is one: "Algol 60 was not only an improvement on its predecessors but also on nearly all its successors" (C. Hoare), who is, I suppose, the inventor of Quicksort.
There are many points in the paper worth stating, but since it is verbose and needs no explanation in most parts, I suggest reading the paper from the link I provided below. Still, some points should be noted: for example, a designer should keep in mind that neither abstraction nor simplicity is a substitute for "getting it right" (functionality). Another important point is the constant tension between the desire to improve a design and the need for stability. I have often encountered this tension during my own designs, and it can also be considered together with fault tolerance. One should keep the basic, essential parts stable while constantly trying to improve the other parts.
References: