Designing a system usually starts out abstract: we have large functional blocks that need to work together, abstracted away into frontend, backend and database layers. However, when it is time to implement the system, especially as an SRE, we have no choice but to think in specific terms. Servers have a fixed amount of memory, storage capacity and processing power. So we need to set realistic expectations for our system, assess the requirements, and translate them into specific requirements for each component of the system, such as network, storage and compute. This is typically how almost all large-scale systems are built. The folks over at Google have formalized this approach to designing systems as ‘Non-Abstract Large System Design’ (NALSD). According to the Google Site Reliability Workbook,

> “Practically, NALSD combines elements of capacity planning, component isolation, and graceful system degradation that are crucial to highly available production systems.”

We will be using an approach similar to this to build our system.

## Application requirements

Let us define our application requirements in more concrete terms, i.e., as specific functions.

Our photo sharing application must let the user

- Sign up to become a member, and log in to the application
- Upload photographs, and optionally add a description and tag location and/or people
- Follow other users on the platform
- See a feed comprising photos from other users that they follow
- View their own profile page, and manage who they follow

Let us define expectations for the application’s performance for a better user experience. We also need to define the health of the system. SLIs and SLOs help us do just that.

## SLIs and SLOs

The Google SRE book defines a service level indicator (SLI) as “a carefully defined quantitative measure of some aspect of the level of service that is provided.” For our application, we can define multiple SLIs.
One indicator can be the response time for loading the feed of our photo sharing application. Picking the right set of SLIs is very important, since they help us define the health of the system as a whole using concrete data. SLIs for an application are defined by the owners of the service, in consultation with the SREs.

A service level objective (SLO) is defined as “a target value or range of values for a service level that is measured by an SLI”. An SLO is a way for us to anchor ourselves to an optimal user experience by defining SLI boundaries. If our application takes a long time to load the feed, users might not open it very often. So an example SLO can be that at least 99% of the users should see their feed loaded within 1 second.

Now that we have defined SLIs and SLOs, let us define the application’s scalability, reliability and performance characteristics in terms of specific SLI and SLO levels.

## Defining application requirements in terms of SLIs and SLOs

The following can be some of the expectations for our application:

- Once the user successfully uploads an image, it should be accessible to the user and their followers 100% of the time, barring user-elected deletion.
- At least 50,000 unique visitors should be able to visit the site at any given time and view their feed.
- 99% of the users should be able to view their feeds in less than 1 second.
- Upon uploading a new image, it should show up in the feed of the user’s followers within 15 minutes.
- Users should be able to upload potentially thousands of images (as long as they are not abusing the service).

Since our ultimate aim is to learn system design, we will arbitrarily limit the functionalities of the system. This will help us keep sight of our aim, and keep us focussed.

Having defined the functionalities and expectations for our system, let us quickly sketch an initial design.
As of now, all the functionalities reside on a single server, which exposes endpoints for each of these functions. We will attempt to build a system that satisfies our SLOs and can serve 50k concurrent users out of about a million total users. In the course of this attempt, we will discuss a string of concepts, some of which we have already seen in Phase 1 of this course.
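
To make the feed-latency SLO concrete, here is a minimal sketch of how the corresponding SLI could be computed from a batch of measured feed load times and compared against the "99% within 1 second" target. The function names (`feed_latency_sli`, `meets_slo`), the sample data, and the choice to count individual feed loads rather than distinct users are assumptions made for illustration; a real system would typically derive these numbers from its monitoring stack.

```python
from typing import Sequence

# Targets taken from the SLO stated in this section.
SLO_TARGET = 0.99          # at least 99% of feed loads...
LATENCY_THRESHOLD_S = 1.0  # ...should complete within 1 second


def feed_latency_sli(latencies_s: Sequence[float],
                     threshold_s: float = LATENCY_THRESHOLD_S) -> float:
    """Return the fraction of feed loads that finished within threshold_s.

    Assumption: each entry is the load time (in seconds) of one feed
    request; we count requests, not distinct users.
    """
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    good = sum(1 for t in latencies_s if t <= threshold_s)
    return good / len(latencies_s)


def meets_slo(latencies_s: Sequence[float],
              target: float = SLO_TARGET,
              threshold_s: float = LATENCY_THRESHOLD_S) -> bool:
    """Return True if the measured SLI is at or above the SLO target."""
    return feed_latency_sli(latencies_s, threshold_s) >= target


if __name__ == "__main__":
    # Hypothetical measurements: 985 fast loads and 15 slow ones.
    samples = [0.2] * 985 + [1.5] * 15
    sli = feed_latency_sli(samples)
    print(f"SLI: {sli:.3f}, SLO met: {meets_slo(samples)}")
```

With the hypothetical samples above, 98.5% of feed loads finish within the threshold, so the 99% objective is not met. The same pattern applies to the other expectations, for example the fraction of uploads that appear in followers' feeds within 15 minutes.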