
[Job Hunting] System Design


A. Usage

1. find out the main usage scenarios
2. find out the secondary usage scenarios and group them into 3 categories:
  a) need to be considered now.
  b) may happen in the future, but not now.
  c) will most likely never be needed.

B. Scale

Look at it from the aspects of the different practical resources:
  a. processing: CPU and RAM.
  b. network: not only raw bandwidth such as LAN bandwidth. Consider it from the viewpoint of the request/response processing flow and the path that flow takes. For example, does data passed between two system modules go through RAM, over an Ethernet cable, or across the Internet?
  c. storage

For each of the 3 aspects above, ask the following questions:
  1. How many requests per second/day/week/month/year on average?
  2. How many requests at peak?
  3. How long is the current system expected to remain workable?
  4. How much storage is expected for each request served?
  5. From these, derive the reads/writes per second.
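The five questions above boil down to back-of-envelope arithmetic. Below is a minimal sketch, assuming every request served needs a fixed number of bytes of storage and a fixed fraction of requests are writes; the name `estimate_scale` and all input numbers are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class ScaleEstimate:
    avg_qps: float         # average requests per second
    peak_qps: float        # requests per second at peak
    total_bytes: float     # storage needed over the expected lifetime
    reads_per_sec: float   # average reads per second
    writes_per_sec: float  # average writes per second


def estimate_scale(requests_per_day: float,
                   peak_factor: float,
                   bytes_per_request: float,
                   write_fraction: float,
                   lifetime_years: float) -> ScaleEstimate:
    """Rough answers to questions 1-5 under a simple, hypothetical model."""
    avg_qps = requests_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    # Assumption: every request served needs bytes_per_request of storage.
    total_bytes = requests_per_day * 365 * lifetime_years * bytes_per_request
    return ScaleEstimate(
        avg_qps=avg_qps,
        peak_qps=peak_qps,
        total_bytes=total_bytes,
        reads_per_sec=avg_qps * (1 - write_fraction),
        writes_per_sec=avg_qps * write_fraction,
    )


# Hypothetical inputs: 86.4M requests/day, peak = 3x average,
# 1 KB stored per request, 10% writes, 5-year lifetime.
est = estimate_scale(86_400_000, 3, 1_000, 0.1, 5)
print(f"avg {est.avg_qps:.0f} QPS, peak {est.peak_qps:.0f} QPS")
print(f"{est.total_bytes / 1e12:.1f} TB over 5 years")
print(f"{est.reads_per_sec:.0f} reads/s, {est.writes_per_sec:.0f} writes/s")
```

Even rough numbers like these (1000 average QPS, about 158 TB over 5 years) are what the later Bottlenecks step works from.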

C. Abstract Design

Come up with the core components and their connections.
For example:
1. web server: for a web service, it delivers web pages to the user, passes the user's requests to the application server, and collects the responses from the application server to build the result page for the user.
2. application layer (serves requests): the main computing resource; handles the more complicated work.
3. data storage layer (supports the application)
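The three layers listed above can be sketched as three classes that each talk only to the next layer down. This is a toy illustration, not a real server: the class names, the `/profile/<user_id>` URL scheme and the dict-backed store are all hypothetical.

```python
from __future__ import annotations


class DataStore:
    """Data storage layer: an in-memory dict standing in for a real database."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._rows.get(key)

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value


class AppServer:
    """Application layer: the business logic, built on the data store."""

    def __init__(self, store: DataStore) -> None:
        self.store = store

    def get_profile(self, user_id: str) -> dict[str, str]:
        name = self.store.get(user_id)
        return {"user_id": user_id, "name": name or "unknown"}


class WebServer:
    """Web server: accepts the user's request, forwards it, builds the page."""

    def __init__(self, app: AppServer) -> None:
        self.app = app

    def handle(self, path: str) -> str:
        # Hypothetical URL scheme: /profile/<user_id>
        user_id = path.rsplit("/", 1)[-1]
        profile = self.app.get_profile(user_id)
        return f"<h1>{profile['name']}</h1>"


store = DataStore()
store.put("42", "Alice")
web = WebServer(AppServer(store))
print(web.handle("/profile/42"))
```

Keeping the boundaries this narrow is what later allows each tier to be scaled or replaced on its own.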

D. Bottlenecks

Using the numbers from the Scale step, find out which components of the abstract design are likely to be bottlenecks.
Is it traffic, computation, storage, or searching the storage?
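One way to make this question concrete is to divide the expected demand for each resource by what a single machine can provide; the largest ratio points at the likely bottleneck. A minimal sketch, where the resource names, the per-machine capacities, and the use of disk IOPS as a proxy for "searching the storage" are all assumptions:

```python
from __future__ import annotations

import math


def find_bottleneck(demand: dict[str, float],
                    capacity_per_machine: dict[str, float]) -> tuple[str, int]:
    """Return the resource with the highest demand/capacity ratio and the
    number of machines that this resource alone would require."""
    ratios = {r: demand[r] / capacity_per_machine[r] for r in demand}
    worst = max(ratios, key=ratios.get)
    return worst, math.ceil(ratios[worst])


# Hypothetical numbers: peak demand from the Scale step vs. one machine.
demand = {"cpu_qps": 3000, "network_mbps": 400, "storage_tb": 158, "disk_iops": 3000}
capacity = {"cpu_qps": 2000, "network_mbps": 1000, "storage_tb": 10, "disk_iops": 5000}
print(find_bottleneck(demand, capacity))
```

With these made-up numbers storage dominates (about 16 machines' worth), so partitioning the data would matter more than adding CPU.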

E. Scaling

Treatments:

  • Vertical scaling
  • Horizontal scaling
  • Caching
  • Load balancing
  • Database replication
  • Database partitioning
How to think about it:
  • You have already identified the bottleneck tier. First, list the characteristics of the bottleneck.
  • For example, if the bottleneck is that the volume of data is huge, so read/write/search performance is a concern, work out the detailed characteristics first: How often is data expected to be read? How often is it written? How much data does each read/write carry?

Rules

  • Public servers of a scalable web service are hidden behind a load balancer. Every server contains exactly the same codebase and does not store any user-related data, such as sessions or profile pictures, on local disk or in memory.
    Sessions need to be stored in a centralized data store that is accessible to all your application servers. It can be an external database or an external persistent cache, such as Redis.
  • How can you make sure that a code change is deployed to all your servers, with no server still serving old code?
    => use a deployment tool such as Capistrano.
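The session rule above can be sketched in a few lines. This is a minimal sketch, assuming an in-process dict as a stand-in for the centralized store (in practice an external database or Redis) and `itertools.cycle` as a stand-in for the load balancer; `SessionStore` and `AppServer` are hypothetical names. Because no server keeps session data locally, any server can handle any request.

```python
from __future__ import annotations

import itertools
import uuid


class SessionStore:
    """Centralized session store; a plain dict stands in for Redis here."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict[str, str]] = {}

    def get(self, session_id: str) -> dict[str, str] | None:
        return self._sessions.get(session_id)

    def set(self, session_id: str, data: dict[str, str]) -> None:
        self._sessions[session_id] = data


class AppServer:
    """Stateless server: identical code, no user data kept locally."""

    def __init__(self, name: str, sessions: SessionStore) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.sessions.set(session_id, {"user": user})
        return session_id

    def whoami(self, session_id: str) -> str:
        session = self.sessions.get(session_id)
        user = session["user"] if session else "anonymous"
        return f"{self.name}: {user}"


store = SessionStore()
servers = [AppServer(f"app{i}", store) for i in range(3)]
balancer = itertools.cycle(servers)  # naive round-robin load balancer

sid = next(balancer).login("alice")  # handled by app0
print(next(balancer).whoami(sid))    # handled by app1, prints "app1: alice"
```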

DB Scaling

  • Consider switching to a NoSQL database that is easier to scale, such as MongoDB or CouchDB.
  • Joins will then need to be done in your application code.
  • cache
    • This always means in-memory caches such as Memcached or Redis. Never do file-based caching; it makes cloning and auto-scaling your servers a pain.
    • Cached Database Queries:
      • Whenever you do a query to your database, you store the result dataset in cache.
      • A hashed version of your query is the cache key.
      • The main issue is expiration. It is hard to delete a cached result when you cache a complex query. When one piece of data changes (for example, a table cell), you need to delete all cached queries that may include that table cell.
    • Cached Objects
      • Strongly recommended; I always prefer this pattern.
      • For example, a class called “Product” which has a property called “data”. It is an array containing prices, texts, pictures, and customer reviews of your product. The property “data” is filled by several methods in the class doing several database requests which are hard to cache, since many things relate to each other. Now, do the following: when your class has finished the “assembling” of the data array, directly store the data array, or better yet the complete instance of the class, in the cache!
  • sharding
    • e.g. the data for User A is stored on one server and the data for User B on another.
    • This is different from replication. Replicating data from a master server to slave servers is a traditional approach to scaling: data is written to the master and then replicated to one or more slaves. Reads can then be handled by the slaves, but all writes still happen on the master, so the master becomes the write bottleneck and a single point of failure.
    • problems
      • Rebalancing data. Let's say some user has a particularly large friends list that blows your storage capacity for the shard. You need to move the user to a different shard.
      • Joining data from multiple shards. To create a complex friends page, or a user profile page, or a thread discussion page, you usually must pull together lots of different data from many different sources.
      • How do you partition your data in shards?
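The "Cached Database Queries" pattern above can be sketched as cache-aside with a hashed query as the key. A minimal sketch, assuming a dict with a TTL as a stand-in for Memcached/Redis and an in-memory SQLite table; `QueryCache` and the `products` table are hypothetical.

```python
import hashlib
import sqlite3
import time


class QueryCache:
    """Cache-aside for query results; a dict stands in for Memcached/Redis."""

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: float = 60.0) -> None:
        self.conn = conn
        self.ttl = ttl_seconds
        self._cache: dict = {}  # key -> (expires_at, rows)
        self.db_hits = 0        # how many queries actually reached the database

    @staticmethod
    def key_for(sql: str, params: tuple) -> str:
        # A hashed version of the query (plus its parameters) is the cache key.
        return hashlib.sha256(repr((sql, params)).encode()).hexdigest()

    def query(self, sql: str, params: tuple = ()) -> list:
        key = self.key_for(sql, params)
        entry = self._cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                      # cache hit
        self.db_hits += 1                        # cache miss: ask the database
        rows = self.conn.execute(sql, params).fetchall()
        self._cache[key] = (time.monotonic() + self.ttl, rows)
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 9.99)")
query_cache = QueryCache(conn)

sql = "SELECT price FROM products WHERE id = ?"
query_cache.query(sql, (1,))
query_cache.query(sql, (1,))                     # served from the cache
conn.execute("UPDATE products SET price = 12.5 WHERE id = 1")
print(query_cache.query(sql, (1,)), query_cache.db_hits)  # stale [(9.99,)] 1
```

The stale result after the UPDATE is exactly the expiration problem described in the notes: one cell changed, but nothing tells the cache which hashed query keys contained it.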
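The "Cached Objects" pattern above, with the `Product` class and its `data` property, can be sketched as follows. Assumptions: `FakeDB` is a hypothetical stand-in that counts database round-trips, a dict stands in for Memcached/Redis, and the assembled `data` array is serialized as JSON (caching the complete instance would instead mean serializing the whole object, e.g. with `pickle`).

```python
from __future__ import annotations

import json


class FakeDB:
    """Hypothetical stand-in for the database; counts round-trips."""

    def __init__(self) -> None:
        self.calls = 0
        self.tables = {
            "prices": {1: 9.99},
            "texts": {1: "Blue mug"},
            "reviews": {1: ["Great!", "A bit small"]},
        }

    def fetch(self, table: str, product_id: int):
        self.calls += 1
        return self.tables[table][product_id]


class Product:
    def __init__(self, product_id: int) -> None:
        self.product_id = product_id
        self.data: dict = {}

    def assemble(self, db: FakeDB) -> Product:
        # "data" is filled by several separate database requests.
        self.data["price"] = db.fetch("prices", self.product_id)
        self.data["text"] = db.fetch("texts", self.product_id)
        self.data["reviews"] = db.fetch("reviews", self.product_id)
        return self


object_cache: dict[str, str] = {}  # stand-in for Memcached/Redis


def get_product(product_id: int, db: FakeDB) -> Product:
    key = f"product:{product_id}"
    cached = object_cache.get(key)
    if cached is not None:
        product = Product(product_id)
        product.data = json.loads(cached)  # rebuild from the cached data array
        return product
    product = Product(product_id).assemble(db)
    object_cache[key] = json.dumps(product.data)  # cache once assembly is done
    return product


def invalidate_product(product_id: int) -> None:
    # When anything about the product changes, drop just this one key.
    object_cache.pop(f"product:{product_id}", None)


db = FakeDB()
get_product(1, db)
get_product(1, db)
print(db.calls)  # 3: only the first call assembled the object
```

Compared with cached queries, invalidation is a single key delete per object rather than a hunt for every query that might include the changed data.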
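For the sharding question "How do you partition your data in shards?", one common answer is to hash the user ID and take it modulo the number of shards, so all of one user's data lands on one server. This is a sketch of that one scheme (range-based, directory-based, and consistent hashing are alternatives not shown); `shard_for` is a hypothetical name. It also illustrates the rebalancing problem: with plain modulo, changing the shard count moves most users.

```python
import hashlib
from collections import Counter


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user to a shard. SHA-256 is used as a stable hash because
    Python's built-in hash() of a str changes between processes."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


users = [f"user{i}" for i in range(1000)]
print(Counter(shard_for(u, 4) for u in users))  # roughly 250 users per shard

# Rebalancing problem: adding a 5th shard changes the shard of most users.
moved = sum(shard_for(u, 4) != shard_for(u, 5) for u in users)
print(f"{moved} of {len(users)} users must move")  # about 80% with plain modulo
```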

Computing/processing asynchronism

  • doing the time-consuming work in advance and serving the finished work with a low request time
  • Very often this paradigm is used to turn dynamic content into static content. Pages of a website, perhaps built with a massive framework or CMS, are pre-rendered and stored locally as static HTML files on every change.
  • RabbitMQ is one of many systems which help to implement async processing. You could also use ActiveMQ or a simple Redis list. The basic idea is to have a queue of tasks or jobs that a worker can process.
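The queue-and-worker idea above can be sketched with Python's standard `queue.Queue` and a thread standing in for RabbitMQ/ActiveMQ/a Redis list and a separate worker process. The function names and the trivial `render` body are hypothetical; the point is that the request path only enqueues, while the worker pre-renders pages to static HTML in the background.

```python
from __future__ import annotations

import queue
import threading

jobs: queue.Queue[str | None] = queue.Queue()  # stand-in for RabbitMQ / a Redis list
static_pages: dict[str, str] = {}              # stand-in for pre-rendered HTML files


def render(page: str) -> str:
    # Hypothetical placeholder for slow work, e.g. running a CMS template.
    return f"<html><body>{page}</body></html>"


def worker() -> None:
    while True:
        page = jobs.get()
        if page is None:  # sentinel: no more jobs
            jobs.task_done()
            return
        static_pages[page] = render(page)
        jobs.task_done()


def on_content_changed(page: str) -> None:
    # The request handler only enqueues the job and returns immediately.
    jobs.put(page)


t = threading.Thread(target=worker)
t.start()
for page in ["home", "about"]:
    on_content_changed(page)
jobs.put(None)
t.join()
print(static_pages["home"])
```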

Peak

  • Add a load balancer in front of a cluster of machines to handle peak traffic and ensure availability. The extra machines do not need to be running during normal traffic.
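Bringing machines up only for peaks needs some rule for how many to run. Below is a minimal sketch of one such rule; the function name, the 20% headroom, the per-machine capacity of 500 QPS, and the floor of 2 machines (so one failure does not take the service down) are all assumptions.

```python
import math


def machines_needed(current_qps: float, qps_per_machine: float,
                    min_machines: int = 2, headroom: float = 0.2) -> int:
    """Hypothetical scaling rule for the cluster behind the load balancer.

    headroom: spare capacity kept on top of current traffic (20% by default).
    min_machines: floor kept even at low traffic, for availability.
    """
    needed = math.ceil(current_qps * (1 + headroom) / qps_per_machine)
    return max(min_machines, needed)


for qps in (100, 1000, 3000):  # quiet, normal, peak (hypothetical)
    print(qps, machines_needed(qps, qps_per_machine=500))
```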

Examples

Reference



