Let me remind you how websites were built in the past: there was a server application that received a request from the user, processed it, rendered an HTML page (or performed the requested operation and rendered a similar page), and sent it back to the user. A simple rule applied: the more RPS you could process, the more visitors you could serve.
As the internet grew, people began to fight “high load” with the typical methods: they set up nginx as a front server, placed several backend servers (upstreams) with copies of their web application behind it, and spread the load across them, either randomly (round-robin) or with a little trick: for example, the first upstream got a 1s timeout, the second a 2s timeout, and so on. Of course, there were cleverer schemes too.
That was the typical approach more than six years ago.
Then the SPA revolution happened. Suddenly there was no “server-side rendering” at all. Most of the business logic moved into the browser (into JavaScript), and the visitor began to receive the entire HTML/CSS/JS of the application up front, as one complete package. On top of that, the whole application could be hosted on a public CDN like CloudFlare.
Now it makes no sense to ask how “fast” your backend server is or how many RPS it can process; that ceased to be a defining characteristic of a website. Nobody can DDoS CloudFlare, and if someone can, you couldn’t withstand that attack anyway.
And the second revolutionary thing was the asynchronous gap between user action and system reaction, owing to the async nature of JavaScript and websockets. A request could fly to the backend server over plain HTTP, while the answer (response) flew back later, asynchronously, over a websocket or SSE transport.
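Here is a minimal browser-side sketch of that split, assuming a hypothetical backend that accepts POST /api/actions and later pushes the result over an SSE stream at /events; the URLs and the event name are illustrative, not a real API:

```typescript
// 1. Subscribe to asynchronous responses pushed by the server over SSE.
const events = new EventSource("/events");
events.addEventListener("action-result", (e) => {
  const result = JSON.parse((e as MessageEvent).data);
  console.log("Result arrived later:", result); // update the UI here
});

// 2. Send the request over plain HTTP and return immediately; the real
//    response arrives above, whenever the backend finishes its work.
async function performAction(payload: unknown): Promise<void> {
  const res = await fetch("/api/actions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Request not accepted: ${res.status}`);
}
```

Note that `performAction` resolves as soon as the request is accepted, not when the work is done; the SSE listener is the only place that ever sees the actual result.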
These are the reasons why the long-standing position “catch as many requests as you can or die” flipped to the opposite strategy: “don’t lose a single request”. Once the user interface became asynchronous, it doesn’t really matter whether your like under a Facebook post reaches your friends’ browsers after 1 second or 1 minute. You see an instantaneous effect in both cases, because Facebook imitates it for you while the real background request (and response) travels through the network and the business logic. And this behavior suits everybody.
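A sketch of that imitation trick (often called an optimistic update): show the like immediately, send the real request in the background, and roll back on failure. The `renderLikeButton` and `sendLike` helpers are hypothetical, stubbed here so the sketch is self-contained:

```typescript
async function onLikeClicked(postId: string): Promise<void> {
  renderLikeButton(postId, { liked: true }); // instant, imitated effect

  try {
    await sendLike(postId); // the real request travels in the background
  } catch {
    renderLikeButton(postId, { liked: false }); // roll back on failure
  }
}

// Hypothetical helpers, not a real API.
function renderLikeButton(postId: string, state: { liked: boolean }): void {
  console.log(`post ${postId} rendered as liked=${state.liked}`);
}
async function sendLike(postId: string): Promise<void> {
  const res = await fetch(`/api/posts/${postId}/like`, { method: "POST" });
  if (!res.ok) throw new Error(`like failed: ${res.status}`);
}
```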
It doesn’t matter whether the request takes 0.1s or 15s or more to process, because there is only one pattern for all cases: “thanks, your request has been accepted, you will see the result in your interface as soon as possible”.
Therefore, if you run (for example) an e-commerce website and your customer presses the “Submit order” button, you can’t tell him “Sorry, my server is busy, please wait while nginx finds a suitable upstream”. You can’t give him a 503, because he will leave your website, and rightly so. You should only tell him “Thanks, request accepted”, save the request to a safe place, process it as soon as possible, and deliver the result to the customer’s web interface.
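A minimal server-side sketch of “accept first, process later”, using Node’s built-in http module and an in-memory queue; in a real setup the queue would be durable (RabbitMQ, Redis, Kafka, and so on) so that no single request is lost, and the /orders route is purely illustrative:

```typescript
import * as http from "http";

const queue: string[] = [];

http
  .createServer((req, res) => {
    if (req.method === "POST" && req.url === "/orders") {
      let body = "";
      req.on("data", (chunk) => (body += chunk));
      req.on("end", () => {
        queue.push(body); // save the request to a "safe place" first
        res.writeHead(202); // 202 Accepted: "thanks, request accepted"
        res.end("Thanks, request accepted");
      });
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(8080);

// A background worker drains the queue at its own pace; the eventual
// result would be pushed to the customer's browser via websocket/SSE.
setInterval(() => {
  const order = queue.shift();
  if (order) console.log("processing order:", order);
}, 100);
```

The point is that the HTTP handler does almost no work: it only persists the request and answers instantly, so the customer never sees a 503 no matter how slow the actual processing is.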
This is the romantic story of where queues in the web came from. If you’re still not sure, take a look at Trello, a very good web application in my opinion. Look at ticket or apartment search on the market leaders’ websites. Look at Ruby on Rails with its Action Cable (websockets out of the box). Look at Facebook. Stay aware of modern technologies.