Networking
This assumes you know only that "computers talk over a network," and takes you to where you can trace a packet through every layer and debug any connectivity problem out loud. Every idea is built from the one before it. Grounded in Stevens' TCP/IP Illustrated and the Google SRE book. Read it top to bottom; the labs and the Interview Gauntlet at the end will then feel like review.
1 · What a network actually is
Strip away the jargon. A network is just two or more machines connected so they can send each other bytes. Two laptops joined by a cable is a network. The internet is the same idea scaled to billions of machines and a web of cables, radios, and fibre between them.
The moment you try to actually do this, hard problems appear, and every piece of networking technology exists to solve one of them:
142.250.72.14. How do we use google.com instead? → DNS.
The genius move that makes all of this manageable is layering: rather than one gigantic system that solves everything at once, networking is built as a stack of layers, each solving one problem and relying on the layer below. The local-wire layer just moves bytes between two directly-connected machines; the layer above it handles getting across many wires to a distant machine; the layer above that makes the delivery reliable; and so on. Each layer treats the one below as a dumb pipe and offers a cleaner pipe to the one above. This is the single most important mental model in networking and, as you'll see, it turns debugging into a mechanical process: unwrap the layers in order until one of them is broken.
2 · The layered model and encapsulation
The layers have standard names and numbers. You don't need the academic 7-layer OSI model memorised, but you must know these four cold, because every tool and every bug lives at a specific layer:
Now the mechanism that makes layers work: encapsulation. When you send data, each layer wraps the data from the layer above in its own header — like putting a letter in an envelope, that envelope in a bigger envelope, and so on. Each header holds exactly what that layer needs to do its job (the L3 header holds source/destination IP; the L4 header holds source/destination port). On the receiving machine, each layer unwraps its envelope, reads its header, and hands the contents up. Watch a request get wrapped on the way out and unwrapped on the way in:
Your browser produces an L7 message: GET / HTTP/1.1. That's the payload everything below will carry. The lower layers neither know nor care that it's HTTP — to them it's just bytes to deliver.
L4 (TCP) wraps it, adding a header with source and destination ports (your random 52012 → the server's 443). Ports are how the receiving machine will know this belongs to the web server, not to SSH. TCP also adds sequence numbers so the bytes can be reassembled in order.
L3 (IP) wraps that, adding a header with source and destination IP addresses. This is the envelope routers read to move the packet across the internet — they look only at the destination IP and forward toward it, ignoring everything inside.
L2 (Ethernet) wraps it one more time with source and destination MAC addresses — but only for the next hop on the local wire (usually your router). This frame is what actually goes out on the physical link as electrical or radio signals. The full nesting: [Eth [IP [TCP [HTTP]]]].
At the destination, the process runs in reverse: L2 strips the Ethernet header (this frame reached me), L3 checks the IP (this packet is for me), L4 reads the port and hands the bytes to the right program, and the web server finally sees the original GET /. Debugging is just doing this unwrapping in your head and finding the layer where it stops.
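To make the envelope picture concrete, here is a minimal Python sketch of wrapping and unwrapping. The header contents are made-up text labels and the 2-byte length prefix is an assumption chosen for illustration; real Ethernet, IP, and TCP headers are binary structures with fixed layouts. The helper names wrap and unwrap are hypothetical.

```python
import struct

def wrap(header: bytes, payload: bytes) -> bytes:
    """Prefix the payload with a 2-byte header length and the header itself."""
    return struct.pack("!H", len(header)) + header + payload

def unwrap(data: bytes) -> tuple[bytes, bytes]:
    """Split one layer off: return (header, remaining payload)."""
    (hlen,) = struct.unpack("!H", data[:2])
    return data[2:2 + hlen], data[2 + hlen:]

# Outbound: each layer wraps what the layer above handed it.
http = b"GET / HTTP/1.1\r\n\r\n"
segment = wrap(b"TCP sport=52012 dport=443", http)
packet = wrap(b"IP src=192.168.1.42 dst=93.184.216.34", segment)
frame = wrap(b"ETH src=a4:83:e7:2b:19:0c dst=<gateway MAC>", packet)

# Inbound: peel the envelopes off in the opposite order.
data = frame
headers = []
for _ in range(3):
    header, data = unwrap(data)
    headers.append(header.split()[0].decode())

print(headers)       # ['ETH', 'IP', 'TCP']
print(data == http)  # True
```

The lower layers never look inside the payload they carry, which is why the loop above can strip three headers without knowing anything about HTTP.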
3 · L2: the local wire, MAC addresses, and ARP
Start at the bottom, on your local network — the machines that share a wire or Wi-Fi with you (your other devices, your router). At this layer, machines are identified not by IP but by a MAC address: a 48-bit hardware ID burned into every network card, written like a4:83:e7:2b:19:0c. Where an IP address is logical (assigned, changeable), a MAC is physical (tied to the card). L2 delivers frames from one MAC to another on the same local network.
The device that connects everyone on the local network is a switch. It learns which MAC address is on which of its physical ports and forwards each frame only to the port where its destination lives (flooding to all ports only when it doesn't yet know). All the machines a switch connects form one broadcast domain — they can reach each other directly at L2, without a router. (This is exactly the Linux bridge you'll build by hand later — a software switch.)
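The learn-then-forward behaviour can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (no broadcast address handling, no table ageing, no VLANs); the class name LearningSwitch and the MAC addresses are hypothetical.

```python
class LearningSwitch:
    """Toy model of a switch's MAC table; not a real switch implementation."""

    def __init__(self, num_ports: int) -> None:
        self.ports = list(range(num_ports))
        self.mac_table: dict[str, int] = {}  # MAC address -> port

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the ports the frame is sent out of."""
        # Learn: the source MAC is reachable through the port it arrived on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        # A destination on the ingress port itself needs no forwarding.
        return [] if out_port == in_port else [out_port]

switch = LearningSwitch(num_ports=4)
first = switch.receive(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")  # unknown: flood
reply = switch.receive(2, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")  # known: one port
again = switch.receive(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")  # now known too
print(first, reply, again)  # [1, 2, 3] [0] [2]
```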
ARP: the IP → MAC bridge
Here's the gap that must be filled: your program wants to send to an IP address, but the wire delivers to a MAC address. How does the machine learn which MAC owns a given local IP? It asks, using ARP (Address Resolution Protocol). To send to 192.168.1.1 on the local network, your machine broadcasts to everyone: "who has 192.168.1.1?" The owner replies "that's me, here's my MAC," and your machine caches the answer (see it with ip neigh). Only then can the frame be addressed and sent. Every local conversation begins with this tiny question-and-answer.
The crucial rule that ties L2 and L3 together: a machine can only directly reach IPs that are on its own subnet (its local wire). For anything else, it doesn't ARP for the destination — it sends the frame to its router (the default gateway) and lets the router forward it onward. Which raises the question of how a machine decides "local vs. not local" and where to send non-local traffic — that's L3 routing, next.
4 · L3: IP addresses, subnets, and routing
An IP address (IPv4, the common case) is a 32-bit number written as four bytes: 192.168.1.42. Unlike a MAC, it's assigned and it's routable — the whole internet's job is getting a packet to the machine holding a given IP, wherever it is. But an address alone isn't enough; you need to know which addresses are "near" (on your wire) and which are "far" (across routers). That's what a subnet defines.
Subnets and CIDR — the one piece of math you must know
A subnet is a contiguous block of IP addresses that share a common prefix. It's written in CIDR notation as 192.168.1.0/24: the /24 means "the first 24 bits are the network part; the remaining 8 bits identify hosts within it." So 192.168.1.0/24 covers 192.168.1.0 through 192.168.1.255 — 256 addresses on one local network. A /16 is bigger (65,536 addresses); a /32 is a single host. When a machine wants to send to an IP, it checks: is this IP inside my own subnet? If yes, deliver directly at L2 (ARP + frame). If no, it's not on my wire — send it to the gateway.
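If you want to check the arithmetic, Python's standard ipaddress module does the same math. The helper is_local below is a hypothetical name for the "is this IP inside my own subnet?" test; it is a sketch of the idea, not how the kernel implements it.

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
print(net.num_addresses)   # 256
print(net[0], net[-1])     # 192.168.1.0 192.168.1.255
print(ipaddress.ip_network("10.0.0.0/16").num_addresses)  # 65536

def is_local(dst: str, my_subnet: str) -> bool:
    """True if dst falls inside my_subnet, i.e. deliver directly at L2."""
    return ipaddress.ip_address(dst) in ipaddress.ip_network(my_subnet)

print(is_local("192.168.1.77", "192.168.1.0/24"))  # True
print(is_local("1.1.1.1", "192.168.1.0/24"))       # False
```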
The routing decision
Every machine has a routing table: a list of subnets and where to send traffic for each. When sending a packet, the kernel picks the matching route using longest-prefix match — the most specific subnet that contains the destination wins; if nothing else matches, the default route (0.0.0.0/0, "everything else") catches it and points at the gateway.
ip route get <dst>: it runs this exact decision and prints the answer.
A packet crossing the internet is this decision repeated at every hop: each router looks only at the destination IP, consults its routing table, and forwards to the next router closer to the destination, a bucket brigade of longest-prefix matches. The IP header carries a TTL (time-to-live) that each router decrements; if it hits zero the packet is dropped (which is exactly how traceroute discovers the path: it sends packets with TTL 1, 2, 3… and notes which router complains at each distance). A gateway must be on-link (reachable on an attached subnet), which is why Network is unreachable means the kernel had no route and no on-link gateway for your destination.
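Longest-prefix match is simple enough to sketch directly. The routing table below is hypothetical (the prefixes, next hops, and the eth0 interface name are made up), and a real kernel uses a far more efficient lookup structure than a linear scan.

```python
import ipaddress

# Hypothetical table: (destination prefix, next hop or None if on-link, interface)
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "eth0"),      # default route
    ("192.168.1.0/24", None, "eth0"),          # directly attached subnet
    ("10.0.0.0/8", "192.168.1.254", "eth0"),
    ("10.1.2.0/24", "192.168.1.253", "eth0"),
]

def route_lookup(dst: str):
    """Return the most specific route containing dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in ROUTES if addr in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)

print(route_lookup("10.1.2.9"))      # the /24 beats the /8 and the default
print(route_lookup("10.9.9.9"))      # only the /8 and the default match
print(route_lookup("192.168.1.50"))  # on-link: no next hop
print(route_lookup("1.1.1.1"))       # falls through to the default route
```

With the default route removed from the table, a lookup for 1.1.1.1 would return None, which corresponds to the "no route" case described above.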
5 · L4: TCP, UDP, ports, and the handshake
L3 gets a packet to the right machine. But a machine runs many programs — which one gets the data? And IP itself makes no promises: packets can be lost, duplicated, delayed, or reordered. L4 solves both problems, and it comes in two flavours with opposite philosophies.
First, the shared idea: ports. A port is a 16-bit number identifying a specific program on a machine. A web server listens on port 443; SSH on 22. Your outgoing connection uses a random high ephemeral port. A connection is uniquely identified by the four-tuple: source IP, source port, destination IP, destination port. That's how your machine keeps a hundred simultaneous connections straight — each has a unique tuple.
The TCP 3-way handshake
Because TCP is connection-based, both sides must agree to talk before any data flows — the famous 3-way handshake. Watch it:
The client sends a SYN ("synchronise") with an initial sequence number x. This says "I want to open a connection, and I'll number my bytes starting near x." The client is now in SYN_SENT.
The server replies with SYN-ACK: it acknowledges the client's SYN (ack=x+1, "I got your x, send x+1 next") and sends its own SYN with sequence y. Both directions are now being set up. The server is in SYN_RECV.
The client acknowledges the server's SYN (ack=y+1). Now both sides have confirmed they can send and receive. The connection is ESTABLISHED. Three messages — SYN, SYN-ACK, ACK — hence "3-way."
Now data flows. Every chunk is acknowledged; if an ACK doesn't arrive in time, the sender retransmits — that's TCP's reliability. Sequence numbers let the receiver reassemble bytes in order even if packets arrive scrambled. This handshake is why a new TCP connection costs one full round-trip before any data — the reason connection reuse and keep-alive matter for latency.
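The sequence and acknowledgement numbers in the three messages follow a fixed pattern, which this small simulation spells out. It only models the numbers, not real packets; the function name and the example values for x and y are arbitrary assumptions.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the three segments of a handshake with client ISN x and server ISN y."""
    syn = {"from": "client", "flags": "SYN", "seq": x}
    # The SYN consumes one sequence number, so the server acknowledges x + 1.
    syn_ack = {"from": "server", "flags": "SYN-ACK", "seq": y,
               "ack": (x + 1) % SEQ_SPACE}
    # The client continues at x + 1 and acknowledges the server's SYN with y + 1.
    ack = {"from": "client", "flags": "ACK", "seq": (x + 1) % SEQ_SPACE,
           "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

segments = three_way_handshake(x=1000, y=5000)
for segment in segments:
    print(segment)
```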
The TCP state machine (and TIME_WAIT)
A TCP connection moves through defined states — ss -t shows them live. You don't need every transition, but these appear constantly in real debugging:
SYN_SENT = your SYNs aren't being answered (firewall, wrong address, dead server).
CLOSE_WAIT = the peer closed the connection and your application hasn't called close(). Piles of CLOSE_WAIT = your application is leaking sockets (not closing them). Fix the code, not the network.
6 · Sockets: networking is just file descriptors
Everything above becomes concrete through one abstraction you already know from OS: the socket, which is just a file descriptor. When a program does networking, the kernel gives it an fd representing the connection; the program reads and writes that fd exactly like a file. "A socket is just an fd" isn't a slogan: it's why the same epoll event loop can watch pipes, timers, and network connections together (the C10K design from OS). (Regular on-disk files are the exception: epoll does not accept them, because they are always considered ready.)
The server and client each follow a fixed sequence of syscalls — worth knowing because tools like strace and ss map directly onto it:
Server: socket() → bind() to a port → listen() (now in LISTEN) → accept() blocks until a client connects, returning a new fd for that connection. It keeps accepting while serving each connection on its own fd.
Client: socket() → connect() to the server's IP:port (this triggers the 3-way handshake) → then read/write the fd. No bind needed; the kernel assigns an ephemeral source port.
ss -tlnp (listening) and ss -tnp (established) reveal.
The everyday tool here is ss (socket statistics, the modern netstat). ss -tlnp answers "what's listening and which process owns it?", the first thing to check for "connection refused" (is anything actually listening on that port?). ss -tnp shows established connections and their states. Because a socket is an fd, socket leaks are fd leaks: a server that never close()s connections piles up in CLOSE_WAIT and eventually hits EMFILE, the exact failure from the OS module, now wearing a networking costume.
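Both syscall sequences can be seen end to end in one short Python script that talks to itself over loopback. This is a minimal sketch: it serves exactly one connection, assumes the tiny message arrives in a single recv(), and the upper-casing "protocol" is invented for the demo.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """accept() one client, send back what it sent in upper case, then close."""
    conn, _addr = listener.accept()   # returns a new fd for this one connection
    with conn:
        conn.sendall(conn.recv(1024).upper())

# Server side: socket() -> bind() -> listen()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))       # port 0: let the kernel pick a free port
listener.listen()
port = listener.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client side: socket() + connect() (the handshake happens here), then write/read.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
listener.close()
print(reply)  # b'HELLO'
```

Note that the client never calls bind(); the kernel picks its ephemeral source port during connect(). Both with blocks close their sockets, which is exactly the step a CLOSE_WAIT leak is missing.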
7 · DNS: turning names into addresses
You type google.com, but IP routing needs 142.250.72.14. The Domain Name System is the internet's distributed phone book that translates names to IPs. It's the step behind an astonishing share of "the site is slow / down" incidents, so understanding it pays off constantly. Crucially, DNS is a hierarchy resolved by asking a chain of servers — walk it:
Your machine asks its configured recursive resolver (your ISP's, or 1.1.1.1 / 8.8.8.8 — set in /etc/resolv.conf). "Recursive" means it does all the legwork below on your behalf and returns the final answer. First it checks its cache — if it already knows google.com, you skip straight to the end.
On a cache miss, the resolver asks a root server. The root doesn't know google.com, but it knows who handles .com: "go ask the .com servers." The hierarchy is read right-to-left: root → TLD → domain.
The resolver asks a .com TLD server. It doesn't know the IP either, but it knows which servers are authoritative for google.com: "ask ns1.google.com." Getting closer.
Finally the resolver asks google's authoritative name server — the source of truth for that domain — and gets the actual A record: 142.250.72.14. This is the answer.
The resolver returns the IP to you and caches it for the record's TTL (time-to-live) — so the next lookup of google.com is instant, skipping the whole walk. TTL is the trade-off knob: long TTL = fast + less load but slow to reflect changes; short TTL = quick propagation but more lookups. This caching is also why a DNS change "takes time to propagate" and why stale cache causes "it works for me but not you."
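The caching step is where most DNS surprises come from, so here is a toy TTL cache that shows the behaviour. It is a sketch under obvious simplifications (one record per name, no negative caching); the class name TtlCache and the injected fake clock are assumptions made so the example is deterministic.

```python
import time

class TtlCache:
    """Toy resolver cache: an answer expires ttl seconds after being stored."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # name -> (ip, expiry)

    def put(self, name: str, ip: str, ttl: float) -> None:
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str):
        """Return the cached IP, or None on a miss or an expired entry."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expiry = entry
        if self._clock() >= expiry:
            del self._entries[name]   # stale: force a fresh walk of the hierarchy
            return None
        return ip

now = [0.0]                           # fake clock, advanced by hand below
cache = TtlCache(clock=lambda: now[0])
cache.put("google.com", "142.250.72.14", ttl=300)
now[0] = 299
print(cache.get("google.com"))        # 142.250.72.14
now[0] = 300
print(cache.get("google.com"))        # None
```

Until the entry expires, the cache keeps returning the old address even if the authoritative record has changed, which is the "works for me but not you" effect.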
The one-line debugging tool is dig: dig google.com shows the answer, the TTL, and which server replied; dig +trace walks the whole hierarchy above so you can see exactly where resolution breaks. When something "can't connect," always separate the two failures: is it name resolution (DNS returns nothing/wrong) or connectivity (DNS is fine but the connection fails)? dig answers the first; ping/ss/tcpdump answer the second.
8 · The whole journey: what happens when you type a URL
Time to assemble everything. "What happens when you type https://example.com and press Enter?" is the canonical systems interview question because it touches every layer you've learned. Here's the complete path:
DNS first. The browser needs an IP, so it resolves example.com via the resolver walk from Section 7 (checking local caches — browser, OS — first). Out comes something like 93.184.216.34. If this step fails, nothing else can happen — always suspect DNS first.
Routing decision (Section 4). The destination isn't on my subnet, so the kernel sends the packet to the default gateway. To put it on the wire it ARPs for the gateway's MAC (Section 3), wraps the packet in an Ethernet frame, and sends it. Each router along the way repeats the longest-prefix-match decision toward the destination.
TCP handshake (Section 5) to port 443. One round-trip to establish the connection before any data. Now there's a reliable, ordered byte stream between browser and server, identified by the four-tuple.
TLS (the S in HTTPS). Over the TCP connection, client and server negotiate encryption: the server presents a certificate (proving it really is example.com), they agree on keys, and everything after is encrypted. This adds another round-trip or two — which is why HTTPS has a small connection-setup cost that keep-alive and TLS resumption exist to amortise.
HTTP at last (L7). The browser sends GET / HTTP/1.1 over the encrypted connection; the server responds with 200 OK and the HTML. The browser parses it, discovers more resources (CSS, JS, images), and repeats the whole process for each — reusing the connection where it can. Then it renders the page. Every single layer of this module just fired, in order.
9 · MTU, fragmentation, and the black hole
A subtle but interview-favourite topic that causes maddening real-world bugs. Every link has a maximum packet size it can carry in one frame, the MTU (maximum transmission unit), classically 1500 bytes for Ethernet. A packet bigger than the MTU can't cross that link in one piece. Historically it would be fragmented (split into MTU-sized pieces and reassembled at the destination), but fragmentation is inefficient and often blocked, so modern IP mostly avoids it: packets are marked "don't fragment," and if one is too big, the router drops it and sends back an ICMP "fragmentation needed" message telling the sender to use a smaller size. This feedback loop is Path MTU Discovery.
Now the failure mode — the MTU black hole. If a firewall or misconfigured device blocks the ICMP feedback messages, the sender never learns its packets are too big. Small packets (the TCP handshake, tiny requests) sail through fine, so the connection looks healthy — but the moment a large packet is sent (a big response, a file), it's silently dropped with no error. The classic symptom: "the connection works, ping works, small requests work, but large transfers hang forever." The fix is usually MSS clamping — telling TCP to advertise a smaller maximum segment size so packets never exceed the path MTU in the first place.
Why this matters enormously in your world: overlay networks (VXLAN, used by many Kubernetes CNI plugins) wrap each packet in an extra header, stealing ~50 bytes from the usable MTU. If the overlay MTU isn't lowered to account for this, you get exactly the black hole above inside your cluster — pods handshake fine but large payloads between nodes hang. This is a direct callback to the MTU labs in Linux and Kubernetes, and now you understand precisely why it happens.
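The ~50 bytes is easy to account for. The arithmetic below assumes IPv4 and TCP headers without options and VXLAN carried over IPv4; other encapsulations, or IPv6 underlays, give different numbers. The function names are hypothetical.

```python
IPV4_HEADER = 20   # bytes, without options
TCP_HEADER = 20    # bytes, without options
# VXLAN over IPv4: outer Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN 8
VXLAN_OVERHEAD = 14 + 20 + 8 + 8

def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - TCP_HEADER

def overlay_mtu(underlay_mtu: int) -> int:
    """MTU the overlay interface can offer once VXLAN headers are added."""
    return underlay_mtu - VXLAN_OVERHEAD

print(tcp_mss(1500))               # 1460
print(overlay_mtu(1500))           # 1450
print(tcp_mss(overlay_mtu(1500)))  # 1410
```

A pod interface left at 1500 on a 1500-byte underlay will emit packets that no longer fit once encapsulated, which is the setup for the black hole described above.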
10 · NAT: how private networks reach the internet
There aren't enough IPv4 addresses for every device, so most machines don't have a public IP — your laptop, phone, and every container use private addresses (like 192.168.x.x or 10.x.x.x) that aren't routable on the public internet. So how do they reach the internet at all? NAT (Network Address Translation).
Your router has one public IP. When your laptop (say 192.168.1.42) sends a packet to the internet, the router rewrites the source address from your private IP to its own public IP before forwarding, and remembers the mapping in a table. When the reply comes back to the public IP, the router looks up the table and rewrites the destination back to 192.168.1.42, delivering it to you. Dozens of private devices thus share one public IP, distinguished by port. This source-rewriting on the way out is SNAT / masquerade; the reverse, rewriting a public destination to an internal one (for port-forwarding a service in), is DNAT.
This is a direct preview of the Linux module, where you'll build this by hand with iptables: a container on a private 10.x network reaches the internet because the host masquerades its source IP (SNAT), and docker run -p 8080:80 works because the host DNATs incoming host:8080 to the container. NAT is also stateful — the mapping is created on the first packet and reused for replies (via conntrack) — which is why, when the connection-tracking table fills under extreme load, new connections are silently dropped. Every concept here reappears, concretely, in Linux networking.
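The translation table at the heart of NAT can be modelled in a few lines. This is a deliberately simplified sketch: real connection tracking keys on the full flow (protocol and both endpoints), expires idle entries, and has a size limit. The class name Nat, the public address 203.0.113.5 (a documentation address), and the starting port 40000 are assumptions.

```python
class Nat:
    """Toy source NAT: maps (private ip, private port) to a public port."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        self._out: dict[tuple[str, int], int] = {}    # private endpoint -> public port
        self._back: dict[int, tuple[str, int]] = {}   # public port -> private endpoint

    def outbound(self, src_ip: str, src_port: int) -> tuple[str, int]:
        """Rewrite the source of an outgoing packet, creating state on first use."""
        key = (src_ip, src_port)
        if key not in self._out:
            self._out[key] = self._next_port
            self._back[self._next_port] = key
            self._next_port += 1
        return self.public_ip, self._out[key]

    def inbound(self, dst_port: int):
        """Rewrite the destination of a reply, or None if no mapping exists."""
        return self._back.get(dst_port)

nat = Nat("203.0.113.5")
print(nat.outbound("192.168.1.42", 52012))  # ('203.0.113.5', 40000)
print(nat.outbound("192.168.1.43", 52012))  # ('203.0.113.5', 40001)
print(nat.inbound(40001))                   # ('192.168.1.43', 52012)
print(nat.inbound(41000))                   # None
```

The last line shows the stateful part: a packet arriving on a port with no mapping has nowhere to go, which is why unsolicited inbound traffic needs an explicit DNAT rule.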
11 · A debugging method that always works
When something "can't connect," don't guess — walk the layers from the bottom up (or the request from the start), and find the first layer that's broken. Every layer has a command that answers a yes/no question:
Network is unreachable here = no route.
The two distinctions that resolve most incidents: "connection refused" vs "connection timed out." Refused means your packet reached the host and the host actively rejected it (sent a TCP RST): the host is up, but nothing is listening on that port (wrong port, service down). Timed out means your packet got no response at all: it was silently dropped, almost always by a firewall or a routing/MTU black hole, or the host is down. And always separate name resolution from connectivity: half of "it's down" reports are just DNS.
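The refused/timed-out distinction shows up directly as different exceptions when you connect from code. Here is a minimal probe, assuming Python's standard socket module; the function name probe and the 3-second default timeout are arbitrary choices, and the closed-port trick relies on nothing else grabbing that port in the meantime.

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"         # a RST came back: host reachable, nothing listening
    except socket.timeout:
        return "timed out"       # no reply at all: dropped, filtered, or host down
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route, or name resolution failed

# Find a local port with nothing listening: bind to port 0, note the number, close.
tmp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tmp.bind(("127.0.0.1", 0))
closed_port = tmp.getsockname()[1]
tmp.close()

result = probe("127.0.0.1", closed_port)
print(result)
```

On a typical machine this prints refused almost instantly, whereas a filtered destination would make probe block for the full timeout before returning timed out.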
An interface has a hardware address (L2, the MAC) and an IP address (L3). Packets are addressed by IP but delivered on the local wire by MAC — two layers, one interface.
ip addr shows both: link/ether is the burned-in MAC used by L2/ARP on the local wire; inet is the assigned, routable IP used by L3. The /24 after the IP is the subnet mask that tells the kernel which destinations are "local" (deliver directly) vs "remote" (send to the gateway).
List your interfaces and identify the MAC and the IP, and note the subnet.
$ ip addr show
link/ether = L2 MAC; inet …/24 = L3 IP + subnet. Two addresses, two layers. ip -br addr is a compact view.
Solution:
$ ip addr show
$ ip -br addr
$ ip neigh   # the ARP cache: local IP -> MAC mappings
The kernel decides where each packet goes by longest-prefix match over the routing table, falling back to the default route.
ip route lists your routes; the default via <gateway> line is 0.0.0.0/0. ip route get <ip> runs the actual forwarding decision and prints which interface and next-hop the kernel would use — the single most useful routing command, because it shows reality, not your assumption.
Show your routes, then ask the kernel how it would reach a public IP and a local one.
$ ip route get 1.1.1.1
For a public IP you'll see it routed via your gateway; for a local IP, delivered directly on your interface (no gateway). That difference is the local-vs-remote decision.
Solution:
$ ip route
$ ip route get 1.1.1.1       # remote -> via gateway
$ ip route get 192.168.1.1   # local -> direct (adjust to your subnet)
A TCP connection opens with SYN → SYN-ACK → ACK before any data. You can watch it on the wire.
tcpdump shows the flags: [S] is SYN, [S.] is SYN-ACK, [.] is ACK. Seeing the handshake complete proves L3+L4 connectivity to the port. Seeing repeated [S] with no [S.] reply is the signature of a firewall/black hole — your SYNs leave but nothing answers.
Sniff TCP on port 443, then make an HTTPS request in another terminal.
$ sudo tcpdump -i any -n 'tcp port 443' -c 6
You'll see Flags [S], then [S.], then [.] (the handshake), followed by encrypted data. The exact sequence from the stepper, live on your machine.
Solution:
$ sudo tcpdump -i any -n 'tcp port 443' -c 10 &
$ curl -sI https://example.com >/dev/null
# watch the [S] / [S.] / [.] flags in the capture
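If the bracket notation feels opaque, it is just the TCP flag bits spelled as letters. The sketch below maps the flags byte from the TCP header to the notation used above; it covers only five flags and is an illustration of the notation, not a reimplementation of tcpdump.

```python
# TCP flag bits and the letter each one is shown as, in printing order.
FLAG_LETTERS = [(0x01, "F"), (0x02, "S"), (0x04, "R"), (0x08, "P"), (0x10, ".")]

def tcpdump_flags(bits: int) -> str:
    """Render a TCP flags byte the way it appears in a capture line."""
    letters = "".join(letter for mask, letter in FLAG_LETTERS if bits & mask)
    return f"[{letters or 'none'}]"

print(tcpdump_flags(0x02))  # [S]   SYN
print(tcpdump_flags(0x12))  # [S.]  SYN + ACK
print(tcpdump_flags(0x10))  # [.]   ACK only
print(tcpdump_flags(0x14))  # [R.]  RST + ACK, what a refused connection looks like
```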
Every connection is a socket, which is just an fd. ss shows what's listening and what's established, with the owning process and TCP state.
ss -tlnp answers "what is listening and who owns it?" — the first check for "connection refused" (is anything on that port?). ss -tnp shows established connections and states (ESTABLISHED, TIME_WAIT, CLOSE_WAIT). A pile of CLOSE_WAIT means your app isn't closing sockets — an fd leak wearing a networking costume.
List listening sockets with their processes, then established connections.
$ ss -tlnp
You see each LISTEN socket, its port, and the process holding it. ss -tnp then shows live connections and their TCP states from the state-machine diagram.
Solution:
$ ss -tlnp   # listening sockets + process
$ ss -tnp    # established connections + state
$ ss -s      # summary counts by state
DNS turns a name into an IP by walking resolver → root → TLD → authoritative, caching by TTL.
dig NAME shows the answer, the TTL, and which server replied. dig +trace NAME performs the walk itself, from the root down, so you can see exactly which level fails when resolution breaks. Separating DNS from connectivity is the fastest way to cut an incident in half.
Resolve a name, read its TTL, then trace the full hierarchy.
$ dig +noall +answer google.com
You get the A record and its TTL. dig +trace google.com then shows root → .com → authoritative: the stepper, live.
Solution:
$ dig +noall +answer google.com
$ dig +trace google.com | tail -20
$ cat /etc/resolv.conf   # which resolver you are using
"Refused" and "timeout" are different failures with different causes — mixing them up sends you debugging the wrong thing.
Refused: your SYN reached the host and got a TCP RST back — the host is up but nothing is listening (wrong port, service down). Timeout: your SYN got no reply at all — silently dropped by a firewall, or the host is unreachable. nc -vz distinguishes them instantly, and tcpdump confirms (RST returned vs no reply).
Trigger a "connection refused" against a closed local port, and reason about what a timeout would look like instead.
$ nc -vz 127.0.0.1 9   # port 9 has nothing listening
You get Connection refused immediately: the host (localhost) is up and sent a RST because nothing listens on port 9. A timeout (hanging, no answer) would instead mean a firewall silently dropped the packet, a completely different root cause.
Solution:
Refused = reached host, nothing listening (check the service / port). Timeout = never reached / silently dropped (check firewall, routing, MTU).
$ nc -vz 127.0.0.1 9      # refused: nothing listening
$ nc -vz 8.8.8.8 12345    # likely timeout: dropped by firewall
$ ss -tlnp | grep ':9 '   # confirm nothing is listening (trailing space avoids matching :90, :9000, ...)
You can now trace a packet through every layer: L2 frames and MAC/ARP on the local wire, L3 IP and routing across networks, L4 TCP's handshake/reliability/state-machine and UDP's speed, sockets as file descriptors, DNS resolving names, and NAT/MTU shaping the real-world path. "What happens when you type a URL" is a story you can tell end to end — and every debugging problem is just walking the layers until one breaks. This is the exact machinery you'll build by hand in Linux and watch Kubernetes orchestrate.
The questions actually asked for SRE, network, and systems roles — conceptual, debugging, and the famous synthesis prompts. Each expands to show what the interviewer is really probing for, a model answer, and the follow-up traps. Answer all of these out loud and you have mastered this module.
Q1 · What happens when you type a URL and press Enter?
The synthesis question — whether you can narrate every layer in order, from DNS to render.
DNS resolves the name to an IP (checking browser/OS caches, then resolver → root → TLD → authoritative). The kernel makes a routing decision — not on my subnet, so send to the default gateway; it ARPs for the gateway MAC and frames the packet. A TCP 3-way handshake (SYN/SYN-ACK/ACK) opens a connection to port 443. For HTTPS, a TLS handshake negotiates encryption and verifies the certificate. Then HTTP: GET / → 200 OK + HTML; the browser parses it, fetches sub-resources over reused connections, and renders. Every layer of the stack fires in sequence.
- "Where would you look if it's slow?" — time each stage: DNS (dig), connect+TLS (curl -w), server response.
- "What if DNS is the problem?" — nothing downstream works; always check DNS first.
Q2 · TCP vs UDP: differences, and when do you choose each?
Understanding reliability vs speed trade-offs and real use cases.
TCP is connection-based, reliable, and ordered: a handshake, then acknowledgements and retransmission guarantee every byte arrives once, in order — at the cost of setup latency and overhead. UDP is connectionless with no guarantees: fire a packet, no handshake, no retransmit, no ordering — minimal latency. Choose TCP when correctness matters (web, SSH, databases, file transfer). Choose UDP when timeliness beats perfection (DNS — one small query; live video/voice/gaming — a late packet is worse than a lost one). Modern protocols like QUIC/HTTP3 build reliability on top of UDP to get both.
- "Why does DNS use UDP?" — one small request/response; a handshake would double the latency.
- "How does TCP guarantee order?" — sequence numbers let the receiver reassemble regardless of arrival order.
Q3 · Explain the 3-way handshake. Why three messages?
Whether you understand that both directions must be established, not just one.
SYN → SYN-ACK → ACK. The client's SYN opens its direction and proposes a starting sequence number. The server's SYN-ACK both acknowledges the client's SYN and opens the server's own direction with its sequence number. The client's ACK acknowledges the server's SYN. Three messages because a TCP connection is bidirectional — each direction needs a SYN and a matching ACK, and the server can piggyback its SYN onto the ACK, collapsing four messages into three. After this, both sides have confirmed send and receive work, and data flows.
- "Cost?" — one full round-trip before any data; why keep-alive/connection reuse matter.
- "What's a SYN flood?" — many SYNs, no final ACK, exhausting the server's half-open connection table (DoS).
Q4 · What is TIME_WAIT, and is a lot of it a problem?
A favourite trap — understanding why TIME_WAIT exists and distinguishing it from CLOSE_WAIT.
After the side that initiates the close finishes the teardown, its socket lingers ~60s in TIME_WAIT. Two reasons: to absorb any straggling/retransmitted packets from the old connection, and to prevent a brand-new connection reusing the same four-tuple from receiving stale data. It's normal and usually harmless. It only becomes a problem on a busy client making huge numbers of short-lived outbound connections, where accumulated TIME_WAITs can exhaust ephemeral ports — fixed by connection reuse/keep-alive, not by blindly tuning it away.
- "TIME_WAIT vs CLOSE_WAIT?" TIME_WAIT: you closed, waiting safely (normal). CLOSE_WAIT: the peer closed and your app hasn't called close(), an application socket leak.
- "Which side gets TIME_WAIT?" Whoever closes first.
Q5 · "Connection refused" vs "connection timed out": what does each tell you?
Whether you can diagnose from the failure mode instead of guessing.
Connection refused: your SYN reached the host and it replied with a TCP RST — the host is up and reachable, but nothing is listening on that port (service down, wrong port). Fix on the service side. Connection timed out: your SYN got no reply at all — silently dropped, almost always by a firewall or a routing/MTU black hole, or the host is down/unreachable. Fix on the network/firewall side. The failure mode tells you which half of the stack to investigate — refused points at the app, timeout points at the network.
- "How confirm quickly?" nc -vz host port; tcpdump to see RST returned vs no reply.
- "Refused but the service is running?" Bound to the wrong interface/localhost only, or wrong port.
Q6 · A connection works for small requests but hangs on large transfers. What's going on?
The MTU black hole — a deep, distinguishing question.
Classic MTU black hole. Small packets (the handshake, tiny requests) fit within every link's MTU and pass fine, so the connection looks healthy. But a large transfer produces packets bigger than some link's MTU; that link drops them and sends back an ICMP "fragmentation needed" message — and if a firewall blocks that ICMP, the sender never learns to shrink its packets, so large payloads are silently dropped and the transfer hangs. Common with VPNs and overlay networks (VXLAN in Kubernetes steals ~50 bytes of MTU). The fix is MSS clamping (advertise a smaller segment size) or correcting the overlay MTU.
- "How confirm?" ping -M do -s <size> to find where large packets stop; check for blocked ICMP.
- "Why does it bite Kubernetes?" Encapsulation overhead lowers the real MTU between nodes.
Q7 · Walk me through how a private machine (or container) reaches the internet.
Understanding NAT, private vs public addressing, and statefulness.
The machine has a private, non-routable IP (10.x/192.168.x). Its router/host holds a public IP and performs NAT: on the way out it rewrites the source from the private IP to the public IP (SNAT/masquerade) and records the mapping in a connection-tracking table; on the way back it reverses the rewrite, delivering the reply to the right private machine. Many devices thus share one public IP, disambiguated by port. It's stateful — the mapping is created on the first packet and reused for the flow — which is why only the outbound direction needs a rule and why a full conntrack table drops new connections. Docker and Kubernetes do exactly this with iptables.
- "How does inbound port-forwarding work?" DNAT: rewrite a public destination to an internal one (docker -p).
- "What breaks at scale?" Conntrack table exhaustion; ephemeral port limits.
Q8 · Design: distribute traffic across many backend servers behind one address. How?
Load-balancing fundamentals — L4 vs L7, health checks, and the trade-offs.
Put the backends behind a load balancer that owns one virtual IP/DNS name. Two levels: an L4 load balancer forwards by IP/port (fast, protocol-agnostic, just picks a backend per connection, e.g. via DNAT, exactly like a Kubernetes Service). An L7 load balancer understands HTTP, so it can route by path/host, terminate TLS, and retry; it is richer but heavier. Distribute with a policy (round-robin, least-connections, or hashing for stickiness), and run health checks so unhealthy backends are removed automatically. Key concerns: session stickiness if needed, graceful backend drain, and avoiding the LB itself becoming a single point of failure (run several, fronted by DNS or anycast).
- "L4 vs L7 — when each?" — L4 for raw speed/any protocol; L7 for HTTP routing, TLS termination, retries.
- "How does this map to Kubernetes?" — a Service is an L4 LB (kube-proxy/iptables); an Ingress is L7.
- "How do clients find the LB?" — DNS, and often anycast for the DNS/LB itself.
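A whiteboard answer lands better with the selection logic written down. Here is a minimal round-robin picker that skips unhealthy backends. It is a sketch of the policy only: a real load balancer also forwards traffic, runs the health checks itself, and drains connections. The class name and backend addresses are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy L4-style balancer: picks one healthy backend per new connection."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = backends
        self.healthy = set(backends)
        self._ring = cycle(backends)

    def mark(self, backend: str, healthy: bool) -> None:
        """Record the result of a health check."""
        if healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self) -> str:
        """Return the next healthy backend in round-robin order."""
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

lb = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
lb.mark("10.0.0.2:80", healthy=False)
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['10.0.0.1:80', '10.0.0.3:80', '10.0.0.1:80', '10.0.0.3:80']
```

Swapping pick() for least-connections or a hash of the client address changes the policy without touching the health-check bookkeeping.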
Q9 · Why is DNS behind so many outages, and how do you debug it?
Operational maturity — TTL/caching pitfalls and separating DNS from connectivity.
DNS sits before every connection, so when it fails, everything downstream "breaks" with confusing symptoms. Common causes: caching/TTL (a changed record hasn't propagated, so some clients hit the old IP — "works for me, not for you"), a slow/unreachable resolver adding latency to every request, or a misconfigured record. Debug by separating resolution from connectivity: dig NAME to check the answer and TTL, dig +trace to see which level of the hierarchy fails, and compare against a known-good resolver (dig @1.1.1.1 NAME). Only once DNS returns the right IP do you move on to ping/ss/tcpdump.
- "Why long vs short TTL?" — long = fast + less load but slow to change; short = quick propagation but more queries.
- "In Kubernetes?" — CoreDNS resolves Service names; a slow/failing CoreDNS makes the whole cluster "slow."
Q10 · What actually is a subnet, and why does it matter?
Whether the CIDR/local-vs-remote concept is solid — it underpins routing and cloud networking.
A subnet is a contiguous range of IPs sharing a prefix, written in CIDR like 10.0.1.0/24 — the /24 means the first 24 bits are the network, the last 8 identify hosts (256 addresses). It matters because it defines local vs remote: a machine delivers directly (ARP + frame) to destinations within its subnet, and sends everything else to the gateway. It's the basis of routing (longest-prefix match over subnets), of cloud network design (VPCs are carved into subnets), and of Kubernetes (each pod/node gets addresses from planned CIDR ranges — overlaps cause silent, painful bugs).
- "How many hosts in a /24? a /16?" — 254 usable, ~65k.
- "Two machines same subnet, can't talk?" — check they truly share the subnet mask; a mismatch makes each think the other is remote.
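The mask-mismatch trap from the last follow-up is easy to demonstrate with the standard ipaddress module. The two host configurations below are hypothetical, and sees_as_local is a made-up helper that mirrors the local-vs-remote check rather than anything the kernel exposes.

```python
import ipaddress

def sees_as_local(my_interface: str, peer_ip: str) -> bool:
    """True if a host configured as my_interface treats peer_ip as on-link."""
    network = ipaddress.ip_interface(my_interface).network
    return ipaddress.ip_address(peer_ip) in network

# Hypothetical misconfiguration: host A uses a /16 mask, host B uses a /24.
print(sees_as_local("10.0.1.5/16", "10.0.2.7"))  # True: A ARPs for B directly
print(sees_as_local("10.0.2.7/24", "10.0.1.5"))  # False: B sends replies to its gateway
print(ipaddress.ip_network("10.0.1.0/24").num_addresses - 2)  # 254 usable hosts
```

The asymmetry is the bug: traffic in one direction goes straight across the wire while the other direction detours through the gateway, if it arrives at all.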