Building Quality of Service (QoS) for your ADSL/Cable/Other Internet Link

 

Over the years the usage pattern on my Internet link has changed. There are now so many things in the house competing for that small link. Each of them has different requirements and priorities and this should be reflected in how they get access to the link. The main concern on the link is the limited upload speed and this is where the focus of this article is headed.

 

Using the system I am about to describe I can have a Bittorrent download/upload going at full speed, and make perfect quality VOIP calls at the same time as I browse the Internet with perfect responsiveness (well, as responsive as the Internet can be!). You can take all or bits of it to implement your own traffic control system.

 

This is intended for users with moderate to advanced Linux/Unix skills. This document is written assuming at least some basic knowledge in a few areas.

 

Some of the services/users/requirements for me are:

Description | Primary Requirements | Notes
VOIP | Low latency, low jitter | Interactive; packet loss or delay is noticeable and annoying
General web surfing | Some latency acceptable | Reasonably interactive; some delays noticed but bearable
Guest access | Lower priority than those living in the house | Reasonably interactive; some delays noticed but bearable
Internet gaming | Low latency, low jitter | Interactive; packet loss or delay is noticeable and annoying
Work-from-home VPN access | Higher priority than web surfing; some latency acceptable | Mostly low traffic, needs to be reliable; can be peaky when accessing large files
Bulk upload traffic | Use all remaining bandwidth, do not interfere with anything else | Not interactive; used for regular automated website updates using FTP, Bittorrent etc
Bulk download traffic | Do not interfere with anything else | Not interactive; large software updates, ISOs, Bittorrent etc
Outgoing site-to-site VPN | Solid, reliable connection | Mostly low traffic
Incoming VPN | Solid, reliable connection | Mostly low traffic
Incoming remote control | Low latency | Mostly low traffic as long as you are not stupid! Interactive; packet loss can be annoying
Internet Gaming VOIP | Low latency, should not interfere with gaming traffic | Low bandwidth; interactive; packet loss can be annoying
Skype | Low latency, high outgoing bandwidth when using video | Interactive; packet loss or delay is noticeable and annoying
DNS | Low latency, low bandwidth | Needed for most operations
Me | I don’t want anyone else to interfere with what I am doing! | I’m happy to let low-bandwidth, interactive traffic generated by others (eg VOIP, some Skype) be higher priority than me, otherwise they just complain to me anyway!
Future | Who knows | System must be flexible

 

Well, what equipment do we have to take into account, use, or work around?

The full network diagram is a bit too complicated and not really needed, so here’s an overview:

 

 

The Important Components

The QoS I have employed uses Linux Traffic Control – hence we need some sort of device capable of using this. The best place to do this is right at the Internet link (ie on the Router) so that you can capture all the traffic. It could be done on the Server, but then we miss the VOIP traffic – which is really important.

 

This makes the Router the single most important component. It might be a combined modem/router/wireless access point or just a plain router – but it needs to be capable of running Linux Traffic Control. This is not as hard as it might sound. I use DD-WRT firmware (http://www.dd-wrt.com) on a Linksys router. DD-WRT firmware gives us command line access via ssh to a Unix/Linux-like shell. From here we can access the required tc and iptables commands.
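
As a quick sanity check (the router address here is just an example – adjust to suit), you can confirm the commands are available before going any further:

ssh root@192.168.1.1
# then, on the router:
tc -V
iptables -V
tc qdisc show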

 

Now since we will be running tc and iptables commands (quite a few of them) we really need a script to put it all together. This makes the script an important component.

 

The remaining components are optional but desirable depending on what you are doing.

 

The final important component is the server. You could do all of this without the server I guess, but I use it to host the script. The script lives on a Samba share that is accessed by the router (to run the script if needed during testing) and also accessed by me to edit it. This just makes my development/testing/tuning of the script a whole lot easier.

The server also hosts a proxy server. All the standard HTTP traffic in the house goes through it, including ALL guest traffic (if it can’t use a proxy then it’s too bad). I use the proxy server to determine which traffic is guest traffic and which is not. Guest traffic then gets marked as such, and when it reaches the router it is handled appropriately.

 

So to that end, in my setup, Squid is another important component. It might be optional for you.
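
If you want to do something similar, one approach (a sketch only – my actual setup may differ in detail, and the guest address range and classid below are just examples) is to have Squid tag guest traffic with a TOS value that the router can then match:

# squid.conf – assume guests live in their own address range
acl guests src 192.168.0.128/25
# tag the traffic Squid sends upstream on behalf of guests
tcp_outgoing_tos 0x08 guests

# on the router, match that TOS and steer it into a guest class
tc filter add dev ppp0 parent 1:0 protocol ip prio 20 u32 match ip tos 0x08 0xff flowid 1:30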

 

Traffic Control Architecture

Linux traffic control has many different types of queues/classes. I’ve taken what I found works well over time and put it all together.

Now I first got started on this when I realised that the QoS settings, even in DDWRT firmware (via the Admin GUI), were just not going to give me what I needed; they were way too simple.

So I started searching the Internet for the best way to go. I first came across this page:

http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.cookbook.ultimate-tc.html

 

It gave me a good start, but I soon found I needed to expand it. Then I added some bits from this one:

http://www.arctic.org/~dean/scripts/wshape (which no longer exists by the looks of it)

I think this has probably moved to something like: http://lartc.org/wondershaper/

This is called the Wonder Shaper script. It was pretty good and I have kept a lot of its ideas/structure.

 

Then VOIP came along. That was a problem. VOIP had to share the bandwidth in a timesliced fashion and this resulted in dropouts during calls when heavy uploads were happening. This was bad.

That’s when I found this page: http://www.voip-info.org/wiki/view/QoS+with+Linux+using+PRIO+and+HTB

This was more advanced and got me started down the path I needed to go.

 

From here I have developed the class/queue structure that suits me best at the moment. It is shown in the following diagram:

 

 

I’m not going to go into detail about each of the queue/class types and what they do – you can find that at http://lartc.org/ and plenty of other sites.
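
As a rough sketch of the general shape only (the real script has more classes and uses its own class numbering), an HTB tree on the uplink with a high priority class for interactive traffic and lower priority classes for everything else looks something like this:

# root HTB on the Internet interface; unclassified traffic defaults to the bulk class
tc qdisc add dev ppp0 root handle 1: htb default 30
tc class add dev ppp0 parent 1: classid 1:1 htb rate 800kbit ceil 800kbit
# high priority: VOIP, DNS, gaming
tc class add dev ppp0 parent 1:1 classid 1:10 htb rate 200kbit ceil 800kbit prio 0
# normal: web surfing, VPN
tc class add dev ppp0 parent 1:1 classid 1:20 htb rate 400kbit ceil 800kbit prio 1
# bulk: FTP, Bittorrent
tc class add dev ppp0 parent 1:1 classid 1:30 htb rate 100kbit ceil 800kbit prio 2
# leaf qdiscs: a short fifo for the latency-sensitive class, sfq for the rest
tc qdisc add dev ppp0 parent 1:10 handle 10: pfifo limit 10
tc qdisc add dev ppp0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev ppp0 parent 1:30 handle 30: sfq perturb 10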

 

This is how it generally all hangs together -

The Script

It is designed to run under DDWRT firmware, but it does also run under Fedora (where I used to use it). If your network suits running it from your Linux server, then you should have no problems, as long as you have a modern kernel compiled with all the required tc bits and pieces.

 

The script is heavy on blank lines (for readability) and comments. In fact, they make up about half the number of lines!

Still, the script is not really for beginners I guess. You will probably need some basic Linux/Unix skills to use it.

 

Get it here: tc.sh

Using the Script

The script is not perfect. I have tried to move most of the configuration information to the top of the script into variables. I just haven’t finished it yet.

Configuration

I used to run the script on my Linux server and hence the script behaves slightly differently and needs slightly different configuration information depending on where you run it. Actually I sometimes still run the script on the Linux server for testing.

 

I’m only going to do a high-level overview of the config here and rely heavily on the comments in the script for the rest. Let’s run through the script in order.

 

The first things you will need to configure are the hostname(s) you will run the script on and the names of the Ethernet interfaces the traffic passes through. The Internet interface is the most important.

 

Next we come to the bandwidth throttling section. Configure the appropriate numbers.

Set the UPLINK and DOWNLINK values. These do take some tuning and we address this in detail below.

 

Then we come to the part where we specify lists of source and destination IP addresses/netmasks, and TCP/UDP source and destination port numbers. I personally don’t use any IP address/netmask config in this section because I really only care about source IP address, and I’ve already described the problem with that above. You could specify some destination IP addresses for specific servers you wanted good service from.

 

You could play around with the starting bandwidth numbers in this section if you really wanted to, but it’s not really needed. The leading numbers in each line below (30, 20, 10, 5, 1) are percentages of STD_TRAFFIC_MAX_RATE:

LIM40010=`expr 30 \* $STD_TRAFFIC_MAX_RATE / 100`

LIM40020=`expr 20 \* $STD_TRAFFIC_MAX_RATE / 100`

LIM40030=`expr 10 \* $STD_TRAFFIC_MAX_RATE / 100`

LIM40040=`expr 5 \* $STD_TRAFFIC_MAX_RATE / 100`

LIM40050=`expr 1 \* $STD_TRAFFIC_MAX_RATE / 100`

 

Next we get much more into the guts of the script. You will see a section where I use iptables to “mark” the packets as they come in on the LAN interface, based on the source address. I’ve put this into an if-then construct as I only do this on my router. It looks like:

   iptables -t mangle -A PREROUTING -j MARK -i $mydev_lan -s 192.168.0.51 --set-mark 1

   iptables -t mangle -A PREROUTING -j MARK -i $mydev_lan -s 192.168.0.52 --set-mark 1

 

You only need to change the IP addresses to suit. These have to be “marked” like this on the LAN interface because if we try to do it on the Internet interface the source IP will always be the WAN IP address, since we get the traffic after it has already been NATted.
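
For completeness, the marks are then picked up on the Internet interface by fw filters and steered into the right classes – roughly like this (the classids are illustrative, not the script’s own):

tc filter add dev ppp0 parent 1:0 protocol ip prio 1 handle 1 fw flowid 1:10
tc filter add dev ppp0 parent 1:0 protocol ip prio 2 handle 2 fw flowid 1:20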

 

There is another iptables marking going on for 10:3 further down. That is for an Internet-based first-person shooter game, which needs both a source IP and a source port to classify it. Note that --sport only works together with a protocol match, so -p udp (assuming the game uses UDP) is needed:

   iptables -t mangle -A PREROUTING -j MARK -i $mydev_lan -s 192.168.2.2 -p udp --sport 32768 --set-mark 2

 

Below that there are some specific hardcoded filters to catch TCP ACK packets etc. No changes really needed there.
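
Filters of this kind are usually in the classic Wonder Shaper style – a u32 match for small TCP packets with the ACK bit set, pushed into the interactive class. The script’s version may differ slightly, but it looks roughly like:

tc filter add dev ppp0 parent 1: protocol ip prio 10 u32 \
   match ip protocol 6 0xff \
   match u8 0x05 0x0f at 0 \
   match u16 0x0000 0xffc0 at 2 \
   match u8 0x10 0xff at 33 \
   flowid 1:10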

 

As you go through the script you might see some filters that are commented out, you can uncomment and use them if needed. For example, there is one in there to classify all UDP traffic.

Running it

I normally only run this script manually when I am testing it. It has been designed like an init script. Therefore there are some parameters you supply to it to make it do the right thing. The script has a basic usage message, but normally you will run it like this:

 

tc.sh start

 

The other common mode I use is:

 

tc.sh monitor X

where X is an optional number of seconds

 

I use the monitor mode when I am testing to make sure filters that I am changing are correctly grabbing the traffic. I run up a monitor session and then watch it as I generate the traffic I am trying to filter, making sure that the traffic is going into the right queues.
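
The monitor mode essentially watches the live traffic control statistics; you can get much the same picture by hand with:

tc -s qdisc show dev ppp0
tc -s class show dev ppp0
tc -s filter show dev ppp0
iptables -t mangle -L PREROUTING -v -n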

 

I develop and test the script using a SAMBA mount from my server. I mount up a share under /tmp/smbshare on the router and can run the script directly from there.
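
If you want to do the same, a CIFS mount on the router looks something like this (the server address, share name and credentials are examples only, and your DD-WRT build needs CIFS support):

mkdir -p /tmp/smbshare
mount -t cifs //192.168.0.1/scripts /tmp/smbshare -o username=qos,password=secret
/tmp/smbshare/tc.sh start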

 

The way I implement this on my DDWRT firmware router more permanently is commented right at the top of the script.

Basically I grab all the tc and iptables lines that are created when the script runs and place them into the startup commands section of the DDWRT GUI.

This way, every time the router starts up it is not reliant on my server or SAMBA mount working, plus I can save the settings in a configuration backup.
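
One way to capture those generated lines (with all the variables already expanded) is to trace the script and keep only the tc and iptables commands, e.g.:

sh -x /tmp/smbshare/tc.sh start 2>&1 | grep -E '^\++ (tc|iptables) ' > /tmp/startup-commands.txt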

Tuning the UPLINK and DOWNLINK Speeds

The whole premise of this system is that you throttle your real upload speed (not the speed your ISP says you have) to a little less than the actual so that you are put in control of the queues. If you set the UPLINK value to the same or more than your real uplink speed then you give control away to some other part of the network. Getting the correct value for UPLINK takes a little bit of tuning. Here’s how to do it –

 

What you need:

- a reliable, nearby IP address you can ping continuously
- a way to saturate your uplink for as long as you need (I use several parallel FTP uploads of very large files)
- access to edit the UPLINK value in the script and re-run it on the router

 

Now I’ll show you how I tune my uplink.

Make sure no Internet traffic is happening apart from what you create in this tuning exercise.

We are basically going to try a series of values and see how the ping times react to the values we have chosen. When we are happy with the ping times we can stop.

 

On the router, stop the traffic control:

# /tmp/smbshare/tc.sh stop

Disabling Traffic Control on ppp0:

 

I open a Windows Command Prompt/Linux terminal and ping a reliable/nearby IP address, and leave this continually running. We will watch the ping times as we change the UPLINK value.
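
For example, using the same target as in the output below:

ping 66.102.11.104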

64 bytes from target(66.102.11.104): icmp_seq=1 ttl=57 time=26.4 ms

64 bytes from target(66.102.11.104): icmp_seq=2 ttl=57 time=27.0 ms

64 bytes from target(66.102.11.104): icmp_seq=3 ttl=57 time=28.1 ms

64 bytes from target(66.102.11.104): icmp_seq=4 ttl=57 time=26.8 ms

We can call this ping time of around 27ms our baseline i.e. what we expect with no traffic

 

I start 4 parallel FTP uploads of very large files. This will saturate my bandwidth for as long as I need.

64 bytes from target(66.102.11.104): icmp_seq=16 ttl=57 time=51.0 ms

64 bytes from target(66.102.11.104): icmp_seq=18 ttl=57 time=112 ms

64 bytes from target(66.102.11.104): icmp_seq=19 ttl=57 time=863 ms

64 bytes from target(66.102.11.104): icmp_seq=20 ttl=57 time=1047 ms

64 bytes from target(66.102.11.104): icmp_seq=21 ttl=57 time=1126 ms

64 bytes from target(66.102.11.104): icmp_seq=22 ttl=57 time=1216 ms

64 bytes from target(66.102.11.104): icmp_seq=23 ttl=57 time=1514 ms

64 bytes from target(66.102.11.104): icmp_seq=25 ttl=57 time=1663 ms

64 bytes from target(66.102.11.104): icmp_seq=26 ttl=57 time=1522 ms

64 bytes from target(66.102.11.104): icmp_seq=27 ttl=57 time=1402 ms

The uplink is now starting to become saturated and without any control of the queues the ping responses get lost in the flood of packets. The response gets really bad.

 

Now I take my ISP uplink speed and set my UPLINK variable (in the script) to somewhere around 80% of what my ISP says I have. I chose 800 kbit/sec.

In the script

Set UPLINK=800

Save it. Now reload the router with the traffic control from the samba share:

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 800(776,776)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=29 ttl=57 time=968 ms

64 bytes from target(66.102.11.104): icmp_seq=30 ttl=57 time=839 ms

64 bytes from target(66.102.11.104): icmp_seq=31 ttl=57 time=709 ms

64 bytes from target(66.102.11.104): icmp_seq=32 ttl=57 time=590 ms

64 bytes from target(66.102.11.104): icmp_seq=33 ttl=57 time=482 ms

64 bytes from target(66.102.11.104): icmp_seq=34 ttl=57 time=363 ms

64 bytes from target(66.102.11.104): icmp_seq=35 ttl=57 time=243 ms

64 bytes from target(66.102.11.104): icmp_seq=36 ttl=57 time=123 ms

64 bytes from target(66.102.11.104): icmp_seq=37 ttl=57 time=72.1 ms

64 bytes from target(66.102.11.104): icmp_seq=38 ttl=57 time=70.9 ms

64 bytes from target(66.102.11.104): icmp_seq=56 ttl=57 time=44.0 ms

64 bytes from target(66.102.11.104): icmp_seq=57 ttl=57 time=40.9 ms

64 bytes from target(66.102.11.104): icmp_seq=58 ttl=57 time=29.2 ms

64 bytes from target(66.102.11.104): icmp_seq=59 ttl=57 time=30.4 ms

The ping times start settling back down and the traffic control takes effect

Our ping times are now back around the baseline value. We can now call this figure of 800 our “Safe” value – it’s the one we know will bring us back to reasonable ping times.

 

Set UPLINK=900

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 900(873,873)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=71 ttl=57 time=42.7 ms

64 bytes from target(66.102.11.104): icmp_seq=72 ttl=57 time=43.2 ms

64 bytes from target(66.102.11.104): icmp_seq=73 ttl=57 time=36.7 ms

64 bytes from target(66.102.11.104): icmp_seq=74 ttl=57 time=40.6 ms

64 bytes from target(66.102.11.104): icmp_seq=75 ttl=57 time=44.6 ms

64 bytes from target(66.102.11.104): icmp_seq=76 ttl=57 time=44.3 ms

64 bytes from target(66.102.11.104): icmp_seq=77 ttl=57 time=36.5 ms

64 bytes from target(66.102.11.104): icmp_seq=78 ttl=57 time=43.9 ms

64 bytes from target(66.102.11.104): icmp_seq=79 ttl=57 time=39.2 ms

This has had a slight impact on ping times but they are still reasonably well under control

 

Set UPLINK=1000

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 1000(970,970)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=86 ttl=57 time=48.4 ms

64 bytes from target(66.102.11.104): icmp_seq=87 ttl=57 time=113 ms

64 bytes from target(66.102.11.104): icmp_seq=88 ttl=57 time=202 ms

64 bytes from target(66.102.11.104): icmp_seq=89 ttl=57 time=294 ms

64 bytes from target(66.102.11.104): icmp_seq=90 ttl=57 time=402 ms

64 bytes from target(66.102.11.104): icmp_seq=91 ttl=57 time=501 ms

64 bytes from target(66.102.11.104): icmp_seq=92 ttl=57 time=602 ms

64 bytes from target(66.102.11.104): icmp_seq=93 ttl=57 time=703 ms

64 bytes from target(66.102.11.104): icmp_seq=94 ttl=57 time=802 ms

64 bytes from target(66.102.11.104): icmp_seq=95 ttl=57 time=891 ms

64 bytes from target(66.102.11.104): icmp_seq=96 ttl=57 time=992 ms

You can now see the ping times escalate once more, but this time with traffic control in place. The traffic control is now ineffective and pointless since we have now gone above the real uplink speed

 

Set UPLINK=800

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 800(776,776)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=102 ttl=57 time=1486 ms

64 bytes from target(66.102.11.104): icmp_seq=103 ttl=57 time=1059 ms

64 bytes from target(66.102.11.104): icmp_seq=104 ttl=57 time=943 ms

64 bytes from target(66.102.11.104): icmp_seq=105 ttl=57 time=794 ms

64 bytes from target(66.102.11.104): icmp_seq=106 ttl=57 time=654 ms

64 bytes from target(66.102.11.104): icmp_seq=107 ttl=57 time=538 ms

64 bytes from target(66.102.11.104): icmp_seq=108 ttl=57 time=426 ms

64 bytes from target(66.102.11.104): icmp_seq=109 ttl=57 time=300 ms

64 bytes from target(66.102.11.104): icmp_seq=110 ttl=57 time=176 ms

64 bytes from target(66.102.11.104): icmp_seq=111 ttl=57 time=67.9 ms

64 bytes from target(66.102.11.104): icmp_seq=112 ttl=57 time=36.4 ms

64 bytes from target(66.102.11.104): icmp_seq=113 ttl=57 time=30.8 ms

64 bytes from target(66.102.11.104): icmp_seq=114 ttl=57 time=26.8 ms

64 bytes from target(66.102.11.104): icmp_seq=115 ttl=57 time=27.6 ms

64 bytes from target(66.102.11.104): icmp_seq=116 ttl=57 time=30.8 ms

Set the uplink speed back to our safe value to bring the ping times back under control

 

Set UPLINK=950

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 950(921,921)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=126 ttl=57 time=42.2 ms

64 bytes from target(66.102.11.104): icmp_seq=127 ttl=57 time=92.2 ms

64 bytes from target(66.102.11.104): icmp_seq=128 ttl=57 time=125 ms

64 bytes from target(66.102.11.104): icmp_seq=129 ttl=57 time=164 ms

64 bytes from target(66.102.11.104): icmp_seq=130 ttl=57 time=201 ms

64 bytes from target(66.102.11.104): icmp_seq=131 ttl=57 time=238 ms

This attempt is not as bad as above but the ping times are still very large. This value is too high.

 

Set UPLINK=800

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 800(776,776)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=137 ttl=57 time=73.7 ms

64 bytes from target(66.102.11.104): icmp_seq=138 ttl=57 time=34.6 ms

64 bytes from target(66.102.11.104): icmp_seq=139 ttl=57 time=33.2 ms

64 bytes from target(66.102.11.104): icmp_seq=140 ttl=57 time=32.5 ms

64 bytes from target(66.102.11.104): icmp_seq=141 ttl=57 time=30.2 ms

Set the uplink speed back to our safe value to bring the ping times back under control

 

Split the difference between the known-OK 900 and the known-bad 950.

Set UPLINK=925

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 925(897,897)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=149 ttl=57 time=51.1 ms

64 bytes from target(66.102.11.104): icmp_seq=150 ttl=57 time=61.2 ms

64 bytes from target(66.102.11.104): icmp_seq=151 ttl=57 time=70.4 ms

64 bytes from target(66.102.11.104): icmp_seq=152 ttl=57 time=77.8 ms

64 bytes from target(66.102.11.104): icmp_seq=153 ttl=57 time=89.9 ms

64 bytes from target(66.102.11.104): icmp_seq=154 ttl=57 time=109 ms

64 bytes from target(66.102.11.104): icmp_seq=155 ttl=57 time=123 ms

64 bytes from target(66.102.11.104): icmp_seq=156 ttl=57 time=133 ms

Not as bad as the 950, but still too large. This means we are around the top but still a little over the real uplink speed.

 

Set UPLINK=800

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 800(776,776)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=159 ttl=57 time=51.7 ms

64 bytes from target(66.102.11.104): icmp_seq=160 ttl=57 time=36.6 ms

64 bytes from target(66.102.11.104): icmp_seq=161 ttl=57 time=26.8 ms

64 bytes from target(66.102.11.104): icmp_seq=162 ttl=57 time=41.7 ms

64 bytes from target(66.102.11.104): icmp_seq=163 ttl=57 time=28.1 ms

64 bytes from target(66.102.11.104): icmp_seq=164 ttl=57 time=28.9 ms

Set the uplink speed back to our safe value to bring the ping times back under control

 

Set UPLINK=910

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 910(882,882)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=171 ttl=57 time=123 ms

64 bytes from target(66.102.11.104): icmp_seq=172 ttl=57 time=77.2 ms

64 bytes from target(66.102.11.104): icmp_seq=173 ttl=57 time=68.6 ms

64 bytes from target(66.102.11.104): icmp_seq=174 ttl=57 time=59.6 ms

64 bytes from target(66.102.11.104): icmp_seq=175 ttl=57 time=65.4 ms

64 bytes from target(66.102.11.104): icmp_seq=176 ttl=57 time=48.6 ms

910 brings the ping times down further, but they are still higher than our baseline

 

Set UPLINK=900

# /tmp/smbshare/tc.sh start

Enabling Traffic Control on ppp0 to speed 900(873,873)/20000 kbit/s:

64 bytes from target(66.102.11.104): icmp_seq=190 ttl=57 time=37.5 ms

64 bytes from target(66.102.11.104): icmp_seq=191 ttl=57 time=35.6 ms

64 bytes from target(66.102.11.104): icmp_seq=192 ttl=57 time=40.9 ms

64 bytes from target(66.102.11.104): icmp_seq=193 ttl=57 time=39.7 ms

64 bytes from target(66.102.11.104): icmp_seq=194 ttl=57 time=36.3 ms

64 bytes from target(66.102.11.104): icmp_seq=195 ttl=57 time=43.8 ms

Dropping back to 900 brings the ping times down another level

 

We decide that setting the UPLINK value to 900 is the best setting and we leave it at that.

 

How do you tune the DOWNLINK value? Just set it to a little less than your fastest real download rate, or even leave it set higher than the real rate. This one does not matter as much as the UPLINK value.
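
For what it’s worth, the download side in scripts of this family is usually just a simple ingress policer that drops packets arriving faster than the DOWNLINK rate (I’m not claiming this is the exact line used in tc.sh) – roughly:

tc qdisc add dev ppp0 handle ffff: ingress
tc filter add dev ppp0 parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 20000kbit burst 20k drop flowid :1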

Final Notes

Wow, that ended up being longer than I thought it would be!

If you need some help or want to make some comments directly to me, use this contact form