In this post, I describe a process for retroactively identifying and graphing an HTTPS DDoS condition. Why do we care about graphing? Because it can be a great way to present data to folks who may not be interested in looking at it in tabular form, e.g. leadership. The specific example uses data collected from the server this blog is hosted on. If you are following along, this post assumes you have SiLK deployed in some manner and are collecting HTTP or similar traffic. Technically, a DDoS condition did not occur (only two hosts were making a large number of requests), but blitz.io was used to exceed the traffic this website typically experiences for the sake of example. A true DoS condition would appear differently: the sensor is hosted on the same node, so it would not record the surge in traffic. To record a true DoS, the sensor would ideally be placed upstream at the carrier, or somewhere that exceeds the capacity of the devices being monitored. I would like to thank network defense analyst Geoffrey Sanders for providing R language and statistical recommendations that improved the data analysis and graphical representations.
If we would like to retroactively search for anomalies in traffic volume, we can query a number of days and look for unusual spikes:
for DAY in {1..31}; do
  if [ ${DAY} -le 9 ]; then
    DAY=0${DAY}
  fi
  RESULT=$(rwfilter --start-date=2015/07/${DAY} --end-date=2015/07/${DAY} --dport=443 --pass=stdout --type=all|rwuniq --fields=dport,proto --values=records --no-col --no-final-del --no-title --packets=20-)
  echo "2015/07/${DAY}|${RESULT}" >> http.out
done
On the 21st, we see record and packet counts that significantly exceed those of other days (the 9th shows a similar spike):
2015/07/01|443|6|1443|33287
2015/07/02|443|6|1271|30583
2015/07/03|443|6|1776|32622
2015/07/04|443|6|1498|28316
2015/07/05|443|6|1124|34428
2015/07/06|443|6|1672|36113
2015/07/07|443|6|1298|31087
2015/07/08|443|6|1629|40990
2015/07/09|443|6|42005|750922
2015/07/10|443|6|1656|54450
2015/07/11|443|6|1464|40205
2015/07/12|443|6|1279|22251
2015/07/13|443|6|1884|40887
2015/07/14|443|6|1724|49821
2015/07/15|443|6|1635|37133
2015/07/16|443|6|1653|33433
2015/07/17|443|6|1695|37580
2015/07/18|443|6|1301|24899
2015/07/19|443|6|1445|29230
2015/07/20|443|6|1314|40543
2015/07/21|443|6|70533|817855
2015/07/22|443|6|1909|42257
2015/07/23|443|6|1462|47961
2015/07/24|443|6|1705|37581
2015/07/25|443|6|1150|27093
2015/07/26|443|6|1208|21267
2015/07/27|443|6|1597|32414
2015/07/28|443|6|1714|45208
2015/07/29|443|6|1702|35607
2015/07/30|443|6|1710|46748
2015/07/31|443|6|1514|47915
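As a quick aside (not part of the original workflow), rather than eyeballing the table we can load http.out into R and flag days that sit far above the monthly median; the 5x multiplier is arbitrary and only for illustration:

# Flag days whose record count is far above the monthly median
daily <- read.table("http.out", sep = "|",
                    col.names = c("day", "dPort", "protocol", "Records", "Packets"))
daily[daily$Records > 5 * median(daily$Records), ]

On the July data this returns the 9th and the 21st.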
We can use a similar query broken down by hour for the questionable day:
for HOUR in {0..23}; do
  if [ ${HOUR} -le 9 ]; then
    HOUR=0${HOUR}
  fi
  RESULT=$(rwfilter --start-date=2015/07/21:${HOUR} --end-date=2015/07/21:${HOUR} --dport=443 --pass=stdout --type=all|rwuniq --fields=dport,proto --values=records --no-col --no-final-del --no-title --packets=20-)
  echo "${HOUR}|${RESULT}" >> http-hour.out
done
The results from the hourly query clearly depict when the surge of HTTPS traffic occurred. From here, an analyst may run more specific queries to determine whether it is indeed a distributed attack or is sourced from only a few nodes.
00|443|6|72|1392
01|443|6|48|3203
02|443|6|151|2605
03|443|6|173|1612
04|443|6|125|2318
05|443|6|149|2622
06|443|6|72|1450
07|443|6|71|1294
08|443|6|73|1524
09|443|6|76|1881
10|443|6|67|1412
11|443|6|823|6720
12|443|6|60|1639
13|443|6|65|1511
14|443|6|72|2987
15|443|6|121|2061
16|443|6|69|2135
17|443|6|67562|722727
18|443|6|203|3222
19|443|6|112|4004
20|443|6|99|1526
21|443|6|122|44746
22|443|6|94|2129
23|443|6|54|1135
We can represent the tabular data from the daily and hourly queries using R and ggplot2. For the daily plot example, add the header day|dPort|protocol|Records|Packets to the dataset, then run Rscript filename.r dataset.dat with the script below and your own dataset file.
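As an aside, if you would rather not edit the data file by hand, read.table can supply the column names at read time; this is a small variation on the script's read call (the dataset.dat name comes from the command above), not part of the original:

# Variation: name the columns at read time instead of adding a header line
flowrecs <- read.table("dataset.dat", header = FALSE, sep = "|",
                       col.names = c("day", "dPort", "protocol", "Records", "Packets"))

The daily script itself: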
library(ggplot2)
library(reshape2)
options("scipen"=100, "digits"=4)
fname <- commandArgs(trailingOnly = TRUE)[1]
flowrecs <- read.table(fname, header = TRUE, sep = "|")
flowrecs$day <- as.Date(flowrecs$day, "%Y/%m/%d")
test_data_long <- melt(flowrecs, id.vars=c("day", "dPort", "protocol"))
flow.plot <- ggplot(data=test_data_long,
                    aes(x=day, y=value, colour=variable)) +
  geom_line() + geom_point() + xlab("Day") + ylab("Flow Records with 20+ Packets") +
  ggtitle("Flow Records by Destination Port")
png("plot.png", width=1200, height=400)
plot(flow.plot)
dev.off()
This should produce a graphical representation similar to the following:
Similarly, we can do the same thing for the hourly plot by specifying the correct header of hour|dPort|protocol|Records|Packets and rerunning Rscript in the same manner as the daily plot.
library(ggplot2)
library(reshape2)
options("scipen"=100, "digits"=4)
fname <- commandArgs(trailingOnly = TRUE)[1]
flowrecs <- read.table(fname, header = TRUE, sep = "|")
flowrecs$hour <- factor(flowrecs$hour, levels=unique(flowrecs$hour))
test_data_long <- melt(flowrecs, id.var=c("hour", "dPort", "protocol"))
flow.plot <- ggplot(data=test_data_long,
aes(x=hour, y=value, colour=variable, group=variable)) +
geom_line() + geom_point() + xlab("Hour") + ylab("Flow Records with 20+ Packets")
ggtitle(paste("Flow Records by Destination Port"))
png("plot.png", width=1200, height=400)
plot(flow.plot)
dev.off()
This depicts the two large surges of requests within the single day:
We can use rwstats to take a look at our top talkers if we are aware of congestion or other signs that the uniformity of visitors has changed. This query is a little artificial, though: the sources of an attack may be hundreds or even thousands of bots, or some reflection mechanism, depending on the service. If that is the case, we may have to look at other tuples, or at the actual requests, in order to determine a similarity between the distributed attack sources.
$ rwfilter --start-date=2015/7/21 --end-date=2015/7/21 --dport=443 --pass=stdout --type=all|rwstats --fields=sip --count=10 --no-col --no-final-del
INPUT: 70533 Records for 551 Bins and 70533 Total Records
OUTPUT: Top 10 Bins by Records
sIP|Records|%Records|cumul_%
54.173.173.209|33704|47.784725|47.784725
54.86.98.210|33698|47.776218|95.560943
162.243.196.54|742|1.051990|96.612933
180.76.15.142|97|0.137524|96.750457
68.180.230.230|75|0.106333|96.856790
180.76.15.140|37|0.052458|96.909248
72.80.60.139|31|0.043951|96.953199
66.249.67.118|31|0.043951|96.997150
180.76.15.136|24|0.034027|97.031177
63.254.26.10|23|0.032609|97.063786
We can append rwresolve in order to resolve a specific IP field. We see two Amazon hosts, which are likely the blitz.io bots, as they comprise roughly 96% of the traffic for the defined time window:
$ rwfilter --start-date=2015/7/21 --end-date=2015/7/21 --dport=443 --pass=stdout --type=all|rwstats --fields=sip --count=10 --no-col --no-final-del|rwresolve --ip-fields=1
INPUT: 70533 Records for 551 Bins and 70533 Total Records
OUTPUT: Top 10 Bins by Records
sIP|Records|%Records|cumul_%
ec2-54-173-173-209.compute-1.amazonaws.com|33704|47.784725|47.784725
ec2-54-86-98-210.compute-1.amazonaws.com|33698|47.776218|95.560943
162.243.196.54|742|1.051990|96.612933
baiduspider-180-76-15-142.crawl.baidu.com|97|0.137524|96.750457
b115504.yse.yahoo.net|75|0.106333|96.856790
baiduspider-180-76-15-140.crawl.baidu.com|37|0.052458|96.909248
pool-72-80-60-139.nycmny.fios.verizon.net|31|0.043951|96.953199
crawl-66-249-67-118.googlebot.com|31|0.043951|96.997150
baiduspider-180-76-15-136.crawl.baidu.com|24|0.034027|97.031177
mail.oswaldcompanies.com|23|0.032609|97.063786
Both of the graphs above describe the anomalous traffic, but our normal traffic is no longer clear. Given the significant outliers, one way we can provide more clarity is to use statistics to describe the data more effectively. To achieve this, we will first use a log function to describe the data volumes. We use the same scripts as earlier, but change y=value to y=log(value).
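Concretely, the only edit is the aesthetic mapping; as a drop-in replacement for the flow.plot assignment in the hourly script above:

# Drop-in replacement: log-transform the plotted value to tame the outliers
flow.plot <- ggplot(data=test_data_long,
                    aes(x=hour, y=log(value), colour=variable, group=variable)) +
  geom_line() + geom_point() + xlab("Hour") +
  ylab("log(Flow Records with 20+ Packets)") +  # relabeled to reflect the transform
  ggtitle("Flow Records by Destination Port")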
While using a log function provided an improvement, it may not accurately represent volume data. Next, we will take a look at percentiles with R. Our data set is composed of three days of hourly packet counts, the middle day being the 21st, which contains our fictitious DoS attack.
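To build that data set, we can read the three days of hourly output into one frame; a sketch, with hypothetical file names for the three days:

# Combine three days of hourly query output (file names are hypothetical)
files <- c("http-hour-20.dat", "http-hour-21.dat", "http-hour-22.dat")
df <- do.call(rbind, lapply(files, read.table, sep = "|",
              col.names = c("hour", "dPort", "protocol", "Records", "Packets")))
# matrix() fills column-wise, so each column becomes one day's 24 hourly values
mydata <- matrix(df$Packets, ncol = 3, nrow = 24)

Printing mydata then shows one column per day: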
> mydata
[,1] [,2] [,3]
[1,] 1252 1392 1551
[2,] 1347 3203 1969
[3,] 749 2605 1642
[4,] 2232 1612 1432
[5,] 707 2318 1531
[6,] 552 2622 1175
[7,] 1072 1450 1981
[8,] 487 1294 1606
[9,] 1448 1524 959
[10,] 867 1881 1763
[11,] 903 1412 1283
[12,] 911 6720 3055
[13,] 1125 1639 3609
[14,] 1511 1511 1977
[15,] 1792 2987 2476
[16,] 912 2061 1722
[17,] 1114 2135 655
[18,] 424 722727 1338
[19,] 3888 3222 4038
[20,] 3646 4004 1765
[21,] 9281 1526 1650
[22,] 1590 44746 1190
[23,] 1131 2129 1190
[24,] 1602 1135 700
Here is how we get to our graph in the R console:
mydata <- matrix(df$Packets, ncol=3, nrow=24)
dataout <- apply(mydata, 1, quantile, probs=c(0.05, 0.5, 0.90))
ylim=range(500,5000)
plot(seq(ncol(dataout)), dataout[1,], t="l", lty=2, ylim=ylim, main="Flow Percentiles", xlab="Hour", ylab="Packets") #5%
lines(seq(ncol(dataout)), dataout[2,], lty=1, lwd=2) #50%
lines(seq(ncol(dataout)), dataout[3,], lty=2, col=2) #90%
legend("topleft", legend=rev(rownames(dataout)), lwd=c(1,2,1), col=c(2,1,1), lty=c(2,1,2))
The 90th percentile really stands out here, but having to set a y-axis limit to see our lower percentiles prevents us from seeing the whole picture. Let us graph again, splitting our lower and upper percentiles into separate plots.
mydata <- matrix(df$Packets, ncol=3, nrow=24)
dataout <- apply(mydata, 1, quantile, probs=c(0.01, 0.05, 0.5))
ylim=range(100,4200)
plot(seq(ncol(dataout)), dataout[1,], t="l", lty=2, ylim=ylim, main="Flow Percentiles", xlab="Hour", ylab="Packets") #1%
lines(seq(ncol(dataout)), dataout[2,], lty=1, lwd=2) #5%
lines(seq(ncol(dataout)), dataout[3,], lty=2, col=2) #50%
legend("topleft", legend=rev(rownames(dataout)), lwd=c(1,2,1), col=c(2,1,1), lty=c(2,1,2))
mydata <- matrix(df$Packets, ncol=3, nrow=24)
dataout <- apply(mydata, 1, quantile, probs=c(0.90, 0.95))
ylim=range(dataout)
plot(seq(ncol(dataout)), dataout[1,], t="l", lty=2, ylim=ylim, main="Flow Percentiles", xlab="Hour", ylab="Packets") #90%
lines(seq(ncol(dataout)), dataout[2,], lty=1, lwd=2) #95%
legend("topleft", legend=rev(rownames(dataout)), lwd=c(1,2,1), col=c(2,1,1), lty=c(2,1,2))
This is a better analytical view. We can infer that if we see traffic at the 90th percentile, something is likely off. For the heck of it, let us see how the percentiles compare to the mean and median; the latter is not strictly necessary, since the median is simply the 50th percentile we already included two examples above, but nevertheless.
ylim=range(500,5000)
mydata <- matrix(df$Packets, ncol=3, nrow=24)
meandata <- apply(mydata, 1, mean)
mediandata <- apply(mydata, 1, median)
plot(meandata, t="l", lty=2, ylim=ylim, main="Flows", xlab="Hour", ylab="Packets")
lines(mediandata, lty=1, lwd=2)
legend("topleft", legend=c("Mean", "Median"), lwd=c(1,2), col=c(1,1), lty=c(2,1))
The outliers are obvious in the tabular data, but we will graph it anyway. Based on this, we could leverage roughly the 5th through 90th percentiles to bound normal traffic, but a much larger sample would be needed, as we only used three days.
> meandata
[1] 1398.333 2173.000 1665.333 1758.667 1518.667 1449.667
[7] 1501.000 1129.000 1310.333 1503.667 1199.333 3562.000
[13] 2124.333 1666.333 2418.333 1565.000 1301.333 241496.333
[19] 3716.000 3138.333 4152.333 15842.000 1483.333 1145.667
> mediandata
[1] 1392 1969 1642 1612 1531 1175 1450 1294 1448 1763 1283 3055 1639 1511 2476
[16] 1722 1114 1338 3888 3646 1650 1590 1190 1135
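Before moving on, here is a minimal sketch (not part of the original workflow) of turning the per-hour baseline into an alert. With only three samples per hour, high percentiles degenerate toward the maximum, so the sketch compares against the hourly median with an arbitrary 10x multiplier; a real baseline would span weeks and exclude the day under test:

# Per-hour baseline from the three-day matrix built earlier
baseline <- apply(mydata, 1, median)
# Flag hours on the 21st (column 2) running at 10x their typical volume;
# the multiplier is arbitrary and chosen only for illustration
which(mydata[, 2] > 10 * baseline) - 1  # 1-based rows -> hours of day

On the matrix above this returns hours 17 and 21, the two surges we identified earlier.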
Last but not least, we are going to take a look at Rayon, a tool the NetSA team has developed for graphically representing data. I will provide a quick reference, but you can find more details here and here. As with R, the outliers distort the graph, so we use log scales for the second and third graphs to minimize the outlier effects. First, grab the data we are interested in:
$ rwfilter --start-date=2015/7/21 --end-date=2015/7/21 --dport=443 --proto=6 --type=inweb --pass=httpsin.bin
$ rwfilter --start-date=2015/7/21 --end-date=2015/7/21 --dport=443 --proto=6 --type=outweb --pass=httpsout.bin
Next, export the values we need; a snippet of the data follows:
$ rwcount --bin-size=300 --no-titles --delimited httpsin.bin|awk -F\| '{printf("%s|%s|in\n", $1, $3)}' > 2-top.txt
--snip--
2015/07/21T00:00:00|5003.00|in
2015/07/21T00:05:00|16677.47|in
2015/07/21T00:10:00|4814.53|in
2015/07/21T00:15:00|4951.00|in
2015/07/21T00:20:00|1440.00|in
2015/07/21T00:25:00|10055.00|in
2015/07/21T00:30:00|5410.06|in
2015/07/21T00:35:00|1356.94|in
2015/07/21T00:40:00|4346.32|in
2015/07/21T00:45:00|10125.04|in
2015/07/21T00:50:00|7178.64|in
2015/07/21T00:55:00|16766.00|in
--snip--
$ rwcount --bin-size=300 --no-titles --delimited httpsout.bin|awk -F\| '{printf("%s|%s|out\n", $1, $3)}' > 2-btm.txt
--snip--
2015/07/21T00:00:00|60615.00|out
2015/07/21T00:05:00|317387.87|out
2015/07/21T00:10:00|214138.13|out
2015/07/21T00:15:00|60527.00|out
2015/07/21T00:20:00|3500.00|out
2015/07/21T00:25:00|76385.00|out
2015/07/21T00:30:00|77113.44|out
2015/07/21T00:35:00|32326.56|out
2015/07/21T00:40:00|39375.58|out
2015/07/21T00:45:00|96888.67|out
2015/07/21T00:50:00|30598.75|out
2015/07/21T00:55:00|313460.00|out
--snip--
We graph the values with rytimeseries. As expected, the incoming traffic is less than the outgoing HTTPS responses. We adjusted the scale of the second and third graphs using log, and the last Rayon graph describes data between the 90th and 95th percentiles.
$ cat 2-top.txt 2-btm.txt | rytimeseries --style=filled_lines --output-path=2.png --top-filter="[2]==in" --bottom-filter="[2]==out" --top-column=1 --bottom-column=1 --annotate-max --value-tick-label-format=metric --value-units=B --title="Traffic to/from Web Servers" --value-scale=linear
$ cat 2-top.txt 2-btm.txt | rytimeseries --style=filled_lines --output-path=2.png --top-filter="[2]==in" --bottom-filter="[2]==out" --top-column=1 --bottom-column=1 --annotate-max --value-tick-label-format=metric --value-units=B --title="Traffic to/from Web Servers" --value-scale=log --fix-scale-min=1
$ cat 2-top.txt 2-btm.txt | rytimeseries --style=filled_lines --output-path=2.png --top-filter="[2]==in" --bottom-filter="[2]==out" --top-column=1 --bottom-column=1 --annotate-max --value-tick-label-format=metric --value-units=B --title="Traffic to/from Web Servers" --value-scale=clog
$ cat 2-top.txt 2-btm.txt | rytimeseries --style=filled_lines --output-path=2.png --top-filter="[2]==in" --bottom-filter="[2]==out" --top-column=1 --bottom-column=1 --annotate-max --value-tick-label-format=metric --value-units=B --title="Traffic to/from Web Servers" --value-scale=linear --value-min-pct=90 --value-max-pct=95
There you go. A quick and dirty way to identify traffic surges to whatever services you have sitting behind your collector. Please leave any questions you have regarding this post below.