Chintan Patel, Co-Founder and Technology Lead at TrialX

How Twitter DDoS’d our Website 61 times in Past 2 days!!

We just spent our entire Sunday digging through server logs and conducting a post-mortem analysis of why our website (www.trialx.com) kept going down. According to pingdom.com, our site went down 61 times in the past 2 days! Our traffic numbers are very modest (we receive a few hundred thousand visits per month), so we weren’t sure what was causing the frequent meltdowns.

Here is some background on our technology stack:

  • Hardware: Amazon EC2 High CPU medium (c1.medium) instance with 5 EC2 compute units and 1.7GB RAM
  • OS: Ubuntu (Kernel: 2.6.21.7-2.fc8xen # i686 GNU/Linux)
  • App stack: Apache, mod_python, Django/Python and PHP, MySQL

It all started on Friday evening, when we got a server down alert.

The standard procedure we follow is to go to the Amazon EC2 console and reboot the server, and everything goes back to normal. And it did work. However, every few hours the server kept going down. By Saturday morning, we started looking into the server logs and htop’ing to see what was up. We noticed that MySQL and Apache were at times fighting for CPU, pushing the load above 1. We decided it was time to move the database to a separate server, and Amazon Relational Database Service (RDS) seemed like a great alternative. The move involved firing up a new RDS instance (db.m1.small), mysqldump’ing the data from the current DB and piping it into MySQL on RDS, then changing the connection parameters. The entire move took less than 2 hours of downtime. We were back up with RDS fire-power! It was 3:00AM Sunday morning by then.
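A load figure such as "above 1" only makes sense relative to the number of CPUs. The sketch below shows one way to normalize it. The helper names `load_per_cpu` and `current_load_per_cpu` are illustrative and do not come from any tool, and `os.getloadavg` is Unix-only.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Divide a load average by the CPU count; around 1.0 means every CPU is busy on average."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def current_load_per_cpu() -> float:
    """1-minute load average of this machine, normalized per CPU (Unix only: os.getloadavg)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

On Linux, the load average also counts processes stuck waiting on disk. Heavy swapping can therefore push the load up even when the CPUs are not the bottleneck.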

Unfortunately, the excitement did not last long. By 4:30am the server went down again. We rebooted the server, tail -f’ed the Apache server log and started looking at every IP hitting our server. The major hitters were the usual suspects: Googlebot, Bingbot, Yandex, and so on. We isolated one nasty MSNbot (wtf?) that was hitting us 2-3 times per second. Not cool. We had to politely ban it in our iptables.
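Reading a live log by eye works, but counting hits per client is quicker. Below is a minimal sketch that assumes Apache's standard "combined" log format. The function name `top_talkers` is hypothetical, and the sketch is not the tooling we actually used.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_talkers(lines, n=10):
    """Count requests per (IP, user agent) pair and return the n busiest pairs."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # silently skip lines that are not in combined format
            counts[(m.group("ip"), m.group("agent"))] += 1
    return counts.most_common(n)
```

Usage, assuming the Ubuntu default log location: `with open("/var/log/apache2/access.log") as f: print(top_talkers(f))`.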

In the afternoon, we headed to Chinatown to get some dim sum at Jing Fong, but there were about half a billion folks waiting (no kidding, the place seats about 300 people and there was still an hour wait!).

After we came back, we started benchmarking with the Apache ab tool to simulate load on different URLs of the website. The server did not budge, even at 10 requests per second. We implemented caching on the PHP/WordPress pages using WP Super Cache. No effect. The server kept going down.

We looked at the access logs at the timestamps just before the server went down, but still nothing stood out. We turned on the MySQL slow query log in RDS. Again, nothing stood out. Some geo-IP based queries rendering the TrialX widget took about 1-2 seconds, but there were very few of these.

Around 4:30pm, three of us were watching the Apache access log, htop, the slow query log and Apache mod_status.

And suddenly, we noticed the load shooting up to 2, 3, 4, 10, 30…!! The Apache logs were showing user-agent strings like “TwitterBot”, “TweetmemeBot”, “PostRank” and “PyCURL”, all of them accessing URLs from our blog with the same Google Analytics parameters:

GET  /curetalk/2011/02/graft-rejection-treatments-clinical-trials-2/?utm_source=twitterfeed&utm_medium=twitter&utm_campaign=byte_diab

The htop output looked something like the image below. Essentially, the server went from a load of almost 0 to 31 in a matter of a few seconds. Even swap memory was exhausted. And the site crumbled like an eggshell shattered by a hammer!

We formed a tentative hypothesis that Twitterfeed, a service we use to blast updates out to Facebook and our Twitter accounts, was the culprit behind the massive load that was bringing the server to its knees.

To test the hypothesis, we rebooted the server again and set Twitterfeed to post at a given time, and there you go: we started getting a flurry of Twitter-related bots accessing the same URL.

Below are the user agents that hit our server almost instantly after a Twitterfeed update:

  1. Twitterbot/0.1
  2. TweetmemeBot
  3. bitlybot
  4. PostRank/2.0 (postrank.com)
  5. SocialMedia Bot/1.5;
  6. PycURL/7.18.2
  7. Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp
  8. brainbot/1.0 (digitalbrain@brainstreammedia @digitalbrain
  9. Summify/1.0.1
  10. NING/1.0
  11. Java/1.6.0_07
  12. EarlyEdd http://earlyedd.com – Alpha
  13. Voyager/1.0
  14. kame-rt (support@backtype.com
  15. UnwindFetchor/1.0 (+http://www.gnip.com/)
  16. JS-Kit URL Resolver, http://js-kit.com/

Moreover, each of these bots was accessing the 5 most recently published blog posts. As a result, we were receiving about 16×5 = 80 requests within 2-3 seconds of each update that went out through Twitterfeed. It was almost as if the site was under a DDoS attack. So many bots hitting the server at once caused the load to shoot up, followed by a massive meltdown (see image below).
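Spelled out, these numbers imply the following burst rate:

```latex
16\ \text{bots} \times 5\ \text{posts} = 80\ \text{requests},
\qquad
\frac{80}{3\,\mathrm{s}} \approx 27\ \text{req/s}
\quad\text{to}\quad
\frac{80}{2\,\mathrm{s}} = 40\ \text{req/s}
```

That is roughly 3-4 times the 10 requests per second the server handled comfortably in our ab benchmark. The burst also lands on freshly published pages.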

As soon as our site's feed pushes updates to Twitter, an army of Twitter bots (which presumably also followed our account automatically) hits the website: some 80 requests in about 2 seconds. Just like a pack of wolves upon the scent of blood!

We now knew the issue was related to the Twitterfeed update, but we were curious whether the real culprit was Twitterfeed or Twitter itself.

So we opened up our browsers, logged into our Twitter.com account in 3 tabs, and pasted links to our website into tweets. We simulated an update for 9 new URLs, and there it was again… the load went up… 2, 3… 4… 10… We looked at the logs, and mostly the same user agents were hitting the server.

Bottom Line/Lesson Learnt

We learnt (in a very painful way) that real-time updates on Twitter were almost instantly “inviting” a set of bots to our website. Those bots visited a set of URLs, and that was bringing the server to its knees. This happens because all these bots arrive within a small span of 2-3 seconds, making on the order of a hundred requests at once. Crawlers like Googlebot or Bingbot, by contrast, follow a randomized visit pattern.
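The difference between a burst and a spread-out crawl shows up clearly when you measure the peak number of requests in any short window instead of the average rate. This is a minimal sketch. The function name is illustrative, and the 3-second default window is taken from the 2-3 second bursts described above.

```python
from collections import deque


def peak_in_window(timestamps, window_seconds=3.0):
    """Largest number of requests falling inside any window of `window_seconds`.

    `timestamps` must be sorted ascending, in seconds (e.g. Unix time).
    """
    recent = deque()
    peak = 0
    for t in timestamps:
        recent.append(t)
        # Drop requests that fall outside the window (t - window_seconds, t].
        while recent[0] <= t - window_seconds:
            recent.popleft()
        peak = max(peak, len(recent))
    return peak
```

Suppose a Twitterfeed-style burst delivers 80 requests in 2 seconds. Its peak for a 3-second window is 80. A crawler spreading the same 80 requests over most of an hour peaks at 1, even though both send the same total.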

Now, can this scenario have broader implications? We certainly think so. Twitter’s strength as a real-time broadcast medium also makes it a potential vulnerability. With just a few tweets all pointing to a specific target website, it’s possible to direct the bots to simultaneously hit the URLs in those tweets and thus bring down a site.

Future Directions

We have stopped Twitterfeed for now, until we figure out a caching solution for our new URLs. We certainly don’t want to stop Twitter updates, as they bring decent traffic to our website. We may implement some robot rules so that the Twitter Bot Army behaves politely when it visits us.
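The bots were often the first visitors to a new page, so every one of their requests was a cache miss. One way to address this is to request each new URL yourself, one at a time, before it is announced. The sketch below assumes the cache stores a page after its first full render. Whether the cache treats the `utm_` query string as a separate key depends on how it is configured, so warming the exact URLs that will be tweeted is the safer choice. `prewarm`, `fetch_status` and the `cache-warmer/0.1` user agent are hypothetical names.

```python
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request a URL once, read the full body, and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": "cache-warmer/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # read the whole page so it is fully rendered (and cached) server-side
        return resp.status


def prewarm(urls: Iterable[str], fetch: Callable[[str], int] = fetch_status) -> Dict[str, object]:
    """Fetch URLs sequentially so each first (uncached) render happens before the bots arrive."""
    results: Dict[str, object] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # record the failure and keep warming the remaining URLs
            results[url] = exc
    return results
```

The requests are deliberately sequential. The server pays for one cold render at a time instead of dozens at once, as happened with the bot burst.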

Please let us know if you have any suggestions to handle such traffic/load scenarios. We’d like to hear your solutions.

Engineering @ TrialX

  • Arvind

    Is your server not caching properly?

    • conadmin

      No, we do have a cache. However, many of these bots requested pages that had never been requested before, so the cache didn’t contain them yet, and those requests became performance hogs.

  • Pingback: Tweets that mention How Twitter DDoS’d our Website 61 times in Past 2 days!! -- Topsy.com

  • MArk

    Okay, sounds cool and interesting, but how is this a DDoS?

  • http://patrickmylund.com Patrick Mylund Nielsen

    You should reduce the number of processes that Apache will spawn. Your server is not necessarily dying because it can’t handle the load. It dies because, at some point, enough processes are spawned to make the server run out of swap space. (It shouldn’t be using swap memory in the first place, since swapping to disk is extremely inefficient.)

    It’s better to let some clients wait for previous requests to complete than to spawn so many processes that the server simply chokes. That is what happens as soon as you run out of RAM and start swapping.

    An example (if you’re using the Apache prefork MPM), in apache2.conf/httpd.conf:

    StartServers 5
    MinSpareServers 2
    MaxSpareServers 10
    MaxClients 15
    MaxRequestsPerChild 0

    Try running ab with high concurrency, e.g. “sudo ab -n 3000 -c 100 http://localhost/”, to see whether you have RAM to spare when the process limit is hit. Then increase the values accordingly, MaxClients in particular, since it is the total number of processes allowed.
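The sizing rule behind this advice is simple arithmetic: divide the RAM left over for Apache by the memory one child process uses. This is a rough sketch. The function name is hypothetical. The 300 MB reserve and 40 MB per process in the example are made-up illustration values, not measurements from this server.

```python
def max_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Largest number of Apache child processes that fits in RAM without swapping.

    total_ram_mb:   physical memory on the machine
    reserved_mb:    memory kept aside for the OS and any other services
    per_process_mb: typical resident size of one Apache child (measure it, e.g. RES in htop)
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(available // per_process_mb, 0)
```

For example, treating 1.7 GB as roughly 1700 MB, `max_clients(1700, 300, 40)` gives 35. The real per-process size with mod_python and PHP loaded should be measured under load before picking a value.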

    Hope that helps.

    • chintan

      Pat, thanks a lot for this. We are going to make this change right away – we have been going down because of HN-driven traffic. Will ping you with the results.
      Thanks once again

  • http://twitter.com/mikegreenberg/ Mike

    Intentional or not, this is a Distributed Denial of Service in the truest sense of the word. “Many disjoint resources accessing the same resource all at once causing it to be unavailable for others to access.” That’s pretty much what’s happening.

    • Sk

      I sort of think the same…
      Taking it further, one could easily use such Twitter bots to target sites. For example, suppose someone has just 10 Twitter accounts, each followed by, say, 20 of these bots. It is then easy to send, say, 20 tweets from each account with URLs like

      1. user 1 – http://BringDownSite.com/1
      2. user 2 – http://BringDownSite.com/1
      3. user 3 – http://BringDownSite.com/2

      So that is 10 accounts × 20 tweets × 20 bots = 4,000 requests that the bots could send to BringDownSite.com in a burst.

  • http://www.lifeisaprayer.com/ Jeff Geerling

    It looks like you’re using quite a few different components/features on this site via WordPress plugins… you might want to look at how you can reduce the amount of PHP overhead you’re inducing by running all the different plugins. You could serve more ‘cold cache’ requests if the server didn’t have to load up all the different things on each request.

    At a minimum, make sure you have APC or some other PHP opcode cache (in addition to other caches like WP Super Cache…).

  • http://www.virvo.com Virvo

    The problem with using EC2 is that you are using EC2.

    This is a VPS, not a dedicated server, so don’t expect it to perform like one. Try using a real dedicated server, and you will not have this problem.

    TBH, if your site goes down at 40 requests per second, you REALLY need to increase your server specs.

    • Chintan

      Thanks, good to know about the EC2 server capabilities. Yes, we are definitely looking into both a server upgrade and a system upgrade.

  • Pingback: Liens de la Journée du 25/02/2011

  • Melissa

    Try configuring your .htaccess file to block these user agents.

  • Pingback: RIP Google Health. A Heartful Retrospect from an App Developer.

  • Mike

    Increase the RAM for your instance, and install and configure APC as Jeff suggested. You might also want to try Varnish as a caching solution. It is pretty good, and there are lots of tutorials online on how to install and configure it for WordPress and other CMSs. Good luck. Varnish can also be used to load balance requests, though I have not tried this myself.

  • Angel

    Hello, I have the same problem. I don’t know how to block these bots. Because of them, my sites get very bad traffic, and that is why my AdSense payment was put on hold.

    I tried blocking the bots’ IPs and using robots.txt, but no luck.

    Hope someone can help me block these bots.

    thanks

    Kindly email the solutions here: einjhel25m@yahoo.com

    Thanks