PHP: checking link validity using simulated multi-threading

January 12th, 2012
by Axel

I recently ran into a situation where I had to check the validity of 1800 links. Doing it manually was out of the question, so I decided to write some PHP code to check these URLs using the PHP get_headers function. Checking a URL seems easy, but each request takes a small amount of time because it calls remote resources: DNS lookup, connection time, download time… Using a simple "while" loop in PHP, you may come across three issues:

  1. it may take a very long time to complete, and it is not always possible to increase the PHP maximum execution time
  2. the display: your web page stays blank for quite a long time until enough data has been processed. Of course you may play with output buffering, but still!
  3. the process: as PHP is not a multi-threaded language out of the box, it checks links one at a time, which takes much longer…
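
To make the sequential approach concrete, here is a minimal sketch of a loop built on get_headers. The function names (finalStatusCode, isLinkValid, checkSequentially) are made up for this illustration and are not taken from the downloadable script; treating any 2xx or 3xx final status as "valid" is also an assumption.

```php
<?php
// Hypothetical helper: returns the last HTTP status code found in the
// array produced by get_headers(), or null if no status line is present.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // When redirects are followed, get_headers() returns one status
        // line per hop; the last one describes the final response.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: a link counts as valid when the final status is 2xx or 3xx.
function isLinkValid(string $url): bool
{
    $headers = @get_headers($url); // false on DNS failure, refused connection...
    if ($headers === false) {
        return false;
    }
    $code = finalStatusCode($headers);
    return $code !== null && $code >= 200 && $code < 400;
}

// The naive approach: one link after the other, each call blocking
// until the remote server has answered.
function checkSequentially(array $urls): array
{
    $results = [];
    foreach ($urls as $url) {
        $results[$url] = isLinkValid($url);
    }
    return $results;
}
```

With 1800 URLs, the total run time of checkSequentially is the sum of all the individual response times, which is exactly what leads to the three issues listed above.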

Therefore, I developed a very simple piece of code to solve these issues. As it works quite smoothly, I decided to share it with you.

How does it work? Let's take an example: you are producing video content. Because it is highly recommended, you also post your content on popular video sharing websites: YouTube, Dailymotion, Kewego. You want to be able to check whether all your videos are still available on all these websites.

Use my application then! Edit the "index.php" file, define one group per sharing website you use (in my case: 3) and feed the Checker class with items. Open the application in your favorite browser. You will see the list of all the items that need to be checked.

Sorry about the design, I did it quickly ;-)
By default, the Checker will use 5 concurrent threads. You may change this value when instantiating the Checker:

// In index.php
// Using 8 threads
$app = new Checker(8);
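
To get a feel for what the number of concurrent checks changes, here is a small simulation. It is only a sketch: estimateTotalTime is a hypothetical helper, not part of the Checker class, and the durations passed to it are assumed values rather than measurements.

```php
<?php
// Hypothetical helper: estimates the wall-clock time needed to run all
// checks when at most $slots of them are in flight at the same time.
// Each pending check is handed to the slot that becomes free first.
function estimateTotalTime(array $durations, int $slots): float
{
    if ($slots < 1) {
        throw new InvalidArgumentException('At least one slot is required.');
    }
    $busyUntil = array_fill(0, $slots, 0.0);
    foreach ($durations as $duration) {
        // Index of the slot that frees up earliest.
        $next = array_search(min($busyUntil), $busyUntil, true);
        $busyUntil[$next] += $duration;
    }
    return max($busyUntil);
}
```

Assuming every check takes half a second, estimateTotalTime(array_fill(0, 1800, 0.5), 1) gives 900 seconds, while the same list with 5 slots gives 180 seconds. Real response times vary, so these figures are only an order of magnitude.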

You may now run the tests. Checks will be run using AJAX.
Tip: when you click on a pending check, it runs immediately. Once a check has been processed, you may click on it to browse to the corresponding URL.
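
On the server side, an AJAX-driven checker only needs to handle one link per request; the browser can then keep several such requests in flight at once. The sketch below shows what such an endpoint could look like. It is not the code from the archive: the file name check.php, the url query parameter, the checkOne function and the JSON shape are all assumptions made for this example.

```php
<?php
// check.php (hypothetical name): checks ONE URL per request, answers in JSON.

// Runs a single check. $fetch can be replaced by a stub so the function
// can be exercised without any network access.
function checkOne(string $url, ?callable $fetch = null): array
{
    // Only accept well-formed http(s) URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false || !preg_match('#^https?://#i', $url)) {
        return ['url' => $url, 'ok' => false, 'status' => null];
    }

    $fetch = $fetch ?? static function (string $u) {
        return @get_headers($u);
    };
    $headers = $fetch($url);

    // Keep the last status line: it describes the final response after redirects.
    $status = null;
    if (is_array($headers)) {
        foreach ($headers as $line) {
            if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
                $status = (int) $m[1];
            }
        }
    }

    $ok = $status !== null && $status >= 200 && $status < 400;
    return ['url' => $url, 'ok' => $ok, 'status' => $status];
}

// Only answer when called through the web server with a "url" parameter.
if (PHP_SAPI !== 'cli' && isset($_GET['url'])) {
    header('Content-Type: application/json');
    echo json_encode(checkOne((string) $_GET['url']));
}
```

Because each request is short, the maximum execution time is no longer a concern, and the page can update each item as soon as its own answer comes back. An endpoint exposed publicly would be safer accepting the identifier of a known item rather than an arbitrary URL.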

Now the essential part: to download this script, click here: link_checker-0.1.zip

Note: obviously this script is not the best one around, it is not the cleanest I have written, and it will not suit everybody. The purpose is just to demonstrate how to simulate multi-threading support. Of course you may change it to fit your needs.

