Multithreading can improve a program's efficiency dramatically. For example, a web crawler I recently wrote can crawl one million URLs (parsing each URL, doing the DNS lookup, sending and receiving over a socket, parsing the response, etc.) in a minute. Here I list the key pieces so that I can share them and reuse them in the future.
Shared data
Threads need some shared data: for example, a CRITICAL_SECTION object that handles locking and unlocking (usually much faster than a Win32 mutex, because an uncontended critical section is acquired in user mode without a kernel call), and the data to be consumed (like a queue containing all URLs to be crawled). It's recommended to create a class that holds this shared data. Note that InitializeCriticalSection() must be called in the constructor of this class before the critical section can be used, and DeleteCriticalSection() should be called in its destructor.
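For comparison, here is a minimal portable sketch of such a class in standard C++ (C++11 or later). It swaps in std::mutex for the CRITICAL_SECTION and std::atomic<long> for the interlocked counters; the class and member names are assumptions modeled on the fields used in this post, not the crawler's actual code. Because std::mutex initializes itself in its own constructor, no call corresponding to InitializeCriticalSection() is needed.

```cpp
#include <atomic>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

// Hypothetical portable version of the shared-data class.
// std::mutex plays the role of the CRITICAL_SECTION; its constructor and
// destructor do the work of InitializeCriticalSection/DeleteCriticalSection.
class SharedData {
public:
    explicit SharedData(const std::vector<std::string>& seeds) {
        for (const auto& url : seeds) {
            urlsQueue.push(url);  // URLs waiting to be crawled
        }
    }

    std::mutex cs;                          // guards urlsQueue
    std::queue<std::string> urlsQueue;      // data to be consumed
    std::atomic<long> numExtractedURLs{0};  // stats, updated without the lock
    std::atomic<long> numActiveThreads{0};
};
```

A single SharedData object is created by the main thread and a pointer to it is handed to every worker, just as &sharedData is passed to CreateThread.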
Create/close threads
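Threads can be created and closed as below. threadStat and threadCrawl are user-defined thread functions. CreateThread expects them to be declared as DWORD WINAPI threadCrawl(LPVOID pParam); with that signature no cast to LPTHREAD_START_ROUTINE is needed, whereas casting a function declared as UINT threadCrawl(LPVOID pParam) can hide a calling-convention mismatch in 32-bit builds. The crawling threads are waited for first, and only then is the stats thread told to quit, here through eventQuit, a hypothetical manual-reset event in the shared-data class that threadStat is assumed to poll.
HANDLE *handles = new HANDLE[numThreads + 1];
// start stats thread
handles[numThreads] = CreateThread(NULL, 0, threadStat, &sharedData, 0, NULL);
// start N crawling threads
for (int i = 0; i < numThreads; i++)
{
    handles[i] = CreateThread(NULL, 0, threadCrawl, &sharedData, 0, NULL);
}
// wait for N crawling threads to finish
for (int i = 0; i < numThreads; i++)
{
    WaitForSingleObject(handles[i], INFINITE);
    CloseHandle(handles[i]);
}
// signal stats thread to quit (eventQuit: hypothetical manual-reset event
// created with CreateEvent), then wait for it to terminate
SetEvent(sharedData.eventQuit);
WaitForSingleObject(handles[numThreads], INFINITE);
CloseHandle(handles[numThreads]);
delete[] handles;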
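The same create/wait/close pattern can be sketched portably with std::thread, where join() does the job of WaitForSingleObject plus CloseHandle. This is a swap-in for the Win32 calls, not the crawler's code: runCrawlers is a hypothetical function, the crawling work is reduced to incrementing a counter, and the stats thread's quit signal is a flag guarded by a std::mutex and a std::condition_variable, standing in for a Win32 event.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical: starts one stats thread and numThreads crawling threads,
// waits for the crawlers, then tells the stats thread to quit.
// Returns the number of work items the crawlers processed.
long runCrawlers(int numThreads, int itemsPerThread) {
    std::atomic<long> processed{0};
    std::mutex m;
    std::condition_variable cv;
    bool quit = false;  // guarded by m; plays the role of a quit event

    // stats thread: reports every 100 ms until quit becomes true
    std::thread stats([&] {
        std::unique_lock<std::mutex> lock(m);
        while (!cv.wait_for(lock, std::chrono::milliseconds(100),
                            [&] { return quit; })) {
            std::printf("processed %ld\n", processed.load());
        }
    });

    // start N crawling threads (real work replaced by a counter)
    std::vector<std::thread> crawlers;
    for (int i = 0; i < numThreads; i++) {
        crawlers.emplace_back([&] {
            for (int k = 0; k < itemsPerThread; k++) {
                processed.fetch_add(1);
            }
        });
    }

    // wait for the N crawling threads to finish
    for (auto& t : crawlers) {
        t.join();
    }

    // signal the stats thread to quit, then wait for it to terminate
    {
        std::lock_guard<std::mutex> lock(m);
        quit = true;
    }
    cv.notify_one();
    stats.join();

    return processed.load();
}
```

For example, runCrawlers(4, 1000) returns 4000 once all four crawling threads have finished. Joining the crawlers before setting the flag mirrors the ordering in the Win32 code: the stats thread keeps reporting for as long as any crawler is still running.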
Lock/unlock
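With a CRITICAL_SECTION, each crawling thread can lock and unlock as follows. Note that we need to unlock before break: leaving the loop while still owning the critical section would make every other thread block forever in EnterCriticalSection.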
while (true)
{
    // lock
    EnterCriticalSection(&(sharedData->cs));
    if (sharedData->urlsQueue.empty())
    {
        // unlock before leaving the loop, otherwise other threads block forever
        LeaveCriticalSection(&(sharedData->cs));
        break;
    }
    url = sharedData->urlsQueue.front();
    sharedData->urlsQueue.pop();
    InterlockedIncrement(&(sharedData->numExtractedURLs));
    // unlock
    LeaveCriticalSection(&(sharedData->cs));
    // ... crawl url here, outside the lock ...
}
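A portable sketch of the same loop can use std::lock_guard, which unlocks when it goes out of scope, so an early break cannot leave the mutex held. SharedData, crawlWorker and drainQueue are hypothetical names, std::mutex stands in for the CRITICAL_SECTION, and the actual crawling step is left as a comment.

```cpp
#include <atomic>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared data for this sketch.
struct SharedData {
    std::mutex cs;                          // guards urlsQueue
    std::queue<std::string> urlsQueue;      // URLs to crawl
    std::atomic<long> numExtractedURLs{0};  // URLs taken from the queue
};

// Worker loop: take URLs until the queue is empty.
void crawlWorker(SharedData* sharedData) {
    while (true) {
        std::string url;
        {
            std::lock_guard<std::mutex> lock(sharedData->cs);  // lock
            if (sharedData->urlsQueue.empty()) {
                break;  // lock_guard's destructor unlocks on the way out
            }
            url = sharedData->urlsQueue.front();
            sharedData->urlsQueue.pop();
            sharedData->numExtractedURLs.fetch_add(1);
        }  // unlock
        // ... crawl url here, outside the lock ...
    }
}

// Fills the queue with numUrls dummy URLs and drains it with numThreads workers.
long drainQueue(int numUrls, int numThreads) {
    SharedData sharedData;
    for (int i = 0; i < numUrls; i++) {
        sharedData.urlsQueue.push("http://example.com/" + std::to_string(i));
    }
    std::vector<std::thread> workers;
    for (int i = 0; i < numThreads; i++) {
        workers.emplace_back(crawlWorker, &sharedData);
    }
    for (auto& t : workers) {
        t.join();
    }
    return sharedData.numExtractedURLs.load();
}
```

drainQueue(1000, 8) returns 1000: every URL is popped exactly once, however the eight workers interleave. Keeping the crawl itself outside the locked scope matters as much as the unlocking; holding the lock during network I/O would serialize the threads.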
Atomic operations
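To update the stats, we could lock and unlock, but it is often faster to use the interlocked functions directly, each of which typically compiles to a single atomic CPU instruction (e.g. LOCK XADD on x86). They take a pointer to a LONG volatile, so the counters should be declared as LONG. Two examples of such functions are shown below.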
// increment by 1
InterlockedIncrement(&(sharedData->numExtractedURLs));
// add a number
InterlockedAdd(&(sharedData->numActiveThreads), -1);
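In standard C++, std::atomic offers the same kind of lock-free counter updates. The sketch below uses fetch_add in place of InterlockedIncrement and InterlockedAdd; Stats and runStats are hypothetical names, and each thread body only counts instead of crawling.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stats block; std::atomic<long> replaces the interlocked LONGs.
struct Stats {
    std::atomic<long> numExtractedURLs{0};
    std::atomic<long> numActiveThreads{0};
};

// Each of numThreads threads counts urlsPerThread URLs, then deregisters.
void runStats(Stats& stats, int numThreads, int urlsPerThread) {
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; i++) {
        // register before starting, so the count never reads 0 too early
        stats.numActiveThreads.fetch_add(1);  // like InterlockedIncrement
        threads.emplace_back([&stats, urlsPerThread] {
            for (int k = 0; k < urlsPerThread; k++) {
                stats.numExtractedURLs.fetch_add(1);  // increment by 1
            }
            stats.numActiveThreads.fetch_add(-1);  // like InterlockedAdd(..., -1)
        });
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

After runStats(s, 4, 10000), s.numExtractedURLs is 40000 and s.numActiveThreads is back to 0. With a plain long and ++ instead, the final count could come out lower, because concurrent increments can overwrite each other.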