A detailed look at a case of multithreading misuse in an application

1. Requirements and preliminary implementation
A very simple Windows service: the client connects to the mail server, downloads the mail (including attachments), saves it in .eml format, and deletes the mail on the server once the save succeeds. The pseudocode is roughly as follows:


        public void Process()
        {
            var recordCount = 1000; // Maximum number of messages handled per pass
            while (true)
            {
                using (var client = new Pop3Client())
                {
                    // 1. Establish a connection and authenticate
                    client.Connect(server, port, useSSL);
                    client.Authenticate(userName, pwd);

                    var messageCount = client.GetMessageCount(); // Number of messages currently in the mailbox
                    if (messageCount > recordCount)
                    {
                        messageCount = recordCount;
                    }
                    if (messageCount < 1)
                    {
                        break;
                    }
                    var listAllMsg = new List<Message>(messageCount); // Temporarily holds the retrieved messages

                    // 2. Fetch the messages into the list, at most recordCount per pass
                    for (int i = 1; i <= messageCount; i++) // Message numbers are 1-based: [1, messageCount]
                    {
                        listAllMsg.Add(client.GetMessage(i)); // Fetch the message into the list
                    }

                    // 3. Save each message on the client in .eml format
                    foreach (var message in listAllMsg)
                    {
                        var emlInfo = new FileInfo(string.Format("{0}.eml", Guid.NewGuid().ToString("n")));
                        message.SaveToFile(emlInfo); // Save the message as an .eml file
                    }

                    // 4. Mark each message for deletion
                    int messageNumber = 1;
                    foreach (var message in listAllMsg)
                    {
                        client.DeleteMessage(messageNumber); // Only sets the delete mark; nothing is really deleted until the connection is closed
                        messageNumber++;
                    }

                    // 5. Disconnect, which actually completes the deletion
                    client.Disconnect();

                    if (messageCount < recordCount)
                    {
                        break;
                    }
                }
            }
        }

During development, an open source component was used to receive mail (it is effectively a merger of two projects, one of them OpenPop), and calling its interface is very simple. Once the code was written, the basic functionality was in place. Following the principle of "get it stable first, then make it faster and more efficient", I moved on to performance tuning.

2. Performance tuning and bug generation analysis
Leaving aside for now whether the time-consuming operations here are CPU-bound or IO-bound, the moment some people see a collection being processed item by item, they feel the urge to go parallel with multiple threads. Go async if the conditions allow; if they don't, create the conditions and go async anyway. After all, we want to exploit multithreading and the server's processing power to the full, and I had confidently written plenty of multithreaded programs before. The business logic here is fairly simple and exception handling is easy to control (even if something goes wrong there are compensating measures, and it can be improved later). In theory the number of emails to collect each day is modest, so the service would not turn into a CPU or memory hog. A multithreaded asynchronous implementation seemed acceptable. Besides, this is clearly a typical IO-bound application that hits the network frequently, so IO handling is where the effort should go.

1. Receive emails
From the sample code you can see that fetching a message requires a message number that starts at 1 and runs in order. If several requests are issued asynchronously, how should that number be handed out? The ordering requirement gave me pause. If the number is handed out through a synchronization construct such as lock or Interlocked, much of the benefit of multithreading is lost, and I guessed it might not even be as fast as fetching sequentially.

Analysis is only analysis, so I wrote some code to see how it actually performed.

I quickly wrote an asynchronous method that takes an integer parameter, used Interlocked to track the running count of messages fetched, and had each asynchronous call add its Message to the listAllMsg list under a lock.
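
The approach just described can be sketched roughly as follows. This is only an illustration under assumptions: `IndexedFetchSketch`, `FetchAll` and `FetchMessage` are hypothetical names, and `FetchMessage` is a local stand-in that does no network IO, so the sketch shows the Interlocked/lock pattern and nothing about how a real mail client behaves under concurrent calls.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class IndexedFetchSketch
{
    // Hypothetical stand-in for fetching a message by number; no network involved.
    private static string FetchMessage(int messageNumber)
    {
        return "message #" + messageNumber;
    }

    public static List<string> FetchAll(int messageCount, int workerCount)
    {
        var results = new List<string>(messageCount);
        var listLock = new object();
        int lastIssued = 0; // message numbers start at 1

        var workers = new Task[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Run(() =>
            {
                while (true)
                {
                    // Atomically claim the next message number.
                    int messageNumber = Interlocked.Increment(ref lastIssued);
                    if (messageNumber > messageCount)
                        break;

                    string message = FetchMessage(messageNumber);

                    // List<T> is not thread-safe, so Add is guarded by a lock.
                    lock (listLock)
                    {
                        results.Add(message);
                    }
                }
            });
        }

        Task.WaitAll(workers);
        return results;
    }
}
```

Calling `IndexedFetchSketch.FetchAll(50, 4)` should return all 50 items exactly once, but in no guaranteed order, since the workers finish independently.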

The test mail server did not hold much mail, so I tested with one or two messages. It worked: the messages were fetched successfully. A first tuning win, and I was pleased.

2. Save emails
Next, the code that loops over the messages and saves them as .eml files was changed to use multiple threads so that the saves run in parallel. Testing with one or two messages, CPU usage did not rise noticeably and saving seemed slightly faster. A small improvement.

3. Delete emails
More tuning: following the pattern of the multithreaded save, I changed the loop that deletes messages so that the deletes also run in parallel. So far so good. At this point I was mulling over Thread, ThreadPool, CCR, TPL, EAP and APM, ready to try everything I knew and pick the best and most efficient option. It all felt very sophisticated, hahaha.

Then I quickly wrote an asynchronous delete method and started testing. With only a few messages it worked normally and seemed quite fast.

At this point I was getting ready to celebrate a job well done.

4. Analysis of the causes of bugs
Judged separately, steps 1, 2 and 3 above each looked as if every thread could run independently, with no communication or shared data between threads. With asynchronous multithreading in place, fetching was fast, saving was fast, deleting was fast, and mail processing seemed about to reach its best possible state. But when fetching, saving and deleting were finally put together for an integrated test, tragedy struck after it had been running for a while:

With more test mail, say 20 or 30 messages, PopServerException entries show up in the log. The exception message appears to contain garbled text, and the garbled text seems to differ each time. Going back to two or three messages, it sometimes works and sometimes throws PopServerException, again with garbled text, and the stack trace points at the place where messages are deleted.

What on earth is going on? Have I got on the wrong side of the mail server? Why do I keep getting PopServerException?

Could the asynchronous delete method be at fault? Asynchronous deletion, message numbers starting at 1... an indexing problem? I still could not be sure.

Can you already see why deleting from multiple threads throws exceptions? If so, the rest of this article has nothing new for you and you can stop reading here.

Here is how my investigation went.

Looking at the log, I first suspected the method that deletes messages, but after reading it through it looked sound. Next I wondered whether the message encoding was going wrong during deletion, but that seemed unlikely: the synchronous code fetched, saved and deleted the very same messages without throwing a single exception. Still uneasy, I ran the synchronous code several more times against different messages, with and without attachments, HTML and plain text, and it handled all of them fine.

Stumped, I opened the source code and traced from the DeleteMessage method down to the SendCommand method in the Pop3Client class. Suddenly I felt I had a lead. The source of DeleteMessage is as follows:


        public void DeleteMessage(int messageNumber)
        {
            AssertDisposed();

            ValidateMessageNumber(messageNumber);

            if (State != ConnectionState.Transaction)
                throw new InvalidUseException("You cannot delete any messages without authenticating yourself towards the server first");

            SendCommand("DELE " + messageNumber);
        }

The last line calls SendCommand to send a DELE command. Following it down, here is how SendCommand is implemented:


        private void SendCommand(string command)
        {
            // Convert the command with CRLF afterwards as per RFC to a byte array which we can write
            byte[] commandBytes = Encoding.ASCII.GetBytes(command + "\r\n");

            // Write the command to the server
            OutputStream.Write(commandBytes, 0, commandBytes.Length);
            OutputStream.Flush(); // Flush the content as we now wait for a response

            // Read the response from the server. The response should be in ASCII
            LastServerResponse = StreamUtility.ReadLineAsAscii(InputStream);

            IsOkResponse(LastServerResponse);
        }

Note the InputStream and OutputStream properties. They are defined as follows (oddly, as private auto-properties, which is not something you see often):


        /// <summary>
        /// This is the stream used to read off the server response to a command
        /// </summary>
        private Stream InputStream { get; set; }

        /// <summary>
        /// This is the stream used to write commands to the server
        /// </summary>
        private Stream OutputStream { get; set; }

They are assigned in the public void Connect(Stream inputStream, Stream outputStream) method of the Pop3Client class. The Connect overload that ends up calling that method is as follows:


        /// <summary>
        /// Connects to a remote POP3 server
        /// </summary>
        /// <param name="hostname">The <paramref name="hostname"/> of the POP3 server</param>
        /// <param name="port">The port of the POP3 server</param>
        /// <param name="useSsl">True if SSL should be used. False if plain TCP should be used.</param>
        /// <param name="receiveTimeout">Timeout in milliseconds before a socket should time out from reading. Set to 0 or -1 to specify infinite timeout.</param>
        /// <param name="sendTimeout">Timeout in milliseconds before a socket should time out from sending. Set to 0 or -1 to specify infinite timeout.</param>
        /// <param name="certificateValidator">If you want to validate the certificate in a SSL connection, pass a reference to your validator. Supply <see langword="null"/> if default should be used.</param>
        /// <exception cref="PopServerNotAvailableException">If the server did not send an OK message when a connection was established</exception>
        /// <exception cref="PopServerNotFoundException">If it was not possible to connect to the server</exception>
        /// <exception cref="ArgumentNullException">If <paramref name="hostname"/> is <see langword="null"/></exception>
        /// <exception cref="ArgumentOutOfRangeException">If port is not in the range [<see cref="IPEndPoint.MinPort"/>, <see cref="IPEndPoint.MaxPort"/>] or if any of the timeouts is less than -1.</exception>
        public void Connect(string hostname, int port, bool useSsl, int receiveTimeout, int sendTimeout, RemoteCertificateValidationCallback certificateValidator)
        {
            AssertDisposed();

            if (hostname == null)
                throw new ArgumentNullException("hostname");

            if (hostname.Length == 0)
                throw new ArgumentException("hostname cannot be empty", "hostname");

            if (port > IPEndPoint.MaxPort || port < IPEndPoint.MinPort)
                throw new ArgumentOutOfRangeException("port");

            if (receiveTimeout < -1)
                throw new ArgumentOutOfRangeException("receiveTimeout");

            if (sendTimeout < -1)
                throw new ArgumentOutOfRangeException("sendTimeout");

            if (State != ConnectionState.Disconnected)
                throw new InvalidUseException("You cannot ask to connect to a POP3 server, when we are already connected to one. Disconnect first.");

            TcpClient clientSocket = new TcpClient();
            clientSocket.ReceiveTimeout = receiveTimeout;
            clientSocket.SendTimeout = sendTimeout;

            try
            {
                clientSocket.Connect(hostname, port);
            }
            catch (SocketException e)
            {
                // Close the socket - we are not connected, so no need to close stream underneath
                clientSocket.Close();

                DefaultLogger.Log.LogError("Connect(): " + e.Message);
                throw new PopServerNotFoundException("Server not found", e);
            }

            Stream stream;
            if (useSsl)
            {
                // If we want to use SSL, open a new SSLStream on top of the open TCP stream.
                // We also want to close the TCP stream when the SSL stream is closed
                // If a validator was passed to us, use it.
                SslStream sslStream;
                if (certificateValidator == null)
                {
                    sslStream = new SslStream(clientSocket.GetStream(), false);
                }
                else
                {
                    sslStream = new SslStream(clientSocket.GetStream(), false, certificateValidator);
                }
                sslStream.ReadTimeout = receiveTimeout;
                sslStream.WriteTimeout = sendTimeout;

                // Authenticate the server
                sslStream.AuthenticateAsClient(hostname);

                stream = sslStream;
            }
            else
            {
                // If we do not want to use SSL, use plain TCP
                stream = clientSocket.GetStream();
            }

            // Now do the connect with the same stream being used to read and write to
            Connect(stream, stream); // In/OutputStream property initialization
        }

Then I spotted the TcpClient object. So this is built on Socket, implementing the POP3 commands through socket programming? Of course: it has to open a TCP connection, go through the three-way handshake, send commands to the server... It all came back to me at once.

A TCP connection is a session, and every command (fetching or deleting, for example) goes to the mail server over that connection. If several threads send commands (TOP or RETR to fetch, DELE to delete) on the same session, those operations are not thread-safe. The data on OutputStream and InputStream can get interleaved, which is very likely the cause of the garbled text in the log. And speaking of thread safety, it dawned on me that fetching messages should have the same problem. To check, I looked at the source of the GetMessage method:


        public Message GetMessage(int messageNumber)
        {
            AssertDisposed();

            ValidateMessageNumber(messageNumber);

            if (State != ConnectionState.Transaction)
                throw new InvalidUseException("Cannot fetch a message, when the user has not been authenticated yet");

            byte[] messageContent = GetMessageAsBytes(messageNumber);

            return new Message(messageContent);
        }

The internal GetMessageAsBytes method ends up in SendCommand as well:


            if (askOnlyForHeaders)
            {
                // 0 is the number of lines of the message body to fetch, therefore it is set to zero to fetch only headers
                SendCommand("TOP " + messageNumber + " 0");
            }
            else
            {
                // Ask for the full message
                SendCommand("RETR " + messageNumber);
            }

As far as I could trace it, the garbled text in the exceptions comes from LastServerResponse (the last response the server sent back when a command was issued to it). In the IsOkResponse method, a response that does not start with "+OK" causes a PopServerException to be thrown:


        /// <summary>
        /// Tests a string to see if it is a "+OK" string.
        /// An "+OK" string should be returned by a compliant POP3
        /// server if the request could be served.
        ///
        /// The method does only check if it starts with "+OK".
        /// </summary>
        /// <param name="response">The string to examine</param>
        /// <exception cref="PopServerException">Thrown if server did not respond with "+OK" message</exception>
        private static void IsOkResponse(string response)
        {
            if (response == null)
                throw new PopServerException("The stream used to retrieve responses from was closed");

            if (response.StartsWith("+OK", StringComparison.OrdinalIgnoreCase))
                return;

            throw new PopServerException("The server did not respond with a +OK response. The response was: \"" + response + "\"");
        }
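
To make the failure mode concrete, here is a small deterministic replay of one unlucky interleaving. It is a sketch under assumptions, not the library's behavior captured from a real run: `InterleavedResponseSketch` is a hypothetical name, the response lines are illustrative POP3-style text, and the shared stream is modeled as a simple queue of lines that two callers read from in turn.

```csharp
using System;
using System.Collections.Generic;

public static class InterleavedResponseSketch
{
    // Simplified version of the "+OK" prefix check.
    public static bool IsOk(string response)
    {
        return response != null
            && response.StartsWith("+OK", StringComparison.OrdinalIgnoreCase);
    }

    // Replays two callers sharing one response stream and returns the line
    // that the DELE caller ends up treating as its status response.
    public static string ReadDeleResponseAfterInterleaving()
    {
        // Lines the server sends back, in order, for "RETR 1" then "DELE 2".
        var responseStream = new Queue<string>(new[]
        {
            "+OK 120 octets",        // status line for RETR 1
            "Subject: hello",        // content of message 1 ...
            "",
            "body text",
            ".",                     // end of the multi-line response
            "+OK message 2 deleted"  // status line for DELE 2
        });

        // Caller A sends RETR 1 and reads its status line.
        string retrStatus = responseStream.Dequeue();

        // Before A has read the message content, caller B sends DELE 2 and
        // reads "its" status line, which is really the next line of message 1.
        string deleStatus = responseStream.Dequeue();

        return deleStatus;
    }
}
```

In this replay the DELE caller receives `Subject: hello`, which fails the `+OK` check, so an exception carrying a fragment of some unrelated message would be raised. That matches the symptom in the log: text that looks garbled and differs from run to run.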

After this analysis I finally understood that the biggest trap is that Pop3Client is not thread-safe. Having found the cause at last, hahaha, I was as thrilled as if a goddess had appeared before me, so happy that I almost forgot the faulty code was my own.
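
One conservative way to live with a client that is not thread-safe is to funnel every command through a single lock. The sketch below is an assumption-laden illustration rather than a recommended design: `FakeSessionClient` and `SerializedAccessSketch` are hypothetical names, and the fake client only counts how often two commands were in flight at once instead of talking to a server.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a session-based client: it counts how often
// two commands were in flight at the same time.
public sealed class FakeSessionClient
{
    private int _inFlight;
    private int _overlaps;

    public int Overlaps { get { return Volatile.Read(ref _overlaps); } }

    public void SendCommand(string command)
    {
        if (Interlocked.Increment(ref _inFlight) > 1)
            Interlocked.Increment(ref _overlaps);

        Thread.Sleep(1); // pretend to wait for the server's reply
        Interlocked.Decrement(ref _inFlight);
    }
}

public static class SerializedAccessSketch
{
    // Issues deleteCount DELE commands from parallel tasks and returns the
    // number of overlapping commands the fake client observed.
    public static int RunDeletes(int deleteCount, bool useLock)
    {
        var client = new FakeSessionClient();
        var gate = new object();
        var tasks = new Task[deleteCount];

        for (int i = 0; i < deleteCount; i++)
        {
            int messageNumber = i + 1;
            tasks[i] = Task.Run(() =>
            {
                if (useLock)
                {
                    // Only one task at a time may use the session.
                    lock (gate) { client.SendCommand("DELE " + messageNumber); }
                }
                else
                {
                    client.SendCommand("DELE " + messageNumber);
                }
            });
        }

        Task.WaitAll(tasks);
        return client.Overlaps;
    }
}
```

`SerializedAccessSketch.RunDeletes(20, true)` should report zero overlaps; with `useLock` set to false, overlaps are likely but not guaranteed on any given run. With the lock in place the commands run one at a time, so for this step the extra threads buy nothing over a plain loop.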

After a while I calmed down and reflected on what a basic mistake I had made. I could have fainted. How could I forget about TCP and thread safety? Ahhh!

By the way, saving as .eml goes through the SaveToFile method of the Message object and does not need to talk to the mail server, so saving asynchronously throws no exceptions (each message's RawMessage byte array does not get mixed up with another's). Its source code is as follows:


        /// <summary>
        /// Save this <see cref="Message"/> to a file.
        ///
        /// Can be loaded at a later time using the <see cref="LoadFromFile"/> method.
        /// </summary>
        /// <param name="file">The File location to save the <see cref="Message"/> to. Existent files will be overwritten.</param>
        /// <exception cref="ArgumentNullException">If <paramref name="file"/> is <see langword="null"/></exception>
        /// <exception>Other exceptions relevant to file saving might be thrown as well</exception>
        public void SaveToFile(FileInfo file)
        {
            if (file == null)
                throw new ArgumentNullException("file");

            File.WriteAllBytes(file.FullName, RawMessage);
        }
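
Because saving only writes bytes that are already in memory to separate files, it is the one step here that parallelizes without touching the shared connection. A minimal sketch, assuming the raw messages are available as byte arrays; `ParallelSaveSketch` and `SaveAll` are hypothetical names, and the GUID-based file naming simply mirrors the pseudocode at the top of the article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ParallelSaveSketch
{
    // Writes each raw message to its own .eml file and returns the paths written.
    public static string[] SaveAll(IList<byte[]> rawMessages, string directory)
    {
        Directory.CreateDirectory(directory);
        var paths = new string[rawMessages.Count];

        Parallel.For(0, rawMessages.Count, i =>
        {
            // Every iteration gets a unique file name, so no two tasks
            // ever write to the same file.
            string path = Path.Combine(directory, Guid.NewGuid().ToString("n") + ".eml");
            File.WriteAllBytes(path, rawMessages[i]);
            paths[i] = path; // each iteration writes only its own array slot
        });

        return paths;
    }
}
```

Whether this is actually faster than a plain loop depends on the disk and the message sizes, so it is worth measuring before keeping it.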

To sum up how this bug came about: I was not sensitive or vigilant enough about TCP and thread safety, I started performance tuning the moment I saw a for loop, my test data was insufficient, and I stepped on a landmine. At bottom, the mistake was choosing a scenario for asynchronous processing that is not thread-safe. This kind of misuse is common, and the classic case is misusing database connections. I once read an article on misusing database connection objects, "Detailed explanation of why you need to close the database connection and whether you can not close it", and wrote my own summary at the time, so it left a strong impression. It bears repeating: a single Pop3Client or SqlConnection is not necessarily suited to multithreaded use, especially when it communicates intensively with the server. And even when multithreading is used correctly, performance will not necessarily improve.

Many of the libraries and .NET clients we commonly use, such as those for FastDFS, Memcached, RabbitMQ, Redis, MongoDB and Zookeeper, have to go over the network, talk to a server and parse a protocol. I have read the source of several of these clients. As I recall, the FastDFS, Memcached and Redis clients have a pool implementation, and as far as I remember they carry no thread-safety risk. From personal experience, though, you should treat such clients with respect. The language and library may be pleasant to program against, the API documentation easy to follow and the calls simple to make, but that does not mean everything is as safe to use as it looks. It is best to skim the source and grasp the general implementation approach. If you do not know how something works internally, you may walk into a trap without noticing.

When refactoring or tuning with multithreading, never overlook one fundamental question: which scenarios are actually suited to asynchronous processing. It is like knowing which scenarios are suited to caching, and I would even say that understanding this matters more than knowing how to write the code.

Refactoring and tuning also call for caution, and the test data must be thoroughly prepared. Real work has proven this to me many times and it has left a deep impression. Many business systems run fine when the data volume is small, but under high concurrency or large data volumes all sorts of baffling problems appear. In this case, when I tested asynchronous fetching and deleting, the mail server held only one or two messages with very small attachments. Everything ran normally and no exceptions were logged. Once there was more data, the exceptions appeared in the log, and then came troubleshooting, debugging, reading the source, more troubleshooting... and that is how this article came about.