This article describes the variable-sharing problems that anonymous methods can cause in C#, and why they deserve some vigilance. The details are as follows:
Anonymous method
Anonymous methods are an advanced feature introduced in C# 2.0 (.NET 2.0). "Anonymous" means that the implementation can be written inline inside a method to form a delegate object, without giving it an explicit method name. For example:
static void Test()
{
    // The delegate body is written inline; it has no method name of its own.
    Action<string> action = delegate(string value)
    {
        Console.WriteLine(value);
    };
    action("Hello World");
}
But the key to anonymous methods is not just the word "anonymous". Their most powerful feature is that an anonymous method forms a closure: it can be passed as a parameter to another method, yet it can still access the local variables of the method that created it and the other members of the current class. For example:
class TestClass
{
    private void Print(string message)
    {
        Console.WriteLine(message);
    }
    public void Test()
    {
        string[] messages = new string[] { "Hello", "World" };
        int index = 0;
        // The lambda calls the private method Print and reads and writes the local variable index.
        Action<string> action = (m) =>
        {
            Print((index++) + ". " + m);
        };
        Array.ForEach(messages, action);
        Console.WriteLine("index = " + index);
    }
}
As shown above, in the Test method of TestClass, the action delegate calls the private method Print, which also belongs to TestClass, and reads and writes the local variable index of the Test method. With the lambda expressions introduced in C# 3.0, anonymous methods have become far more widely used. However, if used carelessly, they can easily cause problems that are hard to track down.
Problem cases
A colleague recently worked on a simple data import program whose main job is to read data from text files, analyze and reorganize it, and then write it to the database. The logic is roughly as follows:
static void Process()
{
    List<Item> batchItems = new List<Item>();
    foreach (var item in ...)
    {
        batchItems.Add(item);
        if (batchItems.Count > 1000)
        {
            // Items is the table property of this application-specific DataContext.
            DataContext db = new DataContext();
            db.Items.InsertAllOnSubmit(batchItems);
            db.SubmitChanges();
            batchItems = new List<Item>();
        }
    }
}
Each item read from the data source is added to the batchItems list, and the list is submitted once it holds more than 1,000 items. The code works correctly, but unfortunately most of the time is spent on the database submission: data is acquired and processed very quickly, while each submission takes a long time. Since submitting data and processing data do not compete for resources, why not move the submission to another thread? So the code was rewritten to use ThreadPool:
static void Process()
{
    List<Item> batchItems = new List<Item>();
    foreach (var item in ...)
    {
        batchItems.Add(item);
        if (batchItems.Count > 1000)
        {
            // Hand the submission to the thread pool so it no longer blocks processing.
            ThreadPool.QueueUserWorkItem((o) =>
            {
                DataContext db = new DataContext();
                db.Items.InsertAllOnSubmit(batchItems);
                db.SubmitChanges();
            });
            batchItems = new List<Item>();
        }
    }
}
Now the data commit operation is handed to ThreadPool, and it starts whenever the pool has a thread to spare. The submission no longer blocks data processing, so, as my colleague intended, the data would be processed continuously and in the end the program would only have to wait for all database submissions to finish. The idea is a good one, but unfortunately the code, which ran fine before multi-threading was introduced, now threw an exception "inexplicably". Stranger still, data went missing: one million items were processed and "submitted", but part of them never reached the database. He looked the code over again and again and could not figure it out.
Do you see the cause of the problem?
Analysis of reasons
To find out what the problem is, we must understand how anonymous methods are implemented in the .NET environment.
.NET itself has no "anonymous methods" and no comparable runtime feature. An "anonymous method" is magic performed entirely by the compiler: it gathers every member the anonymous method needs to access into a closure object, so that all member access complies with the normal .NET rules. For example, the second example in the first section of this article actually becomes the following after the compiler has processed it (the field names have, of course, been made "friendly"):
class TestClass
{
    ...
    private sealed class AutoGeneratedHelperClass
    {
        public TestClass m_testClassInstance;   // the captured "this"
        public int m_index;                     // the captured local variable index
        public void Action(string m)
        {
            this.m_testClassInstance.Print((this.m_index++) + ". " + m);
        }
    }
    public void TestAfterCompiled()
    {
        AutoGeneratedHelperClass helper = new AutoGeneratedHelperClass();
        helper.m_testClassInstance = this;
        helper.m_index = 0;
        string[] messages = new string[] { "Hello", "World" };
        Action<string> action = new Action<string>(helper.Action);
        Array.ForEach(messages, action);
        Console.WriteLine("index = " + helper.m_index);
    }
}
From this we can see how the compiler implements a closure:
The compiler automatically generates a private nested helper class and marks it sealed. An instance of this class becomes the closure object.
If the anonymous method needs to access a parameter or local variable of the enclosing method, that parameter or local variable is "promoted" to a public field of the helper class.
If the anonymous method needs to access other members of the class, the current instance of the class is stored in the helper class.
It is worth mentioning that in practice not all three of these rules always apply. In some particularly simple cases (for example, when the anonymous method involves no local variables and no other methods), the compiler may simply generate a static method and build the delegate instance from it, because this performs better.
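The closure object described above can be observed at run time. The following is a minimal sketch, not the compiler's documented contract: the names ClosurePeek, MakeCounter and HoistedValue are made up for illustration, and the exact shape of the generated class is a compiler implementation detail. It assumes, as current C# compilers do, that a captured int local ends up as an int field on the object the delegate is bound to.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ClosurePeek
{
    // Returns a delegate that captures (and increments) the local variable count.
    public static Func<int> MakeCounter()
    {
        int count = 0;           // captured local: lives on the closure object
        return () => ++count;
    }

    // Reads the first int field found on the object the delegate is bound to.
    // Assumption: the captured local is stored as an int field on that object.
    public static int HoistedValue(Delegate d)
    {
        object closure = d.Target;
        FieldInfo field = closure.GetType()
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .First(f => f.FieldType == typeof(int));
        return (int)field.GetValue(closure);
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();
        next();
        next();
        // The "local variable" is still alive, as a field on the closure object.
        Console.WriteLine(HoistedValue(next));   // prints 2
    }
}
```

MakeCounter has long since returned when HoistedValue runs, yet the value is still there, which is only possible because the variable never lived on the stack in the first place.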
Let us now rewrite the earlier case in the same way, so that we "avoid" the anonymous method and expose the cause of the problem clearly:
private class AutoGeneratedClass
{
    public List<Item> m_batchItems;   // the captured local variable batchItems
    public void WaitCallback(object o)
    {
        DataContext db = new DataContext();
        db.Items.InsertAllOnSubmit(this.m_batchItems);
        db.SubmitChanges();
    }
}
static void Process()
{
    var helper = new AutoGeneratedClass();
    helper.m_batchItems = new List<Item>();
    foreach (var item in ...)
    {
        helper.m_batchItems.Add(item);
        if (helper.m_batchItems.Count > 1000)
        {
            ThreadPool.QueueUserWorkItem(helper.WaitCallback);
            // The field that the queued callback will read is replaced here.
            helper.m_batchItems = new List<Item>();
        }
    }
}
The compiler automatically generates an AutoGeneratedClass class and uses an instance of it in the Process method in place of the original batchItems local variable. Likewise, the delegate handed to ThreadPool is no longer an anonymous method but a public method of that AutoGeneratedClass instance. So every time, the thread pool calls the WaitCallback method of the same instance.
Now the problem should be clear at a glance. After a delegate is handed to the thread pool, the pool does not run it immediately; it holds it until a suitable moment. When WaitCallback finally runs, it reads whatever object the m_batchItems field refers to at that moment. By then the Process method has already "abandoned" the list we originally wanted to submit, so that batch never reaches the database and data is lost. In addition, while a single batch is still being filled, two submissions may well be started against it. When two threads submit the same batch of Items, the so-called "inexplicable" exception is thrown.
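The effect can be reproduced deterministically without a database or a thread pool. The sketch below is an illustration under stated assumptions rather than the original program: a plain list of delegates (pending) stands in for the thread pool queue, integers stand in for Item, the batch size is 3 instead of 1,000, and each deferred "commit" merely records how many items it sees. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SharedVariableDemo
{
    // Returns the batch size that each deferred "commit" actually observes.
    public static List<int> Run()
    {
        var pending = new List<Action>();   // stand-in for the thread pool queue
        var seen = new List<int>();
        var batchItems = new List<int>();

        for (int i = 1; i <= 6; i++)
        {
            batchItems.Add(i);
            if (batchItems.Count > 2)
            {
                // Captures the variable batchItems, not the list it refers to right now.
                pending.Add(() => seen.Add(batchItems.Count));
                batchItems = new List<int>();
            }
        }

        // The queued work only runs now, after the variable has been reassigned.
        foreach (Action work in pending)
        {
            work();
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));   // prints "0, 0"
    }
}
```

Both "commits" were queued when their batch held three items, yet both see the empty list that the variable refers to by the time they run. With a real thread pool the timing varies, which is why the original failure looked random.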
Solve the problem
Once the problem has been found, the fix follows naturally:
private class WrapperClass
{
    private List<Item> m_items;   // the batch this instance is responsible for
    public WrapperClass(List<Item> items)
    {
        this.m_items = items;
    }
    public void WaitCallback(object o)
    {
        DataContext db = new DataContext();
        db.Items.InsertAllOnSubmit(this.m_items);
        db.SubmitChanges();
    }
}
static void Process()
{
    List<Item> batchItems = new List<Item>();
    foreach (var item in ...)
    {
        batchItems.Add(item);
        if (batchItems.Count > 1000)
        {
            // Each submission gets its own wrapper holding its own batch.
            ThreadPool.QueueUserWorkItem(new WrapperClass(batchItems).WaitCallback);
            batchItems = new List<Item>();
        }
    }
}
Here we explicitly write a wrapper class that holds on to the data to be submitted. Each submission uses the data it was given, so the unwanted "data sharing" no longer occurs and the errors are avoided.
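A lighter-weight alternative to a hand-written wrapper class, which is not what the article's fix does but relies on the same idea, is to copy the reference into a variable declared inside the block before creating the delegate. The compiler then creates a separate closure object each time the block runs. The sketch below makes the same simplifying assumptions as before: a list of delegates stands in for the thread pool, integers stand in for Item, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LocalCopyDemo
{
    // Returns the batch size that each deferred "commit" actually observes.
    public static List<int> Run()
    {
        var pending = new List<Action>();   // stand-in for the thread pool queue
        var seen = new List<int>();
        var batchItems = new List<int>();

        for (int i = 1; i <= 6; i++)
        {
            batchItems.Add(i);
            if (batchItems.Count > 2)
            {
                // A fresh variable per batch: the delegate keeps this reference
                // even after batchItems is pointed at a new list.
                List<int> toSubmit = batchItems;
                pending.Add(() => seen.Add(toSubmit.Count));
                batchItems = new List<int>();
            }
        }

        foreach (Action work in pending)
        {
            work();
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));   // prints "3, 3"
    }
}
```

The only change from the failing version is the toSubmit variable, which is never reassigned after the delegate is created, so each deferred "commit" sees exactly the batch it was queued with.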
Summary
Anonymous methods are powerful, but they can also create pitfalls that are hard to spot. Be careful with any delegate created from an anonymous method that uses local variables of the enclosing method and is not executed synchronously right away. In that situation the "local variable" has in fact been turned by the compiler into a field on an instance of an automatically generated class, and this field is shared by the current method and the delegate object. If you modify the shared "local variable" after creating the delegate, make sure that this is what you intend and that it will not cause problems.
This kind of problem is not limited to anonymous methods. If you use a lambda expression to create an expression tree that also refers to a "local variable", the expression tree likewise sees the "current" value when it is parsed or executed, not the value at the time the tree was created.
This is also why, in Java, an anonymous class (Java's way of writing code inline) that wants to share a method's "local variables" must have them declared with the final keyword: such a variable can only be assigned when it is declared, which avoids the "weird problems" that later modifications might cause.
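The expression tree case can be checked with a few lines. This is a minimal sketch with hypothetical names (ExpressionCaptureDemo, threshold); it builds a tree that refers to a local variable, changes the variable afterwards, and only then compiles and runs the tree.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionCaptureDemo
{
    public static int Run()
    {
        int threshold = 10;

        // The tree refers to the variable threshold, not to the value 10.
        Expression<Func<int>> doubled = () => threshold * 2;

        threshold = 50;   // modified after the tree was created

        // Compiling and invoking the tree reads the current value of the variable.
        return doubled.Compile()();
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints 100
    }
}
```

The result is 100 rather than 20, because the tree holds a reference to the closure object's field and reads it when the compiled delegate runs.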
I hope this article will be helpful to everyone's C# programming.