Three ways to read Excel in C# and a comparative analysis

(1) OleDB method

Advantages: Excel is used directly as a data source and its content is read through SQL, so reading is fast.

Disadvantages: Reading is not flexible enough: a single cell cannot be read directly. The entire Sheet must be read first (the result is a DataTable), and only then can a specific value be taken from the DataTable by row and column index.

When the Excel file contains a large amount of data, this method consumes a lot of memory, and an out-of-memory exception is thrown when memory runs short.

The reading code is as follows:
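
Once the whole sheet is in a DataTable, a single cell can be looked up by index. The following is a minimal sketch of that lookup, assuming the table was loaded with HDR=NO so that sheet row 1 is Rows[0]. The names ExcelCellLookup, ColumnIndex and GetCell are hypothetical helpers for illustration, not part of any library.

```csharp
using System;
using System.Data;

public static class ExcelCellLookup
{
    // Converts a column label such as "A", "Z" or "AA" to a zero-based index.
    public static int ColumnIndex(string letters)
    {
        int index = 0;
        foreach (char c in letters.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z')
                throw new ArgumentException("Invalid column label: " + letters);
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    // Returns the value at an A1-style address (for example "B3"), or null
    // when the address lies outside the loaded table. Empty cells inside the
    // table come back as DBNull.Value.
    public static object GetCell(DataTable table, string address)
    {
        int split = 0;
        while (split < address.Length && char.IsLetter(address[split])) split++;

        int col = ColumnIndex(address.Substring(0, split));
        int row = int.Parse(address.Substring(split)) - 1; // sheet rows are 1-based

        if (row < 0 || row >= table.Rows.Count || col < 0 || col >= table.Columns.Count)
            return null;
        return table.Rows[row][col];
    }
}
```

For example, ExcelCellLookup.GetCell(dtExcel, "B3") reads the third row of the second column of the table returned by the method below.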

 // Requires: using System.Data; using System.Data.OleDb; using System.IO;
 public DataTable GetExcelTableByOleDB(string strExcelPath, string tableName)
 {
   try
   {
     DataTable dtExcel = new DataTable();
     //Data set that receives the sheet
     DataSet ds = new DataSet();
     //Get the file extension
     string strExtension = Path.GetExtension(strExcelPath);
     string strFileName = Path.GetFileName(strExcelPath);
     //Excel connection
     OleDbConnection objConn = null;
     switch (strExtension)
     {
       case ".xls":
         objConn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + strExcelPath + ";" + "Extended Properties=\"Excel 8.0;HDR=NO;IMEX=1;\"");
         break;
       case ".xlsx":
         objConn = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + strExcelPath + ";" + "Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1;\"");
         break;
       default:
         objConn = null;
         break;
     }
     if (objConn == null)
     {
       return null;
     }
     objConn.Open();
     //Get information about all Sheet tables in Excel
     //DataTable schemaTable = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
     //Get the first Sheet table name of Excel
     //string tableName = schemaTable.Rows[0][2].ToString().Trim();
     string strSql = "select * from [" + tableName + "]";
     //Get the information in the specified Sheet table in Excel
     OleDbCommand objCmd = new OleDbCommand(strSql, objConn);
     OleDbDataAdapter myData = new OleDbDataAdapter(strSql, objConn);
     myData.Fill(ds, tableName); //Fill in data
     objConn.Close();
     //dtExcel is the information stored in the specified table of the Excel file
     dtExcel = ds.Tables[tableName];
     return dtExcel;
   }
   catch
   {
     return null;
   }
 }

The connection string is described below.

HDR=Yes means that the first row is treated as the header and is not returned as data. (In my actual use, when the first row contained complex values, the column titles of the resulting DataTable were automatically set to F1, F2, and so on, which did not match the real data. So I read all content into the DataTable with HDR=No and then manually set the first row as the header.)

IMEX (IMport EXport mode) has three modes:
0 is Export mode
1 is Import mode
2 is Linked mode (full update capabilities)
The IMEX parameter deserves an explanation, because the modes differ in their read and write behavior:
When IMEX=0 ("export mode"), the Excel file opened in this mode can only be used for writing.
When IMEX=1 ("import mode"), the Excel file opened in this mode can only be used for reading.
When IMEX=2 ("linked mode"), the Excel file opened in this mode supports both reading and writing.
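
The workaround mentioned above (read with HDR=No, then promote the first row to the header by hand) can be sketched roughly as follows. This is only an illustration: HeaderPromotion and PromoteFirstRowToHeader are hypothetical names, and the rule for blank or duplicate titles (fall back to the existing F1, F2 name, or append a numeric suffix) is an assumption rather than something the original code prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class HeaderPromotion
{
    // Renames the columns from the values in the first row, then removes that row.
    // Blank titles keep the existing column name; duplicates get "_1", "_2", ...
    public static void PromoteFirstRowToHeader(DataTable table)
    {
        if (table.Rows.Count == 0) return;

        DataRow header = table.Rows[0];
        var used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var names = new List<string>();
        foreach (DataColumn column in table.Columns)
        {
            string name = Convert.ToString(header[column]).Trim();
            if (name.Length == 0) name = column.ColumnName;

            string candidate = name;
            int suffix = 1;
            while (!used.Add(candidate))
                candidate = name + "_" + suffix++;
            names.Add(candidate);
        }

        // Rename in two passes so a new title never collides with an old
        // column name that has not been replaced yet.
        for (int i = 0; i < table.Columns.Count; i++)
            table.Columns[i].ColumnName = "__tmp_" + Guid.NewGuid().ToString("N");
        for (int i = 0; i < table.Columns.Count; i++)
            table.Columns[i].ColumnName = names[i];

        table.Rows.RemoveAt(0);
    }
}
```

Calling HeaderPromotion.PromoteFirstRowToHeader(dtExcel) on a table read with HDR=NO leaves only the data rows, with column names taken from the sheet's first row.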

---------------------------------

In addition, when reading Excel 2007 files, the version in the connection string should be changed from 8.0 to 12.0, and the provider must be ACE instead of Jet. Otherwise an "installable ISAM not found" error is raised.
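
As a small illustration of this rule, the sketch below picks the provider and Excel version from the file extension. The class and method names (ExcelConnectionStrings.Build) and the firstRowIsHeader parameter are hypothetical; the provider strings are the same ones used in the reading method above.

```csharp
using System;
using System.IO;

public static class ExcelConnectionStrings
{
    // Builds an OLE DB connection string for .xls (Jet, Excel 8.0) or
    // .xlsx (ACE, Excel 12.0). Returns null for any other extension.
    public static string Build(string path, bool firstRowIsHeader)
    {
        string hdr = firstRowIsHeader ? "YES" : "NO";
        switch (Path.GetExtension(path).ToLowerInvariant())
        {
            case ".xls":
                return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
                       ";Extended Properties=\"Excel 8.0;HDR=" + hdr + ";IMEX=1;\"";
            case ".xlsx":
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
                       ";Extended Properties=\"Excel 12.0;HDR=" + hdr + ";IMEX=1;\"";
            default:
                return null;
        }
    }
}
```

Lower-casing the extension means a file named REPORT.XLSX is handled the same way as report.xlsx, which the switch in the original method does not do.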

---------------------------------

Reports on the Internet also note that this method can return more Sheet tables than the Excel file actually contains, for two reasons:

1. The returned names include the names defined in Excel's Name Manager (in Excel 2007, see Formulas - Name Manager, shortcut Ctrl+F3);

2. The returned names include entries with the FilterDatabase suffix, which Excel uses to record the Filter range.

The first point is simple: just delete the entries in the Name Manager. The second point is more troublesome, because these names remain even after the Filter is removed. A simple fix is to add a new Sheet and copy the original Sheet into it. In practice, however, these checks cannot be done by hand for every Excel file, so a filtering scheme is given below. (We have verified this approach, but please verify it yourself as well.)

 //objConn is the connection used to read Excel. The code below filters the schema to obtain the collection of valid Sheet names
  DataTable schemaTable = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
  List<string> lstSheetNames = new List<string>();
  for (int i = 0; i < schemaTable.Rows.Count; i++)
  {
    string strSheetName = (string)schemaTable.Rows[i]["TABLE_NAME"];
    if (strSheetName.Contains("$") && !strSheetName.Replace("'", "").EndsWith("$"))
    {
      //Invalid SheetName, skip it
      continue;
    }
    if (lstSheetNames != null && !lstSheetNames.Contains(strSheetName))
      lstSheetNames.Add(strSheetName);
  }

For an invalid SheetName, the last character is not $. If a SheetName contains special symbols, the name that is read back is automatically wrapped in single quotes. For example, if the sheet is renamed to MySheet(1) in Excel, the SheetName read is 'MySheet(1)$'. It is therefore best to strip the single quotes before checking whether the last character is $.
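
The same rule can be expressed as a pure function over the TABLE_NAME strings, which makes it easy to try without an Excel file. This is a sketch that mirrors the condition in the loop above; SheetNameFilter, IsFilteredOut and ValidSheetNames are hypothetical names, and the sample name "Sheet1$_FilterDatabase" is only an illustrative guess at what a filter entry looks like.

```csharp
using System;
using System.Collections.Generic;

public static class SheetNameFilter
{
    // True when the name contains '$' but, once single quotes are stripped,
    // does not end with '$' (for example a filter-range entry).
    public static bool IsFilteredOut(string tableName)
    {
        return tableName.Contains("$")
            && !tableName.Replace("'", "").EndsWith("$", StringComparison.Ordinal);
    }

    // Keeps the names that pass the filter, without duplicates, in input order.
    public static List<string> ValidSheetNames(IEnumerable<string> tableNames)
    {
        var result = new List<string>();
        foreach (string name in tableNames)
        {
            if (IsFilteredOut(name)) continue;
            if (!result.Contains(name)) result.Add(name);
        }
        return result;
    }
}
```

Like the original loop, this keeps names that contain no $ at all, so names coming from the Name Manager still have to be handled separately.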

---------------------------------

(2) COM component method (implemented by adding a reference)

Advantages: Data in Excel can be read very flexibly, and the various Excel functions can be called for processing.

Disadvantages: Processing is cell-based, so reading is slow. It is best not to use this method for files with large amounts of data.

The corresponding DLL reference must be added, and it must be present at run time. If a Web site is deployed on IIS, Excel must be installed on the server machine, and sometimes IIS permissions must be configured as well.

The reading code is as follows:

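 // Requires: using System; using System.Data; using System.Diagnostics;
 // Assumes the alias: using Excel = Microsoft.Office.Interop.Excel;
 private Stopwatch wath = new Stopwatch();
 /// <summary>
 /// Read Excel using COM
 /// </summary>
 /// <param name="excelFilePath">Path</param>
 /// <returns>DataTable</returns>
 public System.Data.DataTable GetExcelData(string excelFilePath)
 {
   Excel.Application app = new Excel.Application();
   Excel.Sheets sheets;
   Excel.Workbook workbook = null;
   object oMissiong = System.Reflection.Missing.Value;
   System.Data.DataTable dt = new System.Data.DataTable();
   wath.Start();
   try
   {
     if (app == null)
     {
       return null;
     }
     workbook = app.Workbooks.Open(excelFilePath, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong,
       oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong);
     //Read the data into the DataTable - Start
     sheets = workbook.Worksheets;
     Excel.Worksheet worksheet = (Excel.Worksheet)sheets.get_Item(1); //Read the first sheet
     if (worksheet == null)
       return null;
     string cellContent;
     int iRowCount = worksheet.UsedRange.Rows.Count;
     int iColCount = worksheet.UsedRange.Columns.Count;
     Excel.Range range;
     //Build the column headers - Start
     DataColumn dc;
     int ColumnID = 1;
     range = (Excel.Range)worksheet.Cells[1, 1];
     while (range.Text.ToString().Trim() != "")
     {
       dc = new DataColumn();
       dc.DataType = System.Type.GetType("System.String");
       dc.ColumnName = range.Text.ToString().Trim();
       dt.Columns.Add(dc);

       range = (Excel.Range)worksheet.Cells[1, ++ColumnID];
     }
     //End
     for (int iRow = 2; iRow <= iRowCount; iRow++)
     {
       DataRow dr = dt.NewRow();
       for (int iCol = 1; iCol <= iColCount; iCol++)
       {
         range = (Excel.Range)worksheet.Cells[iRow, iCol];
         cellContent = (range.Value2 == null) ? "" : range.Text.ToString();
         dr[iCol - 1] = cellContent;
       }
       dt.Rows.Add(dr);
     }
     wath.Stop();
     TimeSpan ts = wath.Elapsed;
     //Read the data into the DataTable - End
     return dt;
   }
   catch
   {
     return null;
   }
   finally
   {
     //workbook is still null if Open failed, so guard before closing it
     if (workbook != null)
     {
       workbook.Close(false, oMissiong, oMissiong);
       System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
       workbook = null;
     }
     app.Workbooks.Close();
     app.Quit();
     System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
     app = null;
     GC.Collect();
     GC.WaitForPendingFinalizers();
   }
 }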
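 /// <summary>
 /// Read Excel using COM with multiple threads (1 main thread, 4 worker threads)
 /// </summary>
 /// <param name="excelFilePath">Path</param>
 /// <returns>DataTable</returns>
 public System.Data.DataTable ThreadReadExcel(string excelFilePath)
 {
   // Requires in addition: using System.Threading;
   Excel.Application app = new Excel.Application();
   Excel.Sheets sheets = null;
   Excel.Workbook workbook = null;
   object oMissiong = System.Reflection.Missing.Value;
   System.Data.DataTable dt = new System.Data.DataTable();
   wath.Start();
   try
   {
     if (app == null)
     {
       return null;
     }
     workbook = app.Workbooks.Open(excelFilePath, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong,
       oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong, oMissiong);
     //Read the data into the DataTable - Start
     sheets = workbook.Worksheets;
     Excel.Worksheet worksheet = (Excel.Worksheet)sheets.get_Item(1); //Read the first sheet
     if (worksheet == null)
       return null;
     string cellContent;
     int iRowCount = worksheet.UsedRange.Rows.Count;
     int iColCount = worksheet.UsedRange.Columns.Count;
     Excel.Range range;
     //Build the column headers - Start
     DataColumn dc;
     int ColumnID = 1;
     range = (Excel.Range)worksheet.Cells[1, 1];
     while (iColCount >= ColumnID)
     {
       dc = new DataColumn();
       dc.DataType = System.Type.GetType("System.String");
       string strNewColumnName = range.Text.ToString().Trim();
       if (strNewColumnName.Length == 0) strNewColumnName = "_1";
       //Check whether the column name is a duplicate
       for (int i = 1; i < ColumnID; i++)
       {
         if (dt.Columns[i - 1].ColumnName == strNewColumnName)
           strNewColumnName = strNewColumnName + "_1";
       }
       dc.ColumnName = strNewColumnName;
       dt.Columns.Add(dc);
       range = (Excel.Range)worksheet.Cells[1, ++ColumnID];
     }
     //End
     //More than 500 data rows: use multiple threads to read the data
     if (iRowCount - 1 > 500)
     {
       //Start multi-threaded reading
       //SheetOptions is a helper class not shown in this article; the method
       //name SheetToDataTable passed to ThreadStart is assumed here
       int b2 = (iRowCount - 1) / 10;
       DataTable dt1 = new DataTable("dt1");
       dt1 = dt.Clone();
       SheetOptions sheet1thread = new SheetOptions(worksheet, iColCount, 2, b2 + 1, dt1);
       Thread othread1 = new Thread(new ThreadStart(sheet1thread.SheetToDataTable));
       othread1.Start();
       //Block for 1 millisecond so that the first thread starts reading dt1 first
       Thread.Sleep(1);
       DataTable dt2 = new DataTable("dt2");
       dt2 = dt.Clone();
       SheetOptions sheet2thread = new SheetOptions(worksheet, iColCount, b2 + 2, b2 * 2 + 1, dt2);
       Thread othread2 = new Thread(new ThreadStart(sheet2thread.SheetToDataTable));
       othread2.Start();
       DataTable dt3 = new DataTable("dt3");
       dt3 = dt.Clone();
       SheetOptions sheet3thread = new SheetOptions(worksheet, iColCount, b2 * 2 + 2, b2 * 3 + 1, dt3);
       Thread othread3 = new Thread(new ThreadStart(sheet3thread.SheetToDataTable));
       othread3.Start();
       DataTable dt4 = new DataTable("dt4");
       dt4 = dt.Clone();
       SheetOptions sheet4thread = new SheetOptions(worksheet, iColCount, b2 * 3 + 2, b2 * 4 + 1, dt4);
       Thread othread4 = new Thread(new ThreadStart(sheet4thread.SheetToDataTable));
       othread4.Start();
       //The main thread reads the remaining data
       for (int iRow = b2 * 4 + 2; iRow <= iRowCount; iRow++)
       {
         DataRow dr = dt.NewRow();
         for (int iCol = 1; iCol <= iColCount; iCol++)
         {
           range = (Excel.Range)worksheet.Cells[iRow, iCol];
           cellContent = (range.Value2 == null) ? "" : range.Text.ToString();
           dr[iCol - 1] = cellContent;
         }
         dt.Rows.Add(dr);
       }
       othread1.Join();
       othread2.Join();
       othread3.Join();
       othread4.Join();
       //Append the data read by the other threads to dt1 in sheet order:
       //dt2, dt3, dt4, then the rows read by the main thread (dt)
       foreach (DataRow dr in dt2.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt2.Clear();
       dt2.Dispose();
       foreach (DataRow dr in dt3.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt3.Clear();
       dt3.Dispose();
       foreach (DataRow dr in dt4.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt4.Clear();
       dt4.Dispose();
       foreach (DataRow dr in dt.Rows)
         dt1.Rows.Add(dr.ItemArray);
       dt.Clear();
       dt.Dispose();
       return dt1;
     }
     else
     {
       for (int iRow = 2; iRow <= iRowCount; iRow++)
       {
         DataRow dr = dt.NewRow();
         for (int iCol = 1; iCol <= iColCount; iCol++)
         {
           range = (Excel.Range)worksheet.Cells[iRow, iCol];
           cellContent = (range.Value2 == null) ? "" : range.Text.ToString();
           dr[iCol - 1] = cellContent;
         }
         dt.Rows.Add(dr);
       }
     }
     wath.Stop();
     TimeSpan ts = wath.Elapsed;
     //Read the data into the DataTable - End
     return dt;
   }
   catch
   {
     return null;
   }
   finally
   {
     //workbook and sheets are still null if Open failed, so guard before releasing them
     if (workbook != null)
     {
       workbook.Close(false, oMissiong, oMissiong);
       System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
     }
     if (sheets != null)
       System.Runtime.InteropServices.Marshal.ReleaseComObject(sheets);
     workbook = null;
     app.Workbooks.Close();
     app.Quit();
     System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
     app = null;
     GC.Collect();
     GC.WaitForPendingFinalizers();
   }
 }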
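
To make the row split in ThreadReadExcel easier to follow, the sketch below reproduces only its arithmetic: each of the four worker threads receives (iRowCount - 1) / 10 rows, and the main thread reads whatever is left. RowPartition and Split are hypothetical names used for illustration; no Excel access is involved.

```csharp
using System;
using System.Collections.Generic;

public static class RowPartition
{
    // Returns inclusive (first, last) sheet-row ranges: four worker ranges
    // followed by the range left for the main thread. Row 1 is the header.
    public static List<Tuple<int, int>> Split(int rowCount)
    {
        int b2 = (rowCount - 1) / 10; // rows handed to each worker thread
        var ranges = new List<Tuple<int, int>>();
        for (int k = 0; k < 4; k++)
            ranges.Add(Tuple.Create(b2 * k + 2, b2 * (k + 1) + 1));
        ranges.Add(Tuple.Create(b2 * 4 + 2, rowCount)); // main thread
        return ranges;
    }
}
```

For a sheet with 1001 used rows, the workers get rows 2-101, 102-201, 202-301 and 302-401, and the main thread reads rows 402-1001. Because the divisor is 10, the main thread still reads about 60% of the data rows; dividing by 5 instead would spread the rows evenly across the five threads.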

(3) NPOI method (this method has not been tested)

NPOI is the .NET version of the POI project. POI is an open source Java project for reading and writing Microsoft OLE2 compound documents such as Excel and Word. With NPOI, you can read and write Word and Excel documents on machines that do not have Office or the corresponding environment installed.

Advantages: Reads Excel quickly and offers flexible ways of reading.

Disadvantages: You need to download the corresponding library and add it to the project references.

 // Requires: using System; using System.Data; using System.IO;
 // using NPOI.SS.UserModel; using NPOI.HSSF.UserModel; using NPOI.XSSF.UserModel;
 // fileName (string), fs (FileStream) and workbook (IWorkbook) are fields of
 // the enclosing class, which is not shown here.
 /// <summary>
 /// Import data from excel into DataTable
 /// </summary>
 /// <param name="sheetName">The name of the excel worksheet</param>
 /// <param name="isFirstRowColumn">Is the first row the column name of the DataTable</param>
 /// <returns>Returned DataTable</returns>
 public DataTable ExcelToDataTable(string sheetName, bool isFirstRowColumn)
 {
   ISheet sheet = null;
   DataTable data = new DataTable();
   int startRow = 0;
   try
   {
     fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
     if (fileName.IndexOf(".xlsx") > 0) // 2007 version
       workbook = new XSSFWorkbook(fs);
     else if (fileName.IndexOf(".xls") > 0) // 2003 version
       workbook = new HSSFWorkbook(fs);
     if (sheetName != null)
     {
       sheet = workbook.GetSheet(sheetName);
     }
     else
     {
       sheet = workbook.GetSheetAt(0);
     }
     if (sheet != null)
     {
       IRow firstRow = sheet.GetRow(0);
       int cellCount = firstRow.LastCellNum; //The number of the last cell in a row is the total number of columns
       if (isFirstRowColumn)
       {
         for (int i = firstRow.FirstCellNum; i < cellCount; ++i)
         {
           DataColumn column = new DataColumn(firstRow.GetCell(i).StringCellValue);
           data.Columns.Add(column);
         }
         startRow = sheet.FirstRowNum + 1;
       }
       else
       {
         startRow = sheet.FirstRowNum;
       }
       //Index of the last row
       int rowCount = sheet.LastRowNum;
       for (int i = startRow; i <= rowCount; ++i)
       {
         IRow row = sheet.GetRow(i);
         if (row == null) continue; //Rows without data are null by default

         DataRow dataRow = data.NewRow();
         for (int j = row.FirstCellNum; j < cellCount; ++j)
         {
           if (row.GetCell(j) != null) //Similarly, cells without data are null by default
             dataRow[j] = row.GetCell(j).ToString();
         }
         data.Rows.Add(dataRow);
       }
     }
     return data;
   }
   catch (Exception ex)
   {
     Console.WriteLine("Exception: " + ex.Message);
     return null;
   }
 }
