C# CSV parsing: escaping double quotes



I'm trying to parse a number of CSV files that have double quotes and commas within fields. I have no control over the format of the CSVs, and instead of using "" to escape quotes they use \". The files are extremely large, so reading them in and using regex isn't the best option for me.

I'd prefer to use an existing library rather than write an entirely new parser. I'm currently using CsvHelper.

This is an example of the CSV data:

    "Id","Name","Notes"
    "40","Continue","If the message \"Continue\" does not appear restart, and notify the instructor."
    "41","Restart","If the message \"Restart\" does not appear after 10 seconds, restart manually."

The problem is that the escaped double quotes aren't being treated as escapes; they are being read as field delimiters, which splits the Notes field into 2 separate fields.

This is my current code, which doesn't work:

    using System;
    using System.Data;
    using System.IO;
    using CsvHelper;

    DataTable csvData = new DataTable();
    string csvFilePath = @"C:\Users\" + csvFileName + ".csv";

    try
    {
        FileInfo file = new FileInfo(csvFilePath);
        using (TextReader reader = file.OpenText())
        using (CsvReader csv = new CsvReader(reader))
        {
            csv.Configuration.Delimiter = ",";
            csv.Configuration.HasHeaderRecord = true;
            csv.Configuration.IgnoreQuotes = false;
            csv.Configuration.TrimFields = true;
            csv.Configuration.WillThrowOnMissingField = false;

            string[] colFields = null;
            while (csv.Read())
            {
                // Build the DataTable columns from the header on the first record
                if (colFields == null)
                {
                    colFields = csv.FieldHeaders;
                    foreach (string column in colFields)
                    {
                        DataColumn dataColumn = new DataColumn(column);
                        dataColumn.AllowDBNull = true;
                        csvData.Columns.Add(dataColumn);
                    }
                }

                // Store empty fields as nulls
                string[] fieldData = csv.CurrentRecord;
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == "")
                    {
                        fieldData[i] = null;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
        // Report I/O and parsing errors
        Console.WriteLine(ex.Message);
    }

Is there an existing library that lets me specify how quotes are escaped, or should I write my own parser?
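If you would rather keep an existing library, one alternative (an assumption, not something from the question or the answer) is to rewrite each line into RFC 4180 form first. RFC 4180 escapes a quote inside a quoted field by doubling it (""), which is the convention CsvHelper and most CSV libraries read by default. The sketch below is minimal: BackslashCsv, ToRfc4180Line and ReadAsRfc4180 are hypothetical names, and it assumes a backslash only ever appears in front of an escaped quote, so a field that really ends in a backslash (such as "C:\") would be converted incorrectly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BackslashCsv
{
    // Hypothetical helper: turns the backslash escape \" into the RFC 4180
    // escape "" so that a standard CSV parser can read the line.
    // Assumes a backslash never appears in the data except before a quote.
    public static string ToRfc4180Line(string line) =>
        line.Replace("\\\"", "\"\"");

    // File.ReadLines is lazy, so the file is converted one line at a time
    // instead of being loaded into memory all at once.
    // Assumes no record spans more than one physical line.
    public static IEnumerable<string> ReadAsRfc4180(string path) =>
        File.ReadLines(path).Select(line => ToRfc4180Line(line));
}
```

The converted lines could then be written to a temporary file, or wrapped in a TextReader, and handed to CsvHelper with its default quoting settings; the streaming read keeps memory use flat for very large files.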

You can get quite far by using a simple LINQ statement to split, trim, and replace (unescaping the quotes in the content):

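    using System;
    using System.Data;
    using System.IO;
    using System.Linq;

    DataTable csvData = new DataTable();
    string csvFilePath = @"C:\Users\" + csvFileName + ".csv";

    try
    {
        // Split only where a comma touches a quote ("," or ,"), so commas
        // inside the content (followed by a space) do not split the field
        string[] seps = { "\",", ",\"" };
        // Characters stripped from both ends of each field
        char[] quotes = { '\"', ' ' };
        string[] colFields = null;

        foreach (var line in File.ReadLines(csvFilePath))
        {
            var fields = line
                .Split(seps, StringSplitOptions.None)
                .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
                .ToArray();

            if (colFields == null)
            {
                // The first line is the header
                colFields = fields;
                foreach (string column in colFields)
                {
                    DataColumn dataColumn = new DataColumn(column);
                    dataColumn.AllowDBNull = true;
                    csvData.Columns.Add(dataColumn);
                }
            }
            else
            {
                // Store empty fields as nulls
                for (int i = 0; i < fields.Length; i++)
                {
                    if (fields[i] == "")
                    {
                        fields[i] = null;
                    }
                }
                csvData.Rows.Add(fields);
            }
        }
    }
    catch (Exception ex)
    {
        // Report I/O errors such as a missing file or denied access
        Console.WriteLine(ex.Message);
    }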

When used in a simple console app, with the OP's original input in a "test.txt" file:

    public static void CsvUnescapeSplit()
    {
        string[] seps = { "\",", ",\"" };
        char[] quotes = { '\"', ' ' };
        foreach (var line in File.ReadLines(@"C:\temp\test.txt"))
        {
            var fields = line
                .Split(seps, StringSplitOptions.None)
                .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
                .ToArray();
            foreach (var field in fields)
                Console.Write("{0} | ", field);
            Console.WriteLine();
        }
    }

This produces the following (correct) output:

    Id | Name | Notes |
    40 | Continue | If the message "Continue" does not appear restart, and notify the instructor. |
    41 | Restart | If the message "Restart" does not appear after 10 seconds, restart manually. |

The caveat: if the field separators have spaces around them, like these:

    "40" , "Continue" , "If the message \"Continue\" does not appear restart, and notify the instructor."

or if the content strings contain a comma directly after an escaped quote, as here (after \"Restart\"):

    "41","Help","If the message \"Restart\", does not appear after 10 seconds, manually restart."

it will fail.
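Both failure cases come from splitting on fixed strings. A different technique, sketched here as an alternative rather than as part of the answer, is a small character-by-character state machine that tracks whether it is inside a quoted field, so a comma only separates fields outside quotes and \" is only unescaped inside them. BackslashCsvSplitter and Split are hypothetical names; the sketch assumes \" is the only escape, that every record fits on one line, and that no quoted field ends with a literal backslash.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BackslashCsvSplitter
{
    // Hypothetical helper: splits one CSV line in which a quote inside a
    // quoted field is escaped as \" rather than the RFC 4180 "".
    // Assumptions: one record per physical line, \" is the only escape,
    // and no quoted field ends with a literal backslash.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false; // between an opening and a closing quote
        bool quoted = false;   // the current field started with a quote

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];

            if (inQuotes)
            {
                if (c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // \" is a literal quote in the content
                    i++;                 // skip the quote just consumed
                }
                else if (c == '"')
                {
                    inQuotes = false;    // an unescaped quote closes the field
                }
                else
                {
                    current.Append(c);   // commas and spaces here are content
                }
            }
            else if (c == ',')
            {
                // A comma outside quotes ends the field
                fields.Add(Finish(current, quoted));
                current.Clear();
                quoted = false;
            }
            else if (c == '"' && !quoted && current.ToString().Trim().Length == 0)
            {
                current.Clear();         // drop spaces before the opening quote
                inQuotes = true;
                quoted = true;
            }
            else if (quoted)
            {
                // After the closing quote: skip spaces before the next comma,
                // but keep anything else rather than silently dropping it
                if (!char.IsWhiteSpace(c)) current.Append(c);
            }
            else
            {
                current.Append(c);       // unquoted field content
            }
        }

        fields.Add(Finish(current, quoted));
        return fields;
    }

    // Unquoted fields are trimmed; quoted fields keep their content exactly
    private static string Finish(StringBuilder sb, bool quoted) =>
        quoted ? sb.ToString() : sb.ToString().Trim();
}
```

In the answer's DataTable loop, BackslashCsvSplitter.Split(line).ToArray() could stand in for the Split/Select/ToArray chain. The trade-off is a few more lines of code in exchange for not depending on where spaces and commas happen to sit next to the quotes.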