This question already has an answer here:
- CSV parsing options with .NET [closed] (3 answers)
I'm trying to parse a number of CSV files that have double quotes and commas within fields. I have no control over the format of the CSVs, and instead of using `""` to escape quotes they use `\"`. The files are extremely large, so reading them and using regex isn't the best option for me.
I'd prefer to use an existing library rather than write an entirely new parser. I'm currently using CsvHelper.
This is an example of the CSV data:

```
"ID","Name","Notes"
"40","Continue","If the message \"Continue\" does not appear restart, and notify the instructor."
"41","Restart","If the message \"Restart\" does not appear after 10 seconds, restart manually."
```
The problem is that the double quotes aren't being treated as escaped, and the comma inside the field is being read as a delimiter, which splits the Notes field into two separate fields.
This is my current code, which doesn't work:

```csharp
// Requires: using System; using System.Data; using System.IO; using CsvHelper;
DataTable csvData = new DataTable();
string csvFilePath = @"c:\users\" + csvFileName + ".csv";
try
{
    FileInfo file = new FileInfo(csvFilePath);
    using (TextReader reader = file.OpenText())
    using (CsvReader csv = new CsvReader(reader))
    {
        csv.Configuration.Delimiter = ",";
        csv.Configuration.HasHeaderRecord = true;
        csv.Configuration.IgnoreQuotes = false;
        csv.Configuration.TrimFields = true;
        csv.Configuration.WillThrowOnMissingField = false;
        string[] colFields = null;
        while (csv.Read())
        {
            // Build the DataTable columns from the header row once
            if (colFields == null)
            {
                colFields = csv.FieldHeaders;
                foreach (string column in colFields)
                {
                    DataColumn dataColumn = new DataColumn(column);
                    dataColumn.AllowDBNull = true;
                    csvData.Columns.Add(dataColumn);
                }
            }
            // Store empty fields as nulls
            string[] fieldData = csv.CurrentRecord;
            for (int i = 0; i < fieldData.Length; i++)
            {
                if (fieldData[i] == "")
                {
                    fieldData[i] = null;
                }
            }
            csvData.Rows.Add(fieldData);
        }
    }
}
catch (Exception ex)
{
    // e.g. file not found or a malformed record
    Console.WriteLine(ex.Message);
}
```
Is there an existing library that lets me specify how quotes are escaped, or should I write my own parser?
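A sketch of one workaround that keeps the existing library, under stated assumptions: CsvHelper, like most CSV readers, expects RFC 4180 escaping by default, where a quote inside a quoted field is written as two quotes (`""`). Rewriting each `\"` as `""` before parsing turns these files into that form. `EscapeNormalizer`, `ToRfc4180` and `NormalizeFile` are hypothetical names; the approach assumes no field contains a line break and no field's content ends in a literal backslash (such as `"C:\temp\"`), which it would misread.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of CsvHelper): converts backslash-escaped quotes
// into the doubled quotes that RFC 4180-style parsers expect.
public static class EscapeNormalizer
{
    // \"  ->  ""   (assumes no field content ends in a literal backslash)
    public static string ToRfc4180(string line) => line.Replace("\\\"", "\"\"");

    // Streams the file lazily, one line at a time, so a large file is never
    // loaded whole. Assumes no field contains an embedded line break.
    public static IEnumerable<string> NormalizeFile(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            yield return ToRfc4180(line);
        }
    }
}
```

The normalized lines still have to reach the parser; one simple option is `File.WriteAllLines(tempPath, EscapeNormalizer.NormalizeFile(csvFilePath));` (with `tempPath` a hypothetical scratch file) and then pointing the existing CsvHelper code at `tempPath`.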
You can get quite far with a simple LINQ statement that uses `Split`, `Trim` and `Replace` to separate the fields and unescape the quotes in the content:
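```csharp
// Requires: using System; using System.Data; using System.IO; using System.Linq;
DataTable csvData = new DataTable();
string csvFilePath = @"c:\users\" + csvFileName + ".csv";
try
{
    // Field boundaries: a quote followed by a comma, or a comma followed by a quote
    string[] seps = { "\",", ",\"" };
    // Characters trimmed from both ends of each field
    char[] quotes = { '\"', ' ' };
    string[] colFields = null;
    foreach (var line in File.ReadLines(csvFilePath))
    {
        // Split on the boundaries, strip outer quotes/spaces, then turn \" into "
        var fields = line
            .Split(seps, StringSplitOptions.None)
            .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
            .ToArray();
        if (colFields == null)
        {
            // First line is the header row
            colFields = fields;
            foreach (string column in colFields)
            {
                DataColumn dataColumn = new DataColumn(column);
                dataColumn.AllowDBNull = true;
                csvData.Columns.Add(dataColumn);
            }
        }
        else
        {
            // Store empty fields as nulls
            for (int i = 0; i < fields.Length; i++)
            {
                if (fields[i] == "")
                {
                    fields[i] = null;
                }
            }
            csvData.Rows.Add(fields);
        }
    }
}
catch (Exception ex)
{
    // e.g. file not found
    Console.WriteLine(ex.Message);
}
```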
When used in a simple console app, with the OP's original input in a "test.txt" file:

```csharp
using System;
using System.IO;
using System.Linq;

class Program
{
    public static void CsvUnescapeSplit()
    {
        string[] seps = { "\",", ",\"" };
        char[] quotes = { '\"', ' ' };
        foreach (var line in File.ReadLines(@"c:\temp\test.txt"))
        {
            var fields = line
                .Split(seps, StringSplitOptions.None)
                .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
                .ToArray();
            // Print each field followed by a separator, one record per line
            foreach (var field in fields)
                Console.Write("{0} | ", field);
            Console.WriteLine();
        }
    }

    static void Main() => CsvUnescapeSplit();
}
```
This produces the following (correct) output:

```
ID | Name | Notes | 
40 | Continue | If the message "Continue" does not appear restart, and notify the instructor. | 
41 | Restart | If the message "Restart" does not appear after 10 seconds, restart manually. | 
```
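Caveat: if the field separators have spaces around them, like these:

```
"40" , "Continue" , "If the message \"Continue\" does not appear restart, and notify the instructor."
```

or if the content strings contain a comma directly after a quote, as here (after "Restart"):

```
"41","Help","If the message \"Restart\", does not appear after 10 seconds, manually restart."
```

it will fail. In the first case neither `",` nor `,"` appears in the line, so nothing is split; in the second, the `",` after `\"Restart\"` is taken as a field separator.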
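Both caveats come from splitting on fixed strings. A character-by-character scanner that tracks whether it is inside quotes avoids them and still streams the file one line at a time. The sketch below is not part of CsvHelper or of the answer above; `BackslashCsv`, `ParseLine` and `ParseFile` are hypothetical names, and it assumes `\"` is the only escape sequence and that no field spans several lines.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: parses lines whose quoted fields escape quotes as \" (not "").
public static class BackslashCsv
{
    public static List<string> ParseLine(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false; // between an opening and a closing quote
        bool quoted = false;   // the current field started with a quote

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // \" is a literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote; delimiters count again
                }
                else
                {
                    current.Append(c);   // commas inside quotes are plain text
                }
            }
            else if (c == delimiter)
            {
                fields.Add(quoted ? current.ToString() : current.ToString().Trim());
                current.Clear();
                quoted = false;
            }
            else if (c == '"' && !quoted && string.IsNullOrWhiteSpace(current.ToString()))
            {
                current.Clear();         // drop spaces before the opening quote
                inQuotes = true;
                quoted = true;
            }
            else if (!quoted)
            {
                current.Append(c);       // unquoted field content
            }
            // otherwise: text after a closing quote (e.g. spaces before a comma) is ignored
        }
        fields.Add(quoted ? current.ToString() : current.ToString().Trim());
        return fields;
    }

    // Lazily parses a large file record by record.
    public static IEnumerable<List<string>> ParseFile(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            yield return ParseLine(line);
        }
    }
}
```

Fed the two caveat lines, it returns three fields each, with `\"` turned into `"` and the spaces around the separators dropped. A backslash not followed by a quote is kept as-is, so a field whose content ends in a literal backslash would still be misread.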