I have a CSV file with a datetime column: "2011-05-02T04:52:09+00:00".
I am using Scala, the file is loaded into a Spark DataFrame, and I can use Joda-Time to parse the date:
```scala
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sqlContext.load("com.databricks.spark.csv",
  Map("path" -> "data.csv", "header" -> "true"))
// Note: months are upper-case MM in Joda-Time (lower-case mm is minutes)
val d = org.joda.time.format.DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
```
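The pattern can be sanity-checked on the sample timestamp outside Spark (a minimal sketch; depending on the Joda-Time version, the colon-separated offset `+00:00` may require `ZZ` instead of `Z` in the pattern):

```scala
import org.joda.time.format.DateTimeFormat

object ParseCheck extends App {
  val fmt = DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
  // Parse the sample value from the CSV into a Joda DateTime
  val dt = fmt.parseDateTime("2011-05-02T04:52:09+00:00")
  println(dt)
}
```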
I want to create new columns based on the datetime field for time-series analysis.
In a DataFrame, how do I create a column based on the value of another column?
I noticed the DataFrame has the following function: `df.withColumn("dt", column)` — is there a way to create a column based on the value of an existing column?
Thanks
```scala
import java.sql.Date
import org.apache.spark.sql.types.DateType
import org.apache.spark.sql.functions._
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

val d = DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
// Return java.sql.Date, which is what Spark's DateType expects
val dtFunc: (String => Date) = (arg1: String) => new Date(DateTime.parse(arg1, d).getMillis)
val x = df.withColumn("dt", callUDF(dtFunc, DateType, col("dt_string")))
```
The `callUDF` and `col` functions are included in the `functions` import shown above. The `dt_string` inside `col("dt_string")` is the original column name of `df`, i.e. the column you want to transform from.
Alternatively, you can replace the last statement with:
```scala
val dtFunc2 = udf(dtFunc)
val x = df.withColumn("dt", dtFunc2(col("dt_string")))
```
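Putting it together on a toy DataFrame (a hedged sketch, not a drop-in replacement: the column name `dt_string` and the `sqlContext` setup are taken from the snippets above; the formatter is built inside the function because Joda's `DateTimeFormatter` is not serializable and would otherwise fail when Spark ships the UDF to executors):

```scala
import java.sql.Date
import org.apache.spark.sql.functions.{col, udf}
import org.joda.time.format.DateTimeFormat

// Build the formatter inside the UDF body to avoid Task-not-serializable errors
val toDate = udf { (s: String) =>
  val fmt = DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
  new Date(fmt.parseDateTime(s).getMillis)
}

val df = sqlContext.createDataFrame(Seq(
  Tuple1("2011-05-02T04:52:09+00:00")
)).toDF("dt_string")

val withDt = df.withColumn("dt", toDate(col("dt_string")))
withDt.printSchema()  // "dt" should appear as DateType alongside "dt_string"
```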