I have a CSV file with a datetime column: "2011-05-02T04:52:09+00:00".
I am using Scala, the file is loaded into a Spark DataFrame, and I can use Joda-Time to parse the date:
```scala
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sqlContext.load("com.databricks.spark.csv",
  Map("path" -> "data.csv", "header" -> "true"))
// Note: months are upper-case MM in Joda-Time (lower-case mm is minutes)
val d = org.joda.time.format.DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
```
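The pattern can be sanity-checked on the sample timestamp outside Spark (a minimal sketch; depending on the Joda-Time version, the colon-separated offset `+00:00` may require `ZZ` instead of `Z` in the pattern):

```scala
import org.joda.time.format.DateTimeFormat

object ParseCheck extends App {
  val fmt = DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
  // Parse the sample value from the CSV into a Joda DateTime
  val dt = fmt.parseDateTime("2011-05-02T04:52:09+00:00")
  println(dt)
}
```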
I want to create new columns based on the datetime field for time-series analysis.
In a DataFrame, how do I create a column based on the value of another column?
I noticed the DataFrame has the following function: `df.withColumn("dt", column)` — is there a way to create a column based on the value of an existing column?
Thanks
```scala
import java.sql.Date
import org.apache.spark.sql.types.DateType
import org.apache.spark.sql.functions._
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

val d = DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
// Return java.sql.Date, which is what Spark's DateType expects
val dtFunc: (String => Date) = (arg1: String) => new Date(DateTime.parse(arg1, d).getMillis)
val x = df.withColumn("dt", callUDF(dtFunc, DateType, col("dt_string")))
```
The `callUDF` and `col` functions are included in the `functions` import shown above. The `dt_string` inside `col("dt_string")` is the original column name of `df`, i.e. the column you want to transform from.
Alternatively, you can replace the last statement with:
```scala
val dtFunc2 = udf(dtFunc)
val x = df.withColumn("dt", dtFunc2(col("dt_string")))
```
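Putting it together on a toy DataFrame (a hedged sketch, not a drop-in replacement: the column name `dt_string` and the `sqlContext` setup are taken from the snippets above; the formatter is built inside the function because Joda's `DateTimeFormatter` is not serializable and would otherwise fail when Spark ships the UDF to executors):

```scala
import java.sql.Date
import org.apache.spark.sql.functions.{col, udf}
import org.joda.time.format.DateTimeFormat

// Build the formatter inside the UDF body to avoid Task-not-serializable errors
val toDate = udf { (s: String) =>
  val fmt = DateTimeFormat.forPattern("yyyy-MM-dd'T'kk:mm:ssZ")
  new Date(fmt.parseDateTime(s).getMillis)
}

val df = sqlContext.createDataFrame(Seq(
  Tuple1("2011-05-02T04:52:09+00:00")
)).toDF("dt_string")

val withDt = df.withColumn("dt", toDate(col("dt_string")))
withDt.printSchema()  // "dt" should appear as DateType alongside "dt_string"
```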