+---+-------------------+
| id|               date|
+---+-------------------+
|  1|2020-11-28 20:01:02|
|  2|2020-11-29 21:03:04|
|  3|2020-11-30 22:05:06|
+---+-------------------+
Let's split the data in this date column into separate columns, one per component.
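from pyspark.sql import SparkSession
from pyspark.sql.functions import col, year, month, dayofmonth, hour, minute, second, dayofweek

# Reuse the active session (e.g. in a notebook or the pyspark shell), or start one.
spark = SparkSession.builder.getOrCreate()

# The date column is created as a plain string in "yyyy-MM-dd HH:mm:ss" form.
df = spark.createDataFrame([
    (1, "2020-11-28 20:01:02"),
    (2, "2020-11-29 21:03:04"),
    (3, "2020-11-30 22:05:06")
],
["id", "date"])

# Each function extracts one component; the string is implicitly cast to a date/timestamp.
df = df.withColumn("year", year(col("date")))
df = df.withColumn("month", month(col("date")))
df = df.withColumn("day", dayofmonth(col("date")))
df = df.withColumn("hour", hour(col("date")))
df = df.withColumn("minute", minute(col("date")))
df = df.withColumn("second", second(col("date")))
df = df.withColumn("dayofweek", dayofweek(col("date")))
df.show()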
dayofweek returns the day of the week as a number, with Sunday as 1 (so Saturday is 7).
+---+-------------------+----+-----+---+----+------+------+---------+
| id|               date|year|month|day|hour|minute|second|dayofweek|
+---+-------------------+----+-----+---+----+------+------+---------+
|  1|2020-11-28 20:01:02|2020|   11| 28|  20|     1|     2|        7|
|  2|2020-11-29 21:03:04|2020|   11| 29|  21|     3|     4|        1|
|  3|2020-11-30 22:05:06|2020|   11| 30|  22|     5|     6|        2|
+---+-------------------+----+-----+---+----+------+------+---------+
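The Sunday = 1 numbering is easy to mix up with Python's own conventions, where datetime.isoweekday() counts Monday as 1 and Sunday as 7. As a rough cross-check that needs no Spark session, the sketch below reproduces the same numbering in plain Python. The helper name spark_style_dayofweek is hypothetical, introduced only for this illustration; it is not part of PySpark.

```python
from datetime import datetime


def spark_style_dayofweek(ts: str) -> int:
    """Return 1 (Sunday) through 7 (Saturday) for a 'YYYY-MM-DD HH:MM:SS' string.

    Hypothetical helper that mimics the numbering described above.
    """
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # isoweekday() is 1 (Monday) .. 7 (Sunday); shift it so that Sunday becomes 1.
    return dt.isoweekday() % 7 + 1


samples = ["2020-11-28 20:01:02", "2020-11-29 21:03:04", "2020-11-30 22:05:06"]
results = [spark_style_dayofweek(s) for s in samples]
print(results)  # [7, 1, 2], matching the dayofweek column above
```

2020-11-28 was a Saturday, so it maps to 7, and the following Sunday and Monday map to 1 and 2, which agrees with the table.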