
Can I Change The Nullability Of A Column In My Spark Dataframe?

I have a StructField in a dataframe that is not nullable. Simple example:

import pyspark.sql.functions as F
from pyspark.sql.types import *
l = [('Alice', 1)]
df = sqlContext.creat

Solution 1:

I know this question is already answered, but I was looking for a more generic solution when I came up with this:

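def set_df_columns_nullable(spark, df, column_list, nullable=True):
    # Take the schema once and edit that object, so the change does not
    # depend on df.schema returning the same instance on every access.
    schema = df.schema
    for struct_field in schema:
        if struct_field.name in column_list:
            struct_field.nullable = nullable
    # Rebuild the DataFrame so the modified schema is actually applied.
    df_mod = spark.createDataFrame(df.rdd, schema)
    return df_mod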

You can then call it like this:

df = set_df_columns_nullable(spark, df, ['name', 'age'])

Solution 2:

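For the general case, you can change the nullability of a column through the nullable property of that column's StructField. Note that this only edits the schema object on the Python side; to get a DataFrame that actually uses the new schema, pass it to createDataFrame as in Solution 1. Here's an example: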

df.schema['col_1']
# StructField(col_1,DoubleType,false)

df.schema['col_1'].nullable = True

df.schema['col_1']
# StructField(col_1,DoubleType,true)

Solution 3:

It seems you missed the StructType(newSchema) wrapper around the list of fields when calling createDataFrame:

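import pyspark.sql.functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType, BooleanType

l = [('Alice', 1)]
df = sqlContext.createDataFrame(l, ['name', 'age'])
df = df.withColumn('foo', F.when(df['name'].isNull(), False).otherwise(True))
df.schema.fields  # inspect the current fields and their nullability
# Declare the desired nullability of each column explicitly.
newSchema = [StructField('name', StringType(), True),
             StructField('age', LongType(), True),
             StructField('foo', BooleanType(), False)]
# Wrap the field list in a StructType before passing it as the schema.
df2 = sqlContext.createDataFrame(df.rdd, StructType(newSchema))
df2.show()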

Solution 4:

# Round-tripping through the RDD re-infers the schema; inferred fields are nullable.
df1 = df.rdd.toDF()
df1.printSchema()

Output:

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
 |-- foo: boolean (nullable = true)
