https://sparkbyexamples.com/pyspark/pyspark-withcolumn/
PYSPARK WITHCOLUMN() USAGE WITH EXAMPLES

* Post author: NNK
* Post category: PySpark
* Post last modified: August 6, 2022

PySpark withColumn() is a DataFrame transformation function used to change the value of an existing column, convert a column's datatype, derive a new column, and more. In this post, I will walk you through commonly used PySpark DataFrame column operations with withColumn() examples:

* PySpark withColumn – change a column's datatype
* Transform/change the value of an existing column
* Derive a new column from an existing column
* Add a column with a literal value
* Rename a column
* Drop a DataFrame column

First, let's create a DataFrame to work with.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

data = [('James','','Smith','1991-04-01','M',3000),
        ('Michael','Rose','','2000-05-19','M',4000),
        ('Robert','','Williams','1978-09-05','M',4000),
        ('Maria','Anne','Jones','1967-12-01','F',4000),
        ('Jen','Mary','Brown','1980-02-17','F',-1)]
columns = ["firstname","middlename","lastname","dob","gender","salary"]

df = spark.createDataFrame(data=data, schema=columns)

1. CHANGE DATATYPE USING PYSPARK WITHCOLUMN()

By using withColumn() on a DataFrame, we can cast or change the data type of a column. To change the data type, use the cast() function along with withColumn(). The statement below casts the salary column to integer (Spark infers it as long from the sample data).

df.withColumn("salary", col("salary").cast("Integer")).show()
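As an aside, cast() accepts either a type string (as above) or a pyspark.sql.types instance. Below is a minimal sketch, reusing the df created above, that also casts the dob string column to a date; the name df_cast is just illustrative.

from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, DateType

# cast() takes a type string or a DataType instance; both forms are equivalent
df_cast = df.withColumn("salary", col("salary").cast(IntegerType())) \
            .withColumn("dob", col("dob").cast(DateType()))

df_cast.printSchema()
# ...
#  |-- dob: date (nullable = true)
#  |-- salary: integer (nullable = true)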
2. UPDATE THE VALUE OF AN EXISTING COLUMN

The withColumn() function can also be used to change the value of an existing column. To change the value, pass the existing column name as the first argument and the value to be assigned as the second argument. Note that the second argument must be of Column type. Also, see Different Ways to Update PySpark DataFrame Column.

df.withColumn("salary", col("salary")*100).show()

This snippet multiplies the value of "salary" by 100 and writes the result back to the "salary" column.

3. CREATE A COLUMN FROM AN EXISTING ONE

To add/create a new column, pass the name you want the new column to have as the first argument, and assign its value by applying an operation on an existing column as the second argument. Also, see Different Ways to Add New Column to PySpark DataFrame.

df.withColumn("CopiedColumn", col("salary")* -1).show()

This snippet creates a new column "CopiedColumn" by multiplying the "salary" column by -1.

4. ADD A NEW COLUMN USING WITHCOLUMN()

To create a new column, pass the column name you want as the first argument of the withColumn() transformation function. Make sure the new column is not already present on the DataFrame; if it is, withColumn() updates that column's value instead. In the snippet below, the PySpark lit() function is used to add a constant value to a DataFrame column. We can also chain withColumn() calls to add multiple columns.

df.withColumn("Country", lit("USA")).show()

df.withColumn("Country", lit("USA")) \
  .withColumn("anotherColumn", lit("anotherValue")) \
  .show()

5. RENAME COLUMN NAME

Though you cannot rename a column using withColumn(), I still wanted to cover this, as renaming is one of the common operations performed on a DataFrame. To rename an existing column, use the withColumnRenamed() function on the DataFrame.

df.withColumnRenamed("gender","sex") \
  .show(truncate=False)

6. DROP COLUMN FROM PYSPARK DATAFRAME

Use the drop() function to drop a specific column from the DataFrame.

df.drop("salary") \
  .show()

Note: All of these functions return a new DataFrame after applying the transformation instead of updating the existing DataFrame.

7. PYSPARK WITHCOLUMN() COMPLETE EXAMPLE

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

data = [('James','','Smith','1991-04-01','M',3000),
        ('Michael','Rose','','2000-05-19','M',4000),
        ('Robert','','Williams','1978-09-05','M',4000),
        ('Maria','Anne','Jones','1967-12-01','F',4000),
        ('Jen','Mary','Brown','1980-02-17','F',-1)]
columns = ["firstname","middlename","lastname","dob","gender","salary"]

df = spark.createDataFrame(data=data, schema=columns)
df.printSchema()
df.show(truncate=False)

df2 = df.withColumn("salary", col("salary").cast("Integer"))
df2.printSchema()
df2.show(truncate=False)

df3 = df.withColumn("salary", col("salary")*100)
df3.printSchema()
df3.show(truncate=False)

df4 = df.withColumn("CopiedColumn", col("salary")* -1)
df4.printSchema()

df5 = df.withColumn("Country", lit("USA"))
df5.printSchema()

df6 = df.withColumn("Country", lit("USA")) \
    .withColumn("anotherColumn", lit("anotherValue"))
df6.printSchema()

df.withColumnRenamed("gender","sex") \
  .show(truncate=False)

df4.drop("CopiedColumn") \
  .show(truncate=False)

The complete code can be downloaded from the PySpark withColumn GitHub project.

Happy Learning !!
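A quick follow-up to the note in section 6: withColumn() never mutates its input. Below is a minimal sketch, reusing df from the complete example above (df_updated is just an illustrative name), showing that the original DataFrame is left untouched.

# withColumn() is a transformation that returns a new DataFrame;
# the original df is left unchanged
df_updated = df.withColumn("salary", col("salary") * 100)

df.select("salary").show(1)          # still the original values
df_updated.select("salary").show(1)  # values multiplied by 100

# To keep working under the same name, rebind the variable
df = df.withColumn("salary", col("salary") * 100)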
Tags: withColumn, withColumnRenamed

THIS POST HAS 3 COMMENTS

1. raghav (21 Dec 2020)
Hi,
df2 = df.withColumn("salary", col("salary").cast("Integer"))
df2.printSchema()
I don't want to create a new DataFrame if I am changing the datatype of an existing DataFrame. Is there a way I can change a column's datatype in an existing DataFrame without creating a new one?

   NNK (25 Dec 2020)
   DataFrames are immutable, hence you cannot change anything directly on them; every operation on a DataFrame results in a new DataFrame. If you want a different type, I would recommend specifying the schema at the time of creating the DataFrame.

2. Anonymous (16 Sep 2020)
Can you please explain "Split column to multiple columns" from the Scala example in Python?
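Following up on the first comment thread: as NNK suggests, the way to avoid a later cast is to supply an explicit schema when the DataFrame is created. Below is a minimal sketch, assuming the same data list and spark session from the article.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema: salary starts out as integer, so no cast is needed later
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("dob", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()  # salary: integer (nullable = true)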