https://sparkbyexamples.com/pyspark/pyspark-withcolumn/
PYSPARK WITHCOLUMN() USAGE WITH EXAMPLES

* Post author: NNK
* Post category: PySpark
* Post last modified: August 6, 2022

PySpark withColumn() is a DataFrame transformation function used to change the value of an existing column, convert a column's datatype, derive a new column, and more. In this post, I will walk you through commonly used PySpark DataFrame column operations with withColumn() examples:

* PySpark withColumn – change a column's datatype
* Transform/change the value of an existing column
* Derive a new column from an existing column
* Add a column with a literal value
* Rename a column
* Drop a DataFrame column

First, let's create a DataFrame to work with.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

data = [('James','','Smith','1991-04-01','M',3000),
        ('Michael','Rose','','2000-05-19','M',4000),
        ('Robert','','Williams','1978-09-05','M',4000),
        ('Maria','Anne','Jones','1967-12-01','F',4000),
        ('Jen','Mary','Brown','1980-02-17','F',-1)]
columns = ["firstname","middlename","lastname","dob","gender","salary"]

df = spark.createDataFrame(data=data, schema=columns)

1. CHANGE DATATYPE USING PYSPARK WITHCOLUMN()

By using withColumn() on a DataFrame, we can cast or change the data type of a column. To change the data type, use the cast() function along with withColumn(). The statement below casts the salary column to integer (Spark infers it as long from the sample data).

df.withColumn("salary", col("salary").cast("Integer")).show()
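As an aside, cast() accepts either a type string (as above) or a pyspark.sql.types instance. Below is a minimal sketch, reusing the df created above, that also casts the dob string column to a date; the name df_cast is just illustrative.

from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, DateType

# cast() takes a type string or a DataType instance; both forms are equivalent
df_cast = df.withColumn("salary", col("salary").cast(IntegerType())) \
            .withColumn("dob", col("dob").cast(DateType()))

df_cast.printSchema()
# ...
#  |-- dob: date (nullable = true)
#  |-- salary: integer (nullable = true)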
2. UPDATE THE VALUE OF AN EXISTING COLUMN

The withColumn() function can also be used to change the value of an existing column. To change the value, pass the existing column name as the first argument and the value to be assigned as the second argument. Note that the second argument must be of Column type. Also, see Different Ways to Update PySpark DataFrame Column.

df.withColumn("salary", col("salary")*100).show()

This snippet multiplies the value of "salary" by 100 and writes the result back to the "salary" column.

3. CREATE A COLUMN FROM AN EXISTING ONE

To add/create a new column, pass the name you want the new column to have as the first argument, and assign its value by applying an operation on an existing column as the second argument. Also, see Different Ways to Add New Column to PySpark DataFrame.

df.withColumn("CopiedColumn", col("salary")* -1).show()

This snippet creates a new column "CopiedColumn" by multiplying the "salary" column by -1.

4. ADD A NEW COLUMN USING WITHCOLUMN()

To create a new column, pass the column name you want as the first argument of the withColumn() transformation function. Make sure the new column is not already present on the DataFrame; if it is, withColumn() updates that column's value instead. In the snippet below, the PySpark lit() function is used to add a constant value to a DataFrame column. We can also chain withColumn() calls to add multiple columns.

df.withColumn("Country", lit("USA")).show()

df.withColumn("Country", lit("USA")) \
  .withColumn("anotherColumn", lit("anotherValue")) \
  .show()

5. RENAME COLUMN NAME

Though you cannot rename a column using withColumn(), I still wanted to cover this, as renaming is one of the common operations performed on a DataFrame. To rename an existing column, use the withColumnRenamed() function on the DataFrame.

df.withColumnRenamed("gender","sex") \
  .show(truncate=False)

6. DROP COLUMN FROM PYSPARK DATAFRAME

Use the drop() function to drop a specific column from the DataFrame.

df.drop("salary") \
  .show()

Note: All of these functions return a new DataFrame after applying the transformation instead of updating the existing DataFrame.

7. PYSPARK WITHCOLUMN() COMPLETE EXAMPLE

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

data = [('James','','Smith','1991-04-01','M',3000),
        ('Michael','Rose','','2000-05-19','M',4000),
        ('Robert','','Williams','1978-09-05','M',4000),
        ('Maria','Anne','Jones','1967-12-01','F',4000),
        ('Jen','Mary','Brown','1980-02-17','F',-1)]
columns = ["firstname","middlename","lastname","dob","gender","salary"]

df = spark.createDataFrame(data=data, schema=columns)
df.printSchema()
df.show(truncate=False)

df2 = df.withColumn("salary", col("salary").cast("Integer"))
df2.printSchema()
df2.show(truncate=False)

df3 = df.withColumn("salary", col("salary")*100)
df3.printSchema()
df3.show(truncate=False)

df4 = df.withColumn("CopiedColumn", col("salary")* -1)
df4.printSchema()

df5 = df.withColumn("Country", lit("USA"))
df5.printSchema()

df6 = df.withColumn("Country", lit("USA")) \
    .withColumn("anotherColumn", lit("anotherValue"))
df6.printSchema()

df.withColumnRenamed("gender","sex") \
  .show(truncate=False)

df4.drop("CopiedColumn") \
  .show(truncate=False)

The complete code can be downloaded from the PySpark withColumn GitHub project.

Happy Learning !!
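A quick follow-up to the note in section 6: withColumn() never mutates its input. Below is a minimal sketch, reusing df from the complete example above (df_updated is just an illustrative name), showing that the original DataFrame is left untouched.

# withColumn() is a transformation that returns a new DataFrame;
# the original df is left unchanged
df_updated = df.withColumn("salary", col("salary") * 100)

df.select("salary").show(1)          # still the original values
df_updated.select("salary").show(1)  # values multiplied by 100

# To keep working under the same name, rebind the variable
df = df.withColumn("salary", col("salary") * 100)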
Tags: withColumn, withColumnRenamed

THIS POST HAS 3 COMMENTS

1. raghav (21 Dec 2020)
Hi,
df2 = df.withColumn("salary", col("salary").cast("Integer"))
df2.printSchema()
I don't want to create a new DataFrame if I am changing the datatype of an existing DataFrame. Is there a way I can change a column's datatype in an existing DataFrame without creating a new one?

   NNK (25 Dec 2020)
   DataFrames are immutable, hence you cannot change anything directly on them; every operation on a DataFrame results in a new DataFrame. If you want a different type, I would recommend specifying the schema at the time of creating the DataFrame.

2. Anonymous (16 Sep 2020)
Can you please explain "Split column to multiple columns" from the Scala example in Python?
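Following up on the first comment thread: as NNK suggests, the way to avoid a later cast is to supply an explicit schema when the DataFrame is created. Below is a minimal sketch, assuming the same data list and spark session from the article.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema: salary starts out as integer, so no cast is needed later
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("dob", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()  # salary: integer (nullable = true)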