Sadrach Pierre, Ph.D. · Jan 12 · 9 min read

MASTERING THE GPT-3 API IN PYTHON

EXPLORING DATA SCIENCE USE CASES WITH THE GPT-3 API

Image by Miguel Á. Padriñán on Pexels

GPT-3 is a machine-learning language model released by OpenAI in 2020. It has gained widespread media attention for its ability to write essays, songs, poetry, and even code! The tool is free to use and simply requires an email to sign up.

GPT-3 is a type of machine-learning model called a transformer. Specifically, it is an extension of the Generative Pre-trained Transformer, hence "GPT." The transformer architecture uses self-attention to model text, and the instruction-following variants of GPT-3 are further tuned with reinforcement learning from human feedback. Generally, the model processes text one token at a time and uses the previous tokens to predict the next token in the sequence.

GPT-3 has a long list of use cases across fields including the sciences, arts, and technology. It can answer basic questions about science and math, and it can even answer, with decent accuracy, questions about graduate-level math and science concepts. Even crazier, I asked some questions related to my Ph.D. research in chemical physics and it was able to provide decent explanations of some graduate-level concepts. It does have its limitations, though. When I asked GPT-3 about more novel research methods in chemical physics, it was unable to provide a sensible answer. Because of this, GPT-3 should be used with caution as a search engine for education and research: it has no fact-checking capabilities. As fact-checking capabilities improve, I can imagine GPT-3 becoming very useful for graduate students and maybe even for research one day.

In addition to my personal experience, I've seen many other cool applications of the tool. For example, a developer used GPT-3 to orchestrate cloud services for completing complex tasks. Other users have generated working Python and SQL scripts, among other languages, with GPT-3. In the arts, users have asked GPT-3 to write essays comparing modern and contemporary art. The potential applications of GPT-3 are plentiful across basically any domain.

GPT-3 performs very well at answering basic questions about topics for which there is a great deal of accurate content on the web. For example, it can do a decent job of explaining photosynthesis. It won't do as good a job answering cutting-edge research questions about photosynthesis, such as describing the mechanistic process and the quantum particles involved. It can give decent high-level responses, but it isn't likely to give insight into the technical details of most research questions. Similarly, GPT-3 can write some simple working code, but as the complexity of the task increases, the generated code becomes more error-prone and often wrong. It also can't generate political opinions, ethical values, investment advice, accurate news reporting, and other types of content that are typically human-generated.

While GPT-3 has its limitations, its wide range of applicability is impressive. I thought it would be fun to come up with some interesting data science and machine learning prompts to see if it can supplement parts of the data science workflow.
To start, we will generate some data science-related text from some simple prompts. Once we have a bit more familiarity with the tool, we will see if we can ask questions that help with some common data science tasks.

There are several interesting data science and machine learning questions that we can ask GPT-3. For example, can we use GPT-3 to source publicly available data sets? The training data for GPT-3 contains data up until 2021, so presumably it should know about public datasets from the past 1–2 years. Another interesting application is question framing: can GPT-3 help users frame good machine-learning research questions? While it struggles to give specific technical answers, maybe it can do a good job of framing unanswered research questions. Another cool application is using GPT-3 to decide which ML model to use for a particular application. For well-established techniques with copious online literature, it should be able to help a user select a model and explain why the selected model will work best. Finally, we can try to use GPT-3 to write some Python code for data science tasks. For example, we will see if we can use it to write code that generates synthetic data for particular use cases.

NOTE: The results of the GPT-3 API are not deterministic, so the results you get may differ slightly from the output shown here. Further, since GPT-3 does not have a fact-checking mechanism, I advise you to double-check any factual results you plan to use for work, school, or personal projects.

For this work, I will be writing code in Deepnote, which is a collaborative data science notebook that makes running reproducible experiments very easy.

SETTING UP GPT-3

To start, let's navigate to Deepnote and create a new project (you can sign up for free if you don't already have an account). Let's create a project called 'GPT3' and a notebook within this project called 'GPT3_ds'. Next, let's install OpenAI using pip in the first notebook cell:

Embedding created by author

SOME SIMPLE DATA SCIENCE & MACHINE LEARNING PROMPTS

Asking about Python libraries

Let's start by storing our key in the api_key attribute of the openai module:

import openai
openai.api_key = "your-key"

Now let's ask about some well-known Python libraries. If we ask "What is the Pandas library?", GPT-3 will give a response:

Embedding created by author

We can even ask more specific questions such as "What are some common Pandas use cases?". It gives reasonable answers, listing data wrangling, data visualization, data aggregation, and time series analysis:

Embedding created by author

Let's ask "What are the most common deep learning libraries?":

Embedding created by author

We can even ask something like "What is a deep neural network?":

Embedding created by author

Using GPT-3 for fact searching should be done with caution since there are no fact-checking mechanisms in place. For the most part, if you are looking to get a surface-level understanding of a well-known concept (one with many online blogs and forums) in data science and machine learning, GPT-3 should work fairly well. (A minimal sketch of the completion calls used for these prompts is included at the end of this section.)

FEATURE ENGINEERING

Another interesting use case for GPT-3 is using it as an aid to guide data science work. For example, can GPT-3 give some ideas on feature transformations that can improve model performance? Let's ask GPT-3:

Embedding created by author

We see that it gives some good feature transformation suggestions as well as explanations of each.
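The post shows its API calls only as Deepnote embeds, so as a rough illustration, here is a minimal sketch of how prompts like the ones above could be sent to the GPT-3 API. The ask_gpt3 helper name, the text-davinci-003 model, and the sampling parameters are assumptions for this sketch (using the legacy openai 0.x Python package), not code from the article.

```python
# Minimal sketch, assuming the legacy openai 0.x SDK and the
# text-davinci-003 completion model; adjust for your own setup.
import openai

openai.api_key = "your-key"  # replace with your own API key

def ask_gpt3(prompt, max_tokens=256, temperature=0.7):
    """Send a prompt to the GPT-3 completions endpoint and return the text."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response["choices"][0]["text"].strip()

# Prompts quoted from the post
print(ask_gpt3("What is the Pandas library?"))
print(ask_gpt3("What are some common Pandas use cases?"))
```

Because sampling is non-deterministic (as noted above), repeated calls with the same prompt will generally return different text.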
Let's see if we can take this even further. Let's ask it to write some Python code that performs standardization:

Embedding created by author

Let's copy and paste this into a new cell and run it:

Embedding created by author

Next, let's do the same for normalization:

Embedding created by author

Let's run this code (a hedged sketch of what this kind of generated scaling code might look like appears at the end of this section):

Embedding created by author

MODEL SELECTION

Another cool use case is model selection. Let's see if GPT-3 can guide us in selecting machine-learning models. Let's ask "How do I select a time series model?":

Embedding created by author

The answers are reasonable, listing checks for stationarity, seasonality, and trends, as well as model performance evaluation. Let's ask a more specific question like "When to use ARIMA vs Prophet?":

Embedding created by author

The full response:

> ARIMA and FB Prophet are both used for time series forecasting. ARIMA is a linear model and is better for predicting long-term trends with stationary data. FB Prophet is a non-linear model and is better for predicting short-term trends with non-stationary data. So, you would use ARIMA when you are mainly concerned with predicting long-term trends, and you would use FB Prophet when you are mainly concerned with predicting short-term trends. -GPT-3

GENERATING SYNTHETIC DATA

Can we use GPT-3 to generate synthetic data? Let's see! We will look at well-known examples of healthcare readmission data and transaction data.

Healthcare readmission

Let's ask GPT-3 to "Write example python code that generates synthetic healthcare readmission data stored in a data frame":

Embedding created by author

Let's run this code:

Embedding created by author

Let's see if we can ask it to build a classification model on this synthetic data that predicts readmission and evaluates performance. The prompt will be:

> "Write example python code that generates synthetic healthcare readmission data stored in a dataframe. From this write code that builds a catboost model that predicts readmission outcomes. Also write code to calculate and print performance."

Embedding created by author

And let's copy and run this code (a sketch of what this kind of generated code might look like also appears at the end of this section):

Embedding created by author

Transaction data

Let's ask GPT-3 to "Write example python code that generates synthetic transaction data stored in a dataframe":

Embedding created by author

Let's copy this code and run it in a new cell:

Embedding created by author

Right now we have item ID, customer, and purchase amount. Let's see if we can get more specific and ask for age, gender, and location as well:

Embedding created by author

And let's run the code:

Embedding created by author

Pretty impressive!
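The standardization and normalization code returned by GPT-3 is shown only as embeds in the original post. As a hedged sketch of what such generated code typically looks like (using scikit-learn; the column names here are invented for illustration):

```python
# Illustrative sketch only; not the code GPT-3 actually returned in the post.
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Small made-up DataFrame for demonstration
df = pd.DataFrame({"age": [34, 51, 27, 62], "income": [48000, 72000, 39000, 91000]})

# Standardization: rescale each column to zero mean and unit variance
standardized = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# Normalization: rescale each column to the [0, 1] range
normalized = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

print(standardized)
print(normalized)
```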
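Similarly, the synthetic healthcare readmission data and the CatBoost model are only shown as embeds. Below is a hedged sketch of the kind of code such a prompt might produce; the column names, data-generation logic, and CatBoost settings are assumptions, not the article's actual output.

```python
# Illustrative sketch of synthetic readmission data plus a CatBoost classifier.
# All column names and settings are assumptions for this example.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "num_prior_visits": rng.integers(0, 15, n),
    "length_of_stay": rng.integers(1, 30, n),
    "has_chronic_condition": rng.integers(0, 2, n),
    "readmitted": rng.integers(0, 2, n),  # binary readmission outcome
})

X = df.drop(columns=["readmitted"])
y = df["readmitted"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = CatBoostClassifier(iterations=200, verbose=0)
model.fit(X_train, y_train)

# Labels are random here, so accuracy will hover around chance;
# the point is only to illustrate the build-and-evaluate workflow.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```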
PROMPTS FOR PUBLIC DATA SETS

Another application is querying GPT-3 for public data sets. Let's ask GPT-3 to list some public data sets:

Embedding created by author

Let's see if we can find public data licensed under Apache 2.0. Let's also ask for links to the sources:

Embedding created by author

While not all of these links are correct, it does a decent job of finding source links. The Google Ngrams, Common Crawl, and NASA data are pretty good. If it doesn't provide the exact location of the data, in most cases it provides a link to a webpage where the data can be found. Let's ask for descriptions as well. Note that while results may overlap, they are slightly different with each run; to my knowledge, results are not always reproducible:

Embedding created by author

The results are:

> 1. OpenStreetMap: OpenStreetMap is a free, editable map of the world, created and maintained by volunteers and available for use under an open license. It contains millions of data points, including roads, buildings, and points of interest. Source: https://www.openstreetmap.org/
> 2. Google Books Ngrams: Google Books Ngrams is a dataset of over 5 million books from Google Books, spanning from 1500 to 2008. It contains word counts for each year, allowing researchers to track the usage of words over time. Source: https://aws.amazon.com/datasets/google-books-ngrams/
> 3. Wikipedia: Wikipedia is a free, open-source encyclopedia with millions of articles in hundreds of languages. It is available for use under the Creative Commons Attribution-ShareAlike license. Source: https://www.wikipedia.org/
> 4. Common Crawl: Common Crawl is a large-scale web crawl that collects data from over 5 billion webpages. It is available for use under the Apache 2.0 license. Source: https://commoncrawl.org/
> 5. Open Images Dataset: The Open Images Dataset is a collection of 9 million images annotated with labels spanning over 6000 categories. It is available for use under the Apache 2.0 license. Source: https://storage.googleapis.com/openimages/web/index.html -GPT-3

The OpenStreetMap, Google Ngrams, and Open Images dataset links are reasonable.

PROMPTS FOR MACHINE LEARNING QUESTION FRAMING

For our final example, we will see if GPT-3 can help us frame machine-learning problems. We will consider use-case formulation as well as research question framing.

Use Case Prompts

While GPT-3 contains data only up until 2021, it still may be able to help us ideate ML use cases that are relevant today. Let's ask "What are some emerging machine learning use cases in social media?":

Embedding created by author

1. Automated Content Curation: Automatically curating content from social media platforms to create personalized content feeds for users.
2. Sentiment Analysis: Analyzing user sentiment from social media posts to gain insights into customer opinions and preferences.
3. Social Media Monitoring: Using machine learning algorithms to monitor social media conversations and detect potential issues or trends.
4. Social Media Advertising: Leveraging machine learning to optimize social media advertising campaigns and target the right audience.
5. Social Media Recommendations: Using machine learning to recommend content to users based on their interests and preferences.

Let's also see the response for healthcare:

Embedding created by author

RESEARCH QUESTION PROMPTS

Now let's see if it can frame some research questions. Let's ask "What are some good research questions on using deep learning for image detection?":

Embedding created by author

"What are some good research questions related to NLP transformer models?":

Embedding created by author

The code in this post is available on GitHub.
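For completeness, the question-framing prompts from this section could be batched through the hypothetical ask_gpt3 helper sketched earlier. The loop below is an assumption about how one might do that, not code from the post, and the prompt wordings are quoted or paraphrased from the text above.

```python
# Assumes the ask_gpt3 helper defined in the earlier sketch is in scope.
framing_prompts = [
    "What are some emerging machine learning use cases in social media?",
    "What are some emerging machine learning use cases in healthcare?",  # paraphrased
    "What are some good research questions on using deep learning for image detection?",
    "What are some good research questions related to NLP transformer models?",
]

for prompt in framing_prompts:
    print(prompt)
    print(ask_gpt3(prompt, max_tokens=400))  # longer limit for list-style answers
    print("-" * 40)
```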
CONCLUSIONS

In this post, we discussed how to use GPT-3 for a variety of data science and machine learning tasks. First, we gave the API some simple prompts related to Python libraries, feature engineering, and model selection. Next, we walked through how to use the GPT-3 API to generate synthetic healthcare and synthetic transaction data. We were also able to generate a query to build a readmission model on top of the synthetic healthcare data. We then discussed how to query the API for information on publicly available data sets. Finally, we queried the API for emerging machine learning use cases and research questions. I hope you enjoyed this post.

Feel free to download the code and see if you can incorporate GPT-3 into your data science workflow.