HOW TO USE GPT-4 TO SUMMARIZE DOCUMENTS FOR YOUR AUDIENCE

Nick Diakopoulos


Published in

Generative AI in the Newsroom

Apr 11




Editor’s Note: This post has been edited to include an addendum about
corrections made to one of the articles referenced.

There’s a lot happening in the field of generative AI. You could easily burn out
trying to stay on top of it all. That’s why I wanted to see if I could use GPT-4
to help summarize some of the latest research and provide tailored summaries for
the audience here. Today I published a couple of examples of this, which you can
read here and here.

To generate these articles I used a series of prompts to OpenAI’s GPT-4 model to
(1) analyze the research papers and extract particular pieces of information,
and then (2) write a summary based on those bits of extracted information. By
breaking this process down into two steps I was able to better control the
information that would be included in the final summary articles.

To define what I wanted the model to analyze in step 1, I first thought about the
audience for this blog: journalists such as reporters and editors who might want
to know what a new piece of research means for their practice and whether there
are any limitations that would curtail its value or utility. I then experimented
with some prompts and settled on three to extract information from the document
text [1,2,3]. Each of the papers I summarized is short enough that I could include
its entire text in the prompt to GPT-4.

I fed the output from these three prompts into a final prompt to generate the
first article [4]. I tweaked the prompt for the second article to aim for a more
accessible blog-like style [5]. To see how I configured all the prompts,
including the system prompt, the document prompts, and other model parameters,
see the code in this Colab Notebook.
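The two-step flow described above can be sketched roughly as follows. This is a minimal illustration and not the code from the Colab Notebook: `summarize_paper` and `complete` are hypothetical names, and `complete` stands in for whatever function sends a prompt to GPT-4 and returns its text. The prompts are shortened from footnotes [1] to [4], and prepending the paper text to every prompt is an assumption about how the pieces fit together.

```python
from typing import Callable, List

# Shortened versions of the extraction prompts in footnotes [1]-[3].
EXTRACTION_PROMPTS: List[str] = [
    "What research question is the paper trying to answer?",
    "What are the key findings reported in the paper that are important "
    "for journalists such as reporters and editors?",
    "Critique the findings of the paper, focusing on their validity and "
    "utility for journalists such as reporters and editors.",
]

# Shortened version of the article prompt in footnote [4].
ARTICLE_PROMPT = (
    "Here are some important observations about that research paper: "
    "{observations}. Write a 600 word article about the paper using only "
    "the paper text and the important observations about the paper above "
    "and focusing on the benefits and limitations of the findings for "
    "journalists."
)


def summarize_paper(paper_text: str, complete: Callable[[str], str]) -> str:
    """Run the extract-then-write flow with a caller-supplied model function.

    `complete` is a hypothetical stand-in: it takes a prompt string and
    returns the model's text response.
    """
    # Step 1: ask each extraction question against the full paper text.
    observations = [
        complete(paper_text + "\n\n" + question)
        for question in EXTRACTION_PROMPTS
    ]
    # Step 2: write the article from the paper text plus the observations.
    final_prompt = paper_text + "\n\n" + ARTICLE_PROMPT.format(
        observations="\n\n".join(observations)
    )
    return complete(final_prompt)
```

Passing the model call in as a function keeps the sketch independent of any particular client library, and it makes the flow easy to dry-run with a stub before spending tokens on real requests.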


ACCURACY CHECKING

One of the biggest concerns about generative AI summaries is the potential for
fabrication of information. An article about a research paper needs to ensure
that facts are accurate and consistent with respect to that paper.

For the first paper, I read it thoroughly before I tried to summarize it. This
allowed me to quickly assess whether the generated summary was accurate with
respect to the paper. While I didn’t see any outright fabrication, one of the
generations included a sentence that didn’t make sense to me and another
sentence was a bit confusing or potentially misleading. This reaffirmed that you
really do need to have a human in the loop. Reading the paper took me about 30
minutes; after that, it took only about 5 minutes to read and assess the final
output.

For the second paper, I summarized it automatically without reading it. But this
time I spent more time fact-checking the output, reading each sentence and
checking whether it was consistent with the underlying paper. I added one
snippet of text (“e.g. radio, speech, TV, etc.”) to help address what I thought
was an issue of specificity in the writing. This editing process took about 10
minutes, overall less time than the first paper since I only read small excerpts
of the paper as I was editing.

To create illustrations for the final articles I followed some of the advice
here and manually prompted DALL-E until it generated some images that I thought
were reasonable. That took perhaps another 10 minutes for each article.

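So in total, the first article took me about 45 minutes (30 to read the paper, 5
to assess the output, and 10 to illustrate it), and the second one took me about
20 minutes (10 to fact-check and edit, and 10 to illustrate). In both cases it
took about 5 minutes for the model to extract information and then output an
article.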
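As a quick check on those totals, the time estimates given in this section can be tallied as below. The task labels are my own shorthand, and the sketch assumes the roughly 5 minutes of model time is not counted separately, since the stated totals already match without it.

```python
# Approximate minutes per task, taken from the estimates in this section.
first_article = {
    "read the paper": 30,
    "assess the generated summary": 5,
    "create an illustration": 10,
}
second_article = {
    "fact-check and edit the output": 10,
    "create an illustration": 10,
}

total_first = sum(first_article.values())
total_second = sum(second_article.values())

print(f"First article: about {total_first} minutes")
print(f"Second article: about {total_second} minutes")
```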

FURTHER REFLECTIONS

I think GPT-4 is a viable technology for accelerating the translational coverage
of research for particular audiences. In my second attempt, it reduced the time
and effort needed to produce a tailored summary to about 20 minutes. This could be combined
with a prior use-case I’ve developed on story discovery to filter new research
articles of relevance to an audience and then automatically generate blog posts
summarizing those papers.

The key way that journalists can differentiate summary articles is by focusing
on extracting different pieces of information in the first step. In my case I
focused on key findings of interest to the target audience (reporters and
editors), including benefits, limitations, and critiques. But others could
configure their own questions and frames of interest to different audiences and
arrive at different outputs. Basically it’s up to the journalist to define what
matters and use that to drive the summary.
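One way to make that configuration explicit is to treat the audience as a parameter of the extraction questions. The sketch below is only an illustration: `build_extraction_prompts` and `DEFAULT_AUDIENCE` are hypothetical names. With the default audience, the function reproduces the wording of footnotes [1] to [3].

```python
from typing import List

DEFAULT_AUDIENCE = "journalists such as reporters and editors"


def build_extraction_prompts(audience: str = DEFAULT_AUDIENCE) -> List[str]:
    """Return the three extraction questions, tailored to an audience."""
    detail = "Explain thoroughly and be sure to include specific details."
    return [
        # Question [1] is about the paper itself, so it has no audience slot.
        "What research question is the paper trying to answer? Explain what "
        "the researchers did to study that question, including the specific "
        "methods used and analyses performed. " + detail,
        # Questions [2] and [3] frame findings and critique for the audience.
        f"What are the key findings reported in the paper that are important "
        f"for {audience}? How might {audience} benefit from these findings? "
        f"Why might there still be limits to those benefits? {detail}",
        f"Critique the findings of the paper, focusing on their validity and "
        f"utility for {audience}. Are there reasons not to trust any of the "
        f"findings? {detail}",
    ]
```

For example, `build_extraction_prompts("audience editors and product managers")` would produce the same three questions framed for a different newsroom role; the audience string there is purely illustrative.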

The other key area where journalists still need to be involved is in editing and
fact-checking the resulting articles, including checking for any sentences
that are too similar to sentences in the original research paper and might need
to be quoted in order to avoid potential plagiarism. I have a nagging feeling
that even though I configured the AI to be critical, it probably has some
blind spots and could miss something. In an ideal world, the outputs of this
process would not only get edited for accuracy but also serve as a first draft
for a reporter to write through, or even just provide an impetus to go do
more reporting.

I’ll also admit that the output articles are perhaps not the most interestingly
written. A good writer could make them more engaging. Perhaps you could include
quotes, excerpts, or more examples (saliency), or explore how the findings might
actually be used in a specific news task (concreteness).

A final limitation here is that there is often visual information in research
papers that isn’t currently being considered in the process. Modern storytelling
is about more than text and at least for now there’s a need for a person to help
illustrate and think about whether there are data or figures needed to convey
the findings.

Addendum: After publishing the articles a reader pointed out that one of them
contained a few sentences that were similar enough to the underlying research
paper that you would want to include quotation marks in order to avoid any
claims of plagiarism. That article has now been corrected and an editor’s note
included. The other article was also checked but no issues were found. I did this
check manually, with the help of an automated script that listed any sentences
above a threshold of similarity to the original document.
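A similarity check of this kind could look something like the sketch below. It is not the script used for the correction: the function names, the naive sentence splitter, the character-level `difflib` comparison, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace.

    This will mis-split abbreviations such as "e.g." and is only meant
    to keep the sketch short.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def flag_similar_sentences(
    article: str, paper: str, threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """List article sentences whose closest paper sentence meets the threshold.

    Returns (article sentence, closest paper sentence, similarity ratio).
    """
    paper_sentences = split_sentences(paper)
    flagged = []
    for sentence in split_sentences(article):
        best_ratio, best_match = 0.0, ""
        for candidate in paper_sentences:
            # Ratio is 1.0 for identical strings and falls toward 0.0.
            ratio = SequenceMatcher(
                None, sentence.lower(), candidate.lower()
            ).ratio()
            if ratio > best_ratio:
                best_ratio, best_match = ratio, candidate
        if best_ratio >= threshold:
            flagged.append((sentence, best_match, best_ratio))
    return flagged
```

Each flagged pair still needs a human decision: quote the sentence, rewrite it, or leave it alone if the overlap is only common phrasing. The threshold would need tuning against real articles, since a value that is too low buries the editor in false alarms.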

—

[1] What research question is the paper trying to answer? Explain what the
researchers did to study that question, including the specific methods used and
analyses performed. Explain thoroughly and be sure to include specific details.

[2] What are the key findings reported in the paper that are important for
journalists such as reporters and editors? How might journalists such as
reporters and editors benefit from these findings? Why might there still be
limits to those benefits? Explain thoroughly and be sure to include specific
details.

[3] Critique the findings of the paper, focusing on their validity and utility
for journalists such as reporters and editors. Are there reasons not to trust
any of the findings? Explain thoroughly and be sure to include specific details.

[4] Here are some important observations about that research paper: <extracted
information>. Write a 600 word article about the paper using only the paper text
and the important observations about the paper above and focusing on the
benefits and limitations of the findings for journalists. Reduce scientific
jargon and technical terminology in the writing.

[5] Here are some important observations about that research paper: <extracted
information>. Write a 600 word article about the paper in the style of an online
blogger, using only the paper text and the important observations about the
paper above and focusing on the benefits and limitations of the findings for
journalists. Reduce scientific jargon and technical terminology in the writing
so that it is accessible to a broad audience.




WRITTEN BY NICK DIAKOPOULOS


Northwestern University Professor of Communication. Computational journalism,
algorithmic accountability, social computing — http://www.nickdiakopoulos.com/
