Decoding the US Senate Hearing on Oversight of AI: NLP Analysis in Python | by Raul Vizcarra Chirinos | Jun, 2023


Photograph by Harold Mendoza on Unsplash

Word frequency analysis, visualization, and sentiment scores using the NLTK toolkit

Last Sunday morning, as I was switching TV channels looking for something to watch while having breakfast, I stumbled upon a replay of the Senate Hearing on Oversight of AI. It had only been 40 minutes since it started, so I decided to watch the rest of it (talk about an interesting way to spend a Sunday morning!).

When events like the Senate Judiciary Subcommittee Hearing on Oversight of AI take place and you want to catch up on the key takeaways, you have four options: watch it live; look for recordings afterwards (both options will cost you three hours of your life); read the written version (transcripts), which runs about 79 pages and over 29,000 words; or read reviews on websites or social media to get different opinions and form your own (if it isn’t borrowed from others).

These days, with everything moving so quickly and our days feeling too short, it’s tempting to take the shortcut and rely on reviews instead of going to the original source (I’ve been there too). If you take the shortcut for this hearing, it’s highly likely that most reviews you’ll find on the web or social media focus on OpenAI CEO Sam Altman’s call for regulating AI. However, after watching the hearing, I felt there was more to explore beyond the headlines.

So, after my Sunday funday morning activity, I decided to download the Senate Hearing transcript and use the NLTK package (a Python package for natural language processing, NLP) to analyze it, compare the most used words, and apply sentiment scores across different groups of interest (OpenAI, IBM, Academia, Congress) to see what might lie between the lines. Spoiler alert! Out of the 29,000 words analyzed, only 70 (0.24%) were related to terms like regulation, regulate, regulatory, or legislation.

It’s important to note that this article is not about my takeaways from the AI hearing or from Mr. ChatGPT, Sam Altman. Instead, it focuses on what lies beneath the words of each part of society (Private sector, Academia, Government) represented in this session under the roof of Capitol Hill, and what we can learn from how their words mix with one another.

Considering that the next few months are interesting times for the future of regulation on Artificial Intelligence, as the final draft of the EU AI Act awaits debate in the European Parliament (expected to take place in June), it’s worth exploring what’s behind the discussions surrounding AI on this side of the Atlantic.

STEP-01: GET THE DATA

I used the transcript published by Justin Hendrix in Tech Policy Press (available here).


While Hendrix mentions it’s a quick transcript and suggests confirming quotes by watching the Senate Hearing video, I still found it to be quite accurate and interesting for this analysis. If you want to watch the Senate Hearing or read the testimonies of Sam Altman (OpenAI), Christina Montgomery (IBM), and Gary Marcus (Professor at New York University), you can find them here.

Initially, I planned to copy the transcript into a Word document and manually build a table in Excel with the participants’ names, the organizations they represented, and their comments. However, this approach was time-consuming and inefficient. So, I turned to Python and loaded the full transcript from a Microsoft Word file into a data frame. Here is the code I used:

# STEP 01 - Read the Word document
# remember to install it first: pip install python-docx

import docx
import pandas as pd

doc = docx.Document('D:....your word file')

items = []
names = []
comments = []

# Iterate over paragraphs: lines ending in ':' are speaker names;
# everything else is a comment attributed to the last speaker seen
name = None
for paragraph in doc.paragraphs:
    text = paragraph.text.strip()

    if text.endswith(':'):
        name = text[:-1]
    else:
        items.append(len(items))
        names.append(name)
        comments.append(text)

dfsenate = pd.DataFrame({'item': items, 'name': names, 'comment': comments})

# Remove rows with empty comments
dfsenate = dfsenate[dfsenate['comment'].str.strip().astype(bool)]

# Reset the index and renumber the items from 1
dfsenate.reset_index(drop=True, inplace=True)
dfsenate['item'] = dfsenate.index + 1
print(dfsenate)

The output should look like this:

  item  name                            comment
0    1  Sen. Richard Blumenthal (D-CT)  Now for some introductory remarks.
1    2  Sen. Richard Blumenthal (D-CT)  “Too often we have seen what happens when technology outpaces regulation: the unbridled exploitation of personal data, the proliferation of disinformation, and the deepening of societal inequalities. We have seen how algorithmic biases can perpetuate discrimination and prejudice, and how the lack of transparency can undermine public trust. This is not the future we want.”
2    3  Sen. Richard Blumenthal (D-CT)  If you were listening from home, you might have thought that voice was mine and the words from me, but in fact, that voice was not mine. The words were not mine. And the audio was an AI voice cloning software trained on my floor speeches. The remarks were written by ChatGPT when it was asked how I would open this hearing. And you heard just now the result. I asked ChatGPT, why did you pick those themes and that content? And it answered. And I’m quoting, Blumenthal has a strong record in advocating for consumer protection and civil rights. He has been vocal about issues such as data privacy and the potential for discrimination in algorithmic decision making. Therefore, the statement emphasizes these aspects.
3    4  Sen. Richard Blumenthal (D-CT)  Mr. Altman, I appreciate ChatGPT’s endorsement. In all seriousness, this apparent reasoning is pretty impressive. I am sure that we’ll look back in a decade and view ChatGPT and GPT-4 like we do the first cell phone, those big clunky things that we used to carry around. But we recognize that we are on the verge, really, of a new era. The audio and my playing it may strike you as curious or humorous, but what reverberated in my mind was what if I had asked it, and what if it had provided, an endorsement of Ukraine surrendering or Vladimir Putin’s leadership? That would’ve been really frightening. And the prospect is more than a little scary, to use the word, Mr. Altman, you have used yourself, and I think you have been very constructive in calling attention to the pitfalls as well as the promise.
4    5  Sen. Richard Blumenthal (D-CT)  And that’s the reason why we wanted you to be here today. And we thank you and our other witnesses for joining us. For several months now, the public has been fascinated with GPT, DALL-E and other AI tools. These examples, like the homework done by ChatGPT or the articles and op-eds that it can write, feel like novelties. But the underlying advancement of this era is more than just research experiments. They are no longer fantasies of science fiction. They are real and present: the promises of curing cancer or developing new understandings of physics and biology, or modeling climate and weather. All very encouraging and hopeful. But we also know the potential harms, and we’ve seen them already: weaponized disinformation, housing discrimination, harassment of women and impersonation fraud, voice cloning, deepfakes. These are the potential risks despite the other rewards. And for me, perhaps the biggest nightmare is the looming new industrial revolution: the displacement of millions of workers, the loss of huge numbers of jobs, the need to prepare for this new industrial revolution in skill training and relocation that may be required. And already industry leaders are calling attention to those challenges.
5    6  Sen. Richard Blumenthal (D-CT)  To quote ChatGPT, this is not necessarily the future that we want. We need to maximize the good over the bad. Congress has a choice now. We had the same choice when we faced social media. We failed to seize that moment. The result is predators on the internet, toxic content exploiting children, creating dangers for them. And Senator Blackburn and I and others like Senator Durbin on the Judiciary Committee are trying to deal with it in the Kids Online Safety Act. But Congress failed to meet the moment on social media. Now we have the obligation to do it on AI before the threats and the risks become real. Sensible safeguards are not in opposition to innovation. Accountability is not a burden, far from it. They are the foundation of how we can move ahead while protecting public trust. They are how we can lead the world in technology and science, but also in promoting our democratic values.
6    7  Sen. Richard Blumenthal (D-CT)  Otherwise, in the absence of that trust, I think we may well lose both. These are sophisticated technologies, but there are basic expectations common in our law. We can start with transparency. AI companies ought to be required to test their systems, disclose known risks, and allow independent researcher access. We can establish scorecards and nutrition labels to encourage competition based on safety and trustworthiness, and limitations on use. There are places where the risk of AI is so extreme that we ought to restrict or even ban their use, especially when it comes to commercial invasions of privacy for profit and decisions that affect people’s livelihoods. And of course, accountability, reliability. When AI companies and their clients cause harm, they should be held liable. We should not repeat our past mistakes, for example, Section 230. Forcing companies to think ahead and be responsible for the ramifications of their business decisions can be the most powerful tool of all. Garbage in, garbage out. The principle still applies. We ought to beware of the garbage, whether it’s going into these platforms or coming out of them.

Next, I considered adding some labels for future analysis, identifying each individual by the segment of society they represented:


def assign_sector(name):
    if name in ['Sam Altman', 'Christina Montgomery']:
        return 'Private'
    elif name == 'Gary Marcus':
        return 'Academia'
    else:
        return 'Congress'

# Apply function
dfsenate['sector'] = dfsenate['name'].apply(assign_sector)

# Assign organizations based on names
def assign_organization(name):
    if name == 'Sam Altman':
        return 'OpenAI'
    elif name == 'Christina Montgomery':
        return 'IBM'
    elif name == 'Gary Marcus':
        return 'Academia'
    else:
        return 'Congress'

# Apply function
dfsenate['Organization'] = dfsenate['name'].apply(assign_organization)

print(dfsenate)

Finally, I decided to add a column that counts the words in each statement, which will also help us with further analysis.

dfsenate['WordCount'] = dfsenate['comment'].apply(lambda x: len(x.split()))

At this point, your dataframe should look like this:

   item                            name  ... Organization WordCount
0 1 Sen. Richard Blumenthal (D-CT) ... Congress 5
1 2 Sen. Richard Blumenthal (D-CT) ... Congress 55
2 3 Sen. Richard Blumenthal (D-CT) ... Congress 125
3 4 Sen. Richard Blumenthal (D-CT) ... Congress 145
4 5 Sen. Richard Blumenthal (D-CT) ... Congress 197
.. ... ... ... ... ...
399 400 Sen. Cory Booker (D-NJ) ... Congress 156
400 401 Sam Altman ... OpenAI 180
401 402 Sen. Cory Booker (D-NJ) ... Congress 72
402 403 Sen. Richard Blumenthal (D-CT) ... Congress 154
403 404 Sen. Richard Blumenthal (D-CT) ... Congress 98

STEP-02: VISUALIZE THE DATA

Let’s take a look at the numbers we have so far: 404 questions or testimonies and almost 29,000 words. These numbers give us the material we need to get started. It’s important to know that some statements were split into smaller parts: when a long statement spanned several paragraphs, the code divided it into separate statements, even though they were actually part of one contribution. To get a better understanding of each participant’s involvement, I also considered the number of words they used, which gave another perspective on their engagement.
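
If you want to reproduce these headline figures from your own dataframe, a quick check like the one below should do it (a minimal sketch, assuming the `dfsenate` built in Step 01):

# Quick sanity check of the headline numbers (assumes dfsenate from Step 01)
total_statements = len(dfsenate)            # questions or testimonies captured
total_words = dfsenate['WordCount'].sum()   # words across all statements
print(f"Statements: {total_statements}")    # expected: 404
print(f"Total words: {total_words:,}")      # expected: close to 29,000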

Hearing on Oversight of AI: Figure 01

As you can see in Figure 01, interventions by members of Congress represented more than half of the hearing, followed by Sam Altman’s testimony. However, an alternative view obtained by counting the words from each side reveals a more balanced representation between Congress (11 members) and the panel composed of Altman (OpenAI), Montgomery (IBM), and Marcus (Academia).

It’s interesting to note the different levels of engagement among the members of Congress who participated in the Senate hearing (view the table below). As expected, Sen. Blumenthal, as the Subcommittee Chair, was highly engaged. But what about the other members? The table shows significant variations in engagement among all eleven members. Remember, the quantity of contributions doesn’t necessarily indicate their quality. I’ll let you make your own judgment as you review the numbers.

Finally, even though Sam Altman received a lot of attention, it’s worth noting that Gary Marcus, despite appearing to have had few opportunities to participate, had plenty to say, as indicated by his word count, which is similar to Altman’s. Or is it perhaps because academia often gives detailed explanations, while the business world prefers practicality and simplicity?

Alright, Professor Marcus, if you could be specific. This is your shot, man. Talk in plain English and tell me what, if any, rules we ought to implement. And please don’t just use concepts. I’m looking for specificity.

Sen. John Kennedy (R-LA). US Senate Hearing on Oversight of AI (2023)

#*****************************PIE CHARTS************************************
import pandas as pd
import matplotlib.pyplot as plt

# Pie chart - grouping by 'Organization' (questions & testimonies)
org_colors = {'Congress': '#6BB6FF', 'OpenAI': 'green', 'IBM': 'lightblue', 'Academia': 'lightyellow'}
org_counts = dfsenate['Organization'].value_counts()

plt.figure(figsize=(8, 6))
patches, texts, autotexts = plt.pie(org_counts.values, labels=org_counts.index,
                                    autopct=lambda p: f'{p:.1f}%\n({int(p * sum(org_counts.values) / 100)})',
                                    startangle=90, colors=[org_colors.get(org, 'gray') for org in org_counts.index])
plt.title('Hearing on Oversight of AI: Questions or Testimonies')
plt.axis('equal')
plt.setp(texts, fontsize=12)
plt.setp(autotexts, fontsize=12)
plt.show()

# Pie chart - grouping by 'Organization' (WordCount)
org_colors = {'Congress': '#6BB6FF', 'OpenAI': 'green', 'IBM': 'lightblue', 'Academia': 'lightyellow'}
org_wordcount = dfsenate.groupby('Organization')['WordCount'].sum()

plt.figure(figsize=(8, 6))
patches, texts, autotexts = plt.pie(org_wordcount.values, labels=org_wordcount.index,
                                    autopct=lambda p: f'{p:.1f}%\n({int(p * sum(org_wordcount.values) / 100)})',
                                    startangle=90, colors=[org_colors.get(org, 'gray') for org in org_wordcount.index])

plt.title('Hearing on Oversight of AI: WordCount')
plt.axis('equal')
plt.setp(texts, fontsize=12)
plt.setp(autotexts, fontsize=12)
plt.show()

#************Engagement among the members of Congress**********************

# Group by name and count the rows
Summary_Name = dfsenate.groupby('name').agg(comment_count=('comment', 'size')).reset_index()

# Total word count for each name
Summary_Name['Total_Words'] = dfsenate.groupby('name')['WordCount'].sum().values

# Percentage distribution of comment_count
Summary_Name['comment_count_%'] = Summary_Name['comment_count'] / Summary_Name['comment_count'].sum() * 100

# Percentage distribution of total word count
Summary_Name['Word_count_%'] = Summary_Name['Total_Words'] / Summary_Name['Total_Words'].sum() * 100

Summary_Name = Summary_Name.sort_values('Total_Words', ascending=False)

print(Summary_Name)
+-------+--------------------------------+---------------+-------------+----------+--------------+
| index | name                           | Interventions | Total_Words | Interv_% | Word_count_% |
+-------+--------------------------------+---------------+-------------+----------+--------------+
| 2     | Sam Altman                     | 92            | 6355        | 22.77    | 22.32        |
| 1     | Gary Marcus                    | 47            | 5105        | 11.63    | 17.93        |
| 15    | Sen. Richard Blumenthal (D-CT) | 58            | 3283        | 14.36    | 11.53        |
| 10    | Sen. Josh Hawley (R-MO)        | 25            | 2283        | 6.19     | 8.02         |
| 0     | Christina Montgomery           | 36            | 2162        | 8.91     | 7.59         |
| 6     | Sen. Cory Booker (D-NJ)        | 20            | 1688        | 4.95     | 5.93         |
| 7     | Sen. Dick Durbin (D-IL)        | 8             | 1143        | 1.98     | 4.01         |
| 11    | Sen. Lindsey Graham (R-SC)     | 32            | 880         | 7.92     | 3.09         |
| 5     | Sen. Christopher Coons (D-CT)  | 6             | 869         | 1.49     | 3.05         |
| 12    | Sen. Marsha Blackburn (R-TN)   | 14            | 869         | 3.47     | 3.05         |
| 4     | Sen. Amy Klobuchar (D-MN)      | 11            | 769         | 2.72     | 2.70         |
| 13    | Sen. Mazie Hirono (D-HI)       | 7             | 755         | 1.73     | 2.65         |
| 14    | Sen. Peter Welch (D-VT)        | 11            | 704         | 2.72     | 2.47         |
| 3     | Sen. Alex Padilla (D-CA)       | 7             | 656         | 1.73     | 2.30         |
+-------+--------------------------------+---------------+-------------+----------+--------------+

STEP-03: TOKENIZATION

Here is where the natural language processing (NLP) fun begins. To analyze the text, we’ll use the NLTK package in Python, which provides useful tools for word frequency analysis and visualization. The following libraries and modules supply the tools we need:


#pip install nltk
#pip install spacy
#pip install wordcloud
#(subprocess is part of the Python standard library; no install needed)
#python -m spacy download en_core_web_sm

First, we’ll start with tokenization, which means breaking the text into individual words, known as “tokens.” For this, we’ll use spaCy, an open-source NLP library that can handle contractions, punctuation, and special characters. Next, we’ll remove common words that don’t add much meaning, like “a,” “an,” “the,” “is,” and “and,” using the stop words resource from the NLTK library. Finally, we’ll apply lemmatization, which reduces words to their base form, known as the lemma. For example, “running” becomes “run” and “happier” becomes “happy.” This technique helps us work with the text more effectively and understand its meaning. A small sketch of these three stages on a single sentence follows the summary below.

To summarize:

o Tokenize the text.

o Remove common words.

o Apply lemmatization.
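
Here is a minimal walk-through of the three stages on one invented sentence (the sentence and the printed outputs are illustrative only; lemmas can vary slightly across spaCy model versions):

# Illustrative walk-through of the three preprocessing stages
import nltk
import spacy
from nltk.corpus import stopwords

nltk.download('punkt')
nltk.download('stopwords')
nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

sentence = "We are thinking about where AI regulation should be going."

tokens = nltk.word_tokenize(sentence)                     # 1) tokenize
tokens = [t.lower() for t in tokens if t.isalpha()]       # keep alphabetic tokens only
filtered = [t for t in tokens if t not in stop_words]     # 2) remove stop words
lemmas = [tok.lemma_ for tok in nlp(' '.join(filtered))]  # 3) lemmatize

print(filtered)  # e.g. ['thinking', 'ai', 'regulation', 'going']
print(lemmas)    # e.g. ['think', 'ai', 'regulation', 'go']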

#***************************WORD-FREQUENCY*******************************

import subprocess
import nltk
import spacy
from nltk.probability import FreqDist
from nltk.corpus import stopwords

# Download resources
subprocess.run('python -m spacy download en_core_web_sm', shell=True)
nltk.download('punkt')
nltk.download('stopwords')

# Load spaCy model and set stopwords
nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    words = nltk.word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    lemmas = [token.lemma_ for token in nlp(" ".join(words))]
    return lemmas

# Aggregate all comments and build the frequency distribution
all_comments = ' '.join(dfsenate['comment'])
processed_comments = preprocess_text(all_comments)
fdist = FreqDist(processed_comments)

#**********************HEARING TOP 30 COMMON WORDS*********************
import matplotlib.pyplot as plt
import numpy as np

# Most common words and their frequencies
top_words = fdist.most_common(30)
words = [word for word, freq in top_words]
frequencies = [freq for word, freq in top_words]

# Bar plot - Hearing on Oversight of AI: Top 30 Most Common Words
fig, ax = plt.subplots(figsize=(8, 10))
ax.barh(range(len(words)), frequencies, align='center', color='skyblue')

ax.invert_yaxis()
ax.set_xlabel('Frequency', fontsize=12)
ax.set_ylabel('Words', fontsize=12)
ax.set_title('Hearing on Oversight of AI: Top 30 Most Common Words', fontsize=14)
ax.set_yticks(range(len(words)))
ax.set_yticklabels(words, fontsize=10)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_linewidth(0.5)
ax.spines['bottom'].set_linewidth(0.5)
ax.tick_params(axis='x', labelsize=10)
plt.subplots_adjust(left=0.3)

# Annotate each bar with its frequency
for i, freq in enumerate(frequencies):
    ax.text(freq + 5, i, str(freq), va='center', fontsize=8)

plt.show()

Hearing on Oversight of AI: Figure 02

As you can see in the bar plot (Figure 02), there was a lot of “thinking”. Maybe the first five words give us an interesting hint of what we should do today and for our future in terms of AI:

“We need to think and know where AI should go.”

As I mentioned at the beginning of this article, at first sight “regulation” doesn’t stand out as a frequently used word in the Senate AI hearing. However, concluding that it wasn’t a topic of major concern would be inaccurate. The interest in whether AI should or shouldn’t be regulated was expressed in different words, such as “regulation”, “regulate”, “agency”, or “regulatory”. Therefore, let’s make some adjustments to the code, aggregate these words, and re-run the bar plot to see how this affects the analysis.

nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    words = nltk.word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    lemmas = [token.lemma_ for token in nlp(" ".join(words))]
    return lemmas

# Aggregate all comments and build the frequency distribution
all_comments = ' '.join(dfsenate['comment'])
processed_comments = preprocess_text(all_comments)
fdist = FreqDist(processed_comments)
original_fdist = fdist.copy()  # Save the original object

aggregate_words = ['regulation', 'regulate', 'agency', 'regulatory', 'legislation']
aggregate_freq = sum(fdist[word] for word in aggregate_words)
df_aggregatereg = pd.DataFrame({'Word': aggregate_words,
                                'Frequency': [fdist[word] for word in aggregate_words]})

# Remove the individual words and add the aggregate
for word in aggregate_words:
    del fdist[word]
fdist['regulation+agency'] = aggregate_freq

# Pie chart for the regulation+agency distribution
import matplotlib.pyplot as plt

labels = df_aggregatereg['Word']
values = df_aggregatereg['Frequency']

plt.figure(figsize=(8, 6))
plt.subplots_adjust(top=0.8, bottom=0.25)

patches, texts, autotexts = plt.pie(values, labels=labels,
                                    autopct=lambda p: f'{p:.1f}%\n({int(p * sum(values) / 100)})',
                                    startangle=90, colors=['#6BB6FF', 'green', 'lightblue', 'lightyellow', 'gray'])

plt.title('Regulation + agency: distribution', fontsize=14)
plt.axis('equal')
plt.setp(texts, fontsize=8)
plt.setp(autotexts, fontsize=8)
plt.show()

Hearing on Oversight of AI: Figure 03

As you can see in Figure 03, the topic of regulation did, after all, come up many times during the Senate AI hearing.
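
For reference, you can print the aggregated count and its share of all words spoken, which is where the spoiler from the introduction comes from (a sketch reusing `aggregate_freq` and `dfsenate` from above):

# Share of regulation-related words across the whole hearing
total_words = dfsenate['WordCount'].sum()
print(f"Regulation-related mentions: {aggregate_freq}")           # expected: ~70
print(f"Share of all words: {aggregate_freq / total_words:.2%}")  # expected: ~0.24%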

STEP-04: WHAT HIDES BEHIND THE WORDS

Words alone may give us some clues, but it’s the interconnection of words that truly offers perspective. So, let’s use word clouds to explore whether we can uncover insights that simple bar and pie charts cannot show.

# Word cloud - Senate Hearing on Oversight of AI
from wordcloud import WordCloud

wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(fdist)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud - Senate Hearing on Oversight of AI')
plt.show()

Hearing on Oversight of AI: Figure 04

Let’s explore further and compare the word clouds for the different groups of interest represented in the AI hearing (Private, Congress, Academia) to see if their words reveal different perspectives on the future of AI.

# Word clouds for each group of interest
organizations = dfsenate['Organization'].unique()
for organization in organizations:
    comments = dfsenate[dfsenate['Organization'] == organization]['comment']
    all_comments = ' '.join(comments)
    processed_comments = preprocess_text(all_comments)
    fdist_organization = FreqDist(processed_comments)

    # Word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(fdist_organization)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    if organization == 'IBM':
        plt.title(f'Word Cloud: {organization} - Christina Montgomery')
    elif organization == 'OpenAI':
        plt.title(f'Word Cloud: {organization} - Sam Altman')
    elif organization == 'Academia':
        plt.title(f'Word Cloud: {organization} - Gary Marcus')
    else:
        plt.title(f'Word Cloud: {organization}')
    plt.show()

Hearing on Oversight of AI: Figure 05

It’s interesting how some words appear (or disappear) for each group of interest represented in the Senate AI hearing when they talk about artificial intelligence.
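
If you prefer exact counts over eyeballing font sizes in the clouds, `FreqDist.most_common` gives the numbers behind each one. A minimal sketch for a single group, reusing `preprocess_text` from Step 03 (the group name here is just an example):

# Top 10 words for one group of interest, as numbers rather than a cloud
comments_openai = dfsenate[dfsenate['Organization'] == 'OpenAI']['comment']
fdist_openai = FreqDist(preprocess_text(' '.join(comments_openai)))
print(fdist_openai.most_common(10))  # list of (word, count) pairs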

As for the big headline, “Sam Altman’s call for regulating AI”: well, whether he’s in favor of regulation or not, I really can’t tell, but his testimony doesn’t seem to carry much “regulation” in its words. Instead, Sam Altman seems to take a people-centric approach when he talks about AI, repeating words like “think,” “people,” “know,” “important,” and “use,” and relying more on words like “technology,” “system,” or “model” instead of the word “AI”.

Someone who did have something to say about “risk” and “issues” was Christina Montgomery (IBM), who repeated these words frequently when talking about “technology,” “companies,” and “AI.” An interesting fact in her testimony is finding the words that most of us expect to hear from companies involved in developing technology: “trust,” “governance,” and “think” about what is “right” in terms of AI.

We need to hold companies accountable today and responsible for the AI that they’re deploying…..

Christina Montgomery. US Senate Hearing on Oversight of AI (2023)

Gary Marcus said in his initial statement, “I come as a scientist, someone who’s founded AI companies, and is someone who genuinely loves AI…” So, for the sake of this NLP analysis, we’re considering him as a representation of the voice of Academia. Words like “need,” “think,” “know,” “go,” and “people” stand out among others. An interesting fact is that the word “system” seems to be repeated more than “AI” in his testimony. Maybe AI is not a single lone technology that will change the future; the impact on the future will come from multiple technologies or systems interacting with each other (IoT, robotics, BioTech, etc.) rather than from any one of them alone.

In the end, the first hypothesis mentioned by Senator John Kennedy seems not entirely false after all (not just for Congress but for society as a whole). We are still at the stage where we are trying to understand the direction AI is heading.

Permit me to share with you three hypotheses that I would like you to assume for the moment to be true. Hypothesis number one, many members of Congress do not understand artificial intelligence. Hypothesis number two, that absence of understanding may not prevent Congress from plunging in with enthusiasm and trying to regulate this technology in a way that could hurt this technology. Hypothesis number three, that I would like you to assume, there is likely a berserk wing of the artificial intelligence community that intentionally or unintentionally could use artificial intelligence to kill all of us and hurt us the entire time that we are dying…..

Sen. John Kennedy (R-LA). US Senate Hearing on Oversight of AI (2023)

STEP-05: THE EMOTION BEHIND YOUR WORDS

We’ll use the SentimentIntensityAnalyzer class from the NLTK library for sentiment analysis. This pre-trained model uses a lexicon-based approach, where each word in the lexicon (VADER) has a predefined sentiment polarity value. The sentiment scores of the words in a piece of text are aggregated to calculate an overall sentiment score. The numerical value ranges from -1 (negative sentiment) to +1 (positive sentiment), with 0 indicating a neutral sentiment. Positive sentiment reflects a favorable emotion, attitude, or enthusiasm, while negative sentiment conveys an unfavorable emotion or attitude.

#************SENTIMENT ANALYSIS************
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()
dfsenate['Sentiment'] = dfsenate['comment'].apply(lambda x: sid.polarity_scores(x)['compound'])
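
As a quick illustration of how the compound score maps onto the -1 to +1 range, here is a small hand-check (example sentences of my own; exact values depend on the VADER lexicon version, so only the sign pattern matters):

# Hand-check of VADER compound scores on made-up sentences
examples = [
    "AI has enormous promise to improve people's lives.",  # clearly positive
    "The committee will meet on Tuesday.",                 # roughly neutral
    "This technology could cause serious harm.",           # clearly negative
]
for s in examples:
    print(f"{sid.polarity_scores(s)['compound']:+.2f}  {s}")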

#************BOXPLOT - GROUP OF INTEREST************
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('white')
plt.figure(figsize=(12, 7))
sns.boxplot(x='Sentiment', y='Organization', data=dfsenate, color='yellow',
            width=0.6, showmeans=True, showfliers=True)

# Customize the axes
def add_cosmetics(title='Sentiment Analysis Distribution by Group of Interest',
                  xlabel='Sentiment'):
    plt.title(title, fontsize=28)
    plt.xlabel(xlabel, fontsize=20)
    plt.xticks(fontsize=15)
    plt.yticks(fontsize=15)
    sns.despine()

def customize_labels(label):
    if "OpenAI" in label:
        return label + "-Sam Altman"
    elif "IBM" in label:
        return label + "-Christina Montgomery"
    elif "Academia" in label:
        return label + "-Gary Marcus"
    else:
        return label

# Apply customized labels to the y-axis
yticks = plt.yticks()[1]
plt.yticks(ticks=plt.yticks()[0],
           labels=[customize_labels(label.get_text()) for label in yticks])

add_cosmetics()
plt.show()

Hearing on Oversight of AI: Figure 06

A boxplot is always interesting, as it shows the minimum and maximum values, the median, and the first (Q1) and third (Q3) quartiles. In addition, a line of code was added to display the mean value. (Acknowledgment to Elena Kosourova for designing the boxplot code template; I only made adjustments for my dataset.)

Overall, everyone seemed to be in a good mood during the Senate hearing, especially Sam Altman, who stood out with the highest sentiment scores, followed by Christina Montgomery. On the other hand, Gary Marcus seemed to have a more neutral experience (median around 0.25), and he may have felt somewhat uncomfortable at times, with values close to 0 or even negative. In addition, Congress as a whole displayed a left-skewed distribution in its sentiment scores, indicating a tendency toward neutrality or positivity. Interestingly, if we take a closer look, certain interventions stand out with extremely high or low sentiment scores.
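
If you want the numbers behind the boxplot rather than reading them off the figure, pandas’ `describe()` gives the count, mean, quartiles, and extremes per group (a one-line sketch):

# Numeric summary behind the boxplot, per group of interest
print(dfsenate.groupby('Organization')['Sentiment'].describe().round(2))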

Hearing on Oversight of AI: Figure 07

Maybe we shouldn’t interpret the results as saying that people in the Senate AI hearing were happy or uncomfortable. Maybe they suggest that those who participated in the hearing don’t hold an excessively optimistic view of where AI is headed, but they aren’t pessimistic either. The scores may indicate some concern and a cautious attitude about the direction AI should take.

And what about a timeline? Did the mood during the hearing stay the same throughout? How did the mood of each group of interest evolve? To analyze the timeline, I arranged the statements in the order they were captured and performed the sentiment analysis. Since there are over 400 questions or testimonies, I defined a moving average of the sentiment scores for each group of interest (Congress, Academia, Private), using a window size of 10. This means the moving average is calculated by averaging the sentiment scores over every 10 consecutive statements:

#**************************TIMELINE US SENATE AI HEARING**************************************

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

# Moving average for each group
window_size = 10
organizations = dfsenate['Organization'].unique()

# Create the line plot
color_palette = sns.color_palette('Set2', len(organizations))

plt.figure(figsize=(12, 6))
for i, org in enumerate(organizations):
    df_org = dfsenate[dfsenate['Organization'] == org].copy()  # copy to avoid SettingWithCopyWarning

    # Moving average of the sentiment scores
    df_org['Sentiment'].fillna(0, inplace=True)  # missing values filled with 0
    df_org['Moving_Average'] = df_org['Sentiment'].rolling(window=window_size, min_periods=1).mean()

    # Smooth the moving-average curve with a cubic spline
    x = np.linspace(df_org.index.min(), df_org.index.max(), 500)
    spl = make_interp_spline(df_org.index, df_org['Moving_Average'], k=3)
    y = spl(x)
    plt.plot(x, y, linewidth=2, label=f'{org} {window_size}-Point Moving Average', color=color_palette[i])

    # Annotate the last moving-average value for this group
    plt.text(df_org.index[-1], df_org['Moving_Average'].iloc[-1],
             f'{df_org["Moving_Average"].iloc[-1]:.2f}',
             ha='right', va='top', fontsize=12, color='black')

plt.xlabel('Statement Number', fontsize=12)
plt.ylabel('Sentiment Score', fontsize=12)
plt.title('Sentiment Score Evolution during the Hearing on Oversight of AI', fontsize=16)
plt.legend(fontsize=12)
plt.grid(color='lightgray', linestyle='--', linewidth=0.5)
plt.axhline(0, color='black', linewidth=0.5, alpha=0.5)

plt.tight_layout()
plt.show()

Hearing on Oversight of AI: Figure 08

At first, it seemed like the session was friendly and optimistic, with everyone discussing the future of AI. But as the session went on, the mood started to change. The members of Congress became less optimistic, and their questions became tougher. This affected the panelists’ scores, with some even getting low scores (you can see this toward the end of the session). Interestingly, Altman was scored by the model as neutral or slightly positive, even during the tense moments with the members of Congress.

It’s important to keep in mind that the model has its limitations and may border on subjectivity. While sentiment analysis isn’t flawless, it offers us an interesting glimpse into the intensity of the emotions that prevailed that day on Capitol Hill.

In my opinion, the lessons behind this US Senate AI hearing lie in the five most repeated words: “We need to think and know where AI should go.” It’s noteworthy that words like “people” and “importance” were unexpectedly present in Sam Altman’s word cloud, going beyond the headline of a “call for regulation”. While I hoped to find more words like “transparency”, “accountability”, “trust”, “governance”, and “fairness” in Altman’s NLP analysis, it was a relief to find some of them frequently repeated in Christina Montgomery’s testimony. This is what we all expect to hear more often when AI is on the table.

Gary Marcus emphasized “system” as much as “AI”, perhaps inviting us to see artificial intelligence in a broader context. Multiple technologies are emerging right now, and their combined impact on society, work, and employment in the future will come from the clash of these multiple technologies, not just from one of them. Academia plays a key role in guiding this path, and in deciding whether some kind of regulation is needed. I say this “literally,” not “spiritually” (inside joke from the six-month moratorium letter).

Finally, the word “agency” was repeated as much as “regulation” in its different forms. This suggests that the concept of an “Agency for AI” and its role will likely be a topic of debate in the near future. An interesting reflection on this issue was offered in the Senate AI hearing by Sen. Richard Blumenthal:

…Most of my career has been in enforcement. And I will tell you something, you can create 10 new agencies, but if you don’t give them the resources, and I’m talking not just about dollars, I’m talking about scientific expertise, you guys will run circles around ’em. And it isn’t just the, the models or the generative AI that can run circles around them, but it is the scientists in your companies. For every success story in government regulation, you can think of five failures…. And I hope our experience here will be different…

Sen. Richard Blumenthal (D-CT). US Senate Hearing on Oversight of AI (2023)

Although reconciling innovation, awareness, and regulation is challenging for me, I’m all for raising awareness about AI’s role in our present and future, while also understanding that “research” and “development” are different things. The first should be encouraged and promoted, not contained; the second is where the extra effort in “thinking” and “understanding” is needed.

I hope you found this NLP analysis interesting, and I want to thank Justin Hendrix and Tech Policy Press for allowing me to use their transcript for this article. You can access the complete code in this GitHub repository. (Acknowledgement also to ChatGPT for helping me fine-tune some of my code for a better presentation.)

Did I miss anything? Your suggestions are always welcome; let’s keep the conversation going.
