News
Undergraduate Scholars #4 and a student intern
July 30, 2024
The Student-Transcribed Corpus of Spoken American English has been extended this year by five participants of the Undergraduate Scholars Programme - Yanran Xiong, Bomiao Zhang, Jingwen Zhang, Xuanzi Zhao and Zixun Lin - and a student intern, Sophie Baratta. Most importantly, they added transcripts from a few older speakers born before 1940, such as playwright Arthur Miller and First Lady Eleanor Roosevelt. The hard work of the transcribers has propelled the corpus to a size of over 200,000 words. I am grateful to everybody involved. Thank you!
Slang Words and the Adolescent Peak
August 10, 2023
Work on the Student-Transcribed Corpus of Spoken American English is continuing. This year, seven undergraduate students enlarged the corpus with new transcripts during the Undergraduate Scholars Programme 2023. They were: Reet K Maur, Yan Lee, Georgina Willoughby, Yanhao Li, Kriti Mehrota, Faatima Adam and Grace Carrier. They managed to add no fewer than 24,000 words. I am very grateful for their hard work and wish them all the best for the future!
A video of the students' final showcase presentation is now available. They first talk about the creation of a professional spoken corpus. After that, they present an investigation into colloquial expressions and whether they show a rise and fall pattern known as "the adolescent peak."
You can watch the presentation here or watch it directly below:
150,000 word mark passed
September 22, 2022
Students of my introductory course "From Text to Linguistic Evidence," autumn semester 2022, have managed to add more than 10 new transcripts to the corpus. Their hard work has increased its total word count to 152,304 tokens, or more than 12 hours of speech - a very substantial increase. Among their additions are commentaries on the Ukraine War, an interview with Harper Lee, author of "To Kill a Mockingbird," and a talk by climate scientist Roger Revelle.
I would like to thank every student for their contribution, commitment and professionalism. I wish everybody all the best for the future!
Second UG Scholars project over
June 30, 2022
This year's undergraduate scholars Sadie Barlow and Xiaohui (Iris) Chen have added another 10,000 words in four new transcripts to the spoken corpus. Thank you!
They then investigated variation between 'very' and 'really' in the corpus. Below, you can find a video of their competent and insightful final presentation. Have fun with it :)
The Spoken Corpus is one today!
September 1, 2021
UG Scholars project concludes successfully
June 23, 2021
Three students have worked on this year's Undergraduate Scholars program. They are: Wenjing Shi, Brandon Cox and Jamie-Lea Carter. They have made fantastically accurate transcripts and, in so doing, added many thousands of words to the corpus. Many thanks for their hard work and dedication!
You can watch the final presentation of their research outcomes below. Enjoy!
UG Scholars to enlarge the corpus
February 11, 2021
The UG Scholars program gives students an opportunity to engage in actual research. Three amazing students have volunteered with this program to help increase the size of the Student-Transcribed Corpus of Spoken American English. They will learn how to make transcripts for the corpus, cut up audio-clips, annotate the files in XML and conduct research on spoken language. Their contribution will be greatly appreciated!
The Undergraduate Scholars Program is an opportunity for students to engage in real research and gain extra-curricular experience for their CV.
Thank you to all students!
December 17, 2020
'Tis done! The first cohort of students working on the corpus have finished their semester. They jointly added more than 60,000 new word tokens, quite an amazing feat. I'm very impressed by their dedication and attention to detail. Thanks to everybody who contributed to this project. Well done, guys!
Divide and conquer!
October 14, 2020
Get this - the results table is now sortable! I shamelessly stole some code from the internets to implement this functionality. Furthermore, the number of hits to be displayed can now be determined by the user. The programming is a bit sloppy perhaps; every time you click on "Next hits", all the hits will be searched again. This could be an issue down the line if and when the corpus becomes larger, but for now, sorting and cutting up the results seems to be working fine ...
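For the curious, here is one way the repeated searching could be avoided: run the search once, keep the hits, and cut each page out of the stored list. This is only a rough Python sketch, not the site's actual code. The class `HitPager`, its fields, and the simple substring match are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HitPager:
    """Hypothetical helper: searches once, then serves pages from memory."""

    corpus: List[str]            # one string per transcript line (assumed layout)
    query: str                   # search term, matched case-insensitively
    hits_per_page: int = 10      # user-chosen number of hits to display
    searches_run: int = field(default=0, init=False)
    _hits: Optional[List[str]] = field(default=None, init=False)

    def _search(self) -> List[str]:
        # Stand-in for the real search: a plain substring match.
        self.searches_run += 1
        needle = self.query.lower()
        return [line for line in self.corpus if needle in line.lower()]

    def page(self, number: int) -> List[str]:
        # Search only on the first request; later pages reuse the stored hits.
        if self._hits is None:
            self._hits = self._search()
        start = number * self.hits_per_page
        return self._hits[start:start + self.hits_per_page]


corpus = [
    "I really like it",
    "That is very good",
    "Really?",
    "It was really, really late",
]
pager = HitPager(corpus, "really", hits_per_page=2)
first_page = pager.page(0)    # triggers the one and only search
second_page = pager.page(1)   # served from the stored hits
```

Clicking "Next hits" would then correspond to calling `page()` with the next number, and `searches_run` stays at 1 however many pages are requested. The trade-off is that the full hit list has to be kept somewhere between clicks.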
A logo for the Spoken Corpus
September 29, 2020
The corpus now has a logo - a speech bubble to represent its spoken mode filled with the Star-Spangled Banner because it includes speech from native speakers of American English. The logo is also used as a favicon next to the page's title in the browser tabs.
The new logo of the Student-Transcribed Corpus of Spoken American English
The first 10,000 words
September 22, 2020
The Student-Transcribed Corpus of Spoken American English has reached its first milestone - it now includes 10,000 words of transcribed speech (including disfluencies and punctuation marks). Let's hope it will keep growing at a brisk and steady pace.
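To give an idea of what such a count involves, here is a small Python sketch in which every word, every filler such as "uh", and every punctuation mark counts as one token. The pattern and the function name `count_tokens` are assumptions for illustration only. The corpus's own counting rules may well differ.

```python
import re

# Assumed tokenization: a word (optionally with internal apostrophes or
# hyphens, as in "don't"), or any single non-space punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def count_tokens(transcript: str) -> int:
    """Count words, disfluencies and punctuation marks as one token each."""
    return len(TOKEN_PATTERN.findall(transcript))


sample = "Well, uh, I- I don't know."
sample_count = count_tokens(sample)  # 7 words + 3 punctuation marks = 10
```

Under these rules the false start "I-" counts as two tokens, the word and the dash, which is one of many choices a real transcription scheme has to make explicitly.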
Hello World!
September 1, 2020
Happy birthday, Student-Transcribed Corpus of Spoken American English! Your domain has been bought, your website has been set up, a search interface has been programmed. You're off to a good start :-)
Who knows, really, what this project might someday be used for? Hopefully it will fulfill some sort of purpose.
“The two most important days in your life are the day you are born and the day you find out why.” - Mark Twain
So, here is to hoping that you will grow nicely and become big, that lots of people around the world find you interesting and useful, for research, for teaching, or just for fun, and that the internet gods will always smile on you and spare you from bugs and crashes. Cheers :)