150,000 word mark passed

September 22, 2020

Keywords: thanks, progress, acknowledgements, update

Students of my introductory course "From Text to Linguistic Evidence," autumn semester 2022, have managed to add more than 10 new transcripts to the corpus. Their hard work has increased its total word count to 152,304 tokens, or more than 12 hours of speech - a very substantial increase. Among their additions are commentaries on the Ukraine War, an interview with Harper Lee, author of "To Kill a Mockingbird," and a talk by climate scientist Roger Revelle.
I would like to thank every student for their contribution, commitment and professionalism. I wish everybody all the best for the future!

Second UG Scholars project over

June 30, 2022

Keywords: progress, research, new texts, update

This year's undergraduate scholars Sadie Barlow and Xiaohui (Iris) Chen have added another 10,000 words in four new transcripts to the spoken corpus. Thank you!
They then investigated variation between 'very' and 'really' in the corpus. Below, you can find a video of their competent and insightful final presentation. Have fun with it :)

The Spoken Corpus is one today!

September 1, 2021

Keywords: progress, milestone, journal, update

The website is now one year old! In this time, the corpus has grown to over 100,000 word tokens, or almost 8 hours of transcribed speech! That's good growth. Happy birthday and here is to hoping that the second year will be as productive as the first!

UG Scholars project concludes successfully

June 23, 2021

Keywords: progress, new texts, update, thanks

Three students have worked on this year's Undergraduate Scholars program. They are: Wenjing Shi, Brandon Cox and Jamie-Lea Carter. They have made fantastically accurate transcripts and, in so doing, added many thousands of words to the corpus. Many thanks for their hard work and dedication!
You can watch the final presentation of their research outcomes below. Enjoy!

UG Scholars to enlarge the corpus

February 11, 2021

Keywords: progress, new texts, update, thanks

The UG Scholars program gives students an opportunity to engage in actual research. Three amazing students have volunteered with this program to help increase the size of the Student-Transcribed Corpus of Spoken American English. They will learn how to make transcripts for the corpus, cut up audio clips, annotate the files in XML and conduct research on spoken language. Their contribution will be greatly appreciated!

The Undergraduate Scholars Program is an opportunity for students to engage in real research and gain extra-curricular experience for their CV

Thank you to all students!

December 17, 2020

Keywords: progress, new texts, thanks

'Tis done! The first cohort of students working on the corpus have finished their semester. They jointly added more than 60,000 new word tokens, quite an amazing feat. I'm very impressed by their dedication and attention to detail. Thanks to everybody who contributed to this project. Well done, guys!

Divide and conquer!

October 14, 2020

Keywords: set-up, programming, functionality

Get this - the results table is now sortable! I shamelessly stole some code from the internets to implement this functionality. Furthermore, users can now choose how many hits are displayed at a time. The programming is perhaps a bit sloppy: every time you click on "Next hits", the entire search is run again. This could become an issue down the line if and when the corpus grows larger, but for now, sorting and splitting up the results seems to be working fine ...
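
For the curious, here is a minimal sketch of one way to avoid repeating the search on every click: run it once, keep the hits, and let sorting and "Next hits" work on the stored list. This is not the code behind the website. The names `Hit`, `ResultSet` and `search`, the three-word context window and the in-memory toy corpus are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit with a little context (hypothetical structure)."""
    transcript: str
    left: str
    match: str
    right: str


@dataclass
class ResultSet:
    """Holds all hits of one search so paging and sorting need no new search."""
    hits: list[Hit]
    page_size: int = 10

    def page(self, number: int) -> list[Hit]:
        # Slice the stored hits; page numbers start at 0.
        start = number * self.page_size
        return self.hits[start:start + self.page_size]

    def sort_by(self, column: str) -> None:
        # Sort the stored hits in place by one column, ignoring case.
        self.hits.sort(key=lambda hit: getattr(hit, column).lower())


def search(corpus: dict[str, list[str]], word: str, page_size: int = 10) -> ResultSet:
    """Scan the corpus once and collect every case-insensitive match."""
    hits = []
    for name, tokens in corpus.items():
        for i, token in enumerate(tokens):
            if token.lower() == word.lower():
                left = " ".join(tokens[max(0, i - 3):i])
                right = " ".join(tokens[i + 1:i + 4])
                hits.append(Hit(name, left, token, right))
    return ResultSet(hits, page_size)


# Toy corpus: transcript name -> list of word tokens (assumed format).
corpus = {
    "a": "it was really very good and very nice".split(),
    "b": "very well".split(),
}
results = search(corpus, "very", page_size=2)  # the only scan of the corpus
first_page = results.page(0)                   # "Next hits" is just a slice
second_page = results.page(1)
results.sort_by("right")                       # sorting reuses the same hits
```

On a real website, the stored hits would have to live somewhere between clicks, for instance in the browser or in a server-side session. Which of these would suit this site is an open question.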

A logo for the Spoken Corpus

September 29, 2020

Keywords: design, graphics, layout

The corpus now has a logo - a speech bubble, representing its spoken mode, filled with the Star-Spangled Banner because the corpus includes speech from native speakers of American English. The logo is also used as a favicon next to the page's title in the browser tabs.

The new logo of the Student-Transcribed Corpus of Spoken American English

The first 10,000 words

September 22, 2020

Keywords: milestone, progress, journal

The Student-Transcribed Corpus of Spoken American English has reached its first milestone - it now includes 10,000 words of transcribed speech (including disfluencies and punctuation). Let's hope it will keep growing at a brisk and steady pace.

Hello World!

September 1, 2020

Keywords: milestone, progress, journal

Happy birthday, Student-Transcribed Corpus of Spoken American English! Your domain has been bought, your website has been set up, a search interface has been programmed. You're off to a good start :-)

Who knows, really, what this project might someday be used for? Hopefully it will fulfill some sort of purpose.

“The two most important days in your life are the day you are born and the day you find out why.” - Mark Twain

So, here is to hoping that you will grow nicely and become big, that lots of people around the world find you interesting and useful, for research, for teaching, or just for fun, and that the internet gods will always smile on you and spare you from bugs and crashes. Cheers :)