The Institute for the Basque Language (IBL) is working on a scientific project aimed at creating an online encyclopaedia of the Basque language, covering grammar, dictionaries, historical information and more. Some parts of this project are already available online, either in part or in full:

  • Contemporary Reference Prose (CRP)
  • Dictionary of Standard Basque in Contemporary Prose (DSBCP)
  • The Lexicon, past and present (LPP)
  • Dictionary of Contemporary Basque (DCB)

In addition to these projects, the IBL has held workshops on terminology (2002, 2005 and 2008) and has compiled a range of materials for teaching university courses in Basque. A grammar written in English is also available.

Given their nature, most of our projects are available in Basque. This English overview is intended to give the reader an idea of the type of work undertaken by the Institute.

Goenkale is a Basque TV series that has been broadcast without interruption since 1994 on the Basque TV channel ETB. Episode 3,000 was shown in 2010, making Goenkale one of the longest-running series in Europe. The Goenkale corpus has been built from the texts of the sequences used in the series since its inception. It contains the following:

  • Episodes: 2,418
  • Sequences: 38,821
  • Dialogs: 805,796
  • Words (total): 11,000,000
  • Words in dialogs: 7,700,000

The main interest of this corpus lies in its dialogs. Large bodies of text representing conversation and dialog are very difficult to find. Moreover, this series has a particular strength: its dialogs, written by specialists in Basque, reflect natural, everyday language (as its viewers acknowledge). With almost 8 million words of dialog, this is an important corpus.

