How to create GATE plugins

Introduction

General Architecture for Text Engineering (GATE) is a Java suite of tools, originally developed at the University of Sheffield, for all sorts of natural language processing tasks, including information extraction in many languages. It is open source software capable of solving almost any text processing problem.

GATE has been compared to NLTK and RapidMiner. It is a very extensible framework. Its architecture is based on components (or resources), and the framework functions as a backbone into which users can plug components.

Each component (i.e., a Java Bean) is a reusable chunk of software with a well-defined interface that may be deployed in a variety of contexts. You can define applications with processing pipelines using these reusable components. In GATE, the set of these resources is officially named CREOLE (a Collection of REusable Objects for Language Engineering).
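The component idea can be sketched in plain Java, without GATE on the classpath. This is only an illustration of "reusable pieces chained into a pipeline": the class PipelineSketch and its methods are hypothetical names, not GATE's API, and real GATE components work on annotated documents rather than lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class PipelineSketch {
    /** Runs each component in order, passing one component's output to the next. */
    static List<String> run(String text, List<Function<List<String>, List<String>>> components) {
        List<String> units = new ArrayList<>(Arrays.asList(text));
        for (Function<List<String>, List<String>> component : components) {
            units = component.apply(units);
        }
        return units;
    }

    /** Splits every unit wherever the regex matches, dropping empty pieces. */
    static List<String> splitAll(List<String> units, String regex) {
        List<String> out = new ArrayList<>();
        for (String unit : units) {
            for (String piece : unit.split(regex)) {
                if (!piece.isEmpty()) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    /** Example pipeline: naive sentence splitting, then whitespace tokenising. */
    static List<String> demo(String text) {
        List<Function<List<String>, List<String>>> components = new ArrayList<>();
        components.add(units -> splitAll(units, "(?<=[.!?])\\s+"));
        components.add(units -> splitAll(units, "\\s+"));
        return run(text, components);
    }

    public static void main(String[] args) {
        System.out.println(demo("GATE is extensible. Components plug in."));
    }
}
```

Because each step only depends on the shared input and output type, steps can be reordered, replaced, or reused in another pipeline, which is the property the component architecture is after.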

These CREOLE plugins, along with the GATE framework, can be deployed in a user’s custom application. This post shows how to create GATE plugins.

You can download the source code from GitHub.

CREOLE Resources

GATE components are one of three types:

  1. Language Resources (LRs) represent entities such as lexicons (e.g. WordNet), corpora or ontologies.
  2. Processing Resources (PRs) represent entities that are primarily algorithmic, such as parsers, generators or n-gram modellers.
  3. Visual Resources (VRs) represent visualisation and editing components that participate in GUIs.

To better organize CREOLE resources, CREOLE plugins are used. In other words, resource implementations can be grouped together as ‘plugins’ and stored at a URL.

CREOLE Plugins

To create a CREOLE plugin, you lay out its contents in a directory. The directory can contain a JAR that holds the resource implementations, a configuration file (i.e., creole.xml), external resources such as rules, gazetteer lists and schemas in a resources folder, and all the libraries the plugin depends on in a lib folder.

Let’s create a GateTutorial plugin to illustrate the process of plugin creation. This plugin was created for GATE Developer 7.1. It counts the number of occurrences of the word “vinci” at sentence level and document level.
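To make the goal of the plugin concrete, here is a GATE-free sketch of the counting logic. It is not the plugin's actual implementation: the class name VinciCountSketch is hypothetical, the match is assumed to be case-insensitive and whole-word, and a naive regular expression stands in for the sentence boundaries that a sentence splitter PR would provide in a real GATE pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VinciCountSketch {
    // Whole-word, case-insensitive match for the target word (an assumption).
    private static final Pattern WORD =
            Pattern.compile("\\bvinci\\b", Pattern.CASE_INSENSITIVE);

    /** Counts occurrences of the word in any piece of text (document level). */
    static int countInText(String text) {
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Counts occurrences per sentence, using a naive split after '.', '!' or '?'. */
    static List<Integer> countPerSentence(String document) {
        List<Integer> counts = new ArrayList<>();
        for (String sentence : document.split("(?<=[.!?])\\s+")) {
            if (!sentence.trim().isEmpty()) {
                counts.add(countInText(sentence));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "Leonardo da Vinci painted it. "
                + "Vinci is a town; Vinci is in Tuscany. No match here.";
        System.out.println("per sentence: " + countPerSentence(doc)); // [1, 2, 0]
        System.out.println("document: " + countInText(doc));          // 3
    }
}
```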

Plugin creation steps

The plugin skeleton can be created with the bootstrap wizard in GATE Developer (under Tools). If you like, you can skip this step and use the GatePlugin tutorial source code instead.

Several modifications need to be made to build.xml:

1. Replace the lines as shown below:

[Screenshot: build.xml lines to replace]

2. Add the location of the GATE 7.1 installation in the gate.home property

  For example: <property name="gate.home" location="D:/Installed_Programs/Gate_7.1" />

3. Create a new project in Eclipse

If you are working in Eclipse, you can create a new project from an existing Ant buildfile.

If you would like all the files to live in your Eclipse workspace, you can delete the content of the project and then copy all the project files from the original location.

Eclipse may treat “src” as a normal folder. In that case, create a new source folder and name it “src”.

4. Using CREOLE Resources

In applications that use GATE Embedded, you can construct an information extraction (IE) pipeline using CREOLE resources from different CREOLE plugins. For example, the GatePluginTutorial example constructs a pipeline (i.e., a SerialAnalyserController) using three different PRs:

String[] processingResources = {
        "gate.creole.tokeniser.DefaultTokeniser",
        "gate.creole.splitter.SentenceSplitter",
        "com.gate.plugin.vinci.Vinci" };

Two of them are provided by the ANNIE plugin and the third one (i.e., com.gate.plugin.vinci.Vinci) is provided by the GateTutorial plugin.


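// Uses java.io.File, gate.Gate and gate.creole.ANNIEConstants.
// Register the working directory, which holds the creole.xml of the Vinci plugin.
// File.toURL() is deprecated, so convert via toURI() first.
Gate.getCreoleRegister().registerDirectories(
        new File(System.getProperty("user.dir")).toURI().toURL());
// Register the ANNIE plugin, which provides DefaultTokeniser and SentenceSplitter.
Gate.getCreoleRegister().registerDirectories(
        new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR)
                .toURI().toURL());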

In the statements above, we use the registerDirectories() API to load plugins from a given CREOLE directory URL. Note that a CREOLE directory URL should point to the directory that contains the creole.xml file, not to the file itself.

When a plugin is loaded, GATE looks for a configuration file called creole.xml relative to the plugin URL. It uses the contents of this file to determine which resources the plugin declares and where to find the classes that implement them (typically these classes are stored in a JAR file in the plugin directory).
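What "relative to the plugin URL" means can be shown with the standard java.net.URI class. This is a sketch of ordinary URI resolution, not GATE's own lookup code; the class name CreoleLocationSketch and the plugin path are hypothetical. It also shows why the directory URL matters: without a trailing slash, relative resolution replaces the last path segment instead of appending to it.

```java
import java.net.URI;

class CreoleLocationSketch {
    /** Returns the URI where creole.xml would be expected for a plugin directory URI. */
    static URI creoleXmlFor(URI pluginDir) {
        String text = pluginDir.toString();
        // Treat the URI as a directory: make sure it ends with a slash before resolving.
        URI base = text.endsWith("/") ? pluginDir : URI.create(text + "/");
        return base.resolve("creole.xml");
    }

    public static void main(String[] args) {
        // Hypothetical plugin location, for illustration only.
        URI plugin = URI.create("file:/opt/gate/plugins/GateTutorial");
        System.out.println(creoleXmlFor(plugin));
        // Pitfall: resolving directly against a URI with no trailing slash
        // drops the "GateTutorial" segment.
        System.out.println(plugin.resolve("creole.xml"));
    }
}
```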

The class com.gate.plugin.vinci.Vinci in GatePluginTutorial.jar provides the implementation of the new PR. Because this PR doesn’t need any gazetteer lists or rules, the plugin has an empty resources folder. Its creole.xml is as simple as:

<CREOLE-DIRECTORY>
<JAR SCAN="true">GatePluginTutorial.jar</JAR>
</CREOLE-DIRECTORY>

This tells GATE to load GatePluginTutorial.jar and scan its contents for resource classes annotated with @CreoleResource.
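To see how little information this file carries, the sketch below reads the same XML with the JDK's built-in DOM parser and lists the JARs marked for scanning. It is an illustration of the file's structure only, not how GATE itself processes creole.xml; the class name CreoleXmlReader and its method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CreoleXmlReader {
    /** Returns the names of all JAR elements whose SCAN attribute is "true". */
    static List<String> scannedJars(String creoleXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            creoleXml.getBytes(StandardCharsets.UTF_8)));
            NodeList jars = doc.getElementsByTagName("JAR");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < jars.getLength(); i++) {
                Element jar = (Element) jars.item(i);
                if ("true".equalsIgnoreCase(jar.getAttribute("SCAN"))) {
                    result.add(jar.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            // Wrap parser errors so callers do not need checked exceptions.
            throw new IllegalArgumentException("Not a readable creole.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CREOLE-DIRECTORY>\n"
                + "<JAR SCAN=\"true\">GatePluginTutorial.jar</JAR>\n"
                + "</CREOLE-DIRECTORY>";
        System.out.println(scannedJars(xml)); // [GatePluginTutorial.jar]
    }
}
```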
