Interview: Carlos Eduardo Dantas de Menezes

-Louis Suárez-Potts

2006-05-17

Carlos Eduardo Dantas de Menezes and his team recently completed a grammar checker for OpenOffice.org, which the Brazilian OpenOffice.org team has now made available for immediate download. It is a pluggable module, written in Perl and Java, that works with OpenOffice.org 1.1.x as well as the 2.0.x versions and checks Brazilian Portuguese.

The project, named CoGrOO (Corretor Gramatical acoplável ao OpenOffice), was developed by Prof. Carlos Menezes' team at USP - Universidade de São Paulo and by the Centro Universitário SENAC-SP, and was partly funded by the government. (OpenOffice.org is used by millions in Brazil and is strongly promoted by the public sector.) I hope that this grammar checker represents the first of many contributions by Brazilian developers. In part to see the status of OpenOffice.org development in Brazil, and in part to stimulate it, I asked to interview Professor Menezes; he generously agreed, and the interview below was conducted over a period of several days via email.

Tell us about yourself...

I'm a 35-year-old professor; I teach Programming, Artificial Intelligence, and other subjects at two private colleges here in São Paulo, Brazil. I just began to train for Olympic Gymnastics (it's a lot of fun!), and I intend to start my doctoral programme as soon as possible. I have been studying NLP for about 11 years, and OOo gave me opportunities to apply my studies.

How did you get involved in OpenOffice.org? And who is included in your team?

Almost three years ago, I attended a free software meeting called "Semana de Software Livre no Legislativo". A speech there motivated and inspired me to begin this new FOSS project. I was convinced that a grammar checker was a requirement for migrating to free software, at least here in Brazil. And this area of knowledge, Natural Language Processing (NLP), is challenging and very interesting to study and research.

So I invited two friends to take part in this idea: Prof. Jorge Kinoshita, an expert in NLP, and Prof. Laís do Nascimento Salvador, an expert in formal languages and compilers. Prof. Kinoshita, because of his experience in NLP, served as project coordinator.

We submitted our project to the "Free Software Edictal/2003" of FINEP (a governmental agency that sponsors strategic projects). There were about 300 projects submitted; about 30 were approved, including CoGrOO.

I guess it's fair to cite all collaborators of the project (I know it's boring, but I want everyone on the team to receive credit):

How did you develop the grammar checker? And did you consider making it proprietary? It's a fair question: here, at least, a grammar checker is desired by many companies.

CoGrOO follows a traditional client-server architecture: the server, written in Perl, listens on a socket port and answers checking requests; the client is a short piece of Java code that talks with OOo and sends paragraphs to the server. The grammar checker server's architecture is very modular: pre-processor, part-of-speech tagger, chunker, grammatical relation finder, error-rule applier (see http://cogroo.incubadora.fapesp.br/portal/down/Doc/LREC_2006.pdf). We intend to deliver two versions of CoGrOO: one of them will be free, and the other one proprietary. Some successful free software projects follow this strategy, like Wine/CodeWeavers, Fedora Core/Red Hat Linux, OpenOffice.org/StarOffice, etc. Another plan: we intend CoGrOO to be the standard grammar checker for OOo. We could port it to English, for example, if we had volunteers and some resources.
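[Editor's note: the client side of such an arrangement can be sketched in a few lines of Java. This is only an illustration, not CoGrOO's code: the class and method names are hypothetical, the one-line-in, one-line-out protocol is an assumption (the interview does not describe the wire format), and a toy Java server that merely counts words stands in for the Perl checker.]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these names come from CoGrOO itself.
class CheckerClientSketch {

    /** Sends one paragraph to the server and returns its one-line reply. */
    static String check(String host, int port, String paragraph) {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(paragraph);   // assumed protocol: one paragraph per line
            return in.readLine();     // assumed protocol: one reply line per paragraph
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Starts a toy stand-in for the checker server on a free port and returns
     * that port. It only counts words; a real server would run its pipeline here.
     * The socket is never closed: the daemon thread ends with the JVM.
     */
    static int startToyServer() {
        try {
            ServerSocket server = new ServerSocket(0);
            Thread worker = new Thread(() -> {
                while (true) {
                    try (Socket s = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                         PrintWriter out = new PrintWriter(
                                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                        String line = in.readLine();
                        int words = (line == null || line.trim().isEmpty())
                                ? 0 : line.trim().split("\\s+").length;
                        out.println("RECEIVED " + words + " WORDS");
                    } catch (IOException e) {
                        return; // stop serving on any I/O failure
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = startToyServer();
        System.out.println(check("localhost", port, "As menina chegou cedo."));
    }
}
```

[Keeping the client this thin means the language processing can live entirely in the server process, which is the split the answer above describes.]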

What would be required to port it to English or any other language?

CoGrOO has language-dependent and language-independent parts. For example, to port it to English, we need to build a part-of-speech tagger and a phrase chunker; in CoGrOO, these parts are statistically trained, so morpho-syntactically annotated corpora are needed, but these resources are very expensive (see, for example, http://catalog.elda.org:8080/index.php?language=en or http://www.ldc.upenn.edu/Catalog/ ). Someone might say: “Use a free corpus”, but free corpora are few and small (http://www.grsampson.net/Resources.html). CoGrOO detects errors by using a handwritten rule set (Brazilian CoGrOO uses about 100 rules), so we need linguists to write these rules and check their possible side-effects. Finally, CoGrOO needs sponsors to fund developers.
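[Editor's note: to give a feel for what one handwritten rule applied to tagged text might look like, here is a minimal Java sketch. It is not taken from CoGrOO: the rule, the tag names (DET, NOUN, VERB, SG, PL), and all class and method names are hypothetical, and the tags are assigned by hand in place of a statistically trained tagger.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: rule and tag set are illustrative, not CoGrOO's.
class RuleSketch {

    /** A word with a part-of-speech tag and grammatical number, as a tagger might emit. */
    static final class Token {
        final String text;
        final String pos;     // e.g. "DET", "NOUN", "VERB"
        final String number;  // "SG" or "PL"

        Token(String text, String pos, String number) {
            this.text = text;
            this.pos = pos;
            this.number = number;
        }
    }

    /** Illustrative rule: a determiner directly followed by a noun must agree in number. */
    static List<String> checkNumberAgreement(List<Token> sentence) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i + 1 < sentence.size(); i++) {
            Token first = sentence.get(i);
            Token second = sentence.get(i + 1);
            if (first.pos.equals("DET") && second.pos.equals("NOUN")
                    && !first.number.equals(second.number)) {
                errors.add("Number disagreement: \"" + first.text + " " + second.text + "\"");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // "As menina chegou": plural article followed by a singular noun.
        List<Token> tagged = Arrays.asList(
                new Token("As", "DET", "PL"),
                new Token("menina", "NOUN", "SG"),
                new Token("chegou", "VERB", "SG"));
        for (String error : checkNumberAgreement(tagged)) {
            System.out.println(error);
        }
    }
}
```

[Even a rule this small depends on correct tags, which is why the tagger and chunker have to be rebuilt for each language before any rules can be ported.]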

As a professor, are you teaching OpenOffice.org coding (how to code for OOo) to your students?

Unfortunately, no. Reading the OOo code requires great maturity in programming. But two of my students are CoGrOO programmers.

Have you considered teaching students how to work on addons, plugins, etc?

No. I'm not sure that it's a good approach. Honestly, I have to analyse this idea better.

If so, how have you organized the students and the course work? For instance, do your students work on the OOo project? If so, how? and if not, why not?

I try to invite good students to participate in our project.

Are they interested?

Yes! These 2 students considered OOo programming challenging and accepted the task.

We recently created a new Education Project; Sophie Gautier and I are the leads, and the project is focused on development. We'd be happy to work with you, though the primary language will have to be English, as it is related to development.

Could you explain this to me better?

Yes. The idea is to track work being done in computer science classes and to help those classes in using OOo as a training/learning vehicle. For instance, one could learn C++ or architecture via OOo; or localization strategies; whatever. We could store the course programme (syllabus) on the site for others to add to, etc.

Good!

In conversation at fisl 6.0 last year, you mentioned that there were very few OpenOffice.org developers in Brazil. Is that still the case? What can be done to encourage more OOo developers?

As I said to you last year, there are many Brazilian collaborators working in the OOo project, and they are mainly doing translations (it's a great job!). But there are very few programmers. I would guess initiatives like the "Free Software Edictal/2003" of FINEP and Google Summer of Code can help address this deficiency.

Yes. Can you speak more about Free Software Edictal/2003?

This Edictal made available a budget of R$ 4 million (almost US$ 2 million at today's rate) for free software projects that were of importance to companies. “Companhia do Metropolitano de São Paulo”, a mass transport company here in São Paulo, was a pioneer in StarOffice/OOo migration several years ago. This company was interested in CoGrOO. About 30 projects were approved (out of about 300 submitted). We bought some computers and hired programmers and a linguist with the funds.

What more can OOo or Google do? Is money the real inducement? or is something else needed?

Unfortunately, I suppose money is essential. There are many interested professionals and students, but they need to earn money to live. Because of this inducement, Google is successful.

As well, as you probably know, OOo accepts any number of extensions and these can also be Java extensions. Would that change things?

Maybe. Using the UNO interface to extend it can make OOo hacking more popular. But the UNO interface is limited. You cannot do everything with it. It's one of CoGrOO's difficulties, because there is no API to draw a wavy line under a word, for example. We need to write the API.

Have you raised this issue with the API project?

No. Many developers and I posted this problem at dev@lingucomponent.openoffice.org. The answer from Sun's developers was roughly: “Excuse us, we are very busy with the release of OOo 2.0. Wait for OOo 3.0.” And I agreed with them. There was much work to do! So, we decided we had to do something.

Recently--on 1 May--the ISO (International Organization for Standardization) approved a draft of the OpenDocument format, which OOo uses; it will be ISO/IEC 26300. Many governments will only consider ISO standards and so this approval carries a great deal of weight. Has it affected the perception of OOo yet? Do you think it will?

Not yet. But I guess it will.

President Lula has made strong statements about Livre software (FOSS), as have the presidents of Venezuela and Bolivia and others in South America. But is FOSS something known among developers? How is it conceived?

FOSS is still an unknown subject to developers. But that is changing, because universities have given students opportunities to get to know FOSS's philosophy and practice. And we can't ignore the Internet's power to spread FOSS's principles.

Do you have any suggestions how we can better advertise FOSS to university students?

Advertise FOSS to professors (they are opinion makers)! It's what the great companies do!

What can we do to encourage more FOSS development and investment in Brazil?

Share the news of successful projects made in Brazil! We have had great projects like Window Maker and the Kurumin Linux distro; the maintainer of the Linux kernel (version 2.4) is Brazilian; and so on. We have to share the news of what is happening in Brazil at events and elsewhere!

Thanks!


