The fight with clisp

Saturday 2010-02-13 14:46

I have now spent several days fighting crash bugs in clisp. The source of the problem is that when my program needs a bit more memory, clisp runs into some hardcoded limits. Trying to increase those limits causes clisp to segfault and/or corrupt its memory on some operations, so a lot of effort has gone into working around different crashes in different combinations.

The system currently reaches a certain point consistently (after which it crashes due to corrupted memory), so my plan is to run it up to that point, save the results to disk, restart clisp and continue. I hope that will give me a stable system that I can actually use to extract some training statistics.
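The save-and-restart workaround is essentially a two-stage pipeline with a checkpoint on disk. The sketch below is in Python rather than the project's Common Lisp, and `run_stage1`, `run_stage2`, the pickle format and the checkpoint file name are all hypothetical stand-ins. It only illustrates the pattern: the first process does the work up to the crash point, saves the results and stops, and a freshly started process reloads them and continues with clean memory.

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint location; the post does not say how results were saved.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "stage1_results.pickle")

def run_stage1():
    """Stand-in for the work done before the point where memory gets corrupted."""
    return {"rule-a": 3, "rule-b": 1}

def run_stage2(results):
    """Stand-in for the work that continues from the saved results."""
    return sum(results.values())

def run_pipeline(path=CHECKPOINT):
    """Run stage 1 and stop, or, if its results are already on disk, run stage 2."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            pickle.dump(run_stage1(), f)
        return None  # exit here; the next (fresh) process picks up from the file
    with open(path, "rb") as f:
        results = pickle.load(f)
    return run_stage2(results)
```

Calling `run_pipeline()` once saves the checkpoint and returns `None`; calling it again in a new process loads the saved results and finishes the work.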

Lexical acquisition

Friday 2009-11-27 14:20

This should have been written on Tuesday, but here it goes. Lexical acquisition seems to be working now. If I understand it correctly, I now have the whole chain from a bilingual corpus to an inferred SCFG grammar (though still using a handcoded MR grammar). The next step should be either to generate an MR grammar from database schemas or to continue by implementing the statistical model to disambiguate parses.

Writing coming along

Wednesday 2010-03-10 17:25

The writing is coming along, not as fast as I would like, but I'm at about 27 pages excluding the table of contents and similar material. The basic plan now is to flesh out the background section as soon as possible and to get some results on a larger corpus. The latter has run into minor technical problems (I hope they are minor, at least).

Corpus Cleanliness

Tuesday 2010-03-02 09:41

When experimenting with the corpus, it turns out that the algorithm is hurt quite badly by a dirty corpus, that is, a corpus that includes wrong or ambiguous translations. Accordingly, experiments with a cleaner corpus show much better results.

Lambda WASP reaches no known bugs state

Thursday 2010-01-28 21:36

It seems like I have hunted down the worst bugs in lambda-WASP by now. Any bugs that remain do not seem to affect stability, at least. Hopefully I will have some test results when I come into the office tomorrow morning. That would mean that all 80 test sets completed cleanly.

Parsing Halfway Done

Wednesday 2009-11-18 16:21

I have code that reads a bilingual lexicon file and parses it into a list of pairs of natural language strings and tokenized MR strings. The idea is that Mike's chart parser will be able to take this form and turn it into a list of rules to apply in order to recreate the parsed string.
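The post does not describe the lexicon file format, so the following Python sketch assumes a hypothetical one: one entry per line, with the natural language string and the MR separated by `|||`, and blank lines and `#` comments ignored. The function names are also hypothetical. It shows the intended data structure: a list of (natural language string, MR token list) pairs, where the MR is split into parentheses, commas and atoms.

```python
import re

# Hypothetical line format: "<natural language> ||| <MR>".
# MR tokens are single parentheses/commas, or runs of other non-space characters.
MR_TOKEN = re.compile(r"[(),]|[^\s(),]+")

def tokenize_mr(mr):
    """Split an MR string into parentheses, commas and atoms."""
    return MR_TOKEN.findall(mr)

def parse_lexicon(text):
    """Return a list of (natural language string, MR token list) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        nl, sep, mr = line.partition("|||")
        if not sep:
            raise ValueError(f"missing '|||' separator: {line!r}")
        pairs.append((nl.strip(), tokenize_mr(mr.strip())))
    return pairs
```

For example, `parse_lexicon("capital of texas ||| capital(stateid('texas'))")` yields one pair whose MR tokens are `capital`, `(`, `stateid`, `(`, `'texas'`, `)`, `)`.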

Figuring out how to run GIZA++

Thursday 2009-11-19 18:05

I have spent most of the day looking at how to run GIZA++. I think I can now output correct input data files; next I have to figure out how to call the program and how to parse its output.
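For reference, the input GIZA++ expects, as I understand its README, is a vocabulary file per language with one `id token count` line per word (id 0 is reserved for NULL), plus a sentence-pair file with three lines per pair: an occurrence count, then the source and target sentences written as id sequences. The Python sketch below produces those lines from already tokenized sentences. The function names are hypothetical, and starting ids at 2 mirrors what I believe GIZA++'s `plain2snt.out` helper does, so treat that detail as an assumption.

```python
from collections import Counter

def build_vocab(sentences, first_id=2):
    """Map each token to an integer id (in order of first appearance) and count it."""
    counts = Counter(tok for sent in sentences for tok in sent)
    ids = {}
    for tok in counts:  # Counter preserves first-appearance order
        ids[tok] = first_id + len(ids)
    return ids, counts

def vcb_lines(ids, counts):
    """One 'id token count' line per vocabulary entry."""
    return [f"{i} {tok} {counts[tok]}" for tok, i in ids.items()]

def snt_lines(src_sents, tgt_sents, src_ids, tgt_ids):
    """Three lines per sentence pair: occurrence count, source ids, target ids."""
    if len(src_sents) != len(tgt_sents):
        raise ValueError("source and target corpora differ in length")
    lines = []
    for src, tgt in zip(src_sents, tgt_sents):
        lines.append("1")  # each pair is listed once
        lines.append(" ".join(str(src_ids[t]) for t in src))
        lines.append(" ".join(str(tgt_ids[t]) for t in tgt))
    return lines
```

Writing these lines to files (one `.vcb` per language and one `.snt` for the pairs) gives input in the format described above.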

Integrated bzr

Thursday 2009-11-12 16:25

I realised that I needed a way to commit local copies of c-phrase, and I am now going to try bzr-svn for that. I have the system set up, so now we will see how it works out.

Concluding Update

Thursday 2010-09-30 09:54

Final updates to the web page, making it clear that the master's thesis is finished.

Minor webpage Improvement

Monday 2009-11-30 11:18

I made a minor change to the project webpage to implement paging of the status items. Now at most ten items are shown per page, and links to newer and older posts are displayed as appropriate.
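The paging rule (at most ten items per page, with newer/older links only where such pages exist) can be sketched as follows. This is a Python illustration with hypothetical names, not the webpage's actual code; it assumes the status items are sorted newest first and that page 0 is the newest page.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 10  # "at most ten per page"

@dataclass
class Page:
    items: List[object]
    newer_page: Optional[int]  # None means no "newer" link is shown
    older_page: Optional[int]  # None means no "older" link is shown

def paginate(items, page, page_size=PAGE_SIZE):
    """Return one page of items (sorted newest first) plus its navigation links."""
    if page < 0:
        raise ValueError("page must be non-negative")
    start = page * page_size
    chunk = items[start:start + page_size]
    newer = page - 1 if page > 0 else None
    older = page + 1 if start + page_size < len(items) else None
    return Page(chunk, newer, older)
```

With 25 items, page 0 shows items 0 to 9 with only an "older" link, and page 2 shows the last five with only a "newer" link.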