Disappointing initial data for typechecking
Friday 2010-02-05 17:44
I got the first results for typechecking just now and they are a bit disappointing.
Training set size | Precision | Precision variance | Recall | Recall variance | Willingness | Willingness variance |
---|---|---|---|---|---|---|
100 | 0.437379 | 0.718055 | 0.123000 | 0.060220 | 0.279000 | 0.134780 |
150 | 0.416109 | 0.529021 | 0.154000 | 0.082480 | 0.364000 | 0.102080 |
200 | 0.431791 | 0.242908 | 0.197000 | 0.095420 | 0.452000 | 0.164320 |
250 | 0.391103 | 0.287146 | 0.191000 | 0.056380 | 0.495000 | 0.060700 |
300 | 0.425194 | 0.203922 | 0.234000 | 0.087280 | 0.548000 | 0.117120 |
Running GIZA++
Friday 2009-11-20 14:21
I can now call GIZA++ from within C-Phrase, and I think I know which of the files GIZA++ outputs I'm interested in. The remaining problem is to parse the output, but at least it appears to be documented.
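For my own notes: assuming the file I want is the A3.final-style Viterbi alignment, where every sentence pair takes three lines (a "# Sentence pair ..." header, the target sentence, and the source tokens annotated with ({ ... }) index sets), reading it back in could look something like the sketch below. The function name is mine, not anything in C-Phrase, and the format assumption still needs checking against the documentation.

```lisp
;; Hedged sketch: collect GIZA++ Viterbi alignments, assuming three lines per
;; sentence pair in the A3.final output (header, target sentence, source
;; tokens with ({ ... }) index sets).
(defun read-giza-alignments (path)
  "Return a list of (header target-line source-line) triples read from PATH."
  (with-open-file (in path :direction :input)
    (loop for header = (read-line in nil nil)
          while header
          collect (list header
                        (read-line in nil "")
                        (read-line in nil "")))))
```

Decoding the ({ ... }) index sets would then be a separate, isolated step.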
Clisp Subjugated
Sunday 2010-02-14 19:46
I seem to have finally subjugated CLISP; I have managed to do a run of 40 items without any failures. Currently I'm running a test of an improved typechecking, and the initial results are very promising. I guess I will know in a couple of hours.
Schema Extraction Works
Tuesday 2009-12-08 17:36
The schema extraction process is now working. However, it revealed that the parser is a bit slow, so I need to prune out the unneeded rules prior to parsing. If I hook into the tokenization process I can keep only the rules matching tokens that occur in the input, which will be the ones actually used. I still have to figure out what to do with tokens that match several rules, primarily closing parentheses.
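The pruning itself should be simple once the hook is in place. A sketch of the idea, where rule-terminals is a placeholder for however C-Phrase actually exposes a rule's terminal tokens:

```lisp
;; Hedged sketch: keep only the rules whose terminal tokens all occur in the
;; tokenized input. RULE-TERMINALS is a placeholder accessor, not an actual
;; c-phrase function.
(defun prune-rules (rules tokens)
  "Return the subset of RULES whose terminals all occur among TOKENS."
  (let ((present (make-hash-table :test #'equal)))
    (dolist (tok tokens)
      (setf (gethash tok present) t))
    (remove-if-not (lambda (rule)
                     (every (lambda (term) (gethash term present))
                            (rule-terminals rule)))
                   rules)))
```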
Project Webpage Created
Monday 2009-11-09 13:33
As can be seen, I have now created a project webpage with support for status updates, to make the process easy to follow.
Grammar updated to use variables
Thursday 2010-01-07 11:07
I have updated the grammar to use variables; this should make extending WASP to lambda-WASP easier. In doing so I got the system to generate fewer false derivations, though whether that is an unqualified good remains to be determined. A test framework that reports precision and recall would actually be quite nice in situations like this.
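To pin down what I mean, the measures would be something like the sketch below, assuming each test item records whether a parse was produced and whether it was correct; willingness is read here as the fraction of items that got any parse at all.

```lisp
;; Hedged sketch of the intended measures. Each element of RESULTS is assumed
;; to be a list (PARSED-P CORRECT-P).
(defun evaluate (results)
  "Return precision, recall and willingness for RESULTS as three values."
  (let* ((total (length results))
         (parsed (count-if #'first results))
         (correct (count-if (lambda (r) (and (first r) (second r)))
                            results)))
    (values (if (zerop parsed) 0 (/ correct parsed))   ; precision
            (if (zerop total) 0 (/ correct total))     ; recall
            (if (zerop total) 0 (/ parsed total)))))   ; willingness
```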
Better tests and finally working and stable parallelism
Saturday 2010-01-23 15:04
I have finally gotten a new parallelism framework in place after spending way too much time on it. On the other hand, it is general, fault-tolerant and much more stable than the previous system.
With that in place I have been able to gather more reliable test results. The following table shows the means and variances over 8 run sequences with training sets of 100, 150, 200 and 250 items (a sketch of the aggregation follows the table).
Training set size | Precision | Precision variance | Recall | Recall variance | Willingness | Willingness variance |
---|---|---|---|---|---|---|
100 | 0.621 | 0.077 | 0.180 | 0.008 | 0.290 | 0.007 |
150 | 0.614 | 0.122 | 0.290 | 0.038 | 0.470 | 0.020 |
200 | 0.651 | 0.056 | 0.305 | 0.007 | 0.475 | 0.036 |
250 | 0.635 | 0.060 | 0.390 | 0.060 | 0.610 | 0.067 |
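The aggregation behind the variance columns is nothing fancy; for one set size it is just the mean and variance of the 8 sequence scores. A plain sketch (using the population variance; dividing by n-1 instead is a detail):

```lisp
;; Hedged sketch of the per-set-size aggregation: mean and population variance
;; of a list of scores, e.g. the 8 precision values for one training-set size.
(defun mean (xs)
  (/ (reduce #'+ xs) (length xs)))

(defun variance (xs)
  (let ((m (mean xs)))
    (/ (reduce #'+ (mapcar (lambda (x) (expt (- x m) 2)) xs))
       (length xs))))
```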
Fixed bugs in lambda-WASP so that all aligning sentences now pass through
Tuesday 2010-02-02 16:11
I have been debugging the lambda-WASP code to figure out why it was rejecting sentences. It turned out that my rule equivalence function was producing incorrect results and therefore caused valid rules to be removed from the rule set. With that fixed, the system runs without discarding any rules except those discarded due to bad alignments.
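The notion the rule set needs is equality up to a consistent renaming of variables. A minimal sketch of that kind of check, not the actual lambda-WASP code: variables are assumed to be symbols whose names start with ?, and a complete check would also have to verify that the variable mapping is one-to-one.

```lisp
;; Hedged sketch: structural equality of two expressions up to a consistent
;; renaming of variables (assumed to be symbols whose names start with ?).
;; A complete check would also require the variable mapping to be one-to-one.
(defun variable-p (x)
  (and (symbolp x)
       (plusp (length (symbol-name x)))
       (char= (char (symbol-name x) 0) #\?)))

(defun expr-equivalent-p (a b &optional (map (make-hash-table :test #'eq)))
  "Compare A and B structurally, allowing a consistent renaming of variables."
  (cond ((and (variable-p a) (variable-p b))
         (let ((bound (gethash a map)))
           (if bound
               (eq bound b)
               (setf (gethash a map) b))))
        ((and (consp a) (consp b))
         (and (expr-equivalent-p (car a) (car b) map)
              (expr-equivalent-p (cdr a) (cdr b) map)))
        (t (eql a b))))
```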
Extract lexicon works!
Monday 2009-11-16 11:28
I figured out how to circumvent the C-Phrase problem; it was probably my sparse understanding of Common Lisp that caused me to misinterpret it. So now I have a function that takes a list of natural language strings, one per row, and outputs a bilingual lexicon file.
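The shape of that function, with the actual entry format hidden behind a placeholder (sentence->lexicon-entry does not exist; it stands for whatever the lexicon file really needs per sentence):

```lisp
;; Hedged sketch of the function's shape only. SENTENCE->LEXICON-ENTRY is a
;; placeholder for the real conversion of one sentence to a lexicon entry.
(defun write-bilingual-lexicon (sentences-string path)
  "Write one lexicon entry per input sentence to PATH."
  (with-open-file (out path :direction :output :if-exists :supersede)
    (dolist (sentence (split-lines sentences-string))
      (write-line (sentence->lexicon-entry sentence) out))))

(defun split-lines (string)
  "Split STRING on newline characters."
  (loop with start = 0
        for pos = (position #\Newline string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))
```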
End of week status
Friday 2009-12-04 17:07
Potentially the entire WASP chain, except schema-to-grammar conversion, is almost working. The current problem is that the MR has a different representation in the derivations than in the original parsed form, so something has to be devised that finds the correct derivation among the alternatives.
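One way to devise it, sketched under the assumption that both forms can be normalized into comparable expressions; normalize-mr and derivation-mr are placeholders, not existing functions:

```lisp
;; Hedged sketch: pick the derivation whose (normalized) MR matches the
;; normalized original. NORMALIZE-MR and DERIVATION-MR are placeholders.
(defun find-matching-derivation (derivations original-mr)
  "Return the first derivation whose MR matches ORIGINAL-MR, or NIL."
  (let ((target (normalize-mr original-mr)))
    (find-if (lambda (derivation)
               (equal (normalize-mr (derivation-mr derivation)) target))
             derivations)))
```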