Disappointing initial data for typechecking
Friday 2010-02-05 17:44
I got the first results for typechecking just now and they are a bit disappointing.
Training set size | Precision | Precision variance | Recall | Recall variance | Willingness | Willingness variance |
---|---|---|---|---|---|---|
100 | 0.437379 | 0.718055 | 0.123000 | 0.060220 | 0.279000 | 0.134780 |
150 | 0.416109 | 0.529021 | 0.154000 | 0.082480 | 0.364000 | 0.102080 |
200 | 0.431791 | 0.242908 | 0.197000 | 0.095420 | 0.452000 | 0.164320 |
250 | 0.391103 | 0.287146 | 0.191000 | 0.056380 | 0.495000 | 0.060700 |
300 | 0.425194 | 0.203922 | 0.234000 | 0.087280 | 0.548000 | 0.117120 |
Running GIZA++
Friday 2009-11-20 14:21
I can now call GIZA++ from within C-Phrase, and I think I know which of the files GIZA++ outputs I'm interested in. The remaining problem is to parse the output, but at least it appears to be documented.
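For my own notes: assuming the file I want is the A3.final-style Viterbi alignment, where every sentence pair takes three lines (a "# Sentence pair ..." header, the target sentence, and the source tokens annotated with ({ ... }) index sets), reading it back in could look something like the sketch below. The function name is mine, not anything in C-Phrase, and the format assumption still needs checking against the documentation.

```lisp
;; Hedged sketch: collect GIZA++ Viterbi alignments, assuming three lines per
;; sentence pair in the A3.final output (header, target sentence, source
;; tokens with ({ ... }) index sets).
(defun read-giza-alignments (path)
  "Return a list of (header target-line source-line) triples read from PATH."
  (with-open-file (in path :direction :input)
    (loop for header = (read-line in nil nil)
          while header
          collect (list header
                        (read-line in nil "")
                        (read-line in nil "")))))
```

Decoding the ({ ... }) index sets would then be a separate, isolated step.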
Clisp Subjugated
Sunday 2010-02-14 19:46
I seem to have finally subjugated CLISP; I have managed to do a run of 40 items without any failures. Currently I'm running a test of an improved typechecking, and the initial results are very promising. I guess I will know in a couple of hours.
Schema Extraction Works
Tuesday 2009-12-08 17:36
The schema extraction process is now working. However, it revealed that the parser is a bit slow, so I need to prune out the unneeded rules prior to parsing. If I hook into the tokenization process I can keep only the rules matching tokens that occur in the input, which will be the ones actually used. I still have to figure out what to do with tokens that match several rules, primarily closing parentheses.
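The pruning itself should be simple once the hook is in place. A sketch of the idea, where rule-terminals is a placeholder for however C-Phrase actually exposes a rule's terminal tokens:

```lisp
;; Hedged sketch: keep only the rules whose terminal tokens all occur in the
;; tokenized input. RULE-TERMINALS is a placeholder accessor, not an actual
;; c-phrase function.
(defun prune-rules (rules tokens)
  "Return the subset of RULES whose terminals all occur among TOKENS."
  (let ((present (make-hash-table :test #'equal)))
    (dolist (tok tokens)
      (setf (gethash tok present) t))
    (remove-if-not (lambda (rule)
                     (every (lambda (term) (gethash term present))
                            (rule-terminals rule)))
                   rules)))
```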
Project Webpage Created
Monday 2009-11-09 13:33
As can be seen, I have now created a project webpage with support for status updates, to make the process easy to follow.
Grammar updated to use variables
Thursday 2010-01-07 11:07
I have updated the grammar to use variables; this should make extending WASP to lambda-WASP easier. In doing so I got the system to generate fewer false derivations, though whether that is an unqualified good remains to be determined. A test framework that reports precision and recall would actually be quite nice in situations like this.
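To pin down what I mean, the measures would be something like the sketch below, assuming each test item records whether a parse was produced and whether it was correct; willingness is read here as the fraction of items that got any parse at all.

```lisp
;; Hedged sketch of the intended measures. Each element of RESULTS is assumed
;; to be a list (PARSED-P CORRECT-P).
(defun evaluate (results)
  "Return precision, recall and willingness for RESULTS as three values."
  (let* ((total (length results))
         (parsed (count-if #'first results))
         (correct (count-if (lambda (r) (and (first r) (second r)))
                            results)))
    (values (if (zerop parsed) 0 (/ correct parsed))   ; precision
            (if (zerop total) 0 (/ correct total))     ; recall
            (if (zerop total) 0 (/ parsed total)))))   ; willingness
```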
Better tests and finally working and stable parallelism
Saturday 2010-01-23 15:04
I have finally gotten a new parallelism framework in place after spending way too much time on it. On the other hand, it is general, fault-tolerant and much more stable than the previous system.
With that in place I have been able to gather more reliable test results. The following table shows the means and variances over 8 run sequences with training sets of 100, 150, 200 and 250 items (a sketch of the aggregation follows the table).
Training set size | Precision | Precision variance | Recall | Recall variance | Willingness | Willingness variance |
---|---|---|---|---|---|---|
100 | 0.621 | 0.077 | 0.180 | 0.008 | 0.290 | 0.007 |
150 | 0.614 | 0.122 | 0.290 | 0.038 | 0.470 | 0.020 |
200 | 0.651 | 0.056 | 0.305 | 0.007 | 0.475 | 0.036 |
250 | 0.635 | 0.060 | 0.390 | 0.060 | 0.610 | 0.067 |
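The aggregation behind the variance columns is nothing fancy; for one set size it is just the mean and variance of the 8 sequence scores. A plain sketch (using the population variance; dividing by n-1 instead is a detail):

```lisp
;; Hedged sketch of the per-set-size aggregation: mean and population variance
;; of a list of scores, e.g. the 8 precision values for one training-set size.
(defun mean (xs)
  (/ (reduce #'+ xs) (length xs)))

(defun variance (xs)
  (let ((m (mean xs)))
    (/ (reduce #'+ (mapcar (lambda (x) (expt (- x m) 2)) xs))
       (length xs))))
```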
Fixed bugs in lambda-WASP so that all aligning sentences now pass through
Tuesday 2010-02-02 16:11
I have been debugging the lambda-WASP code to figure out why it was rejecting sentences. It turned out that my rule equivalence function was producing incorrect results and therefore caused valid rules to be removed from the rule set. With that fixed, the system runs without discarding any rules except those discarded due to bad alignments.
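The notion the rule set needs is equality up to a consistent renaming of variables. A minimal sketch of that kind of check, not the actual lambda-WASP code: variables are assumed to be symbols whose names start with ?, and a complete check would also have to verify that the variable mapping is one-to-one.

```lisp
;; Hedged sketch: structural equality of two expressions up to a consistent
;; renaming of variables (assumed to be symbols whose names start with ?).
;; A complete check would also require the variable mapping to be one-to-one.
(defun variable-p (x)
  (and (symbolp x)
       (plusp (length (symbol-name x)))
       (char= (char (symbol-name x) 0) #\?)))

(defun expr-equivalent-p (a b &optional (map (make-hash-table :test #'eq)))
  "Compare A and B structurally, allowing a consistent renaming of variables."
  (cond ((and (variable-p a) (variable-p b))
         (let ((bound (gethash a map)))
           (if bound
               (eq bound b)
               (setf (gethash a map) b))))
        ((and (consp a) (consp b))
         (and (expr-equivalent-p (car a) (car b) map)
              (expr-equivalent-p (cdr a) (cdr b) map)))
        (t (eql a b))))
```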
Extract lexicon works!
Monday 2009-11-16 11:28
I figured out how to circumvent the C-Phrase problem; it was probably my sparse understanding of Common Lisp that caused me to misinterpret it. So now I have a function that takes a list of natural language strings, one per row, and outputs a bilingual lexicon file.
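The shape of that function, with the actual entry format hidden behind a placeholder (sentence->lexicon-entry does not exist; it stands for whatever the lexicon file really needs per sentence):

```lisp
;; Hedged sketch of the function's shape only. SENTENCE->LEXICON-ENTRY is a
;; placeholder for the real conversion of one sentence to a lexicon entry.
(defun write-bilingual-lexicon (sentences-string path)
  "Write one lexicon entry per input sentence to PATH."
  (with-open-file (out path :direction :output :if-exists :supersede)
    (dolist (sentence (split-lines sentences-string))
      (write-line (sentence->lexicon-entry sentence) out))))

(defun split-lines (string)
  "Split STRING on newline characters."
  (loop with start = 0
        for pos = (position #\Newline string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))
```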
End of week status
Friday 2009-12-04 17:07
Potentially the entire WASP chain, except schema-to-grammar conversion, is almost working. The current problem is that the MR has a different representation in the derivations than in the original parsed form, so something has to be devised that finds the correct derivation among the alternatives.
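One way to devise it, sketched under the assumption that both forms can be normalized into comparable expressions; normalize-mr and derivation-mr are placeholders, not existing functions:

```lisp
;; Hedged sketch: pick the derivation whose (normalized) MR matches the
;; normalized original. NORMALIZE-MR and DERIVATION-MR are placeholders.
(defun find-matching-derivation (derivations original-mr)
  "Return the first derivation whose MR matches ORIGINAL-MR, or NIL."
  (let ((target (normalize-mr original-mr)))
    (find-if (lambda (derivation)
               (equal (normalize-mr (derivation-mr derivation)) target))
             derivations)))
```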