Talk:Testimonial letter from Eva-Maria Debes

From The Sannyas Wiki

A number of considerations arose in the creation of this page, having to do with different aspects of process, so i thought it would be good to mention them here. Not only is it a prototype for the translated-Testimonials-without-pdf Template, it is a prototype for these considerations. (Eva-Maria was also the first declared sannyasin, ie in order of Exhibit number, to only use her legal name, so she also became a prototype for that kind of letter-writer fwiw.)

Okay. This is the second Testimonial letter i have cleaned up after the OCR process to create a readable transcript. The first was Testimonial letter from Sw Deva Siddhartha. The issues with that letter were mostly because of a tremendous amount of visual debris, marks on the original letter, likely from a bleed-through from writing on the other side or possibly from improper storage. So many marks that the OCR program had to accept as legitimate and then figure out what text they could possibly be.

That was not the main issue here, though there was a small amount of visual debris. But it is good to mention it here, as it will likely crop up again.

The main source of transcription error here was likely a faulty typewriter used by Eva-Maria, which frequently failed to make decent ü's, cap-I's, g's, cap-F's and more. Compounding this appears to have been an awkward photographic process that distorted the left edge of the text. Add in some visual debris and hand-corrections of typed material and there are a lot of handicaps for an OCR program to overcome.

The translator had no problem reading Eva-Maria's letter. A program cannot compete with a human fluent in German. Thus, there were few problems transcribing the translation, only the usual stated 1% OCR error rate.

All this detail is in aid of understanding that, for various reasons, seriously faulty transcriptions will occur from time to time. It will be good to have procedures to deal with them, perhaps, say, treating them like handwritten letters. Or it may be that an OCR text-based pdf is not as good as an image pdf in such cases.
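One possible procedure, offered only as a rough sketch: once a page has been cleaned up by hand, the raw OCR text can be compared with the corrected transcript to see whether the page was within the usual 1% or was one of the seriously faulty ones. The function names and the character-level measure (edit distance divided by the length of the corrected text) are assumptions made for illustration, not anything the OCR program itself reports, and the check only works after the fact, on pages where a corrected transcript already exists.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, corrected_text: str) -> float:
    """Edit distance relative to the length of the corrected transcript."""
    if not corrected_text:
        return 0.0 if not ocr_text else 1.0
    return edit_distance(ocr_text, corrected_text) / len(corrected_text)


def is_major_failure(ocr_text: str, corrected_text: str,
                     threshold: float = 0.01) -> bool:
    """True when the error rate exceeds the threshold (1% assumed here)."""
    return char_error_rate(ocr_text, corrected_text) > threshold
```

Run on one sample page per letter, something like this could help decide early whether a letter is better treated like a handwritten one or left as an image pdf.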

Other matters:

1. The first image among Eva-Maria's seven pages is of the "cover page" of the German notary's certification. It is not Eva-Maria's first page of writing. Thus the image we have is not of her first page. This image is selected formulaically by the template. It could be hand-altered to 0027-02 (from 01) but then we also would have to upload 02. Settle for 01?
2. All other-languages letters will have a translation, and that translation will have been done "professionally" ie neatly, easy to read and, it must be said, sympathetically. By that i mean that the translation will have been done carefully by a sannyasin translator to reflect in the most positive way on Osho and his people. The translator may or may not be the one who certifies that it is a "true and correct" translation. And then another sannyasin will be the Oregon-licensed notary who administers the sworn statement of the certifier.
Thus, this will be a smooth process that almost always produces a decent-quality and readable document. That said, there are instances of non-idiomatic English in the translation and even spelling mistakes. Oh well, makes it more authentic?
3. Not a concern in the wiki, but the last word in Eva-Maria's letter is "Führer", referring to Osho. Since Osho is applying for permanent residency as a religious leader, and "Führer" is a proper translation of "leader", it is accurate and appropriate enough, but it must, especially back then, have given some German writers pause, as the word had serious baggage from being associated with Hitler. Eva-Maria has grasped the nettle, but perhaps many found some other way to say what they wanted to say .... -- doofus-9 04:36, 22 August 2022 (UTC)

major OCR failures

These are not just the 1% or less "acceptable" OCR errors but systemic, global or major failures which can make a letter unreadable, or at least a significant part of it. And they come in a variety of forms, perhaps so much variety that we cannot possibly anticipate it all ...

At any rate, here is what we have so far, with links and descriptions of the issues, mostly fixed:

1 A-17, Siddhartha: original letter full of debris, which got OCR'd as multiple random characters.
2 A-27, Eva-Maria: original letter had many faulty characters, from a bad typewriter plus a photo-distorted left edge, visual debris and hand corrections. Much more than 1% error rate, unreadable.
3 A-30, Devananda: letter's signature overlaid the end of the last paragraph, losing a few words; also, many words in the last para were jumbled, whole words but out of order.
4 A-34, Ojas: two smallish things: photo cut off the bottom half of one line on page three; lots of faulty characters, much more than 1% error, but still readable. Plus a "big" one: registration of the last line of a para on p. 6 jumped up after the first word, causing word jumbling and first-word gibberish.
5 A-38, Ülo Luuka: letter is written, in effect, in two languages, English and German. German has, with its umlauts and other usages, enough to throw up a lot of errors if the program is looking only for English. Perhaps, though the letter was basically in English, the OCR should have been set for German.
6 A-41, Sangharsh: this is a new problem, but it will likely crop up often: a script/cursive font that the OCR program has trouble with. There are many such letters, most done at the Ranch. Most of the OCR errors are of a few kinds, but enough are different to make any formulaic replacement not very helpful. Perhaps a different OCR setting can help?

You can use two or more languages at the same time for OCRing. Go to the language field and choose "more languages", then "specify OCR languages manually", and mark the needed languages. --DhyanAntar 02:14, 28 August 2022 (UTC)