I have dozens of PDF files in which I need to replace part of a field name with another string.
The problem is that the developers used Cyrillic (Russian) names in the field mapping.
I use the iText 7 library.
I read the field names with GetFieldName() and then write them back with SetFieldName(),
but the field names in the new files have the wrong encoding.
I tried re-encoding the name like this:


    Byte[] newNameBytes = Encoding.GetEncoding(1251).GetBytes(newName);
    string utf8NewName = Encoding.GetEncoding(1200).GetString(newNameBytes);
    textField.SetFieldName(utf8NewName);

I tried different encodings (UTF-8, Unicode, CP1251, default) and none of them worked.
All I achieved was different kinds of unreadable field names in the new file.
I found out how to set styles and fonts, but that only applies to the text inside a field.

I guess iText doesn't recognize any characters except Latin ones...
Any suggestions are welcome.

Old file:

(screenshot: old file)

New file with renamed fields:

(screenshot: new file)

What I mean:

(screenshot: PDF field properties in Acrobat Reader)

Code I wrote:


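    using System.Collections.Generic;
    using System.Linq;
    using iText.Forms;
    using iText.Forms.Fields;
    using iText.Kernel.Pdf;

    string source = @"C:\Test\old.pdf";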
    string destination = @"C:\Test\new.pdf";
    
    // Old and new field name
    var names = new Dictionary<string, string>()
    {
        { "Заемщик>Рабочий телефон", "Заемщик>Телефон организации" },
        { "Заемщик>Компания", "Заемщик>Название организации" },
    };
    
    var document = new PdfDocument(new PdfReader(source), new PdfWriter(destination));
    var form = PdfAcroForm.GetAcroForm(document, false);
    var fields = form.GetFormFields();
    
    foreach (var name in names)
    {
        // Find all fields whose name contains the old value.
        var fieldsByName = (from f in fields
                            where f.Key.Contains(name.Key)
                            select f).ToList();
    
        foreach (var field in fieldsByName)
        {
            // Only text fields are handled. If several fields share the same name, I don't know how to handle the child fields (kids) properly.
            if (field.Value is PdfTextFormField textField)
            {
                string oldName = textField.GetFieldName().ToString();
                string newName = oldName.Replace(name.Key, name.Value);
                textField.SetFieldName(newName);
            }
        }
    }
    document.Close();

Amedee Van Gasse
  • .NET strings are Unicode so trying to "convert" them is meaningless. The only thing the first snippet can do is mangle any character that isn't the same in both codepages. *At best* `utf8NewName` will be identical to `newName`. Are you sure that *PDF*, the file format, allows non-Latin field names? Or perhaps you need to set the encoding at the file level? – Panagiotis Kanavos Mar 11 '21 at 11:48
  • Looks like [this was asked before](https://stackoverflow.com/questions/27920375/special-characters-in-pdf-form-fields-and-global-and-fieldbased-dr) and the PDF spec is unclear, so the author of iText *added* support for this. The font you use must support the codepage though – Panagiotis Kanavos Mar 11 '21 at 11:52
  • @PanagiotisKanavos It's my fault. I wasn't clear enough. I have now added a screenshot of how it looks in Acrobat Reader. –  Mar 11 '21 at 13:17

1 Answer

There is apparently a bug in SetFieldName: non-Latin names come back garbled.

    textField.SetFieldName("Заемщик>Телефон организации");
    textField.GetFieldName(); // returns '05<I8:>"5;5D>= >@30=870F88'
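The garbled value is what you would get if every UTF-16 character of the name were cut down to its low byte. The following is a minimal sketch in plain .NET that reproduces the symptom. It does not use iText and is not iText's actual code; `LowByteDemo` and `TruncateToLowBytes` are hypothetical names made up for illustration.

```csharp
using System.Text;

public static class LowByteDemo
{
    // Keeps only the low byte of each UTF-16 code unit,
    // e.g. 'а' (U+0430) becomes '0' (U+0030).
    public static string TruncateToLowBytes(string text)
    {
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            result.Append((char)(c & 0xFF));
        }
        return result.ToString();
    }
}
```

Calling `LowByteDemo.TruncateToLowBytes("Заемщик>Телефон организации")` gives a string that starts with the control character U+0017 (from 'З', U+0417) followed by `05<I8:>"5;5D>= >@30=870F88`, which matches the value above. This suggests the name is stored as single bytes instead of as a Unicode text string.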

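The workaround is to write the name through the lower-level PdfDictionary API, so that the string is stored with an explicit Unicode encoding.

Replace

    textField.SetFieldName(newName);

with

    textField.GetPdfObject().Put(PdfName.T, new PdfString(newName, "UnicodeBig"));

or, step by step,

    var aPdfDictionary = textField.GetPdfObject();
    var aPdfString = new PdfString(newName, "UnicodeBig");
    aPdfDictionary.Put(PdfName.T, aPdfString);

Note that Put takes a PdfName as the key, so use PdfName.T rather than the string "T". Then call

    textField.GetFieldName()

to check the result.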
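Applied to the loop from the question, the workaround could be wrapped in a small helper like the sketch below. It assumes iText 7 for .NET. `FieldRenamer` and `RenamePartialName` are hypothetical names, not part of iText. The helper reads the /T entry directly instead of calling GetFieldName(), because for child fields GetFieldName() returns the fully qualified name (parent names included), and that should not be written back into /T.

```csharp
using iText.Kernel.Pdf;

public static class FieldRenamer
{
    // Replaces oldPart with newPart in the partial field name (/T) of a field dictionary.
    // Returns true if the name was changed.
    public static bool RenamePartialName(PdfDictionary fieldDict, string oldPart, string newPart)
    {
        PdfString partialName = fieldDict.GetAsString(PdfName.T);
        if (partialName == null)
        {
            return false; // no /T entry of its own, e.g. a widget that inherits its name
        }

        string oldName = partialName.ToUnicodeString();
        if (!oldName.Contains(oldPart))
        {
            return false;
        }

        string newName = oldName.Replace(oldPart, newPart);
        // Store the name with an explicit Unicode encoding (the workaround described above).
        fieldDict.Put(PdfName.T, new PdfString(newName, "UnicodeBig"));
        return true;
    }
}
```

Inside the inner loop of the question this would be called as `FieldRenamer.RenamePartialName(textField.GetPdfObject(), name.Key, name.Value);` in place of the `SetFieldName` call. I have only tried the one-line replacement shown above, so treat the helper as a starting point.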

z0r6a