
I'm trying to set up calibre (calibre-ebook.com) to automatically extract metadata from the file names of the PDF files I import into the library. I usually name my files this way:

Author. Title. Local. Publisher. Published. ISBN.pdf

Example:

C:\Test\RANCIÊRE, Jacques. O mestre ignorante. Belo Horizonte. Autêntica. 2010. 978-85-7526-045-6.pdf


I'm stuck trying to get the first parameter, Author, using the regex:

([^\\]+)\.

I`m getting this value:

RANCIÊRE, Jacques. O mestre ignorante. Belo Horizonte. Autêntica. 2010. 978-85-7526-045-6


Since a regex reads from left to right, shouldn't the match stop at the first dot (.)?

The desired value on this example is:

RANCIÊRE, Jacques

Any hints for the other fields? For example, the desired value for Title is:

O mestre ignorante

Thanks in advance!

Wisdom
  • Is the folder name known? Is it always one level down? If not, I suggest using a more complex regex that captures the drive (i.e. C), then the path, then the filename, and then processing the filename with a separate regex. – Noam Rathaus Nov 20 '13 at 07:44
  • I'm not sure how the program works. Under "Settings" / "Adding Books" there is a place where I need to enter the full path of a file to test against and set the regular expression. For the author name the program uses this: (?P<author>My_Regular_Expression) – Wisdom Nov 20 '13 at 07:58

2 Answers


Regex quantifiers are greedy by default, meaning they try to match as much as possible. Try the non-greedy version:

([^\\]+?)\.

Note the only difference is the addition of a ?.

Afterwards, you should be able to retrieve the author's name ("RANCIÊRE, Jacques") with just \1.
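To see the difference outside calibre, here is a minimal Python sketch that runs both patterns against the sample path from the question. The variable names are only illustrative, and calibre itself may apply the pattern differently than a plain `re.search` call does.

```python
import re

# Sample path from the question (raw string keeps the backslashes literal).
path = (r"C:\Test\RANCIÊRE, Jacques. O mestre ignorante. Belo Horizonte. "
        r"Autêntica. 2010. 978-85-7526-045-6.pdf")

# Greedy: [^\\]+ grabs as much as it can, then backtracks to the LAST dot.
greedy = re.search(r"([^\\]+)\.", path)

# Non-greedy: [^\\]+? grows one character at a time and stops at the FIRST dot.
lazy = re.search(r"([^\\]+?)\.", path)

print(greedy.group(1))  # everything up to the dot before "pdf"
print(lazy.group(1))    # only the author
```

The greedy version reproduces the value reported in the question, while the non-greedy one stops at `RANCIÊRE, Jacques`.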

Kevin Ji
  • With this example I get 6 possible matches: Match 1: RANCIÊRE, Jacques; Match 2: O mestre ignorante; Match 3: Belo Horizonte; Match 4: Autêntica; Match 5: 2010; Match 6: 978-85-7526-045-6. I need to get only one value (since I will pass it to a function of the calibre program); for the author it needs to be match 1: RANCIÊRE, Jacques. Is there any way to select the desired match as if from an array? If the matches were an array, ["RANCIÊRE, Jacques", "O mestre ignorante", "Belo Horizonte", "Autêntica", "2010", "978-85-7526-045-6"], I would want Array[0]. Tested on rubular.com – Wisdom Nov 20 '13 at 07:50

^.+?\. will get you C:\Test\RANCIÊRE, Jacques.

It matches everything from the start of the string up to and including the first dot.

If you want only RANCIÊRE, Jacques, then use:

(?!(.*\\))(.+?\.)

This will give you RANCIÊRE, Jacques. (the match still includes the trailing dot).
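As a rough check in Python, the lookahead pattern above can be run against the sample path. The second pattern is my own variation, not part of the answer: it moves the dot outside the capturing group so the group holds the author without the trailing dot.

```python
import re

# Sample path from the question (raw string keeps the backslashes literal).
path = (r"C:\Test\RANCIÊRE, Jacques. O mestre ignorante. Belo Horizonte. "
        r"Autêntica. 2010. 978-85-7526-045-6.pdf")

# Pattern from the answer: the negative lookahead only succeeds at a position
# with no backslash left ahead of it, i.e. right after the folder part.
with_dot = re.search(r"(?!(.*\\))(.+?\.)", path)

# Variation (an assumption, not from the answer): keep the dot out of the group.
without_dot = re.search(r"(?!.*\\)(.+?)\.", path)

print(with_dot.group(0))     # author followed by the dot
print(without_dot.group(1))  # author only
```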

Dmitry Zagorulkin
  • I saw this while testing many other patterns (I'm using rubular.com to test and need to get only one value / match), but I need to get specific data; for the Author in this example the result needs to be: RANCIÊRE, Jacques – Wisdom Nov 20 '13 at 07:54
  • But if you need to split the string into several substrings, a regexp will not work well here. Try a simple `String.split()` – Dmitry Zagorulkin Nov 20 '13 at 07:59
  • Yup, I see, but I don't need the "C:\Test\" part, only the Author name ;) – Wisdom Nov 20 '13 at 08:00
  • I'm reading the Python documentation to see how str.split works ;) – Wisdom Nov 20 '13 at 08:07
  • @Wisdom could you explain what you really want? I could help you with Python – Dmitry Zagorulkin Nov 20 '13 at 08:18
  • I want to set up a default regex for importing my PDFs into the library inside the program "calibre". I changed the test path to: "C:\Test\RANCIÊRE, Jacques. O mestre ignorante. Autêntica. 978-85-7526-045-6.pdf" and using this code from @mc10 inside "calibre": "(?P<author>([^\\]+?)\.)(?P<title>([^\\]+?)\.)(?P<publisher>([^\\]+?)\.)" works pretty well, but if I add one more group to get the ISBN, all the previous data gets messed up, for example: (?P<isbn>([^\\]+?)\.) – Wisdom Nov 20 '13 at 08:32
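The two ideas raised in the comments, `str.split` and one named group per field, can be sketched in plain Python as follows. This is only a sketch under assumptions: the group names `author`, `title`, `publisher`, `published` and `isbn` follow what I understand calibre's naming to be, `place` is a hypothetical name for the "Local" field, and the thread does not confirm whether calibre passes the `.pdf` extension to the pattern, so the extension is made optional here.

```python
import re

# Sample path from the question (raw string keeps the backslashes literal).
path = (r"C:\Test\RANCIÊRE, Jacques. O mestre ignorante. Belo Horizonte. "
        r"Autêntica. 2010. 978-85-7526-045-6.pdf")

# Option 1: plain string splitting.
filename = path.rsplit("\\", 1)[-1]   # drop the folder part
stem = filename[:-4] if filename.lower().endswith(".pdf") else filename
parts = stem.split(". ")              # fields are separated by ". "

# Option 2: one regex with a named group per field.
# [^\\.]+ cannot cross a backslash or a dot, so each group stops at its own
# separator and no lazy quantifier is needed.
FIELDS = re.compile(
    r"(?P<author>[^\\.]+)\. "
    r"(?P<title>[^\\.]+)\. "
    r"(?P<place>[^\\.]+)\. "          # hypothetical name for "Local"
    r"(?P<publisher>[^\\.]+)\. "
    r"(?P<published>\d{4})\. "
    r"(?P<isbn>[\d-]+)"
    r"(?:\.pdf)?$"                    # extension is optional (assumption)
)

match = FIELDS.search(path)
fields = match.groupdict() if match else {}

print(parts[0])
print(fields)
```

Both options assume no field contains a dot itself; a title such as "Vol. 2" would need a stricter separator or a different naming scheme.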