07.11.2024

recutils

A charming little database built on human-readable and human-editable text files.

https://www.gnu.org/software/recutils/manual/ - the manual
https://www.gnu.org/software/recutils/rec-mode-manual/rec-mode.html - the manual for the Emacs mode.
https://www.gnu.org/software/recutils/rec-mode-manual/recutils.html
https://dev.to/jcolag/recutils-the-plain-text-database-52ma - a nice account in English of the basics
https://github.com/dbohdan/structured-text-tools - a side note: a list of assorted tools for working with "structured text", which includes recutils too. (In general, I haven't really covered the topic of structured text at all :)))

Definitely not SQL, but whether it counts as NoSQL is hard for me to say. :)

Besides being human-readable, it is convenient because you don't have to design much up front: fields can be added at any moment, and a field added to one record does not have to appear in all the others right away. And there's the lovely feeling of simplicity.

It may be less convenient than document-oriented stores built on JSON or sexps, because only a very simple structure is possible here: a list of fields with values and that's it (a table, yep). But if that ever becomes a real need, well, I can probably move somewhere else. Though I have no idea where yet :)

Clearly not suited to really large volumes of data; it simply wasn't made for that.

Why I got into this at all. I have a certain amount of information that, on the one hand, I want to keep together with extra details (emails, links, that sort of thing: who these people are, what these links are about, how and when I interacted with them, what came of it), and from which, on the other hand, I occasionally need lists selected by some attribute (which is itself part of those extra details), just plain lists of the stored items. And sure, while there's little of it and the gathering is a one-off, the lists simply get made by hand. But when it's a recurring task, the information keeps getting updated and extended (and no duplicates allowed, yeah), and the lists are needed, well, once every few months after all… I want to do all this in a way that's kinder to myself.

I like being able to keep such a database as part of an Org-mode file, in src blocks between the text, or text between the blocks, depending on how you look at it :) And, when wanted or needed, to tangle it and run whatever operations turn out to be necessary on the database itself.

recsel -C -i -e "place = 'мск' && school = 'yes'" base.rec

An example of a simple query against a database with records of a single type. It selects on two fields, one with a text value and the other boolean. -C prints the output compactly; -i makes the search case-insensitive.

Checking the file

recfix --check base.rec

It takes some manual work to go through the errors, but overall it's fine.
It complained about empty fields where a bool was expected (a missing field, given that I hadn't declared it mandatory, didn't bother it), so no empty values, please :)
It helped wonderfully with cleaning out duplicates.

Format basics

The separator between the field name and the field value is a colon followed by a blank character (space and tabs, but not newlines). The name of the field shall begin in the first column of the line.

A field name is a sequence of alphanumeric characters plus underscores (_), starting with a letter or the character %. The regular expression denoting a field name is: [a-zA-Z%][a-zA-Z0-9_]*. Field names are case-sensitive. Foo and foo are different field names.
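To get a feel for what that regular expression accepts, here is a minimal Python sketch. Only the pattern itself comes from the manual; the helper name is_field_name is made up for illustration. Note that, read literally, the pattern allows ASCII letters only, so a Cyrillic field name would not pass.

```python
import re

# Field-name pattern as quoted from the manual above.
FIELD_NAME = re.compile(r"[a-zA-Z%][a-zA-Z0-9_]*")

def is_field_name(name):
    """Return True if the whole string matches the field-name pattern."""
    return FIELD_NAME.fullmatch(name) is not None

# is_field_name is a hypothetical helper, not part of recutils.
for candidate in ["Email", "%rec", "Abode_Address", "2nd", "_id", "e-mail"]:
    print(candidate, is_field_name(candidate))
```

The first three candidates pass; the last three fail because they start with a digit, start with an underscore, or contain a hyphen.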

The value of a field is a sequence of characters terminated by a single newline character (\n).

It is possible to physically split a logical line by escaping a newline with a backslash character, as in:

LongLine: This is a quite long value \
comprising a single unique logical line \
split in several physical lines.

The sequence \n (newline) + (PLUS) and an optional _ (SPACE) is interpreted as a newline when found in a field value. For example, the C string bar1\nbar2\n bar3 would be encoded in the following way in a field value:

Foo: bar1
+ bar2
+  bar3
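The multi-line rule is easy to try out by hand. Below is a rough Python sketch of encoding and decoding such a value, assuming exactly the rule quoted above (newline, then +, then an optional single space). The function names are mine, and this is not how recutils itself is implemented.

```python
def encode_field(name, value):
    """Write a field; each embedded newline becomes newline + '+' + space."""
    return f"{name}: " + value.replace("\n", "\n+ ")

def decode_value(lines):
    """Rebuild a value from its physical lines.

    lines[0] is the text after 'Name: '; the rest start with '+'.
    """
    parts = [lines[0]]
    for line in lines[1:]:
        rest = line[1:]            # drop the leading '+'
        if rest.startswith(" "):   # drop at most one optional space
            rest = rest[1:]
        parts.append(rest)
    return "\n".join(parts)

encoded = encode_field("Foo", "bar1\nbar2\n bar3")
print(encoded)
print(repr(decode_value(["bar1", "+ bar2", "+  bar3"])))
```

The encoded text reproduces the Foo example above, including the double space before bar3.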

Record set properties

A quote from the manual with my own notes inserted. And yes, it's obvious which features I haven't really used :))

record descriptors can be used to describe other properties of those records. This can be done by using special fields, which have special names from a predefined set. Consider for example the following database, where record descriptors are used to specify a (optional) numeric ‘Id’ and a mandatory ‘Title’ field:

%rec: Item
%type: Id int
%mandatory: Title

Id: 10
Title: Notebook

Id: 11
Title: Fountain Pen

Note that the names of special fields always start with the character %. Also note that it is also possible to use non-special fields in a record descriptor, but such fields will have no effect on the described record set.

Every record set must contain one, and only one, field named %rec. It is not mandated that that field must occupy the first position in the record. However, it is considered a good style to place it as the first field in the record set, in order for the casual reader to easily identify the type of the records.

The following list briefly describes the special fields defined in the recutils format, along with references to the sections of this manual describing their usage in depth.

%rec
Naming record types. Also, they allow using external and remote descriptors. See Remote Descriptors.

%mandatory, %allowed and %prohibit
Requiring or forbidding specific fields.

mandatory - for when some field must be present in every record of the set.

allowed is for when we want to restrict the set of possible fields (then only the fields named in allowed, mandatory and key are permitted; any other is an error).

prohibit - for when you want to forbid certain fields. Maybe habitual typos, maybe names reserved for the future…

%unique and %key
Working with keys.

A key is a key: mandatory in every record, unique among all records of the set, and the only field of its kind per set. A set cannot have several keys. (And what if there's somewhere else I don't want duplicates, argh? Fine, so far it hasn't been a problem.)

unique - only one such field per record. Say, a card about a person may hold several phone numbers and even several names, but hardly several birth years. Meanwhile, different people in an address book may well share a birth year. :))

%doc
A description of the record type, just a text string.

%typedef and %type
Field types. See Field Types.

%auto
Auto-counters and time-stamps. See Auto-Generated Fields.

%sort
Keeping your record sets sorted. See Sorted Output.

%size
Restricting the size of your database. See Size Constraints.

%constraint
Enforcing arbitrary constraints. See Arbitrary Constraints.

%confidential
Storing confidential information. See Encryption.
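To make the difference between %mandatory, %unique and %key from the list above more tangible, here is a small Python sketch of the three checks as I understand them. The record layout (lists of name/value pairs), the function check and the sample data are all invented for illustration; the real checking is done by recfix, and this is not its implementation.

```python
from collections import Counter

def check(records, mandatory=(), unique=(), key=None):
    """Return a list of problems; an empty list means everything is fine."""
    problems = []
    seen_keys = set()
    key_set = {key} if key else set()
    for number, record in enumerate(records, start=1):
        counts = Counter(name for name, _ in record)
        # mandatory (and the key): must appear in every record
        for name in sorted(set(mandatory) | key_set):
            if counts[name] == 0:
                problems.append(f"record {number}: missing {name}")
        # unique (and the key): at most one such field per record
        for name in sorted(set(unique) | key_set):
            if counts[name] > 1:
                problems.append(f"record {number}: {name} appears {counts[name]} times")
        # key: the value must not repeat anywhere in the set
        for name, value in record:
            if name == key:
                if value in seen_keys:
                    problems.append(f"record {number}: duplicate {key} {value!r}")
                seen_keys.add(value)
    return problems

people = [
    [("Id", "1"), ("Name", "Ann"), ("Phone", "111"), ("Phone", "222"), ("Born", "1980")],
    [("Id", "2"), ("Name", "Bob"), ("Born", "1980")],  # same year as Ann: fine
    [("Id", "2"), ("Name", "Cy"), ("Born", "1975"), ("Born", "1976")],
]
for problem in check(people, mandatory=["Name"], unique=["Born"], key="Id"):
    print(problem)
```

Two phones in one record and a birth year shared by two people pass; only the third record is reported, once for two Born fields and once for the repeated Id.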

A bit about searching

for example, the expression:

recsel -e 'Email ~ "\\.org"'

matches any record in which there is a field named ‘Email’ whose value terminates in (the literal string) ‘.org’. If we are interested in the value of some specific email, we can specify its relative position in the containing record by using subscripts. Consider, for example:

Email[0] ~ "\\.org"

(which selects only the records where the first Email field contains .org)
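The difference between Email and Email[0] can be mimicked in a few lines of Python. This is only a sketch of the idea: Python's re module stands in for the regexp engine recutils uses (the syntaxes are not identical), and the helper names and the sample record are made up.

```python
import re

def matches_any(record, name, pattern):
    """Roughly `Name ~ "pattern"`: some field called name matches."""
    return any(re.search(pattern, v) is not None for n, v in record if n == name)

def matches_at(record, name, index, pattern):
    """Roughly `Name[index] ~ "pattern"`: only the index-th such field counts."""
    values = [v for n, v in record if n == name]
    return index < len(values) and re.search(pattern, values[index]) is not None

# A hypothetical record with two Email fields.
record = [("Name", "Ann"), ("Email", "ann@example.com"), ("Email", "ann@example.org")]

print(matches_any(record, "Email", r"\.org"))    # some Email matches
print(matches_at(record, "Email", 0, r"\.org"))  # the first one does not
print(matches_at(record, "Email", 1, r"\.org"))  # the second one does
```

So a record whose .org address is listed second is found by the plain expression but skipped by the one with the [0] subscript.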

The boolean operators and (&&), or (||) and not (!) are supported with the same semantics as their C counterparts.

csv2rec (and there is also the reverse, rec2csv)

csv2rec reads the given comma-separated-values file (or the data from standard input if no file is specified) and prints out the converted rec data, if possible. Synopsis:

csv2rec [option]… [csv_file]

In addition to the common options the program accepts the following options.

-t type, --type=type: Type of the converted records. If no type is specified then no type is used.
-s, --strict: Be strict parsing the csv file.
-e, --omit-empty: Omit empty fields.
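For a sense of what such a conversion involves, here is a rough Python approximation. It assumes the first CSV row holds the field names, and its rec_type and omit_empty parameters only imitate the -t and -e options; it does not handle multi-line values and is not a substitute for the real csv2rec.

```python
import csv
import io

def csv_to_rec(text, rec_type=None, omit_empty=False):
    """Convert CSV text (header row = field names) to rec-format text."""
    chunks = []
    if rec_type:
        chunks.append(f"%rec: {rec_type}")   # record descriptor, like -t
    for row in csv.DictReader(io.StringIO(text)):
        lines = []
        for name, value in row.items():
            if omit_empty and not value:     # like -e: skip empty fields
                continue
            lines.append(f"{name}: {value}")
        chunks.append("\n".join(lines))
    # Records are separated by blank lines.
    return "\n\n".join(chunks) + "\n"

# Made-up sample data.
sample = "Name,Email,Mobile\nAlfred,alf@example.com,\nMandy,mandy@example.com,0555 342123\n"
print(csv_to_rec(sample, rec_type="Person", omit_empty=True))
```

With omit_empty set, Alfred's record simply has no Mobile field, which fits the "fields need not exist in every record" spirit of the format.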

Common options

Certain options are available in all of these programs. Rather than writing identical descriptions for each of the programs, they are listed here.

--version: Print the version number, then exit successfully.
--help: Print a help message, then exit successfully.
--: Delimit the option list. Later arguments, if any, are treated as operands even if they begin with -. For example, recsel -- -p reads from the file named -p (a puzzling situation, but better to allow for it, yes :)).

foreign keys

https://www.gnu.org/software/recutils/manual/Foreign-Keys.html

Foreign Keys

A better way would be to separate the addresses and people into different record sets. The first record set might look like this:

%rec: Person
%type: Dob date
%type: Abode rec Residence

Name: Alfred Nebel
Dob: 20 April 2010
Email: alf@example.com
Abode: 42AbbeterWay

Name: Mandy Nebel
Dob: 21 February 1972
Email: mandy@example.com
Mobile: 0555 342123
Abode: 42AbbeterWay

Name: Bertram Nebel
Dob: 3 January 1966
Email: bert@example.com
Abode: 42AbbeterWay

Name: Charles Spencer
Dob: 4 July 1997
Email: charlie@example.com
Abode: 2SerpeRise

Name: Dirk Spencer
Dob: 29 June 1945
Email: dirk@example.com
Mobile: 0555 342123
Abode: 2SerpeRise

Name: Ernest Wright
Dob: 26 April 1978
Abode: ChezGrampa

and the second (following in the same file), like this:

%rec: Residence
%key: Id

Address: 42 Abbeter Way, Inprooving, WORCS
Telephone: 01234 5676789
Id: 42AbbeterWay

Address: 2 Serpe Rise, Little Worning, SURREY
Telephone: 09876 5432109
Id: 2SerpeRise

Address: 1 Wanter Rise, Greater Inncombe, BUCKS
Id: ChezGrampa

Here you can see that there are two record sets viz: Person and Residence. There are six people, but only three residences, because some residences accommodate more than one person. Note also that the Residence descriptor has the entry %key: Id whilst the Person descriptor has %type: Abode rec Residence. This is because Abode is the foreign key which identifies the residence where a person lives.

We could have declared the Id field as %auto. This would have had the advantage that we need not manually update it. However, we decided that the Abode field values in the Person records are better as alphanumeric fields, so that they can contain human readable values. In this way, it is self-evident by reading a Person record where that person lives. Yet since the Id field is declared using the %key special field name, you can be sure that you don’t accidentally reuse an existing key.

And there is also

recsel -t Person -j Abode acquaintances.rec

which prints the data roughly as if the matching data from the second record set had been added to each record of the first.

And there is

recsel -t Person -j Abode -p Name,Abode_Address acquaintances.rec

which lets you pull only the needed fields out of such a joined record set. Fields from the second set are written as ForeignKey_Field (Abode_Address in the example).
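The join and the ForeignKey_Field naming can be pictured with a short Python sketch. It simplifies a lot: records are plain dicts (so one value per field name), the function join is my own invention, and whether the original Abode field stays in the real recsel output is something I have not verified; here it is simply kept.

```python
def join(records, foreign_key, targets, target_key):
    """Add the fields of the matching target record as ForeignKey_Field."""
    index = {target[target_key]: target for target in targets}
    joined = []
    for record in records:
        merged = dict(record)
        target = index.get(record.get(foreign_key))
        if target is not None:
            for name, value in target.items():
                merged[f"{foreign_key}_{name}"] = value
        joined.append(merged)
    return joined

# A cut-down version of the Person and Residence sets from the manual.
people = [
    {"Name": "Alfred Nebel", "Abode": "42AbbeterWay"},
    {"Name": "Charles Spencer", "Abode": "2SerpeRise"},
]
residences = [
    {"Id": "42AbbeterWay", "Address": "42 Abbeter Way, Inprooving, WORCS"},
    {"Id": "2SerpeRise", "Address": "2 Serpe Rise, Little Worning, SURREY"},
]

rows = join(people, "Abode", residences, "Id")
for row in rows:
    # Roughly what -p Name,Abode_Address would leave.
    print(row["Name"], "|", row["Abode_Address"])
```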

https://www.gnu.org/software/recutils/manual/Joining-Records.html

remote descriptors

https://www.gnu.org/software/recutils/manual/Remote-Descriptors.html

The %rec special field is used for two main purposes: to identify a record as a record descriptor, and to provide a name for the described record set. The synopsis of the usage of the field is the following:

%rec: type [url_or_file]

Only one %rec field should be in a record descriptor.

External descriptor. External descriptors can be built appending a file path to the %rec field value, like:

%rec: FSD_Entry /path/to/file.rec

The previous example indicates that a record descriptor describing the FSD_Entry records shall be read from the file /path/to/file.rec. A record descriptor for FSD_Entry may not exist in the external file. Both relative and absolute paths can be specified there.

The regexps used here

Regular Expressions

The character ‘.’ matches any single character except the null character.

‘+’
match one or more occurrences of the previous atom or regexp.

‘?’
match zero or one occurrences of the previous atom or regexp.

‘\+’
matches a ‘+’

‘\?’
matches a ‘?’.

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example ‘[z-a]’, are invalid. Within square brackets, ‘\’ is taken literally. Character classes are supported; for example [[:digit:]] matches a single decimal digit.

GNU extensions are supported:

‘\w’
matches a character within a word

‘\W’
matches a character which is not within a word

‘\<’
matches the beginning of a word

‘\>’
matches the end of a word

‘\b’
matches a word boundary

‘\B’
matches characters which are not a word boundary

‘\`’
matches the beginning of the whole input

‘\'’
matches the end of the whole input

Grouping is performed with parentheses ‘()’. An unmatched ‘)’ matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example, ‘\2’ matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis ‘(’.

The alternation operator is ‘|’.

The characters ‘^’ and ‘$’ always represent the beginning and end of a string respectively, except within square brackets. Within brackets, an initial ‘^’ inverts the character class being matched.

‘*’, ‘+’ and ‘?’ are special at any point in a regular expression except the following places, where they are not allowed:

At the beginning of a regular expression
After an open-group, ‘(’
After the alternation operator, ‘|’

Intervals are specified by ‘{’ and ‘}’. Invalid intervals such as ‘a{1z’ are not accepted.

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to sub-expressions within groups.
