Data sources and webscraping
This functionality is built into a Python library: https://gitlab.com/vindarel/bookshops
It also provides a command-line tool.
Where do we get the data about books and CDs from?
We fetch detailed information about books from the internet, wherever we
can find it. This data is incomplete for a professional bookstore, which
must subscribe to a Dilicom account; it is meant for testing and for individuals.
How to import a LibreOffice ods sheet
This is available on the command line only and is still a work in progress.
The ods (or csv) file can take one of these forms:
- it has a header row with “isbn” and “quantity” columns (this is the
easiest and most precise way);
- it has a header row containing the column names. In that case, it
must have a “title” or an “isbn” column;
- it contains only data, with no header row declaring the column names.
In that case, the columns are declared in a settings.py file.
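The three forms above can be told apart from the first row alone. The following Python sketch shows one way to do it; the helper name detect_layout and its return values are hypothetical, and odslookup.py may proceed differently. Accepting the French label “titre” in the second case is also an assumption.

```python
from typing import List

def detect_layout(first_row: List[str]) -> str:
    """Guess the sheet layout from its first row (hypothetical helper)."""
    # Labels are compared case-insensitively.
    labels = {cell.strip().lower() for cell in first_row}
    if "isbn" in labels and "quantity" in labels:
        # Form 1: a header with ISBN and quantity columns.
        return "isbn_quantity"
    if labels & {"title", "titre", "isbn"}:
        # Form 2: a header naming the columns, with a title or an ISBN column.
        # Accepting the French "titre" here is an assumption.
        return "named_columns"
    # Form 3: no recognisable header, so the column names must come from settings.py.
    return "data_only"
```

For example, a first row of ["ISBN", "Quantity"] gives "isbn_quantity", while a row of raw values such as an ISBN number, a publisher name and a price gives "data_only".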
In short:
make odsimport odsfile=myfile.ods
This functionality relies on two scripts:
- search/datasources/odslookup/odslookup.py extracts the data from your
ods file and fetches the book data for each row. It returns a big list
of dictionaries with, in principle, all the information needed to
register a Card in the database. When it fetches results, it must
check that they are accurate: beware of false positives!
- scripts/odsimport.py calls the script above and adds everything to
the database. It adds the cards with their quantities, and creates
places, publishers and distributors if needed.
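The "create if needed" logic of the import step can be sketched as follows. This is not the code of scripts/odsimport.py: it assumes a hypothetical row format (dictionaries with "title", "publisher" and "quantity" keys, plus optional "distributor" and "place" keys) and uses an in-memory stand-in instead of the real database models.

```python
from typing import List

def import_rows(rows: List[dict]) -> dict:
    """Add cards with their quantities; create publishers, distributors and places once."""
    # Hypothetical in-memory stand-in for the database.
    db = {"cards": {}, "publishers": set(), "distributors": set(), "places": set()}
    for row in rows:
        publisher = row["publisher"]
        # The distributor defaults to the publisher when the row has none.
        distributor = row.get("distributor") or publisher
        # "default place" is a hypothetical fallback name.
        place = row.get("place", "default place")
        # Adding to a set only creates the entry if it does not exist yet.
        db["publishers"].add(publisher)
        db["distributors"].add(distributor)
        db["places"].add(place)
        # Cards are keyed by title here, and quantities of repeated titles are
        # summed; both choices are assumptions made for this sketch.
        title = row["title"]
        db["cards"][title] = db["cards"].get(title, 0) + row["quantity"]
    return db
```

Using sets makes the creation step idempotent: importing two rows from the same publisher registers that publisher only once.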
The scripts contain more information if you want to develop, for instance
to cache HTTP requests or to store and retrieve a set of results.
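Storing and retrieving a set of results can be as simple as a JSON file keyed by the search query. The sketch below is only an illustration and not necessarily how the scripts do it: cached_lookup is a hypothetical name, and fetch stands for whatever function performs the actual web lookup.

```python
import json
import os
from typing import Callable, List

def cached_lookup(query: str, fetch: Callable[[str], List[dict]], cache_path: str) -> List[dict]:
    """Return the results for `query`, calling `fetch` only on a cache miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    if query not in cache:
        # Only queries never seen before trigger a (slow) web lookup.
        cache[query] = fetch(query)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f, ensure_ascii=False)
    return cache[query]
```

Re-running an import on the same sheet then reuses the stored results instead of querying the websites again.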
The ods file needs at least the following information, with the
corresponding English or French label (case does not matter):
- the card’s title (“title”, “titre”),
- the publisher (“éditeur”),
- the distributor (defaults to the publisher),
- its discount (“remise”),
- the public price (the first column whose label contains “price” or “prix”),
- the quantity (“stock”, “quantité”).
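The label rules above can be sketched as a function mapping each field to a column index. The function name and field keys are hypothetical, and the distributor labels (“distributor”, “distributeur”) are assumptions, since the list above does not give them.

```python
from typing import Dict, List

# Accepted labels per field, compared in lowercase.
LABELS = {
    "title": ("title", "titre"),
    "publisher": ("éditeur",),
    "distributor": ("distributor", "distributeur"),  # assumed labels
    "discount": ("remise",),
    "quantity": ("stock", "quantité"),
}

def map_columns(header: List[str]) -> Dict[str, int]:
    """Map each known field to the index of its column in `header`."""
    labels = [cell.strip().lower() for cell in header]
    mapping: Dict[str, int] = {}
    for field_name, accepted in LABELS.items():
        for index, label in enumerate(labels):
            if label in accepted:
                mapping[field_name] = index
                break
    # The public price is the first column whose label contains "price" or "prix".
    for index, label in enumerate(labels):
        if "price" in label or "prix" in label:
            mapping["price"] = index
            break
    # Without a distributor column, the publisher column is used instead.
    if "distributor" not in mapping and "publisher" in mapping:
        mapping["distributor"] = mapping["publisher"]
    return mapping
```

With the header ["Titre", "Éditeur", "Prix public", "Prix d'achat", "Remise", "Quantité"], the price maps to column 2 (the first price-like column) and the distributor falls back to the publisher column, 1.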
There’s a little test suite:
cd search/datasources/odslookup
make test
Coming soon: the category and historical information.
Note
Known limitations:
- the script will include a few false positive results: it cannot tell
“a title t.1” apart from “a title t.2”.
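This limitation is typical of fuzzy title matching: the two titles differ by a single character, so any similarity score rates them as near-identical. The sketch below uses difflib's SequenceMatcher as a stand-in (not necessarily what odslookup.py uses) and shows one possible guard, comparing volume numbers as well. The volume pattern and the 0.9 threshold are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Assumed pattern for volume markers such as "t.1", "t. 2" or "tome 3".
VOLUME_RE = re.compile(r"\b(?:t\.|tome)\s*(\d+)", re.IGNORECASE)

def volume(title: str) -> Optional[str]:
    """Return the volume number found in `title`, or None."""
    match = VOLUME_RE.search(title)
    return match.group(1) if match else None

def is_accurate(wanted: str, found: str, threshold: float = 0.9) -> bool:
    """Accept a result only if the titles are similar AND the volumes agree."""
    ratio = SequenceMatcher(None, wanted.lower(), found.lower()).ratio()
    return ratio >= threshold and volume(wanted) == volume(found)
```

Here “a title t.1” and “a title t.2” score about 0.91, so a similarity check alone would accept the match; the volume comparison is what rejects it.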