Digital resources, scientific integrity and artificial intelligence

The advent of generative artificial intelligence marks a decisive turning point in research and academic publication practices. In this context of rapid transformation, equitable access to traditional digital resources (electronic libraries, specialized databases, and digitized corpora) emerges as an essential safeguard of scientific integrity. This relationship, often overlooked in current debates, deserves particular attention because its implications touch the very foundations of knowledge production. A comparative analysis of contrasting institutional contexts makes it possible to measure what is concretely at stake.

Artificial intelligence tools, particularly large language models, now offer an apparently simplified path toward literature synthesis and scientific writing. However, these technologies have well-documented structural limitations: factual hallucinations, representation biases in training data, and a lack of transparency about the sources used. Given these pitfalls, direct access to primary academic resources constitutes the only reliable means of verifying and validating information. A researcher with full access to JSTOR, Web of Science, or institutional archives can check AI-generated claims against the original publications, thus preserving the methodological rigor that characterizes scientific inquiry.

Ivy League universities illustrate a model of documentary saturation in which this verification becomes systematic. Harvard and Yale, whose library budgets often exceed one hundred million dollars annually, offer virtually unlimited access to specialized databases, digitized historical archives, and emerging publication platforms. Their researchers also benefit from sophisticated library support services that help them navigate these resources. In this context of abundance, generative AI becomes a complementary tool rather than a substitute: it enables an initial exploration whose results are quickly checked against primary sources.

The Quebec university system presents a substantially different reality. Although the Conference of Rectors and Principals of Quebec Universities (CREPUQ, now the Bureau de coopération interuniversitaire) has historically enabled the pooling of library resources, budgetary constraints force strategic choices. The Université de Montréal and Université Laval certainly have respectable documentary infrastructures, but their researchers face limits on access to certain costly databases and specialized linguistic corpora. This intermediate situation may increase dependence on generative tools to fill access gaps, and thus heightens the risks to scientific integrity when systematic verification becomes materially difficult.

This asymmetry reveals a digital divide with troubling epistemological consequences. While richly endowed institutions can maintain high verification standards, underfunded academic communities risk growing dependent on generative tools whose reliability remains uncertain. Scientific integrity, traditionally guaranteed by shared methodological protocols, may thus be compromised by unequal access to infrastructure. This stratification in the quality of academic production threatens the universality of scientific norms and could create an implicit hierarchy of academic credibility based on institutional origin.

Scientific integrity also rests on complete source traceability and research reproducibility. Academic digital resources, with their persistent identifiers and standardized metadata, guarantee this traceability in a way that generative AI cannot currently ensure. In contexts where documentary access remains robust, researchers can build arguments solidly anchored in existing literature and document their intellectual borrowings with precision.
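
As a purely illustrative sketch of what this traceability makes possible, the following Python fragment separates citations that carry a DOI (one common kind of persistent identifier) from those that do not. The regular expression, the function names `extract_doi` and `split_by_traceability`, and the placeholder citation without a DOI are assumptions introduced here for illustration, not part of any established tool. The check is syntactic only: a well-formed DOI still has to be resolved and the publication read before a claim can be considered verified.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed pattern for modern DOIs: the "10." prefix, a 4-9 digit registrant
# code, a slash, then a suffix without whitespace, quotes, or angle brackets.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(citation: str) -> Optional[str]:
    """Return the first DOI-like string found in a citation, or None."""
    match = DOI_RE.search(citation)
    if match is None:
        return None
    # Drop sentence punctuation that often trails a DOI in a reference list.
    return match.group(0).rstrip(".,;)")


def split_by_traceability(
    citations: List[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split citations into those with a DOI and those needing manual checks."""
    traceable: Dict[str, str] = {}
    untraceable: List[str] = []
    for citation in citations:
        doi = extract_doi(citation)
        if doi is None:
            untraceable.append(citation)
        else:
            traceable[citation] = doi
    return traceable, untraceable


if __name__ == "__main__":
    sample = [
        "Wilkinson, M. et al. (2016). Scientific Data, 3(1), 160018. "
        "https://doi.org/10.1038/sdata.2016.18.",
        "Hypothetical placeholder citation with no identifier",
    ]
    with_doi, without_doi = split_by_traceability(sample)
    print("Traceable:", list(with_doi.values()))
    print("Needs manual verification:", without_doi)
```

A citation that lands in the second list is not necessarily fabricated. It simply cannot be traced automatically, which is precisely where direct access to library resources becomes necessary.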

The comparison between institutional models reveals that investment in universal access to academic digital resources represents less an expense than a necessary condition for preserving scientific integrity in the face of challenges posed by artificial intelligence.
