# A tibble: 6 × 6
erisim_numarasi scan_complete_time scan_start_time slide_count_scanner
<chr> <dttm> <dttm> <int>
1 "\u001dR" 2025-07-08 06:53:08 2025-07-08 06:52:57 1
2 "&86%62" 2025-10-04 05:32:00 2025-10-04 05:31:32 2
3 "&I,_14_9_SS45134… 2025-10-22 05:53:43 2025-10-22 05:53:43 1
4 "&I,_14_9_SS45134… 2025-10-22 05:53:43 2025-10-22 05:53:35 1
5 "&n_12_13_SS45328… 2025-08-13 00:15:43 2025-08-13 00:15:43 1
6 "&n_12_13_SS45328… 2025-08-13 00:15:43 2025-08-13 00:15:19 1
# ℹ 2 more variables: scanner_name_log <chr>, copied_files <chr>
5 Prepare Scanner Data
5.1 Load Scanner Logs
We load the raw scanner logs using the load_scanner_logs function. This function parses the text logs to extract timestamps, actions (START_COPY, END_COPY), and case IDs.
For atypical cases, we use the following approach:
2025-06-03 23:05:59 - INFO - Dosya kopyalanıyor: 28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs
2025-06-03 23:06:23 - INFO - Dosya başarıyla taşındı: I:/28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs
2025-06-03 23:06:24 - INFO - Kaynak dosya silindi: K:\28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs
From these lines we derive three columns:
filename: the full file name from the line containing “Dosya kopyalanıyor” (“File is being copied”), here 28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs.
scan_finished_time: the timestamp of the “Dosya kopyalanıyor” line, here 2025-06-03 23:05:59.
file_transferred_time: the timestamp of the “Kaynak dosya silindi” (“Source file deleted”) line, here 2025-06-03 23:06:24.
When a line is missing, we fall back as follows:
If “Kaynak dosya silindi” is missing, use the timestamp of the “Dosya kopyalanıyor” line.
If “Dosya kopyalanıyor” is missing, use the timestamp of the “Dosya başarıyla taşındı” (“File moved successfully”) line.
If “Dosya başarıyla taşındı” is missing, use the timestamp of the “Dosya kopyalanıyor” line.
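The extraction and fallback rules for these three log lines can be illustrated with a minimal Python sketch. It is not the pipeline's own code: the names `parse_transfers`, `base_name`, and `LINE_RE` are hypothetical, the line layout is inferred from the three sample lines above, and the fallback chain (copy line, then move line, for scan_finished_time; delete line, then scan_finished_time, for file_transferred_time) is one reading of the stated rules.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample lines:
# "<timestamp> - INFO - <message>: <file name or path>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - INFO - "
    r"(Dosya kopyalanıyor|Dosya başarıyla taşındı|Kaynak dosya silindi): (.+)$"
)


def base_name(path):
    """Drop a drive or directory prefix such as 'I:/' or 'K:\\'."""
    return re.split(r"[\\/]", path.strip())[-1]


def parse_transfers(lines):
    """Return one dict per file with the two derived timestamps."""
    events = {}  # filename -> {message: first timestamp seen}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not one of the three messages of interest
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        per_file = events.setdefault(base_name(match.group(3)), {})
        per_file.setdefault(match.group(2), ts)

    rows = []
    for filename, seen in events.items():
        copying = seen.get("Dosya kopyalanıyor")
        moved = seen.get("Dosya başarıyla taşındı")
        deleted = seen.get("Kaynak dosya silindi")
        # Assumed reading of the fallback rules described in the text.
        scan_finished = copying or moved
        transferred = deleted or scan_finished
        rows.append({
            "filename": filename,
            "scan_finished_time": scan_finished,
            "file_transferred_time": transferred,
        })
    return rows


# The three sample lines from the log excerpt (raw string keeps the backslash).
sample = [
    "2025-06-03 23:05:59 - INFO - Dosya kopyalanıyor: 28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs",
    "2025-06-03 23:06:23 - INFO - Dosya başarıyla taşındı: I:/28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs",
    r"2025-06-03 23:06:24 - INFO - Kaynak dosya silindi: K:\28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs",
]
rows = parse_transfers(sample)
```

Keying the events by the bare file name lets the three lines be matched even though two of them carry a drive prefix and one does not.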
Then we extract erisim_numarasi from the filename column.
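The text does not state the extraction rule, so the following Python sketch only illustrates one plausible approach. It assumes the case ID is the leading "digits, hyphen, two digits" prefix of the file name, as in 28926-25 from the sample log line; `extract_case_id` and `CASE_ID_RE` are hypothetical names, and returning `None` for non-matching names is a choice made here, not the pipeline's documented behaviour.

```python
import re

# Assumption: a case ID looks like "<digits>-<two digits>" at the start of the name.
CASE_ID_RE = re.compile(r"^(\d+-\d{2})")


def extract_case_id(filename):
    """Return the leading case ID, or None if the name does not start with one."""
    match = CASE_ID_RE.match(filename)
    return match.group(1) if match else None


case_id = extract_case_id("28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs")
```

File names that do not start with such a prefix yield `None` in this sketch, which makes them easy to set aside for manual review.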
5.2 Data Cleaning and Summary
We summarize the logs to find the start and end times for each case. We focus on “END_COPY” events as the completion of scanning/transfer. We approximate the start time using the minimum timestamp for the case.
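A minimal Python sketch of this summarization step follows, assuming each parsed event is a dict with the hypothetical keys `case_id`, `action`, `timestamp`, and `filename`. The function name `summarize_cases` is also an assumption, and the sketch is independent of the pipeline's own implementation.

```python
from datetime import datetime


def summarize_cases(events):
    """Collapse per-file log events into one summary row per case.

    `events` is an iterable of dicts with the hypothetical keys
    case_id, action, timestamp (a datetime), and filename.
    """
    cases = {}
    for ev in events:
        case = cases.setdefault(
            ev["case_id"], {"start": None, "complete": None, "files": []}
        )
        ts = ev["timestamp"]
        # Start time: the minimum timestamp seen for the case, whatever the action.
        if case["start"] is None or ts < case["start"]:
            case["start"] = ts
        # Completion time: the latest END_COPY timestamp for the case.
        if ev["action"] == "END_COPY" and (
            case["complete"] is None or ts > case["complete"]
        ):
            case["complete"] = ts
        # Record each file once, in order of first appearance.
        if ev["filename"] not in case["files"]:
            case["files"].append(ev["filename"])

    return [
        {
            "erisim_numarasi": case_id,
            "scan_complete_time": case["complete"],
            "scan_start_time": case["start"],
            "slide_count_scanner": len(case["files"]),
            "copied_files": ";".join(case["files"]),
        }
        for case_id, case in cases.items()
    ]
```

A case with no END_COPY event gets a `scan_complete_time` of `None` here; how the actual pipeline treats such cases is not stated in the text.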
5.3 Column Definitions
The processed scanner summary contains the following columns:
erisim_numarasi: Unique Case ID (e.g., 12345-24).
scan_complete_time: The timestamp when the last file in the case finished transferring (“Kaynak dosya silindi”). This represents the time the case was fully available on the network.
scan_start_time: The timestamp when the first file in the case started copying (“Dosya kopyalanıyor”). This represents the approximate start of the scanning/transfer process for the case.
slide_count_scanner: The total number of files (slides) associated with this case in the logs.
scanner_name_log: The name of the scanner that processed the case (e.g., SS7833), extracted from the log file path.
copied_files: A semicolon-separated list of all filenames associated with the case.
5.4 Save Processed Data
We save the processed scanner data in both Case Level and Slide Level formats.