
I'm trying to iterate over a VCF file and create vectors with the data to build a DataFrame.

However, the Rust compiler raises an error saying that the borrowed value does not live long enough.

I'm cloning the record because otherwise I would be borrowing the VCF record as immutable and as mutable at the same time.

I don't know how to proceed. Below are a snippet of my code and the error.

use flate2::read::MultiGzDecoder;
use std::fs::File;
use std::io::BufReader;
mod vcf_reader;
use vcf::{VCFReader, U8Vec, VCFHeaderFilterAlt, VCFError, VCFRecord};

use vcf_reader::reader::VCFSamples;
use polars_core::prelude::*;
use std::time::Instant;
    
fn main() -> Result<()>  {
    let now = Instant::now();
    let mut reader = VCFReader::new(BufReader::new(File::open(
       "C:\\Users\\Desktop\\rust_lectures\\vcf_reader\\vcf_reader\\*.vcf"
    )?)).unwrap();


    // prepare VCFRecord object
    let mut vcf_record = VCFRecord::new(reader.header());

    // read one record
    let mut chromosome = Vec::new();
    let mut position = Vec::new();
    let mut id = Vec::new();
    let mut reference = Vec::new();
    let mut alternative = Vec::new();
    let result: bool = reader.next_record(&mut vcf_record).unwrap();
    loop {
        let row = vcf_record.clone();
        if result == false {
            let df = df!(
                "Chromosome" => chromosome,
                "Position" => position,
                "Id" => id,
                "Reference" => reference,
                "Alternative" => alternative,
        
            );
            println!("{:?}", df);
            let elapsed = now.elapsed();
            println!("Elapsed: {:.2?}", elapsed);
            return Ok(())
        } else {
            
            chromosome.push(String::from_utf8_lossy(&row.chromosome));
            position.push(&row.position);
            id.push(String::from_utf8_lossy(&row.id[0]));
            reference.push(String::from_utf8_lossy(&row.reference));
            alternative.push(String::from_utf8_lossy(&row.alternative[0]));
            
        }
        reader.next_record(&mut vcf_record).unwrap();
        
    }
}

error[E0597]: `row.chromosome` does not live long enough
  --> src\main.rs:44:53
   |
44 |             chromosome.push(String::from_utf8_lossy(&row.chromosome));
   |             ----------------------------------------^^^^^^^^^^^^^^^^^^^^--
   |             |                                       |
   |             |                                       borrowed value does not live long enough
   |             borrow later used here
...
50 |         }
   |         - `row.chromosome` dropped here while still borrowed
Herohtar
Evandro Lippert

1 Answer


> I'm cloning the value because

The problem is that you're cloning the record but then storing references into your vectors. Those references point into the cloned record `row`, so they are only valid until `row` is dropped at the end of each loop iteration, while the vectors have to outlive the loop.

So either:

  • move the attributes out of the clone (essentially explode it)
  • or rather than working with a copy of the record, copy individual fields out of the base vcf_record
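Both options can be sketched with the standard library alone. The `Record` struct below is a hypothetical stand-in for `VCFRecord`, reduced to two fields, and the helper names `push_by_moving` and `push_by_copying` are made up for illustration; the point is only that the vectors end up owning their data.

```rust
/// Hypothetical stand-in for `vcf::VCFRecord`, reduced to two fields.
#[derive(Clone)]
struct Record {
    chromosome: Vec<u8>,
    position: u64,
}

/// Option 1: clone the record, then move the fields out of the clone.
fn push_by_moving(record: &Record, chromosome: &mut Vec<String>, position: &mut Vec<u64>) {
    let Record { chromosome: bytes, position: pos } = record.clone();
    // `String::from_utf8` takes ownership of the bytes and reuses their
    // allocation; on invalid UTF-8, fall back to a lossy copy.
    let text = String::from_utf8(bytes)
        .unwrap_or_else(|err| String::from_utf8_lossy(err.as_bytes()).into_owned());
    chromosome.push(text);
    position.push(pos);
}

/// Option 2: copy individual fields straight out of the original record.
fn push_by_copying(record: &Record, chromosome: &mut Vec<String>, position: &mut Vec<u64>) {
    // `into_owned` yields a `String` that no longer borrows from `record`.
    chromosome.push(String::from_utf8_lossy(&record.chromosome).into_owned());
    // Integers are `Copy`: push the value, not a reference to it.
    position.push(record.position);
}

fn main() {
    let record = Record { chromosome: b"chr1".to_vec(), position: 100 };
    let mut chromosome = Vec::new();
    let mut position = Vec::new();
    push_by_moving(&record, &mut chromosome, &mut position);
    push_by_copying(&record, &mut chromosome, &mut position);
    println!("{:?} {:?}", chromosome, position);
}
```

With the second option the `clone()` of the whole record is no longer needed, since nothing keeps borrowing from the record once the fields have been copied.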

Either way, you're also misusing `from_utf8_lossy`: it returns a `Cow<str>` that can borrow from its input, because it avoids allocating when the input is already valid UTF-8 (in that case it essentially just returns a reference to the original data).
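To see the borrowing behaviour in isolation, here is a small sketch; the helper `is_borrowed` is a made-up name used only for this illustration. Valid input comes back as a borrowed variant tied to the lifetime of the input bytes, and only invalid input produces a freshly allocated string.

```rust
use std::borrow::Cow;

/// Reports whether `from_utf8_lossy` could borrow the input unchanged.
fn is_borrowed(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Borrowed(_))
}

fn main() {
    // Valid UTF-8: no allocation, the `Cow` points back into the input.
    println!("valid input borrowed: {}", is_borrowed(b"chr1"));
    // Invalid UTF-8: the bad byte is replaced, so a new `String` is allocated.
    println!("invalid input borrowed: {}", is_borrowed(b"chr\xFF"));
}
```

This is why pushing the returned value into a long-lived vector keeps the source bytes borrowed.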

Masklinn
  • How can I use `from_utf8_lossy` correctly? The variables are `U8Vec`, and I'd like to convert them to `String`. – Evandro Lippert May 24 '22 at 13:58
  • The simplest would be to call [`into_owned()`](https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.into_owned) on the `Cow`, but it's not clear that it's *correct*. If you know for sure that the data is valid, then you can probably use `String::from_utf8` (if you can get a `Vec`) or `std::str::from_utf8` (then copy that to a `String` proper using `to_owned` or `to_string`). – Masklinn May 24 '22 at 14:22
  • If the data may in fact be invalid UTF-8 *and* you can have a lot of it, *and* you need to maximise efficiency, *and* you can get an owned `Vec` as input (you can move the data out of the VCF record), then your best bet would be to match on the `Cow`: if it's already `Owned` (invalid UTF-8 data which got munged), take that; otherwise call `from_utf8_unchecked` on the original (owned) data. You know it's valid because `from_utf8_lossy` told you, so you can reuse the existing allocation. – Masklinn May 24 '22 at 14:24
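The conversions mentioned in these comments could look roughly like the sketch below. The function names are hypothetical, and the inputs are plain `&[u8]` or `Vec<u8>`, on the assumption that `U8Vec` is, or can be turned into, a `Vec<u8>`.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;
use std::string::FromUtf8Error;

/// Lossy conversion from a borrowed slice; the result owns its data.
fn lossy_owned(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Strict conversion that takes ownership and reuses the allocation.
fn strict_owned(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Strict conversion from a borrowed slice, copied into a new `String`.
fn strict_copied(bytes: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(bytes).map(|text| text.to_owned())
}

/// Lossy conversion that reuses the input allocation when the data is valid.
fn lossy_reusing(bytes: Vec<u8>) -> String {
    if let Cow::Owned(repaired) = String::from_utf8_lossy(&bytes) {
        // Invalid input: `from_utf8_lossy` already built a repaired copy.
        repaired
    } else {
        // SAFETY: `from_utf8_lossy` returned `Borrowed`, which it only does
        // when `bytes` is entirely valid UTF-8.
        unsafe { String::from_utf8_unchecked(bytes) }
    }
}

fn main() {
    println!("{}", lossy_owned(b"chr1"));
    println!("{:?}", strict_owned(b"chr1".to_vec()));
    println!("{:?}", strict_copied(b"A"));
    println!("{}", lossy_reusing(b"GT".to_vec()));
}
```

`lossy_owned` is the simplest drop-in fix for the code in the question; `lossy_reusing` only pays off when the record's byte vectors can be moved out rather than borrowed.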