- Videos: 2,956
- Views: 16,105,448
Databricks
USA
Joined 1 Jul 2014
Databricks is the Data and AI company. More than 10,000 organizations worldwide - including Block, Comcast, Conde Nast, Rivian, and Shell, and over 60% of the Fortune 500 - rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.
LakeFlow Demo
Databricks LakeFlow is a new solution that contains everything you need to build and operate production data pipelines. It includes new native, highly scalable connectors for databases including MySQL, Postgres, SQL Server and Oracle and enterprise applications like Salesforce, Microsoft Dynamics, NetSuite, Workday, ServiceNow and Google Analytics. Users can transform data in batch and streaming using standard SQL and Python.
Learn about Data Engineering: www.databricks.com/solutions/data-engineering
Views: 1,665
Videos
Say goodbye to messy JSON headaches with VARIANT
2.2K views · 22 hours ago
- Try it out today on Databricks: docs.databricks.com/en/semi-structured/variant.html
- Read more about it on our blog: www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
- If you're curious about the implementation, check out the talk: ruclips.net/video/jtjOfggD4YY/видео.html
- Or read about it on GitHub: github.com/apache/spark/blob/master/common/variant/README.md
Data Intelligence Day Seoul 2024
514 views · 1 day ago
Data Intelligence Day Seoul, Korea took place on 23 April 2024 and gathered over 1,200 industry leaders and data and AI experts. Watch Data Intelligence Day Seoul On Demand: events.databricks.com/KoreaDIDays2024
An Introduction to DBRX
3.7K views · 1 day ago
Learn from Naveen Rao, VP of Generative AI at Databricks, as he explains DBRX, a new, open source foundation model that sets the standard for production quality and price/performance. With up to 3x faster inference, DBRX outperforms all other open models in quality benchmarks, which allows enterprises to quickly build their own custom LLMs efficiently and with full control. Read more about ...
Demo: How Do I Use DBRX?
1.5K views · 1 day ago
Watch how DBRX uses Databricks to build and customize GenAI applications using your own enterprise data. Read more about DBRX here: www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms?
What's Next for Apache Spark™ Including the Upcoming Release of Apache Spark 4.0
6K views · 1 day ago
Reynold Xin, Co-founder and Chief Architect, Databricks, shares the latest innovation coming out of the Apache Spark™ open source project, including a preview of the anticipated release of Spark 4.0. Speakers: Reynold Xin (Co-founder and Chief Architect, Databricks) and Tareef Kawaf (President, Posit Software, PBC)
The Evolution of Delta Lake from Data + AI Summit 2024
1.8K views · 1 day ago
Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, explains why Delta Lake is the most adopted open lakehouse format. Includes:
- Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta)
- Delta Lake Liquid Clustering
- Delta Lake production-ready catalog (Iceberg REST API)
- The growth and strength of the Delta ecosystem
- Delta Kernel
- D...
Setting up PAT and Secret Scope
340 views · 1 day ago
Quick video on how to set up a Personal Access Token, a Secret Scope, and a Secret with Azure Key Vault.
Increase your column sizes without rewriting the entire table
721 views · 1 day ago
Docs: docs.databricks.com/en/delta/type-widening.html
Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit
4.4K views · 1 day ago
Shant Hovsepian, CTO of Data Warehousing at Databricks, announced the biggest Delta Lake release to date, Delta 4.0, during the Data + AI Summit 2024 in San Francisco. Speaker: Shant Hovsepian, Chief Technology Officer of Data Warehousing, Databricks
Open Sourcing Unity Catalog Live Onstage with Matei Zaharia at Data + AI Summit 2024
887 views · 1 day ago
Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks. Matei Zaharia open sourced Unity Catalog live onstage at the Data + AI Summit 2024 in San Francisco.
Databricks LakeFlow: A Unified, Intelligent Solution for Data Engineering. Presented by Bilal Aslam
7K views · 1 day ago
Speaker: Bilal Aslam, Sr. Director of Product Management, Databricks. Bilal explains that everything starts with good data and outlines the three steps to good data: ingesting, transforming, and orchestrating your data. Then Bilal announces Databricks LakeFlow, a unified solution for data engineering. With LakeFlow you can ingest data from databases, enterprise apps and cloud sources, ...
Recap of Announcements at Data + AI Summit 2024 with Ali Ghodsi, Co-Founder and CEO, Databricks
826 views · 1 day ago
Ali Ghodsi, Co-founder and CEO of Databricks, closes the 2024 Data + AI Summit with a recap of Databricks and open source innovation announced during the 4-day conference in San Francisco. Speaker: Ali Ghodsi, Co-founder and CEO, Databricks @Databricks
Announcing Databricks Clean Rooms with Live Demo. Presented by Matei Zaharia and Darshana Sivakumar
993 views · 1 day ago
Speakers: Matei Zaharia (Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks) and Darshana Sivakumar (Staff Product Manager, Databricks). Organizations are looking for ways to securely exchange their data and collaborate with external partners to foster data-driven innovations. In the past, organizations had limited data sharing solutions, relinquishing control over how their ...
Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit
327 views · 1 day ago
Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks. Summary: Data sharing and collaboration are important aspects of the data space. Matei Zaharia explains the evolution of the Databricks data platform to facilitate data sharing and collaboration for customers and their partners. Delta Sharing allows you to share parts of your table with third pa...
Announcing Unity Catalog Metrics with Live Demo. Matei Zaharia and Zeashan Pappa at Data + AI Summit
863 views · 1 day ago
Evolving Data Governance With Unity Catalog Presented by Matei Zaharia at Data + AI Summit 2024
2.6K views · 1 day ago
Unity Catalog Demo of New Features with Zeashan Pappa at Data + AI Summit 2024
905 views · 1 day ago
How Data Intelligence is Delivering Big Wins at Texas Rangers. Alexander Booth at Data + AI Summit
324 views · 1 day ago
The Future of Lakehouse Format Interoperability with Ali Ghodsi and Ryan Blue at Data + AI Summit
531 views · 1 day ago
How to Make Small Language Models Work. Yejin Choi Presents at Data + AI Summit 2024
3.3K views · 1 day ago
Data + AI Summit Keynote 2024 - Day 2 Opening Remarks with Ali Ghodsi
238 views · 1 day ago
Building an Enterprise Data & AI Catalog with Databricks Unity Catalog
1.2K views · 1 day ago
Patrick Wendell, Co-founder and VP of Engineering on Building Production-Quality AI Systems
1.4K views · 1 day ago
The Best Data Warehouse is a Lakehouse
4.1K views · 1 day ago
Building and Deploying GenAI Apps at Block with Jackie Brosamer, Head of AI, Data & Analytics
616 views · 1 day ago
Fei Fei Li, Professor, Stanford University on the History and Future of AI at Data + AI Summit 2024
16K views · 1 day ago
Building an Insights Factory at General Motors - Data + AI Summit 2024
434 views · 1 day ago
Jensen Huang, Founder and CEO of NVIDIA with Ali Ghodsi, Co-founder and CEO of Databricks
37K views · 1 day ago
Ali Ghodsi, Databricks Co-founder and CEO Closes the Keynote with a Summary of Product Announcements
323 views · 1 day ago
Got it: cyber pulse-taking!
Very inspiring! My mind is going at 1000 miles an hour with ideas for our startup and clients from this!
Where can I access your code or workbook? It would be nice to run your code.
Holden and team are incredibly engaging and very easy to understand!
Great feature. Please also include low-code features, as Data Factory has for ETL; that would make it more beneficial.
awesome
Excellent presentation, beginning 3.5-4.0 Billion years ago and explaining all the way to now (AI, non-physical-spatial). Excellent. Thank you. 👏
Who's the speaker?
Holly Smith - FYI it's also me in the comments for my videos so fire away with any technical follow on questions - Holly
@Databricks Awesome, thanks
AI can do everything you need to do in times of studying and understanding AI.
Awesome 👏🏾
🎉
How does parse_json handle schema evolution? To my knowledge, parsing the schema on the fly is not recommended for production tables; it's safer to define the schema first.
I agree, but with a lot of JSON data you don't know the schema upfront and so can't define it. It's worth noting this is different from inferring the schema which looks at the first 1000 rows and is brittle to upstream changes - Holly
@Databricks We used parse_json for dev and exploration purposes as well. Thanks for the clarification.
@gravenguan No worries! Hope this clarifies for other users too
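To illustrate the point in the reply above about sample-based inference being brittle, here is a toy sketch in plain Python. It is not Spark's inference code: the `infer_fields` helper is hypothetical, the records are made up, and a sample size of 2 stands in for the 1000 rows mentioned in the thread.

```python
import json


def infer_fields(json_lines, sample_size):
    """Collect the top-level keys seen in the first `sample_size` records."""
    fields = set()
    for line in json_lines[:sample_size]:
        fields.update(json.loads(line).keys())
    return fields


# Made-up records: the third one introduces a field the sample never sees.
records = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "name": "c", "tags": ["x"]}',
]

sampled = infer_fields(records, sample_size=2)
full = infer_fields(records, sample_size=len(records))
missed = full - sampled

print(sorted(missed))  # prints ['tags']
```

A schema fixed from the sample would have no place for `tags`, which is the upstream-change problem the reply describes.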
this is clearly copied from snowflake
Variants in their various forms have been around for many decades. We're big fans of open source so anyone can use the implementation in other projects or products.
That's awesome.
How can I specify the required runtime version when using serverless sql warehouse?
Variant types will be coming to serverless early/mid July, no need to select a runtime - Holly
Only note for these videos, since they're not Shorts, is that it would be more beneficial to use the full wide (1920 x 1080) format, so it's more readable at all resolutions.
I completely hear you, trying to figure out the best way to film for multiple platforms at once when some define 'short' as <10 mins and YouTube graces me with a mere 60 seconds - Holly
Can't you get the score (ranking score | similarity score) while fetching items from the Vector DB? ..
Can someone explain why you calculate USER embeddings when training, but when searching for similar embeddings you actually get ITEM embeddings?
Still didn't explain what it is exactly
The shovel company telling you how valuable the gold is
She is creepy because she is not an honest person. She keeps stealing others works and ideas to pretend she is an expert. To make her greater, she belittles others, including her student (5:30).
Is 4.0 a release or preview today?
Insightful!
Powerful
Great video and very very useful! While implementing, I got stuck uploading the PDF to a Volume in the Unity Catalog. I am the "Owner" of my Databricks Workspace and Azure account, although I don't seem to have the option to add a Volume to a Catalog and thus don't have the option to add the PDF to a Volume. This seems to have to do with permissions and possibly setting up a metastore between Databricks and Azure Blob Storage? Might you have any insights, ideas, solutions or workarounds? Thanks again for a great video and all the resources to implement this super useful technology!
A couple of things: you need USE SCHEMA and CREATE VOLUME permissions on the schema and USE CATALOG on the catalog. You also need CREATE EXTERNAL VOLUME permission on the external location you plan to use for your volume.
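The privileges listed in the reply above could be granted with SQL statements along these lines. This is a minimal sketch that only builds the statement text: the `volume_grants` helper, the catalog `main`, the schema `default`, the external location `my_location`, and the principal `user@example.com` are all hypothetical placeholders, and an admin would still need to run the resulting statements in their own workspace.

```python
def volume_grants(catalog, schema, external_location, principal):
    """Build GRANT statements for the privileges named in the reply above.

    All argument values are placeholders supplied by the caller; nothing
    here connects to a workspace or executes SQL.
    """
    quoted = f"`{principal}`"  # principals such as e-mail addresses need backticks
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted}",
        f"GRANT USE SCHEMA, CREATE VOLUME ON SCHEMA {catalog}.{schema} TO {quoted}",
        f"GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION {external_location} TO {quoted}",
    ]


if __name__ == "__main__":
    for stmt in volume_grants("main", "default", "my_location", "user@example.com"):
        print(stmt + ";")
```

Keeping the statements as plain strings makes it easy to review exactly what would be granted before anyone with the right privileges runs them.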
While I appreciate the contributions Databricks makes to the open source community, *this video is incredibly misleading*. DBRX is *not* the highest production quality open-source model nor the best in price per performance. The graph you showed is incredibly misleading, not least because you compared your models to LLaMa2-70B. No one in their right mind at the time of this video's recording is using LLaMa2-70B. Everyone has moved on to LLaMa3, with many providers even disabling LLaMa2 on their platforms because it is more expensive and less performant than LLaMa3. A fairer comparison would be between DBRX and LLaMa3-70B and LLaMa3-8B. You didn’t show that because DBRX gets roasted in these comparisons. You talked about the cost associated with training your LLMs and how the cost has come down substantially. Really, this is an argument that the $10M Mosaic/Databricks have spent on DBRX is already redundant. You guys are losing credibility by posting stuff like this. Databricks does some great work. Don’t tarnish your reputation with borderline fraudulent content like this.
Thanks - for the open sourcing, and for the summit.
I have a question: why do we research small LMs just to avoid using GPUs? If we want to save money on training, we could research how to make GPUs or models more efficient, rather than avoiding more advanced technology.
GPU= power consumption
It's not just for GPUs...Small-LM has its own market for on-device or on-edge processing, where there are concerns of privacy and customers would not want their data to go to clouds, and secondly in many industrial use-cases where internet and cloud access isn't accessible due to the remote nature of the use-case, and model inference needs to be done on device...The demand for SLMs is increasing in such use cases...Many big tech companies are not just working on LLMs but also on SLMs under the hood as both of them have to co-exist to cater to different user requirements.
@tulikabose5120 Thank you, I see. It is useful for small devices with limited compute hardware, and for privacy. That's true. Many LLMs need huge amounts of data to train, and they have to collect people's private info to become stronger. Most people hate that.
Boomshakalaka!
:D Show us how to do more complex data transformations than just the simple join you demoed, and what the actual limitations are (because that's where the reality meets the demo). While you are at it, tell us how to automate (schedule) this pipeline and set up notifications and data quality checks. Next, let us know how to QA that dashboard you let GenAI create (to make sure it's not hallucinating and spitting out bullshit while destroying our firm's reputation), and how to surface it to customers via URL in a secure way (without paying you through our noses). Finally, tell us how much it costs to process GBs of data per month. This is an unbearably condescending demo that assumes the attendees are stupid and don't know what is entailed in serious, real-world data wrangling. And I know a couple of my clients who are leaving Databricks because they are freaking expensive.
Exactly. I’m tired of these hype machines. Everything is in beta. Customers are the beta testers. Only thing these guys did good is the Unity Catalog. Of course Spark and Delta as well.
Nice one, Darshana.
jensen from china wenzhou
bullshit. He was born in Taiwan
Wow, this is a huge improvement.
wow
2 reactions:
- By querying the table with DuckDB, authentication and permissions are handled only by Unity Catalog, and not by the underlying storage solution (AWS S3, Azure ADLS, ...), right?
- Applying column masks will only work for hosted compute like Databricks clusters, because querying with local, self-hosted compute like DuckDB requires downloading the Parquet files (containing the PII data) locally and only then executing the query, meaning you actually have PII data downloaded on your local machine, right?
had a laugh, thank you
Examples are not clear about Delta lake ACID properties.
One of my favourite AI legends. Her passion for humanity and how AI can be leveraged to help improve people’s lives is admirable and astounding.
I still can't read Iceberg in Databricks, stop hoping for adoption and just fix that...
And from nothing, a college professor just evolved from the bacteria. Rubbish.
Pls provide notebook. It is not available in dbdemos
As a new YouTuber you're doing very well. He's teaching us option trading nicely. Just need to be consistent with this process of trading on binary options...
I followed your tutorial. I got stuck at 9:38. I typed databricks-bge-large-en as the embedding model, but the Create button is disabled; not sure why.
You shouldn't have to type it in, rather it should be an option in the drop down. If you go to Serving do you see it listed as a Foundational model?